Abstract
We study the problem of classifying images into a given, pre-determined taxonomy. This task can be elegantly translated into the structured learning framework. However, despite its power, structured learning has known limits in scalability due to its high memory requirements and slow training process. We propose an efficient approximation of the structured learning approach by an ensemble of local support vector machines (SVMs) that can be trained efficiently with standard techniques. A first theoretical discussion and experiments on toy data shed light on why taxonomy-based classification can outperform taxonomy-free approaches and why an appropriately combined ensemble of local SVMs might be of high practical use. Further empirical results on subsets of the Caltech256 and VOC2006 data show that our local SVM formulation can effectively exploit the taxonomy structure and thus outperforms standard multi-class classification algorithms, while achieving results on par with taxonomy-based structured algorithms at significantly reduced computing time.
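As a rough illustration of the idea of combining local classifiers along a taxonomy, the following sketch scores each leaf class by summing the outputs of the local classifiers on its path from the root. Everything specific here is an assumption made for the example and is not taken from the abstract: the toy taxonomy, the hard-coded scores standing in for local SVM outputs, the summation rule, and the edge-count loss.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "animal": "root", "vehicle": "root",
    "cat": "animal", "dog": "animal",
    "car": "vehicle", "bus": "vehicle",
}


def path_to_root(node):
    """Return the nodes from `node` upwards, excluding the root itself."""
    path = []
    while PARENT[node] is not None:
        path.append(node)
        node = PARENT[node]
    return path


def leaves():
    """Return all nodes that are not the parent of any other node."""
    parents = set(PARENT.values())
    return [n for n in PARENT if n not in parents]


def predict(local_scores):
    """Pick the leaf whose root-to-leaf path has the largest summed score.

    Summation is an assumed combination rule, chosen only for illustration.
    """
    return max(
        leaves(),
        key=lambda leaf: sum(local_scores[n] for n in path_to_root(leaf)),
    )


def taxonomy_loss(a, b):
    """Number of edges on the tree path between two nodes (assumed loss)."""
    pa, pb = set(path_to_root(a)), set(path_to_root(b))
    return len(pa ^ pb)


# Made-up outputs of one local binary classifier per non-root node.
scores = {
    "animal": 0.8, "vehicle": -0.8,
    "cat": -0.1, "dog": 0.3,
    "car": 0.9, "bus": -0.5,
}

predicted = predict(scores)                          # "dog"
flat_choice = max(leaves(), key=lambda n: scores[n])  # "car"
```

In this toy case a flat decision over the leaf scores alone picks "car", whereas the path-based combination picks "dog" because the strong "animal" score at the inner node outweighs the leaf-level difference. Under the assumed loss, confusing "cat" with "dog" costs 2 while confusing "cat" with "car" costs 4.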
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Binder, A., Müller, KR. & Kawanabe, M. On Taxonomies for Multi-class Image Categorization. Int J Comput Vis 99, 281–301 (2012). https://doi.org/10.1007/s11263-010-0417-8
DOI: https://doi.org/10.1007/s11263-010-0417-8