Abstract
Bag-of-visual-words is a popular image representation that offers both high matching accuracy and high efficiency. Although vocabulary size affects matching accuracy, existing research usually selects it empirically. Research on representative local descriptors shows that, with similarity-based clustering, the degree of intra-cluster similarity among descriptors plays the same role in straightforward descriptor matching as vocabulary size plays in visual-word matching. Based on this observation, we propose using similarity-based clustering to determine the optimal vocabulary size for a given dataset in visual-word matching. Preliminary experiments on three datasets produce encouraging results and demonstrate the potential of the proposed approach.
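To make the link between intra-cluster similarity and vocabulary size concrete, here is a minimal sketch. It assumes a generic leader-style threshold clustering with cosine similarity, which is a stand-in technique and not necessarily the clustering used in the paper. The function names (`threshold_cluster`, `bovw_histogram`, `histogram_intersection`) and the parameter `sim_threshold` are hypothetical. The only point illustrated is that, with similarity-based clustering, the number of clusters (the vocabulary size) is an output of the chosen similarity threshold rather than a value fixed in advance.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def threshold_cluster(descriptors, sim_threshold):
    """Hypothetical leader-style clustering (an assumption, not the paper's method).

    Each descriptor joins its most similar existing centre if that similarity
    reaches sim_threshold; otherwise it starts a new cluster. The number of
    returned centres, i.e. the vocabulary size, is determined by the threshold.
    """
    centres, counts = [], []
    for d in descriptors:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centres):
            s = cosine(d, c)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim >= sim_threshold:
            # Update the running mean of the matched centre.
            n = counts[best] + 1
            centres[best] = [c + (x - c) / n for c, x in zip(centres[best], d)]
            counts[best] = n
        else:
            centres.append(list(d))
            counts.append(1)
    return centres


def bovw_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its most similar visual word and count."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        idx = max(range(len(vocabulary)), key=lambda i: cosine(d, vocabulary[i]))
        hist[idx] += 1
    return hist


def histogram_intersection(h1, h2):
    """Unnormalized histogram intersection, one common visual-word matching score."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    random.seed(0)
    # Toy 8-dimensional non-negative "descriptors" standing in for SIFT-like vectors.
    data = [[random.random() for _ in range(8)] for _ in range(200)]
    for t in (0.80, 0.90, 0.95):
        print(t, len(threshold_cluster(data, t)))
```

Raising `sim_threshold` forces tighter (more self-similar) clusters and therefore a larger vocabulary, which mirrors the abstract's observation that intra-cluster similarity and vocabulary size play equivalent roles. A real pipeline would use a faster clustering and nearest-neighbour search than these quadratic loops.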
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hou, J., Kang, J., Qi, N. (2010). On Vocabulary Size in Bag-of-Visual-Words Representation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_38
DOI: https://doi.org/10.1007/978-3-642-15702-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science, Computer Science (R0)