On Vocabulary Size in Bag-of-Visual-Words Representation

Hou, Jian; Kang, Jianxin; Qi, Naiming

doi:10.1007/978-3-642-15702-8_38

Jian Hou, Jianxin Kang & Naiming Qi

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6297)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

Bag-of-visual-words is a popular image representation that offers high matching accuracy and efficiency. Although vocabulary size affects matching accuracy, existing research usually selects it empirically. Research on representative local descriptors shows that, with similarity-based clustering, the extent of intra-cluster similarity among descriptors plays the same role in straightforward matching as vocabulary size does in visual-words matching. Based on this observation, we propose using similarity-based clustering to determine the optimal vocabulary size for a given dataset in visual-words matching. Preliminary experiments on three datasets produce encouraging results and demonstrate the potential of the proposed approach.
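
The idea of letting a similarity threshold, rather than a preset cluster count, decide the vocabulary size can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so the code below is an assumption: it substitutes a simple greedy "leader" clustering with cosine similarity, and the names `cosine_similarity`, `similarity_clustering`, `estimate_vocabulary_size`, and the parameter `min_similarity` are hypothetical. It should not be read as the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_clustering(descriptors, min_similarity):
    """Greedy leader clustering (an assumed stand-in, not the paper's algorithm).

    A descriptor joins the most similar existing cluster leader if that
    similarity is at least `min_similarity`; otherwise it starts a new cluster.
    Returns (labels, leaders).
    """
    leaders = []
    labels = []
    for descriptor in descriptors:
        best_index, best_similarity = -1, -1.0
        for index, leader in enumerate(leaders):
            similarity = cosine_similarity(descriptor, leader)
            if similarity > best_similarity:
                best_index, best_similarity = index, similarity
        if best_index >= 0 and best_similarity >= min_similarity:
            labels.append(best_index)
        else:
            leaders.append(descriptor)
            labels.append(len(leaders) - 1)
    return labels, leaders


def estimate_vocabulary_size(descriptors, min_similarity):
    """The number of clusters found is taken as the candidate vocabulary size."""
    _, leaders = similarity_clustering(descriptors, min_similarity)
    return len(leaders)
```

In this sketch a stricter `min_similarity` yields more, tighter clusters and therefore a larger vocabulary, while a looser one yields fewer, broader clusters. The result of leader clustering also depends on the order in which descriptors are visited, which a real implementation would need to address.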

Author information

Authors and Affiliations

School of Astronautics, Harbin Institute of Technology, Harbin, China, 150001
Jian Hou, Jianxin Kang & Naiming Qi
School of Engineering, Northeast Agriculture University, Harbin, China, 150030
Jianxin Kang

Editor information

Editors and Affiliations

School of Computer Science, University of Nottingham, Jubilee Campus, NG8 1BB, Nottingham, UK
Guoping Qiu
The Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hong Kong, China
Kin Man Lam
Faculty of System Design, Tokyo Metropolitan University, 6-6, Asahigaoka, 191-0065, Hino-city, Tokyo
Hitoshi Kiya
Shanghai Key Laboratory of Intelligent Information Processing, Department of Computer Science & Engineering, Fudan University, Shanghai, China
Xiang-Yang Xue
Department of Electrical Engineering, University of Southern California, 90089-2564, Los Angeles, CA
C.-C. Jay Kuo
LIACS Media Lab, Leiden University,
Michael S. Lew

Cite this paper

Hou, J., Kang, J., Qi, N. (2010). On Vocabulary Size in Bag-of-Visual-Words Representation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_38

DOI: https://doi.org/10.1007/978-3-642-15702-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science, Computer Science (R0)
