Abstract
Spectral clustering is a popular and effective clustering method, but it is known to face two significant challenges: scalability and out-of-sample extension. In this paper, we extend the work of Chen (ICPR 2018) on scalable spectral clustering with cosine similarity to handle massive or online data that are too large to be fully loaded into computer memory. We start with a small batch of data drawn from the full set and develop an efficient procedure that learns both the nonlinear embedding and the clustering map from this sample and extends them easily to the rest of the data as they are gradually loaded. We then introduce an automatic approach to selecting the optimal sample size. Combining the two steps yields a streamlined, memory-efficient algorithm that uses only a small number of batches of data (as they become available), with memory and computational costs that are independent of the size of the data. Experiments on benchmark data sets demonstrate the fast speed and excellent accuracy of the proposed algorithm. We conclude the paper by pointing out several future research directions.
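As a sketch of the key computational idea behind spectral clustering with cosine similarity (not the authors' exact algorithm): with row-normalized data, the similarity matrix is W = X̃X̃ᵀ, so the degrees and the top eigenvectors of the normalized affinity D^{-1/2}WD^{-1/2} can be obtained from matrix-vector products and a thin SVD of D^{-1/2}X̃, without ever forming the n×n matrix. A minimal NumPy sketch, assuming dense data and keeping self-similarities on the diagonal (the ICPR 2018 algorithm removes them); all function names are illustrative:

```python
import numpy as np

def spectral_embedding_cosine(X, k):
    """Spectral embedding under cosine similarity via a thin SVD of the
    degree-normalized data matrix, avoiding the n x n similarity matrix."""
    # Row-normalize so that inner products equal cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Degrees d_i = sum_j <x_i, x_j> = <x_i, sum_j x_j>, computed in O(nd).
    d = Xn @ Xn.sum(axis=0)
    # Left singular vectors of D^{-1/2} Xn are eigenvectors of the
    # normalized affinity D^{-1/2} W D^{-1/2} with W = Xn Xn^T.
    A = Xn / np.sqrt(d)[:, None]
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    E = U[:, :k]
    # Row-normalize the embedding (as in Ng-Jordan-Weiss) before k-means.
    return E / np.linalg.norm(E, axis=1, keepdims=True)

def kmeans(E, k, iters=50):
    """Plain Lloyd's algorithm with farthest-point initialization."""
    C = [E[0]]
    for _ in range(1, k):
        dist = np.min(((E[:, None] - np.array(C)[None]) ** 2).sum(-1), axis=1)
        C.append(E[np.argmax(dist)])
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((E[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = E[labels == j].mean(axis=0)
    return labels

def spectral_clustering_cosine(X, k):
    return kmeans(spectral_embedding_cosine(X, k), k)
```

Because only X̃ (n×d) and its thin SVD are touched, the cost is linear in n, which is what makes the sample-and-extend strategy of the paper feasible.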
The authors thank the anonymous reviewers for careful reviews and useful feedback.
Notes
- 1. When both conditions are violated, one can apply principal component analysis (PCA) to reduce the dimensionality of the data such that the first condition is met.
- 2. To compute this percentage, we need to find the best map between the output labels and the original labels. This is done by using the Kuhn-Munkres algorithm as in [1].
- 3.
- 4. Available at http://qwone.com/~jason/20Newsgroups/; we also used the bydate version.
- 5.
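The label-matching step described in Note 2 can be carried out with SciPy's `linear_sum_assignment`, which implements a Kuhn-Munkres-type solver. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, pred_labels):
    """Best-match accuracy between predicted and true cluster labels.

    Builds the contingency table of (predicted label, true label) counts
    and solves the assignment problem to find the label permutation that
    maximizes agreement, then returns the fraction of matched points.
    """
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    t = np.unique(true_labels)
    p = np.unique(pred_labels)
    counts = np.zeros((len(p), len(t)), dtype=int)
    for i, pi in enumerate(p):
        for j, tj in enumerate(t):
            counts[i, j] = np.sum((pred_labels == pi) & (true_labels == tj))
    # linear_sum_assignment minimizes cost, so negate to maximize matches.
    row, col = linear_sum_assignment(-counts)
    return counts[row, col].sum() / len(true_labels)
```

For example, predictions that merely swap the two label names score 1.0, since the optimal assignment undoes the swap.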
References
Cai, D., Chen, X.: Large scale spectral clustering via landmark-based sparse representation. IEEE Trans. Cybern. 45(8), 1669–1680 (2015)
Chen, G.: Scalable spectral clustering with cosine similarity. In: Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China (2018)
Chen, G.: A general framework for scalable spectral clustering based on document models. Pattern Recogn. Lett. 125, 488–493 (2019)
Chen, G., Lerman, G.: Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Found. Comput. Math. (2009). https://doi.org/10.1007/s10208-009-9043-7
Choromanska, A., Jebara, T., Kim, H., Mohan, M., Monteleoni, C.: Fast spectral clustering via the Nyström method. In: Jain, S., Munos, R., Stephan, F., Zeugmann, T. (eds.) Algorithmic Learning Theory, pp. 367–381. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40935-6_26
Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Dissimilarity and distance measures for continuous data, pp. 51–52. Wiley, Boston, MA (2011)
Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 214–225 (2004)
Huang, D., Wang, C.D., Wu, J.S., Lai, J., Kwoh, C.K.: Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans. Knowl. Data Eng. (TKDE) 32, 1212–1226 (2020)
Li, M., Lian, X.C., Kwok, J.T., Lu, B.L.: Time and space efficient spectral clustering via column sampling. In: CVPR 2011, pp. 2297–2304 (2011). https://doi.org/10.1109/CVPR.2011.5995425
Meila, M., Shi, J.: A random walks view of spectral segmentation. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (2001)
Moazzen, Y., Tasdemir, K.: Sampling based approximate spectral clustering ensemble for partitioning data sets. In: Proceedings of the 23rd International Conference on Pattern Recognition (2016)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems 14, pp. 849–856 (2001)
Pham, K., Chen, G.: Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs. In: Proceedings of the 12th Workshop on Graph-based Natural Language Processing (TextGraphs-12), pp. 28–37. Association for Computational Linguistics (2018)
Sakai, T., Imiya, A.: Fast spectral clustering with random projection and sampling. In: Perner, P. (ed.) MLDM 2009. LNCS (LNAI), vol. 5632, pp. 372–384. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03070-3_28
Shaham, U., Stanton, K., Li, H., Basri, R., Nadler, B., Kluger, Y.: Spectralnet: spectral clustering using deep neural networks. In: International Conference on Learning Representations (2018)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Tasdemir, K.: Vector quantization based approximate spectral clustering of large datasets. Pattern Recogn. 45(8), 3034–3044 (2012)
Wang, L., Leckie, C., Kotagiri, R., Bezdek, J.: Approximate pairwise clustering for large data sets via sampling plus extension. Pattern Recogn. 44, 222–235 (2011)
Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J.: Approximate spectral clustering. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 134–146. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01307-2_15
Yan, D., Huang, L., Jordan, M.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916 (2009)
Copyright information
© 2024 Springer Nature Switzerland AG
Cite this paper
Li, R., Chen, G. (2024). Fast, Memory-Efficient Spectral Clustering with Cosine Similarity. In: Vasconcelos, V., Domingues, I., Paredes, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2023. Lecture Notes in Computer Science, vol 14469. Springer, Cham. https://doi.org/10.1007/978-3-031-49018-7_50
Print ISBN: 978-3-031-49017-0
Online ISBN: 978-3-031-49018-7