Fast, Memory-Efficient Spectral Clustering with Cosine Similarity

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14469))

Abstract

Spectral clustering is a popular and effective method, but it is known to face two significant challenges: scalability and out-of-sample extension. In this paper, we extend the work of Chen (ICPR 2018) on the speed scalability of spectral clustering in the setting of cosine similarity to deal with massive or online data that are too large to be fully loaded into computer memory. We start with a small batch of data drawn from the full set and develop an efficient procedure that learns both the nonlinear embedding and the clustering map from the sample and extends them easily to the rest of the data as they are gradually loaded. We then introduce an automatic approach to selecting the optimal sample size. The combination of the two steps leads to a streamlined, memory-efficient algorithm that uses only a small number of batches of data (as they become available), with memory and computational costs that are independent of the size of the data. Experiments on benchmark data sets demonstrate the high speed and excellent accuracy of the proposed algorithm. We conclude the paper by pointing out several future research directions.
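
The sample-then-extend idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_sample`, `extend`, `kmeans`) are hypothetical, the data are assumed to have nonzero rows and positive degrees (for example, nonnegative features), and the extension rule (project each new point onto the singular vectors learned from the sample, then assign it to the nearest centroid) is one plausible reading of "extends them easily to the rest of the data". The automatic choice of sample size is not covered.

```python
import numpy as np

def _unit_rows(M):
    # Scale every row to unit length so that dot products are cosine similarities.
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def kmeans(Y, k, iters=50):
    # Plain Lloyd iterations with a deterministic farthest-point initialization.
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - C[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep the old center if a cluster empties
                C[j] = Y[labels == j].mean(axis=0)
    return C

def fit_sample(X, k):
    """Learn the embedding and the centroids from a sample X of shape (n, d)."""
    X = _unit_rows(X)
    col_sum = X.sum(axis=0)                  # X^T 1, a d-vector
    deg = X @ col_sum - 1.0                  # degrees of W = X X^T - I, no n x n matrix formed
    Xt = X / np.sqrt(deg)[:, None]           # D^{-1/2} X
    U, s, Vt = np.linalg.svd(Xt, full_matrices=False)
    Y = _unit_rows(U[:, :k])                 # row-normalized spectral embedding
    return {"col_sum": col_sum, "V": Vt[:k].T, "s": s[:k], "centroids": kmeans(Y, k)}

def extend(model, X_new):
    """Embed a new batch with the learned map and assign nearest-centroid labels."""
    X = _unit_rows(X_new)
    deg = X @ model["col_sum"]               # similarity of each new point to the whole sample
    U = (X / np.sqrt(deg)[:, None]) @ model["V"] / model["s"]
    Y = _unit_rows(U)
    d2 = ((Y[:, None, :] - model["centroids"][None]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)
```

In this sketch only `col_sum`, `V`, `s`, and the centroids are kept after fitting, so each later batch can be labeled with `extend(model, batch)` and then discarded; the memory needed per batch does not grow with the total number of points.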

The authors thank the anonymous reviewers for careful reviews and useful feedback.

Notes

  1. When both conditions are violated, one can apply principal component analysis (PCA) to reduce the dimensionality of the data such that the first condition is met.

  2. To compute this percentage, we need to find the best map between the output labels and the original labels. This is done by using the Kuhn-Munkres algorithm as in [1].

  3. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

  4. Available at http://qwone.com/~jason/20Newsgroups/; we also used the bydate version.

  5. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
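
Note 2 describes matching output labels to original labels with the Kuhn-Munkres algorithm before computing a percentage. A minimal sketch of that evaluation step follows, assuming SciPy is available; the function name `clustering_accuracy` is hypothetical, and `scipy.optimize.linear_sum_assignment` is used here as a stand-in solver for the assignment problem rather than the code used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, pred_labels):
    """Fraction of points labeled correctly under the best cluster-to-class map."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    # Re-index both label sets as 0..m-1 so they can index a contingency table.
    classes, t = np.unique(true_labels, return_inverse=True)
    clusters, p = np.unique(pred_labels, return_inverse=True)
    counts = np.zeros((len(clusters), len(classes)), dtype=int)
    np.add.at(counts, (p, t), 1)             # counts[i, j] = points in cluster i with class j
    # One-to-one matching of clusters to classes that maximizes the agreement.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(true_labels)
```

Because cluster labels are arbitrary, a perfect clustering whose label names are permuted still scores 1.0 under this measure.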

References

  1. Cai, D., Chen, X.: Large scale spectral clustering via landmark-based sparse representation. IEEE Trans. Cybern. 45(8), 1669–1680 (2015)

  2. Chen, G.: Scalable spectral clustering with cosine similarity. In: Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China (2018)

  3. Chen, G.: A general framework for scalable spectral clustering based on document models. Pattern Recogn. Lett. 125, 488–493 (2019)

  4. Chen, G., Lerman, G.: Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Found. Comput. Math. (2009). https://doi.org/10.1007/s10208-009-9043-7

  5. Choromanska, A., Jebara, T., Kim, H., Mohan, M., Monteleoni, C.: Fast spectral clustering via the Nyström method. In: Jain, S., Munos, R., Stephan, F., Zeugmann, T. (eds.) Algorithmic Learning Theory, pp. 367–381. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40935-6_26

  6. Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Dissimilarity and distance measures for continuous data, pp. 51–52. Wiley, Boston, MA (2011)

  7. Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 214–225 (2004)

  8. Huang, D., Wang, C.D., Wu, J.S., Lai, J., Kwoh, C.K.: Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans. Knowl. Data Eng. (TKDE) 32, 1212–1226 (2020)

  9. Li, M., Lian, X.C., Kwok, J.T., Lu, B.L.: Time and space efficient spectral clustering via column sampling. In: CVPR 2011, pp. 2297–2304 (2011). https://doi.org/10.1109/CVPR.2011.5995425

  10. Meila, M., Shi, J.: A random walks view of spectral segmentation. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (2001)

  11. Moazzen, Y., Tasdemir, K.: Sampling based approximate spectral clustering ensemble for partitioning data sets. In: Proceedings of the 23rd International Conference on Pattern Recognition (2016)

  12. Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems 14, pp. 849–856 (2001)

  13. Pham, K., Chen, G.: Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs. In: Proceedings of the 12th Workshop on Graph-based Natural Language Processing (TextGraphs-12), pp. 28–37. Association for Computational Linguistics (2018)

  14. Sakai, T., Imiya, A.: Fast spectral clustering with random projection and sampling. In: Perner, P. (ed.) MLDM 2009. LNCS (LNAI), vol. 5632, pp. 372–384. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03070-3_28

  15. Shaham, U., Stanton, K., Li, H., Basri, R., Nadler, B., Kluger, Y.: Spectralnet: spectral clustering using deep neural networks. In: International Conference on Learning Representations (2018)

  16. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

  17. Tasdemir, K.: Vector quantization based approximate spectral clustering of large datasets. Pattern Recogn. 45(8), 3034–3044 (2012)

  18. Wang, L., Leckie, C., Kotagiri, R., Bezdek, J.: Approximate pairwise clustering for large data sets via sampling plus extension. Pattern Recogn. 44, 222–235 (2011)

  19. Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J.: Approximate spectral clustering. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 134–146. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01307-2_15

  20. Yan, D., Huang, L., Jordan, M.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916 (2009)

Corresponding author

Correspondence to Ran Li.

Copyright information

© 2024 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, R., Chen, G. (2024). Fast, Memory-Efficient Spectral Clustering with Cosine Similarity. In: Vasconcelos, V., Domingues, I., Paredes, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2023. Lecture Notes in Computer Science, vol 14469. Springer, Cham. https://doi.org/10.1007/978-3-031-49018-7_50

  • DOI: https://doi.org/10.1007/978-3-031-49018-7_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49017-0

  • Online ISBN: 978-3-031-49018-7

  • eBook Packages: Computer Science (R0)
