Calculating a Distributional Similarity Kernel using the Nyström Extension

  • Conference paper
  • First Online:
Challenges at the Interface of Data Analysis, Computer Science, and Optimization
  • 2506 Accesses

Abstract

The analysis of distributional similarities induced by word co-occurrences is an established tool for extracting semantically related words from a large text corpus. Based on the co-occurrence matrix C the basic kernel matrix K = CC T reflects word–word similarities. In order to considerably improve the results, a similarity kernel matrix is expressed as \(G\,=\,{U}_{k}{U}_{k}^{T}\), where U k are the first k eigenvectors of the eigendecomposition K = UΣU T. Clearly, the bottleneck of this technique is the high computational demand for calculating the eigendecomposition. In our study we speed up the calculation of the low-rank similarity kernel by means of the Nyström extension. We address in detail the inherent challenge of the Nyström method, namely selecting appropriate kernel matrix columns in such a way that the fast approximation process yields satisfactory results. To illustrate the effectiveness of our method, we have built a thesaurus containing 32,000 entries based on 0.5 billion corpus words (nouns, verbs, adjectives and adverbs) extracted from the Project Gutenberg text collection.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
EUR 29.95
Price includes VAT (Germany)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
EUR 85.59
Price includes VAT (Germany)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
EUR 106.99
Price includes VAT (Germany)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free ship** worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    www.gutenberg.org

References

  • Drineas P, Mahoney MW (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153–2175

    MathSciNet  MATH  Google Scholar 

  • Fellbaum C (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

    MATH  Google Scholar 

  • Kumar S, Mohri M, Talwalkar A (2009) Sampling techniques for the Nyström method. In: Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), pp 304–311

    Google Scholar 

  • Landauer TK, Dumais ST (1997) A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychol Rev 104:211–240

    Article  Google Scholar 

  • Rapp R (2008) The automatic generation of thesauri of related words for English, French, German, and Russian. Int J Speech Technol 11:147–156

    Article  Google Scholar 

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

    Book  Google Scholar 

  • Turney PD, Pantel P (2010) From frequency to meaning: Vector space models of semantics. J Artif Intell Res 37:141–188

    MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Markus Arndt .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arndt, M., Arndt, U. (2012). Calculating a Distributional Similarity Kernel using the Nyström Extension. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_34

Download citation

Publish with us

Policies and ethics

Navigation