Abstract
The analysis of distributional similarities induced by word co-occurrences is an established tool for extracting semantically related words from a large text corpus. Based on the co-occurrence matrix \(C\), the basic kernel matrix \(K = CC^{T}\) reflects word–word similarities. To improve the results considerably, a similarity kernel matrix is expressed as \(G\,=\,{U}_{k}{U}_{k}^{T}\), where the columns of \(U_{k}\) are the first \(k\) eigenvectors from the eigendecomposition \(K = U\Sigma U^{T}\). The bottleneck of this technique is the high computational cost of calculating the eigendecomposition. In our study we speed up the calculation of the low-rank similarity kernel by means of the Nyström extension. We address in detail the inherent challenge of the Nyström method, namely selecting kernel matrix columns in such a way that the fast approximation process yields satisfactory results. To illustrate the effectiveness of our method, we have built a thesaurus containing 32,000 entries based on 0.5 billion corpus words (nouns, verbs, adjectives and adverbs) extracted from the Project Gutenberg text collection.
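As an illustration of the idea described in the abstract, the following is a minimal sketch of a Nyström-style approximation of \(G = U_{k}U_{k}^{T}\) in Python with NumPy. It is not the authors' implementation: the function name `nystrom_similarity_kernel`, the dense-array representation, the uniform random choice of columns in the demo, and the final QR orthonormalization step are all assumptions made for this example. In particular, the chapter's own column-selection strategy is not reproduced here.

```python
import numpy as np

def nystrom_similarity_kernel(C, idx, k):
    """Approximate G = U_k U_k^T for K = C C^T using only the columns of K in idx.

    C   : (n, d) co-occurrence matrix (dense here, for simplicity)
    idx : indices of the m sampled kernel columns (hypothetical selection)
    k   : target rank; assumed not to exceed the rank of the sampled block
    """
    K_cols = C @ C[idx].T                 # (n, m): sampled columns of K, K itself is never formed
    W = K_cols[idx]                       # (m, m): intersection block of the sampled rows and columns
    evals, evecs = np.linalg.eigh(W)      # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]   # keep the k largest eigenpairs of W
    lam, V = evals[order], evecs[:, order]
    U_ext = (K_cols @ V) / lam            # Nystrom extension of the eigenvectors to all n rows
    Q, _ = np.linalg.qr(U_ext)            # assumption: orthonormalize so Q Q^T is a projection
    return Q @ Q.T

# Small demo on random data, with uniformly sampled columns (an assumption).
rng = np.random.default_rng(0)
C_demo = rng.random((200, 50))
idx_demo = rng.choice(200, size=60, replace=False)
G_demo = nystrom_similarity_kernel(C_demo, idx_demo, k=10)
print(G_demo.shape)
```

The cost is dominated by the eigendecomposition of the small \(m \times m\) block instead of the full \(n \times n\) kernel, which is where the speed-up comes from. How well the result matches the exact \(G\) depends on which columns are sampled.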
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arndt, M., Arndt, U. (2012). Calculating a Distributional Similarity Kernel using the Nyström Extension. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_34
DOI: https://doi.org/10.1007/978-3-642-24466-7_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24465-0
Online ISBN: 978-3-642-24466-7
eBook Packages: Mathematics and Statistics (R0)