Creating Semantic Representations

Chapter in: Statistical Semantics (Springer, Cham)

Abstract

In this chapter, we present the vector space model and some ways to further process such a representation: with feature hashing, random indexing, latent semantic analysis, non-negative matrix factorization, explicit semantic analysis, and word embeddings, a word or a text may be associated with a distributed semantic representation. Deep learning, explicit semantic networks, and auxiliary non-linguistic information provide further means for creating distributed representations from linguistic data. We point to a few of the methods and datasets used to evaluate the many algorithms that create semantic representations, and we also point to some of the problems associated with distributed representations.
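The vector space model underlying these representations can be sketched in a few lines of plain Python: documents become term-count vectors over a shared vocabulary, and similarity is the cosine of the angle between vectors. (A minimal illustration with a toy corpus of our own invention, not the chapter's data.)

```python
from collections import Counter
from math import sqrt

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "statistics of word meaning",
]

# Build a shared vocabulary and one term-count vector per document.
vocab = sorted({w for d in docs for w in d.split()})
vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# The first two documents share several words; the third shares none.
print(cosine(vectors[0], vectors[1]))  # relatively high
print(cosine(vectors[0], vectors[2]))  # zero
```

The methods listed in the abstract (feature hashing, random indexing, LSA, NMF, word embeddings) can all be viewed as ways of compressing or re-deriving such term-document or term-context matrices.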



Notes

  1. http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html

  2. https://radimrehurek.com/gensim/corpora/hashdictionary.html

  3. https://nlp.stanford.edu/projects/glove/

  4. Note that it is not always clear how the window size is counted: one may count it as the total number of context words, or as the number of words on each side of the word of interest.

  5. http://visualgenome.org/

  6. http://cocodataset.org

  7. https://aclweb.org/aclwiki/WordSimilarity-353_Test_Collection_(State_of_the_art)

  8. The dataset is available at https://sites.google.com/site/semeval2012task2/

  9. https://fasttext.cc/

  10. https://github.com/fnielsen/dasem/blob/master/dasem/data/four_words.csv

  11. https://github.com/fnielsen/afinn/blob/master/afinn/data/AFINN-en-165.txt

  12. http://babelfy.org/

  13. Dasem is a Python package for Danish semantics, available at https://github.com/fnielsen/dasem

  14. We can place the super concept at A = (0, 0, 0) and the subconcepts at (1, 1, 1), (−1, −1, 1), (−1, 1, −1) and (1, −1, −1). All subconcepts then lie at the same distance from the super concept and at the same distance from each other.
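The two window-size conventions mentioned in note 4 can be sketched as follows, using a hypothetical token sequence (the function names are our own, for illustration):

```python
words = ["a", "b", "c", "d", "e", "f", "g"]

def context_per_side(tokens, i, size):
    """Window counted per side: `size` words left and right of tokens[i]."""
    return tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]

def context_total(tokens, i, size):
    """Window counted as a total: `size` words split across both sides."""
    half = size // 2
    return context_per_side(tokens, i, half)

# For the target word "d" (index 3), "window size 2" means either
# four context words or two, depending on the convention in use.
print(context_per_side(words, 3, 2))  # ['b', 'c', 'e', 'f']
print(context_total(words, 3, 2))     # ['c', 'e']
```

When comparing reported hyperparameters across papers, it is therefore worth checking which of the two conventions is in effect.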
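The construction in note 14 can be verified numerically: the four subconcept points are alternating vertices of a cube centred on the super concept, so they form a regular tetrahedron, each at distance √3 from the super concept and √8 from one another.

```python
from itertools import combinations
from math import dist, sqrt

super_concept = (0, 0, 0)
subconcepts = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]

# Distance from each subconcept to the super concept.
to_super = [dist(super_concept, p) for p in subconcepts]
# Pairwise distances between distinct subconcepts.
pairwise = [dist(p, q) for p, q in combinations(subconcepts, 2)]

assert all(abs(d - sqrt(3)) < 1e-12 for d in to_super)
assert all(abs(d - sqrt(8)) < 1e-12 for d in pairwise)
print("all four sub-to-super and six sub-to-sub distances are equal")
```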


Acknowledgments

We would like to thank Innovation Fund Denmark for funding through the DABAI project.

Author information

Corresponding author: Finn Årup Nielsen.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Nielsen, F.Å., Hansen, L.K. (2020). Creating Semantic Representations. In: Sikström, S., Garcia, D. (eds) Statistical Semantics. Springer, Cham. https://doi.org/10.1007/978-3-030-37250-7_2
