NSEEN: Neural Semantic Embedding for Entity Normalization

Fakhraei, Shobeir; Mathew, Joel; Ambite, José Luis

doi:10.1007/978-3-030-46147-8_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11907)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Much of human knowledge is encoded in text, available in scientific publications, books, and the web. Given the rapid growth of these resources, we need automated methods to extract such knowledge into machine-processable structures, such as knowledge graphs. An important task in this process is entity normalization, which consists of mapping noisy entity mentions in text to canonical entities in well-known reference sets. However, entity normalization is a challenging problem; there often are many textual forms for a canonical entity that may not be captured in the reference set, and entities mentioned in text may include many syntactic variations, or errors. The problem is particularly acute in scientific domains, such as biology. To address this problem, we have developed a general, scalable solution based on a deep Siamese neural network model to embed the semantic information about the entities, as well as their syntactic variations. We use these embeddings for fast mapping of new entities to large reference sets, and empirically show the effectiveness of our framework in challenging bio-entity normalization datasets.
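
The mapping step described in the abstract (embed each reference entity once, then map a new mention to its nearest reference embedding) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `embed` function is a toy character-trigram stand-in and not the learned Siamese encoder, the names `embed`, `distance`, and `normalize` are hypothetical, and the reference set is made up. At scale, the linear scan would presumably be replaced by an approximate nearest-neighbor index.

```python
import math
from collections import Counter


def embed(name, n=3):
    """Toy stand-in for a learned encoder (assumption, not the paper's model):
    L2-normalised character n-gram counts stored as a sparse dict."""
    padded = f"#{name.lower()}#"
    grams = Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))
    norm = math.sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()}


def distance(u, v):
    """Euclidean distance between two sparse unit vectors.
    For unit vectors, ||u - v||^2 = 2 - 2 * (u . v)."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    return math.sqrt(max(0.0, 2.0 - 2.0 * dot))


def normalize(mention, reference_index, k=1):
    """Return the k reference names whose embeddings are closest to the mention."""
    query = embed(mention)
    ranked = sorted(reference_index, key=lambda item: distance(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical reference set; embeddings are computed once, ahead of query time.
reference_set = ["interleukin 6", "tumor necrosis factor", "insulin"]
reference_index = [(name, embed(name)) for name in reference_set]

# A noisy mention (different case and punctuation) is mapped to its nearest entry.
print(normalize("Interleukin-6", reference_index))
```

Precomputing the reference embeddings keeps query-time work down to one embedding plus a nearest-neighbor search, which is what makes this style of mapping attractive for large reference sets.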

Notes

1. For brevity of notation, we denote \(\delta(v_i, v_j)\) by \(\delta_v\).

Acknowledgments

This work was supported in part by the DARPA Big Mechanism program under contract number W911NF-14-1-0364.

Author information

Authors and Affiliations

Information Sciences Institute, University of Southern California, Los Angeles, USA
Shobeir Fakhraei, Joel Mathew & José Luis Ambite

Corresponding author

Correspondence to Shobeir Fakhraei.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

Cite this paper

Fakhraei, S., Mathew, J., Ambite, J.L. (2020). NSEEN: Neural Semantic Embedding for Entity Normalization. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11907. Springer, Cham. https://doi.org/10.1007/978-3-030-46147-8_40

DOI: https://doi.org/10.1007/978-3-030-46147-8_40
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46146-1
Online ISBN: 978-3-030-46147-8
eBook Packages: Computer Science, Computer Science (R0)
