Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer

Miftahutdinov, Zulfat; Kadurin, Artur; Kudrin, Roman; Tutubalina, Elena

doi:10.1007/978-3-030-72113-8_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12656)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Concept normalization in free-form texts is a crucial step in every text-mining pipeline. Neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art results in the biomedical domain. In the context of drug discovery and development, clinical trials are necessary to establish the efficacy and safety of drugs. We investigate the effectiveness of transferring concept normalization from the general biomedical domain to the clinical trials domain in a zero-shot setting with an absence of labeled data. We propose a simple and effective two-stage neural approach based on fine-tuned BERT architectures. In the first stage, we train a metric learning model that optimizes relative similarity of mentions and concepts via triplet loss. The model is trained on available labeled corpora of scientific abstracts to obtain vector embeddings of concept names and entity mentions from texts. In the second stage, we find the closest concept name representation in an embedding space to a given clinical mention. We evaluated several models, including state-of-the-art architectures, on a dataset of abstracts and a real-world dataset of trial records with interventions and conditions mapped to drug and disease terminologies. Extensive experiments validate the effectiveness of our approach in knowledge transfer from the scientific literature to clinical trials.
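
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written 2-D vectors stand in for BERT embeddings, and the Euclidean distance, the margin value, the concept identifiers, and the function names are all assumptions chosen for illustration. The triplet loss is zero once the correct concept name is closer to the mention than an incorrect one by at least the margin; the lookup returns the concept whose closest name embedding is nearest to the mention.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(mention: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Stage 1 objective: hinge-style triplet loss for one training triplet.

    `positive` is a name of the correct concept, `negative` a name of a
    different concept. The loss is zero when the positive is closer to the
    mention than the negative by at least `margin`.
    """
    gap = euclidean(mention, positive) - euclidean(mention, negative)
    return max(0.0, gap + margin)


def nearest_concept(mention: Vector,
                    concept_names: Dict[str, List[Vector]]) -> str:
    """Stage 2: return the concept ID with the closest name embedding."""
    return min(
        concept_names,
        key=lambda cid: min(euclidean(mention, v) for v in concept_names[cid]),
    )


# Toy embeddings, made up for illustration. Each concept ID (hypothetical)
# maps to the embeddings of its names and synonyms.
concepts = {
    "C_HEADACHE": [[0.9, 0.1], [0.8, 0.2]],
    "C_ASPIRIN": [[0.1, 0.9]],
}
mention_vec = [0.85, 0.15]

predicted = nearest_concept(mention_vec, concepts)
loss = triplet_loss(mention_vec, concepts["C_HEADACHE"][0], concepts["C_ASPIRIN"][0])
print(predicted, loss)
```

Because the second stage only compares embeddings, the concept dictionary can be swapped for a different terminology without retraining, which is what makes the zero-shot transfer setting possible in principle.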

Acknowledgements

Research on academic corpora was carried out by Z.M. and supported by RFBR, project no. 19-37-90074.

Author information

Authors and Affiliations

Insilico Medicine Hong Kong, Pak Shek Kok, Hong Kong
Zulfat Miftahutdinov, Artur Kadurin, Roman Kudrin & Elena Tutubalina

Corresponding author

Correspondence to Zulfat Miftahutdinov.

Editor information

Editors and Affiliations

Radboud University Nijmegen, Nijmegen, The Netherlands
Djoerd Hiemstra
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Toulouse Institute of Computer Science Research, Toulouse, France
Josiane Mothe
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Raffaele Perego
Leipzig University, Leipzig, Germany
Martin Potthast
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Fabrizio Sebastiani

Cite this paper

Miftahutdinov, Z., Kadurin, A., Kudrin, R., Tutubalina, E. (2021). Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_30

DOI: https://doi.org/10.1007/978-3-030-72113-8_30
Published: 27 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72112-1
Online ISBN: 978-3-030-72113-8
eBook Packages: Computer Science, Computer Science (R0)
