Introduction to Natural Language Processing of Clinical Text

Natural Language Processing in Biomedicine

Part of the book series: Cognitive Informatics in Biomedicine and Healthcare ((CIBH))

Abstract

Clinical and biomedical natural language processing (NLP) has a wide range of practical applications in clinical and biomedical research, quality assurance of clinical care, and delivery of information to patients, and research in clinical NLP continues to grow. This book serves as an introduction to foundational and applied biomedical language processing, with a particular focus on clinical language processing. At the time of this writing, Large Language Models (LLMs) have demonstrated remarkable performance in some complex language processing and language generation tasks, including tasks that involve clinical language. These advances have revealed both unprecedented opportunities for language processing and unforeseen pitfalls and complications of using LLMs. Together, these developments emphasize the need for knowledge of the traditional linguistically motivated and domain-knowledge-grounded approaches presented in this book, along with the latest developments in machine learning and LLMs.

Author information

Corresponding author

Correspondence to Dina Demner Fushman.

Glossary

Large Language Model (LLM)

A class of deep learning architectures based on transformer models trained on extensive amounts of text. A transformer model is a neural network that learns context and meaning by tracking relationships in sequential data.

The Medical Information Mart for Intensive Care (MIMIC)

A large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

Office of the National Coordinator for Health Information Technology (ONC)

An agency within the US Department of Health and Human Services that is charged with supporting the adoption of health information technology and promoting nationwide health information exchange to improve health care.

Unified Medical Language System (UMLS)

A terminology system, developed under the direction of the National Library of Medicine, to produce a common structure that ties together the various vocabularies that have been created for biomedical domains.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Demner Fushman, D., Xu, H. (2024). Introduction to Natural Language Processing of Clinical Text. In: Xu, H., Demner Fushman, D. (eds) Natural Language Processing in Biomedicine. Cognitive Informatics in Biomedicine and Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-031-55865-8_1

  • DOI: https://doi.org/10.1007/978-3-031-55865-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-55864-1

  • Online ISBN: 978-3-031-55865-8

  • eBook Packages: Medicine, Medicine (R0)
