Rule-based natural language processing for automation of stroke data extraction: a validation study

Abstract

Purpose

Manual data extraction from free-text radiology reports is time-consuming. More automated extraction methods using natural language processing (NLP) have recently been proposed. A previously developed rule-based NLP algorithm showed promise in extracting stroke-related data from radiology reports. We aimed to externally validate the accuracy of CHARTextract, a rule-based NLP algorithm, in extracting stroke-related data from free-text radiology reports.

Methods

Free-text reports of CT angiography (CTA) and CT perfusion (CTP) studies of consecutive patients with acute ischemic stroke admitted to a regional stroke center for endovascular thrombectomy from January 2015 to 2021 were analyzed. Stroke-related variables were manually extracted from the clinical reports as the reference standard, including proximal and distal anterior circulation occlusion, posterior circulation occlusion, presence of ischemia or hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status. These variables were simultaneously extracted using a rule-based NLP algorithm. The NLP algorithm’s accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were assessed against the reference standard.
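The five measures named above can be sketched as follows for a single binary variable (for example, presence of hemorrhage). This is a minimal illustration, not the study's analysis code: the names `Confusion`, `confusion`, and `metrics` are hypothetical, and the labels in the usage note are made-up values, not study data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    fn: int  # NLP negative, reference positive
    tn: int  # NLP negative, reference negative


def confusion(reference: Sequence[bool], predicted: Sequence[bool]) -> Confusion:
    """Tally the 2x2 table of NLP output against the manual reference standard."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    return Confusion(
        tp=sum(1 for r, p in pairs if r and p),
        fp=sum(1 for r, p in pairs if not r and p),
        fn=sum(1 for r, p in pairs if r and not p),
        tn=sum(1 for r, p in pairs if not r and not p),
    )


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a measure is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def metrics(c: Confusion) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }
```

As a usage example with invented labels, `metrics(confusion([True, True, True, False], [True, True, False, False]))` gives a sensitivity of 2/3 and a specificity of 1.0, because one of the three reference-positive reports was missed and the single reference-negative report was correctly classified.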

Results

The NLP algorithm’s accuracy was > 90% for identifying distal anterior circulation occlusion, posterior circulation occlusion, hemorrhage, and ASPECTS. Accuracy was 85%, 74%, and 79% for proximal anterior circulation occlusion, presence of ischemia, and collateral status, respectively.

The algorithm confirmed the absence of variables from radiology reports with 87–100% accuracy.

Conclusions

Rule-based NLP shows moderate-to-good performance for stroke-related data extraction from free-text imaging reports. The algorithm’s accuracy was affected by inconsistent report styles and lexicon among reporting radiologists.
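To illustrate why report style and lexicon matter for a rule-based approach, the sketch below uses two hand-written regular-expression rules. It is an assumption-laden toy example and does not reproduce CHARTextract's rules or interface: the patterns and the function names `extract_aspects` and `has_hemorrhage` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule: "ASPECTS" optionally followed by "score" and a linking word, then a number.
ASPECTS_RE = re.compile(
    r"\bASPECTS?\b(?:\s+score)?\s*(?:is|of|was|=|:)?\s*(\d{1,2})\b",
    re.IGNORECASE,
)
# Hypothetical rule: hemorrhage terms in American and British spelling.
HEMORRHAGE_RE = re.compile(r"\b(?:ha?emorrhag(?:es?|ic)|bleed)\b", re.IGNORECASE)
# Negation cues are only checked in the text that precedes the term.
NEGATION_RE = re.compile(r"\b(?:no|without|negative for|absence of)\b", re.IGNORECASE)


def extract_aspects(report: str) -> Optional[int]:
    """Return the first ASPECTS value (0-10) found in the report, else None."""
    match = ASPECTS_RE.search(report)
    if not match:
        return None
    value = int(match.group(1))
    return value if 0 <= value <= 10 else None


def has_hemorrhage(report: str) -> bool:
    """Return True if any sentence mentions hemorrhage without a preceding negation cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = HEMORRHAGE_RE.search(sentence)
        if match and not NEGATION_RE.search(sentence[: match.start()]):
            return True
    return False
```

With these toy rules, "No acute intracranial hemorrhage. ASPECTS of 8." is handled as intended (no hemorrhage, ASPECTS 8). A report phrased as "Hemorrhage is not seen." is wrongly flagged as positive because the negation follows the term, and a report that gives the score without the word "ASPECTS" yields no value. Each phrasing variant requires its own rule, which is one plausible way inconsistent reporting styles can reduce accuracy.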

Abbreviations

ASPECTS: Alberta Stroke Program Early CT Score

CTA: CT angiography

CTP: CT perfusion

ML: Machine learning

NLP: Natural language processing

Funding

Paulo Puac-Polanco was supported by a University of Ottawa Brain and Mind Research Institute Trainee Award. The remaining authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Richard I. Aviv.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethics approval

This retrospective chart review study involving human participants was conducted in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Ottawa Health Science Network Research Ethics Board (OHSN-REB), Protocol #20200309-01H.

Informed consent

The requirement for informed consent was waived by the Ottawa Health Science Network Research Ethics Board (OHSN-REB).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gunter, D., Puac-Polanco, P., Miguel, O. et al. Rule-based natural language processing for automation of stroke data extraction: a validation study. Neuroradiology 64, 2357–2362 (2022). https://doi.org/10.1007/s00234-022-03029-1

  • DOI: https://doi.org/10.1007/s00234-022-03029-1
