Rule-based natural language processing for automation of stroke data extraction: a validation study

Gunter, Dane; Puac-Polanco, Paulo; Miguel, Olivier; Thornhill, Rebecca E.; Yu, Amy Y. X.; Liu, Zhongyu A.; Mamdani, Muhammad; Pou-Prom, Chloe; Aviv, Richard I.

doi:10.1007/s00234-022-03029-1

Functional Neuroradiology
Published: 01 August 2022

Volume 64, pages 2357–2362, (2022)
Neuroradiology

Dane Gunter¹,
Paulo Puac-Polanco²,
Olivier Miguel²,
Rebecca E. Thornhill³,
Amy Y. X. Yu⁴,
Zhongyu A. Liu⁴,
Muhammad Mamdani⁵,
Chloe Pou-Prom⁶ &
Richard I. Aviv¹,²

Abstract

Purpose

Manual data extraction from free-text radiology reports is time-consuming. More automated extraction methods using natural language processing (NLP) have recently been proposed. A previously developed rule-based NLP algorithm showed promise in its ability to extract stroke-related data from radiology reports. We aimed to externally validate the accuracy of CHARTextract, a rule-based NLP algorithm, in extracting stroke-related data from free-text radiology reports.
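To make the idea of rule-based extraction concrete, the following is a minimal sketch in Python using only the standard `re` module. It is not CHARTextract's rule set or API: the patterns, the negation cues, and the function names `extract_aspects` and `has_hemorrhage` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rules for illustration only; these are NOT CHARTextract's rules.
# ASPECTS: the keyword, up to 20 non-digit characters, then a score from 0 to 10.
ASPECTS_RE = re.compile(r"\bASPECTS?\b[^0-9]{0,20}?(10|[0-9])\b", re.IGNORECASE)
HEMORRHAGE_RE = re.compile(r"\b(hemorrhage|haemorrhage|hematoma)\b", re.IGNORECASE)
NEGATION_RE = re.compile(r"\b(no|without|negative for|absence of)\b", re.IGNORECASE)


def extract_aspects(report):
    """Return the first ASPECTS value (0-10) found in the report, or None."""
    match = ASPECTS_RE.search(report)
    return int(match.group(1)) if match else None


def has_hemorrhage(report):
    """Return True if a sentence mentions hemorrhage with no negation cue before it.

    Only the first hemorrhage term in each sentence is checked, which is a
    deliberate simplification for this sketch.
    """
    for sentence in re.split(r"[.\n]", report):
        hit = HEMORRHAGE_RE.search(sentence)
        if hit and not NEGATION_RE.search(sentence[:hit.start()]):
            return True
    return False


if __name__ == "__main__":
    sample = "No acute intracranial hemorrhage. ASPECTS score is 8."
    print(extract_aspects(sample), has_hemorrhage(sample))
```

A rule of this kind only recognizes the phrasings it was written for, so a report that expresses the same finding in different wording would be missed unless the pattern list is extended.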

Methods

Free-text reports of CT angiography (CTA) and CT perfusion (CTP) studies of consecutive patients with acute ischemic stroke admitted to a regional stroke center for endovascular thrombectomy were analyzed from January 2015 to 2021. Stroke-related variables were manually extracted from the clinical reports as the reference standard, including proximal and distal anterior circulation occlusion, posterior circulation occlusion, presence of ischemia or hemorrhage, Alberta stroke program early CT score (ASPECTS), and collateral status. These variables were simultaneously extracted using a rule-based NLP algorithm. The NLP algorithm’s accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were assessed.
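The five performance measures named above all follow from a 2×2 comparison of the algorithm's output against the manual reference standard. A minimal sketch of that calculation for one binary variable is given below. The names `Counts`, `tally`, and `metrics` are hypothetical, and the labels in the usage example are toy values, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Cells of a 2x2 table: NLP output versus manual reference standard."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def tally(reference, predicted):
    """Count agreement cells for paired boolean labels (one pair per report)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    c = Counts()
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            c.tp += 1
        elif ref and not pred:
            c.fn += 1
        elif pred:
            c.fp += 1
        else:
            c.tn += 1
    return c


def _ratio(num, den):
    """Return num/den, or None when the denominator is zero."""
    return num / den if den else None


def metrics(c):
    """Accuracy, sensitivity, specificity, PPV, and NPV from the 2x2 counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the study.
    manual = [True, True, True, False, False, False, False, True]
    nlp = [True, True, False, False, False, True, False, True]
    print(metrics(tally(manual, nlp)))
```

Returning `None` for a zero denominator avoids a division error when a variable is never present (or never absent) in the reference standard, in which case the corresponding measure is undefined.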

Results

The NLP algorithm’s accuracy was >90% for identifying distal anterior circulation occlusion, posterior circulation occlusion, hemorrhage, and ASPECTS. Accuracy was 85%, 74%, and 79% for proximal anterior circulation occlusion, presence of ischemia, and collateral status, respectively.

The algorithm confirmed the absence of variables from radiology reports with 87–100% accuracy.

Conclusions

Rule-based NLP shows moderate to good performance for stroke-related data extraction from free-text imaging reports. The algorithm’s accuracy was affected by inconsistent report styles and lexicon among reporting radiologists.

Abbreviations

ASPECTS: Alberta stroke program early CT score
CTA: CT angiography
CTP: CT perfusion
ML: Machine learning
NLP: Natural language processing

Funding

Paulo Puac-Polanco was supported by a University of Ottawa Brain and Mind Research Institute Trainee Award. The remaining authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.

Author information

Authors and Affiliations

The Ottawa Hospital Research Institute, Ottawa, ON, Canada
Dane Gunter & Richard I. Aviv
Department of Radiology, Radiation Oncology and Medical Physics, University of Ottawa, The Ottawa Hospital Civic Campus Room C110, 1053 Carling Ave, Ottawa, ON K1Y 4E9, Canada
Paulo Puac-Polanco, Olivier Miguel & Richard I. Aviv
Division of Medical Physics, Department of Radiology, Radiation Oncology and Medical Physics, University of Ottawa, Ottawa, ON, Canada
Rebecca E. Thornhill
Department of Medicine (Neurology), University of Toronto, Sunnybrook Health Sciences Centre, Toronto, ON, Canada
Amy Y. X. Yu & Zhongyu A. Liu
Department of Medicine, Unity Health Toronto, University of Toronto, Toronto, ON, Canada
Muhammad Mamdani
Unity Health Toronto, Toronto, ON, Canada
Chloe Pou-Prom

Corresponding author

Correspondence to Richard I. Aviv.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethics approval

This retrospective chart review study involving human participants was conducted in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Ottawa Health Science Network Research Ethics Board (OHSN-REB), Protocol #20200309-01H.

Informed consent

The requirement for informed consent was waived by the Ottawa Health Science Network Research Ethics Board (OHSN-REB).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gunter, D., Puac-Polanco, P., Miguel, O. et al. Rule-based natural language processing for automation of stroke data extraction: a validation study. Neuroradiology 64, 2357–2362 (2022). https://doi.org/10.1007/s00234-022-03029-1

Received: 20 June 2022
Accepted: 25 July 2022
Published: 01 August 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00234-022-03029-1
