Introduction

Accurate anatomic region labeling of medical images is required for classification of the body parts included in medical imaging studies. Body part study labels contain key information used to search, sort, transfer, and display medical imaging datasets across clinical and research healthcare systems [1]. Unfortunately, with the increase in multisystem imaging techniques and the consolidation or sharing of Picture Archiving and Communication System (PACS) datasets, currently implemented body part image labeling methods can fall short, resulting in incomplete selection and presentation of relevant imaging studies in clinical viewers. Additionally, the increased demand for automated image-based post-processing workflows, automated selection of studies for clinical AI analysis, and automated anatomy-based study selection for the development of AI research datasets has accelerated the need for more efficient and reliable anatomical image labeling techniques.

Ideally, labeling of cross-sectional medical images should accurately reflect the anatomy contained in each individual image and identify all body regions included in a study. Currently, for MR and CT, body region labels applied at the image, series, and study level are often limited to one predominant body region (e.g., chest or abdomen) and either do not indicate the other body regions included in the scan or do not define a body region at all (e.g., PET-CT or whole-body MR). Furthermore, the lack of standardization of anatomic labels between institutions and human data entry errors both contribute to unreliable anatomy-based labeling of imaging studies. These labeling limitations can adversely affect imaging workflows. They can also compromise image interpretation if automated hanging protocols fail to display all information relevant to accurate interpretation, or if data are incorrectly selected for automated post-processing, including clinical AI workflows. Finally, these limitations force the use of manual search strategies for procuring anatomically based datasets for AI research, which impedes rapid development.

We describe two pixel-based models that automatically identify 17 body regions in CT studies (CT model) and 18 body regions in MRI studies (MRI model). Our approach addresses some of the limitations of previous attempts to tackle this classification problem with supervised and unsupervised deep learning techniques, which have reported accuracy results ranging from 72 to 92% [2, 10]. Of the labeled data, 2.5% was reviewed, with an equal number of studies assigned for all body regions.

Table 1 Anatomical landmarks for all 18 body region classes
Fig. 1

Labeling can be done on any cross-sectional series (axial, sagittal, coronal, or oblique). In this example, a rectangle is first drawn on the axial plane to indicate the pelvis area and then extended on the coronal viewport down to the lesser trochanter. This creates a thin 3D bounding box that can be easily manipulated in all dimensions to cover the whole anatomy. The 3D bounding box label is automatically carried over to all other series within the same frame of reference using the patient coordinate system. With this technique, hundreds of images can be labeled in a few seconds with a handful of clicks. If needed, AI body region inference results can be displayed with a color code associated with each anatomical class (see bottom color bar). For this image, the AI prediction indicates a pelvis (pink) with a confidence level of 0.825. Lower in the scan, thigh images (purple) are also correctly predicted
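As an illustration of how a 2D label drawn on one series can be propagated to other series sharing the same frame of reference, the sketch below applies the standard DICOM image-to-patient coordinate mapping (PS3.3 C.7.6.2.1.1). The function name and implementation are illustrative and are not taken from the labeling tool described in the paper.

```python
import numpy as np

def pixel_to_patient(ipp, iop, pixel_spacing, col, row):
    """Map a (col, row) pixel index on a slice to DICOM patient coordinates (mm).

    ipp: ImagePositionPatient (0020,0032), three floats
    iop: ImageOrientationPatient (0020,0037), six floats (row cosines, then column cosines)
    pixel_spacing: PixelSpacing (0028,0030) as [row_spacing, col_spacing]
    """
    ipp = np.asarray(ipp, dtype=float)
    row_dir = np.asarray(iop[:3], dtype=float)   # direction of increasing column index
    col_dir = np.asarray(iop[3:], dtype=float)   # direction of increasing row index
    row_spacing, col_spacing = (float(s) for s in pixel_spacing)
    return ipp + col * col_spacing * row_dir + row * row_spacing * col_dir
```

The corners of a rectangle drawn on one series can thus be expressed in patient space and re-projected onto any other series that shares the same FrameOfReferenceUID.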

Data Partitions

Test and validation datasets were stratified by patient ID so that no patient could be present in both datasets. Labeled datasets were organized according to the main body region and sorted by study size. A 75/25 split between the training and validation patient datasets was performed for each body region. To estimate the size of the test datasets, a strict survey sampling model [11] was used, assuming a model at least 90% accurate, a 95% confidence interval, and a 10% relative error. Based on this model, it was determined that at least 7600 images per body region were needed for CT and at least 3600 images per body region for MRI.
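For reference, the sketch below computes a base sample size from the stated assumptions (90% accuracy, 95% confidence, 10% relative error) using the standard survey formula for estimating a proportion. It is illustrative only; the per-region figures reported above result from additional adjustments in [11] (e.g., design effects for correlated slices within a series) that are not reproduced here.

```python
import math

def base_sample_size(p=0.90, confidence_z=1.96, relative_error=0.10):
    """Base survey sample size for estimating a proportion p with a given
    relative error (illustrative; adjustments from [11] are not included)."""
    absolute_error = relative_error * p
    return math.ceil(confidence_z ** 2 * p * (1 - p) / absolute_error ** 2)

# base_sample_size() -> 43 images before design-effect and clustering
# adjustments; the study's final per-region targets are substantially larger.
```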

Model

The classification task is composed of multiple stages that are detailed in Fig. 2. As a first step, we used the standard ResNet50V2 model [12] in a multi-class framework. Following the 2D CNN classifier, a few post-processing steps were applied at the series level. First, a rule engine merged the abdomen-chest class into the abdomen and chest classes and classified an entire series as breast if at least 50% of the images in the series were classified as such. Last, a smoothing step removed labels inconsistent with those in their immediate vicinity, increasing label consistency and decreasing noise.
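The following is a schematic sketch of these series-level rules, assuming per-image class labels as input. The abdomen-chest rule is implemented here as described in the Fig. 2 caption (the predominant class wins), and the smoothing step is interpreted as a sliding majority vote; the actual implementation may differ.

```python
from collections import Counter

def postprocess_series(labels, window=3):
    """Schematic series-level post-processing (names and details are illustrative)."""
    counts = Counter(labels)

    # Rule 1: resolve the mixed "abdomen-chest" class using whichever of
    # abdomen or chest is predominant in the series (default: abdomen).
    dominant = "chest" if counts["chest"] > counts["abdomen"] else "abdomen"
    labels = [dominant if lbl == "abdomen-chest" else lbl for lbl in labels]

    # Rule 2: call the series "breast" only when at least 50% of its
    # images were classified as breast.
    if counts["breast"] >= 0.5 * len(labels):
        labels = ["breast"] * len(labels)

    # Rule 3: sliding majority vote to remove isolated labels that are
    # inconsistent with their immediate neighbours.
    half = window // 2
    return [
        Counter(labels[max(0, i - half): i + half + 1]).most_common(1)[0][0]
        for i in range(len(labels))
    ]
```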

Fig. 2

Body region processing steps. To accommodate the model's input, several pre-processing steps were applied at the image level. First, pixel values were clipped to fit the interval [mean − 4*std, mean + 4*std], where mean and std correspond to the mean value and standard deviation of the pixels in the image. Second, pixel values were normalized using the transform (pixel value − mean)/(2*std) to obtain input values in the range −1 to 1, matching the pretrained model's requirements. Third, the grayscale images were converted to RGB images by copying the pixel data from the first channel to the second and third channels. Last, each image was resized with zero padding to fit the model's required input size of 224 × 224 pixels. Following the 2D CNN image classifier, two rules were applied at the series level. The first merged the abdomen-chest class into either the abdomen or chest class, depending on which body region was predominant; in the absence of chest and abdomen predictions, an abdomen-chest prediction was classified as abdomen. The second rule classified a series as breast only when at least 50% of the images in the series were classified as such, eliminating spurious predictions in noisy breast acquisitions. In the last stage of post-processing, we first applied a routine at the series level to remove outlier labels and then applied a moving average filter with a window size of three images to smooth the results, so that a continuity in classified labels could be observed throughout the series
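A minimal sketch of the image-level pre-processing described in this caption (clipping, normalization, channel replication, and padded resize), assuming a NumPy array input; the helper name and the small epsilon are illustrative additions.

```python
import numpy as np
import tensorflow as tf

def preprocess_image(pixels, target_size=224):
    """Image-level pre-processing sketch following the steps described above."""
    pixels = np.asarray(pixels, dtype=np.float32)
    mean, std = pixels.mean(), pixels.std()

    # 1. Clip pixel values to [mean - 4*std, mean + 4*std].
    pixels = np.clip(pixels, mean - 4 * std, mean + 4 * std)

    # 2. Normalize to roughly [-1, 1]: (pixel - mean) / (2 * std).
    pixels = (pixels - mean) / (2 * std + 1e-8)  # epsilon added for safety

    # 3. Replicate the grayscale channel to obtain a 3-channel RGB image.
    rgb = np.stack([pixels, pixels, pixels], axis=-1)

    # 4. Resize with zero padding to the model input size of 224 x 224.
    return tf.image.resize_with_pad(rgb, target_size, target_size).numpy()
```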

Fig. 3

Confusion matrix for the 262,326 images in the CT test database with a threshold of 0.5. Rows represent the predictions, and columns represent the ground truth

Hardware and Framework

The ML experiments took place in a containerized cloud environment using TensorFlow 2.3.0. The Docker images were built with GPU support. Argo was used to manage the execution of the data ingestion, training, evaluation, and reporting workflows, while MLflow was used to manage the experimental results and generated artifacts such as model checkpoints, reports, and figures. An 8-core CPU, 30 GB RAM, NVIDIA V100 GPU cloud instance was used for training both models. An 8-core CPU, 30 GB RAM, NVIDIA T4 GPU cloud instance was used for running the TensorFlow Serving inference engine for both models. All the pre-processing and post-processing steps were written in Python, while the ingestion control, results aggregation, and dispatch were implemented using Node.js.
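For illustration, inference against a TensorFlow Serving engine can be issued through its standard REST API as sketched below. The host, default port 8501, and model name "body_region_ct" are assumptions, not values from the paper.

```python
import json
import numpy as np
import requests

def predict_series(images: np.ndarray, host: str = "localhost") -> np.ndarray:
    """Send pre-processed slices, shape (N, 224, 224, 3), to TensorFlow Serving.

    Model name and port are illustrative placeholders.
    """
    url = f"http://{host}:8501/v1/models/body_region_ct:predict"
    payload = json.dumps({"instances": images.tolist()})
    response = requests.post(url, data=payload, timeout=60)
    response.raise_for_status()
    # Each prediction is a softmax vector over the body-region classes.
    return np.array(response.json()["predictions"])
```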

Training

We enriched our dataset by applying spatial deformations to a random set of images in each training epoch. These transforms include rotation within an angle of ±π/10, translation and shear with a maximum of 10% of the image size in both directions, scaling with a maximum of 20% of the image size in both directions, and bilinear interpolation. The transformations were applied using built-in TensorFlow library functions. We used a transfer learning approach with model weights from a ResNet50V2 model pretrained on the ImageNet vision benchmark dataset. The training hyperparameters are listed in Supplemental Materials – Training Parameters. The loss function was categorical cross-entropy, which is well suited to the multi-class case. We trained each model with all available slices in each series.
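The sketch below approximates this augmentation and transfer-learning setup with standard TensorFlow/Keras components. The actual hyperparameters are given in the Supplemental Materials, so the values used here (optimizer, shear setting, batch size, class count) are placeholders only.

```python
import tensorflow as tf

NUM_CLASSES = 17  # 17 body regions for the CT model, 18 for MRI (plus any merged classes)

# Augmentation approximating the described transforms: rotation within
# +/- pi/10 (18 degrees), 10% translation, shear, 20% zoom, bilinear
# interpolation with zero fill. Exact settings are assumptions.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=18,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=10,
    zoom_range=0.2,
    fill_mode="constant",
    cval=0.0,
)

# Transfer learning from ResNet50V2 pretrained on ImageNet.
base = tf.keras.applications.ResNet50V2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(augmenter.flow(x_train, y_train, batch_size=32), epochs=...)
```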

Evaluation

We applied the models to the test datasets and evaluated them by computing the average weighted values of the following performance metrics: F1 score, sensitivity, and specificity. Details on the choice of metrics are provided in the supplemental materials; the choice also draws on [13]. Results were derived by body region, institution, patient demographics, and acquisition parameters (manufacturer, contrast, CT kernel, slice thickness, sequence type). Performance metrics and their corresponding confidence intervals were determined using a spatially aware bootstrap resampling method [14]. The image sampling procedure ensured that no image slices were closer than 10 mm, based on slice position and slice thickness information; this is consistent with the 7.5-mm sampling approach reported in [15]. This spatially aware random sampling was performed at the series level to reduce the impact of strongly correlated images and provide more realistic statistical results. To reduce inter-series correlation, only one series per study (randomly selected at each sampling iteration) was kept in the subsampled dataset. The correlation between the model's accuracy and each confounding factor, and its statistical significance, were assessed using Cramer's V and Pearson's chi-squared test.
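A simplified sketch of this spatially aware bootstrap is given below; the data layout, function names, and the 95% percentile interval are assumptions made for illustration, not the authors' implementation.

```python
import random

def spatially_aware_bootstrap(studies, metric_fn, n_iter=1000, min_gap_mm=10.0):
    """Sketch of the spatially aware bootstrap described above.

    studies: {study_id: {series_id: [(slice_position_mm, y_true, y_pred), ...]}}
             -- this data layout is an assumption made for illustration.
    metric_fn: computes a scalar metric from (y_true_list, y_pred_list).
    """
    scores = []
    study_list = list(studies.values())
    for _ in range(n_iter):
        y_true, y_pred = [], []
        # Resample studies with replacement to build a bootstrap replicate.
        for study in random.choices(study_list, k=len(study_list)):
            # Keep one randomly selected series per study to limit
            # inter-series correlation.
            series = random.choice(list(study.values()))
            # Subsample slices so that no two kept slices are closer than
            # min_gap_mm along the scan direction.
            last_pos = None
            for pos, t, p in sorted(series, key=lambda s: s[0]):
                if last_pos is None or abs(pos - last_pos) >= min_gap_mm:
                    y_true.append(t)
                    y_pred.append(p)
                    last_pos = pos
        scores.append(metric_fn(y_true, y_pred))
    scores.sort()
    lo, hi = scores[int(0.025 * n_iter)], scores[int(0.975 * n_iter)]
    return sum(scores) / len(scores), (lo, hi)
```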

Results

Data

The data consisted of 2891 CT cases (training, 1804 studies; validation, 602 studies; test, 485 studies) and 3339 MRI cases (training, 1911 studies; validation, 636 studies; test, 792 studies). Flowcharts in the Supplemental Materials – Inclusion and Exclusion Criteria (Figs. 1–4) show the distribution of images after the different stages of series and image exclusion criteria. The evaluation of the ground truth revealed a total of 4 labeling errors out of 1455 labeled CT and MRI studies, which represents an error rate of 0.3%.

Fig. 4

Confusion matrix for the 118,829 images in the MRI test database with a threshold of 0.5. Rows represent the predictions, and columns represent the ground truth

Distributions of images and results by confounding factors for the test sets can be found in Tables 2 and 3. Twenty-seven institutions contributed to each of the CT and MRI test datasets. For CT, 56% of the datasets came from primary care hospitals and 44% from critical access hospitals and imaging centers, while for MRI, 55% of the datasets came from primary care hospitals. Sex parity was respected for the CT dataset, while females were slightly over-represented in the MRI dataset (56.1%). The age coverage ranged from 18 to over 90 years, roughly following the distribution of imaging tests in US healthcare systems [16]. Compared to the development datasets (Supplemental Materials – Distribution Development Datasets), the test datasets differed in some key areas (values in parentheses give the test-set proportion and its difference from the development datasets). For CT, acquisitions mostly originated from Siemens and other non-GE scanners (87.6%, +80.1%), with a larger proportion of older adults ≥ 65 years (45.6%, +9.3%), intermediate slice thickness (2 mm < slice thickness < 5 mm) (54.8%, +46.3%), and non-contrast imaging (76.5%, +9.1%). For MRI, acquisitions mostly originated from Siemens and other non-GE scanners (83.5%, +69%), with a larger proportion of older adults (31.7%, +11.7%), slice thicknesses between 2 and 5 mm (68.5%, +20.3%), and non-contrast imaging (84.0%, +6.4%).

Table 2 CT image performance metrics by confounding factors. n = number of studies (*series). The p-value for the median chi-square is provided to determine if a significant difference in accuracy is found for each confounding factor
Table 3 MR image performance metrics by confounding factors. n = number of studies (*series). The p-value for the median chi-square is provided to determine if a significant difference in accuracy is found for each confounding factor. **124 studies did not have institution information. ***137 series did not have any of the preset sequence tags. Due to the small number of cases, the performance metrics and confidence interval are not reliable for “In and Out of Phase”

Model Performance

An overall body region image-level sensitivity of 92.5% (92.1–92.8) was achieved for CT and 92.3% (92.0–95.6) for MRI. The post-processing stages contributed an improvement in classification accuracy of approximately 1.1% for CT and 1.6% for MRI. Classification results by body region and confusion matrices by modality are reported in Table 4 and in Figs. 3 and 4, respectively. Head and breast images have very discernible features, so they tend to be classified more accurately than other body regions such as the neck and extremities.

Table 4 CT and MR image classification sensitivity and specificity by body region. n = number of images

No formal association was found between classification accuracy and CT institution, CT kernel, or MRI contrast. However, statistically superior classification results were observed in a few instances, with Cramer's V correlations ranging from negligible (V < 0.05) to moderate (V = 0.17). For CT, this was the case for datasets from older patients (≥ 65 years) (p < 0.001, V = 0.041), with contrast (p < 0.001, V = 0.042), and with thick (≥ 5 mm) slices (p < 0.001, V = 0.048). For MRI, imaging centers (p < 0.001, V = 0.064), patients 44 years and older (p < 0.001, V = 0.087), Philips scanners (p = 0.001, V = 0.076), thin slices (p < 0.001, V = 0.0838), and inversion and MRA sequences (p < 0.001, V = 0.179) exhibited better classification performance. For some of the classes in the test sets, the association between accuracy and factors such as manufacturer and MRI sequence could not be reliably assessed: Hitachi and Canon scanners, and In and Out of Phase MRI sequences. Despite these limitations, the evaluation of accuracy results and confidence intervals points to performance robustness across age, manufacturer, CT slice thickness, and MRI sequence categories.
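For reference, Cramer's V and the associated chi-squared p-value can be computed from a contingency table of correct versus incorrect classifications per factor level, as in the sketch below (using SciPy, which the paper does not explicitly name).

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V and p-value for a contingency table
    (rows: factor levels, columns: correct vs. incorrect classifications)."""
    table = np.asarray(table, dtype=float)
    chi2, p_value, _, _ = chi2_contingency(table)
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k)), p_value
```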

When mining the DICOM tags in the test datasets for either the "BodyPartExamined" (BP) or "ProcedureType" (PT) tag, the body region information at the study level was only 22.3% (BP) and 42.2% (PT) accurate for CT, and 58.3% (BP) and 47.8% (PT) accurate for MRI. In this cohort, the anatomical AI could therefore prove useful in improving the search for anatomically matched cases in about 50% of cases.
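As an illustration of this kind of metadata mining, the sketch below reads the standard "BodyPartExamined" tag (0018,0015) with pydicom; procedure-type information is stored in site-dependent fields, so the second lookup (StudyDescription) is only a placeholder.

```python
import pydicom

def dicom_body_part(path):
    """Return header-based body part hints for one DICOM file (illustrative)."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    body_part = ds.get("BodyPartExamined", "")   # (0018,0015), often empty or wrong
    study_desc = ds.get("StudyDescription", "")  # site-specific free text
    return body_part, study_desc
```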

Discussion

The ability to automate accurate anatomic region labeling of medical images using pixel-based AI could address clinical and research workflow challenges stemming from the existing limitations that affect body region labeling of medical images. Our work demonstrates that a deep learning CNN-based classifier can achieve state-of-the-art overall accuracy greater than 90% in identifying body regions in CT and MR images while covering the entire human body and a large spectrum of acquisition protocols obtained from separate institutions. This is the first known attempt to (a) provide a solution that includes MR and (b) offer a solution that covers numerous body regions, in particular extremities, which have been excluded from other CT studies.

Our methods achieved an overall body region image-level sensitivity of 92.5%, similar to other publications restricted to CT image classification [2,

References

  1. Towbin AJ, Roth CJ, Petersilge CA, Garriott K, Buckwalter KA, Clunie DA: The importance of body part labeling to enable enterprise imaging: A HIMSS - SIIM enterprise imaging community collaborative white paper. J Digit Imaging 34:1-15, 2021. https://doi.org/10.1007/s10278-020-00415-0.

  2. Roth HR, Lee CT, Shin HC, Seff A, Kim L, Yao J, Summers RM: Anatomy-specific classification of medical images using deep convolutional nets. Proc IEEE International Symposium on Biomedical Imaging, 2015. https://doi.org/10.1109/ISBI.2015.7163826.

  3. Zhennan Y, Yiqiang Z, Zhigang P, Shu L, Shinagawa Y, Shaoting Z, Metaxas DN, Xiang Sean Z: Multi-instance deep learning: discover discriminative local anatomies for bodypart recognition. IEEE Trans Med Imaging 35:1332-1343, 2016. https://doi.org/10.1109/TMI.2016.2524985.

  4. Zhang P, Wang F, Zheng Y: Self-supervised deep representation learning for fine-grained body part recognition. Proc IEEE International Symposium on Biomedical Imaging, 2017. https://doi.org/10.1109/ISBI.2017.7950587.

  5. Sugimori H: Classification of computed tomography images in different slice positions using deep learning. J Healthc Eng, 2018. https://doi.org/10.1155/2018/1753480.

  6. Yan K, Lu L, Summers RM: Unsupervised body part regression via spatially self-ordering convolutional neural networks. Proc IEEE International Symposium on Biomedical Imaging, 2018. https://doi.org/10.1109/ISBI.2018.8363745.

  7. TCIA. Submission and de-identification overview. Available at https://wiki.cancerimagingarchive.net/display/Public/Submission+and+De-identification+Overview. Updated 2020. Accessed August 2022.

  8. Wang K, Zhang D, Li Y, Zhang R, Lin L: Cost-effective active learning for deep image classification. IEEE Trans Circuits Syst Video Technol 27(12):2591–2600, 2016. https://doi.org/10.1109/TCSVT.2016.2589879.


  9. Budd S, Robinson EC, Kainz B: A survey on active learning and human-in-the-loop deep learning for medical image analysis. Med Image Anal 71:102062, 2021. https://doi.org/10.1016/j.media.2021.102062.

  10. Mahgerefteh S, Kruskal JB, Yam CS, Blachar A, Sosna J: Peer review in diagnostic radiology: current state and a vision for the future. Radiographics 29:1221–1231, 2009. https://doi.org/10.1148/rg.295095086.

  11. Turner AG: Expert group meeting to review the draft handbook on designing of household sample surveys: sampling strategies (draft). November 2003. Available at https://unstats.un.org/unsd/demographic/meetings/egm/sampling_1203/docs/no_2.pdf. Accessed August 2022.

  12. He K, Zhang X, Ren S, Sun J: Deep residual learning for image recognition. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2016. https://doi.org/10.1109/CVPR.2016.90.

  13. Towards Data Science. Available at https://towardsdatascience.com/multi-class-metrics-made-simple-part-i-precision-and-recall-9250280bddc2. Accessed 18 August 2022.

  14. Efron B: Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1-26, 1979. https://doi.org/10.1214/aos/1176344552.

  15. Leuschner J, Schmidt M, Baguer DO, Maas P: The LoDoPaB-CT dataset: a benchmark dataset for low-dose CT reconstruction methods. Sci Data 8:109, 2021. https://doi.org/10.1038/s41597-021-00893-z.

  16. Smith-Bindman R, Kwan ML, Marlow EC, et al: Trends in use of medical imaging in US health care systems and in Ontario, Canada, 2000–2016. JAMA 322(9):843–856, 2019. https://doi.org/10.1001/jama.2019.11456.

  17. Elahi A, Reid D, Redfern RO, Kahn CE, Cook TS: Automating import and reconciliation of outside examinations submitted to an academic radiology department. J Digit Imaging 33(2):355–360, 2020. https://doi.org/10.1007/s10278-019-00291-3.


Author information


Contributions

Philippe Raffy, Jean-François Pambrun, Ashish Kumar, and David Dubois contributed to the study conception, design, material preparation, and data collection. Analysis was performed by Philippe Raffy, David Dubois, and Ryan Young. The first draft of the manuscript was written by Philippe Raffy, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to David Dubois.

Ethics declarations

Ethical Approval

This retrospective research study was conducted using de-identified data from clinical partners. Based on the nature of the study, official IRB waivers of ethical approval were granted from our healthcare partners.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Key Points

•   An off-the-shelf deep learning model can achieve state-of-the-art anatomic classification results of over 90% sensitivity on CT and MRI test sets completely disjoint from the training sets.

•   Classification results cover the entire human body, in particular extremities, which have been excluded from previous CT studies.

•   Image-based analysis has the potential to provide accurate metadata about the image composition of a given CT or MRI study.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1449 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Raffy, P., Pambrun, JF., Kumar, A. et al. Deep Learning Body Region Classification of MRI and CT Examinations. J Digit Imaging 36, 1291–1301 (2023). https://doi.org/10.1007/s10278-022-00767-9

