Classification of chest radiographs using general purpose cloud-based automated machine learning: pilot study

Ghosh, Tamaghna; Tanwar, Simran; Chumber, Shishir; Vani, Kavita

doi:10.1186/s43055-021-00499-w

Classification of chest radiographs using general purpose cloud-based automated machine learning: pilot study

Research
Open access
Published: 05 May 2021

Volume 52, article number 120, (2021)
Cite this article

Download PDF

You have full access to this open access article

Egyptian Journal of Radiology and Nuclear Medicine Submit manuscript

Classification of chest radiographs using general purpose cloud-based automated machine learning: pilot study

Download PDF

Tamaghna Ghosh¹,
Simran Tanwar¹,
Shishir Chumber² &
…
Kavita Vani ORCID: orcid.org/0000-0003-1673-8686¹

1666 Accesses
5 Citations
Explore all metrics

Abstract

Background

Widespread implementation of machine learning models in diagnostic imaging is restricted by dearth of expertise and resources. General purpose automated machine learning offers a possible solution.

This study aims to provide a proof of concept that a general purpose automated machine learning platform can be utilized to train a CNN to classify chest radiographs.

In a retrospective study, more than 2000 postero-anterior chest radiographs were assessed for quality, contrast, position, and pathology. A selected dataset of 637 radiographs were used to train a CNN using reinforcement learning based automated machine learning platform. Accuracy metrics of each label was calculated and model performance was compared to previous studies.

Results

The auPRC (area under precision-recall curve) was 0.616. The model achieved precision of 70.8% and recall of 60.7% (P > 0.05) for detection of “Normal” radiographs. Detection of “Pathology” by the model had a precision of 75.6% and recall of 75.6% (P > 0.05). The F1 scores were 0.65 and 0.75 respectively.

Conclusion

Automated machine learning platforms may provide viable alternatives to develo** custom CNN models for classification of chest radiographs. However, the accuracy achieved is lower than a comparable traditionally developed neural network model.

Machine Learning in Automated Chest Radiographs Classification

Automated semantic labeling of pediatric musculoskeletal radiographs using deep learning

Article 30 April 2019

High-Throughput Classification of Radiographs Using Deep Convolutional Neural Networks

Article Open access 11 October 2016

Background

Current scenario

Until recently, the approach to develop a CAD system to extract meaningful features and infer a diagnosis was based heavily on Rule Based algorithms [

Methods

Dataset creation

A pool of over 2000 postero-anterior view chest radiographs from the out-patient and in-patient department acquired on different computed radiography and digital radiography systems were assessed for quality, level of penetration, positioning, and contrast. Those with very poor quality, low contrast, and unsatisfactory positioning were rejected. However, chest radiographs with minor imperfections in breath-holding, positioning or contrast, deemed reportable by the radiologist were included. Chest radiographs with clothing, jewelry, and implantable medical devices artifacts were included to mirror real-world variations.

Image processing

The resultant dataset of 637 images were then converted from proprietary file types into Joint Photographic Expert Group file type with a 1024 × 1024 matrix size with 96 dpi vertical and horizontal resolution encoded using baseline DCT Huffman coding. The bit depth (bits per sampling) was set at 8bits with a Chroma subsampling Y’CbCr=4:2:0. The dataset was de-identified and was compliant with the Health Insurance Portability and Accountability Act. No data augmentation procedures were performed.

Model implementation

The dataset was uploaded onto Google Cloud Platform (Google LLC, Menlo Park, CA, USA) and processed using Cloud AutoML Vision Beta (release date: July 24, 2018). Multiple labels were created for classifying different pathologies and image characteristics (Fig. 1). Each image was labeled with one or more labels using Vision UI running on Chrome v68.0.3440 (Google LLC, Menlo Park, CA, USA). The dataset was randomly subdivided with 80% of the images under each category allocated to the training set and 10% each to the validation and testing sets. The system used had Intel Core i3 – 4005U 1.70 GHz chipset (Intel, Santa Clara, CA, USA), 4.00 Gb RAM, 512 Gb hard disk space, integrated Intel HD Graphics 4400 (Intel, Santa Clara, CA, USA) with Windows 7 Ultimate Operating System (Microsoft Corporation, Redmond, WA, USA).

Statistical analysis was performed with the in-built metrics projection in Vision API, Google Sheets (Google LLC, Menlo Park, CA, USA) and MedCalc (MedCalc Software Ltd, Ostend, Belgium).

Results

Dataset characteristics

The dataset contained 637 postero-anterior view chest radiographs of which 332 had some pathology (52.1%). The dataset had a mild male predominance (57.8%) with an average age of 26.5 years. Each image assessed subjectively for quality and marked either satisfactory or poor. 82.1% of the images were of satisfactory quality but the dataset also contained 17.9% radiographs which were poor in quality but still deemed reportable by the radiologist. 47.6% of the dataset contained some form of artifact from clothing, jewelry, or implantable devices like pacemakers. The images were also assessed for positioning of the subject and revealed 25.9% to have some degree of rotation—which could lead to certain artifactual findings such as apparent cardiomegaly and prominence of the hila. Forty-three of the 637 radiographs were found to have been acquired in mid-inspiration. These imperfect images were introduced into the dataset to reduce overfitting of the model to the training set and improve its real world applicability.

The images with pathology were sub-classified and labeled into 9 different categories (Fig. 2). The pathologies were also assessed for subjective conspicuity. Each lung field was divided into three lung zone: upper, middle, and lower. A pathology occupying more than or equal to half of a zone was deemed “Apparent.” If the pathology occupied less than half but more than 25% of the lung zone, it was marked as “Conspicuous.” Lesions occupying less than 25% of a lung zone were termed “Subtle.” The distributions of the lesions are shown in Fig. 2.

Accuracy metrics

The precision (positive predictive value) for all labels was 65.7% with a recall (sensitivity) of 40.1%. The auPRC (area under precision-recall curve or average precision) of the model was 0.616 (Fig. 3). The precision and recall for each category is summarized in Table 1. The F1 Score for classification was 0.65 for “Normal” category and 0.75 for “Pathology” category. Further evaluation statistics for both categories are summarized in Tables 2 and 3, respectively.

Table 1 Accuracy metrics

Full size table

Table 2 Evaluation of model performance in detecting “Normal” chest radiographs

Full size table

Table 3 Evaluation of model performance in detecting “Pathology” chest radiographs

Full size table

Discussion

Unmet needs

While there has been considerable interest in the application of convolutional neural networks and other forms of machine learning for classification of chest radiographs into various pathologies, the underlying technology utilized in all these studies remain exclusionary [5, 24,25,26]. These studies either constructed and trained machine learning models de novo or worked with pre-trained CNNs like AlexNet and GoogLeNet [3, 27]. Though these methods yielded high accuracy models which could classify chest pathologies, they were built on systems which required high level of expertise as well as prohibitively costly infrastructure. This has led to a data-algorithm divide. The predictive accuracy of an algorithm is strictly contingent on the dataset that it is trained on (Fig. 4). But a large number of institutions in resource-limited settings may not have access to machine learning technology but do have access to large volumes of data.

Proposed solution

In this study, we tried to explore the possibility of repurposing general purpose automated machine learning models to classify diagnostic images, in particular chest radiographs. The platform used was Cloud AutoML Vision, which circumvents the challenges of requiring a large amount of time and expertise in crafting a neural network by using reinforcement learning [23]. The “controller” recurrent network creates variable length strings. These strings act as templates for development of “child” convolutional neural networks. These “child” networks are trained on the dataset and subsequently evaluated for accuracy. The accuracy metric is used as a positive reinforcement for the “controller” network. Thus in the subsequent iteration, the “child” networks with higher accuracy are favored. This is repeated until a single best “child” network is achieved with the highest accuracy.

Model accuracy

The accuracy metric of our trained model was expectedly lower than dedicated CNNs. The model had very poor sensitivity for sub-classification of pathology. However, the overall accuracy achieved for detection of pathology in chest radiographs was 74.57%. The accuracy parameters of the model are compared with two studies conducted with comparable machine learning models in Table 4.

Table 4 Comparison of accuracy metrics

Full size table

Our model, DeepDx, was able to achieve comparable accuracy to the model used by Bar et al., even surpassing their precision rate by almost 25%. This is substantial progress, especially when viewed in the context of the highly specialized fusion model (two separate deep learning baseline descriptors used along with GIST descriptor) created by Bar et al. [10]. The comparison table also reveals that Cicero et al. in their study achieved a much higher overall accuracy, but the success could be attributed at least in part to the large dataset on which their model was trained [11].

Justification

In our model, the three categories with examples above the minimum recommended number did provide good accuracy and with targeted increase in the dataset in subsequent iterations the overall model accuracy is likely to improve further. As per documentation released with Cloud AutoML Vision (Google LLC, Menlo Park, CA, USA), which we utilized in the study, the minimum recommended examples per label is 100 and approximately 1000 examples are advised for accurate prediction. This may not always be feasible for medical imaging, as rarity is often a feature of diseases with serious implications; and the time required to accrue enough examples may impede progress. This problem is usually circumvented by data augmentation procedures. Application of techniques such as horizontal flip**, crop**, rotation, and padding on chest radiographs and their effect on model accuracy has not been investigated. It may not be prudent to shoehorn techniques, while efficacious in other image datasets, onto diagnostic images. For example, horizontal flip** of a chest radiograph may create false-positive results for detection of cardiomegaly and may in fact reduce accuracy. The model may also fail to flag cases of dextrocardia. Similarly, training machine learning models on rotated radiographs may lead to the algorithms assigning undue importance to irrelevant components of the image. Also many disease processes are defined by their orientation like cephalization of vessels in CCF, which may be lost during the augmentation process.

Accuracy of the model is also likely to gain from changing the labeling structure of the dataset. In our study, we trained the algorithm to diagnose “Normal” and “Pathology” not as a binary alternative but as distinct classification categories. This was done kee** in mind real world application, as many radiographs do not distinctly fit either into an apparently normal or disease category. Many radiographs have suspicious features which should not be classified as disease and may require consensus reporting by radiologists. Another advantage of detecting the two categories separately was that it gave us comparable statistics with a larger number of studies, as most have trained to classify either one of the categories. The downside of this labeling structure was that it added to the complexity and thus probably reduced accuracy of the model. The model can be trained, in further studies; to detect only “Pathology” and the “Normal” can be processed as a default class. The sensitivity of the “Pathology” label should be increased to commit false positives and catch the indeterminate cases rather than being labeled “Normal” (Fig. 5). This will again entail human intervention to sort through and weed out the false positive, but will improve accuracy.

Reflections

The study has highlighted certain definite advantages of using automated machine learning in develo** diagnostic classification models. The method reduces infrastructure requirements and cost to a fractional amount. The ease of use, with GUIs, also enables implementation and fine tuning without cumbersome coding languages. The reinforcement-based learning model greatly reduces the time requirement for develo** complex CNN architecture. And importantly, such platforms provide scalability to improve upon a model and add further complexity to the classifier.

Future implications

Further work needs to be done with larger datasets of diagnostic images, to ascertain the maximal overall accuracy achievable. Multiple platforms now exist providing similar tools and they should be evaluated in a controlled trial for unbiased comparison. Data augmentation procedures should also be validated for use with medical imaging, particularly radiological images. Lastly, most studies attempting to classify chest radiographs have dealt with post-processed compressed images converted to non-native file types such as JPEG and PNG [11, 12, 17, 18]. This conversion may lead to loss of important image characteristics and attempts should be made to use DICOM file types for future training of algorithms.

Conclusion

Computer vision is revolutionizing the field of diagnostic imaging. But its resource-intensive nature may preclude its wider implementation and acceptance. This study presented an alternative to the traditional machine learning infrastructure and aimed to investigate the use of commercially available general purpose cloud-based automated machine learning for detection of pathologies on standard postero-anterior chest radiographs.

The study found automated machine learning to be a viable alternative to human designed diagnostic convolutional neural networks. The accuracy of the model developed was conservative in comparison to standard deep learning models. However, restructuring of the classifiers and increasing the training dataset hold promise of achieving greater accuracy. Further multi-platform studies are required with larger datasets to fully explore its potential.

While machine learning promises of vast improvements in speed and accuracy in detection of pathologies across imaging modalities, greater research focus needs to be directed towards ensuring that this novel technology is used to bridge the health-wealth gap and not widen it.

Availability of data and materials

YES, The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

CNN:: Convolutional neural networks
auPRC:: Area under precision-recall curve
CAD:: Computer-aided diagnosis
AUC:: Area under the curve
ROC:: Receiver-operator curve
SVM:: Support vector machine
ML:: Machine learning
GPU:: Graphics processing nit
API:: Application Programming Interface
GIST:: General Information on Shape and Texture
GUI:: Graphic user interface
JPEG:: Joint Photographic Expert Group
PNG:: Portable network graphics

References

Islam MT, Aowal MA, Minhaz AT, Ashraf K (2017) Abnormality detection and localization in chest x-rays using deep convolutional neural networks. ar**v ar**v:1705.09850
Google Scholar
Dreyer KJ, Geis JR (2017) When machines think: radiology’s next frontier. Radiology 285(3):713–718. https://doi.org/10.1148/radiol.2017171183
Article PubMed Google Scholar
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Google Scholar
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Google Scholar
Litjens G, Kooi T, Bejnordi BE, Setio AA, Ciompi F, Ghafoorian M et al (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88. https://doi.org/10.1016/j.media.2017.07.005
Article PubMed Google Scholar
Lo SC, Lou SL, Lin JS, Freedman MT, Chien MV, Mun SK (1995) Artificial convolution neural network techniques and applications for lung nodule detection. IEEE Trans Med Imaging 14(4):711–718. https://doi.org/10.1109/42.476112
Article CAS PubMed Google Scholar
Anavi Y, Kogan I, Gelbart E, Geva O, Greenspan H (2015) A comparative study for chest radiograph image retrieval using binary texture and deep learning classification. In: Engineering in medicine and biology society (EMBC), 2015 37th annual international conference of the IEEE. IEEE, New York, pp 2940–2943
Google Scholar
Anavi Y, Kogan I, Gelbart E, Geva O, Greenspan H (2016) Visualizing and enhancing a deep learning framework using patients age and gender for chest x-ray image retrieval. In: Tourassi GD, Armato SG III (eds) Medical imaging 2016: computer-aided diagnosis. SPIE, New York, p 978510. https://doi.org/10.1117/12.2217587
Chapter Google Scholar
Bar Y, Diamant I, Wolf L, Lieberman S, Konen E, Greenspan H (2015) Chest pathology detection using deep learning with non-medical training. In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI). IEEE, New York, pp 294–297. https://doi.org/10.1109/ISBI.2015.7163871
Chapter Google Scholar
Bar Y, Diamant I, Wolf L, Greenspan H (2015) Deep learning with non-medical training used for chest pathology identification. In: Medical imaging 2015: computer-aided diagnosis. SPIE, Orlando, p 94140. https://doi.org/10.1117/12.2083124
Chapter Google Scholar
Cicero M, Bilbily A, Colak E, Dowdell T, Gray B, Perampaladas K, Barfett J (2017) Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Investig Radiol 52(5):281–287. https://doi.org/10.1097/RLI.0000000000000341
Article Google Scholar
Hwang S, Kim HE, Jeong J, Kim HJ (2016) A novel approach for tuberculosis screening based on deep convolutional neural networks. In: Tourassi GD, Armato SG III (eds) Medical imaging 2016: computer-aided diagnosis. SPIE, New York, p 97852. https://doi.org/10.1117/12.2216198
Chapter Google Scholar
Wang C, Elazab A, Wu J, Hu Q (2017) Lung nodule classification using deep feature fusion in chest radiography. Comput Med Imaging Graph 57:10–18. https://doi.org/10.1016/j.compmedimag.2016.11.004
Article PubMed Google Scholar
Shin HC, Roberts K, Lu L, Demner-Fushman D, Yao J, Summers RM (2016) Learning to read chest x-rays: recurrent neural cascade model for automated image annotation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, Las Vegas, pp 2497–2506. https://doi.org/10.1109/CVPR.2016.274
Chapter Google Scholar
Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM (2016) Chestx-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, Honolulu, pp 3462–3471. https://doi.org/10.1109/CVPR.2017.369
Chapter Google Scholar
Yao L, Poblenz E, Dagunts D, Covington B, Bernard D, Lyman K (2017) Learning to diagnose from scratch by exploiting dependencies among labels. ar**v ar**v:1710.10501
Google Scholar
Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T et al (2017) Chexnet: radiologist-level pneumonia detection on chest x-rays with deep learning. ar**v ar**v:1711.05225
Google Scholar
Lakhani P, Sundaram B (2017) Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284(2):574–582. https://doi.org/10.1148/radiol.2017162326
Article PubMed Google Scholar
Kim HE, Hwang S (2016) Deconvolutional feature stacking for weakly-supervised semantic segmentation. ar**v ar**v:1602.04984
Google Scholar
Rajkomar A, Lingam S, Taylor AG, Blum M, Mongan J (2017) High-throughput classification of radiographs using deep convolutional neural networks. J Digit Imaging 30(1):95–101. https://doi.org/10.1007/s10278-016-9914-9
Article PubMed Google Scholar
Yang W, Chen Y, Liu Y, Zhong L, Qin G, Lu Z et al (2017) Cascade of multi-scale convolutional neural networks for bone suppression of chest radiographs in gradient domain. Med Image Anal 35:421–433. https://doi.org/10.1016/j.media.2016.08.004
Article PubMed Google Scholar
Real E, Moore S, Selle A, Saxena S, Suematsu YL, Tan J et al (2017) Large-scale evolution of image classifiers. In: Proceedings of the 34th international conference on machine learning-volume 70, pp 2902–2911
Google Scholar
Bello I, Zoph B, Vasudevan V, Le QV (2017) Neural optimizer search with reinforcement learning. In: Proceedings of the 34th international conference on machine learning-volume 70, pp 459–468
Google Scholar
Sim Y, Chung MJ, Kotter E, Yune S, Kim M, Do S, Han K, Kim H, Yang S, Lee DJ, Choi BW (2020) Deep convolutional neural network–based software improves radiologist detection of malignant lung nodules on chest radiographs. Radiology 294(1):199–209. https://doi.org/10.1148/radiol.2019182465
Article PubMed Google Scholar
Park S, Lee SM, Kim N, Choe J, Cho Y, Do KH et al (2019) Application of deep learning–based computer-aided detection system: detecting pneumothorax on chest radiograph after biopsy. Eur Radiol 29(10):5341–5348. https://doi.org/10.1007/s00330-019-06130-x
Article PubMed Google Scholar
Dunnmon JA, Yi D, Langlotz CP, Ré C, Rubin DL, Lungren MP (2019) Assessment of convolutional neural networks for automated classification of chest radiographs. Radiology 290(2):537–544. https://doi.org/10.1148/radiol.2018181422
Article PubMed Google Scholar
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, Boston, pp 1–9
Google Scholar

Download references

Acknowledgements

We thank Google LLC for granting us permission to utilize their platform. We also extend our deepest gratitude to the staff at the conventional radiography division of Dept. of Radiodiagnosis, Dr. Ram Manohar Lohia Hospital, Delhi, for their support.

Funding

None

Author information

Authors and Affiliations

Department of Radiodiagnosis, Dr. Ram Manohar Lohia Hospital, Delhi, India
Tamaghna Ghosh, Simran Tanwar & Kavita Vani
Department of Neurology, Dr. Ram Manohar Lohia Hospital, Delhi, India
Shishir Chumber

Authors

Tamaghna Ghosh
View author publications
You can also search for this author in PubMed Google Scholar
Simran Tanwar
View author publications
You can also search for this author in PubMed Google Scholar
Shishir Chumber
View author publications
You can also search for this author in PubMed Google Scholar
Kavita Vani
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

TG, ST, and SC collected and analyzed the clinical data. TG, ST, and SC wrote the manuscript. KV guided, supervised, and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kavita Vani.

Ethics declarations

Ethics approval and consent to participate

The need for approval was waived as retrospective analysis of blinded data ensuring anonymity of patients was done for study.

Consent for publication

Retrospective evaluation of blinded digital data was done ensuring anonymity of patients.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ghosh, T., Tanwar, S., Chumber, S. et al. Classification of chest radiographs using general purpose cloud-based automated machine learning: pilot study. Egypt J Radiol Nucl Med 52, 120 (2021). https://doi.org/10.1186/s43055-021-00499-w

Download citation

Received: 14 February 2021
Accepted: 21 April 2021
Published: 05 May 2021
DOI: https://doi.org/10.1186/s43055-021-00499-w

Classification of chest radiographs using general purpose cloud-based automated machine learning: pilot study

Abstract

Background

Results

Conclusion

Similar content being viewed by others

Machine Learning in Automated Chest Radiographs Classification

Automated semantic labeling of pediatric musculoskeletal radiographs using deep learning

High-Throughput Classification of Radiographs Using Deep Convolutional Neural Networks

Background

Current scenario

Methods

Dataset creation

Image processing

Model implementation

Results

Dataset characteristics

Accuracy metrics

Discussion

Unmet needs

Proposed solution

Model accuracy

Justification

Reflections

Future implications

Conclusion

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation