Introduction

Coronavirus disease 2019 (COVID-19) is an illness caused by a novel coronavirus, now named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first outbreak occurred in December 2019 in Wuhan City, China [1]. The World Health Organization (WHO) declared COVID-19 a global pandemic on March 11, 2020. As of June 24, 2021, the disease had escalated to 180 million cases, with 3.9 million deaths and 165 million recoveries [2]. Among the worst-hit nations are the USA, India, and Brazil.

Effective screening is essential to triage patients and treat them accordingly. COVID-19 is diagnosed by real-time reverse transcription-polymerase chain reaction (RT-PCR) of nasopharyngeal swabs [3]. Chest X-ray (CXR) imaging and computed tomography (CT) are essential supplementary diagnostic tools for investigating patients suspected of having COVID-19, and they are also vital for patient follow-up. However, triaging patients accurately requires an experienced and certified radiologist. CXR findings are often non-specific, making it challenging to determine whether they are caused by COVID-19 or by other conditions. Therefore, a computer-aided diagnosis system with automatic classification of lung abnormalities would be beneficial to assist radiologists in confirming their diagnoses and to speed up the process.

Recently, many researchers have used convolutional neural networks (CNNs), a class of deep learning algorithms, to assist in the diagnosis of COVID-19. Deep learning uses automatic feature extraction and pattern recognition to classify an image. A CNN is based on the shared-weight architecture of convolution kernels, or filters, which slide along the input features and produce feature maps. A CNN also contains fully connected layers, in which each neuron in one layer is connected to all neurons in the next layer. In each layer, the data are transformed into a higher and more abstract representation; the deeper the network, the more complex the information learned. CNNs are commonly used for image classification and segmentation.
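To make these components concrete, the following is a minimal sketch of a small CNN in MATLAB's Deep Learning Toolbox (a toy example of our own, not one of the 18 models studied here):

```matlab
% Minimal CNN sketch: convolution with shared-weight kernels, pooling,
% and a fully connected classifier, as described above.
layers = [
    imageInputLayer([224 224 3])                  % input image
    convolution2dLayer(3, 16, 'Padding', 'same')  % 16 shared-weight 3x3 kernels
    reluLayer                                     % non-linearity
    maxPooling2dLayer(2, 'Stride', 2)             % downsample the feature maps
    fullyConnectedLayer(2)                        % each neuron connects to all inputs
    softmaxLayer                                  % class probabilities
    classificationLayer];                         % cross-entropy output
```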

Wang et al. proposed the COVID-Net model, which combined human-driven principled network design prototyping with machine-driven design exploration to detect COVID-19 cases from CXR images [4]. They used residual architecture design principles in the first stage of human-driven principled network design. They then used generative synthesis to identify the optimal macro-architecture and micro-architecture designs for the COVID-Net model. They reported an accuracy of 92.6% on the test dataset, a sensitivity of 87.1% for COVID-19 cases, and a high positive predictive value (PPV) of 96.4% for COVID-19 cases. Mangal et al. used a pre-trained CheXNet [...].

The contributions of this study are as follows:

  • Quantitative analysis using six assessment metrics on 18 CNN models with transfer learning for diagnosing COVID-19 on CXR images. This is an objective assessment by computer.

• Visual identification of COVID-19 pneumonia-related lung changes on 50 CXR images by two certified radiologists. This serves as the ground truth of the diagnosis.

  • Qualitative analysis of the top four and bottom three CNN models using Grad-CAM heatmaps, performed by two certified radiologists in comparison with the ground truth. This is a subjective assessment by radiologists.

Material and Methods

    Overview of 18 CNN Architectures

VGG uses up to 19 weight layers and was considered a very deep convolutional network in its era for large-scale image classification. Its authors explored conventional Convolutional Networks (ConvNets) and increased the depth of the architecture with very small (3 × 3) convolution filters [19]. Our study used two versions of VGG, VGG-16 and VGG-19, where the number represents the number of layers. ResNet explicitly reformulates the layers as learning residual functions with reference to the layer inputs. Its baselines were inspired by the VGG nets, except that this model has fewer filters and lower complexity [20]. Our study used three versions of ResNet, ResNet-18, ResNet-50, and ResNet-101, where the number represents the number of layers. AlexNet comprises 5 convolution layers and 3 fully connected layers with a final 1000-way softmax layer. Its authors used the "dropout" regularization method to reduce overfitting and non-saturating neurons to make training faster [21]. SqueezeNet is a small CNN architecture with accuracy equivalent to AlexNet, although it has 50 times fewer parameters and is 510 times smaller than AlexNet. It replaces the 3 × 3 filters with 1 × 1 filters, decreases the number of input channels to the 3 × 3 filters, and downsamples late in the network so that the convolution layers have large activation maps [22].

Inception-v3 scales up the networks by factorizing convolutions and applying aggressive dimension reductions inside the neural network. Its authors demonstrated the training of high-quality networks on relatively modest-sized training sets by combining a lower parameter count with additional regularization from batch-normalized auxiliary classifiers and label smoothing. They showed high-quality results for a low receptive field resolution of 79 × 79, which could help detect relatively small objects [23]. GoogLeNet applies the Inception network, and its architecture is based on the Hebbian principle and the intuition of multi-scale processing. The main benefit is that it allows the depth and width of the network to increase without a huge increase in computational complexity [24]. Inception-ResNet-v2 combines the ideas of residual connections and the Inception architecture; the residual connections accelerate the training of the Inception networks and significantly improve recognition performance [25]. The Xception architecture was inspired by the Inception module, but it is entirely based on depth-wise separable convolutions with linear residual connections. It uses the same number of parameters as Inception-v3 but makes more efficient use of them [26].

DarkNet-19 uses 3 × 3 filters and doubles the number of channels after every pooling step. It uses global average pooling to make predictions and 1 × 1 filters to compress the feature representation between 3 × 3 convolutions [27]. DarkNet-53 is a variant of DarkNet-19 with 53 convolutional layers [28]. DenseNet-201 connects each layer to every other layer in a feed-forward fashion: the feature maps of all preceding layers are used as inputs to each layer, and its own feature maps are used as inputs to all subsequent layers. This design alleviates the vanishing gradient problem, improves feature propagation, encourages feature reuse, and reduces the number of parameters [6].

MobileNet-v2 is a mobile architecture based on an inverted residual structure and linear bottlenecks, with shortcut connections between the thin bottleneck layers. The intermediate expansion layer uses lightweight depth-wise convolutions to filter the features. Its architecture consists of an initial full convolution layer with 32 filters followed by 19 residual bottleneck layers [29]. ShuffleNet utilizes pointwise group convolution and channel shuffle, reducing computation cost while maintaining accuracy. Its computation is 13 times faster than AlexNet for comparable classification accuracy, and it was designed for mobile devices [30]. NasNet designs a new search space to find an architectural building block on a small dataset and then transfers the block to a larger dataset. Its authors used neural architecture search (NAS) as the primary search method, along with a new regularization technique called "Scheduled Drop Path" that improves generalization [31]. Our study used two versions of NasNet, NasNet-Large and NasNet-Mobile.

    Dataset Preparation

The CXR images in our study were obtained from the public and private domains. The dataset from the public domain is called COVIDx [32], which consists of CXR images from five sources: Actualmed COVID-19 Chest X-Ray Dataset Initiative (Actmed) [33], COVID-19 Image Data Collection: Prospective Predictions Are the Future (COHEN) [34], Fig. 1 COVID-19 Chest X-Ray Dataset Initiative (Fig1) [35], COVID-19 Radiography Database (SIRM) [36], and RSNA Pneumonia Detection Challenge (RSNA) [37]. The public-domain datasets are available on the websites listed in the references. The dataset from the private domain was provided by the Department of Biomedical Imaging, Faculty of Medicine, University of Malaya (UM), Malaysia. The private-domain dataset is not available to the public, following the ethics agreement that restricts its use to this study only. We obtained CXR images of both normal and COVID-19 subjects from both the public and private domains. We chose CXR images in the posteroanterior (PA) and anteroposterior (AP) views of the lung for this study. The number of images from each domain and source is recorded in Table 2. The sizes of the normal images range from 1024 × 1024 (smallest) to 2520 × 3032 (largest); the COVID-19 images range from 220 × 206 (smallest) to 4280 × 3520 (largest). There are no specific gray levels in the public-domain images since they were taken from various databases. The private-domain DICOM images have a 12-bit pixel depth, corresponding to 4096 gray levels in each CXR image. Figure 1 shows a COVID-19 CXR image and a normal lung CXR image provided by UM.

    Fig. 1
    figure 1

    CXR images for a a patient diagnosed with COVID-19 and b a normal lung

Table 2 The number of CXR images obtained from the public and private domains

The 18 CNN models were trained with a combined dataset consisting of 200 normal CXR images (100 from COVIDx and 100 from UM) and 200 COVID-19 CXR images (100 from COVIDx and 100 from UM). These images were split at a ratio of 7:3 for training and validation: for each class (normal and COVID-19), 140 images were used for training and 60 for validation. The remaining images were used for testing the CNN models to evaluate their performance. The testing dataset consists of 150 normal CXR images (100 from COVIDx and 50 from UM) and 150 COVID-19 CXR images (100 from COVIDx and 50 from UM). The dataset split for training, validation, and testing is recorded in Table 3, and a code sketch of the split follows Table 3.

    Table 3 The implementation details of the dataset split for training, validation, and testing
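For illustration, a minimal MATLAB sketch of such a per-class 7:3 split (the folder layout and variable names are our own assumptions, not the study's actual script):

```matlab
% Sketch of the 7:3 training/validation split per class; assumes the images
% are organized in subfolders named 'covid' and 'normal' (assumed layout).
imds = imageDatastore('dataset/train_val', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.7, 'randomized');  % 140/60 per class
```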

    Hardware and Software

The training, validation, and testing of the CNN models were performed using an Intel(R) Core(TM) i5-10500 CPU @ 3.10 GHz with 8 GB RAM. The YAKAMI DICOM Tool [38] was used to convert the DICOM images to JPEG file format. Then, the Deep Network Designer Toolbox in MATLAB R2020b (The MathWorks, Inc.) was used for training and testing the 18 CNN models. The MATLAB Grad-CAM Library [39] was used to run Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize the classification decisions.
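The same DICOM-to-JPEG conversion could also be scripted directly in MATLAB; a minimal sketch under that assumption (file names are placeholders):

```matlab
% Sketch of DICOM-to-JPEG conversion (the study used the YAKAMI DICOM
% Tool [38]); rescales the 12-bit pixel data to 8 bits for JPEG output.
img  = dicomread('chest.dcm');         % placeholder file name
img8 = uint8(255 * mat2gray(img));     % map the 4096 gray levels to 0-255
imwrite(img8, 'chest.jpg');
```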

    Transfer Learning

Our study applied transfer learning to the 18 CNN models available in MATLAB's Deep Network Designer. The 18 CNN models were previously trained on the ImageNet images [40]. Since we do not have a large dataset of CXR images to train a deep learning model from scratch, transfer learning was applied to the pre-trained CNN models. In this approach, the CNN models are used as feature extractors while keeping their initial architecture. Referring to Fig. 2, the lower layers forming the feature extractor portion are frozen. The original fully connected, softmax, and classification output layers are removed and replaced with a new set with an output size of 2 for the binary classification of the COVID-19 and normal classes. We did not attempt to optimize the CNN models or adjust their weights in the feature learning portions. Transfer learning is an efficient and common approach for a considerably small dataset, as it avoids training the CNN models from scratch.
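As an illustration, a hedged MATLAB sketch of this setup for VGG-16 (the layer names 'fc8' and 'output' follow MATLAB's pre-trained vgg16 network; the replacement layer names are our own):

```matlab
% Transfer-learning sketch for VGG-16: replace the final layers for the
% binary classification (COVID-19 or normal) while keeping the pre-trained
% feature extractor.
net = vgg16;                                       % ImageNet pre-trained model
lgraph = layerGraph(net.Layers);
lgraph = replaceLayer(lgraph, 'fc8', ...
    fullyConnectedLayer(2, 'Name', 'fc_covid'));   % new 2-class layer
lgraph = replaceLayer(lgraph, 'output', ...
    classificationLayer('Name', 'class_covid'));   % new output layer
% Earlier layers can be frozen by setting their WeightLearnRateFactor and
% BiasLearnRateFactor properties to 0 before assembling the layer graph.
```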

    Fig. 2
    figure 2

    A pre-trained CNN architecture is adapted with transfer learning to perform a binary classification (COVID-19 or normal)

This study used the recommended default hyperparameter settings provided by MathWorks' Deep Learning Guide. Figure 3 shows the training settings used for all the CNN models in this study (a code sketch of such a configuration follows Fig. 3). No hyperparameter tuning was performed, since tuning is not the main focus of this study.

    Fig. 3
    figure 3

    Training setting of the hyperparameters for all the CNN models
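For illustration, a training configuration of this kind might be expressed as follows in MATLAB; the hyperparameter values below are placeholders, not necessarily the defaults shown in Fig. 3:

```matlab
% Sketch of a training run; 'lgraph', 'imdsTrain', and 'imdsVal' come from
% the earlier sketches, and the option values are placeholders.
augTrain = augmentedImageDatastore([224 224], imdsTrain);  % resize to input size
augVal   = augmentedImageDatastore([224 224], imdsVal);
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...          % placeholder value
    'MaxEpochs', 10, ...                   % placeholder value
    'MiniBatchSize', 32, ...               % placeholder value
    'ValidationData', augVal, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');
trainedNet = trainNetwork(augTrain, lgraph, options);
```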

Assessment Metrics

    There are four possible outcomes in a confusion matrix for binary classification: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). True positive (TP) refers to the number of cases correctly classified as positive where the disease is present. True negative (TN) refers to the number of cases correctly classified as negative where the disease is absent. False negative (FN) refers to the number of cases wrongly classified as negative where the disease is present. False positive (FP) refers to the number of cases wrongly classified as positive where the disease is absent.

    The TP, TN, FN, and FP are used to calculate the assessment metrics including specificity, sensitivity, precision, NPV, accuracy, and F1-score. These metrics are used to evaluate the performance of the 18 CNN models in this study. The formulas for the specificity, sensitivity (or recall), precision, NPV, accuracy, and F1-score are given in Eq. (1) to Eq. (6), respectively:

$$\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}},$$
(1)
$$\text{Sensitivity/Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}},$$
(2)
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}},$$
(3)
$$\text{Negative Predictive Value (NPV)} = \frac{\text{TN}}{\text{TN} + \text{FN}},$$
(4)
$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}},$$
(5)
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times \text{TP}}{(2 \times \text{TP}) + \text{FN} + \text{FP}}.$$
(6)
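For reference, a minimal MATLAB sketch of these six formulas (the function and field names are our own):

```matlab
% Compute the six assessment metrics from binary confusion-matrix counts.
function m = assessmentMetrics(tp, tn, fp, fn)
    m.specificity = tn / (tn + fp);                   % Eq. (1)
    m.sensitivity = tp / (tp + fn);                   % Eq. (2), also recall
    m.precision   = tp / (tp + fp);                   % Eq. (3)
    m.npv         = tn / (tn + fn);                   % Eq. (4)
    m.accuracy    = (tp + tn) / (tp + tn + fp + fn);  % Eq. (5)
    m.f1          = 2*tp / (2*tp + fn + fp);          % Eq. (6)
end
```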

    Majority Voting

Majority voting has been adopted with deep learning to improve COVID-19 detection accuracy [41, 42]. Our study used the hard approach of majority voting, which assigns each image the class label that receives the highest number of votes among all the CNN models. It was applied to the 18 CNN models and then repeated for the top 4 CNN models, i.e., those with an accuracy higher than 90%.
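A minimal sketch of hard majority voting in MATLAB (the variable name is our own; note that MATLAB's mode resolves ties by picking the first category):

```matlab
% Hard majority voting: 'predictions' is an N-by-M categorical array of
% predicted labels (N test images, M CNN models). Taking the mode along
% the second dimension assigns each image its most frequently voted class.
votedLabels = mode(predictions, 2);
```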

    Qualitative Analysis with Grad-CAM

The predictions made by the CNN models can be evaluated quantitatively using the assessment metrics described earlier. However, these metrics do not tell us which parts of the images were used as features in the decision-making. Therefore, it is equally important to display a "visual explanation" of the decision made by the CNN models. We used Grad-CAM for this purpose [39]. It uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting the regions of the image that were significant for the prediction, making it a useful tool for interpreting a model's decision. For each of the 18 CNN models, the feature map layer used to produce the Grad-CAM heatmap is specified in Table 4 (a code sketch follows the table).

Table 4 The selected feature map layer used to produce the Grad-CAM heatmap in each CNN model
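For illustration, newer MATLAB releases (R2021a and later) ship a built-in gradCAM function; the study itself used the separate Grad-CAM library [39] with R2020b. A hedged sketch:

```matlab
% Grad-CAM sketch using MATLAB's built-in gradCAM (R2021a+); 'net' is a
% trained classification network and 'img' a 224x224x3 CXR image.
label = classify(net, img);              % predicted class for the image
scoreMap = gradCAM(net, img, label);     % coarse localization map
imshow(img); hold on;
imagesc(scoreMap, 'AlphaData', 0.5);     % overlay the heatmap on the CXR
colormap jet; hold off;
```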

From the quantitative analysis, we chose the top four and bottom three CNN models for further qualitative analysis. We produced the Grad-CAM heatmaps of the COVID-19 testing dataset (50 CXR images from UM). Two certified radiologists, with more than 5 and 10 years of CXR interpretation experience respectively, independently evaluated these CXR images by drawing a contour over the infected region within the lung using the ITK-SNAP software [43]. For each CXR image, the radiologists were given seven Grad-CAM heatmaps (top four and bottom three models) and voted for the heatmap closest to their diagnosis, as indicated by the contour of the infected region. If more than one heatmap identified the correct region, each was given one vote; if all the heatmaps showed the wrong region, no vote was given for that image. This process was repeated for the 50 CXR images with seven heatmaps each. The bottom three CNN models were included in this vote to verify that they were indeed the least accurate models compared with the top four. The radiologists performed a blind analysis during the voting, without knowing the names of the CNN models. We aim to find the most suitable CNN models for COVID-19 detection by combining both quantitative and qualitative analysis.

    Results

    Quantitative Analysis

Table 5 records the depth of layers, total layers (convolution, dense, pooling, etc.), and the number of parameters (in millions) for the 18 CNN models, arranged from the highest to the lowest number of parameters. All the models used the same input image size of 224 × 224 × 3. Figure 4 shows the training time (left bars) and the validation and testing accuracy (right bars) for each model, arranged from the highest to the lowest number of parameters. Generally, a model with a larger number of parameters requires a longer training time.

    Table 5 The depth of layers, total layers, number of parameters, and image input size for the 18 CNN models are arranged from the highest to the lowest number of parameters
    Fig. 4
    figure 4

Training time (left bars) and validation and testing accuracy (right bars) for the 18 CNN models, in descending order of the number of parameters (given in brackets, in millions)

SqueezeNet used the smallest number of parameters (1.24 million) and the shortest training time (514 s = 8 min 34 s), yet achieved a relatively good validation accuracy of 92.5% and testing accuracy of 90.67%. VGG-16 had the highest validation accuracy of 96.67% and testing accuracy of 94.33%, but a relatively long training time (3942 s = 1 h 5 min 42 s), suggesting a tradeoff between training time and accuracy. Nevertheless, NasNet-Large used the longest training time yet had the lowest validation and testing accuracy. Therefore, the relation between the training time and the validation and testing accuracy is inconclusive among these 18 CNN models.

Table 6 records the classification results (TP, FP, FN, and TN) for the 18 CNN models, arranged from the highest to the lowest number of parameters, and for the majority voting with the 18 models and with the top 4 models. These values were used to calculate the assessment metrics specificity, sensitivity, precision, NPV, accuracy, and F1-score, as recorded in Table 7, where the 18 CNN models are arranged from the highest to the lowest accuracy (%). VGG-16 had the highest accuracy of 94.3%, the highest specificity of 93.5%, the highest precision of 93.3%, and the highest F1-score of 94.3%. VGG-19 demonstrated the highest sensitivity of 95.6% and the highest NPV of 96.0%; DarkNet-19 and GoogLeNet also achieved the highest NPV of 96.0%. The top 4 models, identified by an accuracy higher than 90%, are VGG-16, ResNet-101, VGG-19, and SqueezeNet. The majority voting with the 18 models produced an accuracy of 93.0%, lower than the 94.0% accuracy of the majority voting with the top 4 models.

    Table 6 Classification results (TP, FP, FN, TN) for the 18 CNN models, arranged in the descending order of the number of parameters; and for the majority voting with 18 models and the top 4 models
    Table 7 Assessment metric values for the 18 CNN models (arranged from the highest to the lowest accuracy) and for the majority voting with 18 and the top 4 models

The assessment metric results in Table 7 are plotted in Fig. 5 from the highest to the lowest number of parameters as the plot moves from left to right. DarkNet-53 demonstrated the most consistent values among the six assessment metrics, while Xception showed the largest variation. No clear trend of performance with the number of parameters used in each CNN model is observed. Our study focuses on the performance of different types of CNN models, rather than the number of parameters, for diagnosing COVID-19. The majority voting, with either the 18 models or the top 4 models, produced consistently higher values across all the assessment metrics. The confusion matrices of the top 4 models, the majority voting with the 18 models, and the majority voting with the top 4 models are plotted in Fig. 6.

    Fig. 5
    figure 5

    Assessment metrics for the 18 CNN models, arranged from the highest to the lowest number of parameters as the plot moves from the left to right, and for the majority voting with 18 models and the top 4 models

    Fig. 6
    figure 6

    Confusion matrices for the top 4 models (VGG-16, ResNet-101, VGG-19, SqueezeNet), majority voting with 18 models and the top 4 models

    Qualitative Analysis

From the quantitative results in Table 7 and Fig. 5, it remains inconclusive which CNN model is best at distinguishing COVID-19 from normal lung CXR images. Therefore, qualitative analysis is necessary to investigate the most suitable CNN model for diagnosing COVID-19. Figure 7a and b show the Grad-CAM heatmaps of the 18 CNN models for a correctly classified COVID-19 and normal CXR image, respectively. The red region is the most significant region from which the CNN models extracted "features" during the prediction process, while the blue region is the least significant region for decision-making. Some of the red regions used for decision-making are not within the thoracic cavity; hence, the predictions of some CNN models were based on features from the wrong region even though they produced true positive (TP) or true negative (TN) results. The ground truth of the infected lung area, identified by two radiologists, is shown in the bottom right corner of Fig. 7a. The majority voting method does not have a Grad-CAM heatmap because it is a different approach that assigns image labels based on the majority votes of the predictions from the individual CNN models.

    Fig. 7
    figure 7

The Grad-CAM heatmaps of the 18 CNN models for a correctly classified a COVID-19 CXR (116_1.Ser2.Img1.jpg), where the ground truth identified by two radiologists is shown in grayscale with a red contour indicating the affected area, and b normal CXR (102.Ser1.Img1_anon.jpg)

To identify which CNN models attended to the correct region within the lung during the classification process, qualitative analysis of these heatmaps is necessary with the assistance of radiologists. Only the top four models (VGG-16, ResNet-101, VGG-19, and SqueezeNet) and bottom three models (NasNet-Mobile, NasNet-Large, and Xception) from Table 7 were chosen to produce the Grad-CAM heatmaps for the 50 CXR images (from the UM dataset) in the qualitative analysis. Figure 8 shows another COVID-19 CXR image with the ground truth drawn by two radiologists and the seven Grad-CAM heatmaps of the top four and bottom three models. The radiologists voted for the best heatmap by comparing each one with the contour of the infected region they had drawn themselves. The results of their voting are recorded in Table 8. The total number of votes is unequal between the two radiologists because no score was given when no heatmap was correct. Referring to Table 8, SqueezeNet's Grad-CAM heatmaps received the highest score (printed in bold), i.e., they were the closest to the radiologists' diagnoses. The bottom three models received the lowest scores from both radiologists, confirming that the models that performed poorly in the quantitative analysis also performed poorly in the radiologists' qualitative analysis.

    Fig. 8
    figure 8

    COVID-19 CXR image (115_1.Ser1.Img1) with ground truth identified by two radiologists; Grad-Cam heatmaps of the top four and bottom three models

    Table 8 The voting results by two blinded radiologists on 50 Grad-CAM heatmaps of top 4 and bottom 3 CNN models

    Discussion

This study has demonstrated both quantitative and qualitative analysis of 18 CNN models with transfer learning to diagnose COVID-19 on CXR images. In our study, the state-of-the-art CNN models classified COVID-19 versus normal lung CXR images with accuracies between 74.3% and 94.3%, as recorded in Table 7. Six assessment metrics were calculated: specificity, sensitivity, precision, NPV, accuracy, and F1-score. Yet, it is difficult to conclude from the quantitative analysis alone which is the most suitable model, as most of the CNN models produced competitively good assessment metric values. Referring to Table 7, the top four CNN models with accuracy higher than 90% are VGG-16, ResNet-101, VGG-19, and SqueezeNet. The majority voting with the hard approach produced an accuracy of 94.0% when combining the top 4 models and 93.0% when combining all 18 models. The slightly lower accuracy when combining 18 models is due to the averaging effect of the poorer models.

    To date, the majority of the CNN studies for the detection of COVID-19 excluded qualitative analysis by radiologists. The new contribution from our study is the subjective qualitative analysis of the CNN models by certified radiologists alongside the quantitative analysis. Our study has combined both objective assessment (quantitative analysis by computer) and subjective assessment (qualitative analysis by radiologists) to enhance the evaluation of the CNN models. It gives us better confidence in our investigation of the best CNN model for diagnosing COVID-19 on CXR images.

Mangal et al. used RISE [44] to generate saliency maps to visualize their model's predictions [...].

    Conclusion

The main contribution of this study is the combination of objective quantitative and subjective qualitative analysis in evaluating the performance of CNN models with transfer learning to diagnose COVID-19. The quantitative analysis of 18 CNN models with transfer learning revealed that the top four models for diagnosing COVID-19 on CXR images are VGG-16, ResNet-101, VGG-19, and SqueezeNet. VGG-16 scored the highest accuracy of 94.3% and the highest F1-score of 94.3%. The majority voting with all 18 CNN models and with the top 4 models produced accuracies of 93.0% and 94.0%, respectively. The qualitative analysis using Grad-CAM heatmaps of the top four and bottom three models revealed that SqueezeNet is the model closest to the subjective diagnosis of the two certified radiologists. SqueezeNet demonstrated a competitively good accuracy of 90.7% and F1-score of 90.8% with the shortest training time of 8 min 34 s; it used 111 times fewer parameters than VGG-16, and its training was 7.7 times faster. Therefore, our study recommends both VGG-16 and SqueezeNet as additional tools for the diagnosis of COVID-19.