Computer-Aided Detection of COVID-19 from CT Images Based on Gaussian Mixture Model and Kernel Support Vector Machines Classifier

Saygılı, Ahmet

doi:10.1007/s13369-021-06240-z


Research Article-Computer Engineering and Computer Science
Published: 07 October 2021

Volume 47, pages 2435–2453, (2022)

Arabian Journal for Science and Engineering


Ahmet Saygılı ORCID: orcid.org/0000-0001-8625-4842¹


Abstract

COVID-19 is a disease that has been declared a pandemic by the World Health Organization and has caused more than 2 million deaths worldwide. Early and accurate diagnosis is essential to limit its spread, and computer-aided automatic diagnosis systems built on medical images are developed for this purpose. In this study, an image processing and machine learning-based method is proposed that segments CT images taken from COVID-19 patients and automatically detects the disease from the segmented images. The study consists of three basic steps: preprocessing, segmentation, and classification. The preprocessing phase includes image resizing, image sharpening, noise removal, and contrast stretching. In the segmentation phase, images are segmented with an Expectation–Maximization-based Gaussian Mixture Model. In the classification stage, images are classified as COVID-19 positive or negative using kNN, decision tree, and two different ensemble methods together with the kernel support vector machines method. Two publicly available CT datasets and a mixed dataset created by combining them were used. The best accuracy values for Dataset-1, Dataset-2, and the Mixed Dataset are 98.5%, 86.3%, and 94.5%, respectively. The results indicate that the proposed approach improves on state-of-the-art performance for these datasets. Within the scope of the study, a GUI that can automatically detect COVID-19 has also been created.


1 Introduction

COVID-19 is a highly contagious disease caused by a virus from the Coronavirus family. The World Health Organization (WHO) declared COVID-19 a “Public Health Emergency of International Concern” on January 30, 2020, and a pandemic on March 11, 2020 [1]. Billions of people around the world have been affected by the rapid spread of COVID-19 (SARS-CoV-2), which can cause severe respiratory failure. SARS-CoV-2 infection can lead to severe, potentially fatal pneumonia [2, 3]. As of 9 February 2021, there were a total of 106,125,682 confirmed cases, 2,320,497 of which resulted in death [1]. These numbers reveal the extent of the pandemic.

Early diagnosis is the most important factor directly affecting the spread of COVID-19. The best-known diagnostic method for COVID-19 is the reverse transcription-polymerase chain reaction (RT-PCR) test [1, 4]. However, studies show that this test has a sensitivity of only about 60–70%, so more than one test is needed to make an accurate diagnosis [5,6,7]. Because of this limited sensitivity, lung and chest radiology images are also used to identify findings during the diagnosis process [8, 9]. CT is a frequently used and successful imaging method in diagnosis [10]. The main purpose of computer-aided automatic diagnosis systems is to assist medical professionals in their decisions. Factors such as the high spread rate of COVID-19 and an insufficient number of physicians increase the importance of automatic diagnosis approaches.

Findings obtained from CT images have a very important place in the diagnosis stage. One of the most important findings that radiologists look for in COVID-19 images is ground-glass opacity [11, 12], which gives part of the lung a frosted appearance. Another prominent finding is the crazy-paving pattern [11, 12]. One study of the radiological findings of COVID-19 included findings from 88 patients [12]. According to its results, the most common pattern in chest CT was ground-glass opacity, seen in 65% of the patients. In addition, 47% of the patients had air bronchogram, 35% had flat or irregular interlobular septal thickening, 32% had adjacent pleural thickening, and 10% had a crazy-paving pattern [12]. These findings show that there are significant differences between a healthy CT image and a CT image with COVID-19. The aim is to detect these differences with computer-based approaches, using various image processing and machine learning methods.

Medical imaging technologies are frequently used in the diagnosis of different diseases. Image segmentation is a basic technique in image processing. In general, segmentation makes use of the brightness and gray level of the image pixels, as well as contrast, texture, and color. Segmentation is applied particularly often to medical images, and it was also used in this study. The main motivation for this study is to propose a robust and highly accurate system that can diagnose COVID-19 from CT images. The innovations and contributions of the study are as follows:

1. A new and effective approach based on machine learning and image processing that can diagnose COVID-19 is proposed.
2. A decision support system is created that can support and assist medical professionals in their COVID-19 work.
3. Suspected COVID-19 cases are identified accurately and quickly, which plays an important role in timely quarantine and medical treatment.
4. The approach is shown to be robust on mixed data sets.
5. The proposed method is independent of the data set and is applicable to different datasets.
6. More successful results than other studies in the literature are shown to be obtained when segmented images are used.
7. A GUI accessible on GitHub has been created for medical professionals and for people who want to work in this field.

The rest of the study is organized as follows: Sect. 2 provides a summary of the literature, Sect. 3 describes the data sets and the method, Sect. 4 presents the experimental results, Sect. 5 gives the discussion, and Sect. 6 gives the conclusion.

2 Related Work

Since COVID-19 emerged, numerous academic studies have been conducted using radiological images such as CT and X-ray. The rate of detecting COVID-19 from CT images is higher than from X-ray images [13]. X-ray images can be misleading for ground-glass opacities [14]; in other words, ground-glass opacities that are not visible in an X-ray image can be seen with CT. Some of the studies in the literature were performed on CT images [15,16,17,18,19], some on X-ray images [20,21,22,23,24], and some on both [25,26,27]. In our experimental studies, we use two different CT data sets and a mixed data set formed by combining them.

There are many studies in the literature using CT datasets. Wang et al. redesigned the COVID-Net deep learning architecture and applied it to two different CT datasets [28]. As the most important feature of their study, they stated that they reduced data heterogeneity by normalizing each data set separately. They achieved 90.83% and 78.69% accuracy, respectively. Jaiswal et al. aimed to diagnose COVID-19 from CT images with a pre-trained DenseNet201-based CNN deep learning model [29]. In that study, which uses the same data set as ours, the testing accuracy is 96.25%. Yazdani et al. built a ResNet56 architecture for COVID detection on a residual attention network model [19] and obtained an accuracy of 92%. Silva et al. used two different data sets and achieved an 87.68% accuracy rate with the EfficientCovidNet deep learning architecture [17]. Unlike other studies, they also performed a cross-dataset evaluation, in which different data sets were used for the training and testing stages. The accuracy obtained in this way is 56.16%.

Details of further studies that do not use the same data set as our study but operate on CT and X-ray images are shown in Table 1.

Table 1 Some of the studies in the literature about COVID-19


The main motivation of this study is to propose a robust and rapid system capable of diagnosing COVID-19 from CT images. A generalizable and rapid approach is needed to provide support, especially to medical professionals. With this motivation, the studies in the literature were examined and their strengths and weaknesses were investigated. This evaluation found no study in the literature that used segmented images. In addition, mixed data sets, obtained by using different data sets together, were not found to be used. In light of these observations, we have attempted to put forward a general method. The application, which was created and made available on GitHub, is arguably the most important contribution of the study to this field. By selecting any CT image via the GUI, a result can be obtained in less than 1 min, with a very high accuracy rate.

3 Materials and Methods

Computed tomography (CT) images were used for the experimental studies [15, 18, 38]. The first dataset contains 1252 COVID-19 (+) images and 1230 COVID-19 (−) images [15, 18]. It is one of the largest CT COVID-19 datasets available to the public. This data set belongs to 60 patients in Brazil, 30 of whom are COVID-19 (+) and 30 of whom are COVID-19 (−). The top two rows in Fig. 1 show COVID-19 (+) images, and the bottom two rows show COVID-19 (−) images.

The second dataset used in our study includes 349 COVID-19 (+) images and 397 COVID-19 (−) images. In this dataset, 37% of those with COVID-19 (+) are female and 63% are male. Sample images of this dataset are shown in Fig. 2. The top row in Fig. 2 shows COVID-19 (+) images, and the bottom row shows COVID-19 (−) images.

3.1 Preprocessing and Segmentation

The CT images in the study are in PNG format, and their dimensions range from 182 × 129 to 534 × 341. For this reason, resizing was applied to the data set as the first preprocessing step. The nearest-neighbor method, which is the default method of the resize operation used, was applied. All images were resized to 320 × 256, which was judged to be the most suitable size for all images based on the average height and width. All images were then converted to gray levels. Gray-level images are easier and less costly to process because a single value represents each pixel. After this conversion, image sharpening and contrast enhancement were applied, and then the segmentation step was started. A Gaussian Mixture Model based on the Expectation–Maximization (EM) algorithm was used for segmentation. Figure 3 shows the pseudocode of the EM algorithm. The pixels of the image to be segmented are denoted by P, and K indicates the number of clusters. First, the initial values of the parameters are determined with the K-means clustering method. The expectation and maximization steps (E and M) are then performed repeatedly to determine the parameters of the probability distribution. As can be seen in Fig. 3, the probability estimation is performed in step E, while the parameters of the probability distribution are updated in step M.
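The E and M steps described above can be sketched for scalar gray levels as follows. This is a minimal illustration written in Python rather than the study's MATLAB code, and it is not the author's implementation: the function names, the quantile seeding of K-means, the variance floor, and the iteration counts are all assumptions made for the example. The default K = 3 is the number of clusters the study reports using.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def init_from_kmeans(pixels, k, iters=20):
    """K-means on gray levels; centers are seeded at evenly spaced quantiles (an assumption)."""
    ordered = sorted(pixels)
    n = len(ordered)
    centers = [float(ordered[int((2 * j + 1) * n / (2 * k))]) for j in range(k)]

    def nearest(p):
        return min(range(k), key=lambda j: abs(p - centers[j]))

    for _ in range(iters):
        labels = [nearest(p) for p in pixels]
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, [nearest(p) for p in pixels]

def gmm_em_segment(pixels, k=3, iters=50, var_floor=1e-3):
    """Fit a K-component Gaussian mixture with EM and label each pixel by its most likely component."""
    means, labels = init_from_kmeans(pixels, k)
    n = len(pixels)
    weights, variances = [], []
    for j in range(k):
        members = [p for p, lab in zip(pixels, labels) if lab == j]
        weights.append(len(members) / n)
        spread = members or pixels  # fall back to all pixels if a cluster is empty
        variances.append(max(sum((p - means[j]) ** 2 for p in spread) / len(spread), var_floor))
    for _ in range(iters):
        # E step: posterior probability (responsibility) of each component for each pixel.
        resp = []
        for p in pixels:
            dens = [weights[j] * normal_pdf(p, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            if total == 0.0:  # numerical underflow: assign to the nearest mean
                best = min(range(k), key=lambda j: abs(p - means[j]))
                resp.append([1.0 if j == best else 0.0 for j in range(k)])
            else:
                resp.append([d / total for d in dens])
        # M step: re-estimate mean, variance, and weight of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue
            means[j] = sum(r[j] * p for r, p in zip(resp, pixels)) / nj
            var = sum(r[j] * (p - means[j]) ** 2 for r, p in zip(resp, pixels)) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj / n
    return [max(range(k), key=lambda j: r[j]) for r in resp], means

# Toy "image" flattened to a list of gray levels with three intensity groups.
toy_pixels = [10, 12, 11, 9, 120, 118, 122, 121, 240, 238, 242, 241]
toy_labels, toy_means = gmm_em_segment(toy_pixels, k=3)
```

For a real image, the gray levels would be flattened into `pixels` and the returned labels reshaped back to the image dimensions to obtain the segmented image.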

In our study, segmentation was tried with different numbers of clusters, and the most suitable number of clusters was found to be 3. For this reason, the number of clusters was set to 3 in the segmentation process. A Wiener filter was used in the noise reduction stage [39]. This filter is generally a linear smoothing filter used in the frequency domain. The Wiener filter increases the blurring level of images. The Wiener filter is formulated as follows:

$$W\left(u, v\right)= \frac{H(u, v)}{{\left|H(u, v)\right|}^{2}+{S}_{nx}(u, v)}$$

(1)

${S}_{nx}(u, v)$ represents the signal-to-noise ratio (SNR), and $H(u, v)$ represents the sinc function of the target pixel. The segmentation and noise removal stages on two sample images are shown in Fig. 4. The figure shows the effect of three Wiener filter neighborhood sizes, [3 3], [5 5], and [7 7], on the image. The experimental studies showed that a window size of [3 3] leads to the best result, so [3 3] was used as the window size in our study. As the neighborhood size increases, so does the blurring level of the image. The same figure also shows that the ground-glass opacity in the COVID-19 (+) image becomes more pronounced as a result of segmentation.

The flow chart showing all the operations we have performed in the preprocessing and segmentation stage is shown in Fig. 5.

3.2 Feature Extraction and Classification

In this work, the Histogram of Oriented Gradients (HOG), Gray-Level Co-Occurrence Matrix (GLCM), and Local Binary Pattern (LBP) methods were used for feature extraction. Since HOG gave the most successful results, only the results obtained with this method are reported.

For feature extraction with HOG [40], horizontal and vertical Sobel filters are applied to obtain the edge responses G_x and G_y. Then, the gradient magnitude G and the gradient orientation angle α are calculated. These operations can be seen in Fig. 6.

$$\begin{aligned} G_{x} &= I*\left[ { - 1\, 0\, 1} \right],\quad G_{y} = I*\left[ { - 1\, 0\, 1} \right]^{T} ,\\ G &= \sqrt {G_{x}^{2} + G_{y}^{2} } ,\quad \alpha = \arctan \frac{{G_{y} }}{{G_{x} }} \end{aligned}$$

(2)
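As a small worked illustration of Eq. (2), the sketch below computes G_x, G_y, the magnitude G, and the angle α on a tiny gray-level image stored as a list of rows. Several choices here are assumptions for the example rather than details from the study: the mask is applied as a centered difference (right minus left, below minus above), border pixels are left at zero, `atan2` replaces the plain arctangent to avoid division by zero when G_x = 0, and angles are returned in degrees. The function name is hypothetical.

```python
import math

def gradient_magnitude_and_angle(image):
    """Apply the [-1, 0, 1] mask horizontally and vertically, then combine as in Eq. (2)."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    angle = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal difference
            gy = image[r + 1][c] - image[r - 1][c]  # vertical difference
            magnitude[r][c] = math.hypot(gx, gy)    # sqrt(gx^2 + gy^2)
            angle[r][c] = math.degrees(math.atan2(gy, gx))
    return magnitude, angle

# A vertical edge: intensity jumps from 0 to 10 between the second and third columns.
vertical_edge = [[0, 0, 10, 10]] * 3
mag_v, ang_v = gradient_magnitude_and_angle(vertical_edge)

# A horizontal edge: intensity jumps from 0 to 10 on the last row.
horizontal_edge = [[0, 0, 0], [0, 0, 0], [10, 10, 10]]
mag_h, ang_h = gradient_magnitude_and_angle(horizontal_edge)
```

In a full HOG descriptor these per-pixel magnitudes and angles would next be accumulated into orientation histograms over cells; that step is omitted here.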

The experimental studies showed that the cell size giving the highest success for HOG is 32. With the HOG method, the cell size must be increased to capture large-scale spatial information, but a larger cell size may lose small-scale details. In other words, there is no single optimal cell size for HOG; the best cell size depends on the data and can be found experimentally. The classification stage of the study consists of training on the extracted features and performing the classification. At this stage, the K Nearest Neighbor (kNN), ensemble kNN, ensemble subspace discriminant analysis, Kernel Support Vector Machines (k-SVM), decision tree, and kernel naive Bayes methods were used. The parameters of these methods are given in Table 2.

Table 2 Parameters of the methods


kNN is a nonparametric classification method [41]. A new sample is assigned to the class that is most common among its closest neighbors [42]. The number k specifies how many nearest neighbors are considered. The Minkowski distance, a generalization of the Euclidean and Manhattan distance measures, was used as the distance to the neighbors. With p = 1 it is the Manhattan distance, and with p = 2 it is the Euclidean distance. In our study p = 2 was used; in other words, the Euclidean version of the Minkowski distance was used. The k value used in the kNN method was determined experimentally. The system was tested with k values of 1, 3, 5, and 7, and the most successful results were obtained for k = 1.
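The distance measure and voting rule described above can be illustrated with a short sketch. The function names and the toy two-dimensional feature vectors are hypothetical; in the study the inputs would be HOG feature vectors. The defaults k = 1 and p = 2 follow the values reported above.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_x, train_y, query, k=1, p=2):
    """Return the majority label among the k training samples closest to the query."""
    ranked = sorted(range(len(train_x)), key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors with illustrative labels.
train_x = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["negative", "negative", "positive", "positive"]
prediction = knn_predict(train_x, train_y, (0.2, 0.1), k=1)
```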

Ensemble methods combine weak classifiers to obtain a strong classifier [43, 44]. Ensemble kNN is one of these methods. On high-dimensional data, kNN is heavily affected by noise. To improve the performance of the nearest neighbor classifier, an ensemble method has been proposed in which each classifier of the ensemble can access only a random subset of the features [45]. Another ensemble method is ensemble subspace discriminant analysis, which addresses the weaknesses of Linear Discriminant Analysis and yields a stronger method. One shortcoming of the random subspace method (RSM) is that weak discrimination capability may arise from the random selection of subsets. Majority voting (MV) is used to address this weakness of RSM. Normally, a single classifier in the ensemble can use only a small portion of the features in the feature space, but with this method each classifier can classify any new unknown sample [46, 47].

A decision tree, another method used in our study, is a classification model in the form of a tree structure consisting of decision nodes and leaf nodes defined by features and targets. A decision tree is built by repeatedly splitting the data set into smaller pieces. The naive Bayes classifier is a long-established method based on probability theory. A naive Bayes classifier can also be considered a Bayesian network in which the attributes are conditionally independent of each other and the concept to be learned is conditionally dependent on all of these attributes [48].

SVM, the classifier that gives this study its name, is the last classifier used. SVM is widely used in medical image processing applications. The main purpose of the SVM algorithm is to find a hyperplane that separates the data [49]. The standard SVM algorithm was created for binary classification problems, although today SVM is also frequently used for multi-class problems. Support vector machine algorithms use kernel functions [50], which map data to a different feature space [51]. The most commonly used kernel functions in the literature are the Gaussian, Radial Basis, Sigmoid, and Polynomial functions [52]. By experimenting with all of these kernel functions, we determined that the polynomial kernel gives the best result. For polynomials of degree d, the polynomial kernel is defined as:

$$K\left(x,y\right)={({x}^{T}y+c)}^{d}$$

(3)

where x and y are feature vectors computed from training or test samples in the input space, and c ≥ 0 is a free parameter controlling the influence of higher-order versus lower-order terms in the polynomial. When c = 0, the kernel is called homogeneous [3]. d indicates the degree of the polynomial. In our study d was chosen as 3, so the kernel is also called a cubic kernel.
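Equation (3) can be evaluated directly, as in the following sketch. The function name and the toy vectors are hypothetical, and the value c = 1 is an assumption for the example because the study does not state the value of c; d = 3 corresponds to the cubic kernel used in the study.

```python
def polynomial_kernel(x, y, c=1.0, d=3):
    """K(x, y) = (x^T y + c)^d from Eq. (3); d = 3 gives the cubic kernel."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** d

def gram_matrix(samples, c=1.0, d=3):
    """Kernel value for every pair of samples, as a kernel SVM would need."""
    return [[polynomial_kernel(a, b, c, d) for b in samples] for a in samples]

# x^T y = 1*3 + 2*4 = 11, so the inhomogeneous cubic kernel is (11 + 1)^3.
value = polynomial_kernel((1.0, 2.0), (3.0, 4.0), c=1.0, d=3)
gram = gram_matrix([(1.0, 2.0), (3.0, 4.0), (0.0, 1.0)])
```

With c = 0 the same call gives the homogeneous kernel, which for these vectors is 11 cubed.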

In the classification phase of our study, tenfold cross-validation (CV) was used to determine the training and testing data. In this method, 10% of the data is reserved for testing and 90% for training in each iteration, and the process is repeated 10 times. Changing the 10% test portion in each iteration helps prevent the model from memorizing the data.
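The splitting scheme can be sketched as follows, with the per-fold scores averaged to give the final rate. This is an illustration only: the function names, the shuffling seed, and the `score_fold` callback (which would train a classifier on the training indices and return its accuracy on the test indices) are assumptions, not part of the study's MATLAB code.

```python
import random

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists; each sample is in exactly one test fold."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validate(n_samples, score_fold, n_folds=10, seed=0):
    """Average the score returned by score_fold(train, test) over all folds."""
    scores = [score_fold(train, test) for train, test in kfold_indices(n_samples, n_folds, seed)]
    return sum(scores) / len(scores)

# Dataset-1 has 1252 + 1230 = 2482 images, so each test fold holds 248 or 249 of them.
splits = list(kfold_indices(2482, n_folds=10))
```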

The final success rate is obtained by averaging the success rates of the 10 iterations. Figure 7 shows the general flow diagram of our study. In the figure, all process steps are shown one by one on a COVID-19 (+) image selected from the data set, which gives an idea of the effect of each step.

The pseudocode of the proposed method can be seen in Fig. 8. The classification results are obtained by applying this pseudocode to the datasets used in the study.

4 Experimental Results

In this study, a computer with a GT730 4 GB video card, 16 GB of memory, and an i5 processor was used. The MATLAB platform was used for coding. Accuracy, Recall, Specificity, Precision, Negative Predictive Value (NPV), Matthews Correlation Coefficient (MCC), and AUC values were used for evaluation. Accuracy is the ratio of correct predictions to total predictions, as seen in Eq. (4). This metric is suitable when the data set is symmetrical, i.e., when the positive and negative classes are balanced [53]. Another metric is recall, also known as sensitivity. This metric is preferred when identifying positive cases is crucial, for example when detecting whether a fatal disease is present. Specificity is the ratio of true negatives to total negatives within the data set [54]. This metric is preferred when it is important to detect all true negatives; for example, if people with positive test results would be penalized, this metric is more meaningful. Precision is the ratio of true positives to total predicted positives and is preferred when false positives are riskier. NPV, unlike precision, is preferred when false negatives are riskier [55]. The MCC is used in machine learning as a measure of the quality of binary classification [56]. It is generally considered a balanced metric that can be used even when the classes are of very different sizes [57]. MCC takes values between −1 and 1. A coefficient of 1 indicates a perfect prediction, 0 is no better than a random guess, and −1 indicates total disagreement between the prediction and the observation. In addition to these metrics, ROC curves are given for the methods that produce the most successful results. AUC values are the performance indicators of the ROC curves shown in Fig. 10. It has also been stated that AUC is a more sensitive performance measure than accuracy [58].

These performance metrics are shown in (4)–(9). TP, TN, FP, and FN mean true positive, true negative, false positive, and false negative, respectively.

$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{(\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN})}*100$$

(4)

$$\mathrm{Recall}=\frac{\mathrm{TP}}{(\mathrm{TP}+\mathrm{FN})}*100$$

(5)

$$\mathrm{Specificity}=\frac{\mathrm{TN}}{(\mathrm{TN}+\mathrm{FP})}*100$$

(6)

$$\mathrm{Precision}=\frac{\mathrm{TP}}{(\mathrm{TP}+\mathrm{FP})}*100$$

(7)

$$\mathrm{NPV}=\frac{\mathrm{TN}}{(\mathrm{TN}+\mathrm{FN})}*100$$

(8)

$$\mathrm{MCC}=\frac{\mathrm{TP}*\mathrm{TN}-\mathrm{FP}*\mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}$$

(9)
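Equations (4)–(9) can be computed from the four confusion-matrix counts as in the sketch below. The function names are hypothetical, and returning 0 when a denominator is zero is an assumption for the example. The sample counts are derived from the figures reported in this paper for kernel SVM on Dataset-1 (1252 positive and 1230 negative images, with 17 positives and 21 negatives misclassified), so they should be read as a reconstruction rather than as values copied from Table 3.

```python
import math

def _ratio(numerator, denominator):
    """Safe division; returns 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, recall, specificity, precision, and NPV in percent, plus MCC, per Eqs. (4)-(9)."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": 100.0 * _ratio(tp + tn, tp + tn + fp + fn),
        "recall": 100.0 * _ratio(tp, tp + fn),
        "specificity": 100.0 * _ratio(tn, tn + fp),
        "precision": 100.0 * _ratio(tp, tp + fp),
        "npv": 100.0 * _ratio(tn, tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }

# TP = 1252 - 17, FN = 17, TN = 1230 - 21, FP = 21.
metrics = binary_metrics(tp=1235, tn=1209, fp=21, fn=17)
# For these counts the accuracy is about 98.5% and the MCC is about 0.97.
```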

Table 3 gives the results of the four most successful of the six classification methods used. The table shows that the kernel SVM method, which misdiagnosed 17 of the positive samples and 21 of the negative samples, produced the best results. While this method has 38 incorrect detections in total, the kNN method has 43 and the ensemble subspace kNN method has 44.

Table 3 Confusion matrices obtained for the four most successful methods for Dataset-1


Table 4 shows the confusion matrices of the most successful classification methods for Dataset 2. Here, as in Dataset 1, the kernel SVM method produced the most successful result: 102 samples were classified incorrectly and 644 samples were classified correctly.

Table 4 Confusion matrices obtained for the four most successful methods for Dataset-2


Unlike many studies in this area, our study also uses a mixed data set, created by combining the samples from the two data sets. The confusion matrices of the results obtained for the mixed data set are shown in Table 5. The highest success was obtained with the kernel SVM method, with an accuracy of 94.5%, followed by the subspace kNN method with 94.1%. These success rates are remarkably good.

Table 5 Confusion matrices obtained for the four most successful methods for Mixed Dataset


Figure 9 shows that some positive images are classified as negative and some negative images are classified as positive. Images b and c, which are negative, show various anomalies in the lungs. These people may have had other ailments before or may have habits that adversely affect their lungs (for example, smoking), and the structure of these lung images is similar to that of patients with COVID-19. On the other hand, images a and d, which are positive, are classified as negative by the system. In image d, no anomaly is yet clearly visible, which may indicate an early stage of the disease. The system therefore also predicted this image incorrectly.

Table 6 shows the values of the six metrics described above, each of which is more meaningful for a different purpose of use. The table shows that the k-SVM method produces higher accuracy values than the other methods. However, it is also noteworthy that the precision results of the kNN and ensemble kNN methods are higher than those of k-SVM. The naive Bayes and decision tree methods produce very low results compared to the other methods. The last column of Table 6 gives the number of incorrectly classified samples. The kNN, k-SVM, and ensemble kNN methods produced 43, 38, and 44 erroneous results, respectively. In terms of MCC, the k-SVM and kNN methods give the best results, with values of 0.97.

Table 6 Classification results according to different metrics for Dataset 1


The results of the six metrics obtained for Dataset 2 are shown in Table 7. As in the previous table, the most successful results are obtained with the k-SVM method. The table also shows that the MCC value of the k-SVM method is much higher than that of the other methods.

Table 7 Classification results according to different metrics for Dataset 2


The results for the mixed data set are shown in Table 8. On the mixed data, the highest rate of correct classification was obtained with the k-SVM method, with 94.5% accuracy and an MCC value of 0.89.

Table 8 Classification results according to different metrics for mixed dataset


As stated above, ROC curves are an effective tool for performance measurement. Figure 10 shows the ROC curves of the four methods that produce the most successful results. In terms of the AUC values of these curves, k-SVM is the most successful method, with an AUC value of 1.00 for dataset 1 (Fig. 10).

As seen in Fig. 11, the highest AUC value for the second data set, 0.91, was obtained with the k-SVM method. The next most successful method is subspace kNN.

In Table 6, it is seen that the accuracy value of the kNN method is higher than the ensemble methods. However, when looking at Fig. 10, it is striking that the AUC values of the ensemble methods are higher than the kNN method. From this, it can be inferred that making a direct performance comparison based on accuracy is not always appropriate.

The model built in our study was saved, and a simple GUI was created using this model, as shown in Fig. 12. As stated before, we aim to support medical professionals, and the main purpose of this GUI is to provide an application that can help experts in their decisions. In practice, all that medical professionals need to do is click Load an Image for COVID-19 Diagnosis and select one of the CT images on their computer.

The selected image is then tested with the trained model, and the result predicted by the system is written in the Text Area section at the bottom of the GUI. A segmented version of the image is also shown to the right of the original image. This segmented image is expected to contribute to the evaluation of the patient’s CT image. If the CT images sent from the tomography devices are synchronized to the radiologist’s computer or to the hospital information management system software, the COVID-19 diagnosis result produced by our system can be displayed automatically on the radiologist’s screen.

5 Discussion

The main aim of this study is to support medical professionals in their decisions, and the results obtained support this goal. With the simple GUI developed for medical professionals, COVID-19 can be diagnosed from CT images with 98.5%, 86.3%, and 94.5% accuracy for Dataset-1, Dataset-2, and the Mixed Dataset, respectively. Performing the classification on images obtained by segmentation appears to have a positive effect on the results.

Table 9 shows the methods and success rates of the studies on COVID-19 in the literature that use the same data set as ours. The table shows that the results of our study are higher than those obtained in the other studies, which highlights the originality and effectiveness of our study.

Table 9 Comparison of studies in the literature


Training and prediction times are shown in Table 10 to give a better idea of the methods used in our study. The kernel SVM method achieves the highest accuracy rate, followed most closely by the kNN and subspace kNN methods. However, high accuracy does not by itself mean that a method is faster than the others. In the training phase, the kernel SVM method produces results faster than the other classifiers used. In predicting a new sample, the kernel naive Bayes and ensemble kNN methods are the fastest, while decision trees take the longest. Nevertheless, the prediction time of the k-SVM method, which has the highest accuracy, is also short enough to support medical professionals in making quick decisions.

Table 10 Training and estimation times of the classification methods used


6 Conclusion

The method we propose in this study, which we have carried out, is to ensure the detection of the COVID-19 virus in CT images. A CT image given to the proposed method can be classified as COVID-19 positive or negative. An effective model that enables the detection of the COVID-19 virus from CT images on segmented images is proposed. Two different data sets were used in the study. In addition, a mixed data set was created by combining these two different datasets. The datasets used are the most comprehensive and publicly accessible. In our study, a model was created by using many methods of image processing and artificial learning methods. A tenfold cross-validation method was applied to test the accuracy of the model. According to the model results, the highest success value was obtained for dataset 1 using the kernel SVM method with an accuracy of 98.5%. For data set 2, this rate was determined as 86.3%. In the mixed data set, a high accuracy rate of 94.5% was obtained. Matthews Correlation Coefficient (MCC) metric was also used in our study and 0.97, 0.73, 0.89 MCC values were obtained for Dataset 1, Dataset 2, Mixed Dataset, respectively. This shows the effectiveness of the results obtained. Also, a user-friendly GUI was created from the created model. Thanks to this GUI, medical professionals will be able to perform the segmentation of CT images and will also receive an answer to the question of whether COVID is positive or negative by the system.

Considering the results obtained, the proposed system achieves higher accuracy than the other studies in the literature with which it was compared. Accurate and rapid detection of suspected COVID-19 cases, and thus timely quarantine and medical treatment, is an important process. The main purpose of our study is to contribute to this process, and the resulting GUI makes that contribution.

The datasets used in our study contain no detailed clinical data, which is a limitation of our work. With adequate demographic and clinical information about the patients, many different analyses could be performed. For example, if data on a patient's disease history or chronic diseases were available, the interactions of COVID-19 with other diseases could be evaluated. We plan to create a dataset suitable for this purpose and to carry out these analyses in future studies. Another limitation is that the CT images in the study carry no information on which day of the disease they belong to. If such information were available, the daily course of the disease could also be analyzed with computer-aided studies on CT images. In addition, future goals include expanding the scope of this study to different imaging methods (X-ray, ultrasound, MR, etc.), so that the application can produce results for a wider range of image types.

Code availability

As a result of this study, a GUI that can automatically detect COVID-19 has been created. This GUI can be accessed at https://github.com/asaygili/COVID_Detection.

Acknowledgements

This work was supported by the Research Fund of Tekirdağ Namık Kemal University (Project Number: NKUBAP.06.GA.21.317).

Author information

Authors and Affiliations

Computer Engineering Department, Tekirdağ Namık Kemal University, Silahtarağa Mahallesi Üniversite 1.Sokak, No:13, 59860, Çorlu, Tekirdağ, Turkey
Ahmet Saygılı

Corresponding author

Correspondence to Ahmet Saygılı.

About this article

Cite this article

Saygılı, A. Computer-Aided Detection of COVID-19 from CT Images Based on Gaussian Mixture Model and Kernel Support Vector Machines Classifier. Arab J Sci Eng 47, 2435–2453 (2022). https://doi.org/10.1007/s13369-021-06240-z

Received: 02 June 2021
Accepted: 20 September 2021
Published: 07 October 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s13369-021-06240-z

1 Introduction

2 Related Work