Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was initially identified in China in 2019 and subsequently spread worldwide, becoming a pandemic [1]. The respiratory system is the most frequently involved organ system in COVID-19 [2]. However, COVID-19 is also known to cause multiorgan injury, including thromboembolic, neurologic, cardiac, renal, hepatic, gastrointestinal, endocrine, and dermatological manifestations, through various pathological mechanisms related to immunity, inflammation, and fibrosis [3, 4]. Computed tomography (CT) plays a crucial role in assessing the severity and extent of lung involvement in COVID-19 [5], whereas the standard confirmatory test for SARS-CoV-2 infection is the reverse-transcription polymerase chain reaction (RT-PCR) assay [6]. Characteristic imaging findings of acute lung involvement in COVID-19 include parenchymal or ground-glass opacities with a peripheral, lower-lung predominant distribution, crazy-paving appearance, reversed-halo appearance, and subpleural sparing [7,8,9,10,11,12]. Subpleural sparing, together with Hampton’s hump and triangular wedge-shaped opacities, may suggest the presence of the underlying coagulopathies commonly seen in patients with COVID-19, and scrutiny of these findings could improve the diagnostic accuracy for COVID-19 infection (Fig. 1) [12]. Interstitial fibrosis is observed as a late sequela of COVID-19 infection [13].

Fig. 1

Chest CT images of COVID-19 in a 65-year-old man (“severe” clinical severity) (a, b). Subpleural ground-glass opacity is surrounded by dense linear opacity, corresponding to the “reversed CT halo sign” (a, white arrows). Subpleural sparing of the ground-glass opacities is noted (subpleural sign: black arrows) [12]. One reader rated a score of 9 in total severity score (TSS), 17 in chest CT score (CCTS), and 22 in CT severity score (CTSS), and the other reader rated a score of 12 in TSS, 16 in CCTS, and 21 in CTSS. Chest CT images of COVID-19 in a 25-year-old man (“mild” clinical severity) (c–f). A round ground-glass nodule is observed in the peripheral area of the right upper lobe (a arrow), and several solid nodules are seen in the peripheral lung of the left upper and lower lobes (b–d arrows). Both readers rated a score of 1 for the right upper, left upper, and left lower lobes in TSS and CCTS. In CTSS, one reader rated a score of 1 for the right and left upper lobes and 2 for the left lower lobe, whereas the other reader rated a score of 1 for the right upper, left upper, and left lower lobes. This difference arose because one of the lesions in the left lower lobe was located at the border of segments 6 and 10 (d arrow), and one reader assigned a score of 1 to each of the two regions.

To standardize the subjective assessment of the degree of acute COVID-19 lung involvement, several semiquantitative CT severity scoring systems have been proposed. The total severity score (TSS), proposed by Li [14], has five severity grades (0%, 1–25%, 26–50%, 51–75%, and 76–100% involvement) for each of the five lung lobes. The chest CT score (CCTS), proposed by Li [15], is identical to the scoring system developed for severe acute respiratory syndrome [16] and requires readers to rate severity on six grades (0%, < 5%, 5–25%, 26–49%, 50–75%, and > 75% involvement) for each of the five lung lobes. The CT severity score (CTSS), proposed by Yang [17], has three severity grades (no involvement, < 50% involvement, and ≥ 50% involvement) for each of 20 lung regions. Therefore, compared with TSS or CCTS, CTSS uses a simpler severity grading but, as a trade-off, requires assessment of more finely divided lung regions. Each semiquantitative scoring system likely has its own advantages and disadvantages; however, no study has compared the accuracy and efficacy of these different methods.
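Because each system sums region-level grades, the maximum patient-level score follows directly from the number of regions and the top grade. As a worked check for the original definitions (here $S_{\max}$ is our own notation for the maximum attainable total; the modified 19-region CTSS used in this study gives 19 × 2 = 38 instead):

$$S_{\max}^{\mathrm{TSS}} = 5 \times 4 = 20, \qquad S_{\max}^{\mathrm{CCTS}} = 5 \times 5 = 25, \qquad S_{\max}^{\mathrm{CTSS}} = 20 \times 2 = 40$$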

The purpose of this study is to compare the clinical usefulness of the three different semiquantitative CT severity scoring systems. We evaluate their interobserver agreement, time required for evaluation, and degree of correlation with the clinical severity as well as the computer-calculated quantitative CT severity of the lung involvement.

Materials and methods

Patients

This retrospective study was approved by our institution (National Hospital Organization Nishisaitama-Chuo National Hospital), and the requirement for written informed consent was waived. We enrolled 108 patients diagnosed with COVID-19 infection by RT-PCR of respiratory tract specimens at our single institution from March 2020 to October 2020. We excluded patients with a history of lung surgery. A pulmonologist (Y.H.) abstracted patient age, sex, body weight, height, body mass index, duration from initial symptoms to CT examination, and clinical severity at admission. Clinical severity at admission was classified into two grades, mild and severe; the severe grade was defined as a percutaneous oxygen saturation (SpO2) < 93% or a need for oxygen inhalation. Each patient’s risk of developing critical illness was assessed and categorized into three groups (low, moderate, and high) using a predictive scoring system reported by Liang et al. [18].

CT acquisition and reconstruction

Among the 108 enrolled patients, 93 were scanned at our hospital using a 64-row detector CT scanner (Aquilion 64, Canon Medical Systems). Scans were performed during an inspiratory breath hold with the following parameters: 512 × 512 matrix, 250–370 mm field of view, and 120 kVp. Lung-setting images were reconstructed with a slice thickness of 5 mm using the FC52 kernel. Fifteen patients were scanned outside our hospital using various CT scanners and different parameters; for these patients, lung-setting images were reconstructed with a slice thickness of 5 mm (n = 14) or 3 mm (n = 1).

Semiquantitative scoring system

Table 1 summarizes the three CT-based semiquantitative scoring systems (TSS, CCTS, and CTSS) assessed in this study. TSS and CCTS were scored using the original methods proposed in previous articles [14, 15], with scores rated for each of the five pulmonary lobes. For TSS, scores of 0, 1, 2, 3, and 4 were assigned if parenchymal opacification involved 0%, 1–25%, 26–50%, 51–75%, or 76–100% of the lobe, respectively. For CCTS, scores of 0, 1, 2, 3, 4, and 5 were assigned if parenchymal opacification involved 0%, < 5%, 5–25%, 26–49%, 50–75%, or > 75% of the lobe, respectively (Table 1).
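As an illustration only, the lobe-level TSS and CCTS grading rules above can be written as two lookup functions. This is a sketch under stated assumptions, not the authors’ software: the function names are hypothetical, involvement is assumed to be supplied as a percentage of each lobe, and the handling of non-integer percentages at band edges (e.g., 25.5%) is our assumption, because the published bands are written for whole percentages.

```python
def tss_grade(percent: float) -> int:
    """TSS lobe grade (0-4); edge handling for fractional percents is assumed."""
    if not 0 <= percent <= 100:
        raise ValueError("percent involvement must be between 0 and 100")
    if percent == 0:
        return 0
    if percent <= 25:
        return 1  # 1-25%
    if percent <= 50:
        return 2  # 26-50%
    if percent <= 75:
        return 3  # 51-75%
    return 4      # 76-100%


def ccts_grade(percent: float) -> int:
    """CCTS lobe grade (0-5); adds a separate band for < 5% involvement."""
    if not 0 <= percent <= 100:
        raise ValueError("percent involvement must be between 0 and 100")
    if percent == 0:
        return 0
    if percent < 5:
        return 1  # < 5%
    if percent <= 25:
        return 2  # 5-25%
    if percent < 50:
        return 3  # 26-49%
    if percent <= 75:
        return 4  # 50-75%
    return 5      # > 75%


def patient_total(lobe_percents, grade_fn) -> int:
    """Hypothetical helper: sum the lobe grades over the five lobes."""
    if len(lobe_percents) != 5:
        raise ValueError("expected one value per lobe (5 lobes)")
    return sum(grade_fn(p) for p in lobe_percents)


# Hypothetical involvement (%) for RUL, RML, RLL, LUL, LLL.
example = [3, 20, 40, 0, 80]
print(patient_total(example, tss_grade), patient_total(example, ccts_grade))  # prints 8 11
```

With these made-up inputs, the 3% and 20% lobes both receive TSS grade 1, whereas CCTS separates them into grades 1 and 2; this is the effect of the 5% threshold that distinguishes the two systems.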

Table 1 Summary of the three evaluated semiquantitative scoring systems in this study

We made a minor modification to CTSS. The original CTSS assigns one of three severity grades (0, no involvement; 1, < 50% involvement; 2, ≥ 50% involvement) to each of 20 lung regions. To make the two lungs symmetrical, the original CTSS subdivided both the left apico-posterior segment (S1 + 2) and the left anterior basal segment (S8) into two regions [17]. However, when scoring COVID-19 lung involvement, we encountered cases in which the border between the two regions subdivided from the left S8 was difficult to define. Therefore, we modified this scoring system to use 19 instead of 20 regions, comprising 10 right-lung segments and 9 left-lung segments, by subdividing only the left S1 + 2 into S1 and S2 (Table 1 and Fig. 1).
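To make the modified region layout concrete, the sketch below encodes one plausible mapping of the 19 regions to lobes and derives the per-lobe and patient-level maxima. The segment names, dictionary layout, and helper functions are illustrative assumptions (standard segmental anatomy, with the left S1 + 2 split and the left S8 kept whole), not the authors’ code.

```python
# Hypothetical encoding of the modified 19-region CTSS (10 right + 9 left).
MODIFIED_CTSS_REGIONS = {
    "RUL": ["S1", "S2", "S3"],
    "RML": ["S4", "S5"],
    "RLL": ["S6", "S7", "S8", "S9", "S10"],
    "LUL": ["S1", "S2", "S3", "S4", "S5"],  # left S1+2 split into S1 and S2
    "LLL": ["S6", "S8", "S9", "S10"],       # left S8 kept as a single region
}


def ctss_grade(percent: float) -> int:
    """CTSS region grade: 0 = none, 1 = < 50%, 2 = >= 50% involvement."""
    if not 0 <= percent <= 100:
        raise ValueError("percent involvement must be between 0 and 100")
    if percent == 0:
        return 0
    return 1 if percent < 50 else 2


def lobe_maxima(regions=MODIFIED_CTSS_REGIONS) -> dict:
    """Maximum CTSS per lobe: 2 points times the number of regions in the lobe."""
    return {lobe: 2 * len(segments) for lobe, segments in regions.items()}


def ctss_lobe_scores(region_percents: dict, regions=MODIFIED_CTSS_REGIONS) -> dict:
    """Sum region grades per lobe; region_percents maps (lobe, segment) -> percent.

    Regions missing from region_percents are assumed to be uninvolved (0%).
    """
    return {
        lobe: sum(ctss_grade(region_percents.get((lobe, seg), 0.0)) for seg in segments)
        for lobe, segments in regions.items()
    }


print(lobe_maxima(), sum(lobe_maxima().values()))
```

Under this mapping, the per-lobe maxima are 6/4/10/10/8 for RUL/RML/RLL/LUL/LLL, totaling 38, which matches the 38-point denominator used for the modified CTSS in Fig. 2.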

Reading session

The semiquantitative scores of the three different systems were rated independently by two board-certified radiologists (with 7 and 12 years of experience in thoracic radiology, respectively) in three different sessions. The readers knew only that all patients were positive for COVID-19 infection as confirmed by RT-PCR and were blinded to other clinical information of the patients. The readers rated the score of each patient for each semiquantitative scoring system in three reading sessions, without taking into consideration the confidence level of suspicion for COVID-19 infection. The readers scored the lung lesions only when they thought the findings were related to COVID-19 infection; other lung lesions (e.g., atelectasis and lung nodules/masses) were not considered. Abnormal findings outside the lungs were not described in this session. Readers received prestudy training to rate three sample cases before each session. Interpretation times were recorded in all cases. To avoid recall bias, each reading session was separated by at least 2 weeks.

Automatic quantitative measurement

We performed the quantitative analysis in Python using a script written by one of the authors (H.T.). The voxel volume of each lung lobe was automatically calculated using a U-net model (LTRCLobes_R231; model available on GitHub, https://github.com/JoHof/lungmask). The R231 model performs segmentation on individual slices and extracts the right and left lungs separately, performing well even when dense structures such as tumors and consolidation are present. The trachea was not included in the lung segmentation. The LTRCLobes model segments the individual lung lobes but performs less well when dense structures are present. The LTRCLobes_R231 model runs both the R231 and LTRCLobes models and fuses the results [19]: false negatives from LTRCLobes are filled by R231 predictions and mapped to a neighboring label, whereas false positives from LTRCLobes are removed (Fig. 2a, b).
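The fusion step can be illustrated with a simplified array-level sketch. This is not the lungmask implementation; it is a hypothetical approximation of the behavior described above, assuming lobe labels 1–5 with 0 as background and treating any nonzero R231 label as lung. Lobe labels outside the R231 lung mask are discarded (false positives), and lung voxels without a lobe label are filled iteratively from face-adjacent labeled neighbors (false negatives mapped to a neighboring label).

```python
import numpy as np


def fuse_lobe_masks(lobes, lung, max_iter: int = 1000) -> np.ndarray:
    """Hypothetical fusion of a lobe label map with a whole-lung mask."""
    lobes = np.asarray(lobes)
    lung = np.asarray(lung) > 0
    # False positives: drop lobe labels that fall outside the whole-lung mask.
    fused = np.where(lung, lobes, 0)
    for _ in range(max_iter):
        missing = lung & (fused == 0)
        if not missing.any():
            break
        padded = np.pad(fused, 1)  # zero padding avoids wrap-around at the edges
        candidate = np.zeros_like(fused)
        # Look at the two face-adjacent neighbors along every axis.
        for axis in range(fused.ndim):
            for shift in (-1, 1):
                idx = [slice(1, 1 + n) for n in fused.shape]
                idx[axis] = slice(1 + shift, 1 + shift + fused.shape[axis])
                neighbor = padded[tuple(idx)]
                candidate = np.where((candidate == 0) & (neighbor > 0), neighbor, candidate)
        if not (missing & (candidate > 0)).any():
            break  # remaining gaps have no labeled neighbor (isolated components)
        # False negatives: take the label of a neighboring labeled voxel.
        fused = np.where(missing, candidate, fused)
    return fused


print(fuse_lobe_masks(np.array([[1, 1, 0, 0, 2, 2]]), np.array([[1, 1, 1, 1, 1, 0]])))
```

In this toy row, the last voxel loses its lobe label because it lies outside the lung mask, and the two unlabeled lung voxels inherit labels 1 and 2 from their neighbors.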

Fig. 2

Mask images of each segmented lobe. Chest CT images of COVID-19 in a 60-year-old man (“severe” clinical severity) show multiple peripheral rounded opacities abutting the pleura (a, 1–4: white arrows). The lesions in the left lower lobe demonstrate a bulging opacity presumably indicative of Hampton’s hump sign (a, 4: black arrow) and triangular wedge-shaped opacities (a, 2, 3: black arrowheads). These imaging findings are presumably indicative of infarction [12]. Solid nodules in the right middle lobe and left upper lobe (a, 2 and 3: white arrowheads) were considered unrelated to COVID-19. In TSS, both readers recorded 2/1/1/1/2 for the right upper, right middle, right lower, left upper, and left lower lobes, and the patient-level score was 7/20. In CCTS, the two readers rated 3/1/2/1/4 and 3/1/2/1/3 for each lobe, and the patient-level scores were 11/25 and 10/25, respectively. In CTSS, the two readers rated 4/1/5/3/6 and 4/2/5/2/4 for each lobe, and the patient-level scores were 19/38 and 17/38, respectively. The mask images of each segmented lobe (b, 1–4) correspond to the chest CT slices (a, 1–4). The U-net model labels the lung lobes with different colors: right upper lobe, green; right middle lobe, yellow green; right lower lobe, yellow; left upper lobe, dark blue; and left lower lobe, blue green. The images showing the involved areas (c, 1–4) correspond to the same slices (a and b, 1–4). The voxel volumes of areas with CT numbers between −750 and −1 HU were extracted from the individual segmented lung lobes (c, 1–4). The quantitative dense area ratios were 23.8 in the right upper lobe, 9.0 in the right middle lobe, 21.3 in the right lower lobe, 12.9 in the left upper lobe, and 30.3 in the left lower lobe.

The voxel volume of areas with CT values ranging from −750 to −1 HU was extracted from the individual segmented lung lobe(s) (Fig. 2c). The quantitative dense area ratio (QDAR) of the lung lobe(s) was calculated using the following formula:

$$\mathrm{QDAR}=\frac{\text{voxel volume of area with CT value from }-750\text{ to }-1\text{ HU in lung lobe(s)}}{\text{voxel volume of lung lobe(s)}}$$
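A minimal NumPy sketch of the QDAR formula is shown below, assuming a CT volume in Hounsfield units and a co-registered lobe label array of the same shape (labels 1–5, 0 for background). Because all voxels in one scan share the same voxel size, the ratio of voxel volumes reduces to a ratio of voxel counts. Multiplying by 100 is an assumption made to match the percentage-like values reported in this paper (e.g., 23.8 and QDAR > 50); the function name and arguments are hypothetical.

```python
import numpy as np


def qdar(hu, lobe_labels, lobes=None, lower=-750, upper=-1, as_percent=True) -> float:
    """Hypothetical QDAR: dense-voxel fraction within the selected lobe(s)."""
    hu = np.asarray(hu)
    lobe_labels = np.asarray(lobe_labels)
    if hu.shape != lobe_labels.shape:
        raise ValueError("CT volume and label map must have the same shape")
    # Select all lobes (any nonzero label) or only the requested lobe labels.
    in_lobes = lobe_labels > 0 if lobes is None else np.isin(lobe_labels, list(lobes))
    total = np.count_nonzero(in_lobes)
    if total == 0:
        raise ValueError("no voxels in the selected lobe(s)")
    # Equal voxel sizes cancel, so a count ratio equals the volume ratio.
    dense = np.count_nonzero(in_lobes & (hu >= lower) & (hu <= upper))
    ratio = dense / total
    return 100.0 * ratio if as_percent else ratio


hu = np.array([-900, -800, -700, -500, -100, 50])
labels = np.array([1, 1, 1, 2, 2, 0])  # last voxel is outside the lungs
print(qdar(hu, labels), qdar(hu, labels, lobes=[1], as_percent=False))
```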

Statistical analysis

We used the intraclass correlation coefficient (ICC; class 2) to assess the interobserver agreement of the semiquantitative scoring systems. Agreement was classified as follows: < 0.50, poor; 0.50–0.75, fair; 0.75–0.90, good; and 0.90–1.00, excellent. To compare reading times among the three scoring systems, we performed one-way analysis of variance (ANOVA), followed by paired t tests with Bonferroni correction when the one-way ANOVA revealed a significant difference.
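For readers unfamiliar with the ICC, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss notation, from an n-cases × k-readers matrix, together with the agreement categories listed above. We assume that “class 2” refers to this form; the paper does not state whether single- or average-measure ICC was reported, and assigning boundary values (e.g., exactly 0.75) to the higher category is also our assumption. Function names are hypothetical.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    y = np.asarray(ratings, dtype=float)  # shape (n cases, k readers)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between cases
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between readers
    ss_total = np.sum((y - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return float(
        (ms_rows - ms_err)
        / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    )


def agreement_label(icc: float) -> str:
    """Map an ICC value to the agreement categories used in this study."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "fair"
    if icc < 0.90:
        return "good"
    return "excellent"


# Reader 2 scores every case exactly 1 point higher than reader 1 (made-up data).
print(icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]]))
```

Because ICC(2,1) measures absolute agreement, a constant offset between readers lowers it (to 10/13 in this toy example) even though the two readers rank the cases identically.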

We analyzed the relationship between each of the three semiquantitative systems and clinical severity at admission (mild vs. severe) using receiver-operating characteristic (ROC) analysis and compared the areas under the curve (AUCs) using the DeLong test with Bonferroni correction. The cutoff value was determined with the Youden index. Additionally, the relationship between the three semiquantitative systems and the patients’ risk of developing critical illness (low vs. moderate/high) was analyzed using ROC analysis in the same manner.
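The ROC quantities used here can be sketched in a few lines of NumPy. The AUC below uses its Mann–Whitney interpretation (the probability that a randomly chosen severe case scores higher than a randomly chosen mild case, counting ties as one half), and the Youden cutoff maximizes sensitivity + specificity − 1 over the observed scores, assuming that higher scores indicate more severe disease and that scores at or above the cutoff are called positive. The DeLong comparison of correlated AUCs is not shown, and the function names are hypothetical.

```python
import numpy as np


def roc_auc(scores, labels) -> float:
    """AUC as the Mann-Whitney probability; labels are 1 = severe, 0 = mild."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one case in each class")
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    best = None
    for t in np.unique(s):  # candidate cutoffs in ascending order
        pred = s >= t  # scores at or above t are called severe
        sens = pred[y].mean()
        spec = (~pred[~y]).mean()
        j = sens + spec - 1.0
        if best is None or j > best[3]:  # ties keep the lowest cutoff
            best = (float(t), float(sens), float(spec), float(j))
    return best


print(roc_auc([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1]))
print(youden_cutoff([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1]))
```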

We analyzed the correlation between each of the three semiquantitative scores and QDAR using Spearman’s rank correlation coefficient at both the patient and lobe levels. In the patient-level analysis, we evaluated the correlation between the total semiquantitative score and QDAR. For the lobe-level analysis, each semiquantitative score was standardized using the following formula:

$$\text{Standardized score (TSS, CCTS, CTSS)}=\frac{\text{total score of the lobe(s) rated by the reader}}{\text{expected maximum total score of the lobe(s)}}$$
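The correlation step can be sketched as follows: Spearman’s rho is the Pearson correlation of average ranks (tied values receive the mean of their positions), and the standardized lobe-level score divides a reader’s total by the maximum attainable total for the same lobe(s). The per-lobe maxima below are our assumptions (4 points per lobe for TSS, 5 for CCTS, and 2 points per region for the modified CTSS, i.e., 6/4/10/10/8 for RUL/RML/RLL/LUL/LLL); all names are hypothetical.

```python
import numpy as np

# Assumed maximum score per lobe for each system (modified 19-region CTSS).
LOBE_MAX = {
    "TSS": {"RUL": 4, "RML": 4, "RLL": 4, "LUL": 4, "LLL": 4},
    "CCTS": {"RUL": 5, "RML": 5, "RLL": 5, "LUL": 5, "LLL": 5},
    "CTSS": {"RUL": 6, "RML": 4, "RLL": 10, "LUL": 10, "LLL": 8},
}


def standardized_score(system: str, lobe_scores: dict) -> float:
    """Total rated score of the given lobe(s) divided by their maximum total."""
    maxima = LOBE_MAX[system]
    return sum(lobe_scores.values()) / sum(maxima[lobe] for lobe in lobe_scores)


def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the block of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


print(standardized_score("CTSS", {"LLL": 6}))  # 6 of a possible 8 points
```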

We compared QDAR between neighboring lobe-level score categories (0–4 in TSS and 0–5 in CCTS) using t tests with Bonferroni correction. A p value < 0.05 was considered significant. Statistical analysis was conducted using the open-source statistical software R (version 3.6.3).
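The neighboring-category comparisons and their Bonferroni adjustment can be sketched as below. We assume the correction factor equals the number of neighboring pairs on each scale (4 for TSS scores 0–4, 5 for CCTS scores 0–5) and that the raw p values come from the t tests described above; the helper names and the example p values are hypothetical, not study results.

```python
def neighboring_pairs(max_score: int) -> list:
    """Adjacent score categories compared on a 0..max_score scale."""
    return [(s, s + 1) for s in range(max_score)]


def bonferroni(p_values) -> list:
    """Bonferroni-adjusted p values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


tss_pairs = neighboring_pairs(4)     # [(0, 1), (1, 2), (2, 3), (3, 4)]
raw_p = [0.30, 0.001, 0.004, 0.40]   # hypothetical raw p values, one per pair
adjusted = bonferroni(raw_p)
significant = [pair for pair, p in zip(tss_pairs, adjusted) if p < 0.05]
print(significant)
```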

Results

A total of 108 patients (46 ± 20 years old; male:female = 59:49) were enrolled in this study. Body weight, height, and body mass index were measured in 97 patients (89.8%) and were 66.0 ± 16.4 kg, 164.0 ± 9.5 cm, and 24.3 ± 4.6 kg/m², respectively (Table 2). The duration from initial symptoms to CT examination was available for 104 patients (96.3%) and was 4.8 ± 3.9 days. Fourteen patients (13%) had severe clinical severity on admission (i.e., they required oxygen inhalation or had SpO2 < 93%), and 94 patients (87%) had mild clinical severity (Table 2). In all semiquantitative scoring systems and for both readers, scores were higher in the lower lobes than in the upper and middle lobes (Table 3).

Table 2 Patient demographics
Table 3 Results of semiquantitative scores and automatic quantitative measurement at the patient and lobe levels for each semiquantitative scoring system

Patient-level interobserver agreement was excellent for all three semiquantitative scoring systems (ICC: 0.952–0.970, p < 0.001). Lobe-level interobserver agreement was excellent for CCTS and CTSS (ICC: 0.916–0.936, p < 0.001) and good for TSS (ICC: 0.882, p < 0.001; Table 4). The mean interpretation time per case was 25.7 ± 10.2 s for TSS, 27.7 ± 11.7 s for CCTS, and 48.9 ± 28.8 s for CTSS for reader 1, and 41.7 ± 14.9 s for TSS, 39.5 ± 11.7 s for CCTS, and 80.0 ± 37.7 s for CTSS for reader 2. One-way ANOVA indicated a significant difference among the three scoring systems for both readers (p < 0.001). In pairwise comparisons using paired t tests, CTSS required significantly more time than TSS and CCTS for both readers (p < 0.001).

Table 4 Interobserver agreement of the three semiquantitative scoring systems

Table 1 shows the respective sensitivity, specificity, and cutoff values as calculated by the Youden index for clinical severity at admission. There was no significant difference in AUC for the clinical severity at admission among the three semiquantitative scoring systems for both readers (Table 1 and Fig. 3).

Fig. 3

Receiver-operating characteristic curves for the clinical severity at admission by semiquantitative scoring system. The receiver-operating characteristic curves are nearly identical among the three semiquantitative scoring systems; the areas under the curve of TSS, CCTS, and CTSS are 0.855 (95% CI 0.732–0.979), 0.853 (95% CI 0.729–0.978), and 0.853 (95% CI 0.726–0.980) for reader 1 (a) and 0.842 (95% CI 0.721–0.963), 0.850 (95% CI 0.723–0.977), and 0.836 (95% CI 0.713–0.960) for reader 2 (b), respectively. a Reader 1, b reader 2. CCTS, chest CT score; CTSS, modified CT severity score; TSS, total severity score

The risk of developing critical illness was assessed in 76% of patients (82/108); it could not be calculated in the remaining 24% (26/108) because one or more required clinical variables were missing. Among the 82 patients, 49% (40/82), 50% (41/82), and 1% (1/82) were categorized as having a low, moderate, or high risk of developing critical illness, respectively. The AUCs of TSS, CCTS, and CTSS for differentiating the risk of developing critical illness (low vs. moderate/high) were 0.792, 0.818, and 0.786 for reader 1 and 0.788, 0.802, and 0.792 for reader 2, respectively (Fig. 4). There were no significant differences among the three semiquantitative scoring systems for either reader.

Fig. 4

Receiver-operating characteristic (ROC) curves for predicting the risk of developing critical illness using the semiquantitative scoring systems. The ROC curves were very similar among the three semiquantitative scoring systems, but CCTS demonstrated the highest area under the curve (R1: 0.818 [95% CI: 0.728–0.907] and R2: 0.802 [95% CI: 0.708–0.896]) compared with TSS (R1: 0.792 [95% CI: 0.697–0.888] and R2: 0.788 [95% CI: 0.693–0.883]) and CTSS (R1: 0.786 [95% CI: 0.689–0.883] and R2: 0.792 [95% CI: 0.696–0.888]). a Reader 1, b reader 2. CCTS, chest CT score; CTSS, modified CT severity score; TSS, total severity score

All three semiquantitative scoring systems were significantly correlated with QDAR at both the patient and lobe levels (Table 5 and Fig. 5). At the patient level, CCTS showed the highest correlation with QDAR, followed by TSS, with CTSS showing the lowest correlation for both readers. Five cases had a QDAR > 50, and three of these five cases had a low semiquantitative score (0–1) in each scoring system. These three cases showed mild, diffuse, mosaic-like increased attenuation in the lung parenchyma, possibly due to air trapping, presumed pulmonary embolism (Westermark sign) [12], or inadequate inspiration.

Table 5 Correlation between the three semiquantitative scoring systems and automatic quantitative measurement
Fig. 5

Patient-level scatterplots and regression lines between the semiquantitative scores and QDAR. Scatterplots of the semiquantitative scores (TSS, CCTS, and CTSS) against QDAR, with regression equations and rho values, are shown for reader 1 (a–c, 1) and reader 2 (a–c, 2). CCTS, chest CT score; CTSS, CT severity score; QDAR, quantitative dense area ratio; TSS, total severity score

In the lobe-level analysis, the median QDAR was 14.2 and 14.3 for TSS score 0, 16.9 and 16.1 for TSS score 1, 33.4 and 29.5 for TSS score 2, and 53.9 and 53.9 for TSS score 3 for readers 1 and 2, respectively. QDAR differed significantly between TSS scores 1 and 2 and between TSS scores 2 and 3 for both readers. The median QDAR was 14.2 and 14.3 for CCTS score 0, 14.2 and 14.0 for CCTS score 1, 22.4 and 21.5 for CCTS score 2, 29.1 and 42.1 for CCTS score 3, and 54.8 and 51.9 for CCTS score 4 for readers 1 and 2, respectively. QDAR differed significantly between CCTS scores 1 and 2 and between CCTS scores 2 and 3 for both readers, and between CCTS scores 3 and 4 for reader 1 (Fig. 6).

Fig. 6

Lobe-level correlation between the semiquantitative systems and QDAR. For both TSS and CCTS, QDAR increased in proportion to the lobe-level score, except at the minimum and maximum scores. CCTS, chest CT score; QDAR, quantitative dense area ratio; TSS, total severity score. *p < 0.05

Discussion

We compared the clinical usefulness of the three semiquantitative CT-based scoring systems (TSS, CCTS, and CTSS) against the calculated CT severity of the lung (QDAR) and clinical severity at admission. Interobserver agreement between the two board-certified radiologists was excellent at the patient level (ICC: 0.952–0.970) and good to excellent at the lobe level (ICC: 0.882–0.936) for all three scoring systems. However, CTSS required significantly more time for both readers (R1: 48.9 ± 28.8 s, R2: 80.0 ± 37.7 s) than TSS (R1: 25.7 ± 10.2 s, R2: 41.7 ± 14.9 s, p < 0.001) or CCTS (R1: 27.7 ± 11.7 s, R2: 39.5 ± 11.7 s, p < 0.001). The AUC for predicting clinical severity at admission was 0.842–0.855 for TSS, 0.850–0.853 for CCTS, and 0.836–0.853 for CTSS. The correlation between the scoring system and QDAR was highest for CCTS (0.443–0.448), intermediate for TSS (0.435–0.437), and lowest for CTSS (0.415–0.426).

To establish a surrogate reference standard for the CT severity of the lung, we adopted a previously reported U-net model for automated lung lobe segmentation (LTRCLobes_R231). Using this model, we created accurate bilateral lung lobe masks. We then extracted additional masks using a CT value range of −750 to −1 HU to include both parenchymal and ground-glass opacities, which are common in COVID-19 infection, and calculated the QDAR. The major advantage of QDAR is that it provides an accurate, reproducible reference value for CT severity that is independent of human interpretation [20, 21]. Its disadvantage is that it cannot distinguish qualitative differences within each lobe and therefore inevitably includes false-positive structures within the mask, such as pulmonary vasculature, atelectasis, old inflammatory changes, fibrotic changes, and areas of increased attenuation due to inadequate inspiration or air trapping. We consider that some of the high QDAR values observed in cases with low semiquantitative scores reflect these false positives. In fact, three cases with a QDAR > 50 and a low semiquantitative score (0–1) had mild, diffuse increased attenuation in the lung parenchyma, possibly because of air trapping or inadequate inspiration. Some of these cases may also have demonstrated Westermark’s sign, a sign of pulmonary embolism that appears as heterogeneous attenuation of the lung parenchyma [12].

CTSS was originally developed to investigate the distribution of lung involvement in COVID-19 pneumonia, with each lung divided into 10 regions for a total of 20 regions [17]. We modified this scoring system to use 19 instead of 20 regions, comprising 10 right-lung segments and 9 left-lung segments (details provided in the “Materials and methods” section). CTSS requires readers to evaluate more subdivided regions (20 regions in the original CTSS and 19 regions in the modified CTSS used in this study) on a smaller scale (3 points, 0–2) than TSS (five regions, 5-point scale) and CCTS (five regions, 6-point scale). We assume that the shorter interpretation time of TSS and CCTS compared with CTSS is mainly attributable to the smaller burden of assessing the extent of disease in fewer regions. During the COVID-19 pandemic, physicians need to promptly assess disease severity in many patients. Furthermore, the AUC of CTSS for clinical severity on admission was similar to those of TSS and CCTS, but the correlation between the scoring system and QDAR was lowest for CTSS. Thus, TSS and CCTS are more appropriate than CTSS in terms of clinical usefulness.

The difference between TSS and CCTS lies only in the absence or presence of a 5% threshold in grading severity. Therefore, CCTS, which has the 5% threshold, is expected to distinguish subtle lung involvement from mild lung involvement better than TSS, which does not. To quantify this difference, we evaluated lobe-level QDAR in TSS and CCTS and compared neighboring scores (Fig. 6). The median QDAR of CCTS score 1 (< 5% involvement) was 14.2 for reader 1 and 14.0 for reader 2, whereas that of CCTS score 2 (5–25%) was 22.4 for reader 1 and 21.5 for reader 2; the difference in QDAR between these two scores was significant for both readers. Given that 75% of asymptomatic patients infected with COVID-19 demonstrate small ground-glass opacities in several lobes (1–5 lobes) [10, 22], the minimal-involvement category (< 5%) of CCTS is helpful for stratifying patients’ lung involvement. We presume that the slightly higher correlation with QDAR observed for CCTS compared with TSS reflects this difference.

Our results are consistent with previous reports demonstrating that the three semiquantitative scores predict clinical severity in COVID-19 pneumonia with substantial sensitivity and specificity [14, 15, 17]. The AUCs for clinical severity at admission were similar to those reported in the original studies for TSS (0.842–0.855 vs. 0.819), CCTS (0.850–0.853 vs. 0.870), and CTSS (0.836–0.853 vs. 0.892) [14, 15, 17]. Our definition of severe clinical severity was similar to those of the original studies, except that we did not include the partial pressure of arterial oxygen or the oxygen concentration [14, 15, 17]. The proportion of cases with severe clinical severity in our cohort (13.0%: 14/108) was similar to those in the previous studies of TSS (10.3%: 8/78) [14] and CTSS (17.6%: 18/102) [17] but quite different from that in the previous study of CCTS (30.1%: 25/83) [15]. Nevertheless, when validated in a different population outside the original cohorts (this study), the diagnostic performance was similar. For both readers, there was no significant difference in AUC for the predicted risk of developing critical illness, although the AUC of CCTS (0.802–0.818) was numerically higher than those of TSS (0.788–0.792) and CTSS (0.786–0.792).

This study has some limitations. First, this was a single-center retrospective study. Second, the number of patients with severe clinical severity at admission was small (n = 14). However, in terms of the risk of developing critical illness, the ratio of low-risk patients to moderate/high-risk patients was approximately 1:1, and CCTS demonstrated the highest AUC for differentiating the risk of developing critical illness for both readers, as well as for differentiating clinical severity at admission for reader 2. Third, false-positive structures were included within the reference standard (QDAR), as mentioned above. Fourth, qualitative aspects of pulmonary opacities (i.e., the likelihood of COVID-19 infection) were not distinguished in the semiquantitative scoring systems. A model that automatically scores the probability of COVID-19 infection for pulmonary opacities should be investigated in the future. Fifth, CT findings of COVID-19 pneumonia change dramatically over time: ground-glass opacities are dominant immediately after hospitalization [23], whereas consolidation is common within 9–13 days [24]. In our study, the duration between initial symptoms and the CT scan was 4.79 ± 3.91 days, corresponding to the ground-glass opacity-dominant phase. Finally, some patients underwent CT scanning outside our hospital with different CT parameters, including slice thickness and kernel, which may have affected the automated quantification of CT severity.

Conclusion

The three semiquantitative scoring systems (TSS, CCTS, and CTSS) demonstrated substantial diagnostic performance for clinical severity in patients with COVID-19 pneumonia, with excellent interobserver agreement. The interpretation time was significantly shorter for TSS and CCTS than for CTSS. The correlation between the scoring system and QDAR was highest for CCTS, followed by TSS and CTSS. Therefore, we consider CCTS to be the most appropriate CT scoring system for clinical practice.