Introduction

Acute respiratory distress syndrome (ARDS), characterized by refractory hypoxemia and respiratory distress, is a severe form of acute lung injury. It can complicate severe respiratory illnesses such as coronavirus disease 2019, severe acute respiratory syndrome, and Middle East respiratory syndrome [1]. In the intensive care unit (ICU), 10.4% of patients develop ARDS, accounting for 23.4% of patients who require mechanical ventilation, with mortality ranging from 34.9% to 46.1% [2].

Mechanical ventilation is essential in the treatment of ARDS [3]. However, some patients do not benefit from it [4]. Researchers have therefore sought indicators that predict mortality and could guide individualized decision-making [5, 6]. Previous studies have shown that several ventilator parameters, including tidal volume (Tv) [7], positive end-expiratory pressure (PEEP) [8], plateau pressure and driving pressure [9], and peak inspiratory pressure (PIP) [10], are associated with ARDS outcomes. Mechanical power (MP), an integrated parameter computed from several of these variables, has been proposed as a new marker of ARDS severity [11]. Several prediction models have been developed, but their area under the receiver operating characteristic curve (AUC) remains unsatisfactory (approximately 0.75) [12], so better prognostic indicators are urgently needed.

CT is a vital tool in the evaluation of ARDS. Typical findings include ground-glass opacities combined with bilateral consolidation [13, 14]. CT has previously been used to evaluate the effects of mechanical ventilation [15] and their relation to patient outcomes [16]. Traditional visual interpretation of CT, however, is subjective and yields limited information. Radiomics, a high-throughput method that extracts large amounts of quantitative information from medical images, has substantially expanded the field of medical imaging [17, 18]. To our knowledge, however, no application of radiomics to ARDS has been reported.

Therefore, this study aimed to extract radiomics features from chest CT images to develop and validate a prediction model for 28-day mortality in mechanically ventilated ARDS patients, and to compare its performance with previously reported models derived from clinical and ventilator parameters.

Methods

This was a retrospective study, and the requirement for informed consent was waived. The study was approved by the Institutional Ethics Committee for Clinical Research of the Hospital (No. 2021ZDSYLL060-P01). All patients diagnosed with ARDS between January 2014 and June 2019 in a tertiary ICU of a university hospital were initially screened. ARDS was diagnosed according to the 2012 Berlin definition [1]. Inclusion and exclusion criteria were then applied. The study flowchart is provided in Fig. 1.

Fig. 1

Study flowchart

Inclusion criteria: (1) treatment with mechanical ventilation; (2) chest CT performed within 24 h before or after the initiation of mechanical ventilation.

Exclusion criteria: (1) inadequate CT image quality; (2) age under 18 years; (3) underlying pulmonary disease, including chronic obstructive pulmonary disease, tuberculosis, asthma, pulmonary fibrosis, and pulmonary tumor, or a history of pulmonary lobectomy.

The following data were collected: (1) demographic and severity data: age, sex, height, weight, and Acute Physiology and Chronic Health Evaluation (APACHE) II score; (2) initial (day 0) mechanical ventilation parameters: PaO2/FiO2 ratio (P/F ratio), Tv, PEEP, PIP, and respiratory rate (RR).

The primary outcome was all-cause mortality on Day 28 after ARDS onset. Patients discharged within 28 days were followed up by telephone on Day 28, and their survival status was recorded.

The secondary outcomes were all-cause mortality on Day 10 after ICU admission, ICU length of stay, and hospital length of stay.

CT acquisition

All recruited patients underwent non-contrast chest CT in the supine position within 24 h before or after ICU admission. All examinations were performed on one of the following scanners: Discovery CT750 HD, Optima CT670, or Revolution CT (GE Medical Systems), or SOMATOM Sensation 64 (Siemens). The main scanning parameters were: tube voltage 100–140 kVp, tube current–time product 150–190 mAs, matrix 512 × 512, and slice thickness 5 mm. The images were exported and saved in Digital Imaging and Communications in Medicine (DICOM) format.

Image analysis

Three regions of interest (ROIs), each covering the entire lung parenchyma on one axial section, were delineated manually at the upper, middle, and lower lung levels and saved as label_1, label_2, and label_3, respectively. The upper, middle, and lower levels were defined as 2 cm above the carina, 1 cm below the carina, and 1 cm above the right hemidiaphragm, respectively. These levels could be adjusted in some patients to account for differences in body size. All ROIs were drawn in ITK-SNAP version 3.8 (www.itksnap.org/). An example is shown in Fig. 2.
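The level definitions above can be turned into a reproducible slice lookup. The sketch below is hypothetical (the authors placed the ROIs manually in ITK-SNAP): it assumes the z positions of the carina and the right hemidiaphragm have already been identified by a reader, and that z increases toward the head, as in the DICOM patient coordinate system. It returns the nearest axial slice for each label; the manual adjustment mentioned in the text would simply override these indices.

```python
import numpy as np

# Hypothetical helper. Offsets (mm) come from the level definitions in the text;
# positive z is assumed to point toward the head (DICOM LPS convention).
LEVEL_OFFSETS_MM = {
    "label_1": ("carina", +20.0),         # upper: 2 cm above the carina
    "label_2": ("carina", -10.0),         # middle: 1 cm below the carina
    "label_3": ("hemidiaphragm", +10.0),  # lower: 1 cm above the right hemidiaphragm
}


def select_level_slices(slice_z_mm, carina_z_mm, hemidiaphragm_z_mm):
    """Return the index of the axial slice nearest to each target level.

    slice_z_mm: z position (mm) of every axial slice, e.g. ImagePositionPatient[2].
    carina_z_mm, hemidiaphragm_z_mm: landmark positions chosen by the reader.
    """
    z = np.asarray(slice_z_mm, dtype=float)
    landmarks = {"carina": carina_z_mm, "hemidiaphragm": hemidiaphragm_z_mm}
    return {
        label: int(np.argmin(np.abs(z - (landmarks[ref] + offset))))
        for label, (ref, offset) in LEVEL_OFFSETS_MM.items()
    }
```

With 5 mm slices, a target level usually falls within 2.5 mm of a reconstructed slice, so nearest-slice selection loses little positional accuracy.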

Fig. 2

Schematic diagram of the study

Radiomics features were extracted in Python 3.7 with the PyRadiomics package, version 3.0.0 (https://pyradiomics.readthedocs.io/en/latest/). A total of 1218 features were extracted for each label from the First Order, Shape, Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), Neighboring Gray Tone Difference Matrix (NGTDM), and Gray Level Dependence Matrix (GLDM) classes. Features were computed from the original images and from images derived by Laplacian of Gaussian or wavelet filtering.
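The extraction step might be configured along the following lines with PyRadiomics' `RadiomicsFeatureExtractor`. The text names the filters but not their settings, so the bin width, the LoG sigma values, and the 2D option below are placeholders rather than the authors' configuration, and storing all three labels in one mask file is an assumption. The import is deferred into the function so the module loads even where PyRadiomics is not installed.

```python
# Placeholder settings (assumed): the text does not report bin width or LoG sigmas.
EXTRACTOR_SETTINGS = {"binWidth": 25, "force2D": True, "force2Ddimension": 0}
IMAGE_TYPES = {"Original": {}, "LoG": {"sigma": [1.0, 2.0, 3.0]}, "Wavelet": {}}
LABELS = (1, 2, 3)  # label_1 (upper), label_2 (middle), label_3 (lower)


def feature_vector(result):
    """Drop PyRadiomics' 'diagnostics_*' entries and keep numeric feature values."""
    return {k: float(v) for k, v in result.items() if not k.startswith("diagnostics_")}


def extract_all_levels(image_path, mask_path):
    """Extract features for every level label from one CT volume and one mask file."""
    from radiomics import featureextractor  # requires the pyradiomics package

    extractor = featureextractor.RadiomicsFeatureExtractor(**EXTRACTOR_SETTINGS)
    extractor.disableAllImageTypes()
    for name, args in IMAGE_TYPES.items():
        extractor.enableImageTypeByName(name, customArgs=args)
    extractor.enableAllFeatures()
    return {lab: feature_vector(extractor.execute(image_path, mask_path, label=lab))
            for lab in LABELS}
```

Because each ROI lies on a single axial section, `force2D` keeps the texture matrices in-plane; the exact feature count depends on the enabled filters and sigmas, so it will not necessarily match the 1218 reported here.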

First, the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance positive and negative samples. The feature matrix was then normalized and standardized. The data were split into a training set and a validation set at a ratio of approximately 2:1, and the validation set was kept fixed in all subsequent work. Dimensionality was reduced by removing redundant features whose pairwise Pearson correlation coefficient exceeded 0.9, and analysis of variance (ANOVA) was used to evaluate the relationship between each feature and the outcome. Logistic regression was then performed to build a radiomics signature for each label, and 5-fold cross-validation was applied to evaluate the degree of overfitting. The resulting radiomics scores were recorded as Radiomics_Score_1, Radiomics_Score_2, and Radiomics_Score_3 for the upper, middle, and lower levels, respectively. All these analyses were performed in Feature Analysis Explorer (FAE, https://github.com/salan668/FAE) [19].
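The steps above can be sketched with scikit-learn. This does not reproduce FAE's internals: SMOTE is re-implemented minimally, "normalization and standardization" is reduced to z-scoring, the number of ANOVA-selected features (10) is an assumed placeholder, and the radiomics score is assumed to be the logistic model's log-odds. One deliberate difference from the order described in the text: the split happens first and SMOTE is applied only to the training portion, so no synthetic sample derived from a validation case can enter training.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE: add synthetic minority samples on the segment between a minority
    sample and one of its k nearest minority neighbours until the classes balance."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # column 0 is the sample itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, idx.shape[1], n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def drop_correlated(X, threshold=0.9):
    """Keep a feature only if |Pearson r| <= threshold with every feature already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep


def fit_radiomics_signature(X, y, n_features=10, seed=0):
    # Fixed ~2:1 split; the validation set is never oversampled or used for selection.
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=seed)
    X_tr, y_tr = smote(X_tr, y_tr, seed=seed)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_va = scaler.transform(X_tr), scaler.transform(X_va)
    keep = drop_correlated(X_tr)
    anova = SelectKBest(f_classif, k=min(n_features, len(keep))).fit(X_tr[:, keep], y_tr)
    cols = np.asarray(keep)[anova.get_support()]
    model = LogisticRegression(max_iter=1000)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_auc = cross_val_score(model, X_tr[:, cols], y_tr, cv=cv, scoring="roc_auc")
    model.fit(X_tr[:, cols], y_tr)
    score = model.decision_function(X_va[:, cols])  # log-odds used as the radiomics score
    return model, cols, cv_auc, score, y_va
```

The cross-validation here still runs on the oversampled training set, so synthetic points can share folds with their parents; oversampling inside each fold (for example with imbalanced-learn's `Pipeline`) would give a less optimistic estimate.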

Statistical analysis

The cohort was divided into survivors and non-survivors based on survival status on Day 28. Continuous variables are presented as means (standard deviations) or medians (interquartile ranges [IQR]) by group and were compared with Student's t test or the Mann–Whitney U test. Categorical variables are expressed as frequencies and percentages and were compared with the chi-square test or Fisher's exact test.
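The text does not state how the parametric or non-parametric test was chosen for each variable. A common rule, assumed in this SciPy sketch, is a Shapiro–Wilk check at α = 0.05 for continuous variables and a switch to Fisher's exact test when a 2 × 2 table has any expected count below 5.

```python
import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """Student's t test if both groups pass Shapiro-Wilk at `alpha`, else Mann-Whitney U.
    The normality rule is an assumption, not taken from the study."""
    normal = stats.shapiro(a)[1] > alpha and stats.shapiro(b)[1] > alpha
    if normal:
        return "Student's t", stats.ttest_ind(a, b)[1]
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided")[1]


def compare_categorical(table):
    """Chi-square test; Fisher's exact test for a 2x2 table with any expected count < 5."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)  # Yates correction on 2x2 by default
    if table.shape == (2, 2) and expected.min() < 5:
        _, p = stats.fisher_exact(table)
        return "Fisher's exact", p
    return "chi-square", p
```

Returning the test name alongside the P value makes it easy to footnote which test produced each entry in a table such as Table 1.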

All continuous variables were normalized, and univariate and multivariate logistic regression analyses were applied to explore risk factors. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated, and a two-tailed P < 0.05 was considered statistically significant. Several models for predicting 28-day mortality were constructed. The clinical model included the APACHE II score. The radiomics model incorporated Radiomics_Score_2 and Radiomics_Score_3. Each ventilator model comprised the APACHE II score plus one ventilator parameter: Tv, tidal volume normalized to predicted body weight (Tv/PBW), RR, PIP, PEEP, driving pressure, MP, MP normalized to predicted body weight (MP/PBW), or MP normalized to compliance (MP/compliance). The predictive performance of the models was evaluated by AUC and compared with DeLong's test. These analyses were performed in MedCalc version 19.7.4.
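As a sketch of how per-variable ORs and 95% CIs can be obtained after normalization, the function below fits a logistic regression by Newton–Raphson and reports Wald intervals. It is a generic implementation, not the MedCalc procedure used in the study; it assumes "normalized" means z-scoring (so each OR is per 1-SD increase) and does not handle complete separation. Passing one column gives a univariate OR; passing several gives adjusted (multivariate) ORs.

```python
import numpy as np
from scipy.stats import norm


def logistic_odds_ratios(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Columns of X are z-scored first (assumed normalization), so each OR is per 1-SD
    increase. Returns (OR, CI lower, CI upper, two-sided Wald P), intercept excluded.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    A = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        info = A.T @ (A * (p * (1 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    info = A.T @ (A * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    zcrit = norm.ppf(0.975)
    b, s = beta[1:], se[1:]
    pval = 2 * norm.sf(np.abs(b / s))
    return np.exp(b), np.exp(b - zcrit * s), np.exp(b + zcrit * s), pval
```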

The ventilator-derived parameters were calculated as follows:

$$\text{Driving pressure } \Delta P = \mathrm{PIP} - \mathrm{PEEP};$$
$$\mathrm{MP}\,(\mathrm{J/min}) = 0.098 \times \mathrm{Tv} \times \mathrm{RR} \times \left(\mathrm{PIP} - 0.5 \times \Delta P\right).$$
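These formulas translate directly into code. The sketch below also computes the normalized parameters listed in the statistical analysis; because the text does not define them, it assumes the ARDSNet predicted body weight formula and compliance = Tv/ΔP (using the PIP-based ΔP above). The factor 0.098 converts L·cmH2O to J, so Tv must be in litres inside the MP formula.

```python
def predicted_body_weight(height_cm, male):
    """ARDSNet predicted body weight (kg); assumed, since the text gives no formula."""
    return (50.0 if male else 45.5) + 0.91 * (height_cm - 152.4)


def ventilator_parameters(tv_ml, rr, pip, peep, height_cm, male):
    """Derived parameters; pressures in cmH2O, Tv in mL, RR in breaths/min."""
    dp = pip - peep                                        # driving pressure (PIP-based)
    mp = 0.098 * (tv_ml / 1000.0) * rr * (pip - 0.5 * dp)  # J/min; Tv converted to litres
    pbw = predicted_body_weight(height_cm, male)
    compliance = tv_ml / dp                                # mL/cmH2O (assumed definition)
    return {
        "driving_pressure": dp,
        "MP": mp,
        "Tv/PBW": tv_ml / pbw,        # mL/kg
        "MP/PBW": mp / pbw,           # J/min per kg
        "MP/compliance": mp / compliance,
    }
```

For example, Tv 450 mL, RR 20/min, PIP 30 cmH2O, and PEEP 10 cmH2O give ΔP = 20 cmH2O and MP = 0.098 × 0.45 × 20 × 20 ≈ 17.6 J/min.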

Results

A total of 366 patients (276 survivors and 90 non-survivors) were finally included in this study. Their characteristics are presented in Table 1. Age and sex were similarly distributed between the two groups (P = 0.43 and P = 0.97, respectively). The APACHE II score was higher in non-survivors (30 [23–35]) than in survivors (21 [16–27], P < 0.001). ICU length of stay and hospital length of stay did not differ between the two groups (P = 0.29 and P = 0.37). Most cases of ARDS were due to intrapulmonary causes (88.0% in survivors and 90.0% in non-survivors). Slightly more patients were without sepsis than with sepsis, with no statistically significant difference (P = 0.18). Moderate ARDS was the most common severity category (51.1% in survivors and 47.8% in non-survivors).

Table 1 Baseline characteristics of the included patients

A total of 1218 features were extracted for each label. At the upper lung level, seven features were selected, most of them related to texture uniformity. At the middle lung level, 12 features were selected, most of them related to density distribution. For the lower lung, only three features were selected: the interquartile range, mean, and median after wavelet transformation. Detailed information on the selected radiomics features is listed in Supplementary Table S1, and further explanation of the radiomics features is available at https://pyradiomics.readthedocs.io/en/latest/features.html.

In univariate logistic regression analysis (Table 2), the P/F ratio (P = 0.003), APACHE II score (P < 0.001), Radiomics_Score_1 (P < 0.001), Radiomics_Score_2 (P < 0.001), Radiomics_Score_3 (P < 0.001), Tv/PBW (P = 0.03), RR (P = 0.03), MP (P = 0.01), and MP/PBW (P = 0.01) were associated with 28-day mortality. In the multivariate analysis, three independent risk factors were retained: APACHE II score (OR 2.607, 95% CI 1.896–3.584, P < 0.001), Radiomics_Score_2 (OR 2.230, 95% CI 1.387–3.583, P = 0.01), and Radiomics_Score_3 (OR 1.633, 95% CI 1.143–2.333, P = 0.01).

Table 2 Factors associated with 28-day mortality in ARDS patients with mechanical ventilation

A clinical model (APACHE II score), a radiomics model (Radiomics_Score_2 + Radiomics_Score_3), and a clinical_radiomics model (APACHE II score + Radiomics_Score_2 + Radiomics_Score_3) were constructed to predict 28-day mortality. Their AUCs in the validation set were 0.758 (95% CI 0.710–0.802), 0.692 (95% CI 0.641–0.739), and 0.813 (95% CI 0.767–0.850), respectively. The difference between the clinical model and the radiomics model was not significant (P = 0.13, DeLong test). The combined clinical_radiomics model showed higher predictive power than both the clinical model (P = 0.004) and the radiomics model (P < 0.001). Figure 3 shows the ROC curves of the three models in the training and validation sets. For the clinical_radiomics model, sensitivity and specificity were 92.5% and 58.7%, respectively.
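The pairwise comparisons above rely on DeLong's test for correlated AUCs measured on the same patients. The authors used MedCalc; the following is an independent sketch of the DeLong et al. (1988) variance estimate based on placement values, using an O(mn) pairwise comparison that is adequate for a cohort of a few hundred patients.

```python
import numpy as np
from scipy.stats import norm


def _placements(pos, neg):
    """psi(x, y) = 1 if x > y, 0.5 if tied, 0 otherwise, for every positive/negative pair."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0) + 0.5 * (diff == 0)


def delong_test(y, score_a, score_b):
    """DeLong test for two correlated AUCs; returns (AUC_a, AUC_b, two-sided P)."""
    y = np.asarray(y).astype(bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(score_a, float), np.asarray(score_b, float)):
        psi = _placements(s[y], s[~y])
        v10.append(psi.mean(axis=1))  # structural components of the positives
        v01.append(psi.mean(axis=0))  # structural components of the negatives
        aucs.append(psi.mean())
    m, n = y.sum(), (~y).sum()
    cov = np.cov(v10) / m + np.cov(v01) / n  # 2x2 covariance of (AUC_a, AUC_b)
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var)
    return aucs[0], aucs[1], 2 * norm.sf(abs(z))
```

Called as `delong_test(y_val, clinical_score, combined_score)`, it compares two models scored on the same validation patients; the covariance term is what distinguishes it from comparing two independent AUCs.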

Fig. 3

The ROC curve of the radiomics model, clinical model, and clinical_radiomics model in the training (A) and validation sets (B)

A total clinical_radiomics score was computed from the coefficients of the final logistic regression clinical_radiomics model. The cutoff was defined as the score at which the sum of sensitivity and specificity was maximal. The changes in sensitivity and specificity with the clinical_radiomics score, and the distribution of the score in the two groups, are presented in Supplementary Fig. S1. With a cutoff clinical_radiomics score of 2.3, all cases were divided into low-risk and high-risk groups. The Kaplan–Meier survival analysis for Day 28 after ARDS onset is shown in Fig. 4A (log-rank test, P < 0.001).
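The cutoff rule described above is Youden's J statistic (sensitivity + specificity − 1). The minimal sketch below assumes the total score is the model's linear predictor (intercept plus coefficient-weighted sum), which the text implies but does not spell out; `linear_score` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import roc_curve


def linear_score(X, coef, intercept):
    """Total score as the logistic model's linear predictor (assumed form)."""
    return intercept + np.asarray(X, float) @ np.asarray(coef, float)


def youden_cutoff(y, score):
    """Cutoff maximising sensitivity + specificity, with the sensitivity/specificity there."""
    fpr, tpr, thresholds = roc_curve(y, score)
    best = int(np.argmax(tpr - fpr))
    return thresholds[best], tpr[best], 1 - fpr[best]
```

Since `roc_curve` counts a case as positive when its score is at or above the threshold, patients would be assigned to the high-risk group with `score >= cutoff`.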

Fig. 4

The Kaplan–Meier survival analysis of the 28-Day mortality after ARDS onset (A) and 10-Day mortality after ICU admission (B) by risk stratification

When survival status on Day 10 after ICU admission was taken as the secondary outcome, the APACHE II score, Radiomics_Score_2, and Radiomics_Score_3 were also statistically significant.

The corresponding odds ratios were: APACHE II score (OR 1.108, 95% CI 1.067–1.149, P < 0.001), Radiomics_Score_2 (OR 1.906, 95% CI 1.044–3.478, P = 0.03), and Radiomics_Score_3 (OR 1.665, 95% CI 1.189–2.332, P = 0.003). The AUC was 0.791 (95% CI 0.746–0.831), with a sensitivity of 76.8% and a specificity of 71.0%.

Using the same cutoff clinical_radiomics score of 2.3, the high-risk and low-risk groups also differed significantly in the Kaplan–Meier survival analysis (Fig. 4B; log-rank test, P < 0.001).
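The Kaplan–Meier curves and log-rank comparisons in Fig. 4 can be reproduced with any survival package. For transparency, a dependency-light sketch of the product-limit estimator and the two-group log-rank test (hypergeometric variance, 1 degree of freedom) follows; it assumes one row per patient with follow-up time in days from ARDS onset (or ICU admission), an event flag, and follow-up censored at the study horizon.

```python
import numpy as np
from scipy.stats import chi2


def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate at each distinct event time."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    times = np.unique(time[event])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    time = np.asarray(time, float)
    event, group = np.asarray(event, bool), np.asarray(group, bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    return stat, chi2.sf(stat, df=1)
```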

However, these factors correlated only weakly with ICU length of stay and hospital length of stay.

Models built from clinical information (APACHE II score) plus each ventilator parameter are summarized in Table 3. The Tv and Tv/PBW models ranked highest among all ventilator models (AUC 0.773, 95% CI 0.726–0.815, and AUC 0.770, 95% CI 0.723–0.812, respectively). The AUCs of the remaining models were all below 0.770.

Table 3 Performance of models in predicting 28-day mortality

When the top two ventilator models were compared with the clinical, radiomics, and clinical_radiomics models, the clinical_radiomics model performed best (P = 0.02 vs. the Tv model and P = 0.01 vs. the Tv/PBW model).

Discussion

In critical care medicine, ARDS remains an important life-threatening condition. In a retrospective study [20] of 18 ICUs in mainland China, ARDS had a low incidence (3.57%) but resulted in high in-hospital mortality (46.3%). Intrapulmonary disorders were the most common risk factor for ARDS in that study (83.7%), similar to our cohort (88%). The proportions of mild, moderate, and severe ARDS were 9.7%, 47.4%, and 42.9%, respectively, which is also similar to the distribution in our cohort (23.6%, 51.1%, and 25.3% in survivors; 13.3%, 47.8%, and 38.9% in non-survivors). These similarities suggest that our cohort is comparable with previously reported ARDS populations.

Fig. 5

Chest CT images of an 82-year-old man admitted to the intensive care unit for acute respiratory distress syndrome with no relevant previous history. The CT images were obtained within 24 h of illness onset. The upper (A), middle (B), and lower (C) lung sections show bilateral focal opacities, predominantly in the posterior and inferior regions of the lungs. The table shows the specific metrics. The patient was alive 28 days after illness onset, and the clinical_radiomics model predicted this correctly

Among all demographic characteristics and clinical information (age, sex, primary cause of ARDS, severity of ARDS, presence of sepsis, P/F ratio, and APACHE II score), only the severity of ARDS and the APACHE II score were associated with mortality in our cohort. Mortality increased with severity, but this effect disappeared in the multivariate regression analysis. The final clinical model in this study was therefore based only on the APACHE II score. APACHE II is a scoring system based on 12 physiologic variables obtained after admission, age, and chronic health status. It is widely used in the ICU and is recognized as a robust prognostic marker for ARDS. Even so, its reported predictive performance is only approximately 0.75 as measured by AUC [21].

Researchers continue to explore markers beyond the APACHE II score. A recent paper in Intensive Care Medicine summarized predictive models based on ventilator parameters and reported favorable predictive value, with AUCs ranging from 0.743 to 0.753 [12]. Our study broadly confirmed these results: ventilator models constructed in our cohort with the same parameters yielded AUCs of 0.762 to 0.773, showing general consistency. No statistically significant differences were found between the ventilator models, indicating that none of them has predominant predictive power over the others.

Fig. 6

Chest CT images of a 31-year-old woman admitted to the intensive care unit for acute respiratory distress syndrome with no relevant previous history. The CT images were obtained within 24 h of illness onset. The upper (A), middle (B), and lower (C) lung sections show bilateral diffuse opacities. The table shows the specific metrics. The patient died within 28 days of illness onset, and the clinical_radiomics model predicted this correctly

In this study, we integrated the APACHE II score with radiomics to build a clinical_radiomics model, which showed improved predictive strength compared with the clinical model and the ventilator models. Most of the radiomics features selected in the model reflect the homogeneity and dispersion of pulmonary opacities. For the upper lung, most selected features related to uniformity and entropy, whereas for the middle and lower lungs, first-order features such as skewness, kurtosis, and mean density contributed more. To date, from the perspective of radiographic lung changes, only morphologic phenotypes [22,23,24] (diffuse, focal, and patchy) and opacity density or its proportion [14] have been reported to be associated with ARDS outcome. This study may therefore help to characterize lung opacities quantitatively.
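To make the feature names above concrete, the sketch below computes several first-order features from the HU values inside an ROI, following the definitions in the PyRadiomics documentation: kurtosis is not the excess kurtosis, and entropy and uniformity are computed on a fixed-bin-width histogram with PyRadiomics' default bin width of 25. It is an illustrative re-implementation, not the extraction code used in the study, and it ignores filtering (e.g., wavelet) and resampling.

```python
import numpy as np


def first_order_features(roi, bin_width=25.0):
    """A few PyRadiomics-style first-order features of the HU values inside an ROI."""
    x = np.asarray(roi, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)
    # Fixed-bin-width discretisation: one gray level per 25 HU, counted from the minimum.
    levels = np.floor(x / bin_width) - np.floor(x.min() / bin_width)
    _, counts = np.unique(levels, return_counts=True)
    p = counts / counts.sum()
    return {
        "Mean": x.mean(),
        "Median": np.median(x),
        "InterquartileRange": q75 - q25,
        "Skewness": np.mean(dev ** 3) / m2 ** 1.5,
        "Kurtosis": np.mean(dev ** 4) / m2 ** 2,  # not excess kurtosis
        "Entropy": -np.sum(p * np.log2(p)),       # higher = more heterogeneous
        "Uniformity": np.sum(p ** 2),             # higher = more homogeneous
    }
```

A lung section with widely spread attenuation values (mixed aerated and consolidated parenchyma) yields higher entropy and lower uniformity than one with a narrow attenuation range, which is the sense in which these features capture opacity heterogeneity.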

Using this clinical_radiomics model and a cutoff value, we stratified patients into high- and low-risk groups and observed a significant difference in mortality between them. The cutoff yielded a sensitivity of 92.5%, which is high enough to identify high-risk patients and could enable earlier personalized intervention (Figs. 5, 6).

This study has several limitations. First, the method of ROI delineation may be controversial. We evaluated three two-dimensional CT sections, and three-dimensional segmentation of all lesions and the whole lung would undoubtedly give a more complete representation. Researchers have worked on automatic three-dimensional lung segmentation [25, 26], but its accuracy remains unsatisfactory in the setting of ARDS. Two-dimensional segmentation is much easier to obtain and more readily applied in clinical practice. Second, the retrospective design and selection bias cannot be ignored. Although CT is valid for evaluating ARDS and should be performed as soon as possible after illness onset, critically ill patients may be too unstable to be transported for the examination. The patients included in this study may therefore not be representative of the whole ARDS population. However, the similar distribution of characteristics to other studies suggests comparability, and the generally consistent performance of the various ventilator models in our cohort further supports the study's reliability. Third, semantic features such as interlobular/intralobular septal thickening and the crazy-paving sign might improve the performance of the prediction model. However, most CT images in this study had a slice thickness of 5 mm, which does not allow accurate evaluation of these features.

Conclusions

This study demonstrated that radiomics information from chest CT images can add incremental value in predicting ARDS prognosis. Integrating the APACHE II score with chest CT radiomics yielded a better predictive model. Most of the selected radiomics features were related to homogeneity and dispersion. The middle and lower lungs showed more predictive value than the upper lung. Patients with more heterogeneous and more diffuse pulmonary opacities may have a worse prognosis.