Introduction

Background

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to a global pandemic since December 2019 [1]. The exponential growth in the number of infected individuals, along with in-hospital mortality of 20–30% [2,3,4], has wreaked havoc on global health care systems. Illness severity for COVID-19 varies drastically between individuals, ranging from asymptomatic or mild illness to severe viral pneumonia with acute respiratory distress or failure [5]. Most patients are categorized as having mild-to-moderate COVID-19, defined as mild symptoms up to mild pneumonia [6]. Nevertheless, these otherwise well-appearing patients may decompensate rather rapidly, and severe lung injury is difficult to predict early in the course of illness [7].

Since the beginning of the COVID-19 pandemic, various prognostic models have been proposed to predict adverse outcomes [8]. However, most of these models were developed for the inpatient setting rather than for the emergency department (ED) [8]. The National Early Warning Score (NEWS), which is based only on patients’ vital signs, is one of the most accurate tools for detecting patient deterioration outside intensive care units (ICUs) [9]. In addition to vital signs, other factors such as age, comorbidities, imaging features and blood tests have also been reported to be associated with COVID-19 severity [8].

Objective

In this study, we aimed to develop and validate a NEWS-based logistic regression model, the COVID-19 ED pneumonia mortality index (CoV-ED-PMI), to predict 1-month mortality in adult COVID-19 patients presenting to the ED with suspected pneumonia. Furthermore, we compared the CoV-ED-PMI with other recommended prognostic tools [8], including CURB-65, a commonly used risk-stratification tool for community-acquired pneumonia [10], and the 4C Mortality Score, a tool recently developed by the Coronavirus Clinical Characterisation Consortium to predict inpatient mortality related to COVID-19 [11].

Materials and methods

Study setting and design

This retrospective study was conducted in the five study hospitals affiliated with Baylor Scott & White Health (BSWHealth), Texas, USA. This study was performed in accordance with the Declaration of Helsinki amendments. The Institutional Review Board approved this study (reference number: 344143) and waived the requirement for informed consent. The results are reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [12].

Study population

All consecutive patients visiting the EDs of the study hospitals between 1st March and 30th November 2020 were screened. Patients fulfilling the following criteria were eligible for inclusion: (1) tested positive for SARS-CoV-2 by quantitative reverse transcriptase polymerase chain reaction (RT-PCR) from samples collected through nasopharyngeal or oropharyngeal swabs; (2) age ≥ 18 years; (3) received chest X-ray (CXR) examination for suspected pneumonia. All patients were followed up for 1 month. If a patient visited the EDs of the study hospitals multiple times, only the data from the first visit were extracted for analysis. Because of the limited capacity for quantitative RT-PCR testing during March and April 2020, SARS-CoV-2 screening in that period was restricted to patients with a contact or travel history or with suspicious laboratory or imaging findings. From May 2020, the decision to perform the RT-PCR test was left to the discretion of the ED clinicians without further limitation. The CXR examination was ordered under clinical suspicion of pneumonia. During the study period, vaccination against COVID-19 was not yet available.

Data collection and outcome measures

All potential candidate variables were extracted from the BSWHealth electronic medical record (EMR) system (Epic, Verona, WI). We used previous reports [13] to help select promising candidate predictors, including basic demographics, comorbidities documented through diagnostic codes linked to ambulatory primary care and specialty encounters, body mass index (BMI) (kg/m²) recorded within the 12 months before the index ED visit, presenting vital signs and supplemental oxygen use recorded at ED triage, the first CXR report during the index ED visit, and blood test results. Since our study design was retrospective in nature, the descriptive terms used by radiologists were not standardized. To account for CXR findings in the analysis, we first randomly selected 10% of the available CXR reports. Second, we recorded the descriptive or diagnostic terms used by radiologists in the pilot-run extraction form and calculated the frequency with which these terms appeared. Finally, we listed the terms that appeared in more than 5% of reports in the formal extraction form, which research assistants blinded to patient outcomes used for extraction. Neutrophil-to-lymphocyte ratio (NLR) was calculated by dividing the percentage of neutrophils by the percentage of lymphocytes. If the percentage of lymphocytes was zero, the value was replaced by 0.5. The NEWS [14], CURB-65 [10] and 4C Mortality Score [11] were computed according to the variables recorded at ED triage.
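The NLR rule described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that "the value" replaced by 0.5 is the lymphocyte percentage (the denominator), which is one plausible reading of the rule.

```python
def neutrophil_lymphocyte_ratio(neutrophil_pct: float, lymphocyte_pct: float) -> float:
    """Return the NLR computed from differential percentages.

    Assumption (not confirmed by the text): a lymphocyte percentage of
    exactly zero is replaced by 0.5 before dividing.
    """
    if neutrophil_pct < 0 or lymphocyte_pct < 0:
        raise ValueError("percentages must be non-negative")
    if lymphocyte_pct == 0:
        lymphocyte_pct = 0.5  # avoids division by zero
    return neutrophil_pct / lymphocyte_pct


# Made-up values, for illustration only
typical = neutrophil_lymphocyte_ratio(80.0, 10.0)
zero_lymphocytes = neutrophil_lymphocyte_ratio(90.0, 0.0)
```

Under this reading, a patient with no measurable lymphocytes receives a very large but finite NLR instead of an undefined value.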

We specified 1-month mortality as the primary outcome, defined as all-cause mortality occurring within 1 month after the index ED visit. We checked the survival status of all included patients through the interconnected EMR system in BSWHealth. Therefore, even if the death occurred at non-study hospitals, the death record could still be identified in the EMR system.

Sample size

Because of the retrospective nature of this study, the number of eligible patients during the study period determined the final sample size. Samples were split chronologically into derivation and validation cohorts at a 70-to-30 ratio to simulate a prospective validation study.
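A chronological 70-to-30 split can be sketched as follows. The function name, the record layout (patient identifier, visit date) and the synthetic visits are assumptions made for illustration; they are not the study data or the authors' code.

```python
from datetime import date, timedelta


def chronological_split(visits, derivation_percent=70):
    """Sort visits by date and assign the earliest share to derivation.

    visits: list of (patient_id, visit_date) tuples (hypothetical layout).
    """
    ordered = sorted(visits, key=lambda visit: visit[1])
    cut = len(ordered) * derivation_percent // 100  # integer arithmetic
    return ordered[:cut], ordered[cut:]


# Synthetic visits spread over 1 March to 30 November 2020 (made-up data)
start = date(2020, 3, 1)
visits = [(i, start + timedelta(days=i % 275)) for i in range(1678)]
derivation, validation = chronological_split(visits)
```

Because the cut is made on the date-ordered list, no patient in the validation part was seen earlier than any patient in the derivation part, which is what lets the later cohort mimic a prospective validation.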

Statistical analysis

Categorical variables are presented as counts with proportions, and continuous variables are presented as medians with interquartile range. Differences between groups were evaluated by chi-square test for categorical variables or Wilcoxon’s rank-sum test for continuous variables.

In the derivation cohort, we performed multivariable logistic regression analyses to estimate the associations between candidate variables and the outcome. We placed all candidate variables in the regression model for variable selection regardless of their significance in univariate analyses. We employed generalized additive models (GAMs) [15] to explore non-linear effects of the continuous variables on the outcome and to identify the optimal cut-off points for transforming these variables into categorical variables. We developed the final prediction model by an iterative stepwise variable selection procedure. We set the significance levels for entry into and retention in the model at 0.15 to avoid excluding potentially relevant variables. The final prediction model was derived by excluding non-significant variables sequentially until all regression coefficients were significant.
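For readers less familiar with logistic regression, the sketch below shows how a fitted model of this kind turns predictor values into a predicted probability. The predictor names, the intercept and every coefficient are hypothetical placeholders chosen for illustration; they are not the CoV-ED-PMI coefficients.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Inverse-logit of the linear predictor of a logistic model."""
    linear_predictor = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients and patient, NOT taken from the fitted model
example_coefficients = {"news": 0.2, "age_per_10_years": 0.5, "bmi_ge_27_5": 0.4}
example_patient = {"news": 5, "age_per_10_years": 7.0, "bmi_ge_27_5": 1}
risk = predicted_probability(-6.0, example_coefficients, example_patient)
```

With these placeholder numbers the linear predictor is about -1.1, which corresponds to a predicted probability of roughly 0.25.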

Missing values in variables such as BMI were considered missing at random and were replaced with values imputed from age and sex by simple linear regression. For variables whose values were considered missing not at random, such as blood tests, we created a binary variable for each blood test to indicate the presence or absence of missing values. This indicator variable was then multiplied by the blood test variable and used during the model-building process.
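The indicator approach for blood tests can be illustrated with a minimal sketch. It assumes that the indicator is coded 1 when a result is available and 0 when it is missing, so that the product is zero for missing results; the text does not state the coding direction, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def encode_blood_test(value: Optional[float]) -> Tuple[int, float]:
    """Return (indicator, indicator * value) for one blood test result.

    Assumed coding: indicator = 1 if the test was measured, 0 if missing.
    """
    if value is None:
        return 0, 0.0  # missing result contributes nothing to the product
    return 1, float(value)


# Made-up LDH values, for illustration only
measured = encode_blood_test(250)
missing = encode_blood_test(None)
```

Coded this way, a patient without the test is not dropped from the analysis, and the regression coefficient of the product term applies only to patients in whom the test was actually performed.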

In the validation cohort, we assessed the discriminative performance of the derived model by the area under the receiver operating characteristic (ROC) curve (AUC). We evaluated model calibration by the Hosmer–Lemeshow goodness-of-fit test and a calibration plot comparing the predicted with the observed risk of 1-month mortality. We also validated the CURB-65 and 4C Mortality Score by calculating their AUCs in the validation cohort. The discriminative performance of the CoV-ED-PMI, CURB-65, and 4C Mortality Score was compared by the DeLong test [16].
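As a reminder of what the AUC measures, the sketch below computes it directly from its definition: the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function and the four example patients are illustrative assumptions; the analysis itself used the R packages named in the next paragraph.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: predicted risks; labels: 1 = died, 0 = survived.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # death ranked above survivor
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up predicted risks for two survivors and two deaths
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In this toy example three of the four death-survivor pairs are ranked correctly, giving an AUC of 0.75. The pairwise loop is quadratic in cohort size and is meant for explanation rather than for large data sets.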

We analyzed the data using R 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) with the packages My.stepwise, VGAM, ROCR, pROC, gbm and ResourceSelection. A two-tailed p-value < 0.05 was considered statistically significant.

Results

The patient selection process is shown in Fig. 1. The final cohort included a total of 1,678 patients. Of these, the derivation cohort consisted of 1,174 patients dated before 15th July, and the validation cohort of 504 patients dated after 16th July 2020.

Fig. 1 Patient inclusion flowchart. ED: emergency department

The characteristics of the patients and all candidate variables in the final cohort, along with comparisons between the derivation and validation cohorts, are presented in Table 1. Overall, the median patient age was 54.9 years, and 840 patients (50.1%) were male. Pneumonia was diagnosed in 925 patients (55.1%) by CXR. The median NEWS, CURB-65 and 4C Mortality Score were 3, 0, and 5, respectively. The median number of missing values per patient was 1, and 867 patients (51.6%) had at least one missing value. A total of 180 patients (10.7%) died within 1 month of the index ED visit. Several differences existed between the derivation and validation cohorts, and mortality was higher in the validation cohort.

Table 1 Characteristics of patients of COVID-19 presenting at emergency departments with suspected pneumonia

The differences between patients stratified by 1-month mortality in the derivation cohort are shown in Table 2. For the derivation cohort, the GAM plots illustrated the association of logit (p), where p represented the probability for 1-month mortality, with age, BMI, NLR, lactate dehydrogenase (LDH), and NEWS (Supplemental Fig. 1). If logit (p) was greater than zero, the odds for sustaining 1-month mortality would be greater than one. Therefore, 27.5 was selected as a cut-off point to transform BMI into a categorical variable used in logistic regression analysis.
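The relation between logit (p) and the odds used above follows directly from the definition of the logit; written out with p as defined in the text:

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p},
\qquad
\operatorname{logit}(p) > 0
\iff \frac{p}{1-p} > 1
\iff p > 0.5
```

A positive logit therefore corresponds to odds above one, that is, to a probability of 1-month mortality above 50%.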

Table 2 Characteristics of patients in the derivation cohort stratified by one-month mortality

As shown in Table 3, the CoV-ED-PMI included nine variables and achieved excellent discriminatory performance in the derivation cohort (AUC: 0.94, 95% confidence interval [CI]: 0.92–0.96). The CoV-ED-PMI is available online at the following URL: https://chou2.chou-tw.com/index.php/predict/. In the validation cohort, the CoV-ED-PMI also demonstrated good discriminatory performance (AUC: 0.83, 95% CI: 0.79–0.87). Nonetheless, the calibration plot (Fig. 2) indicates that the CoV-ED-PMI may overestimate mortality for patients with a predicted mortality above approximately 40% and underestimate it for those with a predicted mortality below approximately 40% (Hosmer–Lemeshow test p-value: < 0.001). In the validation cohort, the discriminatory performance of the CoV-ED-PMI was significantly better than that of CURB-65 (AUC: 0.74, 95% CI: 0.69–0.79, p-value: < 0.001); in contrast, the CoV-ED-PMI did not significantly outperform the 4C Mortality Score (AUC: 0.81, 95% CI: 0.77–0.86, p-value: 0.30) (Fig. 3).
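The idea behind a grouped calibration check can be sketched as follows: patients are ordered by predicted risk, divided into groups of equal size, and the mean predicted risk in each group is compared with the observed death rate. The function name and the toy data are assumptions for illustration and do not reproduce the calibration curve in Fig. 2.

```python
def calibration_by_group(predicted, observed, groups=10):
    """Return (mean predicted risk, observed death rate, size) per group.

    predicted: predicted risks; observed: 1 = died, 0 = survived.
    Patients are sorted by predicted risk and split into equal-size groups.
    """
    paired = sorted(zip(predicted, observed))
    n = len(paired)
    table = []
    for g in range(groups):
        chunk = paired[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_predicted, observed_rate, len(chunk)))
    return table


# Toy data: 20 made-up patients with evenly spaced predicted risks
toy_predicted = [i / 20 for i in range(20)]
toy_observed = [0] * 10 + [1] * 10
toy_table = calibration_by_group(toy_predicted, toy_observed)
```

A group whose mean predicted risk exceeds its observed death rate indicates overestimation in that risk range, and the reverse indicates underestimation.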

Table 3 The CoV-ED-PMI model
Fig. 2 Calibration curve for the CoV-ED-PMI predicting 1-month mortality in the validation cohort. The dots on the X-axis divide the validation cohort into 10 subgroups with equal numbers of patients. Red curve: calibration curve; gray area: 95% confidence interval

Fig. 3 Comparison of ROC curves for three different prediction models. ROC: receiver operating characteristic; AUC: area under ROC curve

Discussion

Main findings

We developed and validated a logistic regression model, the CoV-ED-PMI, to assist physicians in estimating the probability of 1-month mortality of COVID-19 patients assessed in the ED with suspected pneumonia, aiming to assist the clinician in the safe disposition of these patients. This model included nine common variables and achieved good discriminative performance in the validation cohort. This model outperformed CURB-65 and was similar to 4C Mortality Score in discriminative performance.

Comparison with previous studies

After reviewing more than 100 prognostic models, Wynants et al. [8] indicated that CURB-65 and the 4C Mortality Score were two of the most recommended tools for predicting mortality in inpatients with COVID-19. The highest AUC achieved by CURB-65 in previous external validation studies for inpatients with COVID-19 was 0.84 [17]. However, in our validation cohort, the AUC of CURB-65 was only 0.74. The different case mix between hospitalized patients and patients presenting to the ED may render CURB-65 unfit for use in the ED. As for the 4C Mortality Score [11], which was specifically designed for COVID-19, its discriminative performance in our validation cohort (AUC: 0.81) was even higher than that in the original study (AUC: 0.77). The 4C Mortality Score study [11] enrolled patients hospitalized between February and June 2020 with a high likelihood of SARS-CoV-2 infection but without confirmation by RT-PCR, a population quite different from our cohort. Even so, the results of the current study suggest that the 4C Mortality Score may also be applied to COVID-19 patients presenting to the ED with suspected pneumonia.

Interpretation of current analysis

Distinct from most predictive models that analyze each vital sign separately, the CoV-ED-PMI employs a composite score of vital signs, i.e., NEWS, to assist in risk stratification for COVID-19 patients presenting at the ED with suspected pneumonia. For general ED patients, NEWS had been shown to accurately predict both in-hospital mortality and ICU admission [9]. For COVID-19 patients, Covino et al. [18] indicated that NEWS was the most accurate early warning score in predicting ICU admission within 48 h and 7 days from ED admission. We selected NEWS as the foundation of CoV-ED-PMI not only because of emergency physician (EP) familiarity with the scoring tool, but also for its validated prognostic performance. However, given that NEWS only uses initial vital signs assessed at ED triage, it is less suited to predict the full trajectory over a period of 1 month. As such, we added variables commonly available at ED visits to improve the prognostic performance of NEWS.

Many studies have demonstrated that age and comorbidities are associated with mortality from COVID-19 [13]. Similarly, in the CoV-ED-PMI, age, congestive heart failure, chronic kidney disease, hepatitis, and history of transplantation were identified as significant predictors. Besides these comorbidities, several studies have indicated an association between obesity and increased risk of mechanical ventilation, severe pneumonia, and death with COVID-19 [19, 20]. Obesity was also used as a predictor of poor outcome in the 4C Mortality Score [11]; nonetheless, the thresholds for obesity were not defined explicitly. In our model, the GAM plot revealed that not only obesity (BMI > 30 kg/m²), but also the upper range of overweight (BMI between 27.5 and 30 kg/m²) may be associated with increased 1-month mortality.

Activation of multiple inflammatory pathways has been noted in COVID-19 patients [21]. In the CoV-ED-PMI, NLR and LDH were selected as the representative biomarkers of inflammation. Patients with severe COVID-19 may present with lymphopenia [22]. Therefore, NLR could indicate both the general inflammatory status and the underlying severity of COVID-19. The normal range of NLR has been reported to be between 0.78 and 3.53 [23], but no studies have reported an absolute value of NLR to define the severity of COVID-19. As shown in Supplemental Fig. 1, after logit transformation, there was a near-linear association between NLR and the probability of 1-month mortality. Therefore, NLR was treated as a continuous variable in our model. LDH, on the other hand, is a ubiquitous enzyme found in nearly all living cells that is released during tissue injury, and thus its plasma concentration increases during various pathologic processes. For COVID-19, LDH has also been reported as a prognostic biomarker [24]. In our analysis, after logit transformation, LDH concentration was also proportional to the probability of 1-month mortality (Supplemental Fig. 1) and was thus also treated as a continuous variable in our model.

Interestingly, no CXR features were identified as significant predictors in the CoV-ED-PMI. A CXR severity score, quantified by counting the involved lung areas, was reported to be predictive of the risk of hospital admission and intubation in COVID-19 patients [25]. In contrast, in our study, only qualitative data, i.e., findings in the radiologists’ reports, were analyzed, and the CXR report forms were not prospectively designed with uniform definitions. This may have introduced heterogeneity, leading to the null association between CXR features and outcomes.

Future implications

The COVID-19 pandemic is threatening the capacity of EDs around the world. A better understanding of the factors associated with disease severity can help contain the spread of the pandemic. Approximately 81% of COVID-19 patients were categorized as mild-to-moderate severity (mild symptoms up to mild pneumonia) [6], and most of them did not need specific treatments or hospitalization [26]. Because of limited hospital capacity, EPs need to stratify patients into different risk categories and identify those at highest risk of disease progression for hospitalization. Currently, EPs are faced with a lack of validated risk assessment tools to assist in the disposition of COVID-19 patients with suspected pneumonia, since most previous prediction models [8] were derived from patient data obtained early in the pandemic. Moreover, studies [27] indicate that there was significant temporal variation in in-hospital mortality of COVID-19, suggesting that early experience in dealing with COVID-19 patients may not be applicable to those diagnosed later. Therefore, we split our cohort chronologically into an earlier derivation cohort and a later validation cohort. The unadjusted mortality of the validation cohort was about twice that of the derivation cohort. This significant difference in mortality may explain the modest calibration of the CoV-ED-PMI in the validation cohort, despite its good discriminative performance. As such, when using the CoV-ED-PMI to compute the risk of 1-month mortality, clinicians should be aware of the potential for over- and underestimation.

Most of the previous models [8] enrolled COVID-19 inpatients and aimed to predict the adverse outcomes of these patients after ED disposition. Conversely, the CoV-ED-PMI is intended to assist EPs in pre-disposition assessment and risk stratification. Furthermore, we focused only on COVID-19 patients with suspected pneumonia because most COVID-19 patients without symptoms or with only mild illness can receive minimal, symptomatic treatment [26]. The CoV-ED-PMI requires only basic demographics, comorbidities recorded in the EMR, vital signs measured at triage and two commonly performed blood tests, and may be computed quite easily with an online calculator. The similar discriminative performance of the CoV-ED-PMI and the 4C Mortality Score (which was developed from a large database) may assure EPs of the utility of the CoV-ED-PMI in clinical use. However, the effects of vaccination against COVID-19 were not considered in our model. To accommodate such changes, the CoV-ED-PMI needs periodic validation and updating to keep up with the rapidly changing pandemic.

Study limitations

First, missing data are common during pandemics of emerging infectious diseases because of uncertainty in standard management, and our study is no exception. Complete case analysis may lead to exclusion of a substantial proportion of the available subjects, thereby leading to a loss of precision and power [28]. To deal with missing data, multiple imputation was applied to replace the missing values given that the missingness was caused by missing completely at random or missing at random [12]. Nonetheless, most missing blood test results in our study were likely due to a lack of clinical indication, which may be associated with the severity of COVID-19 and clinical outcomes. For these data, which were missing not at random, we used an indicator variable for each blood test. With a better understanding of COVID-19, the indications for blood tests may also change. Nonetheless, we validated our model in a chronologically split validation cohort, which may mitigate concerns about the generalizability of our model to some extent. Second, there was no strict definition of COVID-19 patients with suspected pneumonia. Only 55.1% of patients had a radiological diagnosis of pneumonia, with the remainder diagnosed clinically (Table 1). The different reporting styles of radiologists or the qualitative nature of the extraction may in part explain this low proportion. On the other hand, although radiological examinations are not required for the diagnosis of pneumonia [29], when faced with an emerging disease like COVID-19, agreement on the clinical diagnosis of COVID-19-related pneumonia may be affected by the different diagnostic criteria used by different EPs, which may introduce heterogeneity and influence the applicability of our model. This could only be resolved through a prospective study design with pre-specified diagnostic and inclusion criteria.

Conclusions

The CoV-ED-PMI was developed and validated using clinical data from COVID-19 patients in the ED to predict 1-month mortality. This free-to-use tool has been launched online and can be easily applied in clinical settings. Requiring only ED vital signs and some common variables, it has the potential to help EPs with the disposition of COVID-19 patients with suspected pneumonia. Nonetheless, the predicted 1-month mortality is not the only important piece of information in the decision-making process; the final disposition should still be made considering other factors, such as patients’ wishes and facilities’ capabilities.