Introduction

Sepsis is defined as life-threatening organ dysfunction caused by a dysregulated host response to infection1. Despite significant advances in treatment, sepsis affects approximately 19–48.9 million people worldwide each year and remains one of the leading causes of death in critically ill patients2. Sepsis causes various complications, such as lung injury, liver injury, kidney injury, myocardial injury and brain injury3,4,5. Acute respiratory distress syndrome (ARDS) is a common complication in patients with sepsis and is characterized by diffuse alveolar injury; patients present with clinical symptoms such as acute respiratory distress, hypoxemia and pulmonary oedema6. The mortality rate of sepsis-induced ARDS is 30–40%, which is higher than that of other types of ARDS7,8. Moreover, when patients progress to severe ARDS, the mortality rate increases to more than 40%.

Figure 5

Random forest and SVM-RFE analysis. (A, B) SVM-RFE analysis plots; the horizontal axis represents the number of feature genes. The best fivefold cross-validation accuracy and error are 0.842 and 0.158, respectively. (Software: R version 4.0.2; R package: e1071 (1.7-13). URL: https://cran.rstudio.com/web/packages/e1071/index.html). (C) Random forest tree. (D) Accuracy and Gini coefficient from the random forest analysis of macrophage-related differentially expressed genes, used to determine gene importance. ((C, D) Software: R version 4.0.2; R package: randomForestSRC (3.2.1). URL: https://www.randomforestsrc.org/, https://ishwaran.org/). (E) Venn diagram of the intersection of the genes screened by the SVM-RFE method and the random forest method.

Differential expression validation in the validation dataset

Differential expression of the genes selected by the random forest and SVM-RFE analyses was examined in the GSE154918, GSE28750 and GSE185263 datasets. In GSE154918, the expression of all 10 genes was significantly different between groups (P < 0.05) (Fig. 6A). In the GSE28750 dataset, the expression levels of SGK1, ANPEP, DYSF and MSRB1 were significantly different between groups (P < 0.05) (Fig. 6B). In the GSE185263 dataset, the expression levels of SGK1, MYD88, DYSF, PLEKHO2, CYTH4 and MSRB1 were significantly different between groups (P < 0.05) (Fig. 6C). Only SGK1, DYSF and MSRB1 showed differential expression in all three validation sets.
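The selection of genes shared by all validation sets can be sketched as a set intersection. The snippet below is only an illustration in Python (the study itself used R), using the gene lists reported in the paragraph above; the function name `shared_genes` is an assumption made for this example.

```python
# Genes reported as significant (P < 0.05) in two of the validation datasets,
# copied from the text above. GSE154918 is omitted because all 10 candidate
# genes were significant there, so it cannot shrink the intersection.
significant = {
    "GSE28750": {"SGK1", "ANPEP", "DYSF", "MSRB1"},
    "GSE185263": {"SGK1", "MYD88", "DYSF", "PLEKHO2", "CYTH4", "MSRB1"},
}


def shared_genes(gene_sets):
    """Return the genes present in every set, sorted for stable output."""
    sets = list(gene_sets.values())
    return sorted(set.intersection(*sets))


validated = shared_genes(significant)
print(validated)  # ['DYSF', 'MSRB1', 'SGK1']
```

The result matches the three genes named in the text as differentially expressed in all three validation sets.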

Figure 6

Gene expression validation. (A) Gene expression validation in the GSE154918 dataset. (B) Gene expression validation in the GSE28750 dataset. (C) Gene expression validation in the GSE185263 dataset. (Software: R version 4.0.2; R package: ggplot2 (3.4.1). URL: https://ggplot2.tidyverse.org/). (D–M) ROC analysis of the value of these genes in ARDS diagnosis. (Software: R version 4.0.2; R package: pROC (1.18.0). URL: http://expasy.org/tools/pROC/).

ROC curve analysis of the diagnostic value of marker genes

To further understand the diagnostic value of marker genes in ARDS, ROC curve analysis was performed using samples from the control and sepsis-induced ARDS groups in the GSE32707 dataset as study samples. The results suggested that all 10 marker genes had a good predictive effect in terms of the diagnosis of ARDS, with an area under the curve greater than 0.65; SGK1 (AUC = 0.791) had the best diagnostic effect, followed by LST1 (AUC = 0.743), MSRB1 (AUC = 0.740) and DYSF (AUC = 0.721) (Fig. 6D–M). All three validated differentially expressed genes showed good diagnostic value for ARDS.
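As a rough illustration of what the AUC measures for a single marker gene, it can be computed as the probability that a randomly chosen case has a higher expression value than a randomly chosen control. The sketch below is in Python rather than the pROC package used in the study, and the expression values are hypothetical, invented only for this example.

```python
def roc_auc(control_values, case_values):
    """AUC as the probability that a random case scores above a random control.

    Ties count as one half. This is the Mann-Whitney formulation of the AUC.
    """
    if not control_values or not case_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical expression values, invented for illustration only.
control = [5.1, 5.4, 5.0, 5.8]
ards = [5.9, 6.2, 5.3, 6.0]
auc = roc_auc(control, ards)
print(auc)  # 0.875
```

Under this formulation, an AUC of 0.5 corresponds to no discrimination, and a value below 0.5 would indicate that the marker tends to be lower in cases than in controls.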

Construction of the nomogram

Based on the validation results of external datasets and ROC analysis, SGK1, DYSF and MSRB1 were used as the basic genes for subsequent analysis. To further understand the diagnostic value of SGK1, DYSF and MSRB1 in ARDS, nomogram analysis was performed, and a model was constructed. The calibration curves suggested that the nomogram model curves had high overlap with the ideal model, indicating that the nomogram model composed of three genes, SGK1, DYSF and MSRB1, showed good diagnostic prediction for ARDS. Moreover, the area under the curve (AUC) was 0.809, demonstrating that the prediction effect of the nomogram model was better than that of each of the three genes (SGK1, DYSF and MSRB1) alone (Fig. 7A,B).

Figure 7

Nomogram and cluster analysis. (A) Nomogram plot. (B) Nomogram model prediction effect plot. (Software: R version 4.0.2; R package: rms (6.5-0). URL: https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms). (C) Matrix heatmap of key gene clustering analysis. (D) Cluster analysis delta area plot. (E) PCA of different clusters after cluster analysis. (Software: R version 4.0.2; R package: ConsensusClusterPlus (1.62.0). URL: https://bioconductor.org/packages/ConsensusClusterPlus/). (F) Sankey plot of the consistency between cluster analysis subtype grouping and sample diagnostic grouping. (G) Analysis of the expression differences of key genes in different clusters.

Clustering analysis

To understand the effect of SGK1, DYSF and MSRB1 on sample clustering, cluster analysis was performed. The results suggested that the samples could be divided into 2 clusters according to the expression values of SGK1, DYSF and MSRB1 (Fig. 7C,D). PCA showed that the two clusters were well separated (Fig. 7E). The Sankey plot showed good agreement between cluster assignment and disease grouping, with Cluster 1 having a high proportion of samples in the control group and Cluster 2 having a high proportion of samples in the ARDS group (Fig. 7F). The Wilcoxon test suggested that SGK1, DYSF and MSRB1 were differentially expressed between Cluster 1 and Cluster 2 (P < 0.05) (Fig. 7G), and the direction of the difference was consistent with that between the control and ARDS groups. These results further demonstrate the diagnostic value of SGK1, DYSF and MSRB1 for ARDS.
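The agreement between cluster assignment and diagnostic group that a Sankey plot displays can be summarized as a cross-tabulation. The following Python sketch shows one way to count it; the sample labels are hypothetical, and the functions `cross_tabulate` and `agreement` are names assumed for this example, not part of the study's R workflow.

```python
from collections import Counter


def cross_tabulate(clusters, groups):
    """Count samples for every (cluster, diagnostic group) pair."""
    if len(clusters) != len(groups):
        raise ValueError("one cluster label and one group label per sample")
    return Counter(zip(clusters, groups))


def agreement(table, mapping):
    """Fraction of samples whose cluster maps to their diagnostic group."""
    total = sum(table.values())
    if total == 0:
        raise ValueError("the table contains no samples")
    matched = sum(n for (cluster, group), n in table.items()
                  if mapping.get(cluster) == group)
    return matched / total


# Hypothetical labels, invented for illustration only.
clusters = ["C1", "C1", "C1", "C2", "C2", "C2", "C2", "C1"]
groups = ["control", "control", "ARDS", "ARDS",
          "ARDS", "control", "ARDS", "control"]

table = cross_tabulate(clusters, groups)
score = agreement(table, {"C1": "control", "C2": "ARDS"})
print(dict(table))
print(score)  # 0.75
```

Each count in the table corresponds to the width of one flow in a Sankey plot; the agreement score is the share of samples in the two matching flows.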

Transcription factors and miRNA prediction of SGK1, DYSF and MSRB1

Based on the prediction results in the JASPAR, HumanTFDB and GTRD databases, a total of 173 transcription factors of SGK1 (Fig. 8A), 140 transcription factors of MSRB1 (Fig. 8B) and 172 transcription factors of DYSF (Fig. 8C,D) were obtained. Based on the prediction results in six databases (miRWalk, RNA22, RNAInter, TargetMiner, TargetScan and miRDB), a total of 28 miRNAs for SGK1 were obtained (see Supplementary Fig. S3A online). No miRNAs for DYSF or MSRB1 were predicted in the TargetMiner database; the prediction results in the remaining five databases yielded a total of 12 miRNAs for MSRB1 (see Supplementary Fig. S3B online) and 21 miRNAs for DYSF (see Supplementary Fig. S3C,D online).

Figure 8

Transcription factor prediction. (A) Venn diagram of the transcription factors of SGK1 predicted by three databases. (B) Venn diagram of the transcription factors of MSRB1 predicted by three databases. (C) Venn diagram of the transcription factors of DYSF predicted by three databases. (Software: R version 4.0.2; R package: VennDiagram (1.7.3). URL: https://cran.rstudio.com/web/packages/VennDiagram/index.html). (D) Interaction of SGK1, MSRB1 and DYSF with predicted transcription factors.

Discussion

Sepsis affects more than 30 million people worldwide each year and is one of the leading causes of death in critically ill patients22. Sepsis-induced ARDS is a type of acute progressive respiratory failure caused by sepsis, and its pathogenesis is mostly focused on the inflammatory response, oxidative stress, and abnormal coagulation23. As innate immune cells, macrophages play a key role in the inflammatory response. It was found that secretory autophagy of alveolar macrophages (AMs) promotes the inflammatory response and lung injury through secreted IL-1β24. Impaired phagocytosis of alveolar macrophages in advanced sepsis further increases the severity of ARDS, and IFN-β treatment reverses the impairment of AM function induced by IL-10 and reduces the severity and mortality of ARDS in a dose-dependent manner25.

The results of the enrichment analysis in this study suggest significant enrichment of multiple pathways. Immune cells play a substantial role in the development and treatment of ARDS. Neutrophils play a major role in the process of leukocyte migration. Neutrophils are the first immune cells to migrate to the site of inflammation after stimulation by chemokines released from damaged lung tissue, and massive activation of neutrophils leads to peripheral tissue damage and lung dysfunction26,27. Eliav et al.28 found that the leukocyte migration inhibitor GT-73 significantly reduced the number of infiltrating leukocytes in LPS-induced ARDS in mice, decreased the levels of cytokines and was protective against ARDS. The reactive oxygen species (ROS)-related pathway was also significantly enriched. In ARDS, the production of ROS is detrimental to lung tissue: ROS can damage lung endothelial cells and lead to impaired alveolar-capillary barrier function29. NF-κB is an important transcription factor that controls the release of proinflammatory mediators, and ROS can also activate NF-κB, which can exacerbate inflammatory lung injury30. Studies have shown that suppressing the production of ROS is beneficial in reducing inflammatory lung injury31.

GSVA suggests significant differences in cytokine/cytokine receptor and Toll-like receptor pathway enrichment levels. In ARDS, multiple receptors of innate immune cells (macrophages, dendritic cells or monocytes), such as Toll-like receptors, recognize PAMPs and DAMPs to induce cytokine release syndrome32. It has been shown that the inflammatory response to LPS-induced ARDS can be attenuated by inhibiting TLR4 expression33. AMs are a major source of the cytokines and chemokines that initiate the immune response, and the overproduction of these proinflammatory cytokines (IL-1β, IL-6, TNF-α, IL-8, etc.) leads to the development of acute lung injury34. Additionally, AMs are a major source of anti-inflammatory cytokines, such as IL-10 and TGF-β, which can suppress the inflammatory response to acute lung injury.

The three key genes screened in this study have been partially studied in lung injury and other diseases. SGK1 is a member of the protein kinase subfamily, a serine/threonine protein kinase with high homology to second messengers such as protein kinase B. As a hub for multiple signal transduction pathways and cellular phosphorylation events, SGK1 plays an important role in cell proliferation, ion channel regulation, signal transduction and other physiological processes and is thought to have an essential function in inflammation35. Studies have shown that SGK1 enhances the function of sodium channels to promote clearance of alveolar oedema fluid in a mouse model of lung injury36,37. Michalick et al. found that SGK1 may provide promising new targets for the prevention or treatment of ventilator-associated lung injury38.

[…] of cluster analysis with sample grouping. The Wilcoxon test was used to analyse the differences in gene expression between the different cluster groupings.

Transcription factors and miRNA prediction of key genes

The promoter sequences of the core genes (from the transcription start site to 2000 bp upstream of it) were obtained from the NCBI database (https://www.ncbi.nlm.nih.gov/gene/). Transcription factor prediction was then performed in the JASPAR database (https://jaspar.genereg.net/), HumanTFDB database (http://bioinfo.life.hust.edu.cn/HumanTFDB#!/) and GTRD database (http://gtrd.biouml.org/#!) based on the obtained promoter sequences. The transcription factors obtained from the three databases were intersected to retain only those predicted by all three databases, and the result was then visualized using Cytoscape software. miRNAs of key genes were predicted in the miRWalk, RNA22, RNAInter, TargetMiner, TargetScan and miRDB databases, and then the miRNAs in each database were overlapped to identify the intersecting miRNAs; the Venn diagram was visualized using Cytoscape software.
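The two steps described above, defining a promoter window relative to the transcription start site (TSS) and keeping only the predictions shared by all databases, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the coordinate, the strand handling with 1-based inclusive positions, the placeholder names `TF_A` to `TF_E`, and both function names are hypothetical and do not come from the study.

```python
def promoter_window(tss, strand, upstream=2000):
    """Return (start, end), 1-based inclusive, covering the TSS and the
    `upstream` bases before it in the direction of transcription."""
    if strand == "+":
        # Upstream of a plus-strand gene lies at smaller coordinates.
        return max(1, tss - upstream), tss
    if strand == "-":
        # Upstream of a minus-strand gene lies at larger coordinates.
        return tss, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def common_predictions(*prediction_sets):
    """Keep only the items predicted by every database."""
    return sorted(set.intersection(*map(set, prediction_sets)))


# Hypothetical coordinate and placeholder names, invented for illustration.
print(promoter_window(10_000, "+"))  # (8000, 10000)
print(promoter_window(10_000, "-"))  # (10000, 12000)

jaspar = ["TF_A", "TF_B", "TF_C"]
humantfdb = ["TF_B", "TF_C", "TF_D"]
gtrd = ["TF_C", "TF_B", "TF_E"]
shared = common_predictions(jaspar, humantfdb, gtrd)
print(shared)  # ['TF_B', 'TF_C']
```

The same intersection applies to the miRNA predictions; for a gene with no results in one database, only the databases that returned predictions would be passed in.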