Introduction

Due to its confined and locally aggressive growth, glioblastoma multiforme (GBM) is one of the most prevalent malignant tumors globally, with high morbidity and mortality [1]. It is also the most common primary intracranial tumor [2]. The prognosis of GBM is dismal, with fewer than 5% of patients surviving more than 5 years after diagnosis. Research advances have yielded remarkable results in exploring the molecular pathogenesis of glioma, such as isocitrate dehydrogenase (IDH) status [3] and O6-methylguanine DNA methyltransferase promoter (MGMTp) methylation [4]. These findings have improved diagnosis, classification systems, and precision therapy. However, although IDH mutations are associated with longer survival in patients with gliomas, IDH-mutant gliomas are prone to frequent recurrence [5]. Therefore, further research is essential for identifying new molecular targets, improving prognostic assessment, and developing therapeutic options. Only four drugs, namely bevacizumab, temozolomide, lomustine, and carmustine, have been approved by the US Food and Drug Administration (FDA) to treat GBM [6]. Although these adjuvant drugs and surgical treatments have improved the prognosis of glioma patients to some extent, the overall survival (OS) of patients remains very low [7], partly because the mechanisms of the tumor microenvironment and immune evasion are not fully understood and because high-grade gliomas are spatially and temporally heterogeneous. In addition, different cells have different mutational characteristics [4]. Most important is the blood-brain barrier (BBB), a dynamic interface between blood and brain tissue that selectively restricts the passage of substances. By strictly regulating the homeostasis of the central nervous system, the BBB also hampers the effectiveness of antitumor chemotherapeutic agents [8].

In recent years, many studies have used traditional bulk RNA sequencing data to explore potential prognostic markers for GBM and improve our understanding of tumorigenesis and progression. For example, a prognostic model was developed based on 5 ferroptosis-related genes to predict survival and response to immunotherapy in GBM patients [S1A). The FindVariableFeatures function was used to screen for highly variable genes based on the normalized expression data. The top 2000 highly variable genes are shown in Fig. S1B. After dimensionality reduction by PCA and UMAP, we identified 13 different cell clusters (Fig. 1A, B), after which we used the “SingleR” package for cluster annotation and UMAP to visualize the annotated cell types. We identified four cell types: endothelial cells (ECs), monocytes, macrophages, and neutrophils (Fig. 1C). After applying Fisher’s exact test, ECs were identified as the most important cell type, and ReactomeGSA functional enrichment analysis showed that these cell types were mainly involved in classical antibody-mediated complement activation, transmembrane transport, serotonin receptors, and cardiolipin (CL) synthesis (Fig. 1D).
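
The screening for highly variable genes can be illustrated with a minimal sketch. It ranks genes by the plain variance of their normalized expression, which is a simplifying assumption and not the variance-stabilizing procedure that FindVariableFeatures applies; the function name `top_variable_genes` and the toy expression values are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expr, n_top=2000):
    """Return the n_top gene names with the largest variance across cells.

    expr maps a gene name to its normalized expression values (one per cell).
    Ranking by plain variance is an assumption made for illustration only.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n_top]


# Toy data: three hypothetical genes measured in four cells.
toy_expr = {
    "GENE_A": [0.0, 5.0, 0.0, 5.0],  # strongly variable
    "GENE_B": [1.0, 1.0, 1.0, 1.0],  # constant
    "GENE_C": [2.0, 3.0, 2.0, 3.0],  # mildly variable
}
selected = top_variable_genes(toy_expr, n_top=2)
print(selected)
```

With real data, `n_top=2000` would correspond to the number of highly variable genes retained in the text.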

Next, the “monocle” R package was used to determine the cell trajectories and pseudotime distributions of two cell types that significantly differed in tumor and normal samples. We observed neutrophils corresponding to state 1 and ECs corresponding to states 2 and 3 (Fig. 1E-G).

Finally, we calculated the contribution of genes during cell development and selected the top 100 genes for visualization (Fig. S2A). Cell-cell communication networks were inferred by calculating the likelihood of communication (Fig. S2B). In addition, we predicted the cell-cell communication network for the relevant ligand-receptor pairs, finding that MSTN-TGFBR1-ACVR2A (Fig. S2C), WNT7B-FZD4-LRP6 (Fig. S2D), and others played a crucial role in the communication network of ECs.

Fig. 1

Different cluster annotations and cell type identification in GBM 10× scRNA-seq data. A-C Cluster annotation and cell type identification by UMAP. D Functional enrichment analysis of identified hub cell types using the “ReactomeGSA” package. E-G Cell trajectory and pseudotime analysis for the identified hub cell types. GBM, glioblastoma multiforme; scRNA-seq, single-cell RNA sequencing; UMAP, uniform manifold approximation and projection

Identification of DEGs in bulk RNA-Seq data

We performed a differential analysis of tumor and normal tissues from the TCGA-GBM and GTEx cohorts, identifying 3911 DEGs. Of these, 2021 genes were up-regulated in tumors and 1901 genes were down-regulated (Fig. 2A).

Next, we used WGCNA to identify DEGs involved in the development of GBM in the TCGA cohort. During co-expression network construction, we observed that the soft thresholding power β was 6 when the scale-free topology fit index reached 0.90 (Fig. 2B, C). Eight modules were identified based on average linkage hierarchical clustering and the soft thresholding power (Fig. 2C, D). Based on correlation coefficients and p-values, we observed that the turquoise and brown modules were significantly associated with GBM development. We then took the intersection of the EC marker genes and the WGCNA module genes and selected 157 genes to construct an expression matrix for further analysis (Fig. 2E).
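
As a rough illustration of what the soft thresholding power does, the sketch below raises the absolute Pearson correlation between gene expression profiles to the power β = 6 to obtain a weighted adjacency, and sums the adjacencies to obtain each gene's connectivity. It assumes an unsigned network, which the text does not specify; the function names and the toy profiles are hypothetical, and this is not the WGCNA implementation itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair."""
    genes = list(profiles)
    return {
        (gi, gj): abs(pearson(profiles[gi], profiles[gj])) ** beta
        for gi in genes
        for gj in genes
        if gi != gj
    }


def connectivity(adjacency, gene):
    """Sum of the adjacencies between one gene and all other genes."""
    return sum(a for (gi, _), a in adjacency.items() if gi == gene)


# Toy expression profiles for three hypothetical genes across four samples.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with g1
    "g3": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with g1
}
adjacency = soft_threshold_adjacency(toy_profiles, beta=6)
print(adjacency[("g1", "g3")], connectivity(adjacency, "g1"))
```

Raising correlations to a power suppresses weak correlations much more than strong ones, which is why a larger β pushes the network toward scale-free topology while lowering average connectivity.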

Finally, we performed a univariate Cox regression analysis to identify potential prognostic factors for GBM in the TCGA cohort. A total of 28 genes were identified as being prognostically associated (Fig. 2F).

Fig. 2

TCGA cohort differential analysis and WGCNA identification of hub genes in GBM development. Volcano plot of up- and down-regulated DEGs in the TCGA-GTEx cohort. Scale-free fit indices for soft threshold powers. The soft threshold power β in WGCNA was determined by the scale-free R² (R² = 0.90). The left panel shows the relationship between β and R². The right panel shows the relationship between the soft threshold power β and average connectivity. Dendrogram of DEGs based on clustering of different metrics. Heat map illustrating the correlation between different gene modules and clinical features (normal vs. tumor). Venn diagram of WGCNA module genes and endothelial cell marker genes. Forest plot of the results of the univariate Cox analysis of 157 intersecting genes. DEGs, differentially expressed genes; TCGA, The Cancer Genome Atlas; GBM, glioblastoma multiforme; WGCNA, weighted gene co-expression network analysis

Identification of different molecular subtypes

Based on the results of the univariate analysis, all patients were divided into two groups using the NMF algorithm (Fig. 3A). Sankey plots were used to investigate the relationship between immune subtypes and the NMF groupings. The tumor samples were assigned to immune subtypes according to the GSVA enrichment scores of 5 immune gene sets (wound healing, macrophages, lymphocyte, IFN-gamma, and TGF-beta), with each immune subtype representing a specific immune microenvironment. The results showed that all patients in group 1 were classified as the immune C4 (lymphocyte depleted) subtype. Due to the malignant nature of gliomas, most of the patients in group 2 were also classified as the immune C4 (lymphocyte depleted) subtype, while only a few were classified as the immune C1 (wound healing) or immune C6 (TGF-beta dominant) subtype (Fig. 3B). Patients in group 2 had better OS than patients in group 1 (Fig. 3D). The MCPcounter algorithm was used to estimate the infiltration of immune cells in the two groups. The infiltration level of cytotoxic lymphocytes was significantly higher in group 2, whereas the infiltration level of fibroblasts was higher in group 1 (Fig. 3C).

After differential analysis of gene expression between the two groups, 56 genes were down-regulated and 12 genes were up-regulated in cluster 2 (Fig. 3E). Finally, the R package “clusterProfiler” was used to perform GO and KEGG enrichment analysis of these DEGs, which were associated with a variety of terms, including “wound healing” and “negative regulation of hydrolase activity” in the biological process (BP) category, “collagen-containing extracellular matrix” in the cellular component (CC) category, and “endopeptidase inhibitor activity” in the molecular function (MF) category (Fig. 3F). They were also associated with HIF-1, P53, and signaling pathways in diabetes (Fig. 3G). HIF-1 promotes the survival of glioma cells and drives angiogenesis [31]. P53 is a tumor suppressor gene whose mutation can affect secondary GBM [32].

Fig. 3

Identification of different subtypes. The NMF algorithm identified two different subtypes. Sankey plots show the association between the NMF subtypes and immune subtypes. Differences in TME between different subtypes. Kaplan-Meier curves for overall survival in different GBM subgroups (log-rank test, P value = 0.03). Heat map of the top 50 genes with the largest |log2FC| in the differential analysis between subtypes. F, G GO and KEGG enrichment analysis of DEGs. NMF: non-negative matrix factorization; OS: overall survival; KEGG: Kyoto Encyclopedia of Genes and Genomes; GO: Gene Ontology

Prognostic model construction and validation

We performed LASSO regression analysis, which can effectively reduce features in high-dimensional data and optimize predictors of clinical outcomes, on the 28 prognostic genes (Fig. 2F), identifying seven genes (ANXA2, TUBA1C, RPS4X, PMP22, PDIA4, KDELR2, and SLC40A1) at this step (Fig. 4A). Ultimately, multivariate Cox analysis identified four genes as independent prognostic factors: TUBA1C, RPS4X, KDELR2, and SLC40A1. Based on their coefficients, we calculated risk scores using the following formula: risk score = (0.41 × expression level of TUBA1C) + (-0.65 × expression level of RPS4X) + (0.60 × expression level of KDELR2) + (-0.39 × expression level of SLC40A1). All patients were divided into high- and low-risk groups according to the median risk score. Survival curves showed that patients in the high-risk group had worse OS than those in the low-risk group (Fig. 4B, P < 0.001). Furthermore, the risk score performed well in predicting OS in the TCGA cohort (Fig. 4C; AUCs for 1-, 3-, and 5-year OS: 0.655, 0.774, and 0.955). Similar results were observed in the CGGA cohort (Fig. 4B, C; AUCs for 1-, 3-, and 5-year OS: 0.587, 0.643, and 0.701). The risk plots show detailed survival outcomes for each patient in the TCGA cohort and the CGGA external validation cohort, with patients in the high-risk group mostly having poor outcomes. The heat map shows the differences in expression of the four model genes between the risk groups (Fig. 4D), with TUBA1C and KDELR2 having higher expression in the tumor tissues, while RPS4X and SLC40A1 showed the opposite tendency. In summary, the endothelial hub gene risk model had the best prognostic efficacy in GBM patients.
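
The risk score and the median split can be written out as a short sketch. The coefficients are those reported in the formula; the patient identifiers and expression values are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not state how ties are handled.

```python
from statistics import median

# Coefficients of the four model genes as reported in the risk score formula.
COEFFICIENTS = {
    "TUBA1C": 0.41,
    "RPS4X": -0.65,
    "KDELR2": 0.60,
    "SLC40A1": -0.39,
}


def risk_score(expression):
    """Weighted sum of the four gene expression levels for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median.

    Scores equal to the median are labelled 'low' (an assumption).
    """
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}


# Hypothetical expression levels for four patients.
patients = {
    "P1": {"TUBA1C": 1.0, "RPS4X": 1.0, "KDELR2": 1.0, "SLC40A1": 1.0},
    "P2": {"TUBA1C": 2.0, "RPS4X": 1.0, "KDELR2": 2.0, "SLC40A1": 1.0},
    "P3": {"TUBA1C": 1.0, "RPS4X": 2.0, "KDELR2": 1.0, "SLC40A1": 2.0},
    "P4": {"TUBA1C": 3.0, "RPS4X": 1.0, "KDELR2": 3.0, "SLC40A1": 1.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores, groups)
```

Because TUBA1C and KDELR2 carry positive coefficients and RPS4X and SLC40A1 carry negative ones, higher expression of the first pair raises the score and higher expression of the second pair lowers it.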

Next, we performed principal component analysis (PCA) to further validate the grouping ability of the four DEGs. PCA was used to compare the separation of the high- and low-risk groups based on the whole gene expression profile with that based on the expression profile of the four model genes. The results showed that samples from both risk groups were diffusely distributed when the whole gene expression profile was used (Fig. 4E), whereas the expression of the four DEGs included in the prognostic risk model clearly separated the samples into two risk clusters (Fig. 4F).

Fig. 4

Development and validation of a prognostic model for GBM patients. A LASSO analysis with 10-fold cross-validation identified four prognostic genes. B, C Survival curves and ROC curves for evaluating the risk stratification and predictive ability of the constructed risk models in the TCGA and CGGA cohorts. D Risk maps illustrate the survival status of each sample in the TCGA and CGGA cohorts; heat maps represent the differences in expression of each gene between the risk groups. E, F Principal component analysis between the high- and low-risk groups in the entire TCGA and CGGA sets. GBM, glioblastoma multiforme; DEGs, differentially expressed genes; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic curve

Clinical features between the high-risk group and low-risk group

Univariate and multivariate Cox analyses revealed that the risk score could be an independent prognostic factor for GBM patients compared with other common clinical characteristics (Fig. 5A). Based on the risk regression model of the TCGA cohort, age, sex, race, and risk score were incorporated into a nomogram to predict the overall survival of patients with GBM. In the nomogram, the risk score based on the endothelial cell hub genes had better predictive power than the other clinicopathological features. The calibration curves also demonstrated acceptable agreement between actual and predicted survival at 1, 2, and 3 years (Fig. 5B), indicating that the risk model constructed from the endothelial cell hub genes is reliable and can predict the prognosis of GBM patients well. The area under the curve (AUC) at 1, 3, and 5 years for the risk group was higher than the AUC for the other clinicopathological features (Fig. 5C, Fig. S3), and the time-dependent C-index values for the risk group were similarly higher than those for the other features (Fig. 5D). These results suggested that the prognostic performance of the four-gene signature was quite reliable. The histogram of the chi-square test showed that the high-risk grouping was only associated with the mutational status of IDH (Fig. 5E).
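
For readers unfamiliar with the concordance index, the following sketch computes a plain Harrell's C-index on toy data. It is a simplified, time-independent version of the time-dependent C-index mentioned above, it ignores tied survival times, and the function name and input values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event strictly before
    the follow-up time of patient j. It is concordant when patient i also has
    the higher risk score; tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: follow-up time, event indicator (1 = death), and risk score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.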

GBM patients were also grouped by age, gender, and IDH mutation status to investigate the relationship between the risk signature and prognosis within these clinicopathological subgroups. Across these subgroups, patients in the low-risk group of the TCGA and CGGA cohorts had significantly longer OS than those in the high-risk group (Fig. 6A-L). The differing results for the TCGA > 60 years subgroup and the female subgroup may be due to the poor prognosis of GBM and the limited number of patients. These results suggest that the risk signature may also predict the prognosis of GBM patients of different ages, genders, and IDH statuses.
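
The survival curves compared in these subgroups are Kaplan-Meier estimates. A minimal sketch of the estimator is given below, assuming right-censored data with one follow-up time and one event indicator per patient; the function name and the toy values are hypothetical, and confidence intervals and the log-rank test are omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    At every event time t, the running survival probability is multiplied by
    (1 - deaths at t / patients still at risk at t). Censored patients leave
    the risk set without lowering the curve.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy subgroup: follow-up in months and event indicator (1 = death, 0 = censored).
toy_followup = [2, 4, 4, 6, 8]
toy_status = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_followup, toy_status)
print(km_curve)
```

Running the function separately on the high- and low-risk patients of one subgroup yields the two curves that are compared in each panel.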

Fig. 5

Prognostic value of endothelial cell expression-related signatures in the TCGA cohort. Univariate and multivariate Cox analysis for the risk score and clinical features (including age, race, gender, and IDH status). Nomogram combining the risk score and clinical features to predict 1-, 2-, and 3-year survival rates. The calibration curves test the consistency between the actual outcome and the predicted outcome at 1, 2, and 3 years. AUC values for the risk group and clinical features at three years. The concordance index (C-index). Bar chart of clinical characteristics in the high- and low-risk groups

Fig. 6

Kaplan-Meier survival curves for the low and high-risk groups in the TCGA and CGGA cohorts sorted by different clinicopathological variables. A, B Age, C, D sex, and E, F IDH mutations in the TCGA cohort. G, H Age, I, J sex, and K, L IDH mutations in the CGGA cohort

Mutation, immune function, enrichment analysis, and drug treatment analysis between the high-risk group and low-risk group

Next, we generated two waterfall plots to explore the detailed gene mutations in the high-risk and low-risk groups, finding that TP53, TTN, and PTEN were the most commonly mutated genes in both groups (Fig. 7A, B). We then downloaded the immune cell infiltration data for the TCGA cohort from TIMER 2.0. Spearman correlation analysis revealed a correlation between risk scores and the abundance of immune cells in the GBM tumor microenvironment estimated by various algorithms. For example, B cells in the CIBERSORT, XCELL, and TIMER results were negatively correlated with risk scores (Fig. 7C). Using correlation heat maps, we then investigated the correlation of the expression levels of the four model genes and of the risk score with genes associated with common immune checkpoint inhibitors (ICIs). The results showed that higher risk scores were significantly associated with the upregulation of CD276, CD274, and CD44 (Fig. 7D). We also explored the correlation between risk score and tumor mutational burden (TMB) and the difference in TMB between the risk groups (Fig. S4A), finding no significant association between risk score and TMB. Finally, using the R package “estimate”, we found no significant differences in stromal and immune scores between the high- and low-risk groups (Fig. S4B).
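
The Spearman correlation used here is the Pearson correlation of the ranks of the two variables. The sketch below shows the calculation on toy data; the function names and the risk score and B-cell abundance values are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical risk scores and estimated B-cell abundances for five samples.
toy_risk = [0.1, 0.5, 0.9, 1.4, 2.0]
toy_b_cells = [0.30, 0.25, 0.20, 0.22, 0.05]
rho = spearman(toy_risk, toy_b_cells)
print(rho)
```

A negative rho, as in this toy example, corresponds to the pattern described for B cells, where abundance falls as the risk score rises.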

We applied TCIA to predict the susceptibility of patients with high- and low-risk scores to immunotherapy. Neither programmed cell death protein 1 (PD-1) nor cytotoxic T-lymphocyte antigen 4 (CTLA4) treatment showed a significant difference between the risk groups (Fig. S4C), probably due to the very poor prognosis of GBM. We predicted the IC50 of all chemotherapeutic agents in the high- and low-risk score groups, finding that most of the agents such as AKT inhibitors and pabucirib exhibited a higher IC50 in patients with high-risk scores, thus suggesting that patients with high-risk scores may be more sensitive to these agents (p all < 0.05; Fig. S4D). In addition, we performed GSEA enrichment analysis between the TCGA high- and low-risk datasets to assess the biological function of these genes. Using the gene set database MSigDB Collections (c2.cp.kegg.v7.4.symbols.gmt), we selected the eight most significantly enriched signaling pathways based on normalized enrichment scores (NES) and P values (< 0.001) (Fig. 7E). The p53 signaling pathway, cell cycle, DNA repair, and regulation of the actin cytoskeleton were enriched in the high-risk group, whereas Parkinson’s disease, ribosome, Alzheimer’s disease, and neuroactive ligand-receptor interaction were enriched in the low-risk group.

Fig. 7

Mutation and immune correlation analysis based on risk score models. A, B Waterfall plots summarizing mutations in high- and low-risk populations. Bubble plot of immune cells in the risk groups. Heat map showing the correlation between immune checkpoint genes and TUBA1C, RPS4X, KDELR2, SLC40A1, and risk scores. Gene set enrichment analysis of the top 8 pathways significantly enriched in the risk groups

Discussion

Angiogenesis is the growth of new capillaries from pre-existing vessels. GBM is a highly vascularized tumor, and the growth of glioma is extremely dependent on the formation of new blood vessels [33]. Endothelial cells (ECs) dynamically modify their behavior during angiogenesis, eventually leading to changes in differentiation, proliferation, migration, polarity, metabolism, and cell-cell communication. These modifications are assumed to integrate many external inputs; however, they also govern the ability of ECs to respond to environmental stimuli, such as up- or down-regulation of surface receptor expression [34]. In recent studies, the TAM-derived factor SEMA4D has been shown to promote pericyte recruitment in neovascularization and cellular communication between glioma stem cell-derived perivascular cells and ECs, thus directly contributing to vascular stability in gliomas [35]. Research on ECs and angiogenesis suggests that FAK proteins may increase angiogenesis in gliomas by triggering endothelial cell migration. High-grade gliomas have higher FAK expression than low-grade gliomas and are associated with poorer survival [36]. As a result, anti-angiogenic therapies targeting ECs, which include inhibiting the proliferation of gliomas through angiogenesis-inhibiting factors and drugs that inhibit the formation of new tumor blood vessels, have gained increasing interest among researchers [37].

Characterization of ECs in normal brain tissue and GBM based on bulk RNA-seq data is often limited [38]. In studies of ECs, it is often impossible to infer the effects of other cell types on account of GBM cell heterogeneity. In this study, we characterized brain and GBM endothelial cells in more detail by integrating 10× scRNA-seq and bulk RNA-seq data and used the marker genes of ECs to build a prognostic model for GBM patients, finding that the constructed prognostic model could effectively classify patients in the TCGA and CGGA cohorts into high- and low-risk groups. In addition, we explored survival status, clinical relevance, mutational status, and tumor immune infiltration in the different groups. Our results showed that higher risk scores were associated with a poorer prognosis, a lower frequency of IDH mutations, and upregulation of immune checkpoints such as PD-L1. Therefore, patients with higher risk scores may be more likely to benefit from immunotherapy. In addition, we identified two different subtypes using the NMF algorithm. All patients in cluster 1 were of the immune C4 subtype, which was associated with a worse prognosis [39]. We observed that the two subtypes differed in prognosis and TME components. Group 1 was associated with poorer clinical outcomes and high infiltration levels of fibroblasts, whereas group 2 was associated with better clinical outcomes and high infiltration levels of cytotoxic lymphocytes. For example, fibroblasts can support tumor growth by consuming glucose [40].

We first identified marker genes of ECs by single-cell sequencing and then used LASSO and Cox regression analysis to identify four hub genes, i.e., TUBA1C, RPS4X, KDELR2, and SLC40A1, for the prognostic model. TUBA1C is an isoform of α-tubulin that serves as a core component of the eukaryotic cytoskeleton and promotes cell division, formation, motility, and intracellular trafficking [41, 42]. In addition, the biological functions of tubulins have been linked to cancer development, neurodevelopment, and neurodegenerative diseases [43]. In a recent study, TUBA1C expression was reported to be significantly higher in gliomas than in normal brain tissue and to indicate a poorer prognosis. In addition, knockdown of TUBA1C inhibited the proliferation and migration of glioma cells, leading to apoptosis and G2/M phase arrest [44]. Studies on the oncogenic ribosomal protein S4 X-linked (RPS4X) have found that cisplatin resistance increases after RPS4X is depleted with specific small interfering RNAs. RPS4X is associated with ovarian cancer stage, and its low expression is also associated with poor survival and disease progression [45]; however, there are still no reports on RPS4X in glioma. In hepatocellular carcinoma, RPS4X is required for SLFN11 inactivation in the mTOR signaling pathway [46]. Interestingly, the KDEL receptor KDELR2 has also been linked to HIF1a and the mTOR signaling pathway in promoting glioblastoma growth [47]. KDELR2 knockdown reduces cell viability, promotes G1 phase cell cycle arrest, and induces apoptosis. Furthermore, KDELR2 can regulate cellular function in glioma cells by targeting CCND1 [48]. Solute carrier family 40 member 1 (SLC40A1) is a gene encoding an iron transporter protein. Previous studies in multiple myeloma and ovarian cancer have shown that SLC40A1 inhibits tumor cell growth and reduces resistance to chemotherapy [49, 50]. Only one recent bioinformatics study has suggested that the ferroptosis suppressor SLC40A1 is associated with immunosuppression in gliomas and that acetaminophen may exert antitumor effects in GBM by modulating SLC40A1-induced death [51].

Our results showed that the developed prognostic model had independent predictive power for OS in GBM patients. We found no significant difference between the high-risk and low-risk groups in terms of gene mutations such as TP53 and PTEN; however, the high-risk group did not have any IDH mutations. According to the 2016 WHO classification, there is a significant difference between IDH-mutant GBM and IDH wild-type GBM, with the latter having a poorer prognosis [52]. This further supports the reliability of our model. In addition, we investigated the relationship between risk scores, TMB values, and PD-L1 expression levels. Disappointingly, higher risk scores were not significantly correlated with higher TMB values (Fig. S3A). A key mediator of immunosuppression in GBM is PD-L1, and although only a fraction of GBM cells express PD-L1, PD-L1 expression in the tumor microenvironment is deficient [53].

Finally, immune checkpoint blockade treatment may be more effective in individuals with higher risk scores. The developed prognostic model might be used as a predictive biomarker for patients receiving immunotherapy. The CGGA external validation cohort was also employed to confirm the model’s accuracy in predicting OS. Nonetheless, the present study has some limitations. First, all of the presented findings are based on bioinformatic analyses and require further experimental confirmation. Second, the endothelial cell-based biomarker we created will need to be tested in large-scale clinical studies to corroborate our findings.

Conclusion

The present study constructed and validated a prognostic model for GBM by integrating 10× scRNA-seq and bulk RNA-seq data. Higher risk scores were significantly associated with poorer survival outcomes, a near-zero IDH mutation rate, and upregulation of immune checkpoints such as PD-L1 and CD276. Our prognostic model may be used as a potential biomarker for risk stratification and treatment response prediction in GBM patients.