Background

Hepatocellular carcinoma (HCC) is a multifaceted pathological process caused by various factors, including hepatitis B virus, hepatitis C virus, metabolic disorders, and alcoholism. Unfortunately, most HCC patients are initially diagnosed at an advanced stage, making HCC one of the deadliest cancers worldwide (Bray et al. 2018). Although there are various treatments for liver cancer, such as surgery, radiotherapy, and chemotherapy, the survival time of most liver cancer patients remains short, and prognostic testing for the disease has been emphasized (Forner et al. 2018). A liver biopsy can effectively characterize the occurrence and progression of HCC and provide a diagnostic basis for the prognostic evaluation of patients (Wang et al. 2019). However, because biopsy is invasive, it is not routine practice, and evaluating HCC patient prognosis through non-invasive liquid biopsy could play a critical role in clinical management (Ahn et al. 2021).

In recent years, various liquid biopsy techniques have shown significant promise as prognostic tools for HCC. The detection of circulating tumor cells, cell-free DNA, and exosomes from blood samples addresses the lack of key molecular targets for HCC patient prognostic testing (Chen et al. 2020b). Exosomes are 50–200-nm lipid vesicles containing nucleic acids, proteins, small metabolic molecules, and many other components. Studies have demonstrated that exosomes can enter cells through receptor-ligand interactions or through mechanisms such as phagocytosis, endocytosis, macropinocytosis, and fusion, and thereby activate signaling pathways among liver cells, immune cells such as macrophages and natural killer cells, and stromal cells such as stellate cells and adipocytes (Shen et al. 2017; Liu and Li 2018; Zhou et al. 2018; Zhang et al. 2019, p. 7). Researchers have found that miR-21 levels in exosomes of HCC patients were significantly higher than in patients with chronic hepatitis and healthy controls and that exosome-based detection was significantly more sensitive than serum-based detection, suggesting that specific molecules in exosomes could be promising new markers for HCC (Wang et al. 2014). Furthermore, exosomal alterations reflect the prognosis and treatment of HCC. High expression of miRNA-21 and lncRNA-ATB in exosomes is associated with poor overall survival in HCC patients, indicating that different molecules in the exosomes of HCC patients reflect different pathophysiologic processes of the disease (Lee et al. 2019). Additionally, miR-122 expression was significantly reduced in the exosomes of patients treated with transarterial chemoembolization, further supporting the potential of exosomes as diagnostic markers for HCC (Suehiro et al. 2018).

Currently, most exosomal molecules targeted for diagnosing HCC are small RNAs, such as miRNAs (Li et al. 2019). The diagnostic value of mRNA in exosomes has not received sufficient attention: no studies have been reported that assess patient prognosis based on changes in mRNA expression levels in HCC exosomes, much less relevant risk prognostic models. Meanwhile, liquid biopsy techniques based on small extracellular vesicle RNA are gradually entering clinical application. For example, Exosome Diagnostics developed ExoDx for blood testing based on EML4-ALK fusion transcripts and ExoDx Prostate (IntelliScore) for multiple gene expression features (ERG, PCA3, and SPDEF) in small extracellular vesicles. This study posed three scientific questions regarding the detection of mRNA in the exosomes of liver cancer patients for prognosis. First, are there any mRNA molecules whose differential expression characteristics are shared by liver cancer tissue and peripheral blood exosomes of liver cancer patients? Second, can the relevant mRNA molecules be used to evaluate the prognosis of liver cancer patients? Finally, which cell types in liver cancer tissue are the source of the relevant mRNA molecules? These questions must be answered to determine the diagnostic value of exosomal mRNA in liver cancer patients.

We aimed to fill this gap by screening key mRNAs in the exosomes of HCC patients to establish a risk prognostic model. The risk score obtained from the model is an independent risk factor that can be used to assess the 1-, 3-, and 5-year survival rates of patients and has high application value.

Materials and Methods

Data Collection

Exosome transcriptome data from healthy human blood and liver cancer patients were obtained from the exoRBase 2.0 database (http://www.exorbase.org/exoRBaseV2/toIndex). mRNA expression data and copy number variation data for liver cancer were obtained from the TCGA database (https://xena.ucsc.edu/), and liver cancer transcriptome data were also downloaded from the ICGC website (https://dcc.icgc.org/). Data from 374 tumor samples and 50 control samples were downloaded from the TCGA database, whereas data from 112 tumor samples and 118 control samples were downloaded from exoRBase 2.0. mRNA expression values were standardized and converted to log2(TPM + 1), and copy number amplification and deletion thresholds were determined using the GISTIC software.
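The log2(TPM + 1) conversion mentioned above can be illustrated with a minimal Python sketch. The function name and the example values are hypothetical and are not taken from the study's pipeline; the sketch only shows the arithmetic of the transform.

```python
import math


def log2_tpm_plus_one(tpm_values):
    """Apply the log2(TPM + 1) transform to a list of non-negative TPM values."""
    if any(v < 0 for v in tpm_values):
        raise ValueError("TPM values must be non-negative")
    # Adding 1 keeps zero counts finite: log2(0 + 1) = 0.
    return [math.log2(v + 1.0) for v in tpm_values]


# Made-up TPM values chosen so that the results are easy to check by hand.
example_tpm = [0.0, 1.0, 3.0, 1023.0]
transformed = log2_tpm_plus_one(example_tpm)
```

The pseudocount of 1 means that unexpressed genes map to 0 rather than to negative infinity, which keeps downstream comparisons well defined.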

Differentially Expressed Genes and Exosomes Prognostic Genes (Exo-Genes) Screening

HCC and control data from the TCGA and exoRBase 2.0 databases were compared using the limma package. Genes with |logFC| > 0.585 and adjusted P < 0.05 were defined as differentially expressed genes (DEGs). Finally, 123 overlapping exosomes-related differential genes were obtained by intersecting the differential genes from the two databases. To further explore the clinical guidance value of these genes, we matched the exosomes-related differential genes from the TCGA database with clinical data and then used the survival package to evaluate prognostic differences through univariate Cox regression analysis followed by the log-rank test; exosomes-related differential genes with P < 0.05 were defined as exosomes prognostic genes (Exo-genes). The influence of these genes on patient prognosis was validated by plotting Kaplan–Meier survival curves of the exosomes prognostic genes.
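The threshold filter and the intersection step described above can be sketched as follows. This is an illustration in Python rather than the limma workflow used in the study: the dictionary layout (gene symbol mapped to a logFC and adjusted P value pair), the function name, and all gene names and numbers are hypothetical placeholders.

```python
def select_degs(results, lfc_cutoff=0.585, p_cutoff=0.05):
    """Return the set of genes passing both thresholds.

    `results` maps gene symbol -> (logFC, adjusted P value). This layout is a
    hypothetical stand-in for a differential expression result table.
    """
    return {
        gene
        for gene, (log_fc, adj_p) in results.items()
        if abs(log_fc) > lfc_cutoff and adj_p < p_cutoff
    }


# Made-up numbers for illustration only; these are not values from the study.
tissue_results = {
    "GENE_A": (1.2, 0.001),
    "GENE_B": (-0.9, 0.01),
    "GENE_C": (0.3, 0.0001),   # fails the fold-change cutoff
    "GENE_D": (2.0, 0.2),      # fails the P value cutoff
}
exosome_results = {
    "GENE_A": (0.7, 0.03),
    "GENE_B": (-0.5, 0.01),    # fails the fold-change cutoff
    "GENE_C": (1.5, 0.001),
    "GENE_E": (-1.1, 0.002),
}

tissue_degs = select_degs(tissue_results)
exosome_degs = select_degs(exosome_results)

# Genes that are differentially expressed in both data sets.
shared_degs = sorted(tissue_degs & exosome_degs)
```

Note that this sketch intersects on gene identity only; whether the direction of change must also agree between the two data sets is not stated in the text and is not enforced here.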

Establishment and Validation of Risk Prognosis Model

The gene expression data of Exo-genes were combined with clinical data, and prognostic models were constructed using the glmnet package with Lasso-penalized Cox regression analysis to exclude genes prone to overfitting. A risk prognostic model was constructed in the TCGA database of HCC patients with LASSO regression analysis, and the obtained coefficients and gene expression values were used to derive a risk prognostic score, formulated as follows: Risk score = sum (Expgene × coef). Liver cancer samples from the TCGA database were categorized into high- and low-risk groups based on the median risk prognostic score. The predictive capacity of the prognostic model was assessed using Kaplan–Meier survival curves and receiver operating characteristic (ROC) curve analysis. The analysis was repeated in liver cancer data (LIRI-JP) downloaded from ICGC to test the predictive accuracy of the risk scores. To further investigate the robustness of the risk prediction model across clinical factors, we applied risk prognostic scores to different subgroups of clinical factors and plotted Kaplan–Meier survival curves for validation. Univariable and multivariable Cox regression analyses of risk prognostic scores and clinical variables in TCGA-LIHC and LIRI-JP revealed that risk prognostic scores were independent risk factors in both analyses (P < 0.05).
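The median split and the Kaplan–Meier estimate used to compare the two risk groups can be sketched in plain Python. This is a generic textbook estimator, not the code of the R survival package; the function names, the tie rule at the median (scores strictly above the median are labeled high), and the toy follow-up data are all assumptions made for illustration.

```python
import statistics


def split_by_median(scores):
    """Label each score 'high' if strictly above the median, else 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= leaving
    return curve


# Toy data (months of follow-up); not patient data from the study.
scores = [0.2, 0.9, 0.4, 1.1, 0.1, 0.8]
times = [30, 5, 24, 8, 36, 12]
events = [0, 1, 1, 1, 0, 0]

groups = split_by_median(scores)
high = [(t, e) for t, e, g in zip(times, events, groups) if g == "high"]
low = [(t, e) for t, e, g in zip(times, events, groups) if g == "low"]
high_curve = kaplan_meier([t for t, _ in high], [e for _, e in high])
low_curve = kaplan_meier([t for t, _ in low], [e for _, e in low])
```

In this toy example the high-risk curve drops earlier than the low-risk curve; a log-rank test would be the usual way to compare the two curves formally and is not implemented here.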

Establishment of a Nomogram Model

The rms package was used to develop a prognostic nomogram based on risk scores and clinical characteristics, with the aim of providing more accurate prognostic predictions for clinical patients. Discrimination was estimated using the concordance index (C-index) and the area under the ROC curve (AUC), and calibration was assessed using calibration plots. C-index and AUC values > 0.7 were considered acceptable. Decision curve analysis (DCA) was then used to evaluate the clinical utility of the nomogram model.
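To make the concordance index concrete, the following Python sketch computes Harrell's C from its textbook definition. It is not the implementation used by the rms package; the function name and toy data are hypothetical, and tied follow-up times are simply treated as non-comparable, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs that are ordered correctly.

    A pair (i, j) is comparable when patient i has the shorter follow-up time
    and an observed event. The pair is concordant when patient i also has the
    higher risk score. Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only; not values from the study.
times = [5, 8, 12, 24, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 1.1, 0.8, 0.4, 0.2]

c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a threshold such as 0.7 is commonly read as acceptable discrimination.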

Immune Infiltration Analysis

HCC expression data from the TCGA database were categorized into high- and low-risk groups based on the median risk prognostic score. We calculated immune checkpoint expression in both groups and presented the results as a point matrix. The proportions of 22 immune cell types in each sample were calculated using Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT).

Single-Cell Sequencing Analysis

GSE125449 was downloaded for analysis. Sequencing data were quality-controlled using the Seurat package, and the cells were clustered to analyze the expression of exosomal risk genes in different clusters. The cell types of the clusters were annotated using the SingleR package.

Statistical Analysis

Differences between two groups were compared using the unpaired t-test. Univariable and multivariable Cox regression analyses were used to analyze the relationship between the variables and patient outcome. In all analyses, P < 0.05 was considered statistically significant.

Results

Screening of Exosome-Related DEGs in HCC

We compared exosomes-related genes between HCC patients and normal controls in the exoRBase 2.0 database and found 393 differentially expressed genes (Fig. 1A). DONSON, XIST, AC018607.1, MALAT1, GOLGA8A, AC005224.3, AC003684.1, VSTM2A, LINC02280, and CDHR2 were under-expressed in exosomes from liver cancer patients, and RPS4Y1, HIST2H2AA3, APOC3, ORM1, APOA2, APOH, HRG, FGB, FGG, and FGA were highly expressed in the exosomes of HCC patients (Fig. 1B). We also compared gene expression between HCC patients and normal controls in the TCGA database and found 4,507 differentially expressed genes (Fig. 1C), with a heat map showing the top 20 most differentially expressed genes (Fig. 1D). Intersecting the differentially expressed genes from the two databases yielded 123 exosomes-related differential genes in HCC (Fig. 1E).

Fig. 1

Screening of 123 HCC exosomes-related differential genes. A Volcano plot of differential gene expression in hepatocellular carcinoma in the exoRBase 2.0 database. B Heat map of the top 20 differentially expressed genes in hepatocellular carcinoma in the exoRBase 2.0 database. C Volcano plot of differential gene expression in hepatocellular carcinoma in the TCGA database. D Heat map of the top 20 differentially expressed genes in hepatocellular carcinoma in the TCGA database. E Intersection of differentially expressed genes in liver cancer between the exoRBase 2.0 and TCGA databases

Identification of Exosomes Prognostic Genes (Exo-Genes)

To further define the relationship between HCC exosomes-related differential genes and patient prognosis, we performed a univariate Cox regression analysis of the 123 HCC exosomes-related differential genes and identified 44 genes as Exo-genes. Of these, 14 genes were risk factors and 30 were protective (Fig. 2A). Based on the median expression of each prognostic gene, we divided the patients into high- and low-expression groups, plotted Kaplan–Meier survival curves, and selected the nine genes with the lowest P values for presentation (Fig. 2B). Patients with high expression of DSCC1, CXCL8, S100A9, NT5DC2, E2F1, PLK4, and TYMS had lower overall survival rates, whereas patients with high expression of CLEC3B and CYP2C9 had higher overall survival rates.

Fig. 2

Identification of 44 exosomes prognostic genes (Exo-genes) in hepatocellular carcinoma. A Identification of exosomes prognostic genes by univariable Cox regression analysis. B Prognostic curves of high and low expression of exosomes prognostic genes from hepatocellular carcinoma patients in TCGA database

Establishment of Risk Prognostic Model Based on Exosomal Risk Genes

To further evaluate the prognostic power of exosomal prognostic genes in HCC patients, we performed the Lasso Cox regression analysis based on TCGA HCC patient prognostic data. Figure 3A demonstrates regression coefficient changes in each exosomal prognostic gene, with the model having the lowest mean variance when lambda is 0.0678. Consequently, the following six genes (exosomal risk genes) were selected to construct the risk prognostic model (Fig. 3B): CLEC3B, CYP2C9, GNA14, NQO1, NT5DC2, and S100A9. Based on the expression and regression coefficient (Fig. 3C) of these six genes, we constructed a risk prognosis score based on the following formula:

Fig. 3

Risk prognosis model based on exosomal risk genes has good diagnostic value. A Changes in regression coefficient of exosomes prognostic genes in Lasso Cox regression analysis. B Determination of λ. C Regression coefficient of CLEC3B, CYP2C9, GNA14, NQO1, NT5DC2 and S100A9. D HCC patients with high-risk prognostic score in TCGA database have lower survival. E Survival curves of high-risk and low-risk prognostic score groups in TCGA database of HCC patients. F ROC validates prognostic performance of HCC risk score in TCGA database. G HCC patients with high-risk prognostic score in ICGC database have lower survival. H Survival curves of high-risk and low-risk prognostic groups of liver cancer patients in ICGC database. I ROC validates prognostic performance of HCC risk score in ICGC database

$$\text{Risk score} = \text{S100A9} \times 0.0510 - \text{CYP2C9} \times 0.0410 + \text{NT5DC2} \times 0.0222 - \text{CLEC3B} \times 0.1280 + \text{NQO1} \times 0.0035 - \text{GNA14} \times 0.0816 \tag{1}$$
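Equation (1) is a weighted sum of six expression values, which the following Python sketch evaluates. The coefficients are copied from Eq. (1); the function name, the input dictionary layout, and the example patient are hypothetical, and the expression values are assumed to be on the log2(TPM + 1) scale described in the methods.

```python
# Coefficients taken from Eq. (1); negative values mark protective genes.
COEFFICIENTS = {
    "S100A9": 0.0510,
    "CYP2C9": -0.0410,
    "NT5DC2": 0.0222,
    "CLEC3B": -0.1280,
    "NQO1": 0.0035,
    "GNA14": -0.0816,
}


def risk_score(expression):
    """Compute the risk score as the sum of expression x coefficient.

    `expression` maps gene symbol -> expression value (assumed here to be
    log2(TPM + 1)); all six model genes must be present.
    """
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical patient with every gene at expression 1.0, so the score is
# simply the sum of the six coefficients.
patient = {gene: 1.0 for gene in COEFFICIENTS}
score = risk_score(patient)
```

Because CLEC3B, CYP2C9, and GNA14 carry negative coefficients, higher expression of these genes lowers the score, while higher S100A9, NT5DC2, or NQO1 expression raises it.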

To explore the prognostic power of the risk score for patient outcomes, we categorized HCC patients in the TCGA database (TCGA-LIHC) into high- and low-risk groups (Fig. 3D) based on the median risk score and found that overall survival was significantly lower in the high-risk group than in the low-risk group (Fig. 3E). ROC analysis showed that the risk prognostic score predicted 1-, 3-, and 5-year survival with AUC values above 0.7 (1-year AUC = 0.7532, 3-year AUC = 0.7474, and 5-year AUC = 0.7135) (Fig. 3F). The same analysis was performed in validation data from the ICGC database (LIRI-JP), and the results were similar to those from TCGA-LIHC, suggesting that this risk model has broad applicability (Figs. 3G–I).

Risk Prognostic Score is an Independent Prognostic Factor with Good Robustness

To explore the independent prognostic value of the risk score, we performed univariable and multivariable Cox regression analyses with age, sex, stage, grade, and risk score in the TCGA-LIHC data. The risk score was an independent prognostic factor for patients in the TCGA-LIHC data (HR = 4.544, P < 0.001) (Fig. 4A). We also observed that the risk score was an independent prognostic factor for patients in the LIRI-JP data (HR = 4.860, P < 0.05) (Fig. 4B).

Fig. 4

Risk prognostic score is an independent prognostic factor with good robustness. A Univariable and multivariable Cox regression analyses of survival of HCC patients in the TCGA database. B Univariable and multivariable Cox regression analyses of survival of HCC patients in the ICGC database. C HCC patients in the TCGA database were divided into subgroups based on clinical characteristics, and Kaplan–Meier curves demonstrated good predictive ability of the risk prognosis score

Having identified the risk score as an independent predictor, we evaluated its robustness in TCGA-LIHC. We stratified TCGA-LIHC patients by clinical characteristics (age, sex, stage, and grade) and compared outcomes between patients with high and low risk scores within each subgroup. Patients with higher risk scores had worse outcomes than those with lower risk scores across subgroups, indicating that the risk model is robust (Fig. 4C).

Evaluation of Nomogram Model for Risk Prognosis Score and Detection of Immune Checkpoint

We constructed a nomogram survival prediction model based on the pathological stages and risk prognostic scores of HCC patients to predict 1-, 3-, and 5-year survival probabilities (Fig. 5A) of TCGA-LIHC. The ROC and calibration curves revealed that the model had an accurate predictive ability for 1-, 3-, and 5-year survival probabilities of patients (Fig. 5B). In addition, the decision curve analysis revealed that the model exhibited the best clinical benefit (Fig. 5C) when incorporating pathologic stage and risk prognosis score to predict clinical outcomes.

Fig. 5

A nomogram model constructed based on pathological stage and risk prognostic score was able to effectively predict the survival of liver cancer patients in TCGA database. A Nomogram model based on HCC patients in TCGA database. B Time dependent ROC curve and calibration curve based on Nomogram model of liver cancer patients in TCGA database. C DCA curve of liver cancer patients based on TCGA database

We compared immune cell types between the high- and low-risk groups defined by the risk prognostic score and found differences in the distribution of multiple immune cells (Fig. 6A). To further assess whether risk prognostic scores can guide immunotherapy in patients, we examined the expression of immune checkpoints in the high- and low-risk groups. We found that the expression of CTLA4, PDL1, CD27, IDO1, PD1, HAVCR2, LAG3, and TIGIT was significantly higher in the high-risk group than in the low-risk group, suggesting that immunotherapy may benefit patients in the high-risk group (Fig. 6B).

Fig. 6

Differences in immune checkpoints among patients with different risk prognosis scores. A Differences in the distribution of immune cells between high- and low-risk groups of HCC patients in the TCGA database. B Differential expression of immune checkpoints between high- and low-risk groups of HCC patients in the TCGA database

Distribution of Exosomal Risk Genes in Hepatocellular Carcinoma

Different cell types in the tumor microenvironment (e.g., fibroblasts, immune cells, endothelial cells, and hepatocytes) have broad implications for tumor progression and play a key role in regulating the bioactivity of HCC. To further understand which cell types in HCC tissue express the six exosomal risk genes (CLEC3B, CYP2C9, GNA14, NQO1, NT5DC2, and S100A9) used to build the risk prognostic model, we analyzed the single-cell sequencing dataset GSE125449. Through dimensionality reduction, we identified 18 cell types in this dataset (Fig. 7A). CLEC3B and GNA14 showed low expression across cell types, CYP2C9 was highly expressed in hepatocytes, NQO1 was highly expressed in epithelial cells and hepatocytes, NT5DC2 was highly expressed in epithelial and endothelial cells, and S100A9 was highly expressed in multiple cell types, suggesting a role in HCC prognosis (Fig. 7B).

Fig. 7

Distribution of six exosomal risk genes in HCC tissue. A Cell clustering for single-cell sequencing. B Distribution of six exosomal risk genes in different kinds of cells

Discussion

The tumor microenvironment plays a crucial role in tumor progression, and the interaction between different cells constitutes a significant component of the tumor microenvironment. Apart from direct cell-to-cell interactions, communication between cells can occur through the secretion of exosomes and other signaling molecules. Such communication within the tumor microenvironment can have a significant effect on tumor growth and metastasis (Wu et al. 2019). Exosomes, known to mediate the transfer of various molecules, play a pivotal role in the initiation, progression, invasion, and metastasis of hepatocellular carcinoma (HCC). Consequently, exosomes represent promising targets for developing diagnostic, therapeutic, and prognostic approaches for HCC. Understanding the mechanisms underlying exosome-mediated communication within the tumor microenvironment may lead to the development of novel strategies for effectively managing HCC (Abudoureyimu et al. 2019).

Risk score models for tumor prognosis have typically focused on specific pathophysiological processes or signaling pathways involved in tumor progression, such as autophagy, aging, and ferroptosis. However, the practical application of these models in clinical settings is limited by the difficulty of obtaining pathologic tissue samples and the limitations of relevant risk scores (Liang et al. 2020; Qiu et al. 2020; Chen et al.

Conclusion

In this comparative study, we identified 44 exosomal prognostic genes from data on Chinese and foreign HCC patients and normal controls. We performed functional enrichment and copy number analysis on these genes and selected six genes (CLEC3B, CYP2C9, GNA14, NQO1, NT5DC2, and S100A9) as exosomal risk genes through Lasso Cox regression analysis. These genes were used to establish a risk prognosis model that was validated in the TCGA and ICGC databases. Our study used Cox regression analysis, robustness analysis, nomogram evaluation, immune checkpoint analysis, and single-cell sequencing to confirm the utility of the risk prognostic score.