Introduction

Hepatocellular carcinoma (HCC) is currently the sixth most common tumor in the world and the fourth most common in terms of mortality [1]. HCC is expected to become the third leading cause of cancer-related deaths by 2030, according to epidemiology [2]. Despite known risk factors for HCC, including hepatitis B, alcoholism and cirrhosis, the incidence of HCC remains high worldwide. At present, there are many ways to treat hepatocellular carcinoma, among which surgical resection is the main method, and intervention, targeted drugs and immunotherapy are also important treatment methods. However, the therapeutic effect of hepatocellular carcinoma is still poor, and its recurrence rate and mortality rate have not been effectively controlled [3]. Only 30% of HCC patients are reported to be in the early stages suitable for radical surgery. In addition, the efficacy of chemotherapeutic agents and targeted agents for advanced HCC is still limited [4].

Although currently targeted drug therapy for unresectable HCC has been proven to be effective [5]. Such as sorafenib, lovastinib and the vascular endothelial growth factor (VEGF) inhibitor ramuzumab are widely used in clinical applications [6,7,8]. However, we found that all of these drugs had drug reactions of varying degrees, such as skin itching, gastrointestinal reactions, and elevated blood pressure [9]. In addition, long-term use of the body is easy to produce drug resistance, its treatment effect is not as expected [10]. In recent years, new therapeutic strategies such as tumor immunosuppressive therapy have extended patients' lives, and the combination of immune checkpoint inhibitors (ICIs) and VEGF inhibitors is currently positioned as the first-line treatment for advanced HCC.

Immunotherapy has opened a new era of tumor therapy, and immunocheckpoint inhibitors (ICIs), including programmed cell death 1(PD-1)/programmed cell death ligand 1(PD-L1) inhibitors, have become a breakthrough in tumor therapy. ICIs are typical immunotherapies that activate anti-tumor immunity by inhibiting negative regulatory receptors such as PD-1 and cytotoxic T lymphocyte antigen 4(CTLA4) [11].

At present, immunotherapy for hepatocellular carcinoma has received more and more attention. The occurrence and development of hepatocellular carcinoma is closely related to tumor microenvironment [12]. Tumor microenvironment is a dynamic system composed of tumor cells, complex cytokine environment, extracellular matrix and immune cell subsets [13].

In this study, we constructed and validated an immune-related prognostic model based on the TCGA-LIHC dataset and ICGC-LIRI-JP dataset. In addition, we explored the relationship between the constructed prognostic model and the clinical and pathological features of HCC patients. We analyzed the characteristics of tumor immune microenvironment, including tumor-infiltrating cell composition, immune escape, TP53 mutation rate, and tumor microenvironment. These findings may provide new insights into novel therapeutic targets for hepatocellular carcinoma.

Materials and methods

Data preparation and processing

From TCGA (https://portal.gdc.cancer.gov/) download the mRNA expression data and clinical data of LIHC. We obtained the data of 374 tumor specimens and 50 normal specimens. Excluding HCC samples with a survival time of less than 30 days, we finally obtained the data of 342 HCC patients. 342 TCGA tumor samples were randomly divided into two equal parts: training set (Set1) and verification set (set2). The total samples of TCGA were used as another verification set (set3). From ICGC (https://dcc.icgc.org/) download the data of the Japan Institute of liver cancer (ICGC-LIRI-JP), exclude patients with metastatic liver cancer and survival of less than 30 days, and finally 229 patients with HCC were included in the study. These sample data were used as an external validation set (Set4).

From the Gene List module of the Immunology Database and Analysis Portal (ImmPort) database, we downloaded complete gene names directly, totaling 1793 immune-related genes.

Differential expression analysis and prognostic gene screening

Based on the data of LIHC in TCGA, we analyzed the mRNA expression differences between 374 tumor samples and 50 normal samples. The "Limma" R package was used to screen out differential genes (DEGs) according to adjust P < 0.01 and |logFC|> 2. The differentially expressed immune genes (DEIGs) were obtained by the intersection of immune-related genes and DEGs of TCGA. Then DEIGs were obtained by univariate Cox analysis to explore the relationship between overall survival (OS) and gene expression level. When P value < 0.05, genes were considered to have significant prognostic potential.

Copy number variation and functional enrichment analysis

The "RCircos" R package was used to show the mutation locations of these prognostic DEIGs on 23 chromosomes. The gain or loss of these genes were visualized. The "org.hs.eg.db", "Enrichment plot" and "clusterProfiler" packages in R were used to analyze GO and KEGG enrichment of prognostic DEIGs in TCGA to explore potential molecular mechanisms and biological functions [14,15,16].

Establishment of immune risk scoring signature (irss) for prognosis

In the TCGA train set (set1), LASSO regression was adopted to process prognostic DEIGs to further identify differentially expressed genes with independent prognostic value. Multivariate cox regression analysis was then used to evaluate whether these genes could be used as independent prognostic predictors and finally determine the genes to construct the model. Next, the following formula was used to calculate the risk score for each patient: 

$$\mathrm{risk}\;\mathrm{score}\;=\;\mathrm{expression}\;\mathrm{for}\;\mathrm{each}\;\mathrm{gene}\;\ast\;\mathrm{coefficient}\;\mathrm{for}\;\mathrm{each}\;\mathrm{gene}$$

The sample was divided into high-risk and low-risk groups based on the median value of the risk score.

Expression validation of immune-related signature genes

After obtaining informed consent from patients, we collected 20 pairs of HCC tissues and paraneoplastic tissues (from the Second Affiliated Hospital of Nanchang University), then cultured one normal hepatocyte line (7702) and four HCC cell lines (LM3, 97H, HepG2 and 7721), and after extracting RNA from the tissues and cells, q-PCR experiments were performed to verify the signature genes. The primer sequences of the signature genes are shown in Table 1.

Table 1 Primer sequences used for RT-qPCR

Immunohistochemistry (IHC) experiments was used to detect protein expression differences, Relative optical density scores were used to compare the differences between the two groups.

Validation of the risk score with TCGA and ICGC datasets

According to the established IRSS scoring system, the risk score of each sample was calculated. Set1, set2 and set3 groups were divided into high risk group and low risk group respectively according to the median value of risk score. Then ROC curve and Kaplan–Meier curve were drawn to verify the prediction accuracy of the risk scoring model. Nomogram was used to assess the survival risk of HCC patients in TCGA, including gender, age, TNM stage, IRSS and other clinical information. Calibration curves (1, 2 and 3 years) were drawn to assess the accuracy of Nomogram predictions. To further verify the accuracy of this model, we compared it with published prediction models of HCC.

Genomic alterations analyses

To determine whether risk score levels were associated with specific genomic traits. Copy number variation (CNV) analysis was performed using the TCGA dataset.

Association between Microsatellite Instability (MSI) and constructed predictive models

The whole TCGA dataset was further analyzed based on the constructed prognostic model after removing the samples without microsatellite status information. Then, according to the microsatellite status information extracted from the phenotypic data, we first compared whether there was any difference in the expression level of MSI in the high and low risk groups, and then divided the total samples into MSI-high and MSI-low groups through the expression level of MSI in each sample, and compared whether there was a significant difference in OS between the two groups. Thus, the relationship between high and low risk groups and OS can be further determined.

Analysis of Tumor Mutation Burden (TMB) and immune escape

To further verify the relationship between the constructed model and tumor microenvironment and immunotherapy, correlation boxplot was constructed by Pearson correlation analysis to study the impact of risk score on TMB. TIDE scores were compared between the high and low risk groups to analyze whether there was a difference in the efficacy of immune checkpoint blocking treatment between the high and low risk groups.

Consensus clustering of prognostic genes

To investigate the functions of the seven selected prognostic DEIGs, we clustered the HCC into different groups with “ConsensusClusterPlus” (50 iterations, resample rate 80% and Pearson correlation). Kaplan–Meier survival curve was used to analyze the OS of each subtype, and the clinical data and gene expression levels between subgroups were compared and analyzed, as well as the immune cell infiltration and immune cell content between subgroups.

Statistical analysis

The data were analyzed by Excel software, and the visualization of DEGs was completed by "ggplot2", "Cairo" and "ggrepel" R packages. The difference in overall survival was calculated by Kaplan–Meier method, and the significant difference was determined by R. Spearman regression analysis by Cox. ROC curve of R package survival was drawn and visualized to calculate AUC [17]. Somatic mutation and CNV data were downloaded from TCGA database. Copy number changes associated with risk scores were analyzed using GISTIC 2. All data and statistical analysis were based on R software 4.1.2, and P < 0.05 was considered statistically significant.

Results

Prognostic DEIGs screening

The overall process of the study is shown in Fig. 1. We first analyzed the differences between 374 tumor samples and 50 normal samples in TCGA. According to adjust P < 0.01 and |logFC|> 2, 2874 differential genes were screened out (Table S1), including 362 down-regulated genes and 2512 up-regulated genes. Visualization using heat and volcano maps (Fig. 2A, B). Then, 154 differential expression immune genes (DEIGs) were obtained through the intersection of immune-related genes (Fig. 2C). Univariate Cox proportional risk analysis was performed on DEIGs, and 28 DEIGs related to prognosis were obtained (Fig. 2D, Table S2).

Fig. 1
figure 1

The flow chart of this study

Fig. 2
figure 2

Differentially expressed immune genes (DEIGs) screening and univariate analysis. (A) Heatmap for DEGs. (B) Volcano plot for DEGs. (C) DEIGs scteening. (D) Univariate analysis for prognostic genes

Prognostic DEIGs Functional Enrichment and Genetic Alterations

GO and KEGG enrichment analysis was performed on 28 prognostic DEIGs. GO enrichment results showed that prognostic DEIGs were mainly enriched in Epithelial cell proliferation, gland development, regulation of cysteine-type endopeptidase activity, regulation of cysteine-type endopeptidase activity involved in apoptotic process, regulation of endopeptidase activity (Fig. 3A). KEGG results showed that prognostic DEIGs were enriched in Bladder cancer, Cytokine-cytokine receptor interaction, IL-17 signaling pathway, Melanoma, Neuroactive ligand-receptor interaction (Fig. 3B).

Fig. 3
figure 3

GO, KEGG function enrichment and CNV variation of prognosis-related DEIGs. (A) GO function enrichment of 28 prognosis-related DEIGs. (B) KEGG pathway enrichment of 28 prognosis-related DEIGs. (C) The location of CNV alteration of 28 prognosis-related DEIGs. (D) The CNV variation frequency of 28 prognosis-related DEIGs in LIHC

We also studied the copy number variation of 28 prognostic genes and summarized the CNV variation frequency of 28 prognostic DEIGs in TCGA-LIHC (Fig. 3C). Visualized the position of CNV variation on chromosomes (Fig. 3D).

Construction and prognostic value of IRSS

In order to calculate the risk score, patients with no survival information and survival time less than 30 days were excluded. Finally, 342 HCC patients were included in the study, and the total sample set (Sum) was randomly divided into the train set and the test set, with 171 patients in each group. Chi-square test was used to determine that there was no statistically significant difference in clinical characteristics among each group (Table 2). LASSO regression analysis was performed on 28 prognostic DEIGs in a training set containing 171 patients, and the model fitted best when the penalty index was 10 (Fig. 4A, B). Then, 7 prognostic genes were obtained through multivariate Cox regression analysis: GAL, NR0B1, MAPT, CCR3, GLP1R, BIRC5 and IL-17B (Table S3). Combined with the corresponding regression coefficients, the final IRSS is established:

Table 2 Chi-square test results of the TCGA training set, the test set and the ICGC cohort
Fig. 4
figure 4

Identification of seven optimal prognosis-related DEIGs. establishment of IRSS. (A) Ten-time cross-validation for tuning parameter selection in the lasso model. (B) lasso coefficient profiles. (C) The expression of 7 genes in TCGA train set. (D) Kaplan–Meier survival plots of high-risk and low-risk groups in TCGA train set. (E) The risk score and survival status of seven genes in patients. (F) Receiver operating characteristic curve (ROC) of risk score in the TCGA train set

$$\mathrm{IRSS}=(GAL\;\exp\;\ast\;0.359)+(NR0B1\;\exp\;\ast\;0.454)+(MAPT\;\exp\;\ast\;0.495)+(CCR3\;\exp\;\ast\;0.641)+(GLP1R\;\exp\;\ast\;0.306)+(BIRC5\;\exp\;\ast\;0.428)+(IL17B\;\exp\;\ast\;2.100)$$

The mRNA expression differences of seven genes in the train group were represented by heat maps (Fig. 4C). The risk score for each patient was calculated based on IRSS, and the sample was divided into high-risk and low-risk groups using the median. In the train set, the probability of OS was lower in the high-risk group than in the low-risk group (P < 0.05) (Fig. 4D). As the risk score increased, the survival time decreased and the number of patients in the state of death increased gradually (Fig. 4E). The accuracy of the model in predicting OS of HCC patients was evaluated by ROC curve, and the AUC values at 1, 2 and 3 years were 0.845, 0.823 and 0.808, respectively (Fig. 4F). The mRNA expression differences of 7 prognostic DEIGs in tumor and normal tissues in the TCGA data set (Figure S1).

Expression validation of Immune-related signature genes

The expression of the signature genes in cells (normal liver cells vs HCC cells) and tissues (HCC tissues vs paraneoplastic tissues) were verified by q-PCR assay (Fig. 5A-N). and we showed representative images of 7 genes, and then compared the expression differences between HCC tissues and paraneoplastic tissues using relative optical density scores (Fig. 6).

Fig. 5
figure 5

Validation of differential expression of 7 signature genes in cells and tissues by q-PCR. (A-G) Differential expression of 7 signature genes in 7702, LM3, 97H, HepG2 and 7721 cells. The results showed that the expression of BIRC5, CCR3, GAL, GLP1R, IL17B, MAPT and NR0B1 in HCC cells was significantly higher than that in normal liver cells. (H-N) Differential expression of 7 signature genes in HCC tissue and paraneoplastic tissue, the expression of BIRC5, CCR3, GAL, GLP1R, IL17B, MAPT and NR0B1 were significantly higher in HCC tissues

Fig. 6
figure 6

Staining images of seven signature genes in HCC tissues and paraneoplastic tissues. Relative optical density scores were used to compare the differences between the two groups

Validation of The Risk Score with TCGA and ICGC Datasets

In the test set, according to the risk score calculated by IRSS, the samples were divided into high and low risk groups using the median value. The analysis found that there was a significant difference in survival probability between the high and low risk groups (P < 0.05), and the high-risk group had a lower survival rate (Fig. 7A). The AUC values assessed by ROC curve at 1, 2 and 3 years were 0.748, 0.740 and 0.684, respectively (Fig. 7B). The mRNA expression differences of prognostic genes in the test set were represented by heat maps (Figure S2A).

Fig. 7
figure 7

Risk score validation. (A) Kaplan–Meier survival plots in TCGA test set. (B) ROC of risk score in the TCGA test set. (C) Kaplan–Meier survival plots in TCGA sum set. (D) ROC of risk score in the TCGA sum set. (E) Kaplan–Meier survival plots in ICGC validation set. (F) ROC of risk score in the ICGC validation set

In TCGA sum set, the total sample is also divided into high and low risk groups according to IRSS calculated risk score. Analysis showed that there was a significant difference in overall survival between the two groups (P < 0.05), and the survival rate was lower in the high-risk group (Fig. 7C). The AUC values of 1, 2 and 3 years were 0.798, 0.756 and 0.706, respectively (Fig. 7D). The mRNA expression differences of prognostic genes in TCGA sets were represented by heat maps (Figure S2B).

According to IRSS, ICGC data were divided into high risk group and low risk group, and the survival rate difference between the two groups was also statistically significant (P < 0.05) (Fig. 7E). The AUC values of 1, 2 and 3 years were 0.757, 0.759 and 0.772, respectively (Fig. 7F). After multiple validation, we find that the model has high robustness and accuracy. The mRNA expression differences of prognostic genes in the ICGC dataset were represented by heat maps (Figure S2C). Meanwhile, in order to further verify the reliability of the model, we compared it with four published prediction models of HCC [18,19,20,21,25].

Currently, systemic therapies including immune checkpoint inhibitors (ICIs), tyrosine kinase inhibitors (TKI), and monoclonal antibodies challenge the use of conventional therapies for HCC. Tumors often upregulate immune checkpoints to avoid detection and killing by the host immune system. Activation of checkpoint cascades, such as those controlled by PD-1 or CTLA-4, leads to tumor-specific T cell inactivation and immune evasion [26, 27]. Treatment with anti-PD-1, anti-PD-L1 or anti-CTLA-4 revitalizes T cells and allows the adaptive immune system to target tumor cells [28, 29]. Inhibitors of PD1, PDL1, and CTLA-4 are pillars of clinical practice or systemic therapies under development for HCC. Data from the CheckMate-040 Phase I/II trial, presented at the American Society of Clinical Oncology Annual Meeting 2019, The combination of Opdivo (PD-1 Antibody) and Yervoy (CTLA-4 Antibody) yielded an objective response rate of 31% and a response time of 17.5 months [30]. This result suggests that immunotherapy has a potential and surprising effect on HCC.

Tumor microenvironment (TME) of HCC refers to a complex and spatially structured mixture of liver parenchymal resident cells, tumor cells, immune cells and tumor-associated fibroblasts [31]. These populations interact dynamically through intercellular contact and the release or recognition of cytokines and other soluble factors. This complex cellular interaction has a significant impact on tumor immune evasion. TMB is often used as a proxy for the number of neoantigens because the probability of recognizing neoantigen-specific T lymphocytes is associated with TMB [32]. The frequency of these genetic changes for each type varies greatly between individual tumors and between tumor types [33]. TMB can be used to predict Immune checkpoint blockade (ICB) efficacy and has become a useful biomarker for multiple cancer types to identify patients who will benefit from immunotherapy [34].

GAL, NR0B1, MAPT, CCR3, GLP1R, BIRC5 and IL17B were the Prognostic DEIGs included in the prognostic signature. which can preliminarily predict the prognosis and immunotherapy effect of HCC patients by analyzing their relationship with tumor microenvironment, immune invasion and immunotherapy. It is helpful for the systemic treatment of HCC. Anti-GAL antibodies have been confirmed to play a role in the immunotherapy of pancreatic cancer [35]. NR0B1 has also been confirmed to be related to the occurrence and development of a variety of tumors, for example, the transformed phenotype of Ewing's sarcoma requires sustained NR0B1 expression [36]. Knockdown of NR0B1 can reduce the tumorigenic and anti-apoptotic potential of lung adenocarcinoma [37]. MAPT is a microtubule-related protein tau, which can inhibit the function of taxanes, and its high expression reduces the sensitivity to taxanes, which is of great significance in breast cancer research [38]. Studies have also confirmed that MAPT is often methylated, and hypermethylation is associated with poor prognosis in patients with stage II colorectal cancer [39]. Both CCR3 and IL17B have been found to be related to the role of tumor microenvironment in regulating tumor growth and metastasis, and may be new immunotherapy targets [40, 41]. Previous studies have shown that glucagon-like peptide-1 receptor (GLP1R) is essential for the regulation of glucose homeostasis, and in recent years, it has been found to be related to the occurrence and development of tumors [42, 43]. BIRC5 has been confirmed to be related to the occurrence of a variety of tumors, and its role in the progression of liver cancer has also been confirmed by studies [44]. However, there are few studies on its immunity to liver cancer, which is worth exploring.

In addition, by exploring tumor immune microenvironment and gene mutations, our study found that mutation rates of MSI and TP53 may also be independent prognostic indicators. The higher the risk score, the lower the survival rate. However, patients also had a lower immune escape frequency fraction, suggesting that the high-risk group may have a better effect on immunotherapy. The expression levels of PD-1 and CTLA-4 were higher in the high-risk group. Therefore, the treatment effect of anti-PD-1 and anti-CTLA-4 is better, which is expected to improve the prognosis of patients. Up to now, there are many prognostic models for HCC, but due to the complexity of HCC, no one model has been considered as the gold standard. Our study focused on the prognosis and immunoassay of hepatocellular carcinoma, combined with the current hot spot of immunotherapy, aiming to find effective immunotherapy targets for hepatocellular carcinoma. The disadvantages of this study are as follows: The clinical data in TCGA and ICGC databases did not specify in detail whether patients had received chemotherapy or radiotherapy, which may have certain influence on patient survival data and immunoassay results.

Conclusion

In summary, a robust immune-related prognosis model was constructed and tumor microenvironment and immune function were analyzed, providing potential targets for immunotherapy of HCC.