Introduction

Global Cancer Statistics 2020 reported that colorectal cancer accounted for 10% of new cancer cases and 9.4% of cancer deaths, making it the second most common cause of cancer mortality [1]. Up to 20–30% of colon cancer patients with early-stage disease will develop distant metastases despite complete segmental resection [2]. Colon cancers are not only anatomically different from rectal cancers but also pathologically require different staging procedures. Furthermore, colon and rectal cancers require different neoadjuvant treatments and compatible surgical approaches [3]. The current treatment modalities for colon cancer mainly include surgery and radiotherapy. Although surgery is usually sufficient for early-stage colon cancer, advanced colon cancer requires a combination of preoperative and postoperative radiotherapy [4]. Despite improvements in systemic therapy, the five-year survival rate for patients with metastatic colorectal cancer (CRC) is only around 12–14% [5]. Moreover, patients who are not diagnosed at an early stage often have poor treatment outcomes and a poor prognosis. With the development of RNA epigenetics, it has become easier to identify novel biomarkers and therapeutic targets. Moreover, the mechanisms of RNA epigenetics may be crucial for improving early cancer diagnosis, treatment, and prognosis.

m7G is a common RNA methylation modification with a significant impact on biological processes. RNA methylation is a basic process of epigenetic regulation. A large body of evidence shows that RNA methylation plays a crucial role in many biological processes, and its dysregulation is highly correlated with the development of human cancers, especially gastrointestinal tumors [3A–C). We used principal component analysis (PCA) to further investigate the two classes and verify the clustering results. Significant differences between Clusters 1 and 2 were visible in the PCA plots (Fig. 3D). We next evaluated whether there were notable differences in clinical parameters and overall survival (OS) between these two groups. OS in Cluster 1 was significantly better than that in Cluster 2 (p < 0.01) (Fig. 4B). Furthermore, most RNA modification regulator genes were upregulated in Cluster 2 compared with Cluster 1 (Fig. 4A). Although there were no significant differences in histological grade, pathological stage, or sex, age differed significantly between the two groups (p < 0.05) (Fig. 4A). The consensus clustering results therefore indicated a strong correlation between the malignancy of colon adenocarcinoma and the expression patterns of m7G RNA modification regulators.

Fig. 3
Consensus cluster analysis of COAD. A Subgroup correlation at k = 2. B Cumulative distribution function (CDF) for k = 2–9. C Relative change in the area under the CDF curve for k = 2–9. D Principal component analysis of the RNA-seq data. Blue dots indicate the low-risk cluster, whereas red dots indicate the high-risk cluster

Fig. 4
Differences in clinicopathological characteristics and overall survival between Clusters 1 and 2. A Heatmap of the clinicopathological characteristics of the two clusters. Red and green represent high and low expression, respectively. The difference in age was significant (p < 0.05). B Comparison of OS between Clusters 1 and 2. *p < 0.05; **p < 0.01; ***p < 0.001. OS, overall survival

We next performed GO and KEGG analyses of the differentially expressed genes between Clusters 1 and 2 to further explain the clustering results in terms of the underlying biological processes. According to the GO analysis, the downregulated genes were mainly involved in the following cancer-related biological processes: neutrophil chemotaxis and migration; antimicrobial humoral immune response mediated by an antimicrobial peptide; antimicrobial humoral response; response to chemokine; cellular response to chemokine; and humoral immune response (Fig. 5A, B). In addition, according to the KEGG analysis, the upregulated genes were associated with cytokine-cytokine receptor interaction, and the majority of the downregulated genes were associated with the coronavirus disease (COVID-19) pathway (Fig. 5C, D).

Fig. 5
Genes with differential expression between two clusters as determined by GO and KEGG studies. According to GO (A, B) and KEGG pathway analyses, more genes in Cluster 2 were functionally annotated (C, D). GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes

Development of a prognostic risk model for m7G RNA modification-related genes based on the expression levels of all genes

Univariate Cox regression was performed to determine whether the expression levels of the key regulators were associated with the prognosis of patients with colon cancer. The findings demonstrated that 16 of these genes were significantly associated with OS (p < 0.01) (Fig. 6A). All 16 regulators (GABBR1, LINC00174, HSF4, LTB4R, EXOC3L4, RPL32P3, MAN2C1, YJEFN3, ZNF692, UPK3B, DNAH1, ZNF767P, MTMR9LP, AGAP9, L3HYPDH, and ADAMTS13) were considered risk genes with a hazard ratio (HR) > 1. Among them, LINC00174, MAN2C1, DNAH1, MTMR9LP, AGAP9, and ADAMTS13 carried a higher risk, with HR > 2. Subsequently, the regulators with the highest prognostic ability were screened by LASSO Cox regression analysis (Fig. 6B, C), which identified the following four genes to estimate the risk of colon adenocarcinoma: heat shock transcription factor 4 (HSF4); uroplakin 3B (UPK3B); zinc finger family member 767, pseudogene (ZNF767P); and ArfGAP with GTPase domain, ankyrin repeat and PH domain 9 (AGAP9) (Fig. 6D). The risk score was calculated using the following formula: risk score = (0.0859 × HSF4 expression value) + (0.3854 × UPK3B expression value) + (0.0592 × ZNF767P expression value) + (0.1646 × AGAP9 expression value). High expression of the four genes in tumor cells was verified by qRT-PCR (Fig. 6E). Additionally, Fig. 6F shows the risk score distribution of patients with colon adenocarcinoma, and each patient's survival status is shown in a scatter plot (Fig. 6G).
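As an illustration only, the risk score formula can be sketched in a few lines of Python. The coefficients are those reported above; the patient identifiers and expression values are hypothetical and are not study data. Assigning a score exactly equal to the median to the low-risk group is likewise an assumption of this sketch.

```python
from statistics import median

# Coefficients reported in the text for the four-gene signature.
COEFFICIENTS = {
    "HSF4": 0.0859,
    "UPK3B": 0.3854,
    "ZNF767P": 0.0592,
    "AGAP9": 0.1646,
}


def risk_score(expression):
    """Return the weighted sum of the four gene expression values."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median.

    Assumption: scores equal to the median are labeled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values, for illustration only.
patients = {
    "P1": {"HSF4": 1.0, "UPK3B": 2.0, "ZNF767P": 0.5, "AGAP9": 1.5},
    "P2": {"HSF4": 0.0, "UPK3B": 0.0, "ZNF767P": 0.0, "AGAP9": 0.0},
    "P3": {"HSF4": 1.0, "UPK3B": 1.0, "ZNF767P": 1.0, "AGAP9": 1.0},
    "P4": {"HSF4": 3.0, "UPK3B": 3.0, "ZNF767P": 3.0, "AGAP9": 3.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Because all four coefficients are positive, higher expression of any of the four genes raises the score, which is consistent with their classification as risk genes (HR > 1).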

Fig. 6
Establishment of a prognostic risk model based on m7G RNA modification regulator genes. A Univariate Cox regression analysis of the regulator genes correlated with m7G RNA methylation. B–D Construction of the signature using least absolute shrinkage and selection operator (LASSO) Cox regression. E Expression levels of the four genes in FHC (normal) and SW480 (tumor) cells. F Risk score distribution in the risk score model. G Survival status distribution in the prognostic model. *p < 0.05; **p < 0.01; ***p < 0.001

To investigate the prognostic value of the four-gene signature model, colon cancer patients were divided into low- and high-risk groups according to the median risk score. According to the survival analysis, patients with high risk scores had worse overall survival (OS) than those with low risk scores (Fig. 7A, p < 0.001). The five-year OS rate was 56.3% in the high-risk group and 73.3% in the low-risk group. ROC curve analysis showed that the area under the curve (AUC) values for one-, two-, three-, and five-year OS were 0.648, 0.663, 0.670, and 0.628, respectively, which indicated strong predictive potential for survival outcomes (Fig. 7B).
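For readers unfamiliar with how group-wise survival rates such as the five-year OS are obtained, the following is a minimal sketch of the Kaplan–Meier product-limit estimator in plain Python. It is not the software used in the study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed death.

    times:  follow-up durations (e.g., in years)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Return the estimated survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up data for one risk group, for illustration only.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

Applying `kaplan_meier` separately to the high- and low-risk groups and reading off `survival_at(curve, 5)` gives group-wise five-year OS estimates of the kind reported above.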

Fig. 7
Survival analyses of the prognostic model based on Kaplan–Meier curves. Using the median risk score as the cutoff, patients from the two datasets were divided into low-risk (blue) and high-risk (yellow) groups. A In the TCGA cohort, the low-risk group had a higher probability of survival than the high-risk group (p < 0.001). B The one-, two-, three-, and five-year AUC values were 0.648, 0.663, 0.670, and 0.628, respectively, in the TCGA cohort. C In the GEO cohort, the prognostic model was validated: the low-risk group had a higher probability of survival than the high-risk group (p = 0.04). D The one-, two-, three-, and five-year AUC values were 0.557, 0.617, 0.529, and 0.535, respectively, in the GEO cohort. GEO Gene Expression Omnibus, TCGA The Cancer Genome Atlas, AUC area under the curve

Using GEO's database to validate predictive signatures

Microarray data from the GEO database (GSE39582) were used as a testing set to evaluate the prognostic value of the four-gene signature. According to the cutoff value of the TCGA cohort, 228 individuals with colon adenocarcinoma in the GSE39582 cohort were divided into two groups: 117 were classified as high risk and the remaining 111 as low risk. Survival analysis showed that patients in the low-risk group had significantly better OS than patients in the high-risk group (Fig. 7C, p = 0.04), which is in line with the findings in the TCGA cohort. The AUC values for one-, two-, three-, and five-year OS were 0.557, 0.617, 0.529, and 0.535, respectively, which demonstrated that the prediction model accurately predicted the OS of colon adenocarcinoma patients (Fig. 7D).
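To make the AUC values easier to interpret, the sketch below computes an AUC as the probability that a randomly chosen patient who died has a higher risk score than a randomly chosen survivor. This is a simplification that assumes a binary outcome at a fixed time point and ignores censoring; the time-dependent ROC analysis used in the study is more involved. The scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of an event case > score of a non-event case).

    scores: risk scores, one per patient
    labels: 1 if the patient died by the chosen time point, 0 otherwise
    Ties between an event and a non-event score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes, for illustration only.
example_scores = [0.9, 0.8, 0.3, 0.2]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
```

Under this definition, an AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a score that ranks every event case above every non-event case.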

Prediction of colon cancer patient prognosis using the four-gene risk signature

Comparison of the low- and high-risk groups revealed a significant difference in pathological N stage (p < 0.001). The heatmap shows the expression and clinical correlation of the four genes, all of which were strongly associated with prognosis in the two groups (Fig. 8A). In total, 458 cases were included in the Cox regression analysis after cases with insufficient clinical information were excluded. According to the univariate analysis, OS was significantly associated with the four-gene risk score, T stage, N stage, and clinicopathological stage in patients with colon adenocarcinoma (Fig. 8B, p < 0.001). To ascertain whether the four-gene risk signature is a predictive marker for colon adenocarcinoma independent of the other clinicopathological characteristics, multivariate Cox regression analysis was performed. The results revealed that OS in individuals with colon adenocarcinoma was independently correlated with the risk score and clinicopathological stage (Fig. 8C, p < 0.001). These findings indicated that the four-gene risk signature can be used as an independent prognostic indicator for colon adenocarcinoma regardless of sex, age, histological grade, and pathological stage.

Fig. 8
Predictive ability of the risk score and clinicopathological characteristics for COAD patient prognosis. A The heatmap displays the expression of the four m7G RNA modification regulators and the distribution of clinicopathological characteristics in the high- and low-risk groups. B Univariate Cox regression analysis of clinicopathological parameters and OS. C Multivariate Cox regression analysis of clinicopathological variables and OS. *p < 0.05; **p < 0.01; ***p < 0.001. COAD colon adenocarcinoma, OS overall survival

Differential analysis of immune cell infiltration and immune function in groups at high- and low-risk

We examined the differences in immune cell infiltration associated with colon cancer in the TCGA database. The infiltration levels of macrophages, neutrophils, and regulatory T cells (Tregs) were high in the low-risk group but low in the high-risk group (p < 0.001) (Fig. 9A). Differences in immune functions associated with colon adenocarcinoma were also analyzed. The levels of APC coinhibition, APC costimulation, chemokine receptors (CCRs), cytolytic activity, inflammation promotion, parainflammation, and type II IFN response were high in the low-risk group but low in the high-risk group (p < 0.001) (Fig. 9B).

Fig. 9
Analysis of immunological differences between high- and low-risk groups of COAD patients. Blue boxes represent the low-risk group, and red boxes represent the high-risk group. The horizontal line in each box indicates the median value for the group. A In the TCGA cohort, the levels of macrophages, neutrophils, and regulatory T cells (Tregs) were high in the low-risk group but low in the high-risk group (p < 0.001). B In the TCGA cohort, the levels of APC coinhibition, APC costimulation, and CCRs were high in the low-risk group but low in the high-risk group (p < 0.001). C In the GEO cohort, the levels of macrophages, neutrophils, and Tregs were significantly higher in the low-risk group than in the high-risk group (p < 0.001). D In the GEO cohort, the levels of CCRs, cytolytic activity, and inflammation promotion were high in the low-risk group but low in the high-risk group. *p < 0.05; **p < 0.01; ***p < 0.001. COAD, colon adenocarcinoma; CCRs, chemokine receptors

The differences in immune cell infiltration were also confirmed to be significant in the GEO database, with high infiltration levels of macrophages, neutrophils, and Tregs in the low-risk group but low infiltration of these cells in the high-risk group (p < 0.001) (Fig. 9C). Regarding immune function, the levels of CCRs, cytolytic activity, and inflammation promotion were high in the low-risk group but low in the high-risk group (p < 0.001) (Fig. 9D).

Construction of a nomogram for colon adenocarcinoma prognosis

To establish a quantitative method for predicting individual survival, we created a novel predictive nomogram based on age, sex, histological grade, pathological stage, and risk score (Fig. 10). The results demonstrated that the nomogram systematically predicted one-, three-, and five-year OS in patients with colon adenocarcinoma.

Fig. 10
Establishment of a predictive nomogram using a risk score and clinicopathologic traits

Discussion

We constructed a predictive model based on four key regulators closely correlated with prognosis in patients with colon adenocarcinoma. The established risk score can reliably predict the prognosis of these patients. After selecting patient data from the TCGA database that satisfied the inclusion requirements, we first performed a bioinformatics analysis. To find differentially expressed genes, which can encode regulators of m7G RNA methylation, we performed the Wilcoxon test in R. We then identified the differentially expressed m7G RNA modification regulators in colon adenocarcinoma. Using the screened gene expression patterns with k = 2, we divided the colon cancer patients into two groups with distinct clinical outcomes. We also analyzed the differences in immune function and immune cell infiltration between the two groups. Following these procedures, we created the predictive risk model based on the expression levels of all genes. Finally, to validate the prognostic model, we repeated the above steps using data from colon adenocarcinoma patients in the GEO database who met the inclusion criteria, and the model was well validated. Based on the accuracy of the results, we generated a nomogram to evaluate the prognosis of colon adenocarcinoma patients.

Heat shock transcription factor 4 (HSF4) is a member of the heat shock factor (HSF) family. HSF4 has various physiological functions: regulating the transcriptional program of the heat or stress response; regulating cell proliferation and differentiation during development; regulating DNA damage repair; and regulating normal physiological processes. Alterations in HSF4 are also strongly linked to cataracts, cancer, and other illnesses [21]. In experiments, mice lacking HSF4 produce irregular lenses and develop cataracts early [22,23,24]. A previous study of a cohort of patients with congenital cataracts in China indicated that disease development is closely associated with genetic mutations in the HSF4 DBD [25]. HSF4 also plays a critical role in carcinogenesis and tumor progression: in hepatocellular carcinoma, it has been demonstrated to enhance EMT by activating the AKT pathway in a HIF1α-dependent manner. Hepatocellular carcinoma cells are better able to migrate, disseminate, and invade when HSF4 is upregulated, which promotes aggressive tumor behavior and indicates that high HSF4 expression may be a predictor of poor prognosis in hepatocellular carcinoma after radical resection [26].

Uroplakin 3A (UPK3A) and uroplakin 3B (UPK3B), which are among the main structural elements of uroepithelial tissue, have been shown to be expressed in several tissues and organs. For example, in mouse embryos, Cre recombinase activity driven by UPK3B is detected in the liver, heart, kidney, lung, and neural crest cells. UPK3B expression has been detected in mouse testes, epididymal spermatozoa, ovarian follicles, and oviductal mucosa, suggesting that UPK3B may be extremely important for the development of mouse gametes as well as gamete delivery organs [27]. A whole transcriptome analysis of placental changes in fetuses with prenatal arsenic exposure reported that UPK3B is one of the most significant ‘off’ genes for arsenic exposure in females [28]. Transcriptome analysis of liver fibrosis identified UPK3B as a potential regulator of hepatic stellate cell (HSC) activation-induced liver fibrosis [29]. Low FOXA1 expression has been linked to earlier tumor staging, while FOXA1 deletion has been associated with high histological grade. Increased UPK3B expression, decreased E-cadherin expression, and increased cell proliferation have been observed in FOXA1-deficient RT4 bladder cancer cells, demonstrating a strong relationship between high UPK3B expression and tumor malignancy [30].

These recent studies have demonstrated that HSF4 and UPK3B are both closely associated with tumors. Although few studies have been reported on the ZNF767P and AGAP9 genes, they are promising research targets. Moreover, the differential analysis of immune cell infiltration and immune function highlighted macrophages, which play a significant part in tumor formation, while therapeutic effects can also be achieved through the modulatory role of engineered macrophages in the tumor immune microenvironment [31]. Dysregulation of macrophage-mediated immunosuppression leads to chronic low-grade inflammation driven by tissue-specific macrophages and neutrophils, which ultimately leads to the development of cancer [32]. Neutrophils are considered complex cells with many specific functions; they act as effectors of the innate immune response and play a regulatory role in multiple processes, such as cancer, acute injury, repair, autoimmunity, and chronic inflammation [33]. In the host, neutrophils reflect inflammation, which is a hallmark of cancer [34]. Pan et al. demonstrated an association between high layilin (LAYN) expression and poor overall survival in colon cancer patients, as well as a positive correlation between LAYN expression and macrophage and neutrophil infiltration in colon adenocarcinoma (COAD) [35]. The immune system tightly controls Th17/Treg homeostasis through the TGF-β/IL-2 and IL-6 cytokine axis. Regulatory T cells (Tregs) are necessary for self-tolerance and defense against autoimmunity, and they are typically linked to the advancement of cancer [36]. By maintaining Treg activity and accumulation in the colon, glycoprotein-A repetition predominant (GARP) reduces cancer immunity [37]. Thus, these findings indicate that macrophages, neutrophils, and Tregs are highly infiltrated in the low-risk group, which reduces tumor development.

In the current study, differences in three immune functions were validated in both the TCGA and GEO databases. Chemokines and their receptors are involved in numerous aspects of cancer biology; their potential as targets has been evaluated in several studies, and chemokine receptor inhibitors have been used in clinical practice for hematologic malignancies [38]. The cytolytic activity score can be employed not only as a biomarker of antitumor immunity and clinical prognosis in patients with gastric cancer [39] but also to evaluate anticancer immunity in colorectal cancer [40]. Nuclear factor-κB (NF-κB), which promotes inflammation, is a central mediator of the inflammatory process, and activation of NF-κB, driven mainly by inflammatory cytokines in the tumor microenvironment, is also prevalent in cancer [41]; thus, inflammation promotion plays a crucial role in tumorigenesis. In conclusion, chemokine receptors, cytolytic activity, and inflammation promotion are closely related to tumors and play critical roles in the diagnosis, treatment, and prognosis of tumors.

The present study had several strengths. First, based on the expression patterns of the major regulators related to m7G RNA modification across all genes, we generated the first predictive model for colon cancer. Second, the model was constructed using a variety of statistical techniques, and both the test cohort and the entire cohort were used for validation. As a result, the predictive risk model for patients with colon cancer is precise and trustworthy. The accuracy of the risk score model in predicting OS was higher than that of pathological stage and age, and the risk score model can be employed as a standalone prognostic indicator. Finally, as colon cancer research advances, our model can also be used to predict immune cell infiltration and to study differential immune function. However, the present study also had several limitations. First, the prognostic risk model was based on public databases and has not been validated in a clinical study. In addition, the possible mechanisms through which the important regulators of m7G RNA modification affect colon cancer progression need to be further investigated in basic experiments. Further, in vivo and in vitro experiments exploring the molecular functions of the four genes in the model were lacking. Further studies are required to elucidate the mechanism.

Conclusions

In the present study, a four-gene signature of colon cancer, consisting of HSF4, UPK3B, ZNF767P, and AGAP9, was generated and validated. It can be used as an auxiliary predictive variable to estimate the survival of patients with colon adenocarcinoma.