Introduction

Methylation of adenosines at the N6 position (m6A) has been shown to be the most prevalent modification of eukaryotic mRNAs1. Dynamic and reversible m6A modification is coordinated through installation by methyltransferases (“writers”), eradication by m6A eraser demethylases (“eraser”), and recognition by binding proteins (“readers”). m6A is the most abundant type of mRNA nucleotide modification and plays diverse roles in physiological development and pathophysiological processes, including cancers.

m6A writers include METTL3, METTL14, WTAP, and KIAA1429, and m6A erasers include FTO and ALKBH5. m6A readers, which are proteins that recognize m6A and regulate mRNA processing, include YTHDC1, YTHDC2, YTHDF1, YTHDF2, YTHDF3, HNRNPA2B1, IGF2BP1, IGF2BP2, and IGF2BP3. In tumor biology, m6A modification plays crucial roles in tumorigenesis, proliferation, and metastasis2. Reporters have revealed that m6A RNA methylation mediates non-small cell lung cancer proliferation and progression3. Liu et al. demonstrated that m6A has oncogenic functions and promotes gastric cancer progression4.

Aberrantly expressed m6A regulators also function as potential biomarkers in cancers5,6,7. Studies have shown that KIAA1429 upregulation predicts a poor prognosis in colorectal cancer6. Through bioinformatics analysis, Guo et al. revealed that decreased m6A regulator expression is related to greater immune cell infiltration and better survival in pancreatic cancer5. The evidence also indicates that METTL14 downregulation in renal cell carcinoma patients is associated with malignant characteristics and predicts a poor prognosis7. These studies suggest that m6A regulators might modulate the tumor microenvironment and exhibit prognostic prediction efficacy in HNSCC.

Increasing evidence demonstrates that m6A in the tumor immune microenvironment plays a critical role in the pathogenesis of various cancers. m6A modification also functions in homeostasis and tumor immunosurveillance by promoting the antitumor activity of NK cells in the tumor microenvironment8. Low m6A regulator-based scores are associated with a better prognosis in small-cell lung cancer after adjuvant chemotherapy9. In addition, this low-score cohort exhibits more CD8+ T-cell infiltration and is more responsive to cancer immunotherapy9. Nevertheless, the expression pattern and pathophysiological role of m6A in HNSCC remain largely unknown.

In this study, we performed bioinformatics analysis, classified HNSCC patients into subtypes with different scores and comprehensively explored the m6A signature-associated clinical features, mutations, phenotype-related gene clusters, and immune infiltration. We also constructed and validated an m6A regulator-based risk score model. Our study may provide novel clues for HNSCC prognosis and immunotherapy predictions.

Results

Copy number variation (CNV) and whole-exome single-nucleotide polymorphism (SNV) analyses of m6A regulators in HNSCC

Genomic alterations have a great impact on tumor development, progression, and therapeutic responses. Aberrant m6A regulator alterations are closely related to malignant tumor activities. To investigate the alterations in m6A regulators, we explored SNVs and CNVs in HNSCC from TCGA. The results revealed alterations in m6A regulator CNVs, as shown in Table 1, and RBM15 accounted for 1.01% amplification, and 47.92% of deletion rate, the landscape of regulators was listed. The proportion of YTHDF3 amplification was highest, reaching 20.88%. The various m6A regulator CNVs are illustrated in Supplementary Fig. 1. We also identified the SNVs of m6A regulators and recognized alterations in 86 (16.93%) of the 508 total samples, being the more prevalent alterations in KIAA1429 and LRPPRC (Supplementary Fig. 2).

Table 1 m6A modification proportions among 20 genes.

m6A regulators are differentially expressed and associated with prognosis

m6A modifications have a close relationship with tumor progression. To investigate the m6A regulator expression levels, we adopted the genomic profiles of 20 selected m6A regulators. The results revealed that 18 of these 20 genes showed significant differences between tumor tissues and paired normal tissues (Fig. 1A). To investigate whether the m6A regulator status is associated with clinical outcomes, we adopted univariate Cox regression models for the clinical follow-up data. The results revealed that HNRNPC exhibits a close relationship with the overall survival time of the patients and might play key roles in tumor progression (p = 0.016) (Fig. 1B). To better investigate the connection between m6A regulators and prognosis, we classified these 20 genes using the Euclidean method with the “dist” function based on the expression levels of these 20 genes. We then classified these 20 m6A regulator genes into four regulator clusters, namely, regulator cluster A, regulator cluster B, regulator cluster C, and regulator cluster D, based on their genomic expression levels. These results suggest that clusters with similar biological functions might present negative correlations. The red line represents positive correlations between YTHDC2 and IGF2BP1, and other clusters connected with gray lines are negatively correlated each other (Fig. 1C).

Figure 1
figure 1

Most m6A regulators are significantly differentially expressed in HNSCC tumor tissues and are associated with patient prognosis. (A) m6A regulators were significantly increased in tumor tissues compared with normal tissues. (B) Univariate Cox analysis forest map of regulators. (C) Construction of the m6A-associated regulatory network. The data were analyzed by t tests: *p < 0.05; **p < 0.01; ***p < 0.001.

m6A signature-based subtype

Distinct m6A regulator-based clusters were characterized by markedly different prognoses. To better understand the m6A signatures, samples were clustered based on RNA-seq data through unsupervised clustering using ConsensusClusterPlus. We used the cumulative distribution function (CDF) and found that the clustering result obtained with k = 2 was relatively stable (Fig. 2A,B). The results demonstrated that cluster 1 exhibited a poorer prognosis than C2 (p = 0.018, Fig. 2C). We also constructed disease-specific survival (DSS), disease-free interval (DFI) and progression-free interval (PFI) curves to evaluate the prognosis of HNSCC patients. The C2 cluster showed significantly improved DSS (p = 0.032; Supplementary Fig. 3A) and PFI (p = 0.014, Supplementary Fig. 3B). The DFI (p = 0.67) showed no significant difference between the C1 and C2 clusters (Supplementary Fig. 3C). Our evaluation of the differences in tumor-associated pathway scores between the two subtypes revealed that the C2 cluster was significantly enriched in the HIPPO, MYC, NOTCH, NRF1, TGF-beta, PIK3, RAS and TP53 pathways (Fig. 2D).

Figure 2
figure 2

Construction of m6A-related subtypes. (A) CDF curve and CDF curve delta area of the TCGA HNSCC cohort. (B) T Sample clustering heatmap, consensus K = 2. (C) KM curve of the two subtypes in the TCGA-HNSCC cohort. (D) Estimation of the scores of different subtypes in 10 abnormal tumor pathways. (E) Proportions of immune cells in different subtypes.

Clusters C1 and C2 show significantly different tumor mutation burdens (TMBs) and alterations

Different m6A regulator-based clusters were also characterized by significantly different TMBs and alterations. To characterize the TMBs in the two clusters, we analyzed the distribution pattern of the TMBs. The TMBs showed that the alterations occupied 71.65% of all samples (364 of 508) (Fig. 3A). The results indicated that C1 showed a higher TMB than C2 (p = 0.011, Fig. 3B).

Figure 3
figure 3

m6A modification subtypes show a close relationship with common genomic mutations. (A) Significantly altered genes and signatures in the C1 and C2 subgroups. (B) The TMB differed between the C1 and C2 subgroups. (C) Number of mutated genes between the C1 and C2 subgroups. (DK) The m6A-associated subgroups C1 and C2 presented significantly different clinical feature distributions.

Clinical characteristics of the C1 and C2 clusters

Distinct m6A regulator-based clusters also featured significantly different clinical characteristics. To evaluate the clinical feature distribution in the two subtypes, we found no significant difference in the survival status, M stage, N stage, age, or HPV16 status (Fig. 3C–G), but significant differences in sex, T stage, stage, and tumor grade were detected (Fig. 3H–K). These results suggest that m6A signatures are closely related to clinical characteristics. The detailed clinical information is illustrated in Supplementary Table 1.

Identification of m6A-associated DEGs

In this analysis, we identified 945 DEGs between C1 and C2, which included 242 upregulated genes and 703 downregulated genes, as shown in the volcano plot (Fig. 4A). Furthermore, we performed unsupervised consensus clustering of the DEGs between the C1 and C2 subgroups and identified 3 clusters (cluster A, cluster B, and cluster C). C1 was mainly located in cluster A (Fig. 4B). According to the KM curve, cluster A presented the worst prognosis, whereas cluster C exhibited the best prognosis according to 15 years follow data (Fig. 4C). The adjusted pairwise comparisons demonstrated no significant differences between clusters C and B and between clusters A and C. Cluster B showed a shorter overall survival time than cluster A (p = 0.0066) (Supplementary Fig. 3D–F). In addition, a gene set variation analysis (GSVA) revealed that Clusters A, B, and C were enriched in significantly different pathways (Fig. 4D). Cluster A was mainly enriched in the retinal metabolic process, peptidyl methionine modification, and isoprenoid biosynthetic process categories. Cluster B was mainly enriched in natural killer cell activation involved in the immune response, peptide cross linking, Golgi ribbon formation, desmosome organization, T helper 2 cell cytokine production, positive regulation of ruffle assembly, epidermal cell differentiation, keratinization, cornification and regulation of water loss via skin. Cluster C was enriched in cell migration in the hindbrain, positive regulation of endothelial cell chemotaxis, synaptic vesicle clustering, cardiac lift ventricle morphogenesis, keratan sulfate catabolic process, regulation of developmental pigmentation and regulation of endothelial cell chemotaxis (Fig. 4D). To detect the m6A regulator expression levels in different clusters, we examined the distribution of these 20 genes in C1 and C2 (Supplementary Fig. 4A). We also examined the distribution of these 20 genes in clusters A, B, and C (Supplementary Fig. 4B).

Figure 4
figure 4

m6A-associated DEGs. (A) Volcano plot of m6A-associated DEGs. (B) Unsupervised cluster analysis diagram of m6A-associated DEGs. (C) KM curves of three m6A gene clusters. (D) A GSVA indicated that three clusters, A, B, and C, were enriched in different pathways.

Construction and validation of an m6A regulator signature-based prognostic model

In detail, to better construct the prognostic model of m6A regulators, we applied Cox regression and lead absolute shrinkage and selection operator (LASSO) regression analysis and described the changing trajectory of each independent variable. The number of independent variable coefficients approaching zero increased with decrease in lambda (Fig. 5A). The confidence interval (CI) for each lambda was examined, as shown in Fig. 5B. The optimal model was acquired with a lambda value equal to 0.0173. We adopted the step AIC function from the MASS R package and identified 5 genes, namely, CYP26A1, KIF13B, CPVL, FCGBP, and MB. The 5-gene signature-based model was constructed according to the following equation: RiskScore = 0.178*CYP26A10.446*KIF13B + 0.305*CPVL0.238*FCGBP + 0.087*MB. The risk score landscape of each sample is illustrated in Fig. 5C. The AUC at 1, 3, and 5 years suggested that this 5-gene model presented good prognostic prediction efficacy (Fig. 5D). We calculated the risk score of each sample according to the expression level of the TCGA training dataset samples and obtained the risk scores of the samples. A Z score analysis of risk scores was then performed to categorize samples with scores > 0 into the high-risk group and those with scores < 0 into the low-risk group. The high-risk-score group presented a poorer prognosis than the low-risk-score group (uncorrected p = 0.001, HR = 2.11, 95% CI 1.61–2.76) (Fig. 5E). The TCGA gene profile was utilized as the validation dataset. The risk score distribution landscape and the AUC indicated a good and stable prediction efficiency (Fig. 6A,B). Consistent with the training set, the high-risk score subgroup showed a poor predicted prognosis (uncorrected p = 0.0026, HR = 1.66, 95% CI 1.29–2.13, Fig. 6C). To better evaluate the efficiency of the 5-gene model in the TCGA all-sample set, we calculated the risk score of each sample and the AUCs at 1, 3, and 5 years (Fig. 6D,E). The high-risk-score group presented a shorter survival time than the low-risk-score subgroup (p < 0.0001, HR = 1.84, 95% CI 1.53–2.22, Fig. 6F). The results agreed well with previous findings. Based on the previous 5-gene model, we also constructed an m6A regulator-associated gene model, and based on the expression of these m6A regulators, we classified HNSC patients into high- and low-risk score subgroups. The m6A score model efficiency was evaluated based on the TCGA training set and validated based on the TCGA validation set. A total of 424 samples from TCGA-HNSC were classified into a 212-sample training set and a 212-sample validation set. The detailed clinical information of the high- and low-m6A-score subgroups is illustrated in Supplementary Table 1. The high-m6A-score group presented a poorer prognosis than the low-m6A-score group (Supplementary Fig. 5A–D).

Figure 5
figure 5

Construction and validation of a m6A phenotype-associated prognostic model. (A) The horizontal axis represents the log value of lambda, and the vertical axis represents the coefficient of the independent variable. (B) Confidence intervals for each lambda. (C) Risk score, survival time, survival status, and expression of 5 genes in the TCGA training group. (D) ROC curve and AUC of the 5-gene signature classification. (E) KM survival curve distribution between the high- and low-risk scores of the 5-gene signature in the training set.

Figure 6
figure 6

Validation of the m6A phenotype-associated prognostic model. (A) Risk score, survival time, survival status, five-gene expression, ACU curve, and KM curve of the TCGA validation dataset. (B) Risk score, survival time, survival status, five-gene expression, ACU curve, and KM curve of the TCGA all-sample validation dataset.

The high-m6A-score group presented distinct immune-associated signatures. We calculated the infiltration levels of 22 immune cell types in the two m6A score-based groups, and the results demonstrated that 8 (36.36%) immune cell types showed a significant difference (Fig. 7A). We also performed a single-sample gene set enrichment analysis (ssGSEA) to calculate the IFNγ score of each patient, and significant differences in IFNγ scores were observed between the high- and low-score groups (Fig. 7B). To evaluate the differences in chemokine gene expression between the two groups, we calculated the chemokine gene expression levels and found that 12 chemokines showed significantly differential expression between the two groups (Fig. 7C). We also calculated the chemokine receptor gene expression levels and found that the expression levels of 6 of 18 (33.33%) genes were markedly different (Fig. 7D). Furthermore, we obtained 47 immune checkpoint genes and found that 14 (29.79%) immune checkpoint genes presented low expression levels in the high-m6A-score group (Fig. 7E). To investigate the correlation between m6A regulators and the hypoxia score, we performed GSEA and calculated the hypoxia score of each regulator. The results demonstrated that METTL3 (p < 0.05), ZC3H13 (p < 0.001), ALKBH5 (p < 0.001), FTO (p < 0.001), YTHDF1 (p < 0.001), YTHD3 (p < 0.001), HNRNPA2B1 (p < 0.01), FMR1 (p < 0.01), and ELAVL1 (p < 0.01) were significantly correlated with m6A regulators (Supplementary Fig. 6A) (the p value in this section is uncorrected). The m6A regulator-based cluster C had a markedly higher hypoxia score than cluster A (uncorrected p = 0.001). The high-m6A-score subgroup exhibited a higher hypoxia score than the low-m6A-score subgroup. The hypoxia score did not significantly differ between C1 and C2 (Supplementary Fig. 6B). We performed univariate and multivariate Cox regression analyses to evaluate the independence of the m6A score signature model in terms of clinical application. The univariate Cox regression analysis showed that the m6A score was significantly correlated with survival, and the corresponding multivariate Cox regression analysis revealed that the m6A score-based group (uncorrected p < 0.05, HR = 1.76, 95% CI 1.23–2.53) was significantly correlated with survival (Fig. 8A,B). These outcomes revealed that the m6A score model can exhibit good prognostic prediction value in the clinic.

Figure 7
figure 7

Correlation analysis of immune infiltration in the m6A-associated high- and low-risk-score groups. (A) Infiltration levels of 22 types of immune cells between the high- and low-m6A-risk score-based subgroups in the TCGA cohort. (B) CYT scores between the high- and low-m6A-risk-score subgroups. (C) Differences in chemokine expression between the high- and low-risk-score groups. (D) Distribution of chemokine receptor expression between the high- and low-risk-score groups. (E) Distribution of immune checkpoint gene expression between the high- and low-risk-score groups.

Figure 8
figure 8

The m6A risk score model presents good independence in clinical application. (A) Univariate analysis of the m6A risk score model based on clinical factors. (B) Multivariate analysis of the m6A risk score model based on clinical factors. (C) Correlation coefficient clustering of risk scores and KEGG pathways. (D) Pathway enrichment analysis of different risk score subgroups.

Biological functional enrichment in the m6A score model

To explore the relationship between the m6A score and associated biological functions, we performed ssGSEA and determined that five functional pathways were negatively correlated with the m6A score and that 10 pathways were positively correlated with the sample m6A score. In addition, a Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was performed in this study to investigate the enriched signaling pathways10. The results from the KEGG enrichment-based cluster analysis of the pathways are illustrated in Fig. 8C. The KEGG pathways cardiac muscle construction, glycerophospholipid metabolism and olfactory transduction showed increased enrichment in the high-m6A-score group (Fig. 8D).

Discussion

Epigenetic modifications have emerged as abundant modifications throughout the transcriptome and play very important roles in genome evolution and innovation

Table 2 The TCGA training set and TCGA test set.

LASSO regression analysis

The m6A regulator-based signatures might have important prognostic value in HNSCC. First, two molecular subtypes were constructed based on the consensus clustering of m6A-related genes, and the differentially expressed genes related to the m6A subtype were screened by differential analysis. We applied LASSO regression analysis to build prediction models based on the training dataset and validated their predictive effectiveness using the validation cohort. The genes associated with prognosis were screened by univariate Cox analysis of the initial input of 27 genes, namely, CYP26A1, KIF13B, CPVL, CMYA5, FCGBP, XIRP2, NEB, CASQ1, MYH2, NRAP, MYH7, LMOD2, ACTN2, MYOT, HSPB7, APOBEC2, MYL1, MYH1, ACTA1, MB, MYOZ1, CSRP3, MYL2, COX6A2, SMPX, CKM and SLN. Then, LASSO analysis and multivariate Cox analysis were applied to construct a risk-score based prognoistic model. Among these differentially expressed genes, five key genes were screened by univariate Cox analysis.

The glmnet R package was used for LASSO Cox variable selection and model building. The survival R package was used for the comparison of survival curves. Survival analysis based on the Kaplan–Meier (KM) curve was applied to evaluate prognosis.

Statistics

Statistical analyses and data plotting were performed using the R program (3.6.2)29. Fisher’s exact and equal-variance t tests were used in the group comparisons of categorical and continuous variables, respectively. Spearman’s correlation analysis test was applied to evaluate the correlation relationships of different cancer types. p ≤ 0.05 indicates statistical significance.