Abstract
Head and neck squamous cell carcinoma (HNSCC) has become the sixth most common malignant disease worldwide and is associated with high mortality, with an overall 5-year survival rate of less than 50%. Recent studies have demonstrated that aberrantly expressed m6A regulators are involved in multiple biological and pathological processes, including cancers, but the specific mechanisms of m6A regulators in HNSCC are not well elucidated. In this study, we adopted The Cancer Genome Atlas (TCGA)-HNSCC database and performed a consensus clustering analysis to classify the HNSCC samples. Least absolute shrinkage and selection operator (LASSO) regression was applied to construct an m6A signature-based HNSCC risk prediction model. Cell type identification based on estimating relative subsets of RNA transcripts (CIBERSORT) algorithms was adopted to evaluate the immune cell infiltration level in the tumor microenvironment. Based on the expression of m6A regulators in HNSCC, we identified two clusters, cluster 1 (C1) and cluster 2 (C2). C2 showed a better prognosis than C1 and was mainly enriched in the HIPPO, MYC, NOTCH, and NRF signaling pathways. We constructed an m6A signature-based risk score model and classified patients into high- and low-risk score subgroups. The high-risk-score group showed poor clinical characteristics, higher immune infiltration levels, higher chemokine and chemokine receptor expression levels, and lower immune checkpoint gene expression than the low-risk-score subgroup. In conclusion, our comprehensive analysis suggests that the m6A signature-based risk score might function as a good prognostic predictor. Our study may provide novel therapeutic clues and help predict the prognosis of HNSCC.
Similar content being viewed by others
Introduction
Methylation of adenosines at the N6 position (m6A) has been shown to be the most prevalent modification of eukaryotic mRNAs1. Dynamic and reversible m6A modification is coordinated through installation by methyltransferases (“writers”), eradication by m6A eraser demethylases (“eraser”), and recognition by binding proteins (“readers”). m6A is the most abundant type of mRNA nucleotide modification and plays diverse roles in physiological development and pathophysiological processes, including cancers.
m6A writers include METTL3, METTL14, WTAP, and KIAA1429, and m6A erasers include FTO and ALKBH5. m6A readers, which are proteins that recognize m6A and regulate mRNA processing, include YTHDC1, YTHDC2, YTHDF1, YTHDF2, YTHDF3, HNRNPA2B1, IGF2BP1, IGF2BP2, and IGF2BP3. In tumor biology, m6A modification plays crucial roles in tumorigenesis, proliferation, and metastasis2. Reporters have revealed that m6A RNA methylation mediates non-small cell lung cancer proliferation and progression3. Liu et al. demonstrated that m6A has oncogenic functions and promotes gastric cancer progression4.
Aberrantly expressed m6A regulators also function as potential biomarkers in cancers5,6,7. Studies have shown that KIAA1429 upregulation predicts a poor prognosis in colorectal cancer6. Through bioinformatics analysis, Guo et al. revealed that decreased m6A regulator expression is related to greater immune cell infiltration and better survival in pancreatic cancer5. The evidence also indicates that METTL14 downregulation in renal cell carcinoma patients is associated with malignant characteristics and predicts a poor prognosis7. These studies suggest that m6A regulators might modulate the tumor microenvironment and exhibit prognostic prediction efficacy in HNSCC.
Increasing evidence demonstrates that m6A in the tumor immune microenvironment plays a critical role in the pathogenesis of various cancers. m6A modification also functions in homeostasis and tumor immunosurveillance by promoting the antitumor activity of NK cells in the tumor microenvironment8. Low m6A regulator-based scores are associated with a better prognosis in small-cell lung cancer after adjuvant chemotherapy9. In addition, this low-score cohort exhibits more CD8+ T-cell infiltration and is more responsive to cancer immunotherapy9. Nevertheless, the expression pattern and pathophysiological role of m6A in HNSCC remain largely unknown.
In this study, we performed bioinformatics analysis, classified HNSCC patients into subtypes with different scores and comprehensively explored the m6A signature-associated clinical features, mutations, phenotype-related gene clusters, and immune infiltration. We also constructed and validated an m6A regulator-based risk score model. Our study may provide novel clues for HNSCC prognosis and immunotherapy predictions.
Results
Copy number variation (CNV) and whole-exome single-nucleotide polymorphism (SNV) analyses of m6A regulators in HNSCC
Genomic alterations have a great impact on tumor development, progression, and therapeutic responses. Aberrant m6A regulator alterations are closely related to malignant tumor activities. To investigate the alterations in m6A regulators, we explored SNVs and CNVs in HNSCC from TCGA. The results revealed alterations in m6A regulator CNVs, as shown in Table 1, and RBM15 accounted for 1.01% amplification, and 47.92% of deletion rate, the landscape of regulators was listed. The proportion of YTHDF3 amplification was highest, reaching 20.88%. The various m6A regulator CNVs are illustrated in Supplementary Fig. 1. We also identified the SNVs of m6A regulators and recognized alterations in 86 (16.93%) of the 508 total samples, being the more prevalent alterations in KIAA1429 and LRPPRC (Supplementary Fig. 2).
m6A regulators are differentially expressed and associated with prognosis
m6A modifications have a close relationship with tumor progression. To investigate the m6A regulator expression levels, we adopted the genomic profiles of 20 selected m6A regulators. The results revealed that 18 of these 20 genes showed significant differences between tumor tissues and paired normal tissues (Fig. 1A). To investigate whether the m6A regulator status is associated with clinical outcomes, we adopted univariate Cox regression models for the clinical follow-up data. The results revealed that HNRNPC exhibits a close relationship with the overall survival time of the patients and might play key roles in tumor progression (p = 0.016) (Fig. 1B). To better investigate the connection between m6A regulators and prognosis, we classified these 20 genes using the Euclidean method with the “dist” function based on the expression levels of these 20 genes. We then classified these 20 m6A regulator genes into four regulator clusters, namely, regulator cluster A, regulator cluster B, regulator cluster C, and regulator cluster D, based on their genomic expression levels. These results suggest that clusters with similar biological functions might present negative correlations. The red line represents positive correlations between YTHDC2 and IGF2BP1, and other clusters connected with gray lines are negatively correlated each other (Fig. 1C).
m6A signature-based subtype
Distinct m6A regulator-based clusters were characterized by markedly different prognoses. To better understand the m6A signatures, samples were clustered based on RNA-seq data through unsupervised clustering using ConsensusClusterPlus. We used the cumulative distribution function (CDF) and found that the clustering result obtained with k = 2 was relatively stable (Fig. 2A,B). The results demonstrated that cluster 1 exhibited a poorer prognosis than C2 (p = 0.018, Fig. 2C). We also constructed disease-specific survival (DSS), disease-free interval (DFI) and progression-free interval (PFI) curves to evaluate the prognosis of HNSCC patients. The C2 cluster showed significantly improved DSS (p = 0.032; Supplementary Fig. 3A) and PFI (p = 0.014, Supplementary Fig. 3B). The DFI (p = 0.67) showed no significant difference between the C1 and C2 clusters (Supplementary Fig. 3C). Our evaluation of the differences in tumor-associated pathway scores between the two subtypes revealed that the C2 cluster was significantly enriched in the HIPPO, MYC, NOTCH, NRF1, TGF-beta, PIK3, RAS and TP53 pathways (Fig. 2D).
Clusters C1 and C2 show significantly different tumor mutation burdens (TMBs) and alterations
Different m6A regulator-based clusters were also characterized by significantly different TMBs and alterations. To characterize the TMBs in the two clusters, we analyzed the distribution pattern of the TMBs. The TMBs showed that the alterations occupied 71.65% of all samples (364 of 508) (Fig. 3A). The results indicated that C1 showed a higher TMB than C2 (p = 0.011, Fig. 3B).
Clinical characteristics of the C1 and C2 clusters
Distinct m6A regulator-based clusters also featured significantly different clinical characteristics. To evaluate the clinical feature distribution in the two subtypes, we found no significant difference in the survival status, M stage, N stage, age, or HPV16 status (Fig. 3C–G), but significant differences in sex, T stage, stage, and tumor grade were detected (Fig. 3H–K). These results suggest that m6A signatures are closely related to clinical characteristics. The detailed clinical information is illustrated in Supplementary Table 1.
Identification of m6A-associated DEGs
In this analysis, we identified 945 DEGs between C1 and C2, which included 242 upregulated genes and 703 downregulated genes, as shown in the volcano plot (Fig. 4A). Furthermore, we performed unsupervised consensus clustering of the DEGs between the C1 and C2 subgroups and identified 3 clusters (cluster A, cluster B, and cluster C). C1 was mainly located in cluster A (Fig. 4B). According to the KM curve, cluster A presented the worst prognosis, whereas cluster C exhibited the best prognosis according to 15 years follow data (Fig. 4C). The adjusted pairwise comparisons demonstrated no significant differences between clusters C and B and between clusters A and C. Cluster B showed a shorter overall survival time than cluster A (p = 0.0066) (Supplementary Fig. 3D–F). In addition, a gene set variation analysis (GSVA) revealed that Clusters A, B, and C were enriched in significantly different pathways (Fig. 4D). Cluster A was mainly enriched in the retinal metabolic process, peptidyl methionine modification, and isoprenoid biosynthetic process categories. Cluster B was mainly enriched in natural killer cell activation involved in the immune response, peptide cross linking, Golgi ribbon formation, desmosome organization, T helper 2 cell cytokine production, positive regulation of ruffle assembly, epidermal cell differentiation, keratinization, cornification and regulation of water loss via skin. Cluster C was enriched in cell migration in the hindbrain, positive regulation of endothelial cell chemotaxis, synaptic vesicle clustering, cardiac lift ventricle morphogenesis, keratan sulfate catabolic process, regulation of developmental pigmentation and regulation of endothelial cell chemotaxis (Fig. 4D). To detect the m6A regulator expression levels in different clusters, we examined the distribution of these 20 genes in C1 and C2 (Supplementary Fig. 4A). We also examined the distribution of these 20 genes in clusters A, B, and C (Supplementary Fig. 4B).
Construction and validation of an m6A regulator signature-based prognostic model
In detail, to better construct the prognostic model of m6A regulators, we applied Cox regression and lead absolute shrinkage and selection operator (LASSO) regression analysis and described the changing trajectory of each independent variable. The number of independent variable coefficients approaching zero increased with decrease in lambda (Fig. 5A). The confidence interval (CI) for each lambda was examined, as shown in Fig. 5B. The optimal model was acquired with a lambda value equal to 0.0173. We adopted the step AIC function from the MASS R package and identified 5 genes, namely, CYP26A1, KIF13B, CPVL, FCGBP, and MB. The 5-gene signature-based model was constructed according to the following equation: RiskScore = 0.178*CYP26A10.446*KIF13B + 0.305*CPVL0.238*FCGBP + 0.087*MB. The risk score landscape of each sample is illustrated in Fig. 5C. The AUC at 1, 3, and 5 years suggested that this 5-gene model presented good prognostic prediction efficacy (Fig. 5D). We calculated the risk score of each sample according to the expression level of the TCGA training dataset samples and obtained the risk scores of the samples. A Z score analysis of risk scores was then performed to categorize samples with scores > 0 into the high-risk group and those with scores < 0 into the low-risk group. The high-risk-score group presented a poorer prognosis than the low-risk-score group (uncorrected p = 0.001, HR = 2.11, 95% CI 1.61–2.76) (Fig. 5E). The TCGA gene profile was utilized as the validation dataset. The risk score distribution landscape and the AUC indicated a good and stable prediction efficiency (Fig. 6A,B). Consistent with the training set, the high-risk score subgroup showed a poor predicted prognosis (uncorrected p = 0.0026, HR = 1.66, 95% CI 1.29–2.13, Fig. 6C). To better evaluate the efficiency of the 5-gene model in the TCGA all-sample set, we calculated the risk score of each sample and the AUCs at 1, 3, and 5 years (Fig. 6D,E). The high-risk-score group presented a shorter survival time than the low-risk-score subgroup (p < 0.0001, HR = 1.84, 95% CI 1.53–2.22, Fig. 6F). The results agreed well with previous findings. Based on the previous 5-gene model, we also constructed an m6A regulator-associated gene model, and based on the expression of these m6A regulators, we classified HNSC patients into high- and low-risk score subgroups. The m6A score model efficiency was evaluated based on the TCGA training set and validated based on the TCGA validation set. A total of 424 samples from TCGA-HNSC were classified into a 212-sample training set and a 212-sample validation set. The detailed clinical information of the high- and low-m6A-score subgroups is illustrated in Supplementary Table 1. The high-m6A-score group presented a poorer prognosis than the low-m6A-score group (Supplementary Fig. 5A–D).
The high-m6A-score group presented distinct immune-associated signatures. We calculated the infiltration levels of 22 immune cell types in the two m6A score-based groups, and the results demonstrated that 8 (36.36%) immune cell types showed a significant difference (Fig. 7A). We also performed a single-sample gene set enrichment analysis (ssGSEA) to calculate the IFNγ score of each patient, and significant differences in IFNγ scores were observed between the high- and low-score groups (Fig. 7B). To evaluate the differences in chemokine gene expression between the two groups, we calculated the chemokine gene expression levels and found that 12 chemokines showed significantly differential expression between the two groups (Fig. 7C). We also calculated the chemokine receptor gene expression levels and found that the expression levels of 6 of 18 (33.33%) genes were markedly different (Fig. 7D). Furthermore, we obtained 47 immune checkpoint genes and found that 14 (29.79%) immune checkpoint genes presented low expression levels in the high-m6A-score group (Fig. 7E). To investigate the correlation between m6A regulators and the hypoxia score, we performed GSEA and calculated the hypoxia score of each regulator. The results demonstrated that METTL3 (p < 0.05), ZC3H13 (p < 0.001), ALKBH5 (p < 0.001), FTO (p < 0.001), YTHDF1 (p < 0.001), YTHD3 (p < 0.001), HNRNPA2B1 (p < 0.01), FMR1 (p < 0.01), and ELAVL1 (p < 0.01) were significantly correlated with m6A regulators (Supplementary Fig. 6A) (the p value in this section is uncorrected). The m6A regulator-based cluster C had a markedly higher hypoxia score than cluster A (uncorrected p = 0.001). The high-m6A-score subgroup exhibited a higher hypoxia score than the low-m6A-score subgroup. The hypoxia score did not significantly differ between C1 and C2 (Supplementary Fig. 6B). We performed univariate and multivariate Cox regression analyses to evaluate the independence of the m6A score signature model in terms of clinical application. The univariate Cox regression analysis showed that the m6A score was significantly correlated with survival, and the corresponding multivariate Cox regression analysis revealed that the m6A score-based group (uncorrected p < 0.05, HR = 1.76, 95% CI 1.23–2.53) was significantly correlated with survival (Fig. 8A,B). These outcomes revealed that the m6A score model can exhibit good prognostic prediction value in the clinic.
Biological functional enrichment in the m6A score model
To explore the relationship between the m6A score and associated biological functions, we performed ssGSEA and determined that five functional pathways were negatively correlated with the m6A score and that 10 pathways were positively correlated with the sample m6A score. In addition, a Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was performed in this study to investigate the enriched signaling pathways10. The results from the KEGG enrichment-based cluster analysis of the pathways are illustrated in Fig. 8C. The KEGG pathways cardiac muscle construction, glycerophospholipid metabolism and olfactory transduction showed increased enrichment in the high-m6A-score group (Fig. 8D).
Discussion
Epigenetic modifications have emerged as abundant modifications throughout the transcriptome and play very important roles in genome evolution and innovation
LASSO regression analysis
The m6A regulator-based signatures might have important prognostic value in HNSCC. First, two molecular subtypes were constructed based on the consensus clustering of m6A-related genes, and the differentially expressed genes related to the m6A subtype were screened by differential analysis. We applied LASSO regression analysis to build prediction models based on the training dataset and validated their predictive effectiveness using the validation cohort. The genes associated with prognosis were screened by univariate Cox analysis of the initial input of 27 genes, namely, CYP26A1, KIF13B, CPVL, CMYA5, FCGBP, XIRP2, NEB, CASQ1, MYH2, NRAP, MYH7, LMOD2, ACTN2, MYOT, HSPB7, APOBEC2, MYL1, MYH1, ACTA1, MB, MYOZ1, CSRP3, MYL2, COX6A2, SMPX, CKM and SLN. Then, LASSO analysis and multivariate Cox analysis were applied to construct a risk-score based prognoistic model. Among these differentially expressed genes, five key genes were screened by univariate Cox analysis.
The glmnet R package was used for LASSO Cox variable selection and model building. The survival R package was used for the comparison of survival curves. Survival analysis based on the Kaplan–Meier (KM) curve was applied to evaluate prognosis.
Statistics
Statistical analyses and data plotting were performed using the R program (3.6.2)29. Fisher’s exact and equal-variance t tests were used in the group comparisons of categorical and continuous variables, respectively. Spearman’s correlation analysis test was applied to evaluate the correlation relationships of different cancer types. p ≤ 0.05 indicates statistical significance.