1 Introduction

Although skin cutaneous melanoma (SKCM) accounts for only a small percentage of skin cancers, it is responsible for more than 75% of skin cancer deaths, and its incidence is increasing [1, 2].

Fibrosis results from a tissue repair response and is a common pathological feature of chronic inflammatory diseases. As a normal and important stage of organ tissue repair, fibrosis is characterized by excessive accumulation of extracellular matrix (ECM). When tissue is injured, local fibroblasts become activated and more contractile, and they secrete inflammatory mediators and ECM components to initiate the wound healing response [3].

The relationship between fibrosis and cancer has been demonstrated in a variety of cancers [4,5,6,7,8,9], in which fibrosis induces and promotes cancer development through a "non-healing wound"-like mechanism: both activated tumor stroma and fibrosis are characterized by large numbers of activated fibroblasts and increased deposition of extracellular matrix [7]. In addition, chemotherapy and radiotherapy can induce the formation of a fibrotic-like microenvironment [10].

In melanoma, co-morbidity with the fibrotic disease scleroderma is frequent: the incidence of melanoma is 3.3 times higher in patients diagnosed with scleroderma than in the general population [7]. Under therapies combining BRAF and mitogen-activated protein kinase kinase (MEK) inhibitors for metastatic melanoma, BRAF-mutant melanomas develop a paradoxical fibro-mechanical reprogramming that produces a therapy-induced fibrotic-like phenotype. This gives cancer cells a cell-autonomous ability to withstand treatment and the challenging tumor microenvironment, resulting in drug resistance [10].

In recent years, many studies in melanoma have constructed models based on immune-related genes [11], metastasis-related genes [12], and chemokine members [13], and Bagaev et al. constructed conserved pan-cancer tumor microenvironment (TME) subtypes containing fibrosis information based on melanoma [14]. However, no fibrotic gene signature of melanoma has been reported, and prognostic models of melanoma based on fibrotic signature genes are still lacking. In this study, we therefore constructed a melanoma fibrotic gene signature for the first time, and then built and validated an 8-gene prognostic model based on the fibrotic signature genes to explore the potential value of fibrosis-related genes in melanoma.

2 Materials and methods

2.1 Data Source and Preprocessing

The RNA-seq count data and clinical information of patients with SKCM from The Cancer Genome Atlas (TCGA) were downloaded from the TCGA database and preprocessed as follows: 1) Ensembl IDs were converted to gene symbols, 2) for gene symbols mapped by multiple IDs, the median expression was taken, 3) non-protein-coding genes were removed, and 4) only primary and metastatic melanoma samples were retained.

The GSE65904 cohort, a microarray dataset containing gene expression profiles and clinical follow-up information of metastatic melanoma samples, was downloaded from the Gene Expression Omnibus (GEO) database in normalized form. It was then preprocessed as follows: 1) probe IDs were converted to gene symbols, 2) for gene symbols mapped by multiple probes, the median expression was taken, and 3) non-protein-coding genes were removed.
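The duplicate-collapsing step shared by both preprocessing pipelines can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `collapse_to_symbols`, the annotation inputs `id_to_symbol` and `protein_coding`, and the toy IDs are all hypothetical; in practice the mapping would come from a gene annotation resource.

```python
from collections import defaultdict
from statistics import median


def collapse_to_symbols(expr, id_to_symbol, protein_coding):
    """Collapse ID-level rows (Ensembl IDs or probes) into gene-symbol rows.

    expr: dict of feature ID -> list of per-sample values (same sample order).
    id_to_symbol: dict of feature ID -> gene symbol (hypothetical annotation).
    protein_coding: set of protein-coding gene symbols (hypothetical annotation).
    """
    grouped = defaultdict(list)
    for feature_id, values in expr.items():
        symbol = id_to_symbol.get(feature_id)
        # Skip IDs without a symbol and genes that are not protein-coding.
        if symbol is None or symbol not in protein_coding:
            continue
        grouped[symbol].append(values)

    # For each symbol, take the per-sample median across all of its rows.
    return {
        symbol: [median(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy input: three IDs map to GENEA, one to GENEB, one to a non-coding gene.
expr = {
    "ID_A1": [1.0, 4.0],
    "ID_A2": [3.0, 2.0],
    "ID_A3": [8.0, 0.0],
    "ID_B1": [5.0, 5.0],
    "ID_X1": [9.0, 9.0],
}
id_to_symbol = {"ID_A1": "GENEA", "ID_A2": "GENEA", "ID_A3": "GENEA",
                "ID_B1": "GENEB", "ID_X1": "LNC1"}
protein_coding = {"GENEA", "GENEB"}

collapsed = collapse_to_symbols(expr, id_to_symbol, protein_coding)
print(collapsed)  # {'GENEA': [3.0, 2.0], 'GENEB': [5.0, 5.0]}
```

Taking the median rather than the mean limits the influence of a single outlying probe or transcript ID on the collapsed value.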

The fibrotic information of the samples was obtained from the TME subtype in the study of Bagaev et al. [14].

2.2 WGCNA and Construction of Fibrotic Gene Signature

After preprocessing, we normalized the gene expression data, calculated the median absolute deviation (MAD) of each gene, and selected the 5000 genes with the highest MAD; only samples with both TME subtype information and expression data were retained. We then used the “goodSamplesGenes” function in the R package “WGCNA” for further filtering, and outlier samples were removed according to the sample clustering results. After these steps, 462 samples in TCGA-SKCM and 209 samples in the GSE65904 cohort were used in the subsequent analyses.
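A minimal NumPy sketch of the MAD-based gene filter follows. The function name and the toy matrix are illustrative. R's `mad()` multiplies by a consistency constant (1.4826 by default); that rescaling does not change the ranking, so it is omitted here.

```python
import numpy as np


def top_mad_genes(expr, gene_names, n_top=5000):
    """Return the n_top genes with the largest median absolute deviation (MAD).

    expr: array of shape (n_genes, n_samples), already normalized.
    gene_names: sequence of row labels, same order as expr.
    """
    expr = np.asarray(expr, dtype=float)
    center = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - center), axis=1)
    # Sort by decreasing MAD; the stable sort keeps input order among ties.
    order = np.argsort(-mad, kind="stable")[:n_top]
    return [gene_names[i] for i in order], mad[order]


expr = np.array([
    [1.0, 1.0, 1.0, 1.0],  # constant gene: MAD = 0
    [0.0, 2.0, 4.0, 6.0],  # median 3, |deviations| 3,1,1,3 -> MAD = 2
    [0.0, 1.0, 1.0, 2.0],  # median 1, |deviations| 1,0,0,1 -> MAD = 0.5
])
genes, mads = top_mad_genes(expr, ["G1", "G2", "G3"], n_top=2)
# genes == ["G2", "G3"], mads == [2.0, 0.5]
```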

First, we used the “pickSoftThreshold” function to select a suitable soft-thresholding power, which was 3 in both the TCGA and GSE65904 cohorts, to construct a scale-free gene co-expression network, and calculated the adjacency matrix and topological overlap matrix (TOM). We then performed hierarchical clustering of genes, requiring at least 30 genes per module (minModuleSize = 30). Modules were identified with the dynamic tree cut algorithm, and modules with similarity greater than 0.75 were merged (cut height = 0.25). Finally, we calculated the correlations between modules and the fibrotic trait to identify important modules and extracted the genes within them. The intersection of the genes extracted from TCGA-SKCM and GSE65904 was defined as the fibrotic gene signature (FGS).
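The network-construction steps can be illustrated with a simplified NumPy sketch, assuming an unsigned network (the text does not state the network type). It computes the adjacency, a scale-free fit index of the kind reported by `pickSoftThreshold`, and the TOM. The binning in `scale_free_fit` uses histogram midpoints, which differs in detail from WGCNA's own implementation, and the simulated data are pure noise, so the printed fit values are not meaningful. Module detection and merging are not shown; hierarchical clustering operates on the dissimilarity 1 − TOM, and a merge cut height of 0.25 corresponds to merging modules whose eigengenes correlate above 0.75.

```python
import numpy as np


def adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta, with a zero diagonal.

    expr: array of shape (n_samples, n_genes).
    """
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def scale_free_fit(adj, n_bins=10):
    """Simplified scale-free fit: R^2 and slope of log10 p(k) on log10 k."""
    k = adj.sum(axis=1)                       # connectivity of each gene
    counts, edges = np.histogram(k, bins=n_bins)
    mids = (edges[:-1] + edges[1:]) / 2.0
    keep = (counts > 0) & (mids > 0)
    x = np.log10(mids[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, slope


def tom(adj):
    """Topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    shared = adj @ adj                        # l_ij = sum_u a_iu * a_uj
    k = adj.sum(axis=1)
    t = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(t, 1.0)
    return t


rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 200))             # 30 samples x 200 simulated genes
for beta in range(1, 7):                      # scan candidate soft-threshold powers
    r2, slope = scale_free_fit(adjacency(expr, beta))
    print(beta, round(float(r2), 3), round(float(slope), 3))

diss_tom = 1.0 - tom(adjacency(expr, 3))      # input to hierarchical clustering
```

In WGCNA the power is usually chosen as the lowest value at which the signed fit index reaches a plateau, which is the role `pickSoftThreshold` plays in the workflow above.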

2.3 Construction of the Prognostic Model

First, univariate Cox proportional hazards regression was used to identify prognostic genes in the FGS gene set, with P < 0.01 as the filtering threshold. Next, least absolute shrinkage and selection operator (LASSO) regression analysis was used to narrow down the candidate genes, with five-fold cross-validation used to build the model. Finally, stepwise regression based on the Akaike information criterion (AIC) was performed so that the multivariate Cox model achieved an adequate fit with fewer genes.
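A from-scratch sketch of the univariate screening step follows, assuming a Wald test for the P < 0.01 threshold (the text does not say which test was used; in R this step is typically run with `survival::coxph`). The helper names and the simulated cohort are hypothetical, and the LASSO and AIC-based stepwise steps are not reproduced.

```python
import math
import numpy as np


def univariate_cox(x, time, event, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox
    partial likelihood (Breslow ties). Returns (beta, two-sided Wald p)."""
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta = 0.0
    for _ in range(max_iter):
        w = np.exp(beta * x)
        score, info = 0.0, 0.0
        for i in np.flatnonzero(event):
            risk = time >= time[i]            # risk set at the i-th event time
            s0 = w[risk].sum()
            s1 = (w[risk] * x[risk]).sum()
            s2 = (w[risk] * x[risk] ** 2).sum()
            score += x[i] - s1 / s0           # gradient of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)                # Wald z = beta / SE, SE = 1 / sqrt(info)
    return beta, math.erfc(abs(z) / math.sqrt(2.0))


def screen_genes(expr, genes, time, event, p_cut=0.01):
    """Keep genes whose univariate Cox Wald p-value is below p_cut."""
    kept = []
    for gene, values in zip(genes, expr):
        beta, p = univariate_cox(values, time, event)
        if p < p_cut:
            kept.append((gene, round(float(beta), 3), p))
    return kept


# Simulated cohort: G1 raises the hazard (true beta = 0.8); G2 is noise.
rng = np.random.default_rng(1)
n = 300
g1, g2 = rng.normal(size=n), rng.normal(size=n)
t_event = rng.exponential(1.0 / np.exp(0.8 * g1))
t_censor = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_censor)
event = t_event <= t_censor
print(screen_genes([g1, g2], ["G1", "G2"], time, event))
```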

2.4 Functional Enrichment Analysis

Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed with the R package “clusterProfiler” (v4.0.5) to analyze the functional enrichment of the genes in the FGS.
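clusterProfiler's over-representation analysis is based on a one-sided hypergeometric test followed by multiple-testing correction. The standard-library sketch below shows that calculation together with Benjamini-Hochberg adjustment, which is clusterProfiler's default; the text does not state which correction was used. All numbers in the usage lines, including the background size, are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits, list_size, term_size, universe_size):
    """One-sided over-representation p-value P(X >= hits), where
    X ~ Hypergeometric(universe_size, term_size, list_size)."""
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(universe_size, list_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest p downward
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: a 301-gene list, a 200-gene term, 20 overlapping genes,
# and an assumed background of 18,000 annotated genes.
p = hypergeom_enrichment_p(hits=20, list_size=301, term_size=200, universe_size=18000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```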

2.5 Analysis of Tumor Microenvironment and Signatures

The R package “IOBR” was used in several parts of our study. We used the function “calculate_sig_score” to calculate the fibrotic gene signature enrichment score (FGES) of each sample, TME deconvolution methods to assess immune infiltration, and the function “iobr_cor_plot” to compare the expression of signatures or genes and draw the plots [15].
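The text does not say which scoring method `calculate_sig_score` was run with (IOBR offers several, such as PCA, ssGSEA, and z-score). As one simple possibility, the NumPy sketch below computes a z-score-style signature score. It illustrates the idea of summarizing the coordinated expression of signature genes in one number per sample and is not IOBR's implementation; the function name and toy data are hypothetical.

```python
import numpy as np


def zscore_signature_score(expr, genes, signature):
    """Per-sample mean of gene-wise z-scores over signature genes in the data.

    expr: array of shape (n_genes, n_samples); genes: row labels.
    """
    expr = np.asarray(expr, dtype=float)
    wanted = set(signature)
    rows = [i for i, g in enumerate(genes) if g in wanted]
    if not rows:
        raise ValueError("none of the signature genes are in the matrix")
    sub = expr[rows]
    mu = sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0                         # constant genes contribute 0, not NaN
    return ((sub - mu) / sd).mean(axis=0)


expr = [[1.0, 2.0, 3.0],    # gene A
        [2.0, 4.0, 6.0],    # gene B
        [5.0, 5.0, 5.0]]    # gene C (not in the signature)
score = zscore_signature_score(expr, ["A", "B", "C"], ["A", "B", "MISSING"])
# score is approximately [-1.0, 0.0, 1.0]
```

Signature genes absent from a cohort (here `MISSING`) are simply skipped, which matters when the same signature is scored on both RNA-seq and microarray data.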

2.6 Statistical Analysis

All statistical analyses were performed using the R software version 4.1.0.

2.7 Technical Route and Flow Chart

The overall analysis workflow is shown as a flow chart in Fig. S8.

3 Results

3.1 Construction, Analysis, and Validation of Fibrotic Gene Signature

In the analysis of the TCGA-SKCM data, we selected the 1234 genes in the blue module (Fig. S1A), whereas in the WGCNA results of GSE65904, the 2037 genes in the blue and green modules were selected together (Fig. S1B); additional WGCNA results are shown in Fig. S2 and Tables S1 and S2. Taking the intersection yielded 301 genes, which were used to construct the fibrotic gene signature (FGS) (Table S3).

GO analysis covered three categories: biological process (BP), cellular component (CC), and molecular function (MF). Genes in the BP category were significantly associated with the extracellular matrix and cell-substrate adhesion. In the CC category, components associated with collagen and the organelle lumen were enriched in multiple terms. MF genes were mainly related to extracellular matrix structural constituents and multiple binding functions (Fig. 1A). Notably, the main pathways enriched in the KEGG analysis (Fig. 1B), such as ECM-receptor interaction, focal adhesion, and the PI3K-Akt, PPAR, and Wnt signaling pathways, are tumor-related pathways.

Fig. 1

Functional enrichment analysis of genes in FGS. A GO enrichment analysis. B KEGG enrichment analysis

To quantify the enrichment of the FGS, we calculated a signature enrichment score for each sample, representing the degree to which the genes in the gene set are coordinately regulated. After samples without follow-up information were removed, 454 samples in TCGA and 210 samples in GSE65904 remained. These samples were divided into two groups by a cutpoint for Kaplan–Meier (KM) survival analysis; the KM survival curves are shown in Figs. 2A and 3A. In addition, receiver operating characteristic (ROC) curves indicated that the FGES predicted TME subtype with an accuracy of roughly 85% or higher (Figs. 2B and 3B, Fig. S3), and the violin plots further confirmed a significant difference in FGES between the TME subtypes (Figs. 2C and 3C).
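The ROC analysis summarizes how well FGES separates the two TME subtypes. A minimal sketch of the area under the ROC curve, using its rank (Mann-Whitney) interpretation, is below; the toy scores and the 0/1 coding of the subtypes are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one,
    counting ties as 1/2 (equal to the Mann-Whitney U divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical FGES values; label 1 marks the fibrotic TME subtype (assumed coding).
fges = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
fibrotic = [1, 1, 1, 0, 0, 0]
auc = roc_auc(fges, fibrotic)  # 8 of 9 positive-negative pairs ordered correctly
```

The pairwise loop is quadratic, which is fine for cohorts of a few hundred samples; a rank-based formula gives the same value in O(n log n).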

Fig. 2

Effectiveness of FGES and immune infiltration analysis of FGES groups in TCGA-SKCM. A KM survival curves. B ROC curve of the classification performance of FGES for TME subtype. C Violin plot showing the difference in FGES between the two TME subtypes. D Comparison of ESTIMATE scores between the two FGES groups. E Comparison of MCPcounter scores between the two FGES groups. F Heatmap comparing TME deconvolution scores between the two FGES groups (NS, not significant; *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001)

Fig. 3

Effectiveness of FGES and immune infiltration analysis of FGES groups in GSE65904. A KM survival curves. B ROC curve of the classification performance of FGES for TME subtype. C Violin plot showing the difference in FGES between the two TME subtypes. D Comparison of ESTIMATE scores between the two FGES groups. E Comparison of MCPcounter scores between the two FGES groups. F Heatmap comparing TME deconvolution scores between the two FGES groups (NS, not significant; *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001)

3.2 Differences in Immune Infiltration Between FGES Groups

ESTIMATE [16] and MCPcounter [17] are commonly used deconvolution methods for analyzing immune infiltration. The ESTIMATE results showed that the group with higher FGES had a higher stromal score and ESTIMATE score and lower tumor purity (Fig. 2D), and the same trend was observed in GSE65904 (Fig. 3D). However, MCPcounter showed significant differences only in neutrophils, endothelial cells, and fibroblasts (Figs. 2E and 3E). Figures 2F and 3F combine these results in heatmaps.
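Why a higher ESTIMATE score accompanies lower tumor purity follows from how ESTIMATE derives purity. The ESTIMATE score is the sum of the stromal and immune scores, and the ESTIMATE publication converts it to purity through a cosine function fitted on Affymetrix arrays. The sketch below hard-codes those published constants; whether the same conversion was applied to the data in this study is an assumption.

```python
import math


def estimate_tumor_purity(estimate_score):
    """Tumor purity from the ESTIMATE score, using the cosine relation reported
    by the ESTIMATE authors (fitted on Affymetrix data)."""
    return math.cos(0.6049872018 + 0.0001467884 * estimate_score)


# A higher ESTIMATE score (stromal + immune) maps to lower purity.
for score in (-1000, 0, 1000, 2000):
    print(score, round(estimate_tumor_purity(score), 3))
```

Over the usual range of ESTIMATE scores the cosine argument stays between 0 and π/2, so purity decreases monotonically as stromal and immune content rises.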

3.3 Construction of a Prognostic Model Based on FGS Genes

The TCGA cohort was used to construct the prognostic model. Univariate Cox proportional hazards regression identified 16 genes in the FGS gene set as prognostic hub genes. We then performed LASSO regression analysis with five-fold cross-validation; the coefficient trajectory of each gene and the confidence interval at each lambda are shown in Fig. 4A and B, respectively. Based on these results, we selected 13 candidate genes from the optimal model at lambda = 0.01595426 (log(lambda) = -4.138) (Fig. 4B). Finally, multivariate Cox analysis selected eight genes to construct the model for calculating the fibrotic gene signature risk score (FGRS):

Fig. 4

LASSO regression analysis and prediction performance in the TCGA-SKCM cohort. A Coefficient trajectory of each independent variable; the abscissa represents log(lambda), and the ordinate represents the coefficient of the independent variable. B Cross-validation curve; the abscissa represents log(lambda), the ordinate represents the cross-validation error, and the error bars show the confidence interval at each lambda. C KM survival curves of the two risk groups. D ROC curves of the prognostic model. E FGRS, survival status and time, and expression trends of the eight genes

$$\boldsymbol{FGRS}=-\mathbf{0.079}\times\boldsymbol{SV2A}+\mathbf{0.114}\times\boldsymbol{HEYL}+\mathbf{0.170}\times\boldsymbol{OLFML2A}-\mathbf{0.101}\times\boldsymbol{PROX1}-\mathbf{0.082}\times\boldsymbol{ACOX2}-\mathbf{0.071}\times\boldsymbol{PRRX1}+\mathbf{0.075}\times\boldsymbol{PHACTR1}+\mathbf{0.104}\times\boldsymbol{LHX6}$$
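Applying the FGRS to a cohort, as done for both TCGA and GSE65904, amounts to a weighted sum of the eight genes followed by dichotomization. A minimal sketch follows, with coefficients transcribed from the eight-gene formula. It assumes expression values on the same scale as those used to fit the model (the text does not state the scale), and the treatment of samples exactly at the median is an assumption; the helper names and toy samples are hypothetical.

```python
from statistics import median

# Coefficients transcribed from the eight-gene FGRS formula.
FGRS_COEFFICIENTS = {
    "SV2A": -0.079, "HEYL": 0.114, "OLFML2A": 0.170, "PROX1": -0.101,
    "ACOX2": -0.082, "PRRX1": -0.071, "PHACTR1": 0.075, "LHX6": 0.104,
}


def fgrs(sample_expr):
    """Linear risk score: sum of coefficient * expression over the eight genes."""
    missing = [g for g in FGRS_COEFFICIENTS if g not in sample_expr]
    if missing:
        raise KeyError(f"missing genes: {missing}")
    return sum(coef * sample_expr[g] for g, coef in FGRS_COEFFICIENTS.items())


def median_split(scores):
    """Label scores above the cohort median 'high' and the rest 'low'
    (tie handling at the median is an assumption)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical samples: all eight genes at 1.0, then at 2.0, then at 0.0.
samples = [{g: v for g in FGRS_COEFFICIENTS} for v in (1.0, 2.0, 0.0)]
scores = [fgrs(s) for s in samples]   # approximately [0.13, 0.26, 0.0]
print(median_split(scores))           # ['low', 'high', 'low']
```

Raising an error on missing genes, rather than silently treating them as zero, avoids scores that are not comparable across cohorts.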

Samples were divided into two risk groups by the median FGRS, and the KM curves showed a significant difference in survival between the high-risk group (HRG) and the low-risk group (LRG) (Fig. 4C); survival analyses for each gene are shown in Fig. S4. We further used the R package “timeROC” to evaluate the prognostic performance of the FGRS. Performance was measured by the area under the curve (AUC), which reached 0.695 at 1 year, 0.653 at 3 years, and 0.686 at 5 years (Fig. 4D). In addition, as the risk score increased, the expression of each gene changed in the direction given by the sign of its coefficient: genes with positive coefficients showed an upward trend, whereas those with negative coefficients showed a downward trend (Fig. 4E).
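The KM comparison between risk groups can be sketched with a product-limit estimator and a two-group log-rank test. The text does not name the test behind the reported significance, so the log-rank test here is an assumption (it is the usual companion to KM curves, e.g. `survival::survdiff` in R), and the data are hypothetical. The time-dependent AUC computed with timeROC is not reproduced.

```python
import math


def kaplan_meier(time, event):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(time, event))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += int(bool(data[i][1]))
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving                          # deaths and censorings leave
    return curve


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    event_times = sorted({t for t, e in zip(time, event) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for ti in time if ti >= t)
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g)
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(time, event, group) if ti == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))   # chi-square(1) upper tail


# Hypothetical data: group 1 = high risk (events early), group 0 = low risk.
time = list(range(1, 21))
event = [1] * 20
group = [1] * 10 + [0] * 10
high_risk_curve = kaplan_meier(time[:10], event[:10])
chi2, p = logrank_test(time, event, group)
```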

3.4 External Validation of the Robustness of the Model

To determine the robustness of our model, GSE65904 was used as an independent external validation cohort. The FGRS of each sample in GSE65904 was calculated with the same formula as in the TCGA cohort, and each sample was assigned to a risk group according to its FGRS. As in the TCGA cohort, the high-risk group had a significantly lower survival probability (Fig. S5A), and the AUCs for 1-, 3-, and 5-year survival were 0.673, 0.649, and 0.611, respectively (Fig. S5B). A similar trend in gene expression was also observed (Fig. S5C).

3.5 Correlation Between FGRS and Clinical Characteristics

To investigate the correlation between FGRS and clinical characteristics, we examined the following clinical features in the TCGA-SKCM cohort: age, gender, stage, and Clark level (Table 1). FGRS did not differ between male and female patients (Fig. 5A), and no significant relationship was found between FGRS and stage (Fig. 5B). Interestingly, older patients (> 65 years) had a higher FGRS (Fig. 5C), and patients at Clark level V were more likely to have a high score (Fig. 5D). However, because many values are missing from the Clark level information, this result may not be reliable.

Table 1 Clinical features in the TCGA-SKCM cohort
Fig. 5

Correlation between FGRS and clinical characteristics, and clinical value of the model. A Correlation between FGRS and gender. B Correlation between FGRS and stage. C Correlation between FGRS and age. D Correlation between FGRS and Clark level. E Nomogram; the red elements show the points and the probability of 1-, 3-, or 5-year survival for one sample in the TCGA cohort. F Calibration curves showing the relationship between nomogram-predicted 1-, 3-, and 5-year OS and actual survival

3.6 Construction and Evaluation of Nomograms Comprising FGRS

Univariate and multivariate Cox regression analyses were used to evaluate whether our model is an independent prognostic factor in clinical application. Age, Clark level, stage, and FGRS were significant, independent risk factors for prognosis, and FGRS was highly significant in the multivariate Cox regression analysis (P < 0.001, HR = 2.15, Table 2), indicating that the eight-gene prognostic model has good capability for clinical prediction. To increase the clinical relevance of our study, we combined these risk factors, based on the multivariate Cox regression results, into a nomogram that is more comprehensive and intuitive for clinical decision-making (Fig. 5E). Calibration curves showed good agreement between nomogram-predicted and actual overall survival (Fig. 5F), indicating that the nomogram was well calibrated.

Table 2 Univariate and multivariate analyses of risk factors in TCGA-SKCM cohort

3.7 Differences in Immune Infiltration Between FGRS Groups

Overall immune infiltration was assessed with ESTIMATE and MCPcounter. Unlike in the FGES groups, almost all immune infiltrating cells in the MCPcounter results scored higher in the low-risk group in both the TCGA and GSE65904 cohorts (Figs. S6A and S7A), indicating that inflammatory and immune responses were more active in the low-risk group. Meanwhile, the ESTIMATE results for GSE65904 showed that the immune score and ESTIMATE score were lower, and tumor purity was higher, in the high-risk group (Fig. S7B). However, these differences were less significant in the TCGA cohort (Fig. S6B).

The difference in immune infiltration between the two groups suggested that our model might help predict immunotherapy response. We therefore used the R package “IOBR” to calculate and visualize differences in the expression of selected immune checkpoint genes between the two FGRS groups. Consistent with the infiltration results, almost all of the immune checkpoint genes tested were upregulated in the low-risk group (Fig. 6A and B), suggesting that patients in the low-risk group may benefit more from immunotherapy.
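The text does not name the test behind the group comparisons in Fig. 6A and B. A Wilcoxon rank-sum test is a common choice for such per-gene comparisons, so the sketch below assumes it. It uses the large-sample normal approximation (exact p-values for small groups would differ slightly), the expression values are hypothetical, and in practice the test is repeated for each checkpoint gene.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test using the normal
    approximation with a tie correction and no continuity correction.
    Returns (U statistic for x, p-value)."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term, i = 0.0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0   # average 1-based rank of the tie block
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2.0) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical expression of one checkpoint gene in the two FGRS groups.
low_risk = [5.1, 6.3, 5.8, 7.0, 6.1]
high_risk = [4.2, 4.9, 5.0, 4.4, 5.5]
u, p = rank_sum_test(low_risk, high_risk)
```

When many checkpoint genes are tested, the resulting p-values would usually be adjusted for multiple testing before being reported as significance stars.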

Fig. 6

Comparison of the expression differences of immune checkpoint genes between two risk groups. A Expression of immune checkpoint genes in TCGA. B Expression of immune checkpoint genes in GSE65904

4 Discussion

Given the growing number of studies on the relationship between fibrosis and cancer, the frequent co-morbidity between melanoma and fibrotic disease, and the lack of research on fibrotic gene signatures in melanoma, we conducted this study to fill that gap.

Orthogonal partial least squares discriminant analysis (OPLS-DA) is a commonly used method for screening differential metabolites [18,19,20], and Cao et al. used it to screen for survival-associated differential long non-coding RNAs (lncRNAs) [21]. Because no predefined gene set for fibrosis-related traits currently exists, we chose a more appropriate approach, weighted gene co-expression network analysis (WGCNA) [22], which identifies gene sets that vary in a highly coordinated way with a target trait, in this case fibrosis, and used it to define the fibrotic gene signature (FGS). Using the fibrosis information from a previous study of melanoma TME subtypes, we selected the 301 hub genes shared between the TCGA and GEO cohorts and defined them as the FGS. Functional enrichment analysis of these genes showed enrichment of ECM-receptor interaction, cell-substrate adhesion, focal adhesion, and signaling pathways such as PI3K-Akt, PPAR, and Wnt, suggesting that FGS genes are involved in tumor shedding, adhesion, degradation, movement, and hyperplasia.

The fibrotic gene signature enrichment score was used to quantify FGS enrichment in each sample. As the ROC curves demonstrated, FGES is an effective classifier of TME subtype, and KM survival analysis showed that samples with high FGES had a worse prognosis, further supporting the correlation between the degree of tissue fibrosis and poor patient prognosis described by Piersma et al. [23]. In the immune infiltration analysis, the group with higher FGES had a higher stromal score and lower tumor purity, suggesting greater stromal cell infiltration in these samples; stromal cells are thought to play an important role in tumor growth, disease progression, and drug resistance [24,25,26]. In addition, high neutrophil levels have been associated with detrimental outcomes in several solid tumors [17], which is consistent with the MCPcounter results and the OS curves.

Within the FGS gene set, univariate Cox regression identified 16 prognostic genes in the TCGA-SKCM cohort; the LASSO algorithm then reduced these to 13 candidates, and AIC-based stepwise regression produced the final 8-gene prognostic model. For gene reduction, Cao et al. [21] used a 1.5-fold change as the threshold for screening differential lncRNAs, narrowing down the number of lncRNAs used to construct their prediction panel. In our study, considering the possible multicollinearity within the FGS gene set, we used LASSO regression to shrink the estimated parameters before constructing the model. In the model validation, survival differed significantly between the risk groups defined by FGRS, and the AUC of the model was consistently above 0.65 at 1, 3, and 5 years. Analysis of the GSE65904 cohort further supported the validity and stability of the 8-gene model.

After constructing the prognostic model, we analyzed the correlation between FGRS and clinical traits. FGRS was associated with patient age and Clark level. FGRS was higher in patients over 65 years old, indicating that our model can, to some extent, stratify risk by age. The Clark level is a classification system describing the depth of invasion of cutaneous melanoma: Clark levels I–IV indicate tumors confined to the skin (Clark I represents melanoma in situ), and tumors invading the subcutaneous tissue are defined as Clark V [27]. Patients at Clark level V were more likely to have a high score, but because of the missing Clark level values, the generalizability of this result remains debatable. Subsequently, univariate and multivariate Cox regression analyses showed that FGRS was one of several independent risk factors for prognosis, so we combined all of these risk factors into a nomogram with good calibration.

Finally, we assessed the overall immune infiltration of the two FGRS groups. The differences in the immune infiltration landscape between the HRG and LRG indicated a suppressed immune microenvironment in the HRG. Previous studies have shown that immune infiltration of the tumor microenvironment can be used to assess prognosis in multiple tumors, such as glioblastoma [28], breast cancer [29], and melanoma [30], which may mechanistically explain the poor prognosis of patients in the HRG.

Immune checkpoint inhibitors have provided durable responses and improved survival in the treatment of patients with melanoma [31,32,33] and many other solid tumors [34,35,36].

To further analyze the potential of our model for predicting immunotherapy response, we compared the expression of immune checkpoint genes between the two groups. Interestingly, almost all immune checkpoint genes were upregulated in the LRG, suggesting that this group may be more likely to benefit from immune checkpoint inhibitor therapies such as anti-CTLA-4, anti-PD-1, or anti-PD-L1.

This study has some limitations. First, we did not score the degree of tissue fibrosis to validate the relationship between FGES and the degree of fibrosis. Second, we were unable to validate the model in a larger cohort because no dataset with a sufficient sample size and clinical information was available.