Background

Ranking seventh in cancer incidence and sixth in mortality worldwide, esophageal carcinoma is highly aggressive and its patients have poor outcomes, with a 5-year survival rate lower than 20% [1, 2]. Esophageal cancer comprises two major histologic subtypes: squamous cell carcinoma (ESCC) and adenocarcinoma (EAC). These two subtypes have distinct clinical characteristics. ESCC occurs predominantly in the upper and mid-esophagus; EAC is prevalent in the lower esophagus near the gastroesophageal junction (GEJ) and is associated with the precursor lesion known as Barrett’s esophagus (BE). Biologically, ESCC arises from the squamous epithelial cells and has common features with other squamous cell carcinomas (SCC), such as head and neck SCC (HNSCC). In comparison, EAC has columnar cell features and shares many characteristics with tubular gastrointestinal adenocarcinomas. In particular, EAC is almost indistinguishable from GEJ adenocarcinoma in terms of genomic, biological and clinical features.

Epigenetically, multiple studies have reported molecular changes in esophageal cancer, especially at the DNA methylation level [3,4,5,25]. We chose esophageal cancer as the disease model because its two subtypes arise from distinct cells of origin. We hypothesized that characterizing their methylome profiles might reveal cell-type- and cancer-specific methylation changes, together with the underlying epigenetic mechanisms.

Fig. 1

Identification of PMDs in esophageal samples by a sequence-aware multi-model PMD caller (MMSeekR). A Graphical overview of the present study design. B Dot plots showing average methylation levels for all CpGs across the whole genome, CpGs within CGI promoters, common PMDs, SINE, LINE, and LTR in different samples. The annotations from Takai et al. [26] were used for CGI methylation quantification. C Development of a new PMD caller. The MethylSeekR α score measures the distribution of methylation levels in sliding windows of 201 consecutive CpGs across the genome. An α score < 1 corresponds to a polarized distribution towards high or low methylation levels (that is, HMDs), while an α score ≥ 1 corresponds to a distribution towards intermediate methylation levels (that is, PMDs). PCC is the Pearson correlation coefficient between the hypomethylation score predicted by a neural network (NN) model and the actual methylation level. A strong negative correlation indicates regions favoring PMDs, while a weak or null correlation favors HMDs. D PCA of 45 esophageal samples using the top 5000 most variable 30-kb tiles for the three PMD callers. E, F Representative windows showing PMDs that were identified by MMSeekR but missed by either MethPipe (E) or MethylSeekR (F)
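To make the α score in panel C concrete, the following is a minimal sketch rather than the MethylSeekR implementation. It assumes that methylation levels in a window follow a symmetric Beta(α, α) distribution and estimates α by the method of moments, a simplification chosen here for illustration. The function names `symmetric_beta_alpha` and `sliding_alpha` are hypothetical.

```python
from statistics import fmean


def symmetric_beta_alpha(levels):
    """Method-of-moments alpha for a symmetric Beta(alpha, alpha) fit.

    For Beta(alpha, alpha), E[(x - 0.5)^2] = 1 / (4 * (2 * alpha + 1)),
    hence alpha = (1 / (4 * m2) - 1) / 2. This is an illustrative
    stand-in, not the estimator used by MethylSeekR.
    """
    m2 = fmean([(x - 0.5) ** 2 for x in levels])
    if m2 == 0:
        return float("inf")  # every value is exactly 0.5
    return (1.0 / (4.0 * m2) - 1.0) / 2.0


def sliding_alpha(levels, window=201):
    """Alpha score for each window of `window` consecutive CpGs."""
    return [
        symmetric_beta_alpha(levels[i:i + window])
        for i in range(len(levels) - window + 1)
    ]
```

Under this simplification, a window in which every CpG is fully methylated or fully unmethylated gives α close to 0 (polarized, HMD-like), a uniform spread gives α close to 1, and values clustered around 0.5 give α well above 1 (PMD-like).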

Results

Development of a novel sequence-aware calling method to identify PMDs

To characterize the esophageal cancer methylome, we analyzed WGBS profiles of 45 esophageal samples from two different cancer subtypes and their corresponding nonmalignant tissues [27] (Fig. 1A, Additional file 1: Fig. S1A). All of the nonmalignant esophageal squamous (NESQ) tissues showed high inter-sample correlation even though they came from two different cohorts (Additional file 1: Fig. S1B and Additional file 2: Table S1). To analyze the overall methylation pattern, we first investigated the methylation level at various genomic domains (Fig. 1B). As anticipated, both global hypomethylation (especially in common PMDs, defined as shared PMDs identified from 40 different cancer types [19]) and CGI promoter hypermethylation were observed in tumor samples. EAC tumors harbored notably higher methylation levels in CGI promoters than ESCC tumors, in line with TCGA results showing that gastrointestinal adenocarcinomas had a higher frequency of CGI hypermethylation than cancers from most other tissues [28]. Interestingly, most nonmalignant GEJ (NGEJ) tissues showed higher CGI promoter methylation levels than NESQ tissues, and often even higher levels than ESCC tumor samples. Similar to EAC, BE samples (a recognized precursor lesion of EAC) were reported to have a hypermethylation pattern at CGI promoters [7]. Since our NGEJ tissues were pathologically confirmed as inflammatory tissues devoid of apparent BE, this result suggests that CGI hypermethylation may occur in inflamed GEJ. Notably, CGI hypermethylation has also been observed in long-term-cultured colon organoids and in cells after prolonged exposure to cigarette smoke extract [39].

B Different PMD categories were identified based on the frequency and overlap between the two esophageal cancer types. C Line plots showing average methylation levels for different PMD categories in esophageal tumors, where each line represents one sample. D Similar line plot patterns were observed using TCGA methylation datasets, showing the mean and standard deviation across samples. Each row in the heatmap below shows an individual sample. E Bar plots showing the percentage of WGBS PMDs overlapping with chromatin B compartments, which were predicted using TCGA methylation datasets and analyzed with the minfi package. Methylation datasets in D and E are from the TCGA ESCA HM450k arrays, including 91 ESCC and 75 EAC samples. F Somatic mutation rates based on WGS in the indicated studies, calculated separately for each of the WGBS PMD categories. EAC WGS datasets: 276 samples; ESCC WGS datasets: 508 samples
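The overlap percentage described for panel E can be illustrated with a short sketch. It is not the authors' pipeline: it assumes that overlap is measured in base pairs, that intervals are half-open (start, end) pairs on a single chromosome, and that the B-compartment calls are already available as intervals. The function names `merge_intervals` and `percent_overlap` are hypothetical.

```python
def merge_intervals(intervals):
    """Sort half-open (start, end) intervals and merge overlapping ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def percent_overlap(pmds, compartments):
    """Percentage of PMD base pairs that fall inside the compartments."""
    pmds = merge_intervals(pmds)
    comps = merge_intervals(compartments)
    total = sum(end - start for start, end in pmds)
    if total == 0:
        return 0.0
    covered = 0
    j = 0
    for start, end in pmds:
        # Skip compartments that end before this PMD begins.
        while j < len(comps) and comps[j][1] <= start:
            j += 1
        k = j
        # Accumulate the overlap with every compartment that starts inside the PMD span.
        while k < len(comps) and comps[k][0] < end:
            covered += min(end, comps[k][1]) - max(start, comps[k][0])
            k += 1
    return 100.0 * covered / total
```

For example, two 100-bp PMDs at (0, 100) and (200, 300) and a single compartment at (50, 250) share 100 of 200 PMD base pairs, so the function returns 50.0. Counting whole PMDs instead of base pairs would be an equally plausible reading of the caption and would need a different denominator.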