Background

Epithelial-mesenchymal transition (EMT) is known to promote cellular plasticity during the formation of the mesoderm from the epiblast and of neural crest cells from the neural tube in the developing embryo, as well as during adult wound healing [1]. During EMT, epithelial cells lose their epithelial characteristics and acquire a mesenchymal morphology, which facilitates cellular dissociation and migration. As in embryonic development, neoplastic cells have been shown to reactivate EMT, leading to cancer metastasis [2]. Induction of EMT is also involved in the development of resistance to cytotoxic chemotherapy and targeted agents [3–5]. In addition, EMT imparts stem cell properties to differentiated cells [6]. Since cancer cells seem to acquire stem cell properties dynamically in response to the tumor microenvironment and become differentiated at distant sites, it has been suggested that major epigenetic remodeling occurs during EMT to facilitate metastasis. Although DNA methylation changes at specific loci during EMT have been established [7, 8], changes in the global DNA methylation landscape are not well understood. Indeed, published findings differ: a recent report demonstrated that DNA methylation is largely unchanged during EMT mediated by transforming growth factor beta (TGF-β) [9], while another showed that EMT is associated with specific alterations of gene-related CpG-rich regions [10]. Moreover, another report showed a striking difference in DNA methylation in non-small cell lung cancers between mesenchymal-like tumors and epithelial-like tumors, the latter displaying a better prognosis and greater sensitivity to inhibitors of the epidermal growth factor receptor [11].

In addition to DNA methylation, EMT mediates epigenetic reprogramming through widespread changes in post-translational modifications of histones [12]. However, it is unknown whether switches in histone marks coordinate EMT and, in particular, whether genome regulation by Polycomb group (PcG) and Trithorax group (TrxG) proteins is critical for this transition, as is the case for germ cell development and stem cell differentiation. The TrxG complex activates gene transcription by inducing trimethylation of lysine 4 on histone H3 (H3K4me3) at specific sites, whereas the PcG complex represses gene transcription by trimethylating lysine 27 on histone H3 (H3K27me3). Of note, a subset of promoters in embryonic stem cells is known to carry methylation at both H3K4 and H3K27 (the bivalent state), which poises them for either activation or repression in different cell types upon differentiation [19] and neuronal cells [20], as well as in several cancer types [15]. Partially methylated domains (PMDs) overlap with domains of H3K27me3 and/or H3K9me3, histone marks associated with transcriptional repression, in IMR90 fibroblasts [14]. In breast cancer, widespread DNA hypomethylation occurs primarily at regions that are PMDs in normal breast cells [21]. However, whether DNA methylome changes during EMT recapitulate those seen in tumor formation remains unknown.

EMT is often a transient process, involving changes in gene expression, increased invasiveness, and acquisition of stem cell properties such as increased tumor initiation, metastasis and chemotherapeutic resistance. Its transient nature suggests that significant features of EMT could be regulated by epigenetic fluidity triggered by key transcription factors and signaling events in response to alterations in the tumor microenvironment. Here we present genome-wide changes in DNA methylation and in the histone modifications H3K4me3 and H3K27me3 following the induction of EMT by ectopic expression of the transcription factor Twist1 in immortalized human mammary epithelial cells (HMLE) [2]. Additionally, we compared Twist1-expressing HMLE cells (hereafter HMLE Twist) cultured in monolayer with the same cells cultured as mammospheres (MS), a condition that enriches for cells with stem cell properties [22]. We found that EMT is characterized by major epigenetic reprogramming required for phenotypic plasticity, with predominant alterations at polycomb targets. Moreover, we show that inhibition of the H3K27 methyltransferases EZH2 and EZH1, components of polycomb repressive complex 2 (PRC2), either by short hairpin RNA (shRNA) or pharmacologically, blocks EMT and stemness properties.

Results

Aberrant promoter DNA methylation induced by epithelial-mesenchymal transition is cell-type specific and regionally coordinated

According to the EMT model of cancer progression, epithelial cells undergo a phenotypic change during the sequential progression of primary tumors towards metastasis, which may or may not be accompanied by DNA methylation changes [9, 10]. Although aberrant methylation at some specific promoters has previously been reported [23], the genomic distribution and genome-wide mapping of methylome changes during this process remain unclear. To identify DNA methylation changes in EMT, we used digital restriction enzyme analysis of methylation (DREAM), which yields highly quantitative genome-wide DNA methylation information [24]. Because small changes in DNA methylation could be important, we restricted the analysis to sites with at least 100-fold coverage per sample (average = 1,178 tags per CpG site). By examining approximately 30,000 CpG sites spanning the promoters of around 5,000 genes (Table S1 in Additional file 1), we observed the expected relationships: lower DNA methylation at CpG islands (CGI) than at non-CGI sites; lowest methylation around the transcription start site (TSS) (Figure S1A in Additional file 2); and a strong negative correlation between promoter DNA methylation and gene expression (Spearman's R < −0.50, P <0.0001), independent of Twist1 expression. Interestingly, the quantitative nature of the data allowed us to establish that genes with completely unmethylated promoters (methylation ≤1%) were more highly expressed than genes whose promoters had an appreciable level of methylation (>1%; Figure 1A). As PMD regions overlap substantially between different tissues [20, 25], we analyzed gene expression according to whether genes were located in PMDs. As expected, genes located within PMDs had lower baseline expression in our model, regardless of methylation (Figure 1A), and average gene body methylation was lower for CpG sites located within PMDs than for those outside PMDs (21.5% versus 40%, respectively; P <0.0001; Figure S1B in Additional file 2).
Thus, we conclude that even low levels of DNA methylation at promoters are inhibitory for gene expression and genes within PMDs tend towards lower expression.
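
To make the per-site quantities concrete, the sketch below computes percent methylation at one CpG site from tag counts and applies the 100-tag coverage filter and the ≤1% "completely unmethylated" cutoff described above. It assumes, as a simplification, that each site yields one count of tags carrying a methylated signature and one count of tags carrying an unmethylated signature (DREAM distinguishes these through restriction-enzyme cut signatures). The class and function names are hypothetical and are not part of the DREAM software.

```python
from dataclasses import dataclass
from typing import Optional

MIN_COVERAGE = 100          # minimum tags per CpG site, as used in the text
UNMETHYLATED_MAX_PCT = 1.0  # "completely unmethylated" promoters: <= 1%


@dataclass
class CpGCounts:
    # Hypothetical per-site record of tags with each signature.
    methylated_tags: int
    unmethylated_tags: int

    @property
    def coverage(self) -> int:
        return self.methylated_tags + self.unmethylated_tags


def methylation_pct(site: CpGCounts, min_coverage: int = MIN_COVERAGE) -> Optional[float]:
    """Percent methylation at one CpG site, or None if coverage is below threshold."""
    if site.coverage < min_coverage:
        return None
    return 100.0 * site.methylated_tags / site.coverage


def is_completely_unmethylated(pct: float) -> bool:
    return pct <= UNMETHYLATED_MAX_PCT


# Usage: one well-covered site and one site that fails the coverage filter.
deep = CpGCounts(methylated_tags=5, unmethylated_tags=995)    # 1,000 tags
shallow = CpGCounts(methylated_tags=3, unmethylated_tags=40)  # only 43 tags
print(methylation_pct(deep), methylation_pct(shallow))  # 0.5 None
```

Returning None rather than 0 keeps low-coverage sites distinguishable from genuinely unmethylated ones.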

Figure 1

DNA methylation changes occurring following epithelial-mesenchymal transition. (A) Box-plot for gene expression levels according to promoter DNA methylation level of genes located in partially methylated domains (PMDs) (red) and outside PMDs (blue) in human mammary epithelial cells (HMLE). The x-axis represents promoter % methylation and y-axis represents normalized expression level. *** P <0.0001, ** P <0.001, ns: non-significant. (B) Correlation of DNA methylation level of CpG sites in HMLE vector cells (Pw) (x-axis) and HMLE Twist cells (y-axis), showing dramatic changes in DNA methylation following EMT. (C) Correlation of DNA methylation level of CpG sites in HMLE vector cells transduced with two different 'control' vectors Pw (x-axis) and GFP (y-axis) showing no change in DNA methylation. (D) Box plots of average gene expression levels of genes with a gain, loss or no change in DNA methylation. Note that a gain of DNA methylation (increase to ≥2% DNA methylation from gene promoters which were fully unmethylated, ≤1%) was associated with a 3.8-fold decrease in their expression levels, while demethylation of gene promoters leading to fully unmethylated promoters (≤1%) was associated with an approximately 2-fold increase in their expression levels. (E) GSEA showing that genes losing gene body methylation following EMT are enriched for genes which are down-regulated in a CDH1-knockdown model of EMT (P <0.0001). The bottom graph represents the rank-ordered, non-redundant list of genes. Genes on the far right (blue) correlated most strongly with decreased gene expression in the CDH1-knockdown model of EMT. FDR, false discovery rate. (F) Box plots showing decreased expression levels of genes losing gene body methylation following Twist-induced EMT in two different models: knockdown of CDH1 and basal breast cancers compared to luminal breast cancers. y-axis represents the log2-fold change of gene expression.

Expression of Twist1 caused a dramatic change in DNA methylation at both CGIs and non-CGIs (Figure 1B), whereas no changes were seen between cells carrying independent control vectors, suggesting that the observed methylation changes are related to Twist1 expression and not to random clonal drift (Figure 1C). To study the impact of these changes on gene expression, we focused on completely unmethylated genes (≤1%) and identified 90 of 3,008 genes (3%) that switched from ≤1% to ≥2% methylation, with an average gain of 5.4%. As expected, this was associated with an approximately four-fold decrease in the expression of these genes (P <0.0001; Figure 1D). Gain of methylation was more frequent for genes located within PMDs (12%; 37 of 309) than for genes outside PMDs (2%; 53 of 2,699; χ² test, P <0.0001). Conversely, 39 genes became unmethylated upon Twist1 expression, concomitant with an approximately two-fold increase in their expression (Figure 1D); these included FOXC2, a master regulator of EMT [26–28]. In contrast with promoter hypermethylation, promoter hypomethylation was more frequent outside PMDs (4.6%; 31 of 670) than within PMDs (1.8%; 8 of 455; P <0.02). Gene ontology (GO) analysis of genes whose methylation changes were associated with expression changes showed enrichment for cell adhesion genes such as DSCAM, NID1 and NID2 (P = 0.002), consistent with the increased motility and migration of mesenchymal cells. We also found enrichment (P = 5e-05) for genes encoding calcium-binding proteins (for example, FBN1 and NPNT), suggesting a functional role for orchestrated calcium-binding proteins in EMT that may represent a novel therapeutic target for controlling cell plasticity. Collectively, these data suggest that induction of EMT by Twist1 results in a moderate change in the DNA methylation of core promoters.
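
The promoter categories of Figure 1D and the PMD comparisons above can be sketched as follows. The gain rule uses the ≤1% to ≥2% cutoffs from the Figure 1D legend; treating "loss" as any move from >1% to ≤1% is our assumption. The test shown is a Pearson χ² test on a 2×2 table without continuity correction. For the second comparison, where the text reports only P <0.02, the choice of test is an assumption. For one degree of freedom the upper-tail probability reduces to erfc(√(χ²/2)).

```python
import math


def classify_transition(before_pct: float, after_pct: float) -> str:
    """Label a promoter's methylation change (cutoffs from the Figure 1D legend)."""
    if before_pct <= 1.0 and after_pct >= 2.0:
        return "gain"
    if before_pct > 1.0 and after_pct <= 1.0:  # assumed definition of "loss"
        return "loss"
    return "no_change"


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Gain of methylation: 37 of 309 genes inside PMDs vs 53 of 2,699 outside.
stat_gain, p_gain = chi2_2x2(37, 309 - 37, 53, 2699 - 53)
# Loss of methylation: 8 of 455 genes inside PMDs vs 31 of 670 outside.
stat_loss, p_loss = chi2_2x2(8, 455 - 8, 31, 670 - 31)
print(f"gain: chi2={stat_gain:.1f}, P={p_gain:.1e}")
print(f"loss: chi2={stat_loss:.2f}, P={p_loss:.3f}")
```

With the counts reported above, both P-values fall below the thresholds stated in the text.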

Twist1 promotes global demethylation outside of core promoters

To understand the global methylation and demethylation changes that occur in response to induction of EMT by Twist1, we focused on 4,903 CpG sites with a minimum detection threshold of 100 tags and a baseline methylation ≥70%, as is typical of most of the genome [14]. Among these 4,903 CpG sites, nearly one fifth (18.6%) lost DNA methylation following EMT (Table S2 in Additional file 1). We obtained comparable results using thresholds of 10 tags and three tags per CpG site, covering 7,081 and 11,117 CpG sites, respectively (data not shown). This widespread hypomethylation was observed mainly in PMDs (P <0.0001; Table S2 in Additional file 1) and was independent of whether the CpG site lay in repeats or lamina-associated domains (Figure S2A-C in Additional file 3). Moreover, we found decreased methylation of repetitive elements, including short interspersed nuclear elements, long interspersed nuclear elements and satellite repeats (Figure S2D in Additional file 3). Concomitant with global PMD demethylation, we also observed focal hypermethylation of promoters located within PMDs (Figure S1C,D in Additional file 2), consistent with data recently reported in colon cancer [25]. These data suggest that methylome changes during EMT are reminiscent of those observed in cancer.
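
The site selection described above (sites that pass the coverage filter and have ≥70% baseline methylation, then the fraction losing methylation) can be summarized as in the sketch below. The minimum drop counted as a "loss" is not given in this section, so the `min_loss` default is a hypothetical placeholder, as are the toy values.

```python
def hypomethylation_summary(site_pairs, baseline_min=70.0, min_loss=10.0):
    """Summarize loss of methylation at highly methylated CpG sites.

    site_pairs: (vector_pct, twist_pct) for sites already passing the coverage filter.
    baseline_min: baseline methylation threshold (>= 70% in the text).
    min_loss: HYPOTHETICAL minimum drop, in percentage points, counted as a loss.
    Returns (n_eligible, n_lost, fraction_lost).
    """
    eligible = [(v, t) for v, t in site_pairs if v >= baseline_min]
    n_lost = sum(1 for v, t in eligible if v - t >= min_loss)
    fraction = n_lost / len(eligible) if eligible else 0.0
    return len(eligible), n_lost, fraction


# Toy values: the site at 40% baseline is excluded before counting.
pairs = [(80.0, 60.0), (75.0, 74.0), (90.0, 50.0), (40.0, 10.0), (70.0, 55.0)]
print(hypomethylation_summary(pairs))  # (4, 3, 0.75)
```

Running the same summary on sites filtered at 10 or 3 tags would correspond to the sensitivity check mentioned in the text.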

To understand the functional relevance of gene body methylation changes following the induction of EMT by Twist1, we performed Gene Set Enrichment Analysis (GSEA). GSEA is a computational method that assesses whether a defined set of genes (here, gene bodies) shows statistically significant differences between two conditions (here, the epithelial and mesenchymal states) [29]. While no pathway was enriched among genes gaining gene body methylation, GSEA revealed enrichment of gene body hypomethylation for EMT targets in the CDH1-knockdown model (P <0.0001 [30]; Figure 1E) and for MIR34B and MIR34C targets [31] (Table S3 in Additional file 1). Concomitantly, the average expression level of these hypomethylated genes was lower after knockdown of CDH1, as well as in basal-like compared with luminal-like breast cancer subtypes [32, 33] (P <0.004; Figure 1F). Collectively, these data suggest that, following the induction of EMT, Twist1 reprograms the genome by demethylating the gene bodies of epithelial cell-specific genes, leading to a decrease in their expression levels.
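
GSEA's core statistic is a running-sum enrichment score computed over a ranked gene list [29]. The sketch below implements the weighted running sum (weight exponent p = 1) on a toy ranking. It omits the permutation-based P-values, normalized enrichment scores and FDR that GSEA also reports, and it is not the Broad Institute implementation. The gene names and scores are illustrative only.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: genes ordered from most positive to most negative score.
    scores: the ranking metric for each gene, in the same order.
    gene_set: set of gene names to test.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must be a proper, non-empty subset of the ranked list")
    # Hits step up in proportion to |score|**p; misses step down uniformly.
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += (abs(s) ** p / norm) if h else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running
    return best


genes = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, scores, {"A", "B"}))  # set at the top: close to +1
print(enrichment_score(genes, scores, {"E", "F"}))  # set at the bottom: close to -1
```

A positive score indicates that the set concentrates at the top of the ranking, and a negative score indicates that it concentrates at the bottom.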

Twist1 increases the number of promoters with H3K4me3 by more than one fifth

Overall, the numbers of genes marked by H3K4me3 and of genes marked by both H3K4me3 and H3K27me3 (bivalent) increased following Twist1-induced EMT (Figure 2A,B). Specifically, more than 20% (3,253 of 15,853) of tallied genes acquired H3K4me3, whereas fewer than 3% (424 of 15,853) lost H3K4me3 (Figure 2C). As expected, acquisition of H3K4me3 was associated with increased mRNA expression, whereas loss of H3K4me3 was associated with reduced expression of the corresponding genes (Figure 2D). GO analysis indicated that the set of genes losing H3K4me3 is significantly enriched for genes associated with cell adhesion and differentiation (Figure 2E). Conversely, gain of H3K4me3, which is mediated by the TrxG complex, was found in EMT-promoting transcription factors, including zinc-ion binding proteins (e.g., ZNF75A), highlighting the dramatic effect of the TrxG machinery on chromatin remodeling during EMT (Figure 2F). GSEA showed enrichment for estrogen receptor (ESR1) targets (P <0.0001, false discovery rate (FDR) q <0.05) among genes losing H3K4me3 (Table S4 in Additional file 1). In other words, ESR1 targets marked by H3K4me3 in HMLE vector cells lose this active mark, consistent with the three-fold decrease in ESR1 expression in HMLE Twist cells (data not shown). Importantly, genes losing H3K4me3 were also enriched for genes down-regulated in blood vessel cells from the wound site, suggesting epigenetic conservation of the EMT process between wound healing and cancer (Table S4 in Additional file 1). Thus, EMT is accompanied by a widespread gain of H3K4me3-mediated gene activation and by loss of H3K4me3 at ESR1 targets.
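
The gain and loss counts behind the Venn diagram in Figure 2C reduce to set differences over the tallied genes. The sketch below shows this on hypothetical toy sets (CDH1 and ZNF75A are used only to mirror Figure 2B) and then reproduces the percentages quoted above.

```python
def mark_changes(marked_vector: set[str], marked_twist: set[str], tallied: set[str]) -> dict:
    """Genes gaining or losing a promoter mark between two conditions (Venn-style)."""
    before = marked_vector & tallied
    after = marked_twist & tallied
    gained = after - before
    lost = before - after
    n = len(tallied)
    return {
        "gained": gained,
        "lost": lost,
        "pct_gained": 100.0 * len(gained) / n,
        "pct_lost": 100.0 * len(lost) / n,
    }


# Toy example with hypothetical gene sets.
tallied = {"CDH1", "ZNF75A", "GAPDH", "GENE4"}
result = mark_changes({"CDH1", "GAPDH"}, {"ZNF75A", "GAPDH"}, tallied)
print(sorted(result["gained"]), sorted(result["lost"]))  # ['ZNF75A'] ['CDH1']

# The proportions reported in the text.
print(f"{100 * 3253 / 15853:.1f}% gained, {100 * 424 / 15853:.1f}% lost")  # 20.5% gained, 2.7% lost
```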

Figure 2

H3K4me3 dynamic modifications are coupled with transcriptional changes related to epithelial-mesenchymal transition genes. (A) Pie chart showing the distribution of H3K4me3 and H3K27me3 marks in human mammary epithelial cells (HMLE) vector cells and HMLE Twist cells. (B) Landscape of H3K4me3 for CDH1 (loss in HMLE Twist cells of the H3K4me3 present in vector cells) and ZNF75A (gain of H3K4me3 in HMLE Twist cells). (C) Venn diagram of H3K4me3 at gene promoters in HMLE vector cells and HMLE Twist cells. (D) Box plots for gene expression changes in genes losing or gaining the H3K4me3 mark. (E) Gene ontology analysis using DAVID for genes losing the H3K4me3 mark. The x-axis represents the P-value levels and y-axis the gene ontology pathways. (F) Gene ontology analysis using DAVID for genes gaining the H3K4me3 mark. The x-axis represents the P-value levels and y-axis the gene ontology pathways.

Switches between H3K4me3 and H3K27me3 modulate transcriptional dynamics

Using chromatin immunoprecipitation sequencing (ChIP-seq) for the repressive histone modification H3K27me3, we found that the genomic coverage of H3K27me3 was markedly reduced in HMLE Twist cells (250 megabases in vector cells compared with 153 and 138 megabases in HMLE Twist cells cultured in monolayer and as spheres, respectively; data not shown). This is consistent with the notion that cells that have undergone EMT are less differentiated and have acquired stem cell properties [6]. Given these changes in the H3K27me3 landscape, we investigated switches between H3K27me3 and H3K4me3 during EMT. Expression of Twist1 caused a loss of H3K27me3 in more than 50% of the genes marked by H3K27me3 in HMLE cells (Figure 3A,B). Of the 2,070 genes that lost H3K27me3, approximately 11% (225 of 2,070) switched to H3K4me3 (Figure 3C), and transcription of these genes was dramatically induced (around five-fold; P <0.0001). In contrast, genes that lost H3K27me3 without gaining H3K4me3 showed no average change in expression (Figure 3C). In addition, 102 genes switched from H3K4me3 to H3K27me3 after Twist1-induced EMT and were transcriptionally repressed by more than 32-fold (Figure 3C). These chromatin switches were associated with differential gene expression, particularly at typical EMT markers. For example, repression of E-cadherin during EMT correlated with a switch from H3K4me3 to H3K27me3 (Figure 3B), whereas gain of N-cadherin expression correlated with a switch from H3K27me3 to H3K4me3. Strikingly, the same interplay between H3K4me3 and H3K27me3 occurs at master genes of the EMT process, such as PDGFRα, which is essential for Twist1 to promote tumor metastasis via invadopodia [34], and the splicing regulator ESRP1, which is repressed by Snail1 to promote EMT [35] (Table S5 in Additional file 1).
Among genes whose expression was highly altered during EMT (increasing or decreasing at least nine-fold), 23.1% switched between H3K4me3 and H3K27me3 marks, compared with only 2.8% of genes without highly altered expression (Figure S3 in Additional file 4). Altogether, these data suggest that an epigenetic program orchestrated by TrxG and PcG complexes regulates key EMT genes.
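
The promoter-state categories used in Figure 3C, and the switch and nine-fold criteria above, can be sketched as below. Each gene's promoter is reduced to two booleans (H3K4me3 present, H3K27me3 present). We assume that a "switch" means a direct H3K4me3 ↔ H3K27me3 transition and does not include transitions through the bivalent state. The toy marks for CDH1 (E-cadherin) and CDH2 (N-cadherin) follow the directions described in the text; GENE3 is hypothetical.

```python
from collections import Counter


def chromatin_state(has_k4: bool, has_k27: bool) -> str:
    """Promoter state, as in the categories of Figure 3C."""
    if has_k4 and has_k27:
        return "bivalent"
    if has_k4:
        return "H3K4me3"
    if has_k27:
        return "H3K27me3"
    return "neither"


def transition_counts(vector_marks: dict, twist_marks: dict) -> Counter:
    """Count (vector state, Twist state) pairs for genes present in both dicts.

    Each dict maps gene name -> (has_H3K4me3, has_H3K27me3) at the promoter.
    """
    counts = Counter()
    for gene in vector_marks.keys() & twist_marks.keys():
        before = chromatin_state(*vector_marks[gene])
        after = chromatin_state(*twist_marks[gene])
        counts[(before, after)] += 1
    return counts


def is_k4_k27_switch(before: str, after: str) -> bool:
    """True for a direct switch between H3K4me3 and H3K27me3 in either direction."""
    return {before, after} == {"H3K4me3", "H3K27me3"}


def highly_altered(fold_change: float, cutoff: float = 9.0) -> bool:
    """At least nine-fold up or down."""
    return fold_change >= cutoff or fold_change <= 1.0 / cutoff


vector = {"CDH1": (True, False), "CDH2": (False, True), "GENE3": (True, True)}
twist = {"CDH1": (False, True), "CDH2": (True, False), "GENE3": (True, False)}
table = transition_counts(vector, twist)
print(table[("H3K4me3", "H3K27me3")], table[("H3K27me3", "H3K4me3")])  # 1 1
```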

Figure 3

H3K27me3 switches orchestrate a mesenchymal cell-type specific gene expression signature. (A) Venn diagram of genes marked by H3K27me3 in human mammary epithelial cells (HMLE) vector cells and HMLE Twist cells. (B) Landscape of H3K27me3 mark in CDH1 gene (gain of H3K27me3 mark in HMLE Twist cells). (C) Box-plot of gene expression (the bars represent 10% and 90% extremes) for genes switching in HMLE vector cells from H3K4me3, H3K27me3, bivalent or neither marks to other histone mark combinations in HMLE Twist cells.

During EMT, in parallel with the dramatic loss of H3K27me3 occupancy at nearly 50% of the genes (n = 2,070) marked in HMLE vector cells, mesenchymal cells gained H3K27me3 at 1,717 genes. GSEA showed that these genes were enriched for functional categories in a cell-type-specific manner. Indeed, the set of genes that gained H3K27me3 is related to genes down-regulated in a previously described CDH1-knockdown model of EMT [30] (Figure 4A) and to genes with low expression in basal-like compared with luminal-like breast cancer cell lines [32] (P <0.0001; Figure 4B; Table S6 in Additional file 1). Of interest, most genes that were down-regulated by CDH1 knockdown and gained H3K27me3 in Twist1-induced cells were pre-marked by H3K4me3 in HMLE cells, highlighting the importance of TrxG and PcG switches in defining cell identity during EMT (Figure 4C). Furthermore, GSEA revealed that pathways enriched among genes gaining H3K27me3 were associated with DNA repair and mRNA splicing, whereas pathways enriched among genes losing H3K27me3 were associated with mitotic pre-metaphase and an undifferentiated cancer signature (Table S6 in Additional file 1).

Figure 4

Gene set enrichment analysis of H3K27me3 marks. (A) Gene set enrichment analysis showing positive enrichment for H3K27me3 marks in genes which are down-regulated by the CDH1-knockdown model of EMT (P <0.0001). The bottom graph represents the rank-ordered, non-redundant list of genes. Genes on the far left (red) correlated the most with decreased gene expression following CDH1-knockdown. The vertical black lines show the position of each of the genes of the studied gene set in the ordered, non-redundant data set. The green curve is related to the enrichment score curve. (B) Gene set enrichment analysis showing positive enrichment for H3K27me3 marks in genes distinguishing luminal from basal like breast cancer (P <0.0001). (C) Pie chart showing that the majority of downregulated genes in the CDH1-knockdown model of EMT which gain H3K27me3 in Twist1-cells were also pre-marked by H3K4me3 in vector cells. FDR, false discovery rate; NES, normalized enrichment score.

We next investigated whether the changes observed in Twist cells could be replicated in other EMT model systems, such as Snail- and TGF-β1-induced EMT. Similar findings across multiple EMT models would argue against adaptation to Twist1 expression and suggest that the effects observed in Twist cells are due to EMT itself. Indeed, most sites (14 of 17) showed the same directional change in H3K4me3 and/or H3K27me3 by ChIP-qPCR in HMLE Snail, TGF-β1 and Twist cells as observed by ChIP-seq in HMLE Twist cells (Figure S4 in Additional file 5). The Pearson correlation coefficients for Snail versus Twist (r = 0.8982, P <0.0001), Snail versus TGF-β1 (r = 0.4613, P = 0.006) and TGF-β1 versus Twist (r = 0.1791, P = 0.3108) point to close similarities between Snail- and Twist-induced EMT in their effects on H3K4me3 and H3K27me3, whereas TGF-β1-induced EMT has a less similar effect (Figure S5 in Additional file 6). We also observed similar results for DNA methylation, assessed by bisulfite sequencing of the promoters of seven genes randomly chosen from among genes switching between H3K27me3 and H3K4me3 (Figure S6 in Additional file 7, Figure S7 in Additional file 8 and Figure S8 in Additional file 9). Collectively, our data suggest that most of the changes due to Twist1 expression are not due to adaptation but are shared with cells undergoing EMT through other means.
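
The two cross-model comparisons above are sign agreement between assays at each site and Pearson correlation between models. They can be sketched as below. This section does not specify which per-site values were correlated, so the inputs are assumed to be log2 fold changes in ChIP signal, and the three example values are hypothetical. The P-values reported in the text would in practice come from a statistics package (for example `scipy.stats.pearsonr`), which this dependency-free sketch omits.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def direction_concordance(changes_a: list[float], changes_b: list[float]) -> tuple[int, int]:
    """Number of sites whose fold changes share the same sign, and the total number of sites."""
    same = sum(1 for a, b in zip(changes_a, changes_b) if a * b > 0)
    return same, len(changes_a)


# Hypothetical log2 fold changes at three sites (ChIP-qPCR vs ChIP-seq).
qpcr = [1.2, -0.8, 0.5]
seq = [0.9, -1.1, -0.2]
print(direction_concordance(qpcr, seq))  # (2, 3)
print(round(pearson_r(qpcr, seq), 3))
```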

Enrichment in bivalent genes upon Twist1 induction

Bivalent genes were characterized initially in stem cells by the co-occurrence of H3K27me3 (repression) and H3K4me3 (activation) at genes which become either transcriptionally active (H3K4me3) or repressed (H3K27me3) upon differentiation [52] with few modifications. Library preparation and sequencing were performed on an Illumina/Solexa Genome Analyzer II or HiSeq 2000 in accordance with the manufacturer's protocols. ChIP-seq reads were aligned to the human genome (hg18) using the Illumina Analyzer pipeline.

Peaks for H3K4me3 were called from reads mapping uniquely to a single genomic location using MACS (version 1.3.7.1), with a window of 400 bp and a P-value cutoff of 1e-5 [53].

For H3K27me3, peaks and enriched domains were called with SICER (version 1.03), because these peaks were broad and less sharp than those of H3K4me3 [54]. The window size was set to the default of 200 bp. The gap size was determined as recommended by Zang et al. [54], up to a maximum of 2 kb, since performance worsens once the gap size exceeds 10 times the window size. Following Wang et al. [55], the E-value was chosen such that the estimated FDR, defined as the E-value (the expected number of significant domains under the random background) divided by the number of identified candidate domains, was ≤5%. The FDR cutoff used to further filter candidate domains by comparison with the control was also set at 5%.
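
The two SICER settings above amount to the following constraints. The symbols are introduced here for convenience: E is the E-value, N_cand the number of candidate domains, w the window size and g the gap size.

```latex
\widehat{\mathrm{FDR}} = \frac{E}{N_{\mathrm{cand}}} \le 0.05,
\qquad
g \le 10\,w = 10 \times 200\ \mathrm{bp} = 2\ \mathrm{kb}
```

For instance, with a hypothetical 10,000 candidate domains, the E-value could be at most 0.05 × 10,000 = 500.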

Sequencing reads from histone H3 ChIP DNA were used as the control for MACS and SICER. Annotated RefSeq genes with a peak located in their promoter (−1 kb to +0.5 kb relative to the TSS) were considered marked by H3K4me3 or H3K27me3. For pathway analysis, GO analysis was performed using DAVID [56, 57]. DAVID analyses were performed online with the following cutoffs: EASE score <1 × 10⁻⁵, count >10, fold enrichment >2 and Bonferroni-corrected P <1 × 10⁻². For GSEA, gene sets were downloaded from the Broad Institute's MSigDB website [29]. Gene set permutations were used to determine the statistical enrichment of gene sets, using the difference in fold enrichment of the histone modifications H3K4me3 and H3K27me3 between mesenchymal (Twist1) cells and vector cells.
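
Assigning a mark to a gene amounts to an interval-overlap test between called peaks and a strand-aware promoter window from −1 kb to +0.5 kb around the TSS. The sketch below assumes 0-based, half-open coordinates (as in BED files) and a window that includes the TSS base. The `Gene` and `Peak` types and the gene names are hypothetical.

```python
from dataclasses import dataclass

UPSTREAM_BP = 1000   # -1 kb relative to the TSS
DOWNSTREAM_BP = 500  # +0.5 kb relative to the TSS


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int      # 0-based TSS coordinate (assumed convention)
    strand: str   # "+" or "-"


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int    # 0-based, half-open [start, end)
    end: int


def promoter_window(gene: Gene) -> tuple[int, int]:
    """Half-open promoter interval around the TSS, oriented by strand."""
    if gene.strand == "+":
        start, end = gene.tss - UPSTREAM_BP, gene.tss + DOWNSTREAM_BP + 1
    else:
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = gene.tss - DOWNSTREAM_BP, gene.tss + UPSTREAM_BP + 1
    return max(0, start), end


def marked_genes(genes: list[Gene], peaks: list[Peak]) -> set[str]:
    """Names of genes with at least one peak overlapping their promoter window.

    Brute force O(genes x peaks); an interval index would be preferable genome-wide.
    """
    marked = set()
    for gene in genes:
        w_start, w_end = promoter_window(gene)
        for peak in peaks:
            if peak.chrom == gene.chrom and peak.start < w_end and peak.end > w_start:
                marked.add(gene.name)
                break
    return marked


genes = [Gene("PLUS_GENE", "chr1", 10_000, "+"), Gene("MINUS_GENE", "chr1", 10_000, "-")]
print(marked_genes(genes, [Peak("chr1", 8_500, 9_001)]))    # {'PLUS_GENE'}
print(marked_genes(genes, [Peak("chr1", 10_900, 11_100)]))  # {'MINUS_GENE'}
```

The same peak can mark one gene and miss another with an identical TSS, depending only on strand.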

To assess technical variation, we performed technical replicates (independent immunoprecipitations) of the H3K4me3 and H3K27me3 ChIP in HMLE cells transduced with Twist1 and cultured as spheres, followed by sequencing. Likewise, we performed a technical replicate of the H3K27me3 ChIP in HMLE vector cells. Correlations between technical replicates were high (r >0.82; Table S10 in Additional file 1), indicating that our findings are unlikely to reflect technical variation. A list of primers used for ChIP-qPCR validation of selected genes is available in Table S11 in Additional file 1.

RNA-sequence library generation and mapping

RNA was extracted from vector cells and Twist1-transduced cells (monolayer and sphere) with Trizol reagent (Invitrogen, 15596–026). Libraries were prepared using a SOLiD™ Total RNA-seq Kit according to the manufacturer's protocol (Life Technologies, Carlsbad, CA, USA). Reads produced by the SOLiD analysis pipeline were mapped to the National Center for Biotechnology Information (NCBI) build hg19 human reference sequence and to exon junctions using the ABI Bioscope (version 1.21) pipeline with default parameters. Only tags that mapped to the hg19 reference over their full 35-nucleotide length were used, and reads that aligned to multiple positions were excluded. Tags mapped to RefSeq genes were counted to derive a measure of gene expression. To compare gene expression values, we reasoned that the cell type change associated with EMT could alter the total amount of RNA. We therefore used the most conservative normalization, assuming that most genes did not change their expression: we constructed a histogram of expression ratios and assumed that its maximum corresponded to no change in gene expression. Compared with normalization that assumes a constant total number of tags mapped to genes, the differences were less than 10%.
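
The histogram-mode normalization described above can be sketched as follows, alongside the total-tag alternative it was compared with. The count filter, the bin width and the refinement step (taking the median log-ratio inside the most populated bin) are hypothetical choices that this section does not specify. The toy data only illustrate how a few strongly induced genes can inflate a total-count factor.

```python
import math
from statistics import median


def mode_scale_factor(counts_ref, counts_other, min_count=10, bin_width=0.1):
    """Scale factor for counts_other relative to counts_ref, assuming most genes are unchanged.

    Builds a histogram of log2(other/ref) ratios and treats its most populated
    bin as "no change"; the median ratio within that bin refines the estimate.
    min_count and bin_width are HYPOTHETICAL choices, not values from the study.
    """
    log_ratios = [
        math.log2(b / a)
        for a, b in zip(counts_ref, counts_other)
        if a >= min_count and b >= min_count
    ]
    if not log_ratios:
        raise ValueError("no genes pass the count filter")
    bins: dict[int, list[float]] = {}
    for r in log_ratios:
        bins.setdefault(math.floor(r / bin_width), []).append(r)
    modal_bin = max(bins.values(), key=len)
    return 2.0 ** median(modal_bin)


def total_count_scale_factor(counts_ref, counts_other):
    """Alternative: assume the total number of gene-mapped tags is constant."""
    return sum(counts_other) / sum(counts_ref)


# Toy data: 90 unchanged genes sequenced twice as deeply, plus 10 strongly induced genes.
ref = [100] * 90 + [50] * 10
other = [200] * 90 + [500] * 10
factor = mode_scale_factor(ref, other)
print(factor)                                # 2.0
print(total_count_scale_factor(ref, other))  # ~2.42, inflated by the induced genes
normalized = [b / factor for b in other]
```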

Data availability

All sequencing data and processed files are available from the Gene Expression Omnibus under accession number [GEO:GSE53026].

Authors’ information

GGM and JHT are first co-authors.

SAM and JPI are senior co-authors.