The epigenomic landscape of transposable elements across normal human development and anatomy

Pehrsson, Erica C.; Choudhary, Mayank N. K.; Sundaram, Vasavi; Wang, Ting

doi:10.1038/s41467-019-13555-x

The epigenomic landscape of transposable elements across normal human development and anatomy

Article
Open access
Published: 10 December 2019

Volume 10, article number 5640, (2019)
Cite this article

Download PDF

You have full access to this open access article

From

View current issue

The epigenomic landscape of transposable elements across normal human development and anatomy

Download PDF

13k Accesses
55 Citations
49 Altmetric
2 Mentions
Explore all metrics

Abstract

Transposable elements (TEs) have deposited functional regulatory elements throughout the human genome. Although most are silenced, certain TEs have been co-opted by the host. However, a comprehensive, multidimensional picture of the contribution of TEs to normal human gene regulation is still lacking. Here, we quantify the epigenomic status of TEs across human anatomy and development using data from the Roadmap Epigenomics Project. We find that TEs encompass a quarter of the human regulatory epigenome, and 47% of elements can be in an active regulatory state. We demonstrate that SINEs are enriched relative to other classes for active and transcribed marks, that TEs encompass a higher proportion of enhancer states in the hematopoietic lineage, and that DNA methylation of Alu elements decreases with age, corresponding with a loss of CpG islands. Finally, we identify TEs that may perform an evolutionarily conserved regulatory function, providing a systematic profile of TE activity in normal human tissue.

Dynamic changes in the epigenomic landscape regulate human organogenesis and link to developmental disorders

Article Open access 06 August 2020

Transposable elements generate regulatory novelty in a tissue-specific fashion

Article Open access 18 June 2018

Primate-specific transposable elements shape transcriptional networks during human development

Article Open access 23 November 2022

Introduction

Transposable elements (TEs) are repetitive sequences that comprise approximately half of mammalian genomes due to historic proliferation within the host. Although the vast majority of human TEs have lost the ability to transpose, many encode functional regulatory elements that can disrupt gene regulatory networks if left unchecked. In consequence, most TEs are epigenetically silenced by DNA methylation and/or repressive histone modifications in normal somatic tissues^{Full size image}

The aggregate epigenetic profile of TEs also differs by class, which includes DNA transposons, LINE (long interspersed nuclear element), SINE (short interspersed nuclear element), and LTR (long terminal repeat) retrotransposons, the primate-specific SVA class (SINE-VNTR-Alu), and Other class TEs, older elements whose sequence is too degraded to accurately classify (Fig. 1b; Supplementary Fig. 3a–c). For instance, the SINE class has a higher proportion of bases in the 4_Tx transcribed state and most active regulatory states than other classes, while the SVA class has more CpGs missing methylation data. The latter is likely due to SVA length (median 1,151.5 bp vs. 232 bp for all TEs; Supplementary Fig. 3d) and recent propagation in the genome, which decreases mappability (i.e., the likelihood that a read is correctly mapped). Indeed, the number of epigenomes in which individual SVA elements are missing methylation data is negatively correlated with mappability (Spearman correlation, ρ = −0.38, P-value < 0.001). Interestingly, a higher proportion of the LTR class is annotated with the 9_Het heterochromatin state, but the SINE class is more hypermethylated than other classes, suggesting that the two classes are subject to different mechanisms of epigenetic repression. Regression analysis on individual TEs confirms that this difference is driven primarily by TE class, not CpG density (Supplementary Discussion; Supplementary Table 2).

Despite their depletion in active epigenetic states, TEs have enormous representation in the genome, encompassing ~45% of its length and ~50% of CpGs. As a result, TEs comprise 26% of active regulatory and 44% of transcribed chromHMM states across all Roadmap epigenomes, as well as 22% of DHS peaks and 24% of H3K27ac peaks (Fig. 1c). Some TE classes contribute a disproportionate amount of each epigenetic state (Fig. 1d). For instance, the SINE class contributes 37–53% of the active and poised (10_TssBiv, 11_BivFlnk, 12_EnhBiv) chromHMM states and 41% of H3K27ac peaks within TEs, although it encompasses only 28% of TE bases. Therefore, although TEs are depleted in active epigenetic states, their contribution to them is substantial, and the extent of the contribution varies by TE class.

Potential for individual TEs to be epigenetically active

We next investigated the likelihood of each of the 4,430,788 human TE fragments to be annotated with an active epigenetic state in a Roadmap tissue. Within each epigenome, only a small fraction of individual TEs are in each active epigenetic state (see Methods; median ≤3% for all active regulatory states and DHS/H3K27ac peaks; Fig. 2a). However, the majority of TEs are annotated with an active epigenetic state in a Roadmap epigenome: 47% of individual TEs are in an active regulatory chromHMM state and 75% are in a transcribed state at least once. Specifically, 4% of TEs are in the 1_TssA state (characterized by H3K4 tri-methylation, H3K4me3), 44% in the 7_Enh state, and 33% and 24% overlap a DHS or H3K27ac peak, respectively. In contrast, 82% of TEs are in a repressed state (heterochromatin, 9_Het, or Polycomb repressed, 13_ReprPC and 14_ReprPCWk) in at least one Roadmap epigenome.

**Fig. 2: Individual TE potential to be annotated with an epigenetic state.**

Despite the extensive quality control performed on the Roadmap datasets, our results are dependent on variability between similar epigenomes, and some fraction of the 47% of TEs may be noise. The method of annotating TEs with chromHMM states also has a moderate impact on this result. When TEs are required to overlap the center of a 200 bp chromHMM annotation window, the statistic decreases to 37%, drop** further to 26% when TEs are required to overlap the center of a chromHMM annotation block (Supplementary Fig. 4a, b; Supplementary Discussion). Furthermore, although a TE must overlap the summit of a DHS or H3K27ac peak (the predicted binding location²⁴) to be annotated with that state, we cannot say with complete certainty that the TE is responsible for the peak.

However, the results are robust to the chromHMM model used. With the 18-state model generated by the Roadmap Project, which includes H3K27ac as well as the modifications included in the 15-state model, the number of TEs in an active regulatory state (states 1–4 and 7–11) in at least one epigenome increases slightly to 48%, despite a 23% reduction in the number of epigenomes under consideration (n = 98; Supplementary Fig. 5a, b). The Roadmap Project also trained separate 50-state chromHMM models on seven deeply profiled reference epigenomes (embryonic stem cells (ESCs), ES-derived cells, and IMR90) using all available epigenetic marks. Using those state calls, a median of 8% of TEs are in an active regulatory state in each epigenome (based on the corresponding 18-state model state; Supplementary Fig. 5c), as opposed to 3% with the 15-state model. Therefore, more nuanced annotation models increase the observed regulatory role of TEs.

The fraction of individual TEs ever in an active regulatory state is slightly lower than expectation, as estimated by shuffling the locations of all TEs ten times. On average, 49% (standard deviation, 0%) of shuffled TEs are in an active regulatory state in at least one epigenome. However, the particular TEs in an active regulatory state in each epigenome are not random, and we observe subfamily-specific patterns that are lost with shuffling (see below).

Although LTR elements comprise 16% of TEs, they account for 27% of TEs ever in the 8_ZNF/Rpts state and 23% of those ever in the 9_Het heterochromatin state, as well as 21–22% of TEs overlap** DHS or H3K27ac peaks (Fig. 2b; Supplementary Fig. 6a). SINE elements (40% of all TEs) represent 50–51% of TEs ever in the transcribed 4_Tx state, the transcribed flanking 3_TxFlnk state, the genic enhancer 6_EnhG state, or expressed RPKM >1 (reads per kilobase per million reads), likely due to greater overlap with protein-coding introns (Supplementary Fig. 6b).

TE regulatory signatures in the Roadmap epigenomes are restricted rather than universal. A TE in an active regulatory state or overlap** a DHS or H3K27ac peak remains in that state in only 1–6% of epigenomes (median; Fig. 2c), while a TE remains in a transcribed chromHMM state in 6–9% of epigenomes. The degree of restriction also exhibits class-specific variation: for example, SINE and SVA elements remain in the 5_TxWk state longer than other TEs (Supplementary Fig. 6c). In contrast, RefSeq promoters and exons exhibit a much more universal activity profile (Supplementary Fig. 7a–d). 78% of promoters are in the 1_TssA state and >92% overlap a DHS and H3K27ac peak summit in at least one epigenome, while 93% of RefSeq exons are expressed RPKM >1 at least once (vs. 32% of TEs). A substantial proportion of promoters and exons (26–48%) also remain in those states in ≥90% of epigenomes. These results demonstrate that the ability of individual TEs to be regulatory elements, while greater than previously expected, is dramatically less than that of dedicated regulatory and genic elements.

To ensure that biological outliers did not skew our results, we repeated the analyses excluding five cancer cell lines and IMR90, which was identified as an epigenetic outlier in ref. ¹⁹ (Supplementary Fig. 7e, f). The results are very similar, except that the number of TEs hypo- or intermediately methylated in any epigenome decreases by 6–8% with the exclusion of IMR90, which has a much higher proportion of lowly methylated CpGs (Supplementary Fig. 1).

A small fraction of TEs is consistently annotated with the same epigenetic state in all epigenomes, and most are repressed (Fig. 2d). Of the 2% of TEs consistently in the same chromHMM state (n = 106,228), 99.9% are in the 15_Quies quiescent state, which lacks ChIP-seq signal for the five constituent histone modifications. However, there are a few examples of TEs in an active epigenetic state in all epigenomes (Supplementary Data 1; Supplementary Discussion), and even more in 90% of epigenomes (Fig. 2e). The latter are frequently enriched for SINE elements, although LTR elements are enriched among TEs overlap** DHS peaks (Fig. 2f). Many of these TEs overlap RefSeq promoters and genes, including 92% of TEs expressed RPKM >1 in all epigenomes (79% of which overlap protein-coding exons), but others are intergenic. This includes 13 TEs overlap** a DHS peak in all epigenomes that are >50 kb from the nearest RefSeq gene and do not overlap a GENCODE annotation. They are frequently in the 1_TssA promoter state, the 7_Enh enhancer state, or the 8_ZNF/Rpts state and may represent uncharacterized transcripts or enhancers.

Epigenetic state dynamics of TEs

Because most TEs exhibit restricted regulatory activity, we next examined the dynamics of their epigenetic profiles across Roadmap epigenomes. Each TE is annotated with a median of four chromHMM and two methylation states across all epigenomes, although nine TEs are annotated with all 15 chromHMM states (Fig. 3a, b). However, the set of states with which each TE is annotated varies (Fig. 3c, d). For example, TEs in the poised promoter states (10_TssBiv and 11_BivFlnk) in any epigenome (<1% of TEs) spend an average of 10–16% of epigenomes in the corresponding active promoter and promoter flanking states (1_TssA and 2_TssAFlnk) and 13–15% of epigenomes in the Polycomb repressed state (13_ReprPC), a pattern typical of a poised regulatory element becoming active in a differentiating cell lineage. In contrast, the average TE spends ≤1% of epigenomes in those states.

Interestingly, TEs ever in repressed states are less dynamic than those ever in active states (Supplementary Fig. 8a, b). However, this may be partially due to long TEs that overlap multiple chromHMM states within a single epigenome (Supplementary Discussion; Supplementary Fig. 8c, d).

Although all TEs exhibit similar chromHMM dynamics regardless of class (Fig. 3a), SINE and SVA elements have less dynamic methylation profiles (Fig. 3b; Supplementary Fig. 8e). They are less likely to be hypo- or intermediately methylated in any epigenome (Supplementary Fig. 6a), and they remain intermediately methylated in fewer epigenomes (median 3–5% of epigenomes vs. 8–14% for other classes). This may be due to the tendency of SINE and SVA elements to overlap CpG islands (1% of SINE and 35% of SVA elements vs. <0.5% for other classes), which have greater coordination of CpG methylation levels and may result in a more bimodal distribution of average methylation. Taken together, these results suggest that some TEs have biochemical activity specific to particular tissues or developmental states, and that TE phylogeny contributes to this profile.

Compared to background (ten iterations of shuffled TEs), true TEs are slightly less dynamic than expected, although the results are almost identical (Supplementary Fig. 9a, b; Supplementary Discussion).

Finally, we confirmed the concordance of the five experimental techniques used to query TE epigenetic status (Supplementary Fig. 10a–e). TEs are significantly more likely to overlap a DHS or H3K27ac peak, be in an active regulatory chromHMM state, or be hypo- or intermediately methylated when also in one of the other states (pairwise by-technique Chi-squared tests, P-value = 0). In fact, although only 2% of TEs overlap a DHS peak and 0.7% overlap an H3K27ac peak (of the TE-by-epigenome instances with data for both metrics), 0.2% are annotated with both peaks, ten times the expected number if the peaks were independent. Furthermore, the proportion of TEs overlap** both peaks increases to 3–4% when the TE is also in the 1_TssA or 7_Enh state. However, because the epigenetic marks are not completely redundant, we have included all in our analyses.

Differences in TE activity by tissue classification

In our analyses across all Roadmap epigenomes, we established that TEs contribute different proportions of each epigenetic state to the genome (Fig. 1c), and that this pattern differs by TE class (Fig. 1d). Figure 4a demonstrates that there is also variation between epigenomes in the contribution of TEs to epigenetic states. Besides CpGs missing methylation data, the 8_ZNF/Rpts state exhibits the largest variation, ranging from 21% of the state within TEs (E002, ES-WA7 Cells) to 71% (E051, Primary hematopoietic stem cells G-CSF-mobilized Male). Indeed, TEs can be either enriched or depleted in this state compared to their genomic representation (dashed line) depending on the epigenome.

**Fig. 4: Variation in TE contribution to epigenetic states by epigenome classification.**

This variation across epigenomes is driven in part by tissue. The proportion of most epigenetic states within TEs is significantly different across the group classifications assigned by the Roadmap Project (P-value < 0.05, Kruskal–Wallis test, Bonferroni correction; Supplementary Discussion). As expected, ESCs are enriched among the samples with a higher contribution of TEs to the 1_TssA state than the total across epigenomes (solid black lines; ESC mean 16%; P-value < 0.1, permutation test with 1,000 permutations, FDR (false discovery rate) correction; Supplementary Data 2)^25,26,27. Interestingly, blood groups (Blood & T-cell and HSC (hematopoietic stem cell) & B-cell) are also enriched among epigenomes with a higher contribution from TEs to the weakly transcribed (5_TxWk) and enhancer (6_EnhG and 7_Enh) states, while Brain epigenomes are depleted (mean 5_TxWk: 48–49% vs. 45%; 6_EnhG: 30–32% vs. 21%; 7_Enh: 37–38% vs. 25%). Other trends can be observed using additional Roadmap-assigned classifications (Anatomy and Type) or other epigenome metadata (e.g., donor age, germ layer of origin, and cancer cell lines; Supplementary Data 3), including that cancer cell lines are enriched among epigenomes with a higher proportion of many active and poised regulatory states in TEs (P-value < 0.05, Kruskal–Wallis test, Bonferroni correction; Supplementary Fig. 11).

The proportion of each epigenetic state within TE classes also shows tissue-specific patterns (Fig. 4b). For instance, only the LTR class shows enrichment of ESCs among epigenomes with a higher contribution of TEs to the 1_TssA, 2_TssAFlnk, 3_TxFlnk, 7_Enh, and H3K27ac states (P-value < 0.1, permutation test with 1,000 permutations, FDR correction). Similarly, only the SINE class shows enrichment of both HSC & B-cell and Blood & T-cell epigenomes among those with a higher contribution of TEs to the enhancer states (6_EnhG and 7_Enh). Thus, the patterns observed in Fig. 4a are not evenly distributed across TE classes.

Finally, we compared the average genome-wide expression level to the TE expression level, which recapitulates the results observed with epigenetic marks (Fig. 4c). The median ratio of TE-to-genome expression is 14% across all epigenomes, but the range is 2–45%, with blood, brain, and ESC epigenomes exhibiting a much higher TE expression level relative to the rest of the genome. All classes exhibit higher relative expression in blood and brain, while the ESC pattern is observed only for LTRs (Fig. 4d). Together, these results point to a potential biological role for specific TE subgroups in specific organs, which may be facilitated by tissue-specific expression of transcription factors whose binding sites are encoded by various phylogenetic lineages of TEs.

Tissue-specific subfamily enrichment in epigenetic states

We next asked whether the tissue specificity we observed at the class level was present at the finer phylogenetic resolution of the TE subfamily. The vast majority of TE subfamilies overlap all 15 chromHMM states (759 subfamilies, 78%), all four methylation states (100% of subfamilies with CpGs), and DHS and H3K27ac peak summits (965 and 963 subfamilies, respectively) in at least one Roadmap epigenome (Fig. 5a; Supplementary Fig. 12a), and most subfamilies are in most states in ≥75% of epigenomes (Supplementary Fig. 12b). These numbers generally match expectation (ten iterations of shuffled TEs; Supplementary Fig. 12a), except that 92% of shuffled subfamilies overlap all 15 chromHMM states (n = 894, standard deviation = 6). Supplementary Fig. 12b suggests that this is due to more overlap with the small active and poised regulatory states, possibly due to exclusion of true TEs from developmentally important gene promoters.

**Fig. 5: Tissue-specific enrichment of TE subfamilies in epigenetic states.**

To identify subfamilies exhibiting coordinated epigenetic profiles, we calculated the log odds ratio (LOR) enrichment of each TE subfamily in each epigenetic state in each epigenome compared to genomic background (Equation (1)). In total, there are 32,947 enrichments with LOR > 1.5 (approximately a three-fold enrichment over genomic background), 70% of which are in the 8_ZNF/Rpts and 9_Het states. Shuffling TE locations abrogates the vast majority of enrichments and enriched subfamilies (Supplementary Fig. 12c, d), confirming that many subfamily elements have coordinated epigenetic profiles.

Subfamily enrichment in epigenetic states can differentiate epigenome categories. Principal component analysis (PCA) performed on chromHMM state enrichments clearly separates Roadmap groups, including ESCs/iPSCs (induced pluripotent stem cells) and blood epigenomes along the second principal component (PC2) (Fig. 5b). Brain and digestive epigenomes and other organs form a large cluster in the center of the plot. This analysis closely recapitulates the results observed using genome-wide epigenetic profiles of the Enh and ReprPC states, which also separated these major epigenome groups¹⁹ . Interestingly, in our analysis, fetal epigenomes form a subcluster within the tissue-based clusters, suggesting that developmental stage influences the epigenetic profile of TE subfamilies. Similar results are observed for different epigenetic assays (Supplementary Fig. 13a–d) and epigenome classifications (Supplementary Fig. 13e–h), including that cancer cell lines cluster to one side of PC1. Subfamilies can also be distinguished by their epigenetic profiles (Fig. 5c; Supplementary Fig. 13i–k).

**Fig. 6: SINE and Alu methylation as a function of age.**

To identify TE subfamilies that may perform tissue-specific coordinated regulatory functions, we next identified subfamilies that are enriched LOR > 1.5 more often than expected in an epigenome category in an active state (active regulatory or transcribed chromHMM state, hypo- or intermediately methylated states, or DHS or H3K27ac peak overlap; FDR-corrected P-value < 0.05, permutation test; Fig. 5d; Supplementary Data 4; Supplementary Discussion). As in ref. ^{19a–d). This pattern is most striking for SINE elements (Fig. 6a, b). Interestingly, SINE TEs decrease not just in length but in CpG density with age, which would not be expected if CpGs were evenly distributed across the TE (Fig. 6c). Indeed, older SINE TEs are significantly less likely to overlap CpG islands (logistic regression, Jukes-Cantor evolutionary distance predictor coefficient P-value < 0.001), and the median age of SINE elements that overlap CpG islands is much lower than those that do not (0.04 vs. 0.17). Taken together, these results indicate that SINE elements with CpG islands are subject to greater levels of repressive DNA methylation, but that the CpG islands are eliminated from the TEs as they age.}

To confirm that the trends observed for all SINE elements are not the result of bimodal age distribution between the older SINE families (MIR, Deu, tRNA, and SINE, median Jukes-Cantor evolutionary distance 0.37 ± interquartile range (IQR) 0.10; Fig. 6a) and the younger Alu subfamilies (median distance 0.13 ± IQR 0.07; Fig. 6d), we repeated the analysis with only Alu elements (66% of SINE TEs), which recapitulated the results (Fig. 6e, f; Supplementary Table 3).

Evolutionary conservation of TE regulatory signatures

Finally, we investigated whether the regulatory signatures we observed in human TEs are evolutionarily conserved by comparing them to orthologous TEs in the mouse genome. Although human and mouse diverged approximately 90 million years ago³⁴, 6% of hg19 TEs have an identifiable corresponding region in the mm10 genome (n = 269,096) and overlap an mm10 TE from the same subfamily (n = 269,801), which we consider orthologous pairs (Supplementary Fig. 20a).

We profiled the methylation level and chromHMM state of the orthologous TEs in twelve samples interrogated by the mouseENCODE project^21,22,23 that anatomically matched human Roadmap epigenomes (Supplementary Table 4). Although only a small fraction of orthologous TEs is hypomethylated in either human or mouse, those hypomethylated in one species are significantly more likely to be hypomethylated in the other in the corresponding tissue than expected by random chance (P-value < 0.001, Chi-squared test for seven human-mouse epigenome pairs with WGBS data; Cramer’s V 0.10 to 0.14; Fig. 7a, b). This result holds true when all methylation states are considered (Cramer’s V 0.16 to 0.23). Thus, the DNA methylation level of orthologous TEs is more conserved than expected between human and mouse.

**Fig. 7: Conserved epigenetic profiles of orthologous TEs in human and mouse.**

The histone modification profile of orthologous TEs is also conserved across species. Although the human and mouse epigenomes were annotated with different chromHMM models, there is a non-random distribution of orthologous TEs in each human/mouse chromHMM state combination (P-value < 0.001, Chi-squared test for twelve epigenome pairs; Cramer’s V 0.22 to 0.26). Figure 7c demonstrates a clear enrichment of the promoter state (TssA) in mouse for TE orthologs annotated with the promoter state (1_TssA) in human, as well as an enrichment in multiple active regulatory mouse chromHMM states (mouse: TssA, TssAFlnk1, TssAFlnk2, Enh, EnhLo1, and EnhLo2) for orthologs in human active regulatory states (human: 1_TssA, 2_TssAFlnk, 3_TxFlnk, 6_EnhG, 7_Enh). The epigenetic profile of shared TE subfamilies (578 of 968 human subfamilies; Supplementary Fig. 20b) is also conserved across species (Supplementary Fig. 20c–e).

We next identified TEs that exhibit conserved tissue-specific epigenetic profiles (Fig. 7d). Of the TEs with tissue-specific promoter state (1_TssA) annotation in humans (n = 162), 0–29% also exhibit tissue-specific promoter state (TssA) annotation in mouse in the same tissue. When considering all active regulatory states (n = 7,442), 3–14% of tissue-specific TEs exhibit the same profile in mouse. All TEs with conserved tissue-specific promoter annotations (n = 19) are located within or near human RefSeq genes, and in many cases, their conserved epigenetic profile likely stems from overlap with the promoter region of a gene whose tissue-specific expression profile is conserved between the species. For example, two conserved MIR elements (chr12:57443327–57443493 and chr19:39302694-39302815) overlap MYO1A and LGALS4 or their promoters, which have high intestinal expression. However, 54 of the TEs with conserved active regulatory annotations are >50 kb from the nearest RefSeq gene, including 28 that are >100 kb. Four such elements are displayed in Supplementary Fig. 21, including three brain-specific enhancers and an intestine-specific element that is an enhancer in human and a promoter in mouse. These include a UCON29 element, a subfamily that has been shown to have brain-specific enhancer activity in this paper and others³⁹.

As mentioned above, Cao et al.²⁰ used a machine learning model trained on ENCODE histone modification data to demonstrate that 35% of TEs have signatures of cis-regulatory elements. Although this is moderately lower than our estimate, they reveal via saturation analysis that more TEs may perform regulatory functions in additional cell types. They also confirmed that TE enhancer activity is highly tissue-specific and can be used to distinguish cell types, and they profiled the tissue-specific enrichment of TE families as enhancer- or promoter-like elements in Roadmap tissues. Interestingly, they found that TEs with ESC/iPSC enhancer activity had highly conserved epigenetic profiles in primates, but less so in less related species, and they uncovered widespread 3D interactions between MIR and L2 elements that modulate their enhancer activity.

An additional study that identified TE-derived promoters and enhancers using FANTOM5 CAGE-seq (cap analysis gene expression) data¹⁸ determined that 45% of enhancers and 5% of promoters overlap TEs, lower than random expectation, with more tissue-specific enhancers overlap** TEs, particularly those specific to blood and testis. Older TE families had more members overlap** enhancers, as did ERVs. Additionally, they found that although enhancers are biased toward overlap with mammal-specific and ancient TEs, enhancers that overlap younger TEs are more likely to be active in only a single tissue. Villar et al.⁴⁰ also identified an enrichment of younger repeat families in recently evolved enhancers, although ancient ERV families are more likely to be exapted as recently evolved promoters.

An important caveat to our results is the exclusive use of uniquely mapped, short (36 bp) reads to generate chromHMM state assignments and peak calls. TEs present a mappability challenge due to their repetitive nature, particularly for short reads and young TE subfamilies that have not accumulated sequence changes over time, such as the SVA subfamilies, young Alu and L1 subfamilies, and some ERVs. Although we capture enrichment of very young subfamilies with low mappability in active regulatory states, it is possible that using multimapped reads assigned at the subfamily level would reveal additional TE activity and that our results represent a lower bound. In particular, many SVA and SINE elements have low mappability due to a combination of age, length, and prolific expansion (Supplementary Fig. 3d), and these classes may be disproportionately affected by mappability. However, mappability is not correlated with the number of epigenomes a TE is annotated with the 15_Quies state (Spearman correlation, rho = 0.01, P-value < 0.001), and the TEs in each chromHMM state exhibit histone modification, chromatin accessibility, and DNA methylation profiles characteristic of that epigenetic state (Supplementary Fig. 22a).

In cancer, the genome undergoes global DNA hypomethylation and exhibits dysregulated chromatin⁴¹. The enormous number of TEs with narrowly restricted regulatory profiles suggests that they are subject to tight epigenetic control and could be rapidly de-repressed during malignant transformation. In line with this hypothesis, we observe greater contribution of TEs to active regulatory states in cancer cell lines. Aberrantly activated TEs can alter the expression of nearby genes, serve as alternative promoters that form chimeric or immunorestricted transcripts, or even drive oncogene expression³⁶. However, epigenetic therapies may also potentiate cancer immunotherapy through the activation of TEs, which increase tumor immunogenicity by forming double-stranded RNA and immunogenic proteins^{42,43,44,45,46}. Resources such as this study could inform predictions of which TEs are activated in cancer and in response to epigenetic therapies, including in normal tissues to prevent potential off-target effects.

In conclusion, this study represents an important synthesis of epigenetic data in the context of TEs that could serve as a resource for investigations of this underexplored aspect of the human genome in healthy and diseased states.

Methods

Data download

Data from the Roadmap Epigenomics Project were downloaded from the data portal (http://egg2.wustl.edu/roadmap/; see Data Availability). For this study, all consolidated epigenomes were included (n = 127), although not all epigenetic marks were profiled for each epigenome. chromHMM state assignments for the 15-state model are presented with the same colors in this paper as in the original Roadmap paper (https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state), with the exception of 15_Quies (light gray instead of white). Colors for the 18-state chromHMM model are from the Roadmap data portal (https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#exp_18state). Supplementary Fig. 4 of ref. ¹⁹ provides the 18-state chromHMM model state for which each 50-state model state is most enriched, which was used to assign colors and broader state categories to each of the 50-state model states. The figure was also used to identify histone modifications that characterize each state. Epigenome Group assignments also use the same colors as in the Roadmap paper. TE class colors are those used in the RepeatBrowser track of the WashU Epigenome Browser, except for SVA (Other on the track).

mm10 chromHMM assignments and CpG methylation levels were downloaded from the ENCODE data portal (https://www.encodeproject.org/) (see Supplementary Table 4 for accessions). Human epigenomes from the Roadmap Epigenomics Project and mouse epigenomes from ENCODE were paired based on anatomy. Mouse chromHMM state assignments were generated with a different 15-state model incorporating 8 epigenetic marks (the five core histone modifications used in the hg19 state assignments plus H3K27ac, H3K9ac, and H3K4me2)^21,22,23. Although the promoter and active regulatory states are not identical between the hg19 and mm10 chromHMM state assignments, we confirmed that characteristic histone modifications are similar for each (e.g., 1_TssA and TssA, 2_TssAFlnk and TssAFlnk1/TssAFlnk2, and 7_Enh and Enh/EnhLo1/EnhLo2). Colors were assigned to mm10 chromHMM states based on similarity to hg19 chromHMM states.

hg19 chromosome sizes; RepeatMasker annotations for hg19 and mm10; hg19 RefSeq genic feature locations; hg19 CpG island locations (cpgIslandExtUnmasked.txt); the hg19 36 bp mappability track (wgEncodeCrgMapabilityAlign36mer.bw); and GENCODE v19 comprehensive genes were downloaded from the UCSC Genome Browser.

Epigenome metadata

In addition to the Group, Anatomy, and Type metadata assigned to each epigenome by the Roadmap Project, we assigned three additional categories, Cancer, Age, and Germ layer (Supplementary Data 3). Cancer: Cancer and cancer cell lines were designated “yes”; all other epigenomes designated “no”. Age: Epigenomes were split into age categories based on the age of the donor or presence of the words “fetal” or “adult” in the epigenome name. ENCODE cell lines were assigned to “Unknown” or “Cell line” based on whether they were listed as “primary cells” or “cell line” in the epigenome name or in ref. ²¹. Germ layer: Epigenomes were split into germ layer of origin, including endoderm, ectoderm, and mesoderm, “mixed” (organs with multiple germ layer origins), “pluripotent” (ESC/iPSC cells plus mesendoderm cells), and “other” (placenta/trophoblast).

Feature pre-processing

All analyses were restricted to chromosomes 1–22, X, Y, and M for hg19 and chromosomes 1–19, X, Y, and M for mm10. 31,580 TEs are on chromosome Y (0.7%) and are missing from six Roadmap epigenomes without chrY annotations.

RepeatMasker-annotated repeats were restricted to classes LTR, DNA, SINE, LINE, Unknown, Unknown?, DNA?, LINE?, SINE?, LTR?, RC, RC?, and Other. For hg19 by-class analyses, RepeatMasker class Other was renamed “SVA”, as it includes only SVA subfamilies. Classes Unknown, Unknown?, DNA?, LINE?, SINE?, LTR?, and RC were combined into a single category (“Other” class).

Promoters were generated by extending the region 2000 bp upstream of RefSeq transcription start sites to 500 bp downstream using bedtools slop⁴⁷. RefSeq promoters and exons were collapsed by coordinate and strand to identify unique features. Intergenic regions were generated by taking the complement of RefSeq genes using bedtools complement. RefSeq features were split into protein-coding and non-coding features based on accession (NM vs. NR).

The Jukes-Cantor evolutionary distance for each TE was calculated from the substitution rate compared to the RepBase consensus sequence. A mappability score for each TE was calculated in⁶ using the 36 bp mappability track from the UCSC Genome Browser, as all ChIP-seq reads were trimmed to 36 bp for use in chromHMM state assignment.

Intersection with Roadmap data

Features were intersected with epigenetic data or other features using bedtools intersect without regard to strand, except where noted (Supplementary Fig. 2a). Features were considered overlap** if they overlapped by ≥1 bp. CpGs where one base did not overlap the feature were included in the total as half a CpG. As in the Roadmap Epigenomics Project, CpGs whose read coverage was ≤3 reads were considered missing values.

The proportion of the entire genome, chromosome, TEs, TE classes, TE subfamilies, and RefSeq features in each state or overlap** another feature was calculated as the proportion of all unique bases or CpGs within the feature that were (1) annotated with the chromHMM state, (2) at that methylation level (WGBS, divided into four methylation states), (3) overlap** a DHS or H3K27ac peak (not limited to peak summits), or (4) overlap** the second feature. The exceptions are chromosome and subfamily enrichment analyses, where the number of unique DHS/H3K27ac peak summits overlap** the feature were counted (Supplementary Fig. 17c, d; Fig. 6b–d). The average expression level over the entire genome, all TEs, and each TE class was calculated as the average read coverage over the feature.

The contribution of TEs to the DHS and H3K27ac states (Figs. 1c and 4a) is roughly equivalent whether total bases overlap** peaks or number of peaks is considered. TEs overlap 27% of DHS peak summits across all epigenomes, with a range of 19% vs. 35% per epigenome, and 26% of H3K27ac peaks, with a range of 17% vs. 38% per epigenome.

For Roadmap analyses, an individual TE or promoter was considered annotated with a state if it overlapped the center of a 200 bp chromHMM state annotation or the summit of a DHS or H3K27ac peak. Additionally, TEs located between 200 bp chromHMM window centers (18% of TEs) were assigned the state with the largest overlap with the TE, such that every TE was assigned to at least one chromHMM state. The same rules were applied for the 15-state, 18-state, and 50-state chromHMM models. Because there is variation in the number and average width of chromHMM blocks and DHS/H3K27ac peaks by epigenome, a TE is more likely to be annotated with a state in some epigenomes because there are more peak summits available to overlap (Supplementary Fig. 22b; Supplementary Discussion). A TE or promoter was considered in a methylation state if the mean methylation level (calculated over all CpGs overlap** the TE/promoter with >3 reads) fell within the range of that state. Features not overlap** CpGs were not considered for methylation analyses. The expression level of individual TEs and exons was calculated as the average read coverage over the feature, normalized by the normalization factors provided by Roadmap. RNA-seq coverage on chromosome Y in epigenomes lacking that chromosome (n = 6) was not included in the analyses.

Two alternative methods of annotating individual TEs with a chromHMM state were explored: (1) counting only TEs overlap** the center of a chromHMM annotation block and (2) counting only TEs overlap** the center of a 200 bp chromHMM annotation window (Supplementary Fig. 4a, b). The number of TEs annotated with the state per epigenome and in any Roadmap epigenome were calculated in a manner identical to that for the standard annotation rules.

Across all instances of an individual TE overlap** a DHS peak summit in an epigenome, 93% overlap only one peak, but they can overlap up to 17 DHS peaks in a single epigenome. Similarly, 97% of TEs overlap** H3K27ac peak summits overlap only one peak, but they can overlap up to 14 H3K27ac peaks in a single epigenome.

The chromHMM state of mm10 TEs was any state the TE overlapped by 1 bp or more. Methylation states were calculated as for hg19 TEs.

Shuffled TEs

The genomic positions of all TEs were shuffled using bedtools shuffle with default arguments, excluding chrM, contigs, and genome gaps. Ten iterations of shuffled TEs were generated. Individual shuffled TEs were annotated with epigenetic states using the same rules as for real TEs. Epigenetic dynamics, including the total number of states per TE across all epigenomes and aggregate profiles of epigenetic states across epigenomes (Supplementary Fig. 9), and subfamily potential and enrichment (Supplementary Fig. 12) were calculated in an identical manner as for real TEs. For subfamily enrichment in methylation states, the number of CpGs per subfamily was recalculated using the shuffled subfamily.

Permutation tests

To identify epigenome categories (i.e., Group: ESC) with enriched representation among the epigenomes with a higher or lower proportion of the state within TEs or a TE class than the cross-epigenome total, we permuted the category labels 1,000 times and compared the true representation to this distribution. Tests were performed simultaneously across all categories (e.g., Group, Germ layer; Supplementary Data 2), and epigenomes with proportions higher and lower than the cross-epigenome total were tested separately. FDR correction was performed only on epigenome categories with any epigenome higher or lower than the cross-epigenome total.

Subfamily enrichment

The enrichment of a TE subfamily i in a state j in an epigenome k was calculated as the log odds ratio (LOR) as in ref. ⁶, as follows:

$${\mathrm{log}}_2\left( {1{\mathrm{e}}^{ - 20} + \frac{{\frac{{\left( {ijk,{\mathrm{bp}},{\mathrm{peak}}\,{\mathrm{summits}},\,{\mathrm{or}}\,{\mathrm{CpGs}}\,{\mathrm{in}}\,{\mathrm{subfamily}}\,{\mathrm{in}}\,{\mathrm{state}}\,{\mathrm{in}}\,{\mathrm{epigenome}}} \right)}}{{\left( {ik,{\mathrm{bp}}\,{\mathrm{or}}\,{\mathrm{CpGs}}\,{\mathrm{in}}\,{\mathrm{subfamily}}\,{\mathrm{in}}\,{\mathrm{epigenome}}} \right)}}}}{{\frac{{\left( {jk,{\mathrm{bp}},\,{\mathrm{peak}}\,{\mathrm{summits}},\,{\mathrm{or}}\,{\mathrm{CpGs}}\,{\mathrm{in}}\,{\mathrm{state}}\,{\mathrm{in}}\,{\mathrm{epigenome}}} \right)}}{{\left( {k,{\mathrm{bp}}\,{\mathrm{or}}\,{\mathrm{CpGs}}\,{\mathrm{in}}\,{\mathrm{epigenome}}} \right)}}}}} \right)$$

(1)

A pseudocount of 1e⁻²⁰ was included to avoid –Inf values. For chromHMM states, ijk and jk were calculated as the total bp in the state overlap** the subfamily or in the entire genome; for WGBS, total CpGs; and for DHS and H3K27ac, total peak summits. ik and k refer to the lengths of the subfamily and genome. The LOR enrichment of each subfamily in the 50-state chromHMM model states was calculated using the same rules and thresholds as for the 15-state model.

For subfamily potential analyses, a subfamily was considered annotated with a state if ijk was greater than zero (Fig. 5a).

PCA was performed using the prcomp() function on variables with variance >0 between the epigenomes/subfamilies. Only subfamilies with >30 members (>30 members with CpGs for WGBS) were included. Matrices were scaled and centered before PCA was performed.

Only subfamilies with >30 total members and >10 members in the state in the epigenome were considered enriched (LOR > 1.5). With these thresholds, 31 subfamilies were excluded from chromHMM/DHS/H3K27ac enrichment analysis (plus 2 additional subfamilies in epigenomes where chrY was absent). For WGBS analyses, only members overlap** CpGs were considered, which excluded 23 additional subfamilies. Three TE subfamilies (Charlie1b_Mars, CheshMITE, and HAL1N1_MD) overlap no CpG. 34,892 enrichments were excluded by the second threshold, 10,757 of which were 8_ZNF/Rpts.

To identify subfamilies preferentially enriched in an epigenetic state in an epigenome category (e.g., Group: ESC), we performed permutation tests as above (Supplementary Data 4), testing whether epigenome categories were unusually represented among all enriched epigenomes for that state/subfamily combination. FDR multiple hypothesis correction was performed only on subfamily-by-state-by-category combinations with at least one epigenome with enrichment LOR > 1.5 and only on active states, and only categories with a corrected P-value < 0.05 were considered.

For candidate subfamilies, GREAT analysis was performed on all TEs in the state in epigenomes where the subfamily was enriched in the state. The default basal plus extension model (5 kb upstream, 1 kb downstream, 1,000 kb distal) was used, and only GO Biological Processes were considered. Terms were considered significant if the FDR-corrected P-value < 0.05 for both binomial and hypergeometric tests with a minimum region-based fold enrichment of 2. The default genomic background was used.

HOMER (findMotifsGenome.pl) was performed on candidate subfamilies to identify enriched known transcription factor motifs. TEs in the state in epigenomes where the subfamily was enriched in the state were compared to TEs from that subfamily that are never in the state in any Roadmap epigenome as background, with flags -size given –nomotif.

Epigenetic profile modeling

To test the relative contribution of class and CpG density to the mechanism of TE repression, quasi-Poisson generalized linear regression models were constructed for the number of epigenomes a TE is in the 9_Het or hypermethylated states (function glm(), quasipoisson family; Supplementary Table 2). Class (LTR vs. SINE) and CpG density (CpGs per kbp) were included as predictors. ANOVA (analysis of variance) with F tests for significance was performed to test differences in performance between models.

Generalized additive models of TE length, mappability, CpG density, and the number of epigenomes a SINE or Alu element is in each methylation state were constructed using Jukes-Cantor evolutionary distance as the predictor (function gam(); Supplementary Table 3). Logistic regression models of the likelihood a SINE or Alu element overlaps a CpG island were constructed using Jukes-Cantor evolutionary distance (JC) as the predictor (function glm(), binomial family). All SINE elements with CpGs (n = 1,430,171) or Alu elements with CpGs (n = 1,105,597) were included in the model.

Orthologous TEs

The mm10 genome has 3,663,513 TEs and 1,134 TE subfamilies. Orthologous TEs were identified by converting hg19 TE coordinates to mm10 using liftOver (−minMatch = 0.1), then identifying overlap** hg19 -to-mm10 and mm10 TEs with identical subfamily names using bedtools intersect. 29% of hg19 TEs (n = 1,265,775) lift over to mm10. See the Supplementary Discussion for more detail on overlap between mm10 and hg19 TEs.

Chi-squared test effect size was calculated using Cramer’s V (from the R package rcompanion).

Epigenetic profile of TEs in chromHMM states

For each chromHMM state, for all instances of a TE in that state in an epigenome, the average histone modification ChIP-seq/DHS signal fold enrichment ratio over input for several epigenetic marks was calculated in 50 bp bins over a 10 kb region centered on the TE (excluding 26 TEs that would extend beyond the end of the chromosome). The epigenetic marks include the five core histone modifications used to generate chromHMM (H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K36me3), H3K9ac, H3K27ac, and DHS. The average methylation over the same window was calculated for epigenomes with WGBS data.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding authors upon reasonable request. All datasets used in this manuscript were made publicly available as part of previous publications. Data from the Roadmap Epigenomics Project were downloaded from the data portal, including: epigenome metadata [http://egg2.wustl.edu/roadmap/web_portal/meta.html]; chromHMM state assignments using the 15-state model, 127 epigenomes [http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/all.mnemonics.bedFiles.tgz]; chromHMM state assignments using the 18-state model, 98 epigenomes [http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/core_K27ac/jointModel/final/all.mnemonics.bedFiles.tgz]; chromHMM state assignments using separate 50-state models, 7 epigenomes [http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/class1Models_50states/] (file format [EID]/[EID]_50_segments.bed.gz); WGBS fractional methylation, 37 epigenomes [http://egg2.wustl.edu/roadmap/data/byDataType/dnamethylation/WGBS/FractionalMethylation.tar.gz]; DHS narrow peaks, 53 epigenomes [http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/narrowPeak] (file format [EID]-DNase.macs2.narrowPeak.gz); H3K27ac narrow peaks, 98 epigenomes [http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/narrowPeak/] (file format [EID]-H3K27ac.narrowPeak.gz); strand-agnostic, unnormalized mRNA signal coverage [http://egg2.wustl.edu/roadmap/data/byDataType/rna/signal/unnormalized_wig/strandagnostic/] (file format [EID].wig.gz) and normalization factors [http://egg2.wustl.edu/roadmap/data/byDataType/rna/signal/unnormalized_wig/all.EGID.N.readlength], 56 epigenomes; and histone modification ChIP-seq and DHS signal fold enrichment ratios over input [http://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/foldChange/] (file format [EID]-[mark].fc.signal.bigwig) for H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K36me3 (127 epigenomes each), H3K9ac (62 epigenomes), H3K27ac (98 epigenomes), and DHS (53 epigenomes). mm10 chromHMM assignments and WGBS CpG methylation levels were downloaded from the ENCODE data portal [https://www.encodeproject.org/]; see Supplementary Table 4 for unique accessions. The source data underlying all figures is provided in the source data file (SourceData.tar.gz), except for Supplementary Fig. 21. A reporting summary for this Article is available as a Supplementary Information file.

Code availability

Software used in this study includes R version 3.3.0 and R packages plyr (1.8.4), reshape2 (1.4.3), ggplot2 (3.1.0), gridExtra (2.2.1), RColorBrewer (1.1.2), scales (1.0.0), mgcv (1.8.24), extraFont (0.17), and rcompanion (2.0.10); python versions 2.7, 3.6.5, and 3.7; bedtools (2.27.1); bedops; kentsrc; kentUCSC; htslib; numpy; pandas; HOMER (4.9); and GREAT (v3). Custom code used in this study is available at https://github.com/epehrsson/TE_landscape.

References

**e, M. et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat. Genet. 45, 836–841 (2013).
Article CAS PubMed PubMed Central Google Scholar
Friedli, M. & Trono, D. The developmental control of transposable elements and the evolution of higher species. Annu. Rev. Cell. Dev. Biol. 31, 429–451 (2015).
Article CAS PubMed Google Scholar
Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl. Acad. Sci. USA 104, 18613–18618 (2007).
Article ADS CAS PubMed PubMed Central Google Scholar
Bourque, G. et al. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 18, 1752–1762 (2008).
Article CAS PubMed PubMed Central Google Scholar
Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
Article CAS PubMed Google Scholar
Sundaram, V. et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).
Article CAS PubMed PubMed Central Google Scholar
Sundaram, V. et al. Functional cis-regulatory modules encoded by mouse-specific endogenous retrovirus. Nat. Commun. 8, 14550 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Britten, R. J. & Davidson, E. H. Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Q. Rev. Biol. 46, 111–138 (1971).
Article CAS PubMed Google Scholar
Lynch, V. J., Leclerc, R. D., May, G. & Wagner, G. P. Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat. Genet. 43, 1154–1159 (2011).
Article CAS PubMed Google Scholar
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Schmidt, D. et al. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335–348 (2012).
Article CAS PubMed PubMed Central Google Scholar
Wang, J. et al. MIR retrotransposon sequences provide insulators to the human genome. Proc. Natl. Acad. Sci. USA 112, E4428–E4437 (2015).
Article CAS PubMed PubMed Central Google Scholar
Lin, L. et al. The contribution of Alu exons to the human proteome. Genome Biol. 17, 15 (2016).
Article PubMed PubMed Central CAS Google Scholar
St Laurent, G. et al. VlincRNAs controlled by retroviral elements are a hallmark of pluripotency and cancer. Genome Biol. 14, R73 (2013).
Article CAS Google Scholar
Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Faulkner, G. J. et al. The regulated retrotransposon transcriptome of mammalian cells. Nat. Genet. 41, 563–571 (2009).
Article CAS PubMed Google Scholar
Jacques, P.-E., Jeyakani, J. & Bourque, G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9, e1003504 (2013).
Article CAS PubMed PubMed Central Google Scholar
Simonti, C. N., Pavlicev, M. & Capra, J. A. Transposable element exaptation into regulatory regions is rare, influenced by evolutionary age, and subject to pleiotropic constraints. Mol. Biol. Evol. 34, 2856–2869 (2017).
Article CAS PubMed PubMed Central Google Scholar
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Article CAS Google Scholar
Cao, Y. et al. Widespread roles of enhancer-like transposable elements in cell identity and long-range genomic interactions. Genome Res. 29, 40–52 (2019).
Article CAS PubMed PubMed Central Google Scholar
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Article ADS PubMed Central CAS Google Scholar
Sloan, C. A. et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 44, D726–D732 (2016).
Article CAS PubMed Google Scholar
Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364 (2014).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Article PubMed PubMed Central CAS Google Scholar
Wang, J. et al. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 516, 405–409 (2014).
Article ADS CAS PubMed Google Scholar
Lu, X. et al. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat. Struct. Mol. Biol. 21, 423–425 (2014).
Article ADS CAS PubMed Google Scholar
Göke, J. et al. Dynamic transcription of distinct classes of endogenous retroviral elements marks specific populations of early human embryonic cells. Cell Stem Cell 16, 135–141 (2015).
Article PubMed CAS Google Scholar
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Article CAS PubMed PubMed Central Google Scholar
Sasaki, T. et al. Possible involvement of SINEs in mammalian-specific brain formation. Proc. Natl. Acad. Sci. USA 105, 4220–4225 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Bejerano, G. et al. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441, 87–90 (2006).
Article ADS CAS PubMed Google Scholar
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Article CAS PubMed PubMed Central Google Scholar
Imbeault, M., Helleboid, P. Y. & Trono, D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543, 550–554 (2017).
Article ADS CAS PubMed Google Scholar
Subramanian, R. P., Wildschutte, J. H., Russo, C. & Coffin, J. M. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology 8, 90 (2011).
Article CAS PubMed PubMed Central Google Scholar
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
Article CAS PubMed Google Scholar
Britten, R. J. & Davidson, E. H. Gene regulation for higher cells: a theory. Science 165, 349–357 (1969).
Article ADS CAS PubMed Google Scholar
Babaian, A. & Mager, D. L. Endogenous retroviral promoter exaptation in human cancer. Mob. DNA 7, 24 (2016).
Article PubMed PubMed Central CAS Google Scholar
Ohtani, H., Liu, M., Zhou, W., Liang, G. & Jones, P. A. Switching roles for DNA and histone methylation depend on evolutionary ages of human endogenous retroviruses. Genome Res. 28, 1147–1157 (2018).
Article CAS PubMed PubMed Central Google Scholar
Muotri, A. R. et al. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature 435, 903–910 (2005).
Article ADS CAS PubMed Google Scholar
Choudhary, M. N. et al. Co-opted transposons help perpetuate conserved higher-order chromosomal structures. Preprint at https://www.biorxiv.org/content/10.1101/485342v2 (2018).
Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015).
Article CAS PubMed PubMed Central Google Scholar
Baylin, S. B. & Jones, P. A. Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 8, a019505 (2016).
Article PubMed PubMed Central CAS Google Scholar
Azad, N., Zahnow, C. A., Rudin, C. M. & Baylin, S. B. The future of epigenetic therapy in solid tumours–lessons from the past. Nat. Rev. Clin. Oncol. 10, 256–266 (2013).
Article CAS PubMed PubMed Central Google Scholar
Chiappinelli, K. B. et al. Inhibiting DNA methylation causes an interferon response in cancer via dsRNA including endogenous retroviruses. Cell 162, 974–986 (2015).
Article CAS PubMed PubMed Central Google Scholar
Roulois, D. et al. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts. Cell 162, 961–973 (2015).
Article CAS PubMed PubMed Central Google Scholar
Brocks, D. et al. DNMT and HDAC inhibitors induce cryptic transcription start sites encoded in long terminal repeats. Nat. Genet. 49, 1052–1060 (2017).
Article CAS PubMed PubMed Central Google Scholar
Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).
Article ADS CAS PubMed Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank the Roadmap Epigenomics Project and ENCODE Consortium for generating and processing the data. In particular, we thank Zhi** Weng and Arjan van der Velde for their assistance with the mm10 chromHMM annotations and the mouseENCODE consortium for generating the raw data. We also thank **aoyu Zhuo for his advice on TE evolution. E.C.P. is supported by a Postdoctoral Fellowship, PF-17-201-01, from the American Cancer Society. E.C.P., M.N.K.C., and T.W. are also supported by NIH grants R01HG007354, R01HG007175, R01ES024992, U01CA200060, U24ES026699, U01HG009391, and American Cancer Society RSG-14-049-01-DMC. V.S. was partially supported by the Cancer Biology Pathway, Washington University, and is currently supported by a fellowship from the EMBL Interdisciplinary Postdoc (EIPOD) programme under Marie Skłodowska-Curie actions COFUND (grant number 664726).

Author information

Authors and Affiliations

Department of Genetics, Washington University School of Medicine, 4515 McKinley Avenue, St. Louis, MO, 63110, USA
Erica C. Pehrsson, Mayank N. K. Choudhary, Vasavi Sundaram & Ting Wang
The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
Erica C. Pehrsson, Mayank N. K. Choudhary, Vasavi Sundaram & Ting Wang
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Vasavi Sundaram
McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, 63108, USA
Ting Wang

Authors

Erica C. Pehrsson
View author publications
You can also search for this author in PubMed Google Scholar
Mayank N. K. Choudhary
View author publications
You can also search for this author in PubMed Google Scholar
Vasavi Sundaram
View author publications
You can also search for this author in PubMed Google Scholar
Ting Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

E.C.P., M.N.K.C., V.S. and T.W. conceived the study; E.C.P. performed the analysis and wrote the paper; M.N.K.C. and V.S. contributed to data analysis and interpretation; M.N.K.C., V.S. and T.W. edited the paper; and T.W. supervised the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Erica C. Pehrsson or Ting Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Pehrsson, E.C., Choudhary, M.N.K., Sundaram, V. et al. The epigenomic landscape of transposable elements across normal human development and anatomy. Nat Commun 10, 5640 (2019). https://doi.org/10.1038/s41467-019-13555-x

Download citation

Received: 15 January 2019
Accepted: 12 November 2019
Published: 10 December 2019
DOI: https://doi.org/10.1038/s41467-019-13555-x
Springer Nature Limited

This article is cited by

Statistical learning quantifies transposable element-mediated cis-regulation
- Cyril Pulver
- Delphine Grun
- Didier Trono
Genome Biology (2023)
Transposable elements as essential elements in the control of gene expression
- Alemu Gebrie
Mobile DNA (2023)
H4K16ac activates the transcription of transposable elements and contributes to their cis-regulatory function
- Debosree Pal
- Manthan Patel
- Madapura M. Pradeepa
Nature Structural & Molecular Biology (2023)
The role of transposable elements in aging and cancer
- Pouria Mosaddeghi
- Mitra Farahmandnejad
- Mohammad M. Zarshenas
Biogerontology (2023)
Isoform-resolved transcriptome of the human preimplantation embryo
- Denis Torre
- Nancy J. Francoeur
- Robert Sebra
Nature Communications (2023)

The epigenomic landscape of transposable elements across normal human development and anatomy

Abstract

Similar content being viewed by others

Introduction

Potential for individual TEs to be epigenetically active

Epigenetic state dynamics of TEs

Differences in TE activity by tissue classification

Tissue-specific subfamily enrichment in epigenetic states

Evolutionary conservation of TE regulatory signatures

Methods

Data download

Epigenome metadata

Feature pre-processing

Intersection with Roadmap data

Shuffled TEs

Permutation tests

Subfamily enrichment

Epigenetic profile modeling

Orthologous TEs

Epigenetic profile of TEs in chromHMM states

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Navigation