Background

Gene dosage of X-chromosomal genes in mammals is equalized between sexes by inactivation of one of the two X chromosomes in female cells [1]. During early embryonic development of mice, two waves of X chromosome inactivation (XCI) occur. At the two- to four-cell embryonic stage [embryonic day (E)1.5] the paternally derived X chromosome is inactivated, a process referred to as imprinted XCI. At the early blastocyst stage (E4.5) the X chromosome is reactivated, after which random XCI takes place: during a stochastic process either the maternally or paternally derived X chromosome is silenced (see Heard and Disteche [2], Barakat and Gribnau [3] and Jeon et al. [17, 18]. Although the earliest regions containing enriched occupancies of Xist are spread across the entire linear X chromosome, these regions have a high frequency of close contact to the XIC. The early-enriched Xist localization sites are gene dense and enriched for silent genes [17, 18]. From these early ‘docking stations’, a second wave of Xist spreading occurs by pulling the actively transcribed genes as well as the gene-poor regions in closer proximity to the XIC. Xist recruits the Polycomb repressive complex 2 (PRC2) and other proteins involved in gene silencing and chromatin compaction, creating a repressive nuclear compartment present in differentiated cells displaying stable XCI [18–20]. In line with these observations, Xist binding is proportional to the increase of PRC2 and the repressive trimethylation of lysine 27 on histone 3 (H3K27me3) on the Xi [18, 21]. Similar to Xist, the Polycomb proteins and H3K27me3 are first detected at ~150 canonical sites distributed over the Xi, after which spreading over active genes occurs [21, 22].

Despite recent advances in understanding chromatin-associated changes of the Xi during XCI, little is known about how this affects silencing of genes located on the Xi at the transcript level. Lin et al. [23] investigated gene silencing during XCI by a comparative approach in which differentiating female and male ESCs were profiled in parallel. The female-specific changes were considered to be associated with XCI. However, female and male ESCs maintained in serum-containing media are distinct in their epigenetic make-up, with female ESCs being hypomethylated and male ESCs being hypermethylated [24–26]. Also, differences in activity of the MAPK, Gsk3 and Akt signaling pathways have been reported [27], complicating direct comparisons between ESCs of different sexes.

After establishment of XCI, silencing of the Xi is stably maintained in somatic cells during replication [28]. Although most genes are silent on the Xi at this stage, some genes escape XCI and remain active. In human, at least 15 % of the X-linked genes have been shown to escape XCI [29]. These escape genes are distributed in clusters over the X chromosome [29–31]. This suggests a common regulatory mechanism acting on chromatin domains, the nature of which remains elusive thus far. In mouse, around 15 escape genes have been identified [32–37]. Except for Xist, these genes are generally expressed at lower levels from the Xi than from the Xa. It has been shown that the escape of Kdm5c in mouse adult tissues is preceded by silencing during early embryonic development [38]. However, for most other escape genes it is currently unclear whether they are initially silenced and reactivated or whether they are never subject to XCI.

Here, we set out to study the dynamics of X-linked gene silencing during the early stages of XCI by differentiation of female ESCs to embryoid bodies (EBs). To avoid comparative analysis between sexes and enable direct quantitative profiling of gene silencing on the Xi, we used female mouse ESCs with non-random XCI and polymorphic X chromosomes [39] to specifically determine the changes occurring on the (future) Xi by high-resolution allele-specific RNA-seq. To investigate later stages, these ESCs were differentiated in vitro to neural progenitor cells (NPCs) [35]. We used allele-specific RNA-seq on the NPCs, in which XCI is fully established and maintained, to correlate the silencing dynamics of genes observed during early XCI with escape from XCI in the NPCs. By associating the genes that escape XCI with topologically associating domains (TADs) as determined in the female ESCs by genome-wide chromosome conformation capture (Hi-C) profiling, we investigate the role of chromatin domains during XCI. By determining the kinetics of gene silencing and correlating this to epigenomic features, our data provide further insight into the formation of the repressive complex during XCI.

Results

Experimental setup to study gene silencing on the Xi using allele-specific RNA-seq

To determine the dynamics of gene silencing during XCI, we used female ESCs derived from an intercross of Mus musculus (M.m.) musculus 129/SV-Jae (129) and M.m. castaneus (Cast) as previously described [39, 40]. Due to the cross of genetically distant mouse strains, this ESC line contains two sets of chromosomes with many polymorphic sites, around 20.8 million genome-wide (~1 single-nucleotide polymorphism (SNP) per 130 bp) and around 0.6 million on chromosome X (~1 SNP per 300 bp; see “Materials and methods”). These sites can be used to perform allele-specific quantification of X-linked and autosomal transcripts by RNA-seq [40]. The introduction of a transcriptional stop signal into the transcribed region of Tsix on the 129-derived X chromosome in the female ESC line results in complete skewing of Xist expression toward the 129-targeted allele [39]. Therefore, the 129-derived X chromosome will always be inactivated during differentiation, allowing specific quantification of transcripts from the Xi and the Xa, respectively (Fig. 1, “ES_Tsix-stop”, pink background). In undifferentiated female ESCs cultured in serum-containing ESC media, inhibition or blocking of Tsix transcription has been shown to be associated with aberrant Xist upregulation and/or partial XCI [6, 23, 41]. Interestingly, we observed a fourfold reduction in Xist expression and increased expression of X-linked genes during culturing of the ES_Tsix-stop ESCs in serum-free ESC culture media supplemented with two kinase inhibitors to maintain pluripotency (“2i” ESCs) [24, 27, 42–45] compared with culturing in serum-containing media (“serum” ESCs; Additional file 1: Figure S1). Therefore, we used 2i ES_Tsix-stop ESCs to initiate XCI by differentiation towards EBs and performed allele-specific RNA-seq of the undifferentiated 2i ESCs as well as after 2, 3, 4 and 8 days of EB formation. Validation of the EB time course is documented in Additional file 1: Figures S2 and S3.

Fig. 1

Overview of the setup to study dynamics of gene silencing on the Xi during XCI. Female ES_Tsix-stop ESCs [39] display non-random XCI due to a transcriptional stop in the coding region of Tsix, allowing allele-specific quantification of transcripts originating from the (future) Xi by RNA-seq (pink background). To investigate stable XCI from the same female ES_Tsix-stop ESCs, we performed RNA-seq on a clonal NPC line derived from the ES_Tsix-stop ESCs (*NPC_129-Xi, red background) [35]. Also, we included RNA-seq on two NPC lines generated from the F1 hybrid ESCs before introduction of the transcriptional Tsix stop. Clonal lines were generated from these two NPC lines to ensure complete XCI skewing towards inactivation of the M.m. castaneus (Cast)- or the M.m. musculus (129)-derived X chromosome (NPC_Cast-Xi, orange background and NPC_129-Xi, dark purple background, respectively) [35]

To investigate stable XCI, we included allele-specific RNA-seq of three NPC lines that were previously generated in vitro from the polymorphic ESCs [22] by applying a new procedure based on the GSNAP (Genomic Short-read Nucleotide Alignment Program) algorithm [46], in which the alternative alleles of polymorphic sites are included in the reference genome during mapping. This results in an unbiased mapping of the 129- and Cast-derived sequence tags and an equal contribution in expression from the Cast- and 129-derived genomes in undifferentiated ESCs (Additional file 1: Figure S4a). To enable reliable allele-specific quantification of the RNA-seq, we only included genes for further analysis that (i) showed consistent Cast versus 129 allelic ratios over the polymorphic sites that are present within the gene body (standard error of the mean < 0.1); and (ii) contained a total of at least 80 tag counts over polymorphic sites for each allele over the EB formation time course (equivalent to a standard deviation of the allelic ratio of a gene of < 15 % over the time course; see Additional file 1: Figure S4b and “Materials and methods” for further details). Together, our stringent criteria resulted in accurate quantification of allele-specific expression as exemplified in Additional file 1: Figure S4c, d. In total, we obtained allele-specific quantification for 9666 out of a total of 13,909 unique RefSeq genes showing a mean expression of >0.5 RPKM (Reads Per Kilobase of exon per Million mapped reads) over the time course of EB formation (69 %). These include 259 genes on the X chromosome (out of a total of 590 genes with expression > 0.5 RPKM (49 %)). Further details on the samples profiled for this study are provided in Additional file 2: Table S1. Additional file 3: Table S2 contains the gene expression values and allelic counts for all RNA-seq samples.
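The two inclusion criteria can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the function names, the definition of the per-SNP allelic ratio as 129 / (129 + Cast), and the handling of genes with a single polymorphic site are assumptions made for the example.

```python
from math import sqrt
from statistics import stdev


def allelic_ratio(count_129, count_cast):
    """Fraction of tags from the 129 allele at one SNP (assumed definition)."""
    return count_129 / (count_129 + count_cast)


def passes_filters(snp_counts, max_sem=0.1, min_tags_per_allele=80):
    """Apply criteria (i) and (ii) to one gene.

    snp_counts: list of (count_129, count_cast) tuples, one per polymorphic
    site in the gene body, summed over the EB time course.
    """
    total_129 = sum(c129 for c129, _ in snp_counts)
    total_cast = sum(cast for _, cast in snp_counts)
    # Criterion (ii): at least 80 tags over polymorphic sites for each allele.
    if total_129 < min_tags_per_allele or total_cast < min_tags_per_allele:
        return False
    ratios = [allelic_ratio(a, b) for a, b in snp_counts if a + b > 0]
    if len(ratios) < 2:
        # SEM is undefined for a single site; accepting the gene here is a
        # hypothetical choice made for this sketch.
        return True
    # Criterion (i): standard error of the mean of the per-SNP ratios < 0.1.
    sem = stdev(ratios) / sqrt(len(ratios))
    return sem < max_sem
```

For example, a gene with counts `[(50, 50), (48, 52), (55, 45)]` passes both criteria, whereas `[(90, 10), (10, 90), (50, 50)]` has sufficient coverage but fails the consistency criterion.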

XCI during EB formation of female ESCs and in NPCs

To evaluate the XCI occurring during the EB differentiation of the female 2i ESCs, we examined expression within the XIC. RNA-seq shows Tsix expression in the undifferentiated ESCs (ES_Tsix-stop T = 0 days), while Xist is highly upregulated after two days of differentiation, specifically from the 129 allele (Fig. 2a, b). In line with this, Xist clouds are robustly detected in more than half of the cells after two days of EB formation by RNA fluorescent in situ hybridization (FISH), and in 94 % of the cells after 8 days (Fig. 2a, right column). Activation of Xist coincides with a global reduction in expression of X-linked genes of ~30 % after two days of EB formation (Fig. 2c). As the reduction of X-linked expression was not observed during EB differentiation of male cells, nor for autosomal genes, we conclude that this reflects the XCI occurring in the female cells. Within the NPCs, Xist is highly expressed. As expected, Xist is exclusively expressed from the 129 allele in *NPC_129-Xi and NPC_129-Xi, while in NPC_Cast-Xi Xist is expressed from the Cast allele (Fig. 2b). Together, the data show that XCI is robustly initiated on the 129 allele during the EB differentiation time course of ES_Tsix-stop, and stably present in the NPCs.

Fig. 2

X-linked gene expression during differentiation of ES_Tsix-stop ESCs towards EBs and in NPCs. a Tsix/Xist expression dynamics during XCI in ES_Tsix-stop ESCs by EB differentiation, as well as in NPCs. Genome browser view of the Tsix/Xist locus, and the percentage of cells positive for Xist clouds as determined by RNA-FISH. b Total Xist expression levels in RPKM (corresponding to (a); in black), as well as the contribution from the 129-derived (green) or Cast-derived (blue) alleles. c Distribution of gene expression in male (E14; blue) and female (ES_Tsix-stop; pink) ESCs during EB formation. All genes with an expression level of RPKM >0.5 in at least one condition are included (542 and 13,819 genes on the X chromosome and autosomes, respectively)

Kinetics of gene silencing during XCI on the Xi

To investigate the transcriptional changes occurring on the Xi and Xa specifically, we calculated the ratio of 129/Cast over the time course (Fig. 3a). At a global level, the allelic ratios for autosomal genes remain stable. In contrast, genes on chromosome X show an increasing bias towards expression from the Cast allele, the X chromosome that remains active. After 8 days, gene expression is, on average, approximately fourfold higher from the Xa than from the Xi. Absolute quantification of gene expression shows that expression from the 129 and Cast alleles remains similar on autosomes (Fig. 3b, left panel). For X-linked genes, expression from the 129 allele (Xi) is gradually downregulated, while expression of the Cast allele (Xa) shows a relatively minor but significant (p < 0.05 [47]) increase in expression (Fig. 3b, right panel). The increase in activity is not specific for female cells but rather associated with differentiation, as male ESCs also show a similar trend (albeit not significant) of increased X-linked expression during EB formation (Fig. 2c, blue boxplots). Notably, by comparison of the individual time points in female cells we observed a slight but significant difference (p < 0.05 [47]) in XCI dynamics between lowly (RPKM ≤2) and highly (RPKM >2) expressed genes, as the lowly expressed genes show faster XCI dynamics than the highly expressed genes (Fig. 3c; Additional file 1: Figure S5).
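As a small worked illustration of the allelic ratio used here, the following Python sketch converts allele-specific expression values into a log2 129/Cast ratio and back into a fold bias. The function names are hypothetical, and no pseudocount is applied, so both inputs are assumed to be positive.

```python
from math import log2


def log2_allelic_ratio(expr_129, expr_cast):
    """log2 of 129 over Cast expression; 0 means equal biallelic expression.

    Assumes both values are > 0 (no pseudocount is added in this sketch).
    """
    return log2(expr_129 / expr_cast)


def fold_bias_toward_cast(log2_ratio):
    """Fold-higher expression from the Cast allele for a given log2 ratio."""
    return 2 ** (-log2_ratio)
```

With these definitions, a gene expressed fourfold higher from the Cast allele (Xa) than from the 129 allele (Xi) has a log2 ratio of −2, and a biallelically expressed gene has a ratio of 0.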

Fig. 3

Dynamics of gene silencing on the Xi during XCI using allele-specific RNA-seq. a Distribution of relative expression of genes from the 129 versus the Cast allele during EB formation of ES_Tsix-stop. A log2 ratio of 0 represents equal biallelic gene expression from the 129 and Cast alleles, while positive and negative ratios represent higher expression from the 129 or Cast allele, respectively. b Distribution of absolute gene expression from the 129 and Cast alleles (absolute allelic expression values in RPKM; see “Materials and methods” for further details) in the ES_Tsix-stop ESCs during EB formation. c Median of the relative expression of genes from the 129 versus the Cast allele during EB formation of ES_Tsix-stop for highly and lowly expressed genes on chromosome X (same as the medians shown for the boxplots for chromosome X in Additional file 1: Figure S5b). For highly expressed genes we included genes showing a mean RPKM >2 over the time course (338 genes), while lowly expressed genes showed a mean RPKM ≤2 over the time course (81 genes). See Additional file 1: Figure S5 for further details

To further stratify genes showing similar XCI dynamics, we performed K-means clustering on the Xi/Xa ratio over the time course (Fig. 4a). The clustering revealed four clusters containing genes that show similar dynamics. The genes in cluster 1 are mainly silenced on the Xi within 2 days of EB formation, and therefore these genes are inactivated relatively fast (labeled as “early”). The genes in cluster 2 (labeled as “intermediate”) mainly show silencing between 4 and 8 days of EB formation. Genes in cluster 3 show some initial silencing of the Xi over the time course, and only show a mild bias for higher expression from the Xa at the latest time point of 8 days EB formation. However, most of the cluster 3 genes are fully silenced during stable XCI, including in NPCs (as discussed later; Fig. 5). Therefore, we labeled this cluster as “late”. The relatively small number of genes present in cluster 4 did not show any sign of silencing (labeled “not silenced”), and include many known escape genes such as Xist, Kdm6a (Utx), Utp14a and Chm. Figure 4b shows three examples of genes present in the “early”, “late” and “not silenced” cluster, respectively. Genes within the “late” cluster were expressed at significantly higher levels than genes in the other clusters (Additional file 1: Figure S7) [47], reinforcing the observation that highly expressed genes generally show slower silencing kinetics during XCI (Fig. 3c; Additional file 1: Figure S5).
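The idea of grouping genes by the shape of their Xi/Xa profile can be illustrated with a minimal K-means (Lloyd's algorithm) sketch in plain Python. This is not the implementation used for Fig. 4a: the function names are hypothetical, the deterministic farthest-point initialization is a choice made here for reproducibility, and each profile is assumed to be a list of log2 Xi/Xa ratios at the sampled time points (0, 2, 3, 4 and 8 days).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two time-course profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(profiles, k):
    """Deterministic farthest-point initialization (assumption of this sketch)."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles,
                       key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(profiles, k, max_iter=100):
    """Return (labels, centroids) after Lloyd iterations on the profiles."""
    centroids = init_centroids(profiles, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On toy profiles such as `[0, -2.0, -2.5, -3.0, -3.0]` (rapid silencing), `[0, -0.1, -0.2, -0.3, -1.0]` (mild late bias) and `[0, 0.1, 0, 0, 0.1]` (no silencing), calling `kmeans(profiles, 3)` separates the three shapes; the study itself used four clusters.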

Fig. 4

A linear component in the propagation of silencing over chromosome X outwards from the XIC. a K-means clustering during XCI identifies four groups (present in the four rows) of genes with different inactivation kinetics on the Xi: early inactivated genes (top row), genes that show inactivation at intermediate time points (second row), late inactivated genes (third row) and genes that are not inactivated (bottom row). The first three columns show the inactivation dynamics within the four clusters over the time course as an average (left) of the individual genes within the clusters, as a lineplot (middle) or as a heatmap (right). b Examples of genes within the clusters as shown in (a). Total expression levels in black, the contribution from the 129-derived or Cast-derived alleles in green and blue, respectively. See Additional file 1: Figure S6 for the genome browser views of the genes. c Location of the genes within the clusters as obtained in (a) over the linear X chromosome. On the right, the first column shows the clusters and the number of genes within each cluster. The second column shows the average distance of the genes within a cluster to the XIC. The last column shows the p value calculated using the gene set enrichment analysis (GSEA) rank test corrected for multiple testing (using FDR (false discovery rate); *significant). The running sum statistics for each cluster for the GSEA are shown in Additional file 1: Figure S9. d Early silencing of genes on the Xi, plotting the Xi/Xa ratio per gene at day 2 after the onset of EB differentiation over the linear X chromosome. The trend line (polynomial order 3) of the Xi/Xa ratio is plotted in red

Fig. 5

Allele-specific RNA-seq on three NPC lines identifies three distal regions of genes that escape XCI. a Ratio of Xi/Xa (y-axis; for each of the three NPC lines sorted from highest to lowest) for genes showing a log2 ratio of at least −5. We set the cutoff for escape at 10 % relative expression from the Xi versus the Xa (log2 ratio of > −3.32; similar to Yang et al. [37]). b Xi/Xa ratio of genes that escape XCI in all three NPC lines. c Distribution of the escape genes identified in *NPC_129-Xi over the four clusters as characterized in Fig. 4a. d Localization of the escape genes within each NPC line over the linear X chromosome (see also Table 1). The black dots on the fourth row represent all X-linked genes for which high-confidence allele-specific ratios were obtained in NPCs. e Validation of the escape genes within the three escape regions by Sanger sequencing of cDNA. See Additional file 1: Figure S13 for the full panel of 13 genes that we validated, and for further details
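For reference, the two thresholds mentioned in the legend of Fig. 5 can be converted between linear and log2 scale as follows (a worked conversion only; the percentages follow directly from the stated log2 values):

```latex
\[
\frac{\mathrm{Xi}}{\mathrm{Xa}} > 0.10
\;\Longleftrightarrow\;
\log_2\!\frac{\mathrm{Xi}}{\mathrm{Xa}} > \log_2 0.10 \approx -3.32,
\qquad
\log_2\!\frac{\mathrm{Xi}}{\mathrm{Xa}} \ge -5
\;\Longleftrightarrow\;
\frac{\mathrm{Xi}}{\mathrm{Xa}} \ge 2^{-5} = \frac{1}{32} \approx 3.1\,\%.
\]
```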

A comparison of the kinetic clusters with a previous study that used RNA FISH to determine X-linked silencing at the single gene level [37]. Only ** XCI in at least one NPC line. Only Shroom4 and Car5 are stably inactivated in the NPCs used for the current study, while they escape XCI in the Patski cells as reported by Yang et al. [37] (see Table 1 for detailed comparisons). Most genes that escape XCI in mouse brain tissue [48] also escape XCI in NPCs (Table 1). In line with their tissue-specificity, only one gene (Utp14a) of the 24 genes that specifically escape XCI in mouse spleen and/or ovary [48] escapes XCI in the NPCs. Furthermore, nearly all genes that escape in mouse trophoblast cells during imprinted XCI [49] (and for which there is sufficient allele-specific coverage in the NPCs profiled in the current study) escape XCI in at least one of the NPC lines (Table 1). However, we identify more escape genes compared with these previous studies (Table 1), as further discussed below.

Table 1 Genes escaping XCI in any of the three NPC lines in comparison with other studies

Comparison of the kinetic clusters (Fig. 4) with the 38 genes that escape XCI in *NPC_129-** XCI in the three NPC lines (Fig. 5a, Table 1), we plotted the genes escaping XCI over the linear X chromosome (Fig. 5d). This shows that all three NPC lines share escape genes over most of the X chromosome, except for three distal regions (regions 1–3) that were also pronounced in cluster 4 within the previous analysis (Fig. 4c, “not silenced”). Within these regions *NPC_129-** XCI. The escape genes identified in the three NPC lines showed a large overlap with the escape genes as determined at the start of the culturing (Additional file 1: Figure S14a), including the escape genes present in the three escape regions (Additional file 1: Figure S14b). Notably, most genes showing differential escape before and after one month of NPC culturing are expressed from the ** XCI in the NPCs are stably maintained over time.

Regions of genes that escape XCI in the NPCs are associated with TADs

The clustering of genes that escape XCI, as observed in the NPCs, might suggest regulatory control at the level of chromatin domains in which epigenetic domains on the ** XCI colocalize with TADs as identified in ES_Tsix-stop ESCs. a–c Overview of the TADs present at regions 1, 2 and 3 [indicated with a box in (a), (b) and (c), respectively] in the female ES_Tsix-stop ESCs. In red, the interaction matrix used for TAD calling with domains indicated by dashed lines. The second row shows the Spearman correlation between the 40 kb-binned Hi-C interaction matrices of the female ES_Tsix-stop and male J1 ESCs [51] (see “Materials and methods” for further details). The legend for genes that escape XCI or genes that are silenced is indicated in (a). Coloring of genes indicates escape in one or two NPC lines, respectively, while genes in black are X-inactivated in all NPC lines. Wdr45 and Slc35a2 are included as escape genes for *NPC_129-Xi as the contribution in gene expression from the Xi is >10 % (Table 1). Additional file 1: Figure S17 contains the same information as Fig. 6, but includes genes for which no allelic information was obtained (mainly due to low expression or the absence of polymorphic sites), as well as the interaction matrix in male J1 ESCs obtained from Dixon et al. [51] for comparison
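A Spearman correlation between two binned interaction matrices can be sketched as below. This is an illustration under stated assumptions rather than the procedure from “Materials and methods”: it assumes that the correlation is computed per bin by comparing the corresponding rows of the two matrices, and all function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def rowwise_spearman(matrix_a, matrix_b):
    """One correlation per bin, comparing matching rows of two matrices."""
    return [spearman(ra, rb) for ra, rb in zip(matrix_a, matrix_b)]
```

Bins whose interaction profiles are ordered identically in both matrices score 1, and bins with reversed profiles score −1; rows with constant values would need to be skipped or masked before calling `spearman`.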