Abstract
Background
X-chromosome inactivation (XCI) is the epigenetic inactivation of one of two X chromosomes in XX eutherian mammals. The inactive X chromosome is the result of multiple silencing pathways that act in concert to deposit chromatin changes, including DNA methylation and histone modifications. Yet over 15% of genes escape or variably escape from inactivation and continue to be expressed from the otherwise inactive X chromosome. To the extent that they have been studied, epigenetic marks correlate with this expression.
Results
Using publicly available data, we compared XCI status calls with DNA methylation, H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3 and H3K36me3. At genes subject to XCI we found heterochromatic marks enriched, and euchromatic marks depleted on the inactive X when compared to the active X. Genes escaping XCI were more similar between the active and inactive X. Using sample-specific XCI status calls, we found some marks differed significantly with variable XCI status, but which marks were significant was not consistent between genes. A model trained to predict XCI status from these epigenetic marks obtained over 75% accuracy for genes escaping and over 90% for genes subject to XCI. This model made novel XCI status calls for genes without allelic differences or CpG islands required for other methods. Examining these calls across a domain of variably escaping genes, we saw XCI status vary across individual genes rather than at the domain level. Lastly, we compared XCI status calls to genetic polymorphisms, finding multiple loci associated with XCI status changes at variably escaping genes, but none individually sufficient to induce an XCI status change.
Conclusion
The control of expression from the inactive X chromosome is multifaceted, but ultimately regulated at the individual gene level with detectable but limited impact of distant polymorphisms. On the inactive X, at silenced genes euchromatic marks are depleted while heterochromatic marks are enriched. Genes escaping inactivation show a less significant enrichment of heterochromatic marks and depletion of H3K27ac. Combining all examined marks improved XCI status prediction, particularly for genes without CpG islands or polymorphisms, as no single feature is consistently present at silenced or expressed genes.
Introduction
In eutherian mammals, one of the two X chromosomes (X) is epigenetically inactivated in XX females in order to achieve dosage compensation with XY males through a process known as X-chromosome inactivation (XCI) (see Balaton, 2018 for a review [1]). This inactivation is incomplete, as approximately 12% of genes consistently escape from XCI in humans [2], here defined as having at least 10% expression from the inactive X (** from XCI [2]. The short arm of the X near PAR1 is enriched in genes escaping from XCI, while the long arm that contains XIST, the gene responsible for initiating XCI, is enriched in genes subject to XCI [3]. Genes escaping from XCI are often found clustered together, with some convergence with topologically associated domains (TADs) [9]. In addition to genes that consistently escape from XCI (sometimes called constitutive escape), a further 8% of genes have been found to vary their XCI status between different tissues or individuals (termed variable or facultative escape [2], reviewed in [5]), and another 7% of genes were found to be discordant between the studies identifying them [2]. Variably escaping and discordant genes were found to be enriched at boundaries between clusters of genes with opposite XCI statuses [2]. The factors determining XCI status remain unresolved, with the above evidence suggesting regional control, but there are also lone genes that escape XCI while flanked with genes subject to XCI [2] and even genes with two transcription start sites (TSSs) with opposite XCI status [10, 11]. Furthermore, these solo escape genes are able to recapitulate escape when integrated elsewhere on the X [12, 13].
Many methods have been used to identify which genes escape from XCI (reviewed in [14]). The gold-standard approach is to compare expression levels between the ** from XCI, while inactive marks such as H3K9me3, H4K20me3, H3K27me3 and macroH2A are enriched at genes subject to XCI [14, 22, 23], reviewed in [14]. A predictive model using many epigenetic as well as genetic features in mice was able to predict a gene’s XCI status accurately 78% of the time [24] and in humans a model obtained over 80% accuracy using only genomic repeats [25]. These and additional studies have found L1 repeats enriched near genes that are subject to XCI, while ALU elements are more frequent at genes escaping XCI [25,26,27,29]. Another study found many genes where ** Technologies (CEMT) as these samples were derived from cancer and thus were anticipated to have a high frequency of skewed XCI, allowing us to use allelic expression to determine XCI status in each sample [11]. As cancer is known to have epigenetic changes, we additionally examined data from Core Research for Evolutional Science and Technology (CREST), another group within IHEC, thus allowing us to determine whether any trends that we observed in the CEMT data were due to the samples being cancer-derived. However, the CREST samples had less sequencing depth, fewer females (only nine), and could only be examined for DNAme and histone marks. Samples are listed in Additional file 2: Table S1. In our analyses, genes in the PAR were not included with genes escaping from XCI as they may be epigenetically distinct, especially when comparisons with males are included.
Histone marks differ with sex and XCI status
We compared the levels of histone modifications with sex and published XCI status calls derived from a synthesis of various approaches (hereafter referred to as meta-status) [2]. We used levels within 500 bp upstream of a gene’s TSS, except for H3K36me3, which is associated with gene bodies and so was examined at exons [32], and H3K4me1, which is associated with enhancers and so was examined at annotated enhancer sites [33]. We found that most marks had a significant difference (p value < 0.01) for the median level per transcript between males and females, at genes escaping and subject to XCI in both datasets (Fig. 1a, Additional file 3: Table S2). Fewer marks showed significant differences between genes escaping XCI and those subject to XCI within each sex. The euchromatic marks (H3K4me3, H3K27ac, and H3K36me3) were significantly different between transcripts subject to XCI and those escaping from XCI in both CEMT and CREST females, while the heterochromatic marks (H3K9me3 and H3K27me3) were only significantly different within the CREST dataset. Comparing XCI statuses within males gave the fewest significantly different marks, as was expected. Overall, the X chromosome of males and females differs in both heterochromatic and euchromatic marks, and the observable differences between XCI status implicate inactivation-related differences in addition to copy number (XX or XY) differences.
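As a rough illustration of the promoter window described above, the following sketch computes a strand-aware interval covering the 500 bp upstream of a TSS and averages a signal track over it. The function names, the 0-based half-open coordinate convention, and the clamping at the chromosome start are assumptions made for illustration; they are not taken from the analysis pipeline.

```python
def promoter_window(tss, strand, upstream=500):
    """Return a 0-based, half-open (start, end) interval upstream of a TSS.

    Hypothetical helper: `tss` is the 0-based position of the TSS base and
    `strand` is "+" or "-". The TSS base itself is excluded from the window.
    """
    if strand == "+":
        # Upstream lies at smaller coordinates; clamp at the chromosome start.
        return max(0, tss - upstream), tss
    if strand == "-":
        # On the minus strand, upstream lies at larger coordinates.
        return tss + 1, tss + 1 + upstream
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def mean_signal(per_base_signal, window):
    """Average a per-base signal track (a list of numbers) over a window."""
    start, end = window
    values = per_base_signal[start:end]
    return sum(values) / len(values) if values else 0.0
```

Handling the strand explicitly matters because "upstream" points toward larger coordinates for minus-strand genes.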
The ** XCI, which partially explains the stronger p-values at transcripts subject to XCI. H3K27me3 has a higher ** XCI, and lower for transcripts subject to XCI. H3K36me3 is reduced on the ** from XCI the differences were more variable between the datasets.
H3K27me3 showed the largest change between the **, subject to XCI and variably escaping categories being significantly different between the sexes. We analyzed chromosome 7 as an example autosome and saw a much lower percentage of transcripts with significant male–female differences for H3K9me3 and H3K27me3 than for transcripts escaping from XCI, validating that transcripts that escape from XCI have a significant increase of heterochromatic marks in females relative to males. Metagene plots extending 50 kb up and downstream of genes escaping or subject to XCI, in females and males (Additional file 1: Figure S2) confirm the predominance of marks at the TSSs, with higher H3K4me3 and H3K27ac TSS peaks observed for genes escaping XCI in females. For the heterochromatic H3K9me3 and particularly H3K27me3, we observe both a reduced TSS peak and lower gene body levels for escape genes in females. For all marks, the standard deviation across genes with each XCI status was large, calling into question whether the differences could be predictive for individual genes, as has been found for DNAme (see Additional file 3: Table S2).
In addition to our promoter and gene-based analysis, we also compared histone marks at enhancers annotated to genes on the X [33] and found that all marks showed significant, although small, differences between males and females, for both XCI statuses (Fig. 1d, Additional file 3: Table S2 for values). We further considered whether the enhancer was found within the gene to ensure that differences were not arising simply due to expression of the gene altering chromatin; however, most marks remained significant regardless of location. Looking at the ** and genes subject to XCI. In CREST, the ** from XCI. Overall, it appears that enhancers gain heterochromatic marks on the ** from XCI, while five were previously designated escaping XCI and one subject to XCI.
Epigenetic marks do not change consistently with XCI status for variably escaping genes. a The number of genes with each XCI status call across all samples as assigned by ** genes that had significant differences in the histone mark between samples that were subject to or escaping from XCI. For each gene, on the left is a comparison of each epigenetic mark vs the ** genes.
Genes that variably escape from XCI provide a unique opportunity to study differences between genes escaping vs subject to XCI in the same genomic context. All of the marks available except for H3K4me1 were significantly different (p-value < 0.05) between samples escaping XCI vs those subject to XCI in at least one of the eight variably escaping genes, but never for the majority of genes (Fig. 2b, Additional file 1: Table S5). Consistent with the associations seen for genes subject to or escaping from XCI, when active marks were significantly different, they tended to be higher in samples escaping XCI, while inactive marks were lower in samples escaping XCI (Additional file 1: Table S6). The exception to this is H3K36me3 in gene bodies.
DNAme was the most consistent mark differentiating samples escaping from those subject to XCI, being seen significantly different in four out of the eight variably escaping genes. The samples subject to XCI in PRKX had significantly higher DNAme, but were not above the DNAme thresholds for XCI status calls that we established previously [11]. The other three genes with significant DNAme differences showed a clear switch from a DNAme pattern matching genes escaping XCI to a pattern matching genes subject to XCI. TIMP1, one of the four genes that was not significant, has low CpG density and high male DNAme so was not expected to differ with XCI status. For the other three genes, the limited informative samples reduced the power to detect differences, although they may have had incorrect XCI status calls or there may be more complicated epigenetic processes involved. Interestingly, the two genes found to be variably escaping by both ** genes did not show significant differences at any of the examined marks; increasing the sample size might give us the power to see more consistent differences across variably escaping genes as some of these genes only had 2 informative samples per XCI status. Two genes showed significant expression differences between samples that escaped XCI versus those subject to XCI (Additional file 1: Figure S4). In BCOR, samples escaping XCI had higher expression across all exons, while in EIF2S3 some exons were higher in samples subject to XCI while other exons were higher in samples escaping XCI. XCI status and expression per exon may be linked by different TSSs having different XCI status or possibly different tissues having different XCI status and dominant splicing variants. To test whether variable escape may be tissue-specific, XCI status per sample was compared with tissue of origin; only one of the eight genes showed tissue-specificity, EIF2S3.
However, with only eight samples in three tissue types and being limited by heterozygous polymorphisms, there are likely other variable escape genes that were not identified here as many genes did not have the required number of informative samples.
Expanding sample-specific XCI status by using DNA methylation
To increase our sample size, we used promoter DNAme levels to determine XCI status across all genes within the larger 45-sample CEMT dataset, regardless of skewed XCI. Only TSSs with high CpG density and low male methylation were considered informative, and within this group we found 47 genes escaping XCI, 393 subject to XCI and 17 variably escaping across samples (Fig. 3a, Additional file 4: Table S4 for XCI status calls). Our DNAme-based calls had strong concordance with meta-status; there were no genes called as escaping XCI here that were previously called as subject to XCI, while only one of the genes called as subject to XCI here was previously called as escaping XCI. We included genes in the variably escaping from XCI category if at least one of their TSSs had 33% or more of its samples escaping XCI and another 33% or more samples subject to XCI. Additionally, one gene had opposing XCI statuses at separate TSSs and 36 had opposite XCI statuses across tissues (examples of genes with these variable escape scenarios are shown in Fig. 3b). An additional 67 genes were found variably escaping in at least one tissue, but were not identified as variably escaping from XCI in the larger dataset. Only BCOR was found variably escaping from XCI in the ** here. In addition, 96% of genes escaping and 87% of genes subject to XCI identified by ** in only one of the datasets.
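The 33% rule for variable escape can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the label strings are hypothetical, and the fallback that assigns a non-variable TSS its more common status is an assumption, since the text only specifies the criterion for the variable category.

```python
def classify_tss(calls, min_fraction=0.33):
    """Classify one TSS from per-sample calls ("escape" or "subject").

    A TSS is "variable" when at least `min_fraction` of its informative
    samples escape XCI and at least `min_fraction` are subject to XCI.
    """
    n = len(calls)
    if n == 0:
        return "uninformative"
    frac_escape = calls.count("escape") / n
    frac_subject = calls.count("subject") / n
    if frac_escape >= min_fraction and frac_subject >= min_fraction:
        return "variable"
    # Assumption: otherwise report the more common status (ties go to subject).
    return "escape" if frac_escape > frac_subject else "subject"


def gene_is_variable(calls_by_tss, min_fraction=0.33):
    """A gene counts as variably escaping if any of its TSSs is variable."""
    return any(
        classify_tss(calls, min_fraction) == "variable"
        for calls in calls_by_tss.values()
    )
```

Exposing `min_fraction` as a parameter makes it easy to see how the number of variable genes depends on the chosen threshold.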
DNAme varies at genes variably escaping from XCI. a The number of genes with each XCI status call by DNAme, with their call by meta-status underneath. b From left to right: An example of a gene that variably escapes XCI across individuals (and within multiple tissues), a gene that variably escapes from XCI between tissues, and a gene that variably escapes from XCI between TSSs. c The percent DNAme per read for genes, binned together by their mean DNAme across the CpG island. Only reads overlapping the CpG island were included here. d The distribution of genes with each XCI status across the bins of mean DNAme per island. e Allelic DNAme, shown as the percent DNAme per read by allele. The mean DNAme across all reads per allele in each bin is shown underneath
Comparing epigenetic marks to DNAme-based XCI status calls, all marks (H3K4me3, H3K9me3, H3K27me3 and H3K27ac) except H3K4me1 and H3K36me3 were significantly different between genes with opposite XCI status calls, with increased prominence of H3K9me3 (Additional file 1: Table S7). We again compared epigenetic marks at variably escaping genes to see if they differed between samples in which the gene escaped XCI vs those in which it was subject to XCI. We categorized variable escape genes as those variably escaping across the dataset, across TSSs, across tissues or within specific tissues. For variable escape from XCI between individuals across the dataset, every mark examined was found to be significant (adjusted p value < 0.01) in at least one gene; however across all categories of variable escape from XCI, only expression and H3K4me3 were significant in more than 25% of genes in any type of variable escape category (Table 1). The direction of histone mark changes was less consistent than for ** XCI and higher inactive marks in genes subject to XCI, but with many genes showing the opposite results (Additional file 1: Figure S5).
We have previously seen that the average DNAme at genes subject to XCI was 38%, less than expected if the ** genes were found distributed in the range where genes escaping and subject to XCI were found; however, genes with intermediate 20–30% DNAme had more variably escaping genes than genes with a consistent XCI status.
While the bimodal appearance of the DNAme reads reflects that the ** heterozygous SNPs within 2 kb of TSSs. In addition to the usual limitations of mapping allelic reads, we had to exclude C<>T and G<>A polymorphisms as the bisulfite conversion step in WGBS converts unmethylated C to T and on the opposite strand this appears as a G to A conversion. Separating genes into the same 10% bins of mean DNAme as earlier (Fig. 3e), we see that the intermediately methylated reads tend to be on the hypermethylated allele (the presumed ** from XCI and those having one allele below 25% and one above 75% being called as subject to XCI. These calls for SNPs within CpG islands had good agreement with previous calls with all 28 of the loci called escaping and 50/51 of the loci called subject to XCI being concordant. To explain the prevalence of intermediately methylated reads, we examined the DNAme per CpG across some of these islands where we observed that the DNAme level was not consistent (Additional file 1: Figure S6 for browser tracks across islands, Additional file 1: Figure S7 for DNAme differences between adjacent CpGs). We observe an average difference between adjacent CpG sites of 24% in cancer and 13% in healthy samples, which is likely a major contributor to the intermediately methylated WGBS reads and CpG island DNAme averages we observe for the ** XCI, using the remainder to test accuracy, and used twice as many genes subject to XCI for training. Using this predictor, we could predict escape from XCI with accuracies ranging from 42% with H3K9me3 to 69% with H3K4me3 and for genes subject to XCI with accuracies ranging from 85% with gene body H3K36me3 to 99% with H3K27ac. In contrast, a similar model using CpG island DNAme data obtained a much better accuracy of 87% for predicting genes as escaping XCI and 99% for predicting genes as subject to XCI, showing the higher predictive ability of DNAme.
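The SNP exclusion described above for allelic WGBS analysis, and the allelic threshold for a subject-to-XCI call, can be illustrated with a short sketch. All names here are hypothetical and the tuple layout for a SNP is an assumption. Only the subject-to-XCI rule (one allele below 25% and one above 75% DNAme) is implemented, because it is the only allelic criterion stated in full.

```python
# Allele pairs that cannot be distinguished from bisulfite conversion:
# unmethylated C reads as T, which appears as G to A on the opposite strand.
BISULFITE_AMBIGUOUS = {frozenset("CT"), frozenset("GA")}


def is_bisulfite_ambiguous(ref, alt):
    """True for C<>T and G<>A polymorphisms, in either direction."""
    return frozenset((ref.upper(), alt.upper())) in BISULFITE_AMBIGUOUS


def usable_snps(snps):
    """Keep SNPs given as (snp_id, ref, alt) tuples that survive the filter."""
    return [snp for snp in snps if not is_bisulfite_ambiguous(snp[1], snp[2])]


def looks_subject_to_xci(allele_a_dname, allele_b_dname, low=25.0, high=75.0):
    """One allele below `low` percent DNAme and the other above `high`."""
    lower, upper = sorted((allele_a_dname, allele_b_dname))
    return lower < low and upper > high
```

Using unordered allele pairs means the filter does not depend on which allele is labeled as the reference.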
XCI status predictions with an epigenetic model expand the number of genes examinable. a ROC curves for each random forest predictor trained using single marks, along with the combined predictor using all of the epigenetic marks. An example sample, CEMT28, is shown. See Additional file 1: Figure S8 for all samples. b Accuracy of our epigenetic predictor using DNAme and all six histone marks. Each point is one of the 20 models per sample. This accuracy is tested on genes outside of the training set. c The number of genes with each XCI status as predicted by our model, with their call by meta-status underneath. d, e As (c), but further split by the presence of a CpG island (d) or by an expression threshold of 0.1 RPKM (e). f The predictive ability of each mark. Each mark was ranked per model on how important it was to the model, with the most important mark being ranked 14 and the least important being ranked first. We used the marks within each female sample paired with the mean mark in similar male samples for the predictor, so both the female and male marks are featured here
To get XCI status calls from histone mark data with an improved accuracy, we combined data from all of the histone marks and DNAme data from CEMT and trained a new random forest model [35]. This combined epigenetic XCI predictor was trained using XCI meta-status and was able to accurately predict genes escaping vs subject to XCI, with a median accuracy for genes outside the training set of 75% for genes escaping from XCI and 90% for genes subject to XCI (Fig. 4b). We trained the model 20 separate times per sample and were confident in a prediction if 75% or more of the models agreed. A separate epigenetic XCI predictor was trained and used within each sample; however, the models are capable of being used across samples within the same tissue with reduced accuracy and even across tissues (Additional file 1: Figure S9 for a summary of accuracies). Models in some tissues tended to overcall genes as subject to XCI while others overcalled genes as escaping from XCI; however, the number of escape genes called per sample had no correlation with XIST expression (Additional file 1: Figure S10). Across all samples, the model called 46 genes as escaping XCI, 780 genes as subject to XCI and seven genes as variably escaping from XCI (Fig. 4c, Additional file 4: Table S4 for XCI status calls). While none of the genes predicted to escape XCI here have a meta-status of subject to XCI, 11 of the genes predicted to be subject to XCI have a meta-status of escaping XCI and an additional six genes are located in the PAR1 and are expected to escape XCI [2]. Comparing these predictions to our ** XCI by ** by ** XCI across samples, we predicted 48 genes having tissue-specific escape from XCI, and one gene with separate TSSs with opposite XCI status. To investigate which marks are driving this variability in XCI status predictions we compared our epigenetic marks across samples, tissues and TSSs with opposite XCI status predictions (Additional file 1: Table S12).
At genes predicted to variably escape across samples we found that very few marks had significant (t-test, adjusted p value < 0.01) differences between samples found escaping and those subject to XCI. DNAme was the exception to this with four of seven genes having significant DNAme differences. For the genes found variably escaping across tissues, all of the marks had multiple genes significantly different between tissues subject to XCI vs tissues escaping from XCI, but many of the genes that didn’t variably escape also had significant differences across tissues. Tissue-specific variable escape genes had significant enrichment (Chi-square test, adjusted p value < 0.01) for genes with tissue-specific H3K27me3, H3K4me3, DNAme and expression over genes that did not variably escape from XCI. There was only one gene found to variably escape between TSSs so no statistical tests were possible; however, there were differences between TSSs for H3K27ac, H3K4me1 and DNAme for the different exons used.
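The consensus rule used for the combined predictor (20 models per sample, with a prediction accepted when 75% or more of the models agree) can be sketched as below. The function name and the use of `None` for an unresolved call are assumptions for illustration.

```python
from collections import Counter


def consensus_call(model_calls, min_agreement=0.75):
    """Return the majority label if enough models agree, else None.

    `model_calls` holds one predicted status per trained model, for example
    20 strings such as "escape" or "subject" for a single gene in one sample.
    """
    if not model_calls:
        return None
    label, votes = Counter(model_calls).most_common(1)[0]
    if votes / len(model_calls) >= min_agreement:
        return label
    # Not enough agreement among the models to make a confident call.
    return None
```

With 20 models, this threshold corresponds to at least 15 models returning the same status.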
Our initial thresholds to call variable escape across samples were arbitrary, so we varied the percentage of samples with each XCI status required to classify a gene as variably escaping from XCI in order to determine the effects of different variable escape thresholds. At our threshold requiring 33% of samples to have each XCI status in order to be called as variably escaping from XCI, we found 7 of 1155 genes to be variably escaping. Lowering this threshold to 25% found 35 variably escaping genes, at 10% we found 304 genes and at 5% we found 476 genes. This shows that there is no natural threshold at which genes become variable in their expression from the ** decreased, the percentage of these genes with significant DNAme differences between samples with opposite XCI statuses decreased down to 20% and the percentage of genes with H3K27me3 differences rose to 27% (Additional file 1: Table S13); however, we must also consider that the cancer origin of these samples may contribute to rare epigenetic misregulation.
To validate our conclusions from this model on healthy samples, we trained our overall epigenetic predictor on the CREST dataset. The CREST dataset contains nine samples for which we were able to obtain all of the required epigenetic data for our predictor. We predicted 88 genes escaping from XCI, 802 subject to XCI, 40 variably escaping across samples, ten across tissues and six across TSSs. These calls are similar to those in the CEMT data, with 95% of genes with calls from both datasets agreeing (Additional file 1: Table S14). The genes variably escaping from XCI in the CEMT dataset tended to be escaping XCI in CREST while genes variably escaping in CREST tended to be subject to XCI in the CEMT dataset. The number of genes variably escaping from XCI is increased in CREST, possibly due to how few samples were required for variable escape (three with each XCI status) decreasing stringency. Another possibility is that having random ** across individuals in CREST had significant differences between samples subject to XCI and those escaping from XCI (Additional file 1: Table S15). CREST tissue-specific genes had significant differences in H3K27me3, DNAme and expression between tissues, all three of which were also significant in CEMT samples. CREST had enough genes variably escape across TSSs to see that H3K4me3, H3K27me3 and DNAme were significantly different between TSSs escaping and TSSs subject to XCI in females. Males had significant differences in H3K4me3, H3K27ac, H3K27me3, H3K36me3 and DNAme between TSSs escaping vs subject to XCI in females, which suggests that these TSSs also differ significantly on the Xa. These TSSs may be predisposed to have different XCI statuses based on their epigenetic landscape prior to XCI or the Xa differences may be misleading the predictor causing it to predict different XCI statuses.
The results between our cancer and healthy samples are similar overall, with results from both datasets finding few genes with significant epigenetic differences between genes variably escaping across individuals, and finding H3K27me3, DNAme and expression differences more commonly different between tissues at genes with a tissue-specific XCI status than at other genes.
Independent regulation of variable escape across a region
As an application of our epigenetic XCI predictor and to understand the scale at which variably escaping genes are regulated, we examined XCI status calls per sample across a region that is enriched in genes variably escaping from XCI according to their meta-status (Fig. 5a). We found that many of the genes in this region that are annotated as variably escaping from XCI had low levels of variable escape with few samples differing from the most common XCI status. The genes that vary in XCI status across samples change their XCI status independent of the XCI status of neighboring genes, suggesting that regulation of variably escaping genes happens at the single gene level and not at the domain level. Additionally, we saw genes that had multiple TSSs with different XCI statuses and genes that are bidirectional from the same promoter with opposite XCI status, showing that the scale of regulation could be narrowed even further. All of the genes in this region that showed variable escape here, except for IRAK1, had significant differences for some combination of marks including H3K9me3, H3K27me3 and DNAme between samples escaping vs subject to XCI (p value < 0.05, Fig. 5b, Additional file 1: Figure S11 for which marks were significant per TSS). Euchromatic marks were less frequently seen to be significantly different.
XCI status calls are independent between neighboring variably escaping genes. a A map of a variably escaping region, with genes colored by their XCI status as predicted per sample, by our random forest model using all epigenetic marks available. The samples were clustered based on their XCI status calls within the region. Arrows indicate where each TSS is located, and they point in the direction of transcription. Genes which are colored as variably escaping here are variably escaping between transcripts and TSSs within a sample. b Metagene plots for the epigenetic marks that were most commonly significantly different between samples subject to XCI vs those escaping from XCI at the above variably escaping genes. Genes were chosen to show every combination of which mark is significant per gene, that we saw in this region. Marks that were significant at a gene are marked with a star
Genetic contribution to variable escape from XCI
To identify any genetic differences at variably escaping genes between samples that escape and those subject to XCI, we obtained existing exome-seq, RNA-seq, Illumina Infinium Human Methylation450 BeadChip array (450k array) and Affymetrix Genome-Wide Human SNP Array 6.0 (SNP6) data for 5817 samples from cancers where clonality should lead to skewed ** XCI and 377 genes subject to XCI by ** XCI and 397 genes subject to XCI (Fig. 6a). Of the 25 genes called as escaping from XCI by DNAme that were informative by ** by ** from XCI and 20 variably escaping from XCI by ** genes. a The number of genes with each XCI status call in the TCGA dataset made using ** XCI by DNAme became variably escaping genes when the threshold for variable escape was lowered to 100 or more samples with each XCI status. b–i The percent of samples with each allele that were found with each XCI status at the most significant loci for our association analyses. The chromosomal location below the gene name is for the locus associated with the XCI status of the gene and is the location in hg38. The top row of graphs are the most significant loci associated with ** from XCI, we found 45 genes variably escaping by ** genes by DNAme. For our genetic tests, we decided to use a less stringent measure of variable escape for our DNAme calls: as there are so many informative samples here, we called any gene with over 100 samples with each XCI status as variably escaping, instead of requiring the usual 33% of all samples. This gave us 126 variably escaping genes to test. Of these new variably escaping genes, 26 were previously called as escaping XCI, 59 were called as subject to XCI and 36 did not meet the thresholds for either call previously as too many of the samples were outside of the thresholds to be called as escaping or subject to XCI.
We tested association between XCI status of these variably escaping genes and all SNPs on the SNP6 array and did not find any loci significantly associated with our ** genes but with our DNAme-based XCI status calls we found 610 significant combinations of gene and genetic locus across all chromosomes (Additional file 5: Table S16). Only seven of these were X-linked with the closest being 9 Mb away from the affected gene. There were significant loci for 75 of the genes found to variably escape by DNAme, and most of these genes had multiple significant loci, with a maximum of 26 significant loci for SLC16A2 (Additional file 6: Table S17). Many of the loci were also significantly associated with XCI status for multiple genes, with only 372 unique loci appearing in the 610 significant gene:locus associations. The most genes affected per locus was 18 for chr4:130533697 (Additional file 6: Table S17). However, none of these significant polymorphisms showed 100% correlation with XCI status calls and so they are not the causative or sole-causative polymorphism responsible for the change in XCI status, but may be part of a complex mechanism or be in incomplete linkage disequilibrium with a causative polymorphism (Fig. 6b-i). We examined attributable risk per significant locus and found that the allele with the highest contribution to XCI status had an attributable risk of 28%, but 90% of the loci had attributable risk under 10% (Additional file 5: Table S16). This suggests to us that there are alleles which allow for a change in XCI status and give an increased chance of changing the XCI status, but are not sufficient for the change by themselves.
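The summary counts above (unique loci among the significant gene:locus pairs, and the largest number of genes sharing one locus) and a per-locus attributable risk can be sketched as follows. The function names are hypothetical. The attributable risk here uses the common risk-difference definition (the fraction of samples with a given XCI status among carriers of an allele, minus the same fraction among non-carriers), which is an assumption: the text does not state which formula was used.

```python
from collections import defaultdict


def summarize_associations(pairs):
    """Summarize (gene, locus) pairs.

    Returns (number of unique loci, maximum number of genes at one locus).
    """
    genes_per_locus = defaultdict(set)
    for gene, locus in pairs:
        genes_per_locus[locus].add(gene)
    n_unique_loci = len(genes_per_locus)
    max_genes = max((len(g) for g in genes_per_locus.values()), default=0)
    return n_unique_loci, max_genes


def attributable_risk(changed_with, total_with, changed_without, total_without):
    """Risk difference between samples with and without an allele.

    Assumed definition: P(status | allele) - P(status | no allele).
    """
    return changed_with / total_with - changed_without / total_without
```

Under this definition, a value well below 1 means the allele shifts the chance of a given XCI status without determining it.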
To test the strength of the effect of SNPs that were significantly associated with XCI status as determined by DNAme, we compared the genotype of samples to their DNAme to find significant DNAme-quantitative trait loci (DNAmeQTLs). Testing our 610 significantly associated gene:locus combinations with DNAme-based XCI status calls, we found 38 loci were also significant DNAmeQTLs (Fig. 6j-k, Additional file 7: Table S18). We also tested these DNAmeQTLs in males and all 38 loci were found to only be significant in females. Three of these significant DNAmeQTLs (for the genes EIF2S3, PNPLA4 and NLGN4X) had their median DNAme with one allele in the range to be called as escaping from XCI, with the median DNAme of their other allele in the range to be called as subject to XCI, while the others did not (Fig. 6l). Overall, it appears that there are multiple X-linked and autosomal loci contributing to the variability observed in escape from XCI; however, these are not major contributors and the effect of a single DNAmeQTL is not sufficient for a change in XCI status.
Discussion
XCI is a classic paradigm for studying epigenetic regulation, yet how some genes are resistant to silencing (or the maintenance of silencing) and escape XCI remains unresolved. Here, we have examined the genetic and epigenetic differences between genes escaping and those subject to XCI. Overall, epigenetic marks were more different between males and females than between genes escaping vs subject to XCI, suggesting an influence of the ** XCI have similar epigenetic marks between the ** XCI may be why escape genes can have as low as 10% expression from the ** XCI could also contribute to lower expression from the ** from those subject to XCI, while the heterochromatic mark H3K27me3 had the largest **, variably escaping or subject to XCI across our DNAme analyses as our previous ** and variably escaping from XCI. A large proportion of the additional genes found subject to XCI by our epigenetic predictor may in fact be silenced on both the Xa and ** from XCI and the number of epigenetic marks that were significant in at least one gene, but decreased the percentage of genes significant for DNAme that was the only mark ever significant for over 50% of genes in a dataset.
We observed that variable escape from XCI was regulated at the level of single genes, with adjacent genes varying their XCI status independently. In contrast, a study in mice found clusters of genes that variably escape across their three cell lines, with adjacent genes often having the same XCI status across lines [9]. They also found that these clusters colocalize with TADs, with one line having the majority of a TAD escaping XCI and another line having only part of it escaping. An interesting candidate regulator of regional control is SMCHD1. In mice with SMCHD1 knocked out, regions enriched with variably escaping genes were upregulated, while genes that constitutively escaped from XCI were not affected; however, no impact was seen on variable escape genes in human patients with heterozygous SMCHD1 mutations [36]. Nonetheless, another study found variants with low expression of SMCHD1, ZSCAN9 and HBG2/TRIM6 associated with hypomethylation of X-linked CpG islands, with affected islands enriched near genes that variably escape from XCI [29]. Additionally, there are individual genes which are susceptible to reactivation under certain conditions: some genes rely on XIST expression and H3K27 deacetylation to remain silent, while others continue to be silenced when XIST expression is disrupted [47]. This is consistent with our finding that variably escaping genes did not have consistent epigenetic differences between samples which escaped XCI and those which were subject to XCI. Overall, there is evidence for both domain-level and gene-specific regulation of escape. We suggest that for some domains the former predominates, while for other genes the latter predominates. Additionally, the domain featured in Fig. 4 (and other variably escaping domains) is at a threshold where individual genes within the domain can have either XCI status based on local factors.
We thus asked whether variable escape from XCI could be controlled by local sequence variants. Here, we found an association between numerous genetic variants and sample-specific XCI status at variably escaping genes. However, we did not find any local genetic effect, as none of the loci were within 5 Mb of the affected genes and only 10 of 610 significant loci were located on the X. None of the SNPs we identified were completely correlated with a gene’s XCI status, so other factors must be involved. Additionally, all of the significant loci we found were based on DNAme for XCI status calls. These loci could have been affecting just DNAme instead of XCI status; however, 38 of the 610 significant loci were female-specific DNAmeQTLs while only one locus was a significant DNAmeQTL in males. With more samples with skewed XCI, we may have found loci associated with our ** from XCI in the CEMT cancer dataset as in the healthy CREST dataset. Nonetheless, we used the CEMT dataset because it had a standardized set of epigenetic marks across many samples and the clonality of cancer allowed us to examine expression and DNAme allelically. We found that other datasets did not always have all the marks from the same samples, were lacking females or sex labels, or had mislabeled sex.
The use of different methods and sample sizes to call XCI status can result in discordant calls, generally due to one approach calling a gene as variably escaping while other studies do not. A previous meta-analysis saw 7% of genes having discordant calls between studies [2]. Many of these discordances between studies may be due to different samples and tissues used, but here we see differences in XCI status called using different approaches with the same samples. Genes could be falsely called as subject to XCI in the ** XCI tended to have equal levels of marks on the Xa and ** vs subject to XCI at variably escaping genes, but which marks were significant was not consistent between genes and no mark was significant across all of the variably escaping genes, likely reflecting that variably escaping genes have multiple ways in which they are regulated. DNAme intermediate to what is expected for genes escaping vs subject to XCI is enriched at variably escaping genes and is mostly due to inconsistent DNAme on the ** genes were seen to regulate their XCI status independently from each other, suggesting local regulatory elements. Additionally, we searched for polymorphisms which may control variable escape from XCI and found non-syntenic loci, some with a strong correlation, but none were completely correlated, further suggesting complex regulation. Overall, we see that escape from XCI is influenced by both local regulatory elements as well as trans-acting factors and chromatin modifications that can be independent of each other. Understanding how genes escape from XCI will further our understanding of epigenetics in general and may allow us to control which genes are escaping from XCI and rescue X-linked mutations in females.
Methods
Previous XCI status calls
We used XCI meta-status calls from [2] for all comparisons with past XCI statuses and to train our models. Genes that escape and mostly escape were combined due to the small size of these categories, with genes in the PAR1 being left out or having their own separate category depending on the analysis. Genes that were mostly subject to XCI were combined with genes subject to XCI for comparisons between studies, but were left out when training models. Genes that were annotated as variably escaping, mostly variably escaping and discordant across studies were combined together as variably escaping genes for comparisons here.
Histone ChIP-seq analysis
Histone ChIP-seq bigwig files were downloaded from the IHEC data portal [38] and their mean signal quantified with bigWigAverageOverBed [39] for a region 500 bp upstream of TSSs as annotated by Gencode [40]. We normalized the data across samples by scaling each sample so that all samples had the same total depth (including all chromosomes). The ** genes, requiring at least two samples with each XCI status. This narrows the number of variably escaping genes and increases the chance that those found would have enough samples to reach significance. The overall expression level of genes was calculated using bigwig files downloaded from the CEEHRC data portal [42] and quantified as RPKM using VisRseq [43].
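The depth normalization mentioned above can be sketched in Python as follows. This is a minimal illustration rather than the authors' procedure: the function name, the dictionary layout, and the choice of the mean total depth as the common target are assumptions, since the text only states that samples were scaled to the same total depth.

```python
def normalize_to_common_depth(signal_by_sample, total_depth_by_sample):
    """Scale each sample's signal so that every sample has the same total depth.

    signal_by_sample: {sample: [signal per TSS region, ...]}
    total_depth_by_sample: {sample: total signal across all chromosomes}
    The common target depth is the mean of the totals (an assumption).
    """
    target = sum(total_depth_by_sample.values()) / len(total_depth_by_sample)
    return {
        sample: [value * target / total_depth_by_sample[sample] for value in values]
        for sample, values in signal_by_sample.items()
    }
```

For example, a sample sequenced at twice the depth of another has its signal halved relative to that sample, so a region with the same underlying enrichment gets the same normalized value in both.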
DNAme analysis
WGBS bigwig files were downloaded from the IHEC data portal [38] and quantified with bigWigAverageOverBed [39] for a region 500 bp upstream of TSSs as annotated by Gencode [40]. DNAme thresholds established in [11] were used to determine which genes were escaping XCI and which were subject to XCI. These thresholds are: DNAme < 10% escapes XCI, 15% < DNAme < 60% subject to XCI, and DNAme > 60% hypermethylated. A threshold of DNAme < 15% in males was used to filter out TSSs that were methylated on the Xa and therefore not informative for this analysis. To see the differences between adjacent CpGs, we converted bigWig files to bedGraphs and for each island we used R to find the mean absolute value difference between each adjacent CpG.
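The thresholds above translate directly into a small classifier, sketched here in Python. The function names and the labels "uninformative" and "uncalled" are hypothetical; the text does not say how values falling between 10% and 15%, or exactly on a boundary, were labeled, so leaving them uncalled is an assumption.

```python
def call_xci_status(female_dname, male_dname):
    """Call XCI status at one TSS from promoter DNAme, given as fractions in [0, 1]."""
    if male_dname >= 0.15:
        # Methylated on the Xa in males, so not informative for this analysis.
        return "uninformative"
    if female_dname < 0.10:
        return "escape"
    if 0.15 < female_dname < 0.60:
        return "subject"
    if female_dname > 0.60:
        return "hypermethylated"
    # Between 10% and 15%, or exactly on a boundary (assumed to stay uncalled).
    return "uncalled"


def mean_abs_adjacent_difference(cpg_dname):
    """Mean absolute DNAme difference between each pair of adjacent CpGs in one island."""
    diffs = [abs(b - a) for a, b in zip(cpg_dname, cpg_dname[1:])]
    return sum(diffs) / len(diffs)
```

The second function corresponds to the adjacent-CpG comparison in the last sentence of the paragraph; it expects at least two CpGs per island.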
DNAme per read was calculated by downloading WGBS bam files and using a script to count the number of unmethylated and methylated CG dinucleotides per read within CpG islands within 2 kb of TSSs. For allelic DNAme, we did similarly but only examined reads that overlapped heterozygous SNPs identified in our ** from XCI and both alleles higher than 0.75 being called as hypermethylated. Polymorphisms with one allele above 0.75 and the other allele below 0.25 were called as subject to XCI. The DNAme per read per polymorphism was binned as above, but instead of using the mean DNAme across all reads, we determined the mean DNAme per allele and used the mean of that; this was done so that we get the mean between the ** XCI were often within two standard deviations of each other, the average of these two means was often used as a threshold instead.
For our random forest models, we wanted to include both male and female data. Because breast did not have any male data, we used the kmeans function in R to cluster all of our samples based on autosomal levels of all seven epigenetic marks used herein. With three clusters we had multiple male and female samples in each cluster. As input for our models, we used individual female data per sample and matched it with the mean values per gene across males in the same cluster.
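The matching step described above, pairing each female sample with the per-gene mean of males in the same cluster, can be sketched in Python. This assumes cluster labels have already been assigned (the authors used R's kmeans); the function names and data layout are hypothetical.

```python
from collections import defaultdict


def male_cluster_means(male_values, cluster_of):
    """Per-cluster, per-gene mean of one epigenetic mark across male samples.

    male_values: {sample: {gene: value}}
    cluster_of: {sample: cluster id}, e.g. from a prior k-means run
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sample, genes in male_values.items():
        cluster = cluster_of[sample]
        for gene, value in genes.items():
            sums[cluster][gene] += value
            counts[cluster][gene] += 1
    return {c: {g: sums[c][g] / counts[c][g] for g in sums[c]} for c in sums}


def build_model_input(female_sample, female_values, cluster_of, male_means):
    """Pair each gene's female value with the male mean from the same cluster."""
    cluster = cluster_of[female_sample]
    return {
        gene: (value, male_means[cluster][gene])
        for gene, value in female_values[female_sample].items()
        if gene in male_means[cluster]
    }
```

Each gene then contributes two features per mark, the female value and the matched male mean, which gives the model a reference for what the mark looks like on an active X alone.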
Random forest models were trained using the R package caret [35] with the trainControl method cv and the train method rf. We trained the model on genes known to escape or be subject to XCI [2]. The training metric was ROC, tuneLength was 5 and ntree was 1500. Three genes escaping and subject to XCI were left out of the training set and used to check accuracy of overall calls. We trained twenty models per sample, with each model being trained on a random sample of 75% of the genes escaping XCI and twice as many genes subject to XCI, with each iteration of the model using 75% of the number of input escaping genes. Accuracy per model was tested on the remaining genes with known XCI status. Genes were considered as escaping or subject to XCI if 15 or more of the 20 models predicted them as escaping or subject to XCI, respectively. Separate categories were made for genes where only 12–14 of the models agreed on the gene’s XCI status, being annotated as leaning subject or leaning escape. Overall calls were made across samples, with genes with 66% or more of samples agreeing on a gene’s XCI status being called as subject to or escaping from XCI, genes with at least 33% of all samples having each XCI status being called as variably escaping from XCI, and genes that required the leaning categories to reach 66% of samples having a status being annotated with a similar leaning status.
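The voting rules above can be written out as a short Python sketch. The vote cut-offs (15 or more, 12 to 14) and the 66% and 33% sample fractions come from the text; the function names, the "no call" and "inconsistent" labels, and the order in which the cross-sample rules are checked are assumptions.

```python
def call_per_sample(escape_votes, n_models=20):
    """Consensus call for one gene in one sample from the number of models predicting escape."""
    subject_votes = n_models - escape_votes
    if escape_votes >= 15:
        return "escape"
    if subject_votes >= 15:
        return "subject"
    if 12 <= escape_votes <= 14:
        return "leaning escape"
    if 12 <= subject_votes <= 14:
        return "leaning subject"
    return "no call"  # assumed label when neither status reaches 12 votes


def call_across_samples(sample_calls):
    """Overall call for one gene from its per-sample calls."""
    n = len(sample_calls)

    def fraction(*labels):
        return sum(call in labels for call in sample_calls) / n

    escape, subject = fraction("escape"), fraction("subject")
    if escape >= 0.66:
        return "escape"
    if subject >= 0.66:
        return "subject"
    if escape >= 0.33 and subject >= 0.33:
        return "variable escape"
    if fraction("escape", "leaning escape") >= 0.66:
        return "leaning escape"
    if fraction("subject", "leaning subject") >= 0.66:
        return "leaning subject"
    return "inconsistent"  # assumed fallback label
```

As a usage example, ten samples with five "escape", three "leaning escape" and two "subject" calls do not reach 66% for either firm status, nor 33% for both, but reach 80% once leaning calls are counted, so the gene is annotated as leaning escape.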
Statistical comparisons
All statistical comparisons were done in R [44]. The majority were t-tests with a Benjamini–Hochberg (BH) multiple testing correction [45], with results deemed significant if they had an adjusted p value < 0.01. The one test with a different threshold was for comparing genes variably escaping XCI as determined by ** from XCI, assuming the reason for this is that they did not have skewed XCI.
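For readers unfamiliar with the BH correction used throughout, the standard step-up adjustment can be sketched in Python as below. The function name is hypothetical and this is an illustration of the published procedure [45], not the authors' R code.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Visit p values from largest to smallest, tracking the running minimum
    # of p * m / rank so that adjusted values are monotone in rank.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank 1 is the smallest p value
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant(p_values, alpha=0.01):
    """Flags for tests whose adjusted p value is below the 0.01 threshold used here."""
    return [p < alpha for p in bh_adjust(p_values)]
```

Because the adjustment depends on the number of tests, the same raw p value can pass the threshold in a small comparison and fail it in a genome-wide one such as the SNP association analysis.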
For the TCGA DNAme-based XCI status calls we downloaded methylation beta-values from the Genomic Data Commons data portal for females and males from the TCGA dataset. Probes were removed if the average male DNAme was over 15% and female samples were removed if their average DNAme was two standard deviations below the female average, as we presume that they were mislabeled males or had lost their ** gene’s XCI status (Chi-square test, significant if BH adjusted p value < 0.01). We tested vs all SNPs on the array, and again with just the SNPs on the X. For samples which had multiple SNP array datasets, we used a consensus allele across all of the arrays. We did not include heterozygous samples as we were testing for a cis-effect and had no way of knowing which allele was on the Xi. DNAmeQTLs were examined by using the lm function in R to make a linear model for every combination of SNP and CpG island.
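The Chi-square association test between genotype and XCI status can be illustrated with a 2x2 table of homozygous samples. The Python sketch below is hypothetical: the table layout and function name are assumptions, and it applies no continuity correction, whereas the settings used by the authors in R are not stated.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p value (1 degree of freedom) for a 2x2 table.

    table: [[ref_escape, ref_subject], [alt_escape, alt_subject]], counts of
    homozygous samples by allele and XCI status (hypothetical layout).
    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

The resulting p values would then be BH adjusted across all tested gene and SNP combinations before applying the 0.01 threshold.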
Availability of data and material
See references for data sources. All are publicly available.
Abbreviations
- 450k array: Illumina Infinium Human Methylation450 BeadChip array
- BH: Benjamini–Hochberg
- CEMT: Center for Epigenome Mapping Technologies
- ChIP-seq: Chromatin immunoprecipitation sequencing
- CREST: Core Research for Evolutional Science and Technology
- DNAme: DNA methylation
- DNAmeQTL: DNA methylation quantitative trait loci
- IHEC: International Human Epigenome Consortium
- meta-status: XCI status calls from Balaton et al., 2015
- PAR: Pseudo-autosomal region
- RNA-seq: RNA sequencing
- SNP6: Affymetrix Genome-Wide Human SNP Array 6.0
- TAD: Topologically associating domain
- TCGA: The Cancer Genome Atlas
- TSS: Transcription start site
- WGBS: Whole genome bisulfite sequencing
- X: X chromosome
- Xa: Active X
- XCI: X-chromosome inactivation
- Xi: Inactive X
References
Balaton BP, Dixon-McDougall T, Peeters SB, Brown CJ. The eXceptional nature of the X chromosome. Hum Mol Genet. 2018;27:R242-49.
Balaton BP, Cotton AM, Brown CJ. Derivation of consensus inactivation status for X-linked genes from genome-wide studies. Biol Sex Differ. 2015;6:35.
Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434:400–4.
Dunford A, Weinstock DM, Savova V, Schumacher SE, Cleary JP, Yoda A, et al. Tumor-suppressor genes that escape from X-inactivation contribute to cancer sex bias. Nat Genet. 2017;49:10–6.
Navarro-Cobos MJ, Balaton BP, Brown CJ. Genes that escape from X-chromosome inactivation: potential contributors to Klinefelter syndrome. Am J Med Genet C Semin Med Genet. 2020;184:226–38.
Tukiainen T, Villani A-C, Yen A, Rivas MA, Marshall JL, Satija R, et al. Landscape of X chromosome inactivation across human tissues. Nature. 2017;550:244–8.
Godfrey AK, Naqvi S, Chmátal L, Chick JM, Mitchell RN, Gygi SP, et al. Quantitative analysis of Y-Chromosome gene expression across 36 human tissues. Genome Res. 2020;30:860–73.
Helena Mangs A, Morris BJ. The human pseudoautosomal region (PAR): origin, function and future. Curr Genomics. 2007;8:129–36.
Marks H, Kerstens HHD, Barakat TS, Splinter E, Dirks RAM, van Mierlo G, et al. Dynamics of gene silencing during X inactivation using allele-specific RNA-seq. Genome Biol. 2015;16:149.
Goto Y, Kimura H. Inactive X chromosome-specific histone H3 modifications and CpG hypomethylation flank a chromatin boundary between an X-inactivated and an escape gene. Nucleic Acids Res. 2009;37:7416–28.
Balaton BP, Fornes O, Wasserman WW, Brown CJ. Cross-species examination of X-chromosome inactivation highlights domains of escape from silencing. Genetics. 2020. https://doi.org/10.1186/s13072-021-00386-8.
Horvath LM, Li N, Carrel L. Deletion of an X-inactivation boundary disrupts adjacent gene silencing. PLoS Genet. 2013;9:e1003952.
Peeters SB, Korecki AJ, Simpson EM, Brown CJ. Human cis-acting elements regulating escape from X-chromosome inactivation function in mouse. Hum Mol Genet. 2018;27:1252–62.
Balaton BP, Brown CJ. Escape Artists of the X Chromosome. Trends Genet. 2016;32:348–59.
Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, et al. Escape from X inactivation varies in mouse tissues. PLoS Genet. 2015;11:e1005079.
Vacca M, Della Ragione F, Scalabrì F, D’Esposito M. X inactivation and reactivation in X-linked diseases. Semin Cell Dev Biol. 2016;56:78–87.
Mengel-From J, Lindahl-Jacobsen R, Nygaard M, Soerensen M, Ørstavik KH, Hertz JM, et al. Skewness of X-chromosome inactivation increases with age and varies across birth cohorts in elderly Danish women. Sci Rep. 2021;11:4326.
Larson NB, Fogarty ZC, Larson MC, Kalli KR, Lawrenson K, Gayther S, et al. An integrative approach to assess X-chromosome inactivation using allele-specific expression with applications to epithelial ovarian cancer. Genet Epidemiol. 2017;41:898–914.
Moreira de Mello JC, Fernandes GR, Vibranovski MD, Pereira LV. Early X chromosome inactivation during human preimplantation development revealed by single-cell RNA-sequencing. Sci Rep. 2017;7:10794.
Hagen SH, Henseling F, Hennesen J, Savel H, Delahaye S, Richert L, et al. Heterogeneous escape from X Chromosome inactivation results in sex differences in type I IFN responses at the single human pDC level. Cell Rep. 2020;33:108485.
Cotton AM, Price EM, Jones MJ, Balaton BP, Kobor MS, Brown CJ. Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Hum Mol Genet. 2015;24:1528–39.
Qu K, Zaba LC, Giresi PG, Li R, Longmire M, Kim YH, et al. Individuality and variation of personal regulomes in primary human T cells. Cell Syst. 2015;1:51–61.
Kucera KS, Reddy TE, Pauli F, Gertz J, Logan JE, Myers RM, et al. Allele-specific distribution of RNA polymerase II on female X chromosomes. Hum Mol Genet. 2011;20:3964–73.
Barros de Andrade e Sousa L, Jonkers I, Syx L, Dunkel I, Chaumeil J, Picard C, et al. Kinetics of Xist-induced gene silencing can be predicted from combinations of epigenetic and genomic features. Genome Res. 2019;29:1087–99.
Wang Z, Willard HF, Mukherjee S, Furey TS. Evidence of influence of genomic DNA sequence on human X chromosome inactivation. PLoS Comput Biol. 2006;2:e113.
Cotton AM, Chen C-Y, Lam LL, Wasserman WW, Kobor MS, Brown CJ. Spread of X-chromosome inactivation into autosomal sequences: role for DNA elements, chromatin features and chromosomal domains. Hum Mol Genet. 2014;23:1211–23.
Bailey JA, Carrel L, Chakravarti A, Eichler EE. Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc Natl Acad Sci USA. 2000;97:6634–9.
Loda A, Brandsma JH, Vassilev I, Servant N, Loos F, Amirnasr A, et al. Genetic and epigenetic features direct differential efficiency of Xist-mediated silencing at X-chromosomal and autosomal locations. Nat Commun. 2017;8:690.
Luijk R, Wu H, Ward-Caviness CK, Hannon E, Carnero-Montoro E, Min JL, et al. Autosomal genetic variation is associated with DNA methylation in regions variably escaping X-chromosome inactivation. Nat Commun. 2018;9:3738.
Chen B, Craiu RV, Sun L. Bayesian model averaging for the X-chromosome inactivation dilemma in genetic association study. Biostatistics. 2020;21:319–35.
Xu W, Hao M. A unified partial likelihood approach for X-chromosome association on time-to-event outcomes. Genet Epidemiol. 2018;42:80–94.
Barski A, Cuddapah S, Cui K, Roh T-Y, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–37.
Fishilevich S, Nudel R, Rappaport N, Hadar R, Plaschkes I, Iny Stein T, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database. 2017. https://doi.org/10.1093/database/bax028.
Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28:1.
Max Kuhn. Caret: classification and regression training. R package version 6.0-86. 2020. https://CRAN.R-project.org/package=caret/. Accessed 25 Jan 2020.
Wang C-Y, Brand H, Shaw ND, Talkowski ME, Lee JT. Role of the chromosome architectural factor SMCHD1 in X-Chromosome inactivation, gene regulation, and disease in humans. Genetics. 2019;213:685–703.
Dawson MA, Kouzarides T. Cancer epigenetics: from mechanism to therapy. Cell. 2012;150:12–27.
Bujold D, de Morais DA, Gauthier C, Côté C, Caron M, Kwan T, et al. The international human epigenome consortium data portal. Cell Syst. 2016;3:496-9.e2.
Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26:2204–7.
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012;22:1760–74.
Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160-5.
Canadian Epigenomes. (2020). http://www.epigenomes.ca/data-release/hg38/. Accessed 14 Aug 2020.
Younesy H, Möller T, Lorincz MC, Karimi MM, Jones SJM. VisRseq: R-based visual framework for analysis of sequencing data. BMC Bioinformatics. 2015;16(Suppl 11):S2.
R Core Team. R: A language and environment for statistical computing. R foundation for statistical computing, Vienna, Austria. 2020. https://www.R-project.org/. Accessed 25 Jan 2020.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc: Ser B. 1995;57:289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Website. Genomic Data Commons. 2020. https://portal.gdc.cancer.gov/. Accessed 14 Aug 2020.
Yu B, Qi Y, Li R, Shi Q, Satpathy AT, Chang HY. B cell-based XIST complex enforces X-inactivation and restrains atypical B cells. Cell. 2021;184:7.
Acknowledgements
We thank the other members of the Brown lab for helpful comments during the development of this project. Most of the analyses conducted here used data generated by The Canadian Epigenetics, Epigenomics, Environment and Health Research Consortium (CEEHRC) initiative funded by the Canadian Institutes of Health Research (CIHR), Genome BC, and Genome Quebec. Information about CEEHRC and the participating investigators and institutions can be found at http://www.cihr-irsc.gc.ca/e/43734.html. Our genetic association studies used data generated by the TCGA Research network: https://www.cancer.gov/tcga. We would also like to thank the research groups which generated the other sources of data used in this analysis.
Funding
BPB was supported by a CGS-D award from NSERC. Research was supported by CIHR project grant (PJT-16120).
Author information
Contributions
BPB conducted all analyses. All authors contributed to the interpretation of data and writing of the manuscript. All authors have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Much of the expression and WGBS data analyzed were generated by The Canadian Epigenetics, Epigenomics, Environment and Health Research Consortium (CEEHRC) initiative funded by the Canadian Institutes of Health Research (CIHR), Genome BC, and Genome Quebec. Ethics approval for data access was provided by the University of British Columbia Clinical Research Ethics Board (H17-01363).
The data for our genetic analysis of XCI were generated by The Cancer Genome Atlas (TCGA) Research Group. Ethics approval for data access was provided by the University of British Columbia Clinical Research Ethics board (H19-02018).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1:
Table S1. List of samples used. See additional files. For CEMT samples, tissue was annotated to combine samples from related areas. Columns D through L refer to the availability of the dataset for each sample. Patient health status and sample disease are the annotations done by CEMT. CREST samples were only used for the epigenetic predictor and only samples with all datasets available were included here.
Table S2. Comparison of histone marks between sex and XCI status. See additional files. The first sheet shows BH adjusted p-values comparing female vs male and escape genes vs those subject to XCI per mark in CEMT with our meta-status and ** XCI by ** or subject to XCI for each gene.
Table S6. The differences in epigenetic marks between samples with opposite XCI statuses at genes found variably escaping XCI by ** XCI. Those found significant in Table S5 are bolded. Genes with multiple transcripts are included multiple times, even if they share a TSS.
Table S7. Adjusted p-values comparing marks in females between genes found subject to XCI vs escaping XCI by DNAme. Those in bold are significant (adjusted p-value<0.01).
Table S8. Distribution summary for DNAme per read. The number is what proportion of reads in each bin were below 25%, between 33 and 66% or over 75% DNAme.
Table S9. The accuracy of simple models predicting XCI status from a single histone mark. These accuracies are low because the models overpredicted variable escape from XCI as there is large overlap between the two XCI statuses.
Table S10. The accuracy of random forest models predicting XCI status from a single histone mark. This is the combined accuracy using the consensus of 20 models trained with each mark.
Table S11. XCI status calls made using a random forest epigenetic predictor, split by presence or absence of a CpG island and expression. The threshold used to split low from high expression is a median of 0.1 RPKM across samples. Inconsistent predictions had over a third of samples with fewer than 15 of the 20 models trained agree on an XCI status.
Table S12. The percent of genes found variably escaping by our epigenetic predictor with significant differences in various epigenetic marks. Genes were counted as significant if BH corrected p-values were less than 0.01 when using t tests to compare samples predicted as subject to XCI to samples predicted as escaping from XCI. The total number of genes row shows the total number of genes in each category. The variable escape across tissues and TSSs categories have 2 columns each, the left column being the percent of variably escaping genes with significant differences between tissues/TSSs and the right column being the percent of all genes on the X with differences between tissues/TSSs. Highlighted in blue are marks that were significantly more likely to have significant differences between tissues/TSSs at genes predicted to variably escape than in all X linked genes (Chi-square adjusted p-value<0.01).
Table S13. The percent of genes found variably escaping by our epigenetic predictor with significant differences in various epigenetic marks across various variable escape thresholds. Variable escape threshold is the number of samples with each XCI status (escaping from XCI and subject to XCI) that were required in order to call a gene as variably escaping from XCI across samples. Genes were counted as significant if BH corrected p-values were less than 0.01 when comparing samples predicted as subject to XCI to samples predicted as escaping from XCI.
Table S14. Comparing XCI status calls made by an epigenetic predictor in the CEMT dataset vs a similar model in the CREST dataset.
Table S15. The percent of genes found variably escaping by our epigenetic predictor in the CREST dataset with significant differences in various epigenetic marks. Genes were counted as significant if BH corrected p-values were less than 0.01 when using t tests to compare samples predicted as subject to XCI to samples predicted as escaping from XCI. The total number of genes row shows the total number of genes in each category. The variable escape across tissues and TSSs categories have 2 columns each, the left column being the percent of variably escaping genes with significant differences between tissues/TSSs and the right column being the percent of all genes on the X with differences between tissues/TSSs. Highlighted in blue are marks that were significantly more likely to have significant differences between tissues/TSSs at genes predicted to variably escape than in all X linked genes (Chi-square adjusted p-value<0.01).
Table S16. Top 100 results from an analysis associating XCI status with genotype. See additional files. There are separate sheets for association with ** or subject to XCI, with O being the ratio of these two columns and P being the reciprocal of O if it is less than 1, to make comparison easier. This enrichment column (col P) shows enrichment of reference allele at samples with one XCI status over the other. For the DNAme allChr sheet we have also included a column showing the attributable risk per allele.
Table S17. The number of loci associated with each gene and genes associated with each locus. See additional files. These are for the association between DNAme based XCI status and genetic polymorphisms.
Table S18. DNAmeQTL analysis for the loci significantly associated with DNAme-based XCI status calls. See additional files. These loci were independently tested as DNAmeQTLs in females and males, with some columns color coded based on sex (pink female, light blue male). There are also columns with the median and mean DNAme value at the gene’s island for samples with the reference or alternate allele at that locus; these columns are color coded based on whether the allele is in the range to escape from XCI (DNAme<0.01, blue) or in the range to be subject to XCI (DNAme>0.15, orange). There are mean and median columns for both males and females, but only the female columns are color coded based on XCI status. There are boxes around the genes with female median values with one allele in the range to escape XCI and the other allele in the range to be subject to XCI.
Figure S1. log2(** to genes that escape from or are subject to XCI. Enhancers are split by whether they are located within a gene (genic) or not (intergenic).
Figure S4. Expression across exons for genes with significantly different expression in samples with opposite XCI statuses. XCI status per sample was determined here using ** vs subject to XCI at variably escaping genes called using DNAme. For most of these marks, the region 500bp upstream of the promoter is used, except for H3K36me3 which uses the gene body. The median value per gene in samples found subject to XCI was subtracted from the median value per gene in samples which escaped from XCI. This is done here for all genes found variably escaping across individuals by DNAme.
Figure S6. IGV view of DNAme bigwig tracks at two variably escaping genes. a) A view of the CpG island at CITED1. b) A view of the CpG island at NAA10. A broad representation of samples was sought, some hypomethylated, some hypermethylated and some inconsistent across the CpG island. Broad hypermethylation in males at these genes was rare but is included here as an example of an extreme.
Figure S7. Average DNAme difference between adjacent CpGs per CpG island. Each point is the average DNAme difference between adjacent CpGs for an individual island, averaged again across samples. Islands are colored by the meta-status of the closest TSS within 2kb. Chr7 was chosen as an autosomal control to show whether the differences are X specific. Males and females from CEMT were used to check for sex specificity and females from CREST were included to check for cancer specificity.
Figure S8. ROC for predictive models trained with each epigenetic mark. On display is one random forest model trained per sample with one epigenetic mark as its input, along with the median value of the mark in similar males. Samples are colored by tissue. The all category is for a predictor using all 6 histone marks and DNAme. Black diagonal lines were added to ease comparison between figures.
Figure S9. Accuracy when models trained in one sample are tested on other models.
Figure S10. Comparing XIST expression to the number of escape genes predicted per sample. Predictions were made using a random forest model with all histone marks and DNAme.
Figure S11. Which marks were significantly different between samples predicted as escaping vs subject to XCI in a variably escaping region. Transcript ID is the order that the transcripts are located along the chromosome. There are multiple transcripts per gene but they may be sharing the same TSS and have the same data for all marks but H3K36me3. Vertical lines are drawn denoting which transcripts belong with each gene.
Additional file 2:
Table S1. List of samples used. See additional files. For CEMT samples, tissue was annotated to combine samples from related areas. Columns D through L refer to the availability of each dataset for each sample. Patient health status and sample disease are annotations made by CEMT. CREST samples were only used for the epigenetic predictor, and only samples with all datasets available were included here.
Additional file 3:
Table S2. Comparison of histone marks between sexes and XCI status. See additional files. The first sheet shows BH-adjusted p-values comparing female vs male and escape genes vs those subject to XCI per mark in CEMT with our meta-status and ** or subject to XCI, with O being the ratio of these two columns and P being the reciprocal of O if it is less than 1, to make comparison easier. This enrichment column (col P) shows enrichment of the reference allele in samples with one XCI status over the other. For the DNAme allChr sheet we have also included a column showing the attributable risk per allele.
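The relationship between columns O and P described above can be sketched in a few lines of Python. This is only an illustration: the function name and the two input values are hypothetical, the identity of the two source columns is not recoverable from the text here, and keeping O unchanged when it is 1 or greater is an assumption.

```python
from typing import Tuple


def enrichment_columns(first: float, second: float) -> Tuple[float, float]:
    """Return (O, P) for two positive column values.

    O is the ratio of the two columns. P is the reciprocal of O when O is
    below 1, so that P is always >= 1 and enrichment in either direction
    is shown on the same scale. Leaving O unchanged when it is >= 1 is an
    assumption, not something stated in the table description.
    """
    if first <= 0 or second <= 0:
        raise ValueError("both column values must be positive")
    ratio = first / second                            # column O
    folded = 1.0 / ratio if ratio < 1.0 else ratio    # column P
    return ratio, folded


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(enrichment_columns(1.0, 4.0))
    print(enrichment_columns(3.0, 2.0))
```

Folding the ratio this way loses the direction of the enrichment, so column O still has to be consulted to see which XCI status the reference allele favours.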
Additional file 6:
Table S17. The number of loci associated with each gene and the number of genes associated with each locus. See additional files. These are for the association between DNAme-based XCI status and genetic polymorphisms.
Additional file 7:
Table S18. DNAmeQTL analysis for the loci significantly associated with DNAme-based XCI status calls. See additional files. These loci were independently tested as DNAmeQTLs in females and males, with some columns color-coded based on sex (pink female, light blue male). There are also columns with the median and mean DNAme value at the gene’s island for samples with the reference or alternate allele at that locus; these columns are color-coded based on whether the allele is in the range to escape from XCI (DNAme<0.01, blue) or in the range to be subject to XCI (DNAme>0.15, orange). There are mean and median columns for both males and females, but only the female columns are color-coded based on XCI status. There are boxes around the genes with female median values with one allele in the range to escape XCI and the other allele in the range to be subject to XCI.
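The color coding and boxing rules described for Table S18 can be sketched as follows. The two thresholds are taken from the description above; the function names, the "intermediate" label for values between the thresholds, and the example values are hypothetical and only serve the illustration.

```python
ESCAPE_MAX = 0.01    # DNAme below this is in the range to escape from XCI (blue)
SUBJECT_MIN = 0.15   # DNAme above this is in the range to be subject to XCI (orange)


def dname_range(value: float) -> str:
    """Classify a DNAme value at a gene's CpG island by XCI range."""
    if value < ESCAPE_MAX:
        return "escape"
    if value > SUBJECT_MIN:
        return "subject"
    # Label chosen for this sketch; the table leaves such cells uncolored.
    return "intermediate"


def is_boxed(ref_female_median: float, alt_female_median: float) -> bool:
    """True when one allele is in the escape range and the other in the subject range."""
    ranges = {dname_range(ref_female_median), dname_range(alt_female_median)}
    return ranges == {"escape", "subject"}


if __name__ == "__main__":
    # Hypothetical female median DNAme values for the two alleles at one locus.
    print(dname_range(0.005), dname_range(0.30), is_boxed(0.005, 0.30))
```

Only the female medians are passed to `is_boxed`, matching the statement that male columns are not color-coded by XCI status.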
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Balaton, B.P., Brown, C.J. Contribution of genetic and epigenetic changes to escape from X-chromosome inactivation. Epigenetics & Chromatin 14, 30 (2021). https://doi.org/10.1186/s13072-021-00404-9
DOI: https://doi.org/10.1186/s13072-021-00404-9