Contribution of genetic and epigenetic changes to escape from X-chromosome inactivation

Balaton, Bradley P.; Brown, Carolyn J.

doi:10.1186/s13072-021-00404-9

Research
Open access
Published: 29 June 2021

Volume 14, article number 30 (2021)
Epigenetics & Chromatin

Abstract

Background

X-chromosome inactivation (XCI) is the epigenetic inactivation of one of two X chromosomes in XX eutherian mammals. The inactive X chromosome is the result of multiple silencing pathways that act in concert to deposit chromatin changes, including DNA methylation and histone modifications. Yet over 15% of genes escape or variably escape from inactivation and continue to be expressed from the otherwise inactive X chromosome. To the extent that they have been studied, epigenetic marks correlate with this expression.

Results

Using publicly available data, we compared XCI status calls with DNA methylation, H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3 and H3K36me3. At genes subject to XCI, we found heterochromatic marks enriched and euchromatic marks depleted on the inactive X compared with the active X. Genes escaping XCI were more similar between the active and inactive X. Using sample-specific XCI status calls, we found that some marks differed significantly with variable XCI status, but which marks were significant was not consistent between genes. A model trained to predict XCI status from these epigenetic marks obtained over 75% accuracy for genes escaping XCI and over 90% for genes subject to XCI. This model made novel XCI status calls for genes lacking the allelic differences or CpG islands required by other methods. Examining these calls across a domain of variably escaping genes, we saw XCI status vary between individual genes rather than at the domain level. Lastly, we compared XCI status calls to genetic polymorphisms, finding multiple loci associated with XCI status changes at variably escaping genes, but none individually sufficient to induce an XCI status change.

Conclusion

The control of expression from the inactive X chromosome is multifaceted, but ultimately regulated at the level of the individual gene, with a detectable but limited impact of distant polymorphisms. On the inactive X, euchromatic marks are depleted and heterochromatic marks are enriched at silenced genes. Genes escaping inactivation show a less significant enrichment of heterochromatic marks and depletion of H3K27ac. Combining all examined marks improved XCI status prediction, particularly for genes without CpG islands or polymorphisms, as no single mark consistently distinguishes silenced from expressed genes.

Introduction

In eutherian mammals, one of the two X chromosomes (X) in XX females is epigenetically inactivated to achieve dosage compensation with XY males, through a process known as X-chromosome inactivation (XCI) (reviewed in [1]). This inactivation is incomplete: approximately 12% of genes consistently escape from XCI in humans [2], here defined as having at least 10% expression from the inactive X (** from XCI [2]. The short arm of the X near PAR1 is enriched in genes escaping from XCI, while the long arm, which contains XIST (the gene responsible for initiating XCI), is enriched in genes subject to XCI [3]. Genes escaping from XCI are often clustered together, with some convergence with topologically associating domains (TADs) [9]. In addition to genes that consistently escape from XCI (sometimes called constitutive escape), a further 8% of genes have been found to vary in XCI status between tissues or individuals (termed variable or facultative escape) [2] (reviewed in [5]), and another 7% of genes had calls that were discordant between the studies identifying them [2]. Variably escaping and discordant genes were enriched at boundaries between clusters of genes with opposite XCI statuses [2]. The factors determining XCI status remain unresolved. The evidence above suggests regional control, but there are also lone genes that escape XCI while flanked by genes subject to XCI [2], and even genes with two transcription start sites (TSSs) with opposite XCI statuses [10, 11]. Furthermore, such solo escape genes can recapitulate escape when integrated elsewhere on the X [12, 13].

Many methods have been used to identify which genes escape from XCI (reviewed in [14]). The gold-standard approach is to compare expression levels between the ** from XCI, while inactive marks such as H3K9me3, H4K20me3, H3K27me3 and macroH2A are enriched at genes subject to XCI [22, 23] (reviewed in [14]). A predictive model using many epigenetic as well as genetic features in mice predicted a gene's XCI status accurately 78% of the time [24], and in humans a model obtained over 80% accuracy using only genomic repeats [25]. These and additional studies have found L1 repeats enriched near genes subject to XCI, while Alu elements are more frequent at genes escaping XCI [25,26,27,29]. Another study found many genes where ** Technologies (CEMT) as these samples were derived from cancer and thus were anticipated to have a high frequency of skewed XCI, allowing us to use allelic expression to determine XCI status in each sample [11]. Because cancer is known to involve epigenetic changes, we additionally examined data from Core Research for Evolutional Science and Technology (CREST), another group within IHEC, to determine whether any trends observed in the CEMT data were due to the samples being cancer-derived. However, the CREST samples had lower sequencing depth and fewer females (only nine), and could only be examined for DNAme and histone marks. Samples are listed in Additional file 2: Table S1. In our analyses, genes in the PAR were not grouped with genes escaping from XCI, as they may be epigenetically distinct, especially when comparisons with males are included.

Histone marks differ with sex and XCI status

We compared the levels of histone modifications with sex and with published XCI status calls derived from a synthesis of various approaches (hereafter referred to as meta-status) [2]. We used levels within 500 bp upstream of a gene's TSS, except for H3K36me3, which is associated with gene bodies and so was examined at exons [32], and H3K4me1, which is associated with enhancers and so was examined at annotated enhancer sites [33]. Most marks differed significantly (p value < 0.01) in median level per transcript between males and females, at genes both escaping and subject to XCI, in both datasets (Fig. 1a, Additional file 3: Table S2). Fewer marks differed significantly between genes escaping XCI and those subject to XCI within each sex. The euchromatic marks (H3K4me3, H3K27ac and H3K36me3) differed significantly between transcripts subject to XCI and those escaping from XCI in both CEMT and CREST females, whereas the heterochromatic marks (H3K9me3 and H3K27me3) differed significantly only within the CREST dataset. Comparing XCI statuses within males gave the fewest significantly different marks, as expected. Overall, the X chromosomes of males and females differ in both heterochromatic and euchromatic marks, and the observable differences between XCI statuses implicate inactivation-related differences in addition to copy number (XX or XY) differences.

Discussion

XCI is a classic paradigm for studying epigenetic regulation, yet how some genes resist silencing (or the maintenance of silencing) and escape XCI remains unresolved. Here, we have examined the genetic and epigenetic differences between genes escaping and those subject to XCI. Overall, epigenetic marks differed more between males and females than between genes escaping vs subject to XCI, suggesting an influence of the ** XCI have similar epigenetic marks between the ** XCI may be why escape genes can have as little as 10% expression from the ** XCI could also contribute to lower expression from the ** from those subject to XCI, while the heterochromatic mark H3K27me3 had the largest **, variably escaping or subject to XCI across our DNAme analyses as our previous ** and variably escaping from XCI. A large proportion of the additional genes found subject to XCI by our epigenetic predictor may in fact be silenced on both the Xa and ** from XCI and the number of epigenetic marks that were significant in at least one gene, but decreased the percentage of genes significant for DNAme that was the only mark ever significant for over 50% of genes in a dataset.

We observed that variable escape from XCI was regulated at the level of single genes, with adjacent genes varying their XCI status independently. In contrast, a study in mice found clusters of genes that variably escape across three cell lines, with adjacent genes often sharing the same XCI status across lines [9]. These clusters also colocalized with TADs, with one line having the majority of a TAD escaping XCI and another line having only part of it escaping. An interesting candidate regulator of regional control is SMCHD1. In mice with SMCHD1 knocked out, regions enriched in variably escaping genes were upregulated, while genes that constitutively escaped from XCI were not affected; however, no impact on variable escape genes was seen in human patients with heterozygous SMCHD1 mutations [36]. Nonetheless, another study found variants with low expression of SMCHD1, ZSCAN9 and HBG2/TRIM6 associated with hypomethylation of X-linked CpG islands, with affected islands enriched near genes that variably escape from XCI [29]. Additionally, individual genes can be susceptible to reactivation under certain conditions: some genes rely on XIST expression and H3K27me3 deacetylation to remain silent, while others remain silenced when XIST expression is disrupted [47]. This is also consistent with our observation that variably escaping genes did not show consistent epigenetic differences between samples that escaped XCI and those that were subject to XCI. Overall, there is evidence for both domain-level and gene-specific regulation of escape. We suggest that for some domains the former predominates, while for other genes the latter predominates. Additionally, the domain featured in Fig. 4 (and other variably escaping domains) is at a threshold where individual genes within the domain can have either XCI status based on local factors.

We thus asked whether variable escape from XCI could be controlled by local sequence variants. Here, we found an association between numerous genetic variants and sample-specific XCI status at variably escaping genes. However, we did not find any local genetic effect, as none of the loci were within 5 Mb of the affected genes and only 10 of 610 significant loci were located on the X. None of the SNPs we identified were completely correlated with a gene's XCI status, so other factors must be involved. Additionally, all of the significant loci we found were based on DNAme-derived XCI status calls. These loci could have been affecting DNAme alone rather than XCI status; however, 38 of the 610 significant loci were female-specific DNAmeQTLs, while only one locus was a significant DNAmeQTL in males. With more samples with skewed XCI, we may have found loci associated with our ** from XCI in the CEMT cancer dataset as in the healthy CREST dataset. Nonetheless, we used the CEMT dataset because it had a standardized set of epigenetic marks across many samples, and the clonality of cancer allowed us to examine expression and DNAme allelically. Other datasets we examined did not always have all the marks from the same samples, lacked females or sex labels, or had mislabeled sex.

The use of different methods and sample sizes to call XCI status can result in discordant calls, generally because one approach calls a gene variably escaping while other studies do not. A previous meta-analysis found discordant calls between studies for 7% of genes [2]. Many of these discordances between studies may be due to the different samples and tissues used, but here we see differences in XCI status called using different approaches on the same samples. Genes could be falsely called as subject to XCI in the ** XCI tended to have equal levels of marks on the Xa and ** vs subject to XCI at variably escaping genes, but which marks were significant was not consistent between genes and no mark was significant across all of the variably escaping genes, likely reflecting that variably escaping genes are regulated in multiple ways. DNAme intermediate between the levels expected for genes escaping vs subject to XCI is enriched at variably escaping genes and is mostly due to inconsistent DNAme on the ** genes were seen to regulate their XCI status independently from each other, suggesting local regulatory elements. Additionally, we searched for polymorphisms that may control variable escape from XCI and found non-syntenic loci, some with a strong correlation, but none were completely correlated, further suggesting complex regulation. Overall, we see that escape from XCI is influenced both by local regulatory elements and by trans-acting factors and chromatin modifications that can be independent of each other. Understanding how genes escape from XCI will further our understanding of epigenetics in general and may allow us to control which genes escape from XCI and to rescue X-linked mutations in females.

Methods

Previous XCI status calls

We used XCI meta-status calls from [2] for all comparisons with past XCI statuses and to train our models. Genes that escape and mostly escape were combined due to the small size of these categories, with genes in PAR1 either left out or given their own separate category, depending on the analysis. Genes that were mostly subject to XCI were combined with genes subject to XCI for comparisons between studies, but were left out when training models. Genes annotated as variably escaping, mostly variably escaping and discordant across studies were combined as variably escaping genes for comparisons here.
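The category merging above amounts to a small lookup table. The sketch below is a minimal illustration, assuming hypothetical label strings (the exact meta-status vocabulary in [2] may be spelled differently) and returning a separate "PAR1" label that a caller can keep or drop depending on the analysis.

```python
from typing import Optional

# Hypothetical label strings standing in for the meta-status categories of [2].
COLLAPSED = {
    "escape": "escape",
    "mostly escape": "escape",
    "subject": "subject",
    "mostly subject": "subject",
    "variable escape": "variable",
    "mostly variable escape": "variable",
    "discordant": "variable",
}

# Only genes known to escape or be subject to XCI are used for training;
# mostly-subject and variable/discordant genes are excluded.
TRAINING_LABELS = {"escape", "mostly escape", "subject"}


def collapse_status(meta_status: str, in_par1: bool = False,
                    for_training: bool = False) -> Optional[str]:
    """Map a meta-status label onto the merged categories.

    Returns None for genes excluded from the current analysis.
    """
    if in_par1:
        # PAR1 genes get their own label; the caller decides whether to drop them.
        return "PAR1"
    if for_training and meta_status not in TRAINING_LABELS:
        return None
    return COLLAPSED.get(meta_status)
```

Returning None rather than raising lets a caller filter a gene list with a single comprehension.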

Histone ChIP-seq analysis

Histone ChIP-seq bigwig files were downloaded from the IHEC data portal [38], and their mean signal was quantified with bigWigAverageOverBed [39] over the 500 bp upstream of TSSs as annotated by Gencode [40]. We normalized the data across samples by scaling each sample to the same total depth (including all chromosomes). The ** genes, requiring at least two samples with each XCI status. This narrows the number of variably escaping genes and increases the chance that those found would have enough samples to reach significance. The overall expression level of genes was calculated using bigwig files downloaded from the CEEHRC data portal [42] and quantified as RPKM using VisRseq [43].
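Two details of this quantification are easy to get wrong: "upstream" depends on strand, and depth normalization is a per-sample multiplier. The following is a minimal sketch under stated assumptions: TSS coordinates are 0-based, the TSS base itself is excluded from the window, and the common depth target is the mean total depth across samples (the text does not specify the target). The `Transcript` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Transcript:
    chrom: str
    tss: int      # 0-based coordinate of the first transcribed base (assumed)
    strand: str   # "+" or "-"
    name: str


def upstream_window(t: Transcript, size: int = 500) -> Tuple[str, int, int, str]:
    """BED-style (0-based, half-open) interval covering `size` bp upstream of the TSS.

    Upstream means lower coordinates on the + strand and higher coordinates
    on the - strand; the TSS base itself is excluded.
    """
    if t.strand == "+":
        return (t.chrom, max(0, t.tss - size), t.tss, t.name)
    if t.strand == "-":
        return (t.chrom, t.tss + 1, t.tss + 1 + size, t.name)
    raise ValueError(f"unknown strand {t.strand!r}")


def depth_scale_factors(total_depth: Dict[str, float]) -> Dict[str, float]:
    """Per-sample multipliers that bring every sample to a common total depth.

    Using the mean depth across samples as the target is an assumption.
    """
    target = sum(total_depth.values()) / len(total_depth)
    return {sample: target / depth for sample, depth in total_depth.items()}


def bed_line(interval: Tuple[str, int, int, str]) -> str:
    # bigWigAverageOverBed reads a region name from the fourth BED column.
    return "\t".join(str(x) for x in interval)
```

A region's normalized level would then be its mean signal multiplied by the factor for its sample.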

DNAme analysis

WGBS bigwig files were downloaded from the IHEC data portal [38] and quantified with bigWigAverageOverBed [39] over the 500 bp upstream of TSSs as annotated by Gencode [40]. DNAme thresholds established in [11] were used to determine which genes were escaping XCI and which were subject to XCI: DNAme < 10%, escapes XCI; 15% < DNAme < 60%, subject to XCI; DNAme > 60%, hypermethylated. A threshold of DNAme < 15% in males was used to filter out TSSs that were methylated on the Xa and therefore not informative for this analysis. To measure differences between adjacent CpGs, we converted bigWig files to bedGraphs and, for each island, used R to compute the mean absolute difference between each pair of adjacent CpGs.
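The threshold rules and the adjacent-CpG statistic can be written as two small functions. This is a minimal sketch: percent values are assumed as input, and the "unclassified" label for values falling between the stated ranges is introduced here for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence


def dname_xci_call(female_pct: float, male_pct: float) -> Optional[str]:
    """Classify a promoter from percent DNAme using the thresholds above.

    Returns None when the male level shows the TSS is methylated on the Xa.
    Values outside the stated ranges (10% to 15% inclusive, or exactly 60%)
    get the illustrative label "unclassified".
    """
    if male_pct >= 15:
        return None
    if female_pct < 10:
        return "escape"
    if 15 < female_pct < 60:
        return "subject"
    if female_pct > 60:
        return "hypermethylated"
    return "unclassified"


def mean_adjacent_abs_diff(cpg_levels: Sequence[float]) -> float:
    """Mean absolute difference between neighbouring CpGs in one island.

    `cpg_levels` must be ordered by genomic position.
    """
    if len(cpg_levels) < 2:
        raise ValueError("need at least two CpGs")
    return mean(abs(b - a) for a, b in zip(cpg_levels, cpg_levels[1:]))
```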

DNAme per read was calculated by downloading WGBS bam files and using a script to count the number of methylated and unmethylated CG dinucleotides per read within CpG islands within 2 kb of TSSs. For allelic DNAme, we did similarly but only examined reads that overlapped heterozygous SNPs identified in our ** from XCI and both alleles higher than 0.75 being called as hypermethylated. Polymorphisms with one allele above 0.75 and the other allele below 0.25 were called as subject to XCI. The DNAme per read per polymorphism was binned as above, but instead of using the mean DNAme across all reads, we determined the mean DNAme per allele and used the mean of those two values; this was done so that we obtain the mean between the ** XCI were often within two standard deviations of each other, the average of these two means was often used as a threshold instead.
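The per-read and per-allele logic can be sketched as follows. Several assumptions are made and should be read as such: reads carry a Bismark-style methylation call string (`Z` for a methylated CpG, `z` for an unmethylated CpG), the read bins reuse the ranges summarized in Table S8, and only the two allelic criteria stated in the text are implemented, because the escape criterion is not shown in this section. Function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def cpg_fraction(xm: str) -> Optional[float]:
    """Fraction of methylated CpGs on one read.

    Assumes a Bismark-style call string: 'Z' = methylated CpG,
    'z' = unmethylated CpG; all other characters are ignored.
    """
    meth, unmeth = xm.count("Z"), xm.count("z")
    total = meth + unmeth
    return meth / total if total else None


def bin_read(frac: float) -> Optional[str]:
    """Bins matching the ranges summarized in Table S8; gaps return None."""
    if frac < 0.25:
        return "low"
    if 0.33 <= frac <= 0.66:
        return "intermediate"
    if frac > 0.75:
        return "high"
    return None


def allele_call(allele_a: Sequence[float],
                allele_b: Sequence[float]) -> Tuple[str, float]:
    """Call a polymorphism from per-read DNAme fractions on each allele.

    Only the criteria stated in the text are implemented; anything else is
    "other". Also returns the mean of the two allele means, so both alleles
    weigh equally regardless of how many reads cover each.
    """
    ma, mb = mean(allele_a), mean(allele_b)
    if ma > 0.75 and mb > 0.75:
        call = "hypermethylated"
    elif (ma > 0.75 and mb < 0.25) or (mb > 0.75 and ma < 0.25):
        call = "subject"
    else:
        call = "other"
    return call, (ma + mb) / 2
```

Averaging the allele means instead of all reads keeps a deeply sequenced allele from dominating the per-polymorphism value.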

For our random forest models, we wanted to include both male and female data, but breast did not have any male data, so we used the kmeans function in R to cluster all of our samples based on autosomal levels of all seven epigenetic marks used herein. With three clusters, we had multiple male and female samples in each cluster. As input for our models, we used individual female data per sample, matched with the mean values per gene across males in the same cluster.
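The clustering-and-matching step is sketched below in plain Python. This is a stand-in, not the authors' code: it uses Lloyd's k-means, whereas R's `kmeans()` defaults to the Hartigan-Wong algorithm, so cluster assignments may differ on difficult data. The sample dictionary keys (`id`, `sex`, `x_values`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means on autosomal feature vectors; one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pair_with_male_means(samples: List[dict], labels: List[int]
                         ) -> List[Tuple[str, Dict[str, float], Dict[str, float]]]:
    """Match each female with per-gene means over the males in her cluster."""
    out = []
    for s, lab in zip(samples, labels):
        if s["sex"] != "F":
            continue
        males = [m for m, ml in zip(samples, labels) if m["sex"] == "M" and ml == lab]
        if not males:
            continue  # no male reference available in this cluster
        genes = set.intersection(*(set(m["x_values"]) for m in males))
        ref = {g: sum(m["x_values"][g] for m in males) / len(males) for g in genes}
        out.append((s["id"], s["x_values"], ref))
    return out
```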

Random forest models were trained using the R package caret [35] with the trainControl method cv and the train method rf. We trained the model on genes known to escape or be subject to XCI [2]. The training metric was ROC, tuneLength was 5 and ntree was 1500. Three genes escaping and subject to XCI were left out of the training set and used to check the accuracy of overall calls. We trained twenty models per sample, with each model trained on a random sample of 75% of the genes escaping XCI and twice as many genes subject to XCI, with each iteration of the model using 75% of the number of input escaping genes. Accuracy per model was tested on the remaining genes with known XCI status. Genes were considered escaping or subject to XCI if 15 or more of the 20 models predicted them as escaping or subject to XCI, respectively. Separate categories, leaning subject and leaning escape, were made for genes where only 12–14 of the models agreed on the gene's XCI status. Overall calls across samples were made as follows: genes with 66% or more of samples agreeing on a gene's XCI status were called subject to or escaping from XCI; genes with at least 33% of all samples having each XCI status were called variably escaping from XCI; and genes that required the leaning categories to reach 66% of samples having a status were annotated with the corresponding leaning status.
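The training-set sampling and the two levels of consensus calling can be sketched as plain counting rules. This is an illustration under assumptions, not the caret pipeline itself: the rules are checked in the order listed in the text (which matters, since a gene can meet both the 66% and the 33% criteria), the variable rule counts only firm per-sample calls, and the "no call" label is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def sample_training_set(escape: Sequence[str], subject: Sequence[str],
                        rng: random.Random, frac: float = 0.75
                        ) -> Tuple[List[str], List[str]]:
    """Draw 75% of escape genes plus twice as many subject genes (capped at pool size)."""
    n_esc = int(frac * len(escape))
    esc = rng.sample(list(escape), n_esc)
    sub = rng.sample(list(subject), min(2 * n_esc, len(subject)))
    return esc, sub


def per_sample_call(votes: Sequence[str], firm: int = 15, lean: int = 12) -> str:
    """Combine the 'escape'/'subject' votes of the 20 models for one gene in one sample."""
    counts = Counter(votes)
    for status in ("escape", "subject"):
        if counts[status] >= firm:
            return status
    for status in ("escape", "subject"):
        if counts[status] >= lean:
            return "leaning " + status
    return "no call"


def overall_call(sample_calls: Sequence[str], agree: float = 0.66,
                 each: float = 0.33) -> str:
    """Combine per-sample calls for one gene, checking rules in the order listed."""
    n = len(sample_calls)
    counts = Counter(sample_calls)

    def frac(*labels: str) -> float:
        return sum(counts[label] for label in labels) / n

    for status in ("escape", "subject"):
        if frac(status) >= agree:
            return status
    if frac("escape") >= each and frac("subject") >= each:
        return "variable"
    for status in ("escape", "subject"):
        if frac(status, "leaning " + status) >= agree:
            return "leaning " + status
    return "no call"
```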

Statistical comparisons

All statistical comparisons were done in R [44]. Most were t-tests with a Benjamini–Hochberg (BH) multiple testing correction [45], with results deemed significant if they had an adjusted p value < 0.01. The one test with a different threshold was for comparing genes variably escaping XCI as determined by ** from XCI, assuming the reason for this is that they did not have skewed XCI.
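For readers reproducing the correction outside R, the BH adjustment can be computed as below. This sketch follows the same definition as R's `p.adjust(method = "BH")`: walking from the largest p value down, each adjusted value is the running minimum of p × m / rank, capped at 1. The raw p values would come from the t-tests, which are not reimplemented here.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0  # also caps every adjusted value at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def significant(pvals: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Flag tests whose BH-adjusted p value falls below the 0.01 threshold used here."""
    return [q < alpha for q in bh_adjust(pvals)]
```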

For the TCGA DNAme-based XCI status calls, we downloaded methylation beta-values from the Genomic Data Commons data portal for female and male samples from the TCGA dataset. Probes were removed if the average male DNAme was over 15%, and female samples were removed if their average DNAme was more than two standard deviations below the female average, as we presume that they were mislabeled males or had lost their ** gene's XCI status (Chi-square test, significant if BH adjusted p value < 0.01). We tested against all SNPs on the array, and again with just the SNPs on the X. For samples with multiple SNP array datasets, we used a consensus allele across all of the arrays. We did not include heterozygous samples, as we were testing for a cis-effect and had no way of knowing which allele was on the Xi. DNAmeQTLs were examined by using the lm function in R to fit a linear model for every combination of SNP and CpG island.
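The TCGA filtering rules and the per-SNP association test can be sketched as follows. Assumptions: beta values are on a 0 to 1 scale, the female filter uses the sample standard deviation, and the test is a 2 × 2 table of homozygous genotype by XCI status (heterozygotes excluded, as above), with a Yates continuity correction applied in the way R's `chisq.test` does by default for 2 × 2 tables. All function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Set, Tuple


def keep_probes(male_betas: Dict[str, List[float]], max_male: float = 0.15) -> Set[str]:
    """Drop probes whose mean male beta value is above 15% (methylated on the Xa)."""
    return {probe for probe, betas in male_betas.items() if mean(betas) <= max_male}


def keep_females(female_means: Dict[str, float], n_sd: float = 2.0) -> Set[str]:
    """Drop females whose average DNAme is more than two SDs below the female average."""
    values = list(female_means.values())
    cutoff = mean(values) - n_sd * stdev(values)  # sample SD: an assumption
    return {s for s, v in female_means.items() if v >= cutoff}


def chi2_2x2(table: Tuple[Tuple[int, int], Tuple[int, int]],
             correct: bool = True) -> Tuple[float, float]:
    """Pearson chi-square test (df = 1): rows are the two homozygous genotypes,
    columns are escape and subject. Returns (statistic, p value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column total is zero")
    observed = ((a, b), (c, d))
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    # |O - E| is identical in every cell of a 2 x 2 table.
    yates = min(0.5, abs(a - expected[0][0])) if correct else 0.0
    stat = sum((abs(observed[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

The resulting p values across all SNPs would then go through the BH correction before applying the 0.01 threshold.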

Availability of data and material

See references for data sources. All are publicly available.

Abbreviations

450k array:: Illumina Infinium Human Methylation450 BeadChip array
BH:: Benjamini–Hochberg
CEMT:: Center for Epigenome Mapping Technologies
ChIP-seq:: Chromatin immunoprecipitation sequencing
CREST:: Core Research for Evolutional Science and Technology
DNAme:: DNA methylation
DNAmeQTL:: DNA methylation quantitative trait loci
IHEC:: International Human Epigenome Consortium
meta-status:: XCI status calls from Balaton et al., 2015
PAR:: Pseudo-autosomal region
RNA-seq:: RNA sequencing
SNP6:: Affymetrix Genome-Wide Human SNP Array 6.0
TAD:: Topologically associating domain
TCGA:: The Cancer Genome Atlas
TSS:: Transcription start site
WGBS:: Whole genome bisulfite sequencing
X:: X chromosome
Xa:: Active X
XCI:: X-chromosome inactivation
Xi:: Inactive X

References

1. Balaton BP, Dixon-McDougall T, Peeters SB, Brown CJ. The eXceptional nature of the X chromosome. Hum Mol Genet. 2018;27:R242–49.
2. Balaton BP, Cotton AM, Brown CJ. Derivation of consensus inactivation status for X-linked genes from genome-wide studies. Biol Sex Differ. 2015;6:35.
3. Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434:400–4.
4. Dunford A, Weinstock DM, Savova V, Schumacher SE, Cleary JP, Yoda A, et al. Tumor-suppressor genes that escape from X-inactivation contribute to cancer sex bias. Nat Genet. 2017;49:10–6.
5. Navarro-Cobos MJ, Balaton BP, Brown CJ. Genes that escape from X-chromosome inactivation: potential contributors to Klinefelter syndrome. Am J Med Genet C Semin Med Genet. 2020;184:226–38.
6. Tukiainen T, Villani A-C, Yen A, Rivas MA, Marshall JL, Satija R, et al. Landscape of X chromosome inactivation across human tissues. Nature. 2017;550:244–8.
7. Godfrey AK, Naqvi S, Chmátal L, Chick JM, Mitchell RN, Gygi SP, et al. Quantitative analysis of Y-Chromosome gene expression across 36 human tissues. Genome Res. 2020;30:860–73.
8. Helena Mangs A, Morris BJ. The human pseudoautosomal region (PAR): origin, function and future. Curr Genomics. 2007;8:129–36.
9. Marks H, Kerstens HHD, Barakat TS, Splinter E, Dirks RAM, van Mierlo G, et al. Dynamics of gene silencing during X inactivation using allele-specific RNA-seq. Genome Biol. 2015;16:149.
10. Goto Y, Kimura H. Inactive X chromosome-specific histone H3 modifications and CpG hypomethylation flank a chromatin boundary between an X-inactivated and an escape gene. Nucleic Acids Res. 2009;37:7416–28.
11. Balaton BP, Fornes O, Wasserman WW, Brown CJ. Cross-species examination of X-chromosome inactivation highlights domains of escape from silencing. Epigenetics Chromatin. 2021. https://doi.org/10.1186/s13072-021-00386-8.
12. Horvath LM, Li N, Carrel L. Deletion of an X-inactivation boundary disrupts adjacent gene silencing. PLoS Genet. 2013;9:e1003952.
13. Peeters SB, Korecki AJ, Simpson EM, Brown CJ. Human cis-acting elements regulating escape from X-chromosome inactivation function in mouse. Hum Mol Genet. 2018;27:1252–62.
14. Balaton BP, Brown CJ. Escape artists of the X chromosome. Trends Genet. 2016;32:348–59.
15. Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, et al. Escape from X inactivation varies in mouse tissues. PLoS Genet. 2015;11:e1005079.
16. Vacca M, Della Ragione F, Scalabrì F, D'Esposito M. X inactivation and reactivation in X-linked diseases. Semin Cell Dev Biol. 2016;56:78–87.
17. Mengel-From J, Lindahl-Jacobsen R, Nygaard M, Soerensen M, Ørstavik KH, Hertz JM, et al. Skewness of X-chromosome inactivation increases with age and varies across birth cohorts in elderly Danish women. Sci Rep. 2021;11:4326.
18. Larson NB, Fogarty ZC, Larson MC, Kalli KR, Lawrenson K, Gayther S, et al. An integrative approach to assess X-chromosome inactivation using allele-specific expression with applications to epithelial ovarian cancer. Genet Epidemiol. 2017;41:898–914.
19. Moreira de Mello JC, Fernandes GR, Vibranovski MD, Pereira LV. Early X chromosome inactivation during human preimplantation development revealed by single-cell RNA-sequencing. Sci Rep. 2017;7:10794.
20. Hagen SH, Henseling F, Hennesen J, Savel H, Delahaye S, Richert L, et al. Heterogeneous escape from X chromosome inactivation results in sex differences in type I IFN responses at the single human pDC level. Cell Rep. 2020;33:108485.
21. Cotton AM, Price EM, Jones MJ, Balaton BP, Kobor MS, Brown CJ. Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Hum Mol Genet. 2015;24:1528–39.
22. Qu K, Zaba LC, Giresi PG, Li R, Longmire M, Kim YH, et al. Individuality and variation of personal regulomes in primary human T cells. Cell Syst. 2015;1:51–61.
23. Kucera KS, Reddy TE, Pauli F, Gertz J, Logan JE, Myers RM, et al. Allele-specific distribution of RNA polymerase II on female X chromosomes. Hum Mol Genet. 2011;20:3964–73.
24. Barros de Andrade e Sousa L, Jonkers I, Syx L, Dunkel I, Chaumeil J, Picard C, et al. Kinetics of Xist-induced gene silencing can be predicted from combinations of epigenetic and genomic features. Genome Res. 2019;29:1087–99.
25. Wang Z, Willard HF, Mukherjee S, Furey TS. Evidence of influence of genomic DNA sequence on human X chromosome inactivation. PLoS Comput Biol. 2006;2:e113.
26. Cotton AM, Chen C-Y, Lam LL, Wasserman WW, Kobor MS, Brown CJ. Spread of X-chromosome inactivation into autosomal sequences: role for DNA elements, chromatin features and chromosomal domains. Hum Mol Genet. 2014;23:1211–23.
27. Bailey JA, Carrel L, Chakravarti A, Eichler EE. Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc Natl Acad Sci USA. 2000;97:6634–9.
28. Loda A, Brandsma JH, Vassilev I, Servant N, Loos F, Amirnasr A, et al. Genetic and epigenetic features direct differential efficiency of Xist-mediated silencing at X-chromosomal and autosomal locations. Nat Commun. 2017;8:690.
29. Luijk R, Wu H, Ward-Caviness CK, Hannon E, Carnero-Montoro E, Min JL, et al. Autosomal genetic variation is associated with DNA methylation in regions variably escaping X-chromosome inactivation. Nat Commun. 2018;9:3738.
30. Chen B, Craiu RV, Sun L. Bayesian model averaging for the X-chromosome inactivation dilemma in genetic association study. Biostatistics. 2020;21:319–35.
31. Xu W, Hao M. A unified partial likelihood approach for X-chromosome association on time-to-event outcomes. Genet Epidemiol. 2018;42:80–94.
32. Barski A, Cuddapah S, Cui K, Roh T-Y, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–37.
33. Fishilevich S, Nudel R, Rappaport N, Hadar R, Plaschkes I, Iny Stein T, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database. 2017. https://doi.org/10.1093/database/bax028.
34. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28:1.
35. Kuhn M. caret: classification and regression training. R package version 6.0-86. 2020. https://CRAN.R-project.org/package=caret/. Accessed 25 Jan 2020.
36. Wang C-Y, Brand H, Shaw ND, Talkowski ME, Lee JT. Role of the chromosome architectural factor SMCHD1 in X-chromosome inactivation, gene regulation, and disease in humans. Genetics. 2019;213:685–703.
37. Dawson MA, Kouzarides T. Cancer epigenetics: from mechanism to therapy. Cell. 2012;150:12–27.
38. Bujold D, de Morais DA, Gauthier C, Côté C, Caron M, Kwan T, et al. The international human epigenome consortium data portal. Cell Syst. 2016;3:496–9.e2.
39. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26:2204–7.
40. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012;22:1760–74.
41. Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–5.
42. Canadian Epigenomes. 2020. http://www.epigenomes.ca/data-release/hg38/. Accessed 14 Aug 2020.
43. Younesy H, Möller T, Lorincz MC, Karimi MM, Jones SJM. VisRseq: R-based visual framework for analysis of sequencing data. BMC Bioinformatics. 2015;16(Suppl 11):S2.
44. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020. https://www.R-project.org/. Accessed 25 Jan 2020.
45. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57:289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
46. Genomic Data Commons. 2020. https://portal.gdc.cancer.gov/. Accessed 14 Aug 2020.
47. Yu B, Qi Y, Li R, Shi Q, Satpathy AT, Chang HY. B cell-based XIST complex enforces X-inactivation and restrains atypical B cells. Cell. 2021;184:7.

Acknowledgements

We thank the other members of the Brown lab for helpful comments during the development of this project. Most of the analyses conducted here used data generated by The Canadian Epigenetics, Epigenomics, Environment and Health Research Consortium (CEEHRC) initiative funded by the Canadian Institutes of Health Research (CIHR), Genome BC, and Genome Quebec. Information about CEEHRC and the participating investigators and institutions can be found at http://www.cihr-irsc.gc.ca/e/43734.html. Our genetic association studies used data generated by the TCGA Research network: https://www.cancer.gov/tcga. We would also like to thank the research groups which generated the other sources of data used in this analysis.

Funding

BPB was supported by a CGS-D award from NSERC. Research was supported by CIHR project grant (PJT-16120).

Author information

Authors and Affiliations

Department of Medical Genetics, The University of British Columbia, Vancouver, Canada
Bradley P. Balaton & Carolyn J. Brown

Contributions

BPB conducted all analyses. All authors contributed to the interpretation of data and writing of the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Carolyn J. Brown.

Ethics declarations

Ethics approval and consent to participate

Much of the expression and WGBS data analyzed were generated by The Canadian Epigenetics, Epigenomics, Environment and Health Research Consortium (CEEHRC) initiative funded by the Canadian Institutes of Health Research (CIHR), Genome BC, and Genome Quebec. Ethics approval for data access was provided by the University of British Columbia Clinical Research Ethics Board (H17-01363).

The data for our genetic analysis of XCI were generated by The Cancer Genome Atlas (TCGA) Research Group. Ethics approval for data access was provided by the University of British Columbia Clinical Research Ethics Board (H19-02018).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. List of samples used. See additional files. For CEMT samples, tissue was annotated to combine samples from related areas. Columns D through L indicate the availability of each dataset for each sample. Patient health status and sample disease are as annotated by CEMT. CREST samples were used only for the epigenetic predictor, and only samples with all datasets available were included here.

Table S2. Comparison of histone marks between sex and XCI status. See additional files. The first sheet shows BH-adjusted p-values comparing female vs male and escape genes vs those subject to XCI per mark in CEMT with our meta-status and ** XCI by ** or subject to XCI for each gene.

Table S6. The differences in epigenetic marks between samples with opposite XCI statuses at genes found variably escaping XCI by ** XCI. Those found significant in Table S5 are bolded. Genes with multiple transcripts are included multiple times, even if they share a TSS.

Table S7. Adjusted p-values comparing marks in females between genes found subject to XCI vs escaping XCI by DNAme. Those in bold are significant (adjusted p-value < 0.01).

Table S8. Distribution summary for DNAme per read. Each value is the proportion of reads in each bin with DNAme below 25%, between 33 and 66%, or over 75%.

Table S9. The accuracy of simple models predicting XCI status from a single histone mark. These accuracies are low because the models overpredicted variable escape from XCI, as there is large overlap between the two XCI statuses.

Table S10. The accuracy of random forest models predicting XCI status from a single histone mark. This is the combined accuracy using the consensus of 20 models trained with each mark.

Table S11. XCI status calls made using a random forest epigenetic predictor, split by presence or absence of a CpG island and by expression. The threshold used to split low from high expression is a median of 0.1 RPKM across samples. Predictions were called inconsistent when, in over a third of samples, fewer than 15 of the 20 trained models agreed on an XCI status.

Table S12. The percentage of genes found variably escaping by our epigenetic predictor with significant differences in various epigenetic marks. Genes were counted as significant if BH-corrected p-values were less than 0.01 when using t tests to compare samples predicted as subject to XCI to samples predicted as escaping from XCI. The "total number of genes" row shows the number of genes in each category. The variable escape across tissues and across TSSs categories have two columns each: the left column gives the percentage of variably escaping genes with significant differences between tissues/TSSs, and the right column the percentage of all X-linked genes with differences between tissues/TSSs. Highlighted in blue are marks that were significantly more likely to have significant differences between tissues/TSSs at genes predicted to variably escape than in all X-linked genes (chi-square adjusted p-value < 0.01).

Table S13. The percentage of genes found variably escaping by our epigenetic predictor with significant differences in various epigenetic marks across various variable escape thresholds. The variable escape threshold is the number of samples with each XCI status (escaping from XCI and subject to XCI) required to call a gene variably escaping from XCI across samples. Genes were counted as significant if BH-corrected p-values were less than 0.01 when comparing samples predicted as subject to XCI to samples predicted as escaping from XCI.

Table S14. Comparison of XCI status calls made by an epigenetic predictor in the CEMT dataset vs a similar model in the CREST dataset.

Table S15. The percentage of genes found variably escaping by our epigenetic predictor in the CREST dataset with significant differences in various epigenetic marks. Genes were counted as significant if BH-corrected p-values were less than 0.01 when using t tests to compare samples predicted as subject to XCI to samples predicted as escaping from XCI. The "total number of genes" row shows the number of genes in each category. The variable escape across tissues and across TSSs categories have two columns each: the left column gives the percentage of variably escaping genes with significant differences between tissues/TSSs, and the right column the percentage of all X-linked genes with differences between tissues/TSSs. Highlighted in blue are marks that were significantly more likely to have significant differences between tissues/TSSs at genes predicted to variably escape than in all X-linked genes (chi-square adjusted p-value < 0.01).

Table S16. Top 100 results from an analysis associating XCI status with genotype. See additional files. There are separate sheets for association with ** or subject to XCI, with O being the ratio of these two columns and P being the reciprocal of O if it is less than 1, to make comparison easier. This enrichment column (column P) shows enrichment of the reference allele in samples with one XCI status over the other. For the DNAme allChr sheet we have also included a column showing the attributable risk per allele.

Table S17. The number of loci associated with each gene and genes associated with each locus. See additional files. These are for the association between DNAme-based XCI status and genetic polymorphisms.

Table S18. DNAmeQTL analysis for the loci significantly associated with DNAme-based XCI status calls. See additional files. These loci were independently tested as DNAmeQTLs in females and males, with some columns color coded based on sex (pink female, light blue male). There are also columns with the median and mean DNAme value at the gene's island for samples with the reference or alternate allele at that locus; these columns are color coded based on whether the allele is in the range to escape from XCI (DNAme < 0.01, blue) or in the range to be subject to XCI (DNAme > 0.15, orange). There are mean and median columns for both males and females, but only the female columns are color coded based on XCI status. Boxes are drawn around genes whose female median values place one allele in the range to escape XCI and the other in the range to be subject to XCI.

Figure S1. log2(** to genes that escape from or are subject to XCI. Enhancers are split by whether they are located within a gene (genic) or not (intergenic).

Figure S4. Expression across exons for genes with significantly different expression in samples with opposite XCI statuses. XCI status per sample was determined here using ** vs subject to XCI at variably escaping genes called using DNAme. For most of these marks, the region 500 bp upstream of the promoter is used, except for H3K36me3, which uses the gene body. The median value per gene in samples found subject to XCI was subtracted from the median value per gene in samples that escaped from XCI. This is done here for all genes found variably escaping across individuals by DNAme.

Figure S6. IGV view of DNAme bigwig tracks at two variably escaping genes. (a) A view of the CpG island at CITED1. (b) A view of the CpG island at NAA10. A broad representation of samples was sought: some hypomethylated, some hypermethylated and some inconsistent across the CpG island. Broad hypermethylation in males at these genes was rare but is included here as an extreme example.

Figure S7. Average DNAme difference between adjacent CpGs per CpG island. Each point is the average DNAme difference between adjacent CpGs for an individual island, averaged again across samples. Islands are colored by the meta-status of the closest TSS within 2 kb. Chr7 was chosen as an autosomal control to show whether the differences are X-specific. Males and females from CEMT were used to check for sex specificity, and females from CREST were included to check for cancer specificity.

Figure S8. ROC curves for predictive models trained with each epigenetic mark. Each curve is one random forest model trained per sample with one epigenetic mark as its input, along with the median value of the mark in similar males. Samples are colored by tissue. The "all" category is for a predictor using all 6 histone marks and DNAme. Black diagonal lines were added to ease comparison between figures.

Figure S9. Accuracy when models trained in one sample are tested on other samples.

Figure S10. Comparison of XIST expression with the number of escape genes predicted per sample. Predictions were made using a random forest model with all histone marks and DNAme.

Figure S11. Marks that were significantly different between samples predicted as escaping vs subject to XCI in a variably escaping region. Transcript ID gives the order in which the transcripts are located along the chromosome. Some genes have multiple transcripts, which may share the same TSS and therefore have the same data for all marks except H3K36me3. Vertical lines denote which transcripts belong to each gene.
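
The consensus, consistency and variable-escape rules described in the Table S10, S11 and S13 legends can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each of the 20 trained models casts one vote of "escape" or "subject" per sample, that samples lacking a 15-of-20 consensus are ignored when counting statuses for variable escape, and that ties default to "subject". The function names `sample_call` and `gene_call` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

N_MODELS = 20      # models whose consensus forms a call (Table S10/S11 legends)
MIN_AGREEING = 15  # fewer agreeing models than this makes a sample call inconsistent


def sample_call(votes: List[str]) -> Optional[str]:
    """Consensus XCI status for one sample, or None when fewer than 15 of 20 models agree."""
    if len(votes) != N_MODELS:
        raise ValueError(f"expected {N_MODELS} votes, got {len(votes)}")
    status, count = Counter(votes).most_common(1)[0]
    return status if count >= MIN_AGREEING else None


def gene_call(votes_by_sample: Dict[str, List[str]], variable_threshold: int = 1) -> str:
    """Summarize one gene as 'inconsistent', 'variable', 'escape' or 'subject'.

    Assumptions (not from the source): votes are 'escape' or 'subject', samples
    without a consensus are ignored when counting statuses, ties go to 'subject'.
    """
    if not votes_by_sample:
        raise ValueError("no samples supplied")
    if variable_threshold < 1:
        raise ValueError("variable_threshold must be at least 1")
    calls = [sample_call(v) for v in votes_by_sample.values()]
    n_inconsistent = sum(c is None for c in calls)
    # Inconsistent gene: over a third of samples lack a 15-of-20 consensus.
    if 3 * n_inconsistent > len(calls):
        return "inconsistent"
    counts = Counter(c for c in calls if c is not None)
    # Variably escaping: at least `variable_threshold` samples with each XCI status.
    if min(counts["escape"], counts["subject"]) >= variable_threshold:
        return "variable"
    # Otherwise report the majority status; max() keeps the first item on ties.
    return max(("subject", "escape"), key=lambda s: counts[s])


if __name__ == "__main__":
    # Hypothetical votes for one gene in three samples.
    votes = {
        "sample1": ["escape"] * 18 + ["subject"] * 2,
        "sample2": ["subject"] * 20,
        "sample3": ["escape"] * 16 + ["subject"] * 4,
    }
    print(gene_call(votes))                        # variable
    print(gene_call(votes, variable_threshold=2))  # escape
```

Raising `variable_threshold` mirrors the sweep summarized in Table S13: the same per-sample calls yield fewer variably escaping genes as more samples of each status are required.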

Additional file 2:

Table S1. List of samples used. See additional files. For CEMT samples, tissue was annotated to combine samples from related areas. Columns D through L indicate the availability of each dataset for each sample. Patient health status and sample disease are as annotated by CEMT. CREST samples were used only for the epigenetic predictor, and only samples with all datasets available were included here.

Additional file 3:

Table S2. Comparison of histone marks between sex and XCI status. See additional files. The first sheet shows BH-adjusted p-values comparing female vs male and escape genes vs those subject to XCI per mark in CEMT with our meta-status and ** or subject to XCI, with O being the ratio of these two columns and P being the reciprocal of O if it is less than 1, to make comparison easier. This enrichment column (column P) shows enrichment of the reference allele in samples with one XCI status over the other. For the DNAme allChr sheet we have also included a column showing the attributable risk per allele.
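
The O and P columns described in this legend can be illustrated with a short sketch. The two input columns are elided in the legend text, so they appear here as hypothetical `value_a` and `value_b`; the sketch also assumes both are positive so that the ratio and its reciprocal are defined.

```python
from typing import Tuple


def enrichment_columns(value_a: float, value_b: float) -> Tuple[float, float]:
    """Return (O, P): O is the ratio of the two per-status columns; P equals O,
    or its reciprocal when O < 1, so enrichment in either direction reads as >= 1."""
    if value_a <= 0 or value_b <= 0:
        raise ValueError("both column values must be positive")
    o = value_a / value_b
    p = 1.0 / o if o < 1.0 else o
    return o, p


if __name__ == "__main__":
    print(enrichment_columns(1.0, 4.0))  # (0.25, 4.0)
    print(enrichment_columns(4.0, 1.0))  # (4.0, 4.0)
```

Folding ratios below 1 onto their reciprocals lets a fourfold enrichment toward either XCI status sort to the same P value, which is the "easier comparison" the legend refers to.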

Additional file 6:

Table S17. The number of loci associated with each gene and genes associated with each locus. See additional files. These are for the association between DNAme-based XCI status and genetic polymorphisms.
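
The two counts in Table S17 amount to tallying a bipartite gene-locus association in both directions. A minimal sketch, assuming the significant associations are available as hypothetical (gene, locus) pairs; the gene and locus names below are placeholders, not results from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def association_counts(pairs: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count distinct loci per gene and distinct genes per locus from (gene, locus) pairs."""
    loci_per_gene: Dict[str, Set[str]] = defaultdict(set)
    genes_per_locus: Dict[str, Set[str]] = defaultdict(set)
    for gene, locus in pairs:
        loci_per_gene[gene].add(locus)    # sets ignore duplicate pairs
        genes_per_locus[locus].add(gene)
    return (
        {gene: len(loci) for gene, loci in loci_per_gene.items()},
        {locus: len(genes) for locus, genes in genes_per_locus.items()},
    )


if __name__ == "__main__":
    # Hypothetical associations with placeholder names.
    pairs = [("gene_A", "locus_1"), ("gene_A", "locus_2"), ("gene_B", "locus_2")]
    per_gene, per_locus = association_counts(pairs)
    print(per_gene)   # {'gene_A': 2, 'gene_B': 1}
    print(per_locus)  # {'locus_1': 1, 'locus_2': 2}
```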

Additional file 7:

Table S18. DNAmeQTL analysis for the loci significantly associated with DNAme-based XCI status calls. See additional files. These loci were independently tested as DNAmeQTLs in females and males, with some columns color coded based on sex (pink female, light blue male). There are also columns with the median and mean DNAme value at the gene's island for samples with the reference or alternate allele at that locus; these columns are color coded based on whether the allele is in the range to escape from XCI (DNAme < 0.01, blue) or in the range to be subject to XCI (DNAme > 0.15, orange). There are mean and median columns for both males and females, but only the female columns are color coded based on XCI status. Boxes are drawn around genes whose female median values place one allele in the range to escape XCI and the other in the range to be subject to XCI.
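
The color coding and boxing rule in this legend can be sketched as a small classifier. This is an illustration only: the thresholds are copied from the legend, while the input layout (female samples as hypothetical ('ref' or 'alt', DNAme) pairs on a 0 to 1 scale) and the function names are assumptions.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple

ESCAPE_MAX = 0.01   # legend: DNAme < 0.01 is in the range to escape from XCI
SUBJECT_MIN = 0.15  # legend: DNAme > 0.15 is in the range to be subject to XCI


def xci_range(dname: float) -> str:
    """Classify a DNAme value (0-1 scale) into the legend's color-coded ranges."""
    if dname < ESCAPE_MAX:
        return "escape"
    if dname > SUBJECT_MIN:
        return "subject"
    return "intermediate"


def is_boxed(female_values: Iterable[Tuple[str, float]]) -> bool:
    """True when one allele's female median DNAme is in the escape range and the
    other allele's is in the subject range. Input: hypothetical (allele, DNAme) pairs."""
    by_allele: Dict[str, List[float]] = {"ref": [], "alt": []}
    for allele, value in female_values:
        by_allele[allele].append(value)  # raises KeyError for labels other than 'ref'/'alt'
    if not by_allele["ref"] or not by_allele["alt"]:
        return False  # both alleles must be observed to compare them
    ranges = {xci_range(median(vals)) for vals in by_allele.values()}
    return ranges == {"escape", "subject"}


if __name__ == "__main__":
    # Hypothetical island DNAme values in females, grouped by allele at one locus.
    values = [("ref", 0.004), ("ref", 0.008), ("alt", 0.31), ("alt", 0.22)]
    print(is_boxed(values))  # True
```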

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Balaton, B.P., Brown, C.J. Contribution of genetic and epigenetic changes to escape from X-chromosome inactivation. Epigenetics & Chromatin 14, 30 (2021). https://doi.org/10.1186/s13072-021-00404-9


Received: 26 March 2021
Accepted: 17 June 2021
Published: 29 June 2021
DOI: https://doi.org/10.1186/s13072-021-00404-9
