Genetic drivers and cellular selection of female mosaic X chromosome loss

  • Article

From Nature

Abstract

Mosaic loss of the X chromosome (mLOX) is the most common clonal somatic alteration in leukocytes of female individuals (refs. 1,2), but little is known about its genetic determinants or phenotypic consequences. Here, to address this, we used data from 883,574 female participants across 8 biobanks; 12% of participants exhibited detectable mLOX in approximately 2% of leukocytes. Female participants with mLOX had an increased risk of myeloid and lymphoid leukaemias. Genetic analyses identified 56 common variants associated with mLOX, implicating genes with roles in chromosomal missegregation, cancer predisposition and autoimmune diseases. Exome-sequence analyses identified rare missense variants in FBXO10 that confer a twofold increased risk of mLOX. Only a small fraction of associations was shared with mosaic Y chromosome loss, suggesting that distinct biological processes drive formation and clonal expansion of sex chromosome missegregation. Allelic shift analyses identified X chromosome alleles that are preferentially retained in mLOX, demonstrating variation at many loci under cellular selection. A polygenic score including 44 allelic shift loci correctly inferred the retained X chromosomes in 80.7% of mLOX cases in the top decile. Our results support a model in which germline variants predispose female individuals to acquiring mLOX, with the allelic content of the X chromosome possibly shaping the magnitude of clonal expansion.

Fig. 1: Common and rare genetic contributors to mLOX susceptibility.
Fig. 2: Shared and distinct genetic contributors to mLOX susceptibility in female participants and mLOY susceptibility in male participants.
Fig. 3: Allelic shift of X chromosome alleles among mLOX cases.
Fig. 4: X chromosome alleles under cis selection and their effects on mLOX susceptibility.
Fig. 5: Inferring the retained X chromosome in female biobank participants with mLOX.

Data availability

Overall and population-level GWAS summary statistics generated from the mLOX meta-analysis are available on the GWAS catalogue (accession numbers GCST90328147, GCST90328148, GCST90328149 and GCST90328150). Requests for access to individual-level data differ for each contributing biobank. For FinnGen, researchers can apply for health data from the Finnish Data Authority Findata (https://findata.fi/en/permits/) and for individual-level genotype data through the Fingenious portal (https://site.fingenious.fi/en/). These resources are hosted by the Finnish Biobank Cooperative FINBB (https://finbb.fi/en/). Access can only be provided for research projects within the scope of the Finnish Biobank Act, which includes health promotion, understanding disease mechanisms or developing medical products or treatment practices. For EBB, individual-level health, lifestyle, demographic and genetic data are anonymized and available for research projects. Data sharing is conducted in accordance with the regulations of the Estonian Genome Center of the University of Tartu (HGRA). A data application form can be found at https://www.biobank.ee. The research project has to obtain approval from the Ethics Review Committee on Human Research of the University of Tartu as well as approval from the EGCUT scientific committee. For UKBB, all individual-level data used in the analysis are available by application to the UKBB Access Management System (https://www.ukbiobank.ac.uk). Approved researchers can submit applications for review, and assessments are made to determine whether the research proposal qualifies as health-related research in line with public interest. For BCAC, data for some of the samples are available on dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001265.v1.p1). Requests for BCAC data can be made to the Data Access Coordination Committee (DACC) of BCAC (http://bcac.ccge.medschl.cam.ac.uk/bcacdata/).
BCAC DACC approval is required to access individual-level phenotype and genotype data from the ABCFS, ABCS, ABCTB, BBCC, BBCS, BCEES, BCFR-NY, BCFR-PA, BCFR-UTAH, BCINIS, BIGGS, BREOGAN, BSUCH, CBCS, CCGP, CECILE, CGPS, CNIO-BCS, CPSII, CTS, EPIC, DIETCOMPLYF, ESTHER, GC-HBOC, GENICA, HABCS, HCSC, HEBCS, HMBCS, HUBCS, KARBAC, KARMA, KBCP, KCONFAB/AOCS, LMBC, MABCS, MARIE, MBCSG, MCBCS, MCCS, MEC, MISS, MMHS, MTLGEBCS, NBCS, NC-BCFR, NBHS, NCBCS, NHS, NHS2, OBCS, OFBCR, ORIGO, PBCS, PKARMA, PLCO, POSH, RBCS, SASBAC, SBCS, SEARCH, SISTER, SKKDKFZS, SMC, SZBCS, UCIBCS, UKBGS, UKOPS and USRT studies. For MVP, summary statistics are available on dbGaP under the MVP accession number phs001672. Additional data supporting the findings of this study are available upon reasonable request from MVP. These data are not publicly available due to restrictions of the US Government and Department of Veterans Affairs concerning privacy and participant consent. For MGB, a portion of individual-level genomic data are available in dbGaP as part of the eMERGE consortium (phs001584.v2.p2) and as part of the Center Common Disease Genomics (phs002018.v1.p1). Additional MGB data are not currently publicly available due to data restrictions. For PLCO, individual-level genotype data are available in dbGaP (phs001286.v2.p2). Permitted data use includes discovery and hypothesis generation in the investigation of genetic contributions to cancer risk and risk of other diseases as well as development of novel analytical approaches for GWAS. Individual-level phenotype data can be requested through the NCI Cancer Data Access System (CDAS) (https://cdas.cancer.gov/plco/). For BBJ, information on the cohort is available at the RIKEN website (http://jenger.riken.jp/en/). While individual-level genetic data are not accessible, all other individual-level data are available upon request.

Code availability

The MoChA pipelines used for mLOX calling (mocha.wdl), GWAS (assoc.wdl), allelic shift analysis (impute.wdl and shift.wdl) and X chromosome differential score estimation (score.wdl) are available at https://doi.org/10.5281/zenodo.10892520 (ref. 86); the most detailed and up-to-date version is at https://github.com/freeseek/mochawdl. The GWAS meta-analysis was performed using the pipeline developed by the COVID-19 Host Genetics Initiative, available at https://github.com/covid19-hg/META_ANALYSIS. The code used for the Bayesian line model is available at https://github.com/dsgelab/Mosaic-loss-of-chromosome-X/tree/main/BayesLineModel.

References

  1. Machiela, M. J. et al. Female chromosome X mosaicism is age-related and preferentially affects the inactivated X chromosome. Nat. Commun. 7, 11843 (2016).

  2. Zekavat, S. M. et al. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat. Med. 27, 1012–1024 (2021).

  3. Brown, C. J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991).

  4. Lyon, M. F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190, 372–373 (1961).

  5. Tukiainen, T. et al. Landscape of X chromosome inactivation across human tissues. Nature 550, 244–248 (2017).

  6. Busque, L. et al. Nonrandom X-inactivation patterns in normal females: lyonization ratios vary with age. Blood 88, 59–65 (1996).

  7. Gale, R. E. & Linch, D. C. Interpretation of X-chromosome inactivation patterns. Blood 84, 2376–2378 (1994).

  8. Zito, A. et al. Heritability of skewed X-inactivation in female twins is tissue-specific and associated with age. Nat. Commun. 10, 5339 (2019).

  9. Forsberg, L. A. et al. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46, 624–628 (2014).

  10. Dumanski, J. P. et al. Smoking is associated with mosaic loss of chromosome Y. Science 347, 81–83 (2015).

  11. Zhou, W. et al. Mosaic loss of chromosome Y is associated with common variation near TCL1A. Nat. Genet. 48, 563–568 (2016).

  12. Wright, D. J. et al. Genetic variants associated with mosaic Y chromosome loss highlight cell cycle genes and overlap with cancer susceptibility. Nat. Genet. 49, 674–679 (2017).

  13. Thompson, D. J. et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019).

  14. Loh, P. R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).

  15. Lin, S. H. et al. Incident disease associations with mosaic chromosomal alterations on autosomes, X and Y chromosomes: insights from a phenome-wide association study in the UK Biobank. Cell Biosci. 11, 1–11 (2021).

  16. Zhou, W. et al. Detectable chromosome X mosaicism in males is rarely tolerated in peripheral leukocytes. Sci. Rep. 11, 1193 (2021).

  17. Sybert, V. P. & McCauley, E. Turner’s syndrome. N. Engl. J. Med. 351, 1227–1238 (2004).

  18. Jäger, N. et al. Hypermutation of the inactive X chromosome is a frequent event in cancer. Cell 155, 567–581 (2013).

  19. Koren, A. & McCarroll, S. A. Random replication of the inactive X chromosome. Genome Res. 24, 64–69 (2014).

  20. Kessler, M. D. et al. Common and rare variant associations with clonal haematopoiesis phenotypes. Nature 612, 301–309 (2022).

  21. Terao, C. et al. GWAS of mosaic loss of chromosome Y highlights genetic effects on blood cell differentiation. Nat. Commun. 10, 4719 (2019).

  22. Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).

  23. Leitsalu, L. et al. Cohort profile: Estonian biobank of the Estonian genome center, University of Tartu. Int. J. Epidemiol. 44, 1137–1147 (2015).

  24. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  25. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  26. Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013).

  27. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).

  28. Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).

  29. Hunter-Zinck, H. et al. Genotyping array design and data quality control in the Million Veteran Program. Am. J. Hum. Genet. 106, 535–548 (2020).

  30. Karlson, E. W., Boutin, N. T., Hoffnagle, A. G. & Allen, N. L. Building the partners healthcare biobank at partners personalized medicine: informed consent, return of research results, recruitment lessons and operational considerations. J. Pers. Med. 6, 2 (2016).

  31. Boutin, N. T. et al. The evolution of a large biobank at Mass General Brigham. J. Pers. Med. 12, 1323 (2022).

  32. Machiela, M. et al. GWAS Explorer: an open-source tool to explore, visualize, and access GWAS summary statistics in the PLCO Atlas. Sci. Data 10, 25 (2023).

  33. Nagai, A. et al. Overview of the BioBank Japan project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).

  34. Vlasschaert, C. et al. A practical approach to curate clonal hematopoiesis of indeterminate potential in human genetic datasets. Blood 141, 2214–2223 (2023).

  35. Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231 (2020).

  36. Frampton, M. et al. Variation at 3p24.1 and 6q23.3 influences the risk of Hodgkin’s lymphoma. Nat. Commun. 4, 2549 (2013).

  37. Berndt, S. I. et al. Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat. Commun. 7, 10933 (2016).

  38. Celik, H. et al. JARID2 functions as a tumor suppressor in myeloid neoplasms by repressing self-renewal in hematopoietic progenitor cells. Cancer Cell 34, 741–756 (2018).

  39. Pattabiraman, D. R. & Gonda, T. J. Role and potential for therapeutic targeting of MYB in leukemia. Leukemia 27, 269–277 (2013).

  40. Schaffner, C., Stilgenbauer, S., Rappold, G. A., Döhner, H. & Lichter, P. Somatic ATM mutations indicate a pathogenic role of ATM in B-cell chronic lymphocytic leukemia. Blood 94, 748–753 (1999).

  41. Zenz, T. et al. TP53 mutation and survival in chronic lymphocytic leukemia. J. Clin. Oncol. 28, 4473–4479 (2010).

  42. Catalano, A. et al. The PRKAR1A gene is fused to RARA in a new variant acute promyelocytic leukemia. Blood 110, 4073–4076 (2007).

  43. Loh, P. R., Genovese, G. & McCarroll, S. A. Monogenic and polygenic inheritance become instruments for clonal selection. Nature 584, 136–141 (2020).

  44. Luo, Y. et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat. Genet. 53, 1504–1516 (2021).

  45. Ritari, J., Koskela, S., Hyvärinen, K. & Partanen, J. HLA-disease association and pleiotropy landscape in over 235,000 Finns. Hum. Immunol. 83, 391–398 (2022).

  46. Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature 586, 769–775 (2020).

  47. Li, X. et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet. 52, 969–983 (2020).

  48. Zhou, W. et al. SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat. Genet. 54, 1466–1469 (2022).

  49. Chiorazzi, M. et al. Related F-box proteins control cell death in Caenorhabditis elegans and human lymphoma. Proc. Natl Acad. Sci. USA 110, 3943–3948 (2013).

  50. Spielman, R. S., McGinnis, R. E. & Ewens, W. J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52, 506 (1993).

  51. Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).

  52. Yang, C. H., Tomkiel, J., Saitoh, H., Johnson, D. H. & Earnshaw, W. C. Identification of overlapping DNA-binding and centromere-targeting domains in the human kinetochore protein CENP-C. Mol. Cell. Biol. 16, 3576–3586 (1996).

  53. Du, Y., Topp, C. N. & Dawe, R. K. DNA binding of centromere protein C (CENPC) is stabilized by single-stranded RNA. PLoS Genet. 6, e1000835 (2010).

  54. Delaneau, O., Zagury, J. F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).

  55. Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).

  56. Zhao, Y. et al. Detection and characterization of male sex chromosome abnormalities in the UK Biobank study. Genet. Med. 24, 1909–1919 (2022).

  57. Zhao, Y. et al. GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health. Nat. Commun. 12, 4178 (2021).

  58. Balduzzi, S., Rücker, G. & Schwarzer, G. How to perform a meta-analysis with R: a practical tutorial. Evid. Based Ment. Health 22, 153–160 (2019).

  59. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).

  60. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).

  61. Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

  62. COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).

  63. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  64. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).

  65. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  66. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

  67. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).

  68. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

  69. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  70. Barbeira, A. N. et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 22, 49 (2021).

  71. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).

  72. GTEx Consortium. et al. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  73. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).

  74. Qi, T. et al. Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood. Nat. Commun. 9, 2282 (2018).

  75. Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science 374, eabj1541 (2021).

  76. Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat. Genet. 55, 1267–1276 (2023).

  77. Gardner, E. J. et al. Damaging missense variants in IGF1R implicate a role for IGF-1 resistance in the aetiology of type 2 diabetes. Cell Genomics 2, 100208 (2022).

  78. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  79. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).

  80. Zhang, H. et al. A powerful procedure for pathway-based meta-analysis using summary statistics identifies 43 pathways associated with type II diabetes in European populations. PLoS Genet. 12, e1006122 (2016).

  81. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68 (2015).

  82. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).

  83. International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).

  84. Loh, P. R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).

  85. Ritari, J. et al. Increasing accuracy of HLA imputation by a population-specific reference panel in a FinnGen biobank cohort. NAR Genomics Bioinformatics 2, lqaa030 (2020).

  86. Genovese, G. MoChA WDL pipelines 2022-12-21. Zenodo https://doi.org/10.5281/zenodo.10892520 (2022).

Acknowledgements

The authors thank J. Karjalainen and M. Cordioli for assistance in GWAS meta-analysis; S. J. Andrews and J. Leinonen for sharing formatted GWAS summary statistics used in genetic correlation analyses; S. Jukarainen and A. Gerussi for insightful discussion on phenome-wide association studies (pheWAS) analyses from a clinical standpoint; S. Jones and M. Kanai for valuable feedback on HLA and fine-mapping; J. Koskela and M. Myllymäki for discussion on clonal haematopoiesis; Y. Fu and A. Preussner for discussion on genetic analyses of sex chromosomes; G. Kops for discussion on the mechanism causing chromosome missegregation; A. Kouno and the members of the BBJ Project for supporting the BBJ analyses; and B. Wheeler for his assistance in running the pathway analyses. We acknowledge the participants and investigators of each contributing biobank. The FinnGen project is funded by two grants from Business Finland (HUS 4685/31/2016 and UH 4386/31/2016) and the following industry partners: AbbVie, AstraZeneca UK, Biogen MA, Bristol Myers Squibb (and Celgene Corporation & Celgene International II), Genentech, Merck Sharp & Dohme, Pfizer, GlaxoSmithKline Intellectual Property Development, Sanofi US Services, Maze Therapeutics, Janssen Biotech, Novartis and Boehringer Ingelheim International.
The following biobanks are acknowledged for delivering biobank samples to FinnGen: Auria Biobank (www.auria.fi/biopankki), THL Biobank (www.thl.fi/biobank), Helsinki Biobank (www.helsinginbiopankki.fi), Biobank Borealis of Northern Finland (https://www.ppshp.fi/Tutkimus-ja-opetus/Biopankki/Pages/Biobank-Borealis-briefly-in-English.aspx), Finnish Clinical Biobank Tampere (www.tays.fi/en-US/Research_and_development/Finnish_Clinical_Biobank_Tampere), Biobank of Eastern Finland (www.ita-suomenbiopankki.fi/en), Central Finland Biobank (www.ksshp.fi/fi-FI/Potilaalle/Biopankki), Finnish Red Cross Blood Service Biobank (www.veripalvelu.fi/verenluovutus/biopankkitoiminta), Terveystalo Biobank (www.terveystalo.com/fi/Yritystietoa/Terveystalo-Biopankki/Biopankki/) and Arctic Biobank (https://www.oulu.fi/en/university/faculties-and-units/faculty-medicine/northern-finland-birth-cohorts-and-arctic-biobank). All Finnish Biobanks are members of the BBMRI.fi infrastructure (www.bbmri.fi). Finnish Biobank Cooperative (FINBB) (https://finbb.fi/) is the coordinator of BBMRI-ERIC operations in Finland. The Finnish biobank data can be accessed through Fingenious services (https://site.fingenious.fi/en/) managed by FINBB. The work related to EBB was supported by the Estonian Research Council grants PRG1911 and TK (TK214) and the European Union through the European Regional Development Fund Project no. 2014-2020.4.01.15-0012 GENTRANSMED. The EBB data analysis was carried out in part in the High-Performance Computing Center of University of Tartu. For BCAC and MVP, a detailed acknowledgement is available in the Supplementary Information. This work was supported by the Intramural Research Program of the National Cancer Institute, National Institutes of Health, and the Medical Research Council (unit programmes: MC_UU_12015/2, MC_UU_00006/2). G.G. was supported by NIH grants R01 MH104964 and R01 MH123451; A. Ganna was supported by the Academy of Finland (grant no. 
323116) and by the European Research Council under the European Union’s Horizon 2020 Research and Innovation Programme (grant no. 945733); P.-R.L. was supported by NIH grant DP2 ES030554, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, the Next Generation Fund at the Broad Institute of MIT and Harvard, and a Sloan Research Fellowship; J.R.B.P. receives research funding from GSK; C.T. was supported by Japan Agency for Medical Research and Development (AMED) grants JP21ek0109555, JP21tm0424220, JP21ck0106642, JP22wm0425008, JP23ek0410114 and JP23tm0424225, and Japan Society for the Promotion of Science (JSPS) KAKENHI grant JP20H00462; P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech/Roche and Novartis; and S.P. acknowledges research funding from the mvp000 grant. The manuscript does not necessarily represent the views of the Department of Veterans Affairs or the US Government.

Author information

Contributions

This project was initiated and led by A.L., G.G., P.-R.L., A. Ganna, J.R.B.P. and M.J.M. A.L. and M.J.M. wrote the first draft of the manuscript with input from all lead authors. A.L. coordinated the analyses of each contributing biobank, conducted across-biobank meta-analysis (including GWAS, allelic shift analysis and pheWAS) and FinnGen-specific analyses, organized post-GWAS analyses, designed and generated all figures and tables (except where noted), and wrote Results, Methods and part of the introduction and Discussion sections. G.G. developed the MoChA pipelines for mLOX calling, GWAS, allelic shift analysis, and X chromosome differential score estimation, guided the analyses of each contributing biobank, performed mLOX calling, GWAS and allelic shift analysis for UKBB and MGB, and wrote the manuscript. Y.Z. performed WES analyses and three-way combined call GWAS in UKBB, generated Supplementary Figs. 2 and 5, prepared Supplementary Tables 8 and 19, and drafted the relevant Results and Methods paragraphs. M.P. developed the Bayesian line model to cluster mLOX and mLOY loci and wrote the relevant Methods paragraph. S.M.Z. performed pheWAS for UKBB and MGB. K.A.K. performed the GWAStoGenes pipeline, prepared Supplementary Table 13, and drafted the relevant Methods paragraphs. Z.Y. estimated heritability and genetic correlations and prepared Supplementary Table 16. K.Y. and L.S. performed the pathway analysis and prepared Supplementary Table 14. C.V. performed the sensitivity analyses for associations with leukaemia in UKBB and prepared Supplementary Table 7. X.L. performed mLOX calling, GWAS, allelic shift analysis and HLA fine-mapping replication analysis in BBJ. D.W.B. performed GWAS for PLCO and formatted inputs for blood cell trait heat maps (Fig. 2d and Extended Data Fig. 3b). G.H. performed mLOX calling, GWAS and allelic shift analysis for EBB. B.R.G. and S.P. performed mLOX calling, GWAS, allelic shift analysis and pheWAS for MVP. J.D.
performed mLOX calling and GWAS for BCAC. W.Z. performed mLOX calling, GWAS and allelic shift analysis for PLCO. Y.M. participated in BBJ analyses. V.T. and F.-D.P. participated in EBB analyses. M.A., T.P.S. and A. Ghazal participated in FinnGen analyses. W.-Y.H. and N.D.F. participated in PLCO analyses. E.J.G. participated in UKBB WES analyses. V.G.S. assisted in interpreting findings related to clonal haematopoiesis. A.P. coordinated the FinnGen project. H.M.O. advised on the HLA fine-mapping analysis and assisted in interpreting findings related to HLA. T.T. assisted in interpreting findings related to skewed X chromosome inactivation and escape from X chromosome inactivation. S.J.C. coordinated the PLCO project. R.M. supervised EBB analyses. P.N. supervised pheWAS for UKBB, MGB and MVP. M.J.D. initialized and conceptualized the mCA project in FinnGen and assisted in interpreting findings, especially those related to mLOY in male participants. A.B. supervised pheWAS in UKBB, MGB and MVP, and the sensitivity analyses for associations with leukaemia in UKBB. S.A.M. supervised the development of MoChA pipelines. C.T. supervised BBJ analyses and advised on the HLA fine-mapping analysis. P.-R.L., A. Ganna, J.R.B.P. and M.J.M. co-supervised the project, interpreted the findings and wrote the manuscript. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Aoxing Liu, Giulio Genovese, Po-Ru Loh, Andrea Ganna, John R. B. Perry or Mitchell J. Machiela.

Ethics declarations

Competing interests

G.G., P.-R.L. and S.A.M. declare competing interests: patent application PCT/WO2019/079493 has been filed on the mCA detection method used in this work. J.R.B.P. and E.J.G. are employees and shareholders of Insmed. Y.Z. is a UK University Worker of GSK. A.B. reports scientific advisory board membership for TenSixteen Bio. P.N. reports personal fees from Allelica, Apple, AstraZeneca, Blackstone Life Sciences, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co., Foresite Labs, Genentech/Roche, GV, HeartFlow, Magnet Biomedicine, Merck and Novartis, scientific advisory board membership of Esperion Therapeutics, Preciseli, and TenSixteen Bio, scientific co-founder of TenSixteen Bio, equity in MyOme, Preciseli and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work.

Peer review

Peer review information

Nature thanks Eric Jorgenson and Siddhartha Kar for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Theoretical framework of the mLOX study.

Panel (A) depicts the etiologic process leading to detectable mosaic loss of the X chromosome (mLOX) in females. Detectable age-related mLOX develops only if the mutant haematopoietic stem cell (HSC) survives loss of the X chromosome and the mutation confers a proliferative advantage over normal cells. Panel (B) shows the statistical approaches used to discover the genetic determinants of mLOX. Variants associated with susceptibility to mLOX, acting as either trans or cis factors, are examined using a genome-wide association study (GWAS), for common variants with minor allele frequency (MAF) > 0.1%, and a gene-burden test performed for whole-exome sequencing (WES) data for rare variants with MAF < 0.1%. Among samples with detectable mLOX, allelic shift analysis is used to detect chromosome X alleles exhibiting cis selection, that is, more likely to be clonally selected for when detectable mLOX retains these alleles.

Extended Data Fig. 2 Prevalence of mLOX by age at genotyping in each contributing biobank.

Panel (A) is for all detectable mLOX in peripheral leukocytes, while Panel (B) is restricted to expanded mLOX with cell fraction >5%. Data are presented as mean values +/− SEM.
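As an illustration of the "mean ± SEM" summary, the sketch below computes the prevalence and its standard error for one age group from per-participant case indicators. It assumes the SEM is the sample standard deviation divided by the square root of the group size; the legend does not state the exact formula, and the function name `prevalence_with_sem` is hypothetical.

```python
import math


def prevalence_with_sem(is_case: list[bool]) -> tuple[float, float]:
    """Return (prevalence, SEM) for one age group.

    Assumption: SEM = sample standard deviation / sqrt(n), applied to
    0/1 case indicators. This is a generic definition, not necessarily
    the one used to draw the figure.
    """
    n = len(is_case)
    if n < 2:
        raise ValueError("need at least two participants to estimate SEM")
    mean = sum(is_case) / n
    # Unbiased sample variance of the 0/1 indicators.
    variance = sum((x - mean) ** 2 for x in is_case) / (n - 1)
    return mean, math.sqrt(variance / n)
```

Restricting the input to participants whose mLOX cell fraction exceeds 5% before calling the function would give the Panel (B) analogue.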

Extended Data Fig. 3 Allelic shift of chromosome X alleles among mLOX cases.

Panel (A) shows −log10(P) of chromosome X variants from allelic shift analysis by meta-analyzing data of 83,320 mLOX cases from seven biobanks, with lead variants of 44 independent loci highlighted. The y axis is the log scale of P values from a two-sided test and the dashed line denotes the statistical significance after multiple comparison adjustments (5.0 × 10⁻⁸, which is the same as the GWAS significance level). Panel (B) is a heat map for associations of 43 allelic shift analysis lead variants with 19 blood cell phenotypes46, with significance levels from the original GWAS expressed by asterisks (*** for two-sided exact P ≤ 0.001, ** for P ≤ 0.01, * for P ≤ 0.05). One variant was dropped because no appropriate proxy variant was available in the blood cell phenotype GWAS. The absolute Z scores were cropped to the range [0, 10].
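The annotation rules stated for the heat map in Panel (B) can be written down as a minimal sketch. The thresholds and the [0, 10] range come from the legend; the function names `significance_stars` and `crop_abs_z` are hypothetical and are not taken from the study's code.

```python
def significance_stars(p_value: float) -> str:
    """Map a two-sided P value to the asterisk code used in the legend."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


def crop_abs_z(z_score: float, upper: float = 10.0) -> float:
    """Return the absolute Z score cropped to the range [0, upper]."""
    return min(abs(z_score), upper)
```

Cropping keeps a few very strong associations from dominating the colour scale, at the cost of showing every |Z| above 10 as the same shade.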

Extended Data Fig. 4 Allelic shift in the context of X chromosome inactivation.

Panel (A) depicts the main mechanism of X chromosome inactivation (XCI) in females. To compensate for gene dosage imbalances between XX females and XY males, one of the two X chromosomes in females is randomly inactivated early in embryonic development and this inactivation status is passed down to daughter cells. As some females age, the expected 1:1 ratio of inactivated maternal to paternal X chromosome copies can become skewed, if cells harboring one of the active X chromosomes are more frequent than cells harboring the other. Panels (B) and (C) depict the pattern of allelic shift in mLOX cases in terms of the status of XCI, with Panel (B) for random XCI and Panel (C) for skewed XCI. As mLOX preferentially affects the inactivated X chromosome2, the imbalance between chromosome X alleles in mLOX cases can be seen as the combined cis effects of both skewed XCI and mLOX. In other words, the imbalance of chromosome X alleles in mLOX cases could also be shaped by alleles that have cis effects solely on the process of skewed XCI.

Extended Data Fig. 5 Contribution of each X chromosome allelic shift locus to the prediction of the retained X chromosome in females with mLOX.

We proposed a novel polygenic score including the 44 loci identified from allelic shift analysis to infer the retained X chromosome in detectable mLOX. To avoid overfitting, the effects of the 44 loci were estimated from allelic shift analysis of 56,319 mLOX cases from six biobanks excluding FinnGen while the prediction performance was tested in 27,001 FinnGen mLOX cases. The plot shows the contribution of each of the 44 loci to the prediction, starting with the most significant variants.
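One way such a score could infer the retained X chromosome is sketched below. It is an illustration under stated assumptions, not the authors' implementation: it assumes phased haplotypes coded as 0/1 allele counts at each locus, per-locus effects estimated in an independent discovery set, and a simple sign rule for the prediction. All names (`x_differential_score`, `predict_retained`, the haplotype labels "A" and "B") are hypothetical.

```python
from typing import Optional, Sequence


def x_differential_score(
    hap_a: Sequence[int], hap_b: Sequence[int], effects: Sequence[float]
) -> float:
    """Sum over loci of effect * (allele on haplotype A - allele on haplotype B).

    Assumption: alleles are coded 0/1 for the effect allele and effects
    come from a discovery set that excludes the test samples.
    """
    if not (len(hap_a) == len(hap_b) == len(effects)):
        raise ValueError("haplotypes and effects must have the same length")
    return sum(beta * (a - b) for a, b, beta in zip(hap_a, hap_b, effects))


def predict_retained(
    hap_a: Sequence[int], hap_b: Sequence[int], effects: Sequence[float]
) -> Optional[str]:
    """Predict which haplotype is retained; None when the score is zero."""
    score = x_differential_score(hap_a, hap_b, effects)
    if score > 0:
        return "A"
    if score < 0:
        return "B"
    return None  # the two haplotypes are indistinguishable at these loci
```

Estimating the effects in one set of biobanks and testing in another, as described above, prevents the same samples from both defining and evaluating the score.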

Extended Data Table 1 Descriptive characteristics of the eight biobanks contributing to the mLOX analysis

Supplementary information

Supplementary Information

Supplementary Figs. 1–15 and figure legends.

Reporting Summary

Peer Review File

Supplementary Tables 1–25

Supplementary Tables 1–25 and table legends.

Supplementary Note 1

A list of all FinnGen working-group members and their affiliations.

Supplementary Note 2

A list of all Breast Cancer Association Consortium working-group members and their affiliations, funding, and acknowledgements.

Supplementary Note 3

A list of all Million Veteran Program working-group members and their affiliations.

Supplementary Note 4

Million Veteran Program: consortium acknowledgement for manuscripts.

About this article

Cite this article

Liu, A., Genovese, G., Zhao, Y. et al. Genetic drivers and cellular selection of female mosaic X chromosome loss. Nature (2024). https://doi.org/10.1038/s41586-024-07533-7

  • DOI: https://doi.org/10.1038/s41586-024-07533-7

  • Springer Nature Limited
