A Bioinformatic Pipeline to Estimate Ploidy Level from Target Capture Sequence Data Obtained from Herbarium Specimens

Plant Cytogenetics and Cytogenomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2672)

Abstract

Whole genome duplications (WGD) are frequent in many plant lineages; however, ploidy level variation is unknown in most species. The most widely used methods to estimate ploidy level in plants are chromosome counts, which require living specimens, and flow cytometry, which requires living or relatively recently collected samples. Bioinformatic methods have recently been developed to estimate ploidy level from high-throughput sequencing data, and these have been optimized for plants by calculating allelic ratios from target capture data. The approach relies on the allelic ratios present in the genome being maintained in the sequence data. For example, diploid organisms will generate allelic data in a 1:1 proportion, and the number of possible allelic ratio combinations increases in individuals with higher ploidy levels. In this chapter, we explain this bioinformatic approach to ploidy level estimation step by step.
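The allelic ratio idea described above can be illustrated with a minimal sketch. It is not the chapter's pipeline: it assumes biallelic heterozygous sites, the function names are hypothetical, and the read depths are invented for illustration. It lists the allele fractions expected for a given ploidy level and computes the observed fraction at a site from its reference and alternative read depths.

```python
def expected_allelic_ratios(ploidy):
    """Return the allele fractions expected at a biallelic heterozygous site.

    With `ploidy` chromosome copies, the alternative allele can sit on
    1 .. ploidy - 1 of them, giving fractions k / ploidy.
    """
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")
    return [k / ploidy for k in range(1, ploidy)]


def observed_allelic_ratio(ref_depth, alt_depth):
    """Fraction of reads at one site that carry the alternative allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("site has no read coverage")
    return alt_depth / total


if __name__ == "__main__":
    # Expected fractions for diploid, triploid, and tetraploid samples.
    for ploidy in (2, 3, 4):
        print(ploidy, [round(r, 2) for r in expected_allelic_ratios(ploidy)])

    # Hypothetical (reference, alternative) read depths at three sites.
    for ref, alt in [(21, 19), (30, 10), (12, 24)]:
        print(ref, alt, round(observed_allelic_ratio(ref, alt), 2))
```

In practice, observed fractions from many sites would be pooled into a distribution and compared with these expectations. A single site is too noisy to be informative, and the 1:1 class is shared by diploids and tetraploids, so the overall shape of the distribution matters more than any one peak.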

Corresponding author

Correspondence to Juan Viruel.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Viruel, J. et al. (2023). A Bioinformatic Pipeline to Estimate Ploidy Level from Target Capture Sequence Data Obtained from Herbarium Specimens. In: Heitkam, T., Garcia, S. (eds) Plant Cytogenetics and Cytogenomics. Methods in Molecular Biology, vol 2672. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3226-0_5

  • DOI: https://doi.org/10.1007/978-1-0716-3226-0_5

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3225-3

  • Online ISBN: 978-1-0716-3226-0

  • eBook Packages: Springer Protocols
