Abstract
Whole genome duplications (WGD) are frequent in many plant lineages; however, ploidy level variation is unknown in most species. The most widely used methods to estimate ploidy levels in plants are chromosome counts, which require living specimens, and flow cytometry, which requires living or relatively recently collected samples. Recently described bioinformatic methods estimate ploidy levels from high-throughput sequencing data, and they have been optimized for plants by calculating allelic ratios from target capture data. This approach relies on the allelic ratios present in the genome being preserved in the sequence data. For example, heterozygous sites in diploid organisms will generate allelic data in a 1:1 proportion, and the number of possible allelic ratio combinations increases in individuals with higher ploidy levels. In this chapter, we explain this bioinformatic approach to ploidy level estimation step by step.
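The relationship between ploidy and allelic ratios can be illustrated with a minimal sketch. It is not the chapter's pipeline: it only enumerates the idealized ratios expected at a heterozygous biallelic site, assuming no sequencing noise or capture bias. The function names `expected_allele_fractions` and `format_ratios` are hypothetical and introduced here for illustration.

```python
from fractions import Fraction


def expected_allele_fractions(ploidy):
    """Idealized fractions of one allele at a heterozygous biallelic site.

    Assumption: the site carries k copies of allele A and (ploidy - k)
    copies of allele B, with 1 <= k <= ploidy - 1, and reads sample the
    copies without bias.
    """
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")
    return [Fraction(k, ploidy) for k in range(1, ploidy)]


def format_ratios(ploidy):
    """Express the same expectations as 'copies of A : copies of B'."""
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")
    return [f"{k}:{ploidy - k}" for k in range(1, ploidy)]


if __name__ == "__main__":
    # Diploid, triploid, and tetraploid expectations.
    for ploidy in (2, 3, 4):
        fractions = [round(float(f), 2) for f in expected_allele_fractions(ploidy)]
        print(ploidy, format_ratios(ploidy), fractions)
```

In real data the observed allele fractions scatter around these values, so the ploidy level is inferred from the overall distribution across many sites rather than from any single site.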
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Viruel, J. et al. (2023). A Bioinformatic Pipeline to Estimate Ploidy Level from Target Capture Sequence Data Obtained from Herbarium Specimens. In: Heitkam, T., Garcia, S. (eds) Plant Cytogenetics and Cytogenomics. Methods in Molecular Biology, vol 2672. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3226-0_5
DOI: https://doi.org/10.1007/978-1-0716-3226-0_5
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3225-3
Online ISBN: 978-1-0716-3226-0
eBook Packages: Springer Protocols