Abstract
Key message
Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration.
Abstract
Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications, from genome-wide analysis to routine screening, with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants quickly and at low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenges facing breeders and geneticists today are how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used both to fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.
Acknowledgements
The authors wish to acknowledge the financial support received from Génome Québec, Genome Canada, the Government of Canada, the Ministère de l’Économie, Science et Innovation du Québec, Semences Prograin Inc., Syngenta Canada Inc., Sevita Genetics, Coop Fédérée, Grain Farmers of Ontario, Saskatchewan Pulse Growers, Manitoba Pulse and Soybean Growers, the Canadian Field Crop Research Alliance and Producteurs de grains du Québec.
Ethics declarations
Conflict of interest
The authors have declared that no competing interests exist.
Additional information
Communicated by Rajeev K. Varshney.
About this article
Cite this article
Torkamaneh, D., Boyle, B. & Belzile, F. Efficient genome-wide genotyping strategies and data integration in crop plants. Theor Appl Genet 131, 499–511 (2018). https://doi.org/10.1007/s00122-018-3056-z
DOI: https://doi.org/10.1007/s00122-018-3056-z