Abstract
Exploiting the statistical associations produced by a GWAS experiment to identify and validate candidate genes can be difficult and time-consuming. To bridge the gap between identifying candidate genes and functionally validating their effect on trait performance, the variants underlying GWAS-associated regions must be prioritized. In parallel, recent developments in genomics and statistical methods, achieved notably in human genetics, are being adopted in plant breeding to study the genetic architecture of traits and sustain genetic gains. In this chapter, we present the theoretical and practical aspects of three main options for identifying and validating candidate genes from a GWAS experiment: (1) MetaGWAS analysis, (2) statistical fine mapping, and (3) the integration of functional data.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Albert, E., Sauvage, C. (2022). Identification and Validation of Candidate Genes from Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_15
DOI: https://doi.org/10.1007/978-1-0716-2237-7_15
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2236-0
Online ISBN: 978-1-0716-2237-7
eBook Packages: Springer Protocols