Abstract
Genetic variance is traditionally estimated using pedigree analysis. Using high-throughput DNA marker data measured across the entire genome, it is now possible to estimate and partition genetic variation from population samples. In this chapter, we introduce methods and a software tool called Genome-wide Complex Trait Analysis (GCTA) to estimate genomic relationships between pairs of conventionally unrelated individuals using genome-wide single nucleotide polymorphism (SNP) data, to estimate the variance explained by all SNPs simultaneously on genomic or chromosomal segments or over the whole genome, and to perform a joint and conditional multiple-SNP association analysis using summary statistics from a meta-analysis of genome-wide association studies and linkage disequilibrium between SNPs estimated from a reference sample.
References
Hindorff LA, Sethupathy P, Junkins HA et al (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106(23):9362–9367
Maher B (2008) Personal genomes: the case of the missing heritability. Nature 456(7218):18–21
Yang J, Benyamin B, McEvoy BP et al (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569
Yang J, Manolio TA, Pasquale LR et al (2011) Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 43(6):519–525
Davies G, Tenesa A, Payton A et al (2011) Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol Psychiatry 16(10):996–1005
Deary IJ, Yang J, Davies G et al (2012) Genetic contributions to stability and change in intelligence from childhood to old age. Nature 482(7384):212–215
Lee SH, Decandia TR, Ripke S et al (2012) Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet 44(3):247–250
Gibson G (2010) Hints of hidden heritability in GWAS. Nat Genet 42(7):558–560
Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five years of GWAS discovery. Am J Hum Genet 90(1):7–24
Teslovich TM, Musunuru K, Smith AV et al (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466(7307):707–713
Heid IM, Jackson AU, Randall JC et al (2010) Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42(11):949–960
Lango Allen H, Estrada K, Lettre G et al (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467(7317):832–838
Speliotes EK, Willer CJ, Berndt SI et al (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42(11):937–948
Ripke S, Sanders AR, Kendler KS et al (2011) Genome-wide association study identifies five new schizophrenia loci. Nat Genet 43(10):969–976
Yang J, Ferreira T, Morris AP et al (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44(4):369–375
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88(1):76–82
Hayes BJ, Visscher PM, Goddard ME (2009) Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res 91(1):47–60
Strandén I, Garrick DJ (2009) Technical note: derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci 92(6):2971–2975
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91(11):4414–4423
Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58(3):545–554
Purcell S, Neale B, Todd-Brown K et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575
Lee SH, van der Werf JH (2006) An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genet Sel Evol 38(1):25–43
Jorjani H, Klei L, Emanuelson U (2003) A simple method for weighted bending of genetic (co)variance matrices. J Dairy Sci 86(2):677–679
Hill WG, Thompson R (1978) Probabilities of non-positive definite between-group or genetic covariance matrices. Biometrics 34:429–439
Haseman JK, Elston RC (1972) The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 2:2–19
Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, MA
Falconer DS (1965) The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann Hum Genet 29:51–71
Dempster ER, Lerner IM (1950) Heritability of threshold characters. Genetics 35(2):212–236
Lee SH, Wray NR, Goddard ME, Visscher PM (2011) Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet 88(3):294–305
Price AL, Weale ME, Patterson N et al (2008) Long-range LD can confound genome scans in admixed populations. Am J Hum Genet 83(1):132–135
Gilmour AR, Thompson R, Cullis BR (1995) Average information REML: an efficient algorithm for variance parameters estimation in linear mixed models. Biometrics 51:1440–1450
Appendix A
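In Eq. 4, we have \( \mathrm{var}(\mathbf{y})=\mathbf{V}=\sum\limits_{i=1}^r {\mathbf{A}_i}\sigma_i^2 +\mathbf{I}\sigma_{\mathrm{e}}^2 \),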
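of which Eq. 2 is a special case with r = 1. By default in GCTA, we use the average information (AI) REML algorithm [31] to obtain estimates of the variance components \( \sigma_i^2 \) and \( \sigma_{\mathrm{e}}^2 \) through iteration. In the tth iteration, \( \mathbf{q}^{(t)}=\mathbf{q}^{(t-1)}+(\mathbf{H}^{(t-1)})^{-1}\left.\frac{\partial L}{\partial \mathbf{q}}\right|_{\mathbf{q}^{(t-1)}} \), where \( \mathbf{q} \) is a vector of the estimates of variance components (\( \hat{\sigma}_1^2 \), …, \( \hat{\sigma}_r^2 \) and \( \hat{\sigma}_{\mathrm{e}}^2 \)); L is the log likelihood function of the mixed linear model (ignoring the constant), \( L=-\frac{1}{2}(\log |\hat{\mathbf{V}}|+\log |\mathbf{X}^{\prime}\hat{\mathbf{V}}^{-1}\mathbf{X}|+\mathbf{y}^{\prime}\mathbf{P}\mathbf{y}) \) with \( \hat{\mathbf{V}}=\sum\limits_{i=1}^r {\mathbf{A}_i}\hat{\sigma}_i^{2(t-1)}+\mathbf{I}\hat{\sigma}_{\mathrm{e}}^{2(t-1)} \) and \( \mathbf{P}=\hat{\mathbf{V}}^{-1}-\hat{\mathbf{V}}^{-1}\mathbf{X}(\mathbf{X}^{\prime}\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\hat{\mathbf{V}}^{-1} \); H is the average of the observed and expected information matrices [22], \( \mathbf{H}=\frac{1}{2}\begin{bmatrix} \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{y} & \cdots & \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{y} & \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{P}\mathbf{y} \\ \vdots & \ddots & \vdots & \vdots \\ \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{y} & \cdots & \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{y} & \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{P}\mathbf{y} \\ \mathbf{y}^{\prime}\mathbf{P}\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{y} & \cdots & \mathbf{y}^{\prime}\mathbf{P}\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{y} & \mathbf{y}^{\prime}\mathbf{P}\mathbf{P}\mathbf{P}\mathbf{y} \end{bmatrix} \),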
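and \( \frac{\partial L}{\partial \mathbf{q}} \) is a vector of first derivatives of the log likelihood function with respect to each variance component, \( \frac{\partial L}{\partial \mathbf{q}}=-\frac{1}{2}\begin{bmatrix} \mathrm{tr}(\mathbf{P}\mathbf{A}_1)-\mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{y} \\ \vdots \\ \mathrm{tr}(\mathbf{P}\mathbf{A}_r)-\mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{y} \\ \mathrm{tr}(\mathbf{P})-\mathbf{y}^{\prime}\mathbf{P}\mathbf{P}\mathbf{y} \end{bmatrix} \).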
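A minimal NumPy sketch of one AI-REML update may help make the notation concrete. It is written from the standard expressions for the REML gradient, -1/2 [tr(PA_i) - y'PA_iPy], and the average-information matrix, 1/2 y'PA_iPA_jPy, and is not GCTA's source code. The function names, the toy data (random normal columns standing in for standardized genotypes), and the starting values are assumptions made for illustration, and the dense matrix inverse is only sensible for small n.

```python
import numpy as np

def projection(V, X):
    """P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1."""
    Vi = np.linalg.inv(V)
    ViX = Vi @ X
    return Vi - ViX @ np.linalg.solve(X.T @ ViX, ViX.T)

def reml_loglik(q, mats, X, y):
    """L = -1/2 (log|V| + log|X' V^-1 X| + y' P y), constant ignored."""
    V = sum(s * A for s, A in zip(q, mats))
    P = projection(V, X)
    logdet_V = np.linalg.slogdet(V)[1]
    logdet_XVX = np.linalg.slogdet(X.T @ np.linalg.solve(V, X))[1]
    return -0.5 * (logdet_V + logdet_XVX + y @ P @ y)

def ai_reml_step(q, mats, X, y):
    """Return (q + H^-1 dL/dq, dL/dq, H) with H the average-information matrix.

    mats holds A_1, ..., A_r followed by the identity (residual component).
    """
    V = sum(s * A for s, A in zip(q, mats))
    P = projection(V, X)
    Py = P @ y
    APy = [A @ Py for A in mats]                                  # A_i P y
    trace = np.array([np.trace(P @ A) for A in mats])             # tr(P A_i)
    quad = np.array([Py @ v for v in APy])                        # y' P A_i P y
    grad = -0.5 * (trace - quad)
    PAPy = [P @ v for v in APy]                                   # P A_j P y
    H = 0.5 * np.array([[u @ w for w in PAPy] for u in APy])      # 1/2 y'P A_i P A_j P y
    return q + np.linalg.solve(H, grad), grad, H

# Toy data (assumption): normal columns as a stand-in for standardized SNPs.
rng = np.random.default_rng(0)
n, m = 80, 200
Z = rng.normal(size=(n, m))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
A = Z @ Z.T / m                          # relationship-like matrix
X = np.ones((n, 1))                      # intercept only
y = 1.0 + Z @ rng.normal(0, np.sqrt(0.5 / m), m) + rng.normal(0, np.sqrt(0.5), n)

mats = [A, np.eye(n)]
q0 = np.array([0.5, 0.5])                # arbitrary starting values
q1, grad, H = ai_reml_step(q0, mats, X, y)
print("gradient:", grad)
print("updated q:", q1)
```

Comparing `grad` with a finite-difference derivative of `reml_loglik` is a quick way to check that the sign conventions of the gradient and the likelihood agree.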
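We also provide in GCTA two optional algorithms to estimate the variance components, which we call the direct REML and EM-REML. For the direct REML algorithm, the variance components in the tth iteration are estimated as \( \begin{bmatrix} \sigma_1^{2(t)} \\ \vdots \\ \sigma_r^{2(t)} \\ \sigma_{\mathrm{e}}^{2(t)} \end{bmatrix}=\begin{bmatrix} \mathrm{tr}(\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{A}_1) & \cdots & \mathrm{tr}(\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{A}_r) & \mathrm{tr}(\mathbf{P}\mathbf{A}_1\mathbf{P}) \\ \vdots & \ddots & \vdots & \vdots \\ \mathrm{tr}(\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{A}_1) & \cdots & \mathrm{tr}(\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{A}_r) & \mathrm{tr}(\mathbf{P}\mathbf{A}_r\mathbf{P}) \\ \mathrm{tr}(\mathbf{P}\mathbf{P}\mathbf{A}_1) & \cdots & \mathrm{tr}(\mathbf{P}\mathbf{P}\mathbf{A}_r) & \mathrm{tr}(\mathbf{P}\mathbf{P}) \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_1\mathbf{P}\mathbf{y} \\ \vdots \\ \mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_r\mathbf{P}\mathbf{y} \\ \mathbf{y}^{\prime}\mathbf{P}\mathbf{P}\mathbf{y} \end{bmatrix} \), where \( \mathbf{P} \) is evaluated at the estimates from iteration t − 1.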
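The direct REML algorithm is generally more robust but computationally less efficient than AI-REML. For the EM-REML algorithm, each variance component is estimated as \( \sigma_i^{2(t)}=[\sigma_i^{4(t-1)}\mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_i\mathbf{P}\mathbf{y}+\mathrm{tr}(\sigma_i^{2(t-1)}\mathbf{I}-\sigma_i^{4(t-1)}\mathbf{P}\mathbf{A}_i)]/n \).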
EM-REML is robust in that the likelihood is guaranteed to increase after each iteration, but it is extremely slow to converge. We therefore do not recommend choosing EM-REML in GCTA unless the starting values are known to be very close to the final estimates. The GCTA option for choosing among the REML algorithms is --reml-alg, with the input value 0 for AI-REML (default), 1 for the direct REML algorithm, and 2 for EM-REML.
At the beginning of the iteration process, all the variance components are initialized to an arbitrary value, i.e., \( \sigma_i^{2(0)}=\sigma_{\mathrm{P}}^2/(r+1) \), which is subsequently updated by the EM-REML algorithm, \( \sigma_i^{2(1)}=[\sigma_i^{4(0)}\mathbf{y}^{\prime}\mathbf{P}\mathbf{A}_i\mathbf{P}\mathbf{y}+\mathrm{tr}(\sigma_i^{2(0)}\mathbf{I}-\sigma_i^{4(0)}\mathbf{P}\mathbf{A}_i)]/n \). The EM-REML algorithm is used as an initial step to determine the direction of the iteration updates because it is robust to poor starting values. We also provide options (--reml-priors and --reml-priors-var) in GCTA for users to specify starting values. After one EM-REML iteration, GCTA switches to the AI-REML algorithm (or to whichever of the other two algorithms was selected) for the remaining iterations, until the iteration converges under the criterion \( L^{(t)}-L^{(t-1)}<10^{-4} \), where \( L^{(t)} \) is the log likelihood at the tth iteration.
By default, any variance component that escapes from the parameter space (i.e., its estimate is negative) is set to \( 10^{-6}\times \sigma_{\mathrm{P}}^2 \). If a component keeps escaping from the parameter space, it is constrained at \( 10^{-6}\times \sigma_{\mathrm{P}}^2 \). There is an option in GCTA (--reml-no-constrain) that allows the estimates of variance components to be negative. This is justified because, if the true value of a parameter is zero, an unbiased estimate of that parameter has a one-half chance of being negative. In practice, however, a negative variance component is usually difficult to interpret.
We also provide an option (--reml-maxit) for users to specify the maximum number of iterations, after which the iteration process stops even if it has not converged.
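The iteration scheme described in this appendix (starting values of \( \sigma_{\mathrm{P}}^2/(r+1) \), one EM-REML step, then the chosen algorithm, the \( 10^{-4} \) criterion, resetting negative estimates to \( 10^{-6}\times \sigma_{\mathrm{P}}^2 \), and an iteration cap) can be sketched as follows. This is an illustrative NumPy reimplementation under stated assumptions, not GCTA's code. All function and argument names are hypothetical, `alg` merely mirrors the values of --reml-alg, the direct REML update is assumed to take the form q = [tr(PA_iPA_j)]^-1 [y'PA_iPy], the genotypes are simulated, and the sketch simply resets a negative component at every iteration rather than tracking repeated escapes.

```python
import numpy as np

def projection(V, X):
    """P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1."""
    Vi = np.linalg.inv(V)
    ViX = Vi @ X
    return Vi - ViX @ np.linalg.solve(X.T @ ViX, ViX.T)

def loglik(V, X, P, y):
    """REML log likelihood, ignoring the constant."""
    logdet_V = np.linalg.slogdet(V)[1]
    logdet_XVX = np.linalg.slogdet(X.T @ np.linalg.solve(V, X))[1]
    return -0.5 * (logdet_V + logdet_XVX + y @ P @ y)

def update(q, mats, P, y, alg):
    """One update of q; alg: 0 = AI-REML, 1 = direct REML, 2 = EM-REML."""
    n = len(y)
    Py = P @ y
    APy = [A @ Py for A in mats]                       # A_i P y
    quad = np.array([Py @ v for v in APy])             # y' P A_i P y
    if alg == 1:
        PA = [P @ A for A in mats]
        # F[i, j] = tr(P A_i P A_j)
        F = np.array([[np.sum(Bi * Bj.T) for Bj in PA] for Bi in PA])
        return np.linalg.solve(F, quad)
    trace = np.array([np.trace(P @ A) for A in mats])  # tr(P A_i)
    if alg == 2:
        # [s^4 y'P A_i P y + tr(s^2 I - s^4 P A_i)] / n, with s^2 = q_i
        return (q**2 * quad + n * q - q**2 * trace) / n
    grad = -0.5 * (trace - quad)
    H = 0.5 * np.array([[u @ P @ w for w in APy] for u in APy])
    return q + np.linalg.solve(H, grad)

def reml(y, X, grms, alg=0, max_iter=100, tol=1e-4, constrain=True):
    """Return (estimates, log likelihood, converged flag)."""
    n = len(y)
    mats = list(grms) + [np.eye(n)]                    # A_1..A_r, then I
    var_p = np.var(y, ddof=1)                          # phenotypic variance
    q = np.full(len(mats), var_p / len(mats))          # sigma_P^2 / (r + 1)
    prev = -np.inf
    for t in range(max_iter):
        V = sum(s * A for s, A in zip(q, mats))
        P = projection(V, X)
        L = loglik(V, X, P, y)
        if t > 0 and L - prev < tol:                   # criterion as stated in the text
            return q, L, True
        prev = L
        q = update(q, mats, P, y, 2 if t == 0 else alg)  # first step is always EM
        if constrain:
            q = np.where(q < 0, 1e-6 * var_p, q)       # reset negative estimates
    return q, prev, False                              # stopped at max_iter

# Simulated data (assumption): 400 individuals, 400 SNPs, h2 of about 0.4.
rng = np.random.default_rng(1)
n, m = 400, 400
freq = rng.uniform(0.1, 0.5, m)
Z = rng.binomial(2, freq, size=(n, m)).astype(float)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
A = Z @ Z.T / m
X = np.ones((n, 1))
y = 2.0 + Z @ rng.normal(0, np.sqrt(0.8 / m), m) + rng.normal(0, np.sqrt(1.2), n)

q_ai, L_ai, ok_ai = reml(y, X, [A], alg=0)
q_direct, L_direct, ok_direct = reml(y, X, [A], alg=1)
print("AI-REML:    ", q_ai, ok_ai)
print("direct REML:", q_direct, ok_direct)
```

Passing `constrain=False` leaves negative estimates untouched, which is the behavior the text attributes to --reml-no-constrain; `max_iter` plays the role of --reml-maxit.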
© 2013 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Yang, J., Lee, S.H., Goddard, M.E., Visscher, P.M. (2013). Genome-Wide Complex Trait Analysis (GCTA): Methods, Data Analyses, and Interpretations. In: Gondro, C., van der Werf, J., Hayes, B. (eds) Genome-Wide Association Studies and Genomic Prediction. Methods in Molecular Biology, vol 1019. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-447-0_9
DOI: https://doi.org/10.1007/978-1-62703-447-0_9
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-446-3
Online ISBN: 978-1-62703-447-0
eBook Packages: Springer Protocols