A Powerful Test for SNP Effects on Multivariate Binary Outcomes Using Kernel Machine Regression

Davenport, Clemontina A.; Maity, Arnab; Sullivan, Patrick F.; Tzeng, Jung-Ying

doi:10.1007/s12561-017-9189-9

Published: 24 March 2017

Statistics in Biosciences, Volume 10, pages 117–138 (2018)

Abstract

Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and may share common marker effects. In this paper, we propose a procedure to test for the effect of a single nucleotide polymorphism (SNP) set on multiple, possibly correlated, binary responses. We develop a score-based test using a non-parametric modeling framework that jointly models the global effect of the marker set. We account for non-linear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure requires estimation only under the null hypothesis, and we use multivariate generalized estimating equations to estimate the model components while accounting for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) antibody study data and the CoLaus study data.

References

1. Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New York
2. Arsenault BJ, Rana JS, Stroes ESG, Després J-P, Shah PK, Kastelein JJP, Wareham NJ, Boekholdt SM, Khaw K-T (2010) Beyond low-density lipoprotein cholesterol: respective contributions of nonhigh-density lipoprotein cholesterol levels, triglycerides, and the total cholesterol/high-density lipoprotein cholesterol ratio to coronary heart disease risk in apparently healthy men and women. J Am Coll Cardiol 55:35–41
3. Austin MA, Hokanson JE, Edwards KL (1998) Hypertriglyceridemia as a cardiovascular risk factor. Am J Cardiol 81:7B–12B
4. Bauer CR, Shankaran S, Bada HS, Lester B, Wright LL, Krause-Steinrauf H, Smeriglio VL, Finnegan LP, Maza PL, Verter J (2002) The maternal lifestyle study: drug exposure during pregnancy and short-term maternal outcomes. Am J Obstet Gynecol 186:487–495
5. Buhmann MD (2003) Radial basis functions: theory and implementations. Cambridge University Press, Cambridge
6. Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167
7. Chen J, Chen W, Zhao N, Wu MC, Schaid DJ (2016) Small-sample kernel association tests for human genetic and microbiome association studies. Genet Epidemiol 40:5–19
8. Das A, Poole WK, Bada HS (2004) A repeated measures approach for simultaneous modeling of multiple neurobehavioral outcomes in newborns exposed to cocaine in utero. Am J Epidemiol 159:891–899
9. Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (2001) Executive summary of the third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). JAMA 285:2486–2497
10. Firmann M, Mayor V, Vidal PM, Bochud M, Pecoud A, Hayoz D, Paccaud F, Preisig M, Song KS, Yuan X et al (2008) The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8:6
11. Freytag S, Bickeböller H, Amos CI, Kneib T, Schlather M (2012) A novel kernel for correcting size bias in the logistic kernel machine test with an application to rheumatoid arthritis. Hum Hered 74:97–108
12. Girault EM, Foppen E, Ackermans MT, Fliers E, Kalsbeek A (2013) Central administration of an orexin receptor 1 antagonist prevents the stimulatory effect of Olanzapine on endogenous glucose production. Brain Res 1527:238–245
13. Grundy SM, Cleeman JI, Daniels SR, Donato KA, Eckel RH, Franklin BA, Gordon DJ, Krauss RM, Savage PJ, Smith SC Jr et al (2005) Diagnosis and management of the metabolic syndrome: an American Heart Association/National Heart, Lung, and Blood Institute scientific statement. Circulation 112:2735–2752
14. Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36:1171–1220
15. Kralisch S, Klein J, Lossner U, Bluher M, Paschke R, Stumvoll M, Fasshauer M (2005) Isoproterenol, TNFalpha, and insulin downregulate adipose triglyceride lipase in 3T3-L1 adipocytes. Mol Cell Endocrinol 240:43–49
16. Kwee LC, Liu D, Lin X, Ghosh D, Epstein MP (2008) A powerful and flexible multilocus association test for quantitative traits. Am J Hum Genet 82:386–397
17. Lanckriet GRG, Cristianini N, Bartlett P, El Ghaoui L, Jordan M (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72
18. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34:816–834
19. Liang KY, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
20. Lieberman JA, Stroup TS, McEvoy JP, Swartz MS, Rosenheck RA, Perkins DO, Keefe RSE, Davis SM, Davis CE, Lebowitz BD et al (2005) Effectiveness of antipsychotic drugs in patients with chronic schizophrenia. N Engl J Med 353:1209–1223
21. Lin X (1997) Variance component testing in generalised linear models with random effects. Biometrika 84:309–326
22. Lipsitz SR, Fitzmaurice GM, Ibrahim JG, Sinha D, Parzen M, Lipshultz S (2009) Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: an application to acquired immune deficiency syndrome data. J R Stat Soc 172:3–20
23. Liu D, Lin X, Ghosh D (2007) Semiparametric regression of multi-dimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics 63:1077–1088
24. Liu D, Ghosh D, Lin X (2008) Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinform 9:292
25. Maity A, Sullivan PF, Tzeng JY (2012) Multivariate phenotype association analysis by marker-set kernel machine regressions. Genet Epidemiol 36:686–695
26. McCartan C, Mason R, Jayasinghe SR, Griffiths LR (2012) Cardiomyopathy classification: ongoing debate in the genomics era. Biochem Res Int 2012:796926
27. Miller M, Stone NJ, Ballantyne C, Bittner V, Criqui MH, Ginsberg HN, Goldberg AC, Howard WJ, Jacobson MS, Kris-Etherton PM et al (2011) Triglycerides and cardiovascular disease: a scientific statement from the American Heart Association. Circulation 123:2292–2333
28. Nam D, Kim SY (2008) Gene-set approach for expression pattern analysis. Brief Bioinform 9:189–197
29. Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D (2012) An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337:100–114
30. Pan KH, Lih CJ, Cohen SN (2005) Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays. Proc Natl Acad Sci USA 102:8961–8965
31. Shen Y, Zhao Y, Zheng D, Chang X, Ju S, Guo L (2013) Effects of orexin A on GLUT4 expression and lipid content via MAPK signaling in 3T3-L1 adipocytes. J Steroid Biochem Mol Biol 138:376–383
32. Sikder D, Kodadek T (2007) The neurohormone orexin stimulates hypoxia-inducible factor-1 activity. Genes Dev 21:2995–3005
33. Sullivan PF, Lin D, Tzeng JY, van den Oord E, Perkins D, Stroup TS, Wagner M, Lee S, Wright FA, Zou F et al (2008) Genomewide association for schizophrenia in the CATIE study: results of Stage 1. Mol Psychiatry 13:570–584
34. Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore
35. Szafranski M, Grandvalet Y, Rakotomamonjy A (2010) Composite kernel learning. Mach Learn 79:73–103
36. Tsuneki H, Wada T, Sasaoka T (2012) Role of orexin in the central regulation of glucose and energy homeostasis. Endocr J 59:365–374
37. Vapnik VN (1998) Statistical learning theory. Wiley, New York
38. Wang X, Lee S, Zhu X, Redline S, Lin X (2013) GEE-based SNP set association test for continuous and discrete traits in family based association studies. Genet Epidemiol 37:778–786
39. Wortley KE, Chang GQ, Davydova Z, Leibowitz SF (2003) Peptides that regulate food intake: orexin gene expression is increased during states of hypertriglyceridemia. Am J Physiol 284:R1454–R1465
40. Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X (2010) Powerful SNP set analysis for case-control genome-wide association studies. Am J Hum Genet 86:929–942
41. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X (2011) Rare variant association testing for sequencing data using the sequence kernel association test (SKAT). Am J Hum Genet 89:82–93
42. Wu M, Maity A, Lee S, Simmons EM, Harmon QE, Lin X, Engel S, Molldrem JJ, Armistead PM (2013) Kernel machine SNP-set testing under multiple candidate kernels. Genet Epidemiol 37:267–275
43. Yan Q, Tiwari HK, Yi N, Gao G, Zhang K, Lin W, Lou XY, Cui X, Liu N (2015) A sequence kernel association test for dichotomous traits in family samples under a generalized linear mixed model. Hum Hered 79:60–68
44. Yolken RH, Torrey EF, Lieberman JA, Yang S, Dickerson FB (2011) Serological evidence of exposure to herpes simplex virus Type 1 is associated with cognitive deficits in the CATIE schizophrenia sample. Schizophr Res 128:61–65
45. Zhang D, Lin X (2003) Hypothesis testing in semiparametric additive mixed models. Biostatistics 4:57–74
46. Zhang Y, Xu Z, Shen X, Pan W, Alzheimer’s Disease Neuroimaging Initiative (2014) Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 96:309–325
47. Zhao Y, Chen F, Zhai R, Lin X, Diao N (2012) Association test based on SNP set: logistic kernel machine based test vs. principal component analysis. PLoS ONE 7:e44978

Acknowledgements

The authors thank Dr. Robert Yolken at Johns Hopkins University for providing the antibody data. The authors also thank Drs. Peter Vollenweider and Gerard Waeber, PIs of the CoLaus study, and Drs. Meg Ehm and Matthew Nelson, collaborators at GlaxoSmithKline, for providing the CoLaus phenotype and sequence data. This work was supported by National Institutes of Health Grants R00 ES017744 (to A.M.), R01 MH084022 (to J.Y.T. and P.F.S.), and P01 CA142538 (to J.Y.T.).

Author information

Authors and Affiliations

Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, 27707, USA
Clemontina A. Davenport
Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA
Arnab Maity
Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
Patrick F. Sullivan
Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
Jung-Ying Tzeng
Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
Jung-Ying Tzeng
Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
Jung-Ying Tzeng

Corresponding author

Correspondence to Clemontina A. Davenport.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare.

Appendix

From Sect. 2.1, the parameters ${\varvec{\beta }}$ and h in (1) can be estimated by maximizing the penalized log-likelihood using a Fisher scoring or a Newton–Raphson algorithm. Liu et al. [24] show that, by treating ${\varvec{\beta }}$ as a vector of fixed effects and $\mathbf{h } = (h_{1},\ldots , h_{n})^\mathrm{T}$ as a vector of random effects, the logistic KM estimator is the same as the penalized quasi-likelihood estimator from a logistic mixed model $\text {logit}(p_{i}) = \mathbf{x }_{i}^\mathrm{T} {\varvec{\beta }} + h_{i},$ where $\mathbf{h } \sim N(\mathbf 0 ,\, \tau \mathbf{K }),\, \tau = 1/ \lambda ,\,\lambda $ is the penalty parameter from the penalized likelihood, and $\mathbf{K }$ is a square matrix whose $(i,\,j)$th element is (2). The normal equations given in (5) of [24] coincide with iteratively fitting a working linear mixed model $\widetilde{\mathbf{y }} = \mathbf{X } {\varvec{\beta }} + \mathbf{h } + \varvec{\varepsilon }$ until convergence, where ${\varvec{\beta }}$ and $\mathbf{h }$ are estimated using BLUE and BLUP, respectively, and $\varvec{\varepsilon } \sim \text {N}(\mathbf 0 ,\, \mathbf{D }^{-1})$ with $\mathbf{D } = \text {diag}\{p_{i}(1-p_{i})\}.$ The regularization parameter $\tau $ can be estimated by treating it as a variance component and maximizing the REML criterion

$$\begin{aligned} \ell \approx {-}\frac{1}{2} \text {log }\left| \mathbf{V }_{u}\right| - \frac{1}{2} \text {log }\left| \mathbf{X }^\mathrm{T} \mathbf{V }_{u}^{-1} \mathbf{X }\right| - \frac{1}{2} (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}})^\mathrm{T} \mathbf{V }_{u}^{-1} (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}}), \end{aligned}$$

(8)

where $\mathbf{V }_{u} = \mathbf{D }^{-1} + \tau \mathbf{K }$ and $\widetilde{\mathbf{y }} = \mathbf{X } {\varvec{\beta }} + \mathbf{h } + \mathbf{D }^{-1}(\mathbf{y } - \mathbf p ).$ We refer to [24] for full details.
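As an illustration only, the criterion in (8) can be evaluated numerically for a fixed working response. The sketch below is not the authors' implementation: it assumes that $\widetilde{\mathbf{y }}$ and the diagonal of $\mathbf{D }$ are held fixed (in the actual algorithm they are updated at each iteration), that $\widehat{{\varvec{\beta }}}$ is the generalized least squares (BLUE) estimate under $\mathbf{V }_{u}$, and that the toy kernel and the grid search over $\tau $ are hypothetical choices made for the example.

```python
import numpy as np

def reml_criterion(tau, X, K, d, y_tilde):
    """Approximate REML criterion (8) for a fixed working response.

    d holds the diagonal of D, i.e. p_i * (1 - p_i); V_u = D^{-1} + tau * K.
    """
    V = np.diag(1.0 / d) + tau * K
    Vinv_X = np.linalg.solve(V, X)                 # V^{-1} X
    XtVinvX = X.T @ Vinv_X                         # X^T V^{-1} X
    # GLS (BLUE) estimate of beta under V_u
    beta_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ y_tilde)
    r = y_tilde - X @ beta_hat
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    return -0.5 * logdet_V - 0.5 * logdet_XVX - 0.5 * r @ np.linalg.solve(V, r)

# Toy inputs (all hypothetical): intercept plus one covariate, linear kernel.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.integers(0, 3, size=(n, 4)).astype(float)
K = Z @ Z.T / Z.shape[1]
p = rng.uniform(0.2, 0.8, size=n)
d = p * (1.0 - p)
y_tilde = rng.normal(size=n)

# Crude grid search over tau, for illustration only.
taus = np.linspace(0.0, 2.0, 21)
values = np.array([reml_criterion(t, X, K, d, y_tilde) for t in taus])
tau_best = taus[np.argmax(values)]
print(tau_best)
```

In practice one would alternate this maximization with updates of $\widetilde{\mathbf{y }}$ and $\mathbf{D }$ and would use a proper optimizer rather than a grid.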

Testing the overall genetic effect $H_{0}{\text {:}}\,h(\mathbf{z }) = 0$ for univariate (UV) responses is equivalent to testing $H_{0}{\text {:}}\, \tau = 0.$ Liu et al. [24] propose the following score test statistic based on the derivative of (8) with respect to $\tau $

$$\begin{aligned} S = \frac{Q(\widehat{{\varvec{\beta }}}_{0}) - p_{Q}}{\sigma _{Q}}, \end{aligned}$$

(9)

where $Q(\widehat{{\varvec{\beta }}}_{0}) = (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}}_{0})^\mathrm{T} \mathbf{D }_{0} \mathbf{K } \mathbf{D }_{0} (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}}_{0}) = (\mathbf{y } - \widehat{\mathbf{p }}_{0})^\mathrm{T} \mathbf{K } (\mathbf{y } - \widehat{\mathbf{p }}_{0}),\, \text {logit} (\widehat{\mathbf{p }}_{0}) = \mathbf{X } \widehat{{\varvec{\beta }}}_{0},\, \widehat{{\varvec{\beta }}}_{0}$ is the MLE of ${\varvec{\beta }}$ under the null logistic model, $p_{Q} = \text {tr}\{\mathbf{P }_{0} \mathbf{K } \},\, \sigma _{Q}^{2} = 2 \text {tr}\{\mathbf{P }_{0} \mathbf{K } \mathbf{P }_{0} \mathbf{K } \},$ and $\mathbf{P }_{0} = \mathbf{D }_{0} - \mathbf{D }_{0} \mathbf{X } (\mathbf{X }^\mathrm{T}\mathbf{D }_{0} \mathbf{X })^{-1} \mathbf{X }^\mathrm{T} \mathbf{D }_{0}$ where $\mathbf{D }_{0} = \text {diag}\{\hat{p}_{i0}(1-\hat{p}_{i0}) \}.$
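The ingredients of (9) can be assembled in a few lines once the null logistic model has been fitted. The following is a minimal sketch rather than the authors' code: the null fit uses a plain Fisher scoring loop, the kernel is a hypothetical linear kernel on toy genotypes, and $Q$ is standardized by the square root of $2\,\text {tr}\{\mathbf{P }_{0} \mathbf{K } \mathbf{P }_{0} \mathbf{K }\}$, which is an assumption about how the scaling constant is meant to be read.

```python
import numpy as np

def fit_null_logistic(X, y, n_iter=50, tol=1e-10):
    """Fisher scoring (IRLS) for the null model logit(p) = X beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def km_score_statistic(X, y, K):
    """Standardized score-type statistic built from Q, p_Q and tr{P0 K P0 K}."""
    beta0 = fit_null_logistic(X, y)
    p0 = 1.0 / (1.0 + np.exp(-X @ beta0))
    d0 = p0 * (1.0 - p0)
    D0X = d0[:, None] * X
    # P0 = D0 - D0 X (X^T D0 X)^{-1} X^T D0
    P0 = np.diag(d0) - D0X @ np.linalg.solve(X.T @ D0X, D0X.T)
    resid = y - p0
    Q = resid @ K @ resid
    p_Q = np.trace(P0 @ K)
    var_Q = 2.0 * np.trace(P0 @ K @ P0 @ K)   # assumed variance of Q
    S = (Q - p_Q) / np.sqrt(var_Q)
    return {"S": S, "Q": Q, "p_Q": p_Q, "var_Q": var_Q, "p0": p0, "P0": P0}

# Toy data (hypothetical): intercept plus one covariate, five SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.integers(0, 3, size=(n, 5)).astype(float)
K = Z @ Z.T                                    # linear kernel, an assumption
prob = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.3, 0.5]))))
y = rng.binomial(1, prob).astype(float)

result = km_score_statistic(X, y, K)
print(result["S"])
```

Only the null model is fitted here, which reflects the appeal of score tests: nothing has to be estimated under the alternative.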

From Sect. 2.2.1, we modify the working model in (5). Define $\mathbf{D }$ as the block diagonal matrix with blocks $\mathbf{D }_{1}, \ldots , \mathbf{D }_{t}.$ Then the variance–covariance matrix of $\varvec{\varepsilon }$ is $\mathbf{D }^{-1} \mathbf{D }^{1/2} \mathbf{S } \mathbf{D }^{1/2} \mathbf{D }^{-1} = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}$ and the modified working model will have the same form as (5) but with $\varvec{\varepsilon } \sim \text {MVN}(\mathbf 0 ,\, \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}).$ The parameters ${\varvec{\beta }}_{j}$ and $\mathbf{h }_{j}$ can now be estimated using BLUE and BLUP, respectively, and the variance components $\tau _{j}$ can be estimated by maximizing the restricted quasi-likelihood criterion

$$\begin{aligned} \ell \approx {-}\frac{1}{2} \text {log }\left| \mathbf{V }_{m}\right| - \frac{1}{2} \text {log }\left| \mathbf{X }^\mathrm{T} \mathbf{V }_{m}^{-1} \mathbf{X }\right| - \frac{1}{2} \left( \begin{array}{c} \widetilde{\mathbf{y }}_{1} - \mathbf{X } \widehat{{\varvec{\beta }}}_{1} \\ \vdots \\ \widetilde{\mathbf{y }}_{t} - \mathbf{X } \widehat{{\varvec{\beta }}}_{t} \end{array}\right) ^\mathrm{T} \mathbf{V }_{m}^{-1} \left( \begin{array}{c} \widetilde{\mathbf{y }}_{1} - \mathbf{X } \widehat{{\varvec{\beta }}}_{1} \\ \vdots \\ \widetilde{\mathbf{y }}_{t} - \mathbf{X } \widehat{{\varvec{\beta }}}_{t} \end{array}\right) , \end{aligned}$$

(10)

where $\mathbf{V }_{m} = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2} + \mathbf{V }_{h}.$

The main goal is to test for genetic pathway effects $H_{0}{\text {:}}\, \mathbf{h }(\cdot ) = \mathbf 0 ,$ which is equivalent to testing $H_{0}{\text {:}}\, \tau _{1} = \cdots = \tau _{t} = 0.$ To do this, we propose a score-type test statistic based on the derivative of the restricted quasi-likelihood criterion in (10). Taking the derivative of (10) with respect to $\tau _{j}$ for $j=1,\ldots , t$ and then setting $\tau _{j} = 0,$ the score function for $\tau _{j}$ is $S_{j} = Q_{j}({\varvec{\beta }},\,\varvec{\theta }) - p_{jQ},$ where

$$\begin{aligned} Q_{j}({\varvec{\beta }},\, \varvec{\theta }) = (\mathbf{y }-\widehat{\mathbf{p }})^\mathrm{T} \left( \mathbf{S } \mathbf{D }^{1/2}\right) ^{-1} \mathbf{D }^{1/2} \mathbf{K }_{j}^{*} \mathbf{D }^{1/2} \left( \mathbf{D }^{1/2} \mathbf{S }\right) ^{-1} (\mathbf{y }-\widehat{\mathbf{p }} ), \end{aligned}$$

and $p_{jQ} = \text {tr}\{ \mathbf{P } \mathbf{K }_{j}^{*} \}.$ Because the $\tau _{j}$ are variance components and are therefore non-negative, testing $H_{0}{\text {:}}\,\tau _{1} = \cdots = \tau _{t} = 0$ is equivalent to testing $H_{0}{\text {:}}\,\tau _{1} + \cdots + \tau _{t} = 0,$ and we adopt a technique similar to that of [25].

From Sect. 2.2.2, in order to evaluate Q in (6), we estimate ${\varvec{\beta }}$ and $\varvec{\theta }$ under the null using GEEs. We posit the GEEs under $H_{0}$

$$\begin{aligned} \widetilde{\mathbf{X }}^\mathrm{T} \mathbf{D } \mathbf{V }_{m0}^{-1} (\mathbf{y }-\mathbf{p }) = \mathbf 0 , \end{aligned}$$

(11)

where $\widetilde{\mathbf{X }}$ is a block diagonal matrix with $nt$ rows whose $t$ diagonal blocks each equal $\mathbf{X },$ and $\mathbf{V }_{m0} = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}.$ If there is no genetic pathway effect ($H_{0}{\text {:}}\,\mathbf{h }(\cdot ) = \mathbf 0 $ is true), then $\mathbf{V }_{m0}$ is the working variance–covariance matrix of $\mathbf{y }.$ To solve (11), Liang and Zeger [19] suggest using a modified Fisher scoring algorithm to find ${\varvec{\beta }}$ and method-of-moments estimation for $\varvec{\theta }.$ The updating equation is

$$\begin{aligned} \widehat{{\varvec{\beta }}}^{(k+1)} = \widehat{{\varvec{\beta }}}^{(k)} + \left( \widetilde{\mathbf{X }}^\mathrm{T} \widehat{\mathbf{D }} \widehat{\mathbf{V }}_{m0}^{-1} \widehat{\mathbf{D }} \widetilde{\mathbf{X }}\right) ^{-1}\widetilde{\mathbf{X }}^\mathrm{T} \widehat{\mathbf{D }} \widehat{\mathbf{V }}_{m0}^{-1} (\mathbf{y } - \widehat{\mathbf{p }}). \end{aligned}$$

The initial estimates in the first iteration come from fitting a GLM assuming independence.
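The iteration described above can be sketched as follows. This is an illustrative sketch under several assumptions that are not taken from the paper: outcomes are stacked outcome by outcome, $\mathbf{S }$ is taken to be $\mathbf{R } \otimes \mathbf{I }_{n}$ for an unstructured $t \times t$ working correlation $\mathbf{R }$ estimated by moments from Pearson residuals, and the working covariance of $\mathbf{y }$ is the textbook Liang and Zeger form $\mathbf{A }^{1/2} \mathbf{S } \mathbf{A }^{1/2}$ with $\mathbf{A } = \text {diag}\{p(1-p)\}$, which is swapped in for $\mathbf{V }_{m0}$ as written above. The function name `fit_gee` and the toy data are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_gee(X, Y, n_iter=50, tol=1e-8):
    """Fisher scoring for t correlated binary outcomes sharing covariates X.

    X is n x q, Y is n x t. Returns (t x q coefficient matrix, t x t R).
    """
    n, q = X.shape
    t = Y.shape[1]
    # Initial estimates: separate logistic GLMs, i.e. working independence.
    B = np.zeros((q, t))
    for j in range(t):
        for _ in range(25):
            p = expit(X @ B[:, j])
            w = p * (1.0 - p)
            B[:, j] += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (Y[:, j] - p))
    X_tilde = np.kron(np.eye(t), X)        # block diagonal, nt x qt
    y = Y.T.reshape(-1)                    # (y_1, ..., y_t) stacked
    beta = B.T.reshape(-1)                 # (beta_1, ..., beta_t) stacked
    R = np.eye(t)
    for _ in range(n_iter):
        p = expit(X_tilde @ beta)
        a = p * (1.0 - p)
        # Method of moments: correlation of Pearson residuals across outcomes.
        E = ((y - p) / np.sqrt(a)).reshape(t, n).T
        C = E.T @ E / n
        sd = np.sqrt(np.diag(C))
        R = C / np.outer(sd, sd)
        S = np.kron(R, np.eye(n))
        V = np.sqrt(a)[:, None] * S * np.sqrt(a)[None, :]   # A^{1/2} S A^{1/2}
        AX = a[:, None] * X_tilde
        Vinv_AX = np.linalg.solve(V, AX)
        step = np.linalg.solve(AX.T @ Vinv_AX, Vinv_AX.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta.reshape(t, q), R

# Toy data (hypothetical): a shared latent effect u correlates the outcomes.
rng = np.random.default_rng(1)
n, t = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n)
true_B = np.array([[-0.5, 0.8], [0.2, -0.4], [0.0, 0.5]])   # t x q
Y = rng.binomial(1, expit(X @ true_B.T + 0.8 * u[:, None])).astype(float)

beta_hat, R_hat = fit_gee(X, Y)
print(beta_hat)
print(R_hat)
```

Forming the full $nt \times nt$ matrix is wasteful and is done here only to keep the correspondence with the matrix notation visible; a practical implementation would exploit the Kronecker structure.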

We then use a simulation-based technique to get the p-values of Q. It can be shown that, under $H_0,$ $\text {var}(\mathbf{y }_{j} - \widehat{\mathbf{p }}_{j})$ can be approximated as $\widehat{\mathbf{D }}_{j} - \widehat{\mathbf{D }}_{j} \mathbf{X } (\mathbf{X }^\mathrm{T} \widehat{\mathbf{D }}_{j} \mathbf{X })^{-1} \mathbf{X }^\mathrm{T} \widehat{\mathbf{D }}_{j} = \widehat{\mathbf{P }}_{j}.$ This follows from the fact that under $H_0,\,\mathbf{y }_{j} - \widehat{\mathbf{p }}_{j} = \mathbf{D }_{j}( \widetilde{\mathbf{y }}_{j} - \mathbf{X } \widehat{{\varvec{\beta }}}_{j}),$ and that $\text {Cov}(\widetilde{\mathbf{y }}_{j} - \mathbf{X } \widehat{{\varvec{\beta }}}_{j}) = \mathbf{D }_{j}^{-1} - \mathbf{X } (\mathbf{X }^\mathrm{T} {\mathbf{D }}_{j} \mathbf{X })^{-1} \mathbf{X }^\mathrm{T}$ from linear model theory, so that pre- and post-multiplying by $\mathbf{D }_{j}$ gives $\mathbf{P }_{j}.$ If the multiple outcomes were independent, then the variance–covariance matrix of $\mathbf{y } - \widehat{\mathbf{p }}$ would be $\widehat{\mathbf{P }},$ a block diagonal matrix with elements $\widehat{\mathbf{P }}_{j},$ but since the outcomes are correlated, the variance–covariance matrix of $\mathbf{y } - \widehat{\mathbf{p }}$ is $\widehat{\mathbf{P }}^{1/2} \widehat{\mathbf{S }} \widehat{\mathbf{P }}^{1/2},$ which is no longer block diagonal.
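The linear-model step for a single outcome can be checked numerically. The sketch below assumes, as in the working model under $H_0,$ that $\text {Cov}(\widetilde{\mathbf{y }}_{j}) = \mathbf{D }_{j}^{-1}$ and that $\widehat{{\varvec{\beta }}}_{j}$ is the weighted least squares estimate with weight matrix $\mathbf{D }_{j}$; the covariance of the working residual is computed directly from the projection, so no closed form is presupposed. The toy design matrix and probabilities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])
p = rng.uniform(0.2, 0.8, size=n)
d = p * (1.0 - p)
D = np.diag(d)
D_inv = np.diag(1.0 / d)

XtDX_inv = np.linalg.inv(X.T @ D @ X)
# Weighted least squares hat matrix: X beta_hat = H y_tilde
H = X @ XtDX_inv @ X.T @ D
I = np.eye(n)

# Cov(y_tilde - X beta_hat) = (I - H) Cov(y_tilde) (I - H)^T with Cov(y_tilde) = D^{-1}
cov_working = (I - H) @ D_inv @ (I - H).T

# Since y - p_hat = D (y_tilde - X beta_hat), its covariance is D cov_working D
cov_resid = D @ cov_working @ D

# Target matrix P = D - D X (X^T D X)^{-1} X^T D
P = D - D @ X @ XtDX_inv @ X.T @ D

print(np.allclose(cov_resid, P))
print(np.allclose(cov_working, D_inv - X @ XtDX_inv @ X.T))
```

In this check the working-residual covariance equals $\mathbf{D }^{-1} - \mathbf{X }(\mathbf{X }^\mathrm{T} \mathbf{D } \mathbf{X })^{-1} \mathbf{X }^\mathrm{T},$ and scaling by $\mathbf{D }$ on both sides reproduces $\mathbf{P }.$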
By defining $\widehat{\mathbf{M }} = \widehat{\mathbf{P }}^{1/2} \widehat{\mathbf{S }} \widehat{\mathbf{P }}^{1/2},$ (6) can be rewritten as $Q = \{ \widehat{\mathbf{M }}^{-1/2}(\mathbf{y }- \widehat{\mathbf{p }})\}^\mathrm{T} \widehat{\mathbf{B }} \{ \widehat{\mathbf{M }}^{-1/2} (\mathbf{y }-\widehat{\mathbf{p }}) \}$ where $\widehat{\mathbf{B }} = \widehat{\mathbf{M }}^{1/2} ( \widehat{\mathbf{S }} \widehat{\mathbf{D }}^{1/2} )^{-1} \widehat{\mathbf{D }}^{1/2} \mathbf{K } \widehat{\mathbf{D }}^{1/2} ( \widehat{\mathbf{D }}^{1/2} \widehat{\mathbf{S }})^{-1} \widehat{\mathbf{M }}^{1/2}.$ Using the eigenvalue decomposition, we can write $\widehat{\mathbf{B }} = \mathbf{U } {\varvec{\Lambda }} \mathbf{U }^\mathrm{T}$, where $\mathbf{U }$ is the matrix of orthonormal eigenvectors of $\widehat{\mathbf{B }}$ and ${\varvec{\Lambda }}$ is a diagonal matrix whose elements are the corresponding eigenvalues. Thus $Q = \widehat{\mathbf{R }}^\mathrm{T} {\varvec{\Lambda }} \widehat{\mathbf{R }},$ where $\widehat{\mathbf{R }} = \mathbf{U }^\mathrm{T} \widehat{\mathbf{M }}^{-1/2} (\mathbf{y } - \widehat{\mathbf{p }}).$
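To make the last step concrete, the sketch below rewrites a generic quadratic form through the eigendecomposition of a symmetric matrix and then approximates a tail probability by simulation. It is an illustration under stated assumptions, not the authors' procedure: the matrix `B` and the vector `r` are arbitrary toy stand-ins for $\widehat{\mathbf{B }}$ and $\widehat{\mathbf{M }}^{-1/2} (\mathbf{y } - \widehat{\mathbf{p }}),$ and the Monte Carlo step assumes that the components of $\widehat{\mathbf{R }}$ behave approximately like independent standard normal variables under $H_0,$ so that $Q$ is compared with a weighted sum of $\chi ^{2}_{1}$ draws.

```python
import numpy as np

def quadratic_form_via_eigen(B, r):
    """Return eigenvalues, rotated vector R = U^T r, and sum(lambda_i * R_i^2)."""
    lam, U = np.linalg.eigh(B)          # B = U diag(lam) U^T for symmetric B
    R = U.T @ r
    return lam, R, float(np.sum(lam * R ** 2))

def mc_pvalue(q_obs, lam, n_draws=100_000, seed=0):
    """Monte Carlo tail probability of sum(lambda_i * chi2_1) at q_obs."""
    rng = np.random.default_rng(seed)
    keep = lam > 1e-10 * np.max(np.abs(lam))   # drop numerically zero eigenvalues
    lam = lam[keep]
    draws = rng.chisquare(1, size=(n_draws, lam.size)) @ lam
    return float(np.mean(draws >= q_obs))

# Toy stand-ins (hypothetical): a rank-5 positive semidefinite B and a vector r.
rng = np.random.default_rng(3)
A = rng.normal(size=(12, 5))
B = A @ A.T
r = rng.normal(size=12)

q_direct = float(r @ B @ r)
lam, R, q_eigen = quadratic_form_via_eigen(B, r)
p_value = mc_pvalue(q_eigen, lam)
print(np.isclose(q_direct, q_eigen), p_value)
```

Working with the eigenvalues means each simulated replicate costs only a weighted sum, rather than a fresh matrix product of the size of $\widehat{\mathbf{B }}.$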

About this article

Cite this article

Davenport, C.A., Maity, A., Sullivan, P.F. et al. A Powerful Test for SNP Effects on Multivariate Binary Outcomes Using Kernel Machine Regression. Stat Biosci 10, 117–138 (2018). https://doi.org/10.1007/s12561-017-9189-9

Received: 05 January 2016
Revised: 20 December 2016
Accepted: 15 March 2017
Published: 24 March 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s12561-017-9189-9
