A Powerful Test for SNP Effects on Multivariate Binary Outcomes Using Kernel Machine Regression

Statistics in Biosciences

Abstract

Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a single nucleotide polymorphism (SNP) set on multiple, possibly correlated, binary responses. We develop a score-based test using a non-parametric modeling framework that jointly models the global effect of the marker set. We account for non-linear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure requires estimation only under the null hypothesis, and we use multivariate generalized estimating equations to estimate the model components while accounting for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the Clinical Antipsychotic Trials of Intervention Effectiveness antibody study data and the CoLaus study data.

Fig. 1
Fig. 2
Fig. 3

References

  1. Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New York

  2. Arsenault BJ, Rana JS, Stroes ESG, Després J-P, Shah PK, Kastelein JJP, Wareham NJ, Boekholdt SM, Khaw K-T (2010) Beyond low-density lipoprotein cholesterol: respective contributions of non-high-density lipoprotein cholesterol levels, triglycerides, and the total cholesterol/high-density lipoprotein cholesterol ratio to coronary heart disease risk in apparently healthy men and women. J Am Coll Cardiol 55:35–41

  3. Austin MA, Hokanson JE, Edwards KL (1998) Hypertriglyceridemia as a cardiovascular risk factor. Am J Cardiol 81:7B–12B

  4. Bauer CR, Shankaran S, Bada HS, Lester B, Wright LL, Krause-Steinrauf H, Smeriglio VL, Finnegan LP, Maza PL, Verter J (2002) The maternal lifestyle study: drug exposure during pregnancy and short-term maternal outcomes. Am J Obstet Gynecol 186:487–495

  5. Buhmann MD (2003) Radial basis functions: theory and implementations. Cambridge University Press, Cambridge

  6. Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167

  7. Chen J, Chen W, Zhao N, Wu MC, Schaid DJ (2016) Small-sample kernel association tests for human genetic and microbiome association studies. Genet Epidemiol 40:5–19

  8. Das A, Poole WK, Bada HS (2004) A repeated measures approach for simultaneous modeling of multiple neurobehavioral outcomes in newborns exposed to cocaine in utero. Am J Epidemiol 159:891–899

  9. Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (2001) Executive summary of the third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). JAMA 285:2486–2497

  10. Firmann M, Mayor V, Vidal PM, Bochud M, Pecoud A, Hayoz D, Paccaud F, Preisig M, Song KS, Yuan X et al (2008) The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord 8:6

  11. Freytag S, Bickeböller H, Amos CI, Kneib T, Schlather M (2012) A novel kernel for correcting size bias in the logistic kernel machine test with an application to rheumatoid arthritis. Hum Hered Hum7:97–108

  12. Girault EM, Foppen E, Ackermans MT, Fliers E, Kalsbeek A (2013) Central administration of an orexin receptor 1 antagonist prevents the stimulatory effect of Olanzapine on endogenous glucose production. Brain Res 1527:238–245

  13. Grundy SM, Cleeman JI, Daniels SR, Donato KA, Eckel RH, Franklin BA, Gordon DJ, Krauss RM, Savage PJ, Smith SC Jr et al (2005) Diagnosis and management of the metabolic syndrome: an American Heart Association/National Heart, Lung, and Blood Institute scientific statement. Circulation 112:2735–2752

  14. Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36:1171–1220

  15. Kralisch S, Klein J, Lossner U, Bluher M, Paschke R, Stumvoll M, Fasshauer M (2005) Isoproterenol, TNFalpha, and insulin downregulate adipose triglyceride lipase in 3T3-L1 adipocytes. Mol Cell Endocrinol 240:43–49

  16. Kwee LC, Liu D, Lin X, Ghosh D, Epstein MP (2008) A powerful and flexible multilocus association test for quantitative traits. Am J Hum Genet 82:386–397

  17. Lanckriet GRG, Cristianini N, Bartlett P, El Ghaoui L, Jordan M (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72

  18. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34:816–834

  19. Liang KY, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22

  20. Lieberman JA, Stroup TS, McEvoy JP, Swartz MS, Rosenheck RA, Perkins DO, Keefe RSE, Davis SM, Davis CE, Lebowitz BD et al (2005) Effectiveness of antipsychotic drugs in patients with chronic schizophrenia. N Engl J Med 353:1209–1223

  21. Lin X (1997) Variance component testing in generalised linear models with random effects. Biometrika 84:309–326

  22. Lipsitz SR, Fitzmaurice GM, Ibrahim JG, Sinha D, Parzen M, Lipshultz S (2009) Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: an application to acquired immune deficiency syndrome data. J R Stat Soc 172:3–20

  23. Liu D, Lin X, Ghosh D (2007) Semiparametric regression of multi-dimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics 63:1077–1088

  24. Liu D, Ghosh D, Lin X (2008) Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinform 9:292

  25. Maity A, Sullivan PF, Tzeng JY (2012) Multivariate phenotype association analysis by marker-set kernel machine regressions. Genet Epidemiol 36:686–695

  26. McCartan C, Mason R, Jayasinghe SR, Griffiths LR (2012) Cardiomyopathy classification: ongoing debate in the genomics era. Biochem Res Int 2012:796926

  27. Miller M, Stone NJ, Ballantyne C, Bittner V, Criqui MH, Ginsberg HN, Goldberg AC, Howard WJ, Jacobson MS, Kris-Etherton PM et al (2011) Triglycerides and cardiovascular disease: a scientific statement from the American Heart Association. Circulation 123:2292–2333

  28. Nam D, Kim SY (2008) Gene-set approach for expression pattern analysis. Brief Bioinform 9:189–197

  29. Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D (2012) An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337:100–114

  30. Pan KH, Lih CJ, Cohen SN (2005) Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays. Proc Natl Acad Sci USA 102:8961–8965

  31. Shen Y, Zhao Y, Zheng D, Chang X, Ju S, Guo L (2013) Effects of orexin A on GLUT4 expression and lipid content via MAPK signaling in 3T3-L1 adipocytes. J Steroid Biochem Mol Biol 138:376–383

  32. Sikder D, Kodadek T (2007) The neurohormone orexin stimulates hypoxia-inducible factor-1 activity. Genes Dev 21:2995–3005

  33. Sullivan PF, Lin D, Tzeng JY, van den Oord E, Perkins D, Stroup TS, Wagner M, Lee S, Wright FA, Zou F et al (2008) Genomewide association for schizophrenia in the CATIE study: results of Stage 1. Mol Psychiatry 13:570–584

  34. Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore

  35. Szafranski M, Grandvalet Y, Rakotomamonjy A (2010) Composite kernel learning. Mach Learn 79:73–103

  36. Tsuneki H, Wada T, Sasaoka T (2012) Role of orexin in the central regulation of glucose and energy homeostasis. Endocr J 59:365–374

  37. Vapnik VN (1998) Statistical learning theory. Wiley, New York

  38. Wang X, Lee S, Zhu X, Redline S, Lin X (2013) GEE-based SNP set association test for continuous and discrete traits in family based association studies. Genet Epidemiol 37:778–786

  39. Wortley KE, Chang GQ, Davydova Z, Leibowitz SF (2003) Peptides that regulate food intake: orexin gene expression is increased during states of hypertriglyceridemia. Am J Physiol 284:R1454–R1465

  40. Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X (2010) Powerful SNP set analysis for case-control genome-wide association studies. Am J Hum Genet 86:929–942

  41. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X (2011) Rare variant association testing for sequencing data using the sequence kernel association test (SKAT). Am J Hum Genet 89:82–93

  42. Wu M, Maity A, Lee S, Simmons EM, Harmon QE, Lin X, Engel S, Molldrem JJ, Armistead PM (2013) Kernel machine SNP-set testing under multiple candidate kernels. Genet Epidemiol 37:267–275

  43. Yan Q, Tiwari HK, Yi N, Gao G, Zhang K, Lin W, Lou XY, Cui X, Liu N (2015) A sequence kernel association test for dichotomous traits in family samples under a generalized linear mixed model. Hum Hered 79:60–68

  44. Yolken RH, Torrey EF, Lieberman JA, Yang S, Dickerson FB (2011) Serological evidence of exposure to herpes simplex virus Type 1 is associated with cognitive deficits in the CATIE schizophrenia sample. Schizophr Res 128:61–65

  45. Zhang D, Lin X (2003) Hypothesis testing in semiparametric additive mixed models. Biostatistics 4:57–74

  46. Zhang Y, Xu Z, Shen X, Pan W, Alzheimer’s Disease Neuroimaging Initiative (2014) Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 96:309–325

  47. Zhao Y, Chen F, Zhai R, Lin X, Diao N (2012) Association test based on SNP set: logistic kernel machine based test vs. principal component analysis. PLoS ONE 7:e44978

Acknowledgements

The authors thank Dr. Robert Yolken at Johns Hopkins University for providing the antibody data. The authors also thank Drs. Peter Vollenweider and Gerard Waeber, PIs of the CoLaus study, and Drs. Meg Ehm and Matthew Nelson, collaborators at GlaxoSmithKline for providing the CoLaus phenotype and sequence data. This work was supported by National Institutes of Health Grants R00 ES017744 (to A.M.), R01 MH084022 (to J.Y.T. and P.F.S.), and P01 CA142538 (to J.Y.T.).

Author information

Correspondence to Clemontina A. Davenport.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare.

Appendix

Following Sect. 2.1, the parameters \({\varvec{\beta }}\) and h in (1) can be estimated by maximizing the penalized log-likelihood with a Fisher scoring or Newton–Raphson algorithm. Liu et al. [24] show that, by treating \({\varvec{\beta }}\) as a vector of fixed effects and \(\mathbf{h } = (h_{1},\ldots , h_{n})^\mathrm{T}\) as a vector of random effects, the logistic KM estimator is the same as the penalized quasi-likelihood estimator from the logistic mixed model \(\text {logit}(p_{i}) = \mathbf{x }_{i}^\mathrm{T} {\varvec{\beta }} + h_{i},\) where \(\mathbf{h } \sim N(\mathbf 0 ,\, \tau \mathbf{K }),\) \(\tau = 1/ \lambda ,\) \(\lambda \) is the penalty parameter from the penalized likelihood, and \(\mathbf{K }\) is the square matrix whose \((i,\,j)\)th element is given by (2). The normal equations given in (5) of [24] coincide with iteratively fitting the working linear mixed model \(\widetilde{\mathbf{y }} = \mathbf{X } {\varvec{\beta }} + \mathbf{h } + \varvec{\varepsilon }\) until convergence, where \({\varvec{\beta }}\) and \(\mathbf{h }\) are estimated by BLUE and BLUP, respectively, and \(\varvec{\varepsilon } \sim \text {N}(\mathbf 0 ,\, \mathbf{D }^{-1})\) with \(\mathbf{D } = \text {diag}\{p_{i}(1-p_{i})\}.\) The regularization parameter \(\tau \) can be estimated by treating it as a variance component and maximizing the REML criterion

$$\begin{aligned} \ell \approx {-}\frac{1}{2} \text {log }\left| \mathbf{V }_{u}\right| - \frac{1}{2} \text {log }\left| \mathbf{X }^\mathrm{T} \mathbf{V }_{u}^{-1} \mathbf{X }\right| - \frac{1}{2} (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}})^\mathrm{T} \mathbf{V }_{u}^{-1} (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}}), \end{aligned}\tag{8}$$

where \(\mathbf{V }_{u} = \mathbf{D }^{-1} + \tau \mathbf{K }\) and \(\widetilde{\mathbf{y }} = \mathbf{X } {\varvec{\beta }} + \mathbf{h } + \mathbf{D }^{-1}(\mathbf{y } - \mathbf p ).\) We refer to [24] for full details.
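As a minimal sketch in plain Python (no external libraries), the quantities entering one pass of the working linear mixed model can be computed as below: the working response \(\widetilde{\mathbf{y }} = \mathbf{X }{\varvec{\beta }} + \mathbf{h } + \mathbf{D }^{-1}(\mathbf{y } - \mathbf{p }),\) the matrix \(\mathbf{V }_{u} = \mathbf{D }^{-1} + \tau \mathbf{K },\) and the REML criterion (8) with \({\varvec{\beta }}\) replaced by its GLS estimate for the given \(\mathbf{V }_{u}.\) The function names are hypothetical, the code assumes \({\varvec{\beta }},\) \(\mathbf{h }\) and \(\tau \) are supplied (the BLUP update of \(\mathbf{h }\) and the search over \(\tau \) are omitted), and the Cholesky factorization used for the log-determinants is our choice; dense loops restrict it to small n.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def working_quantities(X, beta, h, y):
    """For given beta and h, return p, the diagonal d of D, and the working
    response y_tilde = X beta + h + D^{-1}(y - p) of the logistic working model."""
    p, d, y_tilde = [], [], []
    for xi, hi, yi in zip(X, h, y):
        eta = sum(a * b for a, b in zip(xi, beta)) + hi
        pi = expit(eta)
        di = pi * (1.0 - pi)
        p.append(pi)
        d.append(di)
        y_tilde.append(eta + (yi - pi) / di)
    return p, d, y_tilde


def working_cov(d, K, tau):
    """V_u = D^{-1} + tau * K as a dense list-of-lists matrix."""
    n = len(d)
    return [[tau * K[i][j] + (1.0 / d[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def reml_criterion(X, y_tilde, V):
    """REML criterion (8), constants dropped, with beta at its GLS estimate given V."""
    n, q = len(X), len(X[0])
    L = cholesky(V)
    cols = [[X[i][c] for i in range(n)] for c in range(q)]
    Vinv_X = [chol_solve(L, col) for col in cols]  # columns of V^{-1} X
    XtVX = [[sum(cols[a][i] * Vinv_X[b][i] for i in range(n)) for b in range(q)]
            for a in range(q)]
    Vinv_y = chol_solve(L, y_tilde)
    XtVy = [sum(cols[a][i] * Vinv_y[i] for i in range(n)) for a in range(q)]
    L2 = cholesky(XtVX)
    beta_hat = chol_solve(L2, XtVy)  # GLS (BLUE) estimate of beta
    r = [y_tilde[i] - sum(X[i][c] * beta_hat[c] for c in range(q)) for i in range(n)]
    quad = sum(ri * si for ri, si in zip(r, chol_solve(L, r)))
    logdet_V = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    logdet_XtVX = 2.0 * sum(math.log(L2[a][a]) for a in range(q))
    return -0.5 * logdet_V - 0.5 * logdet_XtVX - 0.5 * quad, beta_hat
```

In an actual fit one would alternate these steps (update \(\mathbf{h }\) by BLUP, recompute \(\widetilde{\mathbf{y }}\) and \(\mathbf{D },\) then maximize the criterion over \(\tau \)) until convergence.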

Testing the overall genetic effect \(H_{0}{\text {:}}\,h(\mathbf{z }) = 0\) for univariate (UV) responses is equivalent to testing \(H_{0}{\text {:}}\, \tau = 0.\) Liu et al. [24] propose the following score test statistic, based on the derivative of (8) with respect to \(\tau \):

$$\begin{aligned} S = \frac{Q(\widehat{{\varvec{\beta }}}_{0}) - p_{Q}}{\sigma _{Q}}, \end{aligned}\tag{9}$$

where \(Q(\widehat{{\varvec{\beta }}}_{0}) = (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}}_{0})^\mathrm{T} \mathbf{D } \mathbf{K } \mathbf{D } (\widetilde{\mathbf{y }} - \mathbf{X } \widehat{{\varvec{\beta }}}_{0}) = (\mathbf{y } - \widehat{\mathbf{p }}_{0})^\mathrm{T} \mathbf{K } (\mathbf{y } - \widehat{\mathbf{p }}_{0}),\) \(\text {logit} (\widehat{\mathbf{p }}_{0}) = \mathbf{X } \widehat{{\varvec{\beta }}}_{0},\) \(\widehat{{\varvec{\beta }}}_{0}\) is the MLE of \({\varvec{\beta }}\) under the null logistic model, \(p_{Q} = \text {tr}\{\mathbf{P }_{0} \mathbf{K } \}\) and \(\sigma _{Q} = [2\, \text {tr}\{\mathbf{P }_{0} \mathbf{K } \mathbf{P }_{0} \mathbf{K } \}]^{1/2}\) are the approximate null mean and standard deviation of Q, and \(\mathbf{P }_{0} = \mathbf{D }_{0} - \mathbf{D }_{0} \mathbf{X } (\mathbf{X }^\mathrm{T}\mathbf{D }_{0} \mathbf{X })^{-1} \mathbf{X }^\mathrm{T} \mathbf{D }_{0}\) with \(\mathbf{D }_{0} = \text {diag}\{\hat{p}_{i0}(1-\hat{p}_{i0}) \}.\)
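For the univariate case, the statistic in (9) can be sketched in a few self-contained functions: fit the null logistic model by Newton–Raphson, form \(\mathbf{P }_{0},\) and standardize \(Q\) by its approximate null mean \(\text {tr}(\mathbf{P }_{0}\mathbf{K })\) and by \(\{2\,\text {tr}(\mathbf{P }_{0}\mathbf{K }\mathbf{P }_{0}\mathbf{K })\}^{1/2},\) the standard deviation of a quadratic form in approximately normal residuals with covariance \(\mathbf{P }_{0}.\) Function names are illustrative assumptions, and the dense \(O(n^{3})\) loops are only suitable for toy examples.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_null_logistic(X, y, iters=50, tol=1e-10):
    """MLE of beta under logit(p) = X beta via Newton-Raphson."""
    n, q = len(X), len(X[0])
    beta = [0.0] * q
    for _ in range(iters):
        p = [expit(sum(a * b for a, b in zip(xi, beta))) for xi in X]
        info = [[sum(p[i] * (1 - p[i]) * X[i][a] * X[i][b] for i in range(n))
                 for b in range(q)] for a in range(q)]
        score = [sum(X[i][a] * (y[i] - p[i]) for i in range(n)) for a in range(q)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def score_test(X, y, K):
    """Return (S, Q, p_Q, var_Q) for the univariate logistic kernel machine score test."""
    n, q = len(X), len(X[0])
    beta0 = fit_null_logistic(X, y)
    p0 = [expit(sum(a * b for a, b in zip(xi, beta0))) for xi in X]
    d0 = [pi * (1 - pi) for pi in p0]
    r = [yi - pi for yi, pi in zip(y, p0)]
    Q = sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))
    info = [[sum(d0[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(q)]
            for a in range(q)]
    # G[j] = (X^T D0 X)^{-1} x_j d_j, so P0[i][j] = d_i 1{i=j} - d_i x_i^T G[j]
    G = [solve(info, [d0[j] * X[j][a] for a in range(q)]) for j in range(n)]
    P0 = [[(d0[i] if i == j else 0.0) - d0[i] * sum(X[i][a] * G[j][a] for a in range(q))
           for j in range(n)] for i in range(n)]
    PK = [[sum(P0[i][k] * K[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    p_Q = sum(PK[i][i] for i in range(n))  # tr(P0 K)
    var_Q = 2.0 * sum(PK[i][j] * PK[j][i] for i in range(n) for j in range(n))  # 2 tr(P0 K P0 K)
    return (Q - p_Q) / math.sqrt(var_Q), Q, p_Q, var_Q
```

Any positive semidefinite kernel matrix can be passed as `K`; for instance, a linear kernel would be built from a genotype matrix as its Gram matrix.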

From Sect. 2.2.1, we modify the working model in (5). Define \(\mathbf{D }\) as the block diagonal matrix with blocks \(\mathbf{D }_{1}, \ldots , \mathbf{D }_{t}.\) Then the variance–covariance matrix of \(\varvec{\varepsilon }\) is \(\mathbf{D }^{-1} \mathbf{D }^{1/2} \mathbf{S } \mathbf{D }^{1/2} \mathbf{D }^{-1} = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}\) and the modified working model will have the same form as (5) but with \(\varvec{\varepsilon } \sim \text {MVN}(\mathbf 0 ,\, \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}).\) The parameters \({\varvec{\beta }}_{j}\) and \(\mathbf{h }_{j}\) can now be estimated using BLUE and BLUP, respectively, and the variance components \(\tau _{j}\) can be estimated by maximizing the restricted quasi-likelihood criterion

$$\begin{aligned} \ell \approx {-}\frac{1}{2} \text {log }\left| \mathbf{V }_{m}\right| - \frac{1}{2} \text {log }\left| \mathbf{X }^\mathrm{T} \mathbf{V }_{m}^{-1} \mathbf{X }\right| - \frac{1}{2} \left( \begin{array}{c} \widetilde{\mathbf{y }}_{1} - \mathbf{X } \widehat{{\varvec{\beta }}}_{1} \\ \vdots \\ \widetilde{\mathbf{y }}_{t} - \mathbf{X } \widehat{{\varvec{\beta }}}_{t} \end{array}\right) ^\mathrm{T} \mathbf{V }_{m}^{-1} \left( \begin{array}{c} \widetilde{\mathbf{y }}_{1} - \mathbf{X } \widehat{{\varvec{\beta }}}_{1} \\ \vdots \\ \widetilde{\mathbf{y }}_{t} - \mathbf{X } \widehat{{\varvec{\beta }}}_{t} \end{array}\right) , \end{aligned}\tag{10}$$

where \(\mathbf{V }_{m} = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2} + \mathbf{V }_{h}.\)
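To make the block structure concrete, the sketch below builds \(\text {Cov}(\varvec{\varepsilon }) = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}\) for a response vector stacked by outcome (all n individuals for outcome 1, then outcome 2, and so on). It assumes, for illustration only, that \(\mathbf{S } = \mathbf{R } \otimes \mathbf{I }_{n}\) for a \(t \times t\) working correlation matrix \(\mathbf{R },\) so outcomes are correlated within an individual and independent across individuals; the main text may parametrize \(\mathbf{S }\) differently, and the function names are hypothetical.

```python
import math


def kron_corr(R, n):
    """S = kron(R, I_n) for a response vector stacked by outcome: entry
    (j*n + i, k*n + i2) equals R[j][k] when i == i2 (same individual), else 0."""
    t = len(R)
    return [[R[a // n][b // n] if a % n == b % n else 0.0 for b in range(n * t)]
            for a in range(n * t)]


def working_error_cov(d_blocks, R):
    """Cov(eps) = D^{-1} (D^{1/2} S D^{1/2}) D^{-1} = D^{-1/2} S D^{-1/2},
    where D = blockdiag(D_1, ..., D_t) and d_blocks[j] is the diagonal of D_j."""
    n = len(d_blocks[0])
    d = [dij for block in d_blocks for dij in block]  # diagonal of D, stacked by outcome
    S = kron_corr(R, n)
    m = len(d)
    # Because D is diagonal, entry (a, b) is simply S[a][b] / sqrt(d_a d_b)
    return [[S[a][b] / math.sqrt(d[a] * d[b]) for b in range(m)] for a in range(m)]
```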

The main goal is to test for genetic pathway effects, \(H_{0}{\text {:}}\, \mathbf{h }(\cdot ) = \mathbf 0 ,\) which is equivalent to testing \(H_{0}{\text {:}}\, \tau _{1} = \cdots = \tau _{t} = 0.\) To do this, we propose a score-type test statistic based on the derivative of the restricted quasi-likelihood criterion in (10). Differentiating (10) with respect to \(\tau _{j}\) for \(j=1,\ldots , t\) and then setting \(\tau _{j} = 0\) gives the score function \(S_{j} = Q_{j}({\varvec{\beta }},\,\varvec{\theta }) - p_{jQ}\) for \(\tau _{j},\) where

$$\begin{aligned} Q_{j}({\varvec{\beta }},\, \varvec{\theta }) = (\mathbf{y }-\widehat{\mathbf{p }})^\mathrm{T} \left( \mathbf{S } \mathbf{D }^{1/2}\right) ^{-1} \mathbf{D }^{1/2} \mathbf{K }_{j}^{*} \mathbf{D }^{1/2} \left( \mathbf{D }^{1/2} \mathbf{S }\right) ^{-1} (\mathbf{y }-\widehat{\mathbf{p }} ), \end{aligned}$$

and \(p_{jQ} = \text {tr}\{ \mathbf{P } \mathbf{K }_{j}^{*} \}.\) Because the \(\tau _{j}\)’s are variance components and hence non-negative, testing \(H_{0}{\text {:}}\,\tau _{1} = \cdots = \tau _{t} = 0\) is equivalent to testing \(H_{0}{\text {:}}\,\tau _{1} + \cdots + \tau _{t} = 0,\) and we adopt a technique similar to that of Maity et al. [25].
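Because \(\mathbf{S }\) is symmetric and \(\mathbf{D }\) is diagonal, the sandwich in \(Q_{j}\) collapses to an ordinary quadratic form in a rescaled residual vector. The short rearrangement below is our own and is not taken from the main text:

$$\begin{aligned} \left( \mathbf{S } \mathbf{D }^{1/2}\right) ^{-1} \mathbf{D }^{1/2} &= \mathbf{D }^{-1/2} \mathbf{S }^{-1} \mathbf{D }^{1/2}, \qquad \mathbf{D }^{1/2} \left( \mathbf{D }^{1/2} \mathbf{S }\right) ^{-1} = \mathbf{D }^{1/2} \mathbf{S }^{-1} \mathbf{D }^{-1/2}, \\ Q_{j}({\varvec{\beta }},\, \varvec{\theta }) &= \mathbf{w }^\mathrm{T} \mathbf{K }_{j}^{*} \mathbf{w }, \qquad \mathbf{w } = \mathbf{D }^{1/2} \mathbf{S }^{-1} \mathbf{D }^{-1/2} (\mathbf{y }-\widehat{\mathbf{p }}), \end{aligned}$$

where the left factor \((\mathbf{y }-\widehat{\mathbf{p }})^\mathrm{T} \mathbf{D }^{-1/2} \mathbf{S }^{-1} \mathbf{D }^{1/2}\) equals \(\mathbf{w }^\mathrm{T}\) because \(\mathbf{S }^\mathrm{T} = \mathbf{S }.\) In particular, \(Q_{j} \ge 0\) whenever \(\mathbf{K }_{j}^{*}\) is positive semidefinite.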

From Sect. 2.2.2, in order to evaluate Q in (6), we estimate \({\varvec{\beta }}\) and \(\varvec{\theta }\) under the null using GEEs. We posit the GEEs under \(H_{0}\)

$$\begin{aligned} \widetilde{\mathbf{X }}^\mathrm{T} \mathbf{D } \mathbf{V }_{m0}^{-1} (\mathbf{y }-\mathbf{p }) = \mathbf 0 , \end{aligned}\tag{11}$$

where \(\widetilde{\mathbf{X }} = \mathbf{I }_{t} \otimes \mathbf{X }\) is the block-diagonal matrix with t diagonal blocks, each equal to \(\mathbf{X },\) and \(\mathbf{V }_{m0} = \mathbf{D }^{-1/2} \mathbf{S } \mathbf{D }^{-1/2}.\) If there is no genetic pathway effect (\(H_{0}{\text {:}}\,\mathbf{h }(\cdot ) = \mathbf 0 \) is true), then \(\mathbf {V} _{m0}\) is the working variance–covariance matrix of \(\mathbf{y }.\) To solve (11), Liang and Zeger [19] suggest a modified Fisher scoring algorithm for \({\varvec{\beta }}\) and a method-of-moments estimator for \(\varvec{\theta }.\) The updating equation is

$$\begin{aligned} \widehat{{\varvec{\beta }}}^{(k+1)} = \widehat{{\varvec{\beta }}}^{(k)} + \left( \widetilde{\mathbf{X }}^\mathrm{T} \widehat{\mathbf{D }} \widehat{\mathbf{V }}_{m0}^{-1} \widehat{\mathbf{D }} \widetilde{\mathbf{X }}\right) ^{-1}\widetilde{\mathbf{X }}^\mathrm{T} \widehat{\mathbf{D }} \widehat{\mathbf{V }}_{m0}^{-1} (\mathbf{y } - \widehat{\mathbf{p }}). \end{aligned}$$

The initial estimates in the first iteration come from fitting a GLM assuming independence.
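A minimal sketch of this kind of iteration for t binary outcomes that share the covariate matrix \(\mathbf{X }\) but have separate coefficient vectors is given below. Several choices are assumptions made for illustration, not taken from the text: the working correlation is exchangeable with a single parameter \(\rho \) estimated by the method of moments from Pearson residuals; the working covariance of the t responses of individual i is \(\mathbf{A }_{i}^{1/2}\mathbf{R }(\rho )\mathbf{A }_{i}^{1/2}\) with \(\mathbf{A }_{i} = \text {diag}\{\mu _{ij}(1-\mu _{ij})\},\) as in standard Liang–Zeger GEE; and the sums run over individuals rather than over stacked outcome blocks. Starting from \(\rho = 0\) makes the first pass a Fisher scoring step under independence, in the spirit of the GLM initialization described above. Function names are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def gee_multivariate_logistic(X, Y, iters=100, tol=1e-8):
    """GEE for t binary outcomes with logit(mu_ij) = x_i^T beta_j.

    X: n rows of q covariates shared by all outcomes; Y: n rows of t binary outcomes.
    Returns (beta, rho): beta[j] is the coefficient vector for outcome j and rho is
    the exchangeable working correlation (a hypothetical choice of structure).
    """
    n, q, t = len(X), len(X[0]), len(Y[0])
    beta = [[0.0] * q for _ in range(t)]
    rho = 0.0  # rho = 0: the first pass is a Fisher scoring step under independence
    for _ in range(iters):
        H = [[0.0] * (q * t) for _ in range(q * t)]  # sum_i (A_i X_i)^T V_i^{-1} (A_i X_i)
        U = [0.0] * (q * t)                          # sum_i (A_i X_i)^T V_i^{-1} (y_i - mu_i)
        pearson = []
        for xi, yi in zip(X, Y):
            mu = [expit(sum(a * b for a, b in zip(xi, beta[j]))) for j in range(t)]
            var_mu = [m * (1.0 - m) for m in mu]
            # V_i = A_i^{1/2} R(rho) A_i^{1/2}
            V = [[math.sqrt(var_mu[j] * var_mu[k]) * (1.0 if j == k else rho)
                  for k in range(t)] for j in range(t)]
            # columns of V_i^{-1}; V_i is symmetric, so Vinv[j][k] = (V_i^{-1})[j][k]
            Vinv = [solve(V, [1.0 if r == k else 0.0 for r in range(t)]) for k in range(t)]
            resid = [yi[j] - mu[j] for j in range(t)]
            w = [sum(Vinv[j][k] * resid[k] for k in range(t)) for j in range(t)]
            for j in range(t):
                for c in range(q):
                    U[j * q + c] += var_mu[j] * xi[c] * w[j]
                    for k in range(t):
                        for e in range(q):
                            H[j * q + c][k * q + e] += (var_mu[j] * xi[c] * Vinv[j][k]
                                                        * var_mu[k] * xi[e])
            pearson.append([resid[j] / math.sqrt(var_mu[j]) for j in range(t)])
        step = solve(H, U)
        for j in range(t):
            for c in range(q):
                beta[j][c] += step[j * q + c]
        if t > 1:
            # method-of-moments estimate of rho from Pearson residuals (scale fixed at 1)
            num = sum(e[j] * e[k] for e in pearson for j in range(t) for k in range(j + 1, t))
            rho = num / (n * t * (t - 1) / 2.0)
            rho = max(min(rho, 0.95), -0.95 / (t - 1))  # keep R positive definite
        if max(abs(s) for s in step) < tol:
            break
    return beta, rho
```

With an intercept-only \(\mathbf{X },\) the solution is \(\beta _{j} = \text {logit}(\bar{y}_{j})\) whatever the value of \(\rho ,\) which makes a convenient sanity check.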

We then use a simulation-based technique to obtain the p-value of Q. It can be shown that, under \(H_0,\) var\((\mathbf{y }_{j} - \widehat{\mathbf{p }}_{j})\) can be approximated by \(\widehat{\mathbf{D }}_{j} - \widehat{\mathbf{D }}_{j} \mathbf{X } (\mathbf{X }^\mathrm{T} \widehat{\mathbf{D }}_{j} \mathbf{X })^{-1} \mathbf{X }^\mathrm{T} \widehat{\mathbf{D }}_{j} = \widehat{\mathbf{P }}_{j}.\) This follows from the fact that, under \(H_0,\) \(\mathbf{y }_{j} - \widehat{\mathbf{p }}_{j} = \mathbf{D }_{j}( \widetilde{\mathbf{y }}_{j} - \mathbf{X } \widehat{{\varvec{\beta }}}_{j})\) and that \(\text {Cov}(\widetilde{\mathbf{y }}_{j} - \mathbf{X } \widehat{{\varvec{\beta }}}_{j}) = \mathbf{D }_{j}^{-1} - \mathbf{X } (\mathbf{X }^\mathrm{T} {\mathbf{D }}_{j} \mathbf{X })^{-1} \mathbf{X }^\mathrm{T}\) from linear model theory, so that pre- and post-multiplying by \(\mathbf{D }_{j}\) gives \(\mathbf{P }_{j}.\) If the multiple outcomes were independent, the variance–covariance matrix of \(\mathbf{y } - \widehat{\mathbf{p }}\) would be \(\widehat{\mathbf{P }},\) the block-diagonal matrix with blocks \(\widehat{\mathbf{P }}_{j};\) because the outcomes are correlated, it is instead \(\widehat{\mathbf{P }}^{1/2} \widehat{\mathbf{S }} \widehat{\mathbf{P }}^{1/2},\) which is no longer block diagonal.
By defining \(\widehat{\mathbf{M }} = \widehat{\mathbf{P }}^{1/2} \widehat{\mathbf{S }} \widehat{\mathbf{P }}^{1/2},\) (6) can be rewritten as \(Q = \{ \widehat{\mathbf{M }}^{-1/2}(\mathbf{y }- \widehat{\mathbf{p }})\}^\mathrm{T} \widehat{\mathbf{B }} \{ \widehat{\mathbf{M }}^{-1/2} (\mathbf{y }-\widehat{\mathbf{p }}) \},\) where \(\widehat{\mathbf{B }} = \widehat{\mathbf{M }}^{1/2} ( \widehat{\mathbf{S }} \widehat{\mathbf{D }}^{1/2} )^{-1} \widehat{\mathbf{D }}^{1/2} \mathbf{K } \widehat{\mathbf{D }}^{1/2} ( \widehat{\mathbf{D }}^{1/2} \widehat{\mathbf{S }})^{-1} \widehat{\mathbf{M }}^{1/2}.\) Using an eigenvalue decomposition, we can write \(\widehat{\mathbf{B }} = \mathbf{U } {\varvec{\Lambda }} \mathbf{U }^\mathrm{T},\) where \(\mathbf{U }\) is the matrix of orthonormal eigenvectors of \(\widehat{\mathbf{B }}\) and \({\varvec{\Lambda }}\) is the diagonal matrix of the corresponding eigenvalues. Thus \(Q = \widehat{\mathbf{R }}^\mathrm{T} {\varvec{\Lambda }} \widehat{\mathbf{R }},\) where \(\widehat{\mathbf{R }} = \mathbf{U }^\mathrm{T} \widehat{\mathbf{M }}^{-1/2} (\mathbf{y } - \widehat{\mathbf{p }}).\) Because \(\text {var}(\mathbf{y } - \widehat{\mathbf{p }}) \approx \widehat{\mathbf{M }}\) under \(H_0,\) \(\widehat{\mathbf{R }}\) has approximately identity covariance; under a normal approximation, Q is therefore approximately distributed as \(\sum _{k} \lambda _{k} \chi ^{2}_{1,k}\) with independent \(\chi ^{2}_{1}\) variables, and its p-value can be approximated by simulating from this weighted sum.
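The last step can be sketched as follows: the eigenvalues of \(\widehat{\mathbf{B }}\) are computed with a textbook cyclic Jacobi rotation method (our choice; any symmetric eigensolver would do), and the p-value is approximated by Monte Carlo draws of \(\sum _{k} \lambda _{k} Z_{k}^{2}\) with independent standard normal \(Z_{k}.\) The function names, the default number of draws, and the \((\text {exceed} + 1)/(n_\text {sim} + 1)\) correction, which keeps the estimate away from zero, are illustrative assumptions.

```python
import math
import random


def jacobi_eigenvalues(B, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations (small matrices)."""
    A = [row[:] for row in B]
    n = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q); zeroes A[p][q]
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]


def mc_pvalue(q_obs, eigenvalues, n_sim=100000, seed=1):
    """Monte Carlo estimate of P(sum_k lambda_k * Z_k^2 >= q_obs), Z_k iid N(0, 1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        q_sim = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        if q_sim >= q_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

In the notation above, `eigenvalues` would be the diagonal of \({\varvec{\Lambda }}\) and `q_obs` the observed Q. As a check, with a single eigenvalue equal to 1 the estimated tail probability at 3.84 (the 95% point of \(\chi ^{2}_{1}\)) should be close to 0.05.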

About this article

Cite this article

Davenport, C.A., Maity, A., Sullivan, P.F. et al. A Powerful Test for SNP Effects on Multivariate Binary Outcomes Using Kernel Machine Regression. Stat Biosci 10, 117–138 (2018). https://doi.org/10.1007/s12561-017-9189-9

  • DOI: https://doi.org/10.1007/s12561-017-9189-9
