Modeling the Association Between Clusters of SNPs and Disease Responses

Chapter in Nonparametric Bayesian Inference in Biostatistics

Abstract

This chapter studies the association between SNP genotype data and a disease. In genetic association studies, analyses based on multiple markers have been shown to be more powerful, efficient, and biologically meaningful than single-marker association tests. Because the number of genetic markers considered is typically large, we first cluster the markers and then study the association between groups of markers and the disease. We propose a two-step procedure. First, we compute a Bayesian nonparametric cluster estimate under normalized generalized gamma (NGG) process mixture models. This lets us summarize the information in large-scale SNP data with a much smaller number of explanatory variables. Second, we introduce a genetic score and study the association between the disease response and the groups of markers through a logit model. Inference is carried out via an MCMC truncation method recently introduced in the literature. We also review the state of the art in Bayesian nonparametric cluster models and algorithms for the class of mixtures adopted here. Finally, we apply the model to a genome-wide association study of Crohn’s disease in a case-control setting.
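The two-step procedure can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions:

- The MCMC output of step one is taken as already available, as a matrix of sampled SNP cluster labels. No NGG sampler is implemented.
- The draws are summarized with Dahl's (2006) least-squares point estimate, which is one common choice assumed here.
- The genetic score is hypothetical: each subject's mean genotype (minor-allele count coded 0/1/2) over the SNPs in each cluster.
- The logit model is fitted by maximum likelihood with Newton-Raphson rather than by the Bayesian fit used in the chapter.

All function names are illustrative.

```python
import numpy as np


def posterior_similarity(partitions):
    """Fraction of MCMC draws in which SNPs i and j share a cluster.

    partitions: (S, p) integer array; row s holds the cluster labels
    of the p SNPs at MCMC iteration s.
    """
    z = np.asarray(partitions)
    same = z[:, :, None] == z[:, None, :]  # (S, p, p) co-clustering indicators
    return same.mean(axis=0)


def least_squares_partition(partitions):
    """Return the sampled partition whose co-clustering matrix is closest,
    in squared error, to the posterior similarity matrix."""
    z = np.asarray(partitions)
    psm = posterior_similarity(z)
    losses = [
        np.sum(((zs[:, None] == zs[None, :]).astype(float) - psm) ** 2)
        for zs in z
    ]
    return z[int(np.argmin(losses))]


def genetic_scores(G, labels):
    """Hypothetical genetic score: each subject's mean genotype (0/1/2)
    over the SNPs in each cluster.

    G: (n, p) genotype matrix; labels: (p,) SNP cluster labels.
    Returns an (n, K) matrix with one column per cluster (sorted labels).
    """
    labels = np.asarray(labels)
    return np.column_stack(
        [G[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )


def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns [intercept, coefficient_1, ..., coefficient_K]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)
        grad = X1.T @ (y - p)               # gradient of the log-likelihood
        hess = X1.T @ (X1 * w[:, None])     # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Toy usage with simulated data (not the Crohn's disease study).
rng = np.random.default_rng(0)
draws = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]])  # 3 draws, 4 SNPs
labels = least_squares_partition(draws)   # -> [0, 0, 1, 1]

n = 2000
G = rng.integers(0, 3, size=(n, 4))       # genotypes coded 0/1/2
scores = genetic_scores(G, labels)        # (n, 2) cluster-level scores
eta = -0.5 + 1.0 * scores[:, 0]           # only cluster 0 affects risk
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
beta_hat = fit_logit(scores, y)
print(labels, np.round(beta_hat, 2))
```

The clustering step shrinks the design from p SNPs to K cluster scores, so the logit model stays small even when p is large. Swapping `fit_logit` for an MCMC-based Bayesian logit fit would bring the sketch closer to the setting described above.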

Corresponding author: Raffaele Argiento

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Argiento, R., Guglielmi, A., Hsiao, C.K., Ruggeri, F., Wang, C. (2015). Modeling the Association Between Clusters of SNPs and Disease Responses. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_6
