Modeling the Association Between Clusters of SNPs and Disease Responses

Argiento, Raffaele; Guglielmi, Alessandra; Hsiao, Chuhsing Kate; Ruggeri, Fabrizio; Wang, Charlotte

doi:10.1007/978-3-319-19518-6_6

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

The aim of the paper is to discuss the association between SNP genotype data and a disease. In genetic association studies, statistical analyses with multiple markers have been shown to be more powerful, efficient, and biologically meaningful than single-marker association tests. As the number of genetic markers considered is typically large, here we cluster them and then study the association between groups of markers and disease. We propose a two-step procedure. First, a Bayesian nonparametric cluster estimate under normalized generalized gamma process mixture models is introduced, so that the information in large-scale SNP data can be summarized by a much smaller number of explanatory variables. Then, through the introduction of a genetic score, we study the association between the relevant disease response and groups of markers using a logit model. Inference is obtained via an MCMC truncation method recently introduced in the literature. We also provide a review of the state of the art of Bayesian nonparametric cluster models and algorithms for the class of mixtures adopted here. Finally, the model is applied to a genome-wide association study of Crohn's disease in a case-control setting.
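
The two-step idea described above can be illustrated with a minimal Python sketch. It is not the chapter's method, and several parts are assumptions made only for illustration: the cluster labels are taken as given rather than estimated under a normalized generalized gamma process mixture, the genetic score is assumed to be the mean minor-allele count within each cluster, the logit model is fitted by maximum likelihood rather than by MCMC, and the data are simulated. The function names cluster_scores and fit_logit are hypothetical.

```python
import numpy as np


def cluster_scores(genotypes, labels):
    """Assumed genetic score: mean minor-allele count (0/1/2) per SNP cluster."""
    genotypes = np.asarray(genotypes, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # One column per cluster, one row per subject.
    return np.column_stack(
        [genotypes[:, labels == c].mean(axis=1) for c in clusters]
    )


def fit_logit(X, y, n_iter=50, ridge=1e-6):
    """Logistic regression with intercept, fitted by Newton-Raphson."""
    X = np.column_stack([np.ones(len(X)), X])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # Tiny ridge on the Hessian for numerical stability only.
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


# Simulated case-control data (hypothetical): 12 SNPs in 3 given clusters,
# where only the first cluster influences disease status.
rng = np.random.default_rng(0)
n, p = 2000, 12
G = rng.binomial(2, 0.3, size=(n, p))
labels = np.repeat([0, 1, 2], 4)

S = cluster_scores(G, labels)                 # step 1: summarize SNPs by cluster
eta = -1.0 + 1.5 * S[:, 0]                    # true log-odds used in simulation
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
beta = fit_logit(S, y)                        # step 2: logit model on the scores
print("intercept and cluster effects:", np.round(beta, 2))
```

Replacing thousands of SNP columns with a handful of cluster-level scores is what keeps the second-step regression low dimensional. In the chapter both the clustering and the regression are treated in a Bayesian framework, which this sketch does not attempt to reproduce.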

Author information

Authors and Affiliations

CNR-IMATI, Via Bassini 15, 20133, Milano, Italy
Raffaele Argiento & Fabrizio Ruggeri
Dipartimento di Matematica, Politecnico di Milano, piazza Leonardo 32, 20133, Milano, Italy
Alessandra Guglielmi
Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
Chuhsing Kate Hsiao & Charlotte Wang

Corresponding author

Correspondence to Raffaele Argiento.

Editor information

Editors and Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, USA
Riten Mitra
Department of Mathematics, University of Texas, Austin, Texas, USA
Peter Müller

About this chapter

Cite this chapter

Argiento, R., Guglielmi, A., Hsiao, C.K., Ruggeri, F., Wang, C. (2015). Modeling the Association Between Clusters of SNPs and Disease Responses. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_6

DOI: https://doi.org/10.1007/978-3-319-19518-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19517-9
Online ISBN: 978-3-319-19518-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
