Abstract
Compositional data arise in many fields, such as economics, archaeometry, ecology, geology and political science. Regression in which the dependent variable is a composition is usually carried out either via a log-ratio transformation of the composition or via the Dirichlet distribution. However, neither approach is readily applicable when the data contain zero values. Remedies for this problem exist, but most of them rely on replacing the zero values. In this paper we adjust the Dirichlet distribution in the presence of covariates so that zero values are allowed in the data without modifying any values. To do so, we modify the log-likelihood of the Dirichlet distribution to account for zero values. Examples and simulation studies demonstrate the performance of the resulting zero-adjusted Dirichlet regression.
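The abstract describes the modified log-likelihood only in words. The Python sketch that follows illustrates one plausible reading of it. This is an assumption, not the authors' published formulation: each observation contributes the Dirichlet log-density of its non-zero parts only, using the concentration parameters of those parts, and no factor for the probability of the zero pattern is included. The covariate link (concentration α_j = φ·μ_j with a multinomial-logit mean and the first part as reference), the treatment of a single non-zero part, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def zero_adjusted_dirichlet_loglik(y: Sequence[float], alpha: Sequence[float]) -> float:
    """Log-density of one composition y, using only its non-zero parts.

    Illustrative sketch (an assumed reading of the abstract, not the authors'
    code): parts with y_j == 0 are dropped together with their alpha_j. The
    remaining parts still sum to 1, so the ordinary Dirichlet density on the
    smaller simplex is applied to them. No term for the probability of the
    zero pattern itself is included.
    """
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    parts = [(yj, aj) for yj, aj in zip(y, alpha) if yj > 0]
    if not parts:
        raise ValueError("a composition needs at least one non-zero part")
    if len(parts) == 1:
        # Assumption: a single non-zero part is identically 1, so it is
        # treated as carrying no density information (contributes 0).
        return 0.0
    ll = math.lgamma(sum(aj for _, aj in parts))
    for yj, aj in parts:
        ll += (aj - 1.0) * math.log(yj) - math.lgamma(aj)
    return ll


def alphas_from_covariates(
    x: Sequence[float], betas: Sequence[Sequence[float]], phi: float
) -> List[float]:
    """Hypothetical link: alpha_j = phi * mu_j, where mu is a multinomial-logit
    function of x with the first part as reference (its beta fixed at 0)."""
    eta = [0.0] + [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    m = max(eta)  # subtract the maximum for numerical stability
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [phi * wj / total for wj in w]


def total_loglik(Y, X, betas, phi) -> float:
    """Sum of the per-observation contributions; this is the quantity one
    would maximise over betas and phi."""
    return sum(
        zero_adjusted_dirichlet_loglik(y, alphas_from_covariates(x, betas, phi))
        for y, x in zip(Y, X)
    )


if __name__ == "__main__":
    Y = [(0.2, 0.3, 0.5), (0.0, 0.4, 0.6)]  # second observation has a zero
    X = [(1.0, 2.0), (1.0, -1.0)]
    betas = [(0.0, 0.0), (0.0, 0.0)]  # all-zero betas give alpha = (1, 1, 1)
    print(total_loglik(Y, X, betas, phi=3.0))  # approximately log 2 = 0.6931...
```

With no zeros the function reduces to the ordinary Dirichlet log-density; for example, y = (0.2, 0.3, 0.5) with α = (1, 1, 1) gives log 2, the log-density of the uniform distribution on the simplex. Maximising `total_loglik` over the β's and φ, for instance with a general-purpose numerical optimiser, would then give regression estimates under these assumptions.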
Additional information
(Submitted by A. I. Volodin)
About this article
Cite this article
Tsagris, M., Stewart, C. A Dirichlet Regression Model for Compositional Data with Zeros. Lobachevskii J Math 39, 398–412 (2018). https://doi.org/10.1134/S1995080218030198
DOI: https://doi.org/10.1134/S1995080218030198