
A Dirichlet Regression Model for Compositional Data with Zeros

Lobachevskii Journal of Mathematics

Abstract

Compositional data arise in many fields, such as economics, archaeometry, ecology, geology and political science. Regression with a compositional dependent variable is usually carried out either via a log-ratio transformation of the composition or via the Dirichlet distribution. However, when the data contain zero values, neither approach is readily applicable. Remedies for this problem exist, but most of them rely on substituting the zero values. In this paper we adjust the Dirichlet distribution, in the presence of covariates, to allow for zero values in the data without modifying any values. To do so, we modify the log-likelihood of the Dirichlet distribution to account for zero values. Examples and simulation studies illustrate the performance of the zero-adjusted Dirichlet regression.
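To make the difficulty concrete, the following Python sketch shows why the ordinary Dirichlet log-density cannot be evaluated at a composition containing a zero (the term log x_i is not finite), and one simple way a log-likelihood can avoid touching the zero parts: evaluating the density on the non-zero components only, with the matching parameters. The abstract does not state the modified log-likelihood, so the second function is an assumption made for illustration and not necessarily the formulation used in the paper. The function names are hypothetical, and covariates are left out to keep the example short.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Ordinary Dirichlet log-density; needs strictly positive components."""
    if any(xi <= 0 for xi in x):
        # log(0) is not finite, so the usual formula breaks down here.
        raise ValueError("composition must have strictly positive components")
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def zero_skipping_logpdf(x, alpha):
    """Illustrative assumption, not the paper's stated method:
    evaluate the Dirichlet log-density on the non-zero parts only,
    using the parameters that belong to those parts."""
    kept = [(xi, a) for xi, a in zip(x, alpha) if xi > 0]
    if len(kept) < 2:
        raise ValueError("need at least two non-zero components")
    parts, params = zip(*kept)
    # The non-zero parts of a composition already sum to one,
    # so no re-normalisation is needed before evaluating the density.
    return dirichlet_logpdf(parts, params)


if __name__ == "__main__":
    alpha = [2.0, 3.0, 4.0]
    print(dirichlet_logpdf([0.2, 0.3, 0.5], alpha))      # no zeros: works
    print(zero_skipping_logpdf([0.4, 0.0, 0.6], alpha))  # zero part skipped
```

In a regression setting the parameters would depend on covariates for each observation, and the per-observation contributions would be summed to form the log-likelihood; that step is omitted here.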

Author information

Correspondence to Michail Tsagris.

Additional information

(Submitted by A. I. Volodin)

About this article

Cite this article

Tsagris, M., Stewart, C. A Dirichlet Regression Model for Compositional Data with Zeros. Lobachevskii J Math 39, 398–412 (2018). https://doi.org/10.1134/S1995080218030198

  • DOI: https://doi.org/10.1134/S1995080218030198
