Semiparametric mixture regression with unspecified error distributions

  • Original Paper
  • Journal: TEST

Abstract

In fitting a mixture of linear regression models, a normal assumption is traditionally used to model the errors, and the regression parameters are then estimated by the maximum likelihood estimator (MLE). This procedure is not valid if the normal assumption is violated. The semiparametric regression estimator proposed by Hunter and Young (J Nonparametr Stat 24:19–38, 2012a) requires the component error densities to be the same (including homogeneous variance). Extending that estimator, we propose a semiparametric mixture of linear regression models with unspecified component error distributions to reduce the modeling bias. We establish a more general identifiability result under weaker conditions than existing results, construct a class of new estimators, and establish their asymptotic properties. These asymptotic results also apply to many existing semiparametric mixture regression estimators whose asymptotic properties have remained unknown because of the inherent difficulty of deriving them. Using simulation studies, we demonstrate that the proposed estimators are superior to the MLE when the normal error assumption is violated and comparable to it when the errors are normal. An analysis of Equine Infectious Anemia Virus data newly collected in 2017 illustrates the usefulness of the new estimator.
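
The normal-error baseline that the abstract describes can be sketched as follows. This is a minimal illustration, assuming two components, a single covariate, and simulated data. The function names, starting values, and simulation settings are hypothetical and are not taken from the paper. The code implements the classical EM algorithm for the normal-mixture MLE, not the semiparametric estimator the authors propose.

```python
import math
import random


def normal_pdf(r, sigma):
    """Density of N(0, sigma^2) evaluated at residual r."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def em_mixture_regression(x, y, init, n_iter=200):
    """EM for a mixture of simple linear regressions with normal errors.

    init is a list of (intercept, slope) starting values, one per component.
    Returns mixing proportions, (intercept, slope) pairs, and error std devs.
    """
    k, n = len(init), len(x)
    coefs = list(init)
    sigmas = [1.0] * k          # assumed starting scale
    props = [1.0 / k] * k       # equal starting proportions
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component
        resp = []
        for xi, yi in zip(x, y):
            dens = [props[j] * normal_pdf(yi - coefs[j][0] - coefs[j][1] * xi, sigmas[j])
                    for j in range(k)]
            total = sum(dens)
            if total == 0.0:    # numerical underflow: fall back to equal weights
                resp.append([1.0 / k] * k)
            else:
                resp.append([d / total for d in dens])
        # M-step: weighted least squares and weighted residual variance
        for j in range(k):
            w = [r[j] for r in resp]
            a, b = weighted_line_fit(x, y, w)
            ss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
            coefs[j] = (a, b)
            sigmas[j] = max(math.sqrt(ss / sum(w)), 1e-6)
            props[j] = sum(w) / n
    return props, coefs, sigmas


# Simulated data (hypothetical settings): two lines with normal errors.
rng = random.Random(0)
x, y = [], []
for _ in range(400):
    xi = rng.uniform(0.0, 3.0)
    if rng.random() < 0.5:
        yi = 1.0 + 2.0 * xi + rng.gauss(0.0, 0.3)
    else:
        yi = -1.0 - 2.0 * xi + rng.gauss(0.0, 0.3)
    x.append(xi)
    y.append(yi)

props, coefs, sigmas = em_mixture_regression(x, y, init=[(0.5, 1.0), (-0.5, -1.0)])
print(props, coefs, sigmas)
```

The E-step weights depend entirely on `normal_pdf`. When the true errors are far from normal, those weights are misspecified, which is the source of the modeling bias that motivates leaving the error distributions unspecified.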

References

  • Balabdaoui F et al (2017) Revisiting the Hodges–Lehmann estimator in a location mixture model: Is asymptotic normality good enough? Electron J Stat 11(2):4563–4595

  • Balabdaoui F, Doss CR et al (2018) Inference for a two-component mixture of symmetric distributions under log-concavity. Bernoulli 24(2):1053–1071

  • Balakrishnan S, Wainwright MJ, Yu B et al (2017) Statistical guarantees for the EM algorithm: from population to sample-based analysis. Ann Stat 45(1):77–120

  • Benaglia T, Chauveau D, Hunter D (2009) An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. J Comput Graph Stat 18:505–526

  • Bordes L, Mottelet S, Vandekerkhove P (2006) Semiparametric estimation of a two-component mixture model. Ann Stat 34:1204–1232

  • Bordes L, Chauveau D, Vandekerkhove P (2007) An EM algorithm for a semiparametric mixture model. Comput Stat Data Anal 51:5429–5443

  • Butucea C, Tzoumpe RN, Vandekerkhove P et al (2017) Semiparametric topographical mixture models with symmetric errors. Bernoulli 23(2):825–862

  • Chee C-S, Wang Y (2013) Estimation of finite mixtures with symmetric components. Stat Comput 23(2):233–249

  • Chen J, Li P, Fu Y (2012) Inference on the order of a normal mixture. J Am Stat Assoc 107(499):1096–1105

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38

  • Fan J, Zhang C, Zhang J et al (2001) Generalized likelihood ratio statistics and Wilks phenomenon. Ann Stat 29(1):153–193

  • Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588

  • Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, Berlin

  • Hu H, Wu Y, Yao W (2016) Maximum likelihood estimation of the mixture of log-concave densities. Comput Stat Data Anal 101:137–147

  • Huang M, Yao W (2012) Mixture of regression models with varying mixing proportions: a semiparametric approach. J Am Stat Assoc 107:711–724

  • Huang M, Li R, Wang S (2013) Nonparametric mixture of regression models. J Am Stat Assoc 108:929–941

  • Huang M, Yao W, Wang S, Chen Y (2018) Statistical inference and applications of mixture of varying coefficient models. Scand J Stat 45(3):618–643

  • Huber P (1981) Robust statistics. Wiley, New York

  • Hunter DR, Young DS (2012a) Semiparametric mixtures of regressions. J Nonparametr Stat 24:19–38

  • Hunter DR, Young DS (2012b) Semiparametric mixtures of regressions. J Nonparametr Stat 24:19–38

  • Hunter DR, Wang S, Hettmansperger TP (2007) Inference for mixtures of symmetric distributions. Ann Stat 35:224–251

  • Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. J Neural Comput 3:79–87

  • Jiang W, Tanner MA (1991) Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation. Ann Stat 27:987–1011

  • Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā Indian J Stat Ser A 62:49–66

  • Kwon J, Caramanis C (2019) EM converges for a mixture of many linear regressions. arXiv preprint arXiv:1905.12106

  • Levine M, Hunter DR, Chauveau D (2011) Maximum smoothed likelihood for multivariate mixtures. Biometrika 98:403–416

  • Li P, Chen J (2010) Testing the order of a finite mixture. J Am Stat Assoc 105:1084–1092

  • Lindsay BG (1995) Mixture models: theory, geometry, and applications. In: NSF-CBMS regional conference series in probability and statistics v 5, Hayward, CA. Institute of Mathematical Statistics

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  • Raykar VC, Duraiswami R (2006) Fast optimal bandwidth selection for kernel density estimation. In: Karlin S, Amemiya T, Goodman LA (eds) Proceedings of the 2006 SIAM international conference on data mining, pp 524–528. Society for Industrial and Applied Mathematics, USA

  • Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53:683–690

  • Skrondal A, Rabe-Hesketh S (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC, Boca Raton

  • Wang S, Huang M, Wu X, Yao W (2016) Mixture of functional linear models and its application to CO2-GDP functional data. Comput Stat Data Anal 97:1–15

  • Wedel M, Kamakura WA (2000) Market segmentation: conceptual and methodological foundations. Springer, Berlin

  • Wu J, Yao W, Xiang S (2017) Computation of an efficient and robust estimator in a semiparametric mixture model. J Stat Comput Simul 87(11):2128–2137

  • Xiang S, Yao W (2018) Semiparametric mixtures of nonparametric regressions. Ann Inst Stat Math 70(1):131–154

  • Xiang S, Yao W, Seo B (2016) Semiparametric mixture: continuous scale mixture approach. Comput Stat Data Anal 103:413–425

  • Xiang S, Yao W, Yang G (2019) An overview of semiparametric extensions of finite mixture models. Stat Sci 34(3):391–404

  • Yao W, Zhao Z (2013) Kernel density-based linear regression estimate. Commun Stat Theory Methods 42(24):4499–4512

  • Zhao R, Li Y, Sun Y (2018) Statistical convergence of the EM algorithm on Gaussian mixture models. arXiv preprint arXiv:1810.04090

Acknowledgements

We thank the referees, the Associate Editor, and the Editor, whose comments and suggestions have helped us improve the paper significantly. Ma’s research is partially supported by grants from the National Institutes of Health. Xu’s research is supported by Zhejiang Provincial NSF of China Grant LY19A010006 and the First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics-Statistics). Yao’s research is partially supported by the Department of Energy under award DE-EE0008574.

Author information

Corresponding author

Correspondence to Weixin Yao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 331 KB)

About this article

Cite this article

Ma, Y., Wang, S., Xu, L. et al. Semiparametric mixture regression with unspecified error distributions. TEST 30, 429–444 (2021). https://doi.org/10.1007/s11749-020-00725-z

  • DOI: https://doi.org/10.1007/s11749-020-00725-z
