Semiparametric mixture regression with unspecified error distributions

Ma, Yanyuan; Wang, Shaoli; Xu, Lin; Yao, Weixin

doi:10.1007/s11749-020-00725-z


Original Paper
Published: 08 July 2020

TEST, Volume 30, pages 429–444 (2021)

Yanyuan Ma¹, Shaoli Wang², Lin Xu³ & Weixin Yao⁴ (ORCID: orcid.org/0000-0001-5925-5081)


Abstract

In fitting a mixture of linear regression models, the errors are traditionally assumed to be normal, and the regression parameters are then estimated by maximum likelihood (MLE). This procedure is not valid if the normality assumption is violated. Hunter and Young (J Nonparametr Stat 24:19–38, 2012a) proposed a semiparametric mixture regression estimator that requires all component error densities to be the same (including a common variance). Extending their approach, we propose semiparametric mixtures of linear regression models with unspecified component error distributions to reduce modeling bias. We establish an identifiability result that is more general, and holds under weaker conditions, than existing results; construct a class of new estimators; and establish their asymptotic properties. These asymptotic results also apply to many existing semiparametric mixture regression estimators whose asymptotic properties have remained unknown because of the inherent difficulty of deriving them. Simulation studies show that the proposed estimators outperform the MLE when the normal error assumption is violated and perform comparably when the errors are normal. An analysis of newly collected Equine Infectious Anemia Virus data from 2017 illustrates the usefulness of the new estimator.
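To make the model concrete: with K components, the response satisfies y = β0j + β1j·x + ε with probability πj, where ε has an unspecified density fj in component j. The Python sketch that follows is not the authors' estimator (the abstract does not spell out their construction); it is a hypothetical EM-like algorithm in the spirit of the kernel-based approaches of Benaglia et al. (2009) and Hunter and Young (2012a), modified so that each component keeps its own weighted kernel density estimate of the errors. Assumptions made only for illustration: a single predictor, weighted least squares for the coefficients (which centers each error density at weighted mean zero), a Gaussian kernel, a normal-reference bandwidth 1.06·sd·nj^(-1/5), and N(0, 1) starting densities. All function names are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel and as the
    (hypothetical) starting error density in every component."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_ls(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def weighted_kde(points, weights, h):
    """Return t -> weighted Gaussian kernel density estimate at t."""
    total = sum(weights)

    def density(t):
        s = sum(wi * gaussian_kernel((t - p) / h) for p, wi in zip(points, weights))
        return s / (total * h)

    return density


def semiparametric_mix_reg(x, y, betas, pis, n_iter=20):
    """Hypothetical EM-like fit of a K-component mixture of simple linear
    regressions in which each component has its own nonparametric error density."""
    n, k = len(x), len(betas)
    dens = [gaussian_kernel] * k  # start from N(0, 1) errors in every component
    for _ in range(n_iter):
        # E-step: posterior probability that observation i belongs to component j.
        w = []
        for xi, yi in zip(x, y):
            num = [pis[j] * dens[j](yi - betas[j][0] - betas[j][1] * xi)
                   for j in range(k)]
            tot = sum(num) or 1e-300  # guard against total underflow
            w.append([v / tot for v in num])
        # M-step: update proportions, coefficients and error densities.
        new_betas, new_pis, new_dens = [], [], []
        for j in range(k):
            wj = [w[i][j] for i in range(n)]
            b0, b1 = weighted_ls(x, y, wj)
            resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
            nj = sum(wj)
            # Weighted residual SD (residuals have weighted mean zero under WLS).
            sd = math.sqrt(sum(wi * r * r for wi, r in zip(wj, resid)) / nj)
            h = max(1.06 * sd * nj ** -0.2, 1e-3)  # normal-reference bandwidth
            new_betas.append((b0, b1))
            new_pis.append(nj / n)
            new_dens.append(weighted_kde(resid, wj, h))
        betas, pis, dens = new_betas, new_pis, new_dens
    return betas, pis


def simulate_two_component(n, seed=0):
    """Hypothetical test data: skewed errors in one component, uniform in the other."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.uniform(1.0, 4.0)
        if rng.random() < 0.4:
            yi = 1.0 + 2.0 * xi + (rng.expovariate(2.0) - 0.5)  # mean-zero skewed error
        else:
            yi = -1.0 - 1.0 * xi + rng.uniform(-0.8, 0.8)
        x.append(xi)
        y.append(yi)
    return x, y
```

As a usage example, `x, y = simulate_two_component(150)` followed by `semiparametric_mix_reg(x, y, [(0.5, 1.5), (-0.5, -0.5)], [0.5, 0.5])` should give slopes near 2 and -1 and proportions near 0.4 and 0.6 for this well-separated design; component labels follow the starting values. Weighted least squares keeps the sketch short, but it is generally not efficient for non-normal errors; a fuller treatment would also update the coefficients by maximizing the kernel-smoothed likelihood.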

References

Balabdaoui F et al (2017) Revisiting the Hodges–Lehmann estimator in a location mixture model: Is asymptotic normality good enough? Electron J Stat 11(2):4563–4595
Balabdaoui F, Doss CR et al (2018) Inference for a two-component mixture of symmetric distributions under log-concavity. Bernoulli 24(2):1053–1071
Balakrishnan S, Wainwright MJ, Yu B et al (2017) Statistical guarantees for the EM algorithm: from population to sample-based analysis. Ann Stat 45(1):77–120
Benaglia T, Chauveau D, Hunter D (2009) An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. J Comput Graph Stat 18:505–526
Bordes L, Mottelet S, Vandekerkhove P (2006) Semiparametric estimation of a two-component mixture model. Ann Stat 34:1204–1232
Bordes L, Chauveau D, Vandekerkhove P (2007) An EM algorithm for a semiparametric mixture model. Comput Stat Data Anal 51:5429–5443
Butucea C, Tzoumpe RN, Vandekerkhove P et al (2017) Semiparametric topographical mixture models with symmetric errors. Bernoulli 23(2):825–862
Chee C-S, Wang Y (2013) Estimation of finite mixtures with symmetric components. Stat Comput 23(2):233–249
Chen J, Li P, Fu Y (2012) Inference on the order of a normal mixture. J Am Stat Assoc 107(499):1096–1105
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Fan J, Zhang C, Zhang J et al (2001) Generalized likelihood ratio statistics and Wilks phenomenon. Ann Stat 29(1):153–193
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, Berlin
Hu H, Wu Y, Yao W (2016) Maximum likelihood estimation of the mixture of log-concave densities. Comput Stat Data Anal 101:137–147
Huang M, Yao W (2012) Mixture of regression models with varying mixing proportions: a semiparametric approach. J Am Stat Assoc 107:711–724
Huang M, Li R, Wang S (2013) Nonparametric mixture of regression models. J Am Stat Assoc 108:929–941
Huang M, Yao W, Wang S, Chen Y (2018) Statistical inference and applications of mixture of varying coefficient models. Scand J Stat 45(3):618–643
Huber P (1981) Robust statistics. Wiley, New York
Hunter DR, Young DS (2012a) Semiparametric mixtures of regressions. J Nonparametr Stat 24:19–38
Hunter DR, Young DS (2012b) Semiparametric mixtures of regressions. J Nonparametr Stat 24:19–38
Hunter DR, Wang S, Hettmansperger TP (2007) Inference for mixtures of symmetric distributions. Ann Stat 35:224–251
Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3:79–87
Jiang W, Tanner MA (1999) Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation. Ann Stat 27:987–1011
Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā Indian J Stat Ser A 62:49–66
Kwon J, Caramanis C (2019) EM converges for a mixture of many linear regressions. arXiv preprint arXiv:1905.12106
Levine M, Hunter DR, Chauveau D (2011) Maximum smoothed likelihood for multivariate mixtures. Biometrika 98:403–416
Li P, Chen J (2010) Testing the order of a finite mixture. J Am Stat Assoc 105:1084–1092
Lindsay BG (1995) Mixture models: theory, geometry, and applications. NSF-CBMS regional conference series in probability and statistics, vol 5. Institute of Mathematical Statistics, Hayward, CA
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Raykar VC, Duraiswami R (2006) Fast optimal bandwidth selection for kernel density estimation. In: Karlin S, Amemiya T, Goodman LA (eds) Proceedings of the 2006 SIAM international conference on data mining, pp 524–528. Society for Industrial and Applied Mathematics, USA
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53:683–690
Skrondal A, Rabe-Hesketh S (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC, Boca Raton
Wang S, Huang M, Wu X, Yao W (2016) Mixture of functional linear models and its application to CO2-GDP functional data. Comput Stat Data Anal 97:1–15
Wedel M, Kamakura WA (2000) Market segmentation: conceptual and methodological foundations. Springer, Berlin
Wu J, Yao W, Xiang S (2017) Computation of an efficient and robust estimator in a semiparametric mixture model. J Stat Comput Simul 87(11):2128–2137
Xiang S, Yao W (2018) Semiparametric mixtures of nonparametric regressions. Ann Inst Stat Math 70(1):131–154
Xiang S, Yao W, Seo B (2016) Semiparametric mixture: continuous scale mixture approach. Comput Stat Data Anal 103:413–425
Xiang S, Yao W, Yang G (2019) An overview of semiparametric extensions of finite mixture models. Stat Sci 34(3):391–404
Yao W, Zhao Z (2013) Kernel density-based linear regression estimate. Commun Stat Theory Methods 42(24):4499–4512
Zhao R, Li Y, Sun Y (2018) Statistical convergence of the EM algorithm on Gaussian mixture models. arXiv preprint arXiv:1810.04090


Acknowledgements

We thank the referees, the Associate Editor, and the Editor, whose comments and suggestions have helped us improve the paper significantly. Ma's research is partially supported by grants from the National Institutes of Health. Xu's research is supported by Zhejiang Provincial NSF of China Grant LY19A010006 and the First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics, Statistics). Yao's research is partially supported by the Department of Energy under award DE-EE0008574.

Author information

Authors and Affiliations

Department of Statistics, The Pennsylvania State University, State College, USA
Yanyuan Ma
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
Shaoli Wang
School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China
Lin Xu
Department of Statistics, University of California, Riverside, Riverside, USA
Weixin Yao

Corresponding author

Correspondence to Weixin Yao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary material 1 (pdf 331 KB)

About this article

Cite this article

Ma, Y., Wang, S., Xu, L. et al. Semiparametric mixture regression with unspecified error distributions. TEST 30, 429–444 (2021). https://doi.org/10.1007/s11749-020-00725-z

Received: 24 July 2019
Accepted: 26 June 2020
Published: 08 July 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11749-020-00725-z
