Abstract
In fitting a mixture of linear regression models, a normal assumption is traditionally used to model the error, and the regression parameters are then estimated by maximum likelihood (MLE). This procedure is not valid if the normal assumption is violated. Hunter and Young (J Nonparametr Stat 24:19–38, 2012a) proposed a semiparametric regression estimator that requires the component error densities to be the same (including homogeneous variance). Extending their estimator, we propose semiparametric mixtures of linear regression models with unspecified component error distributions to reduce the modeling bias. We establish a more general identifiability result under weaker conditions than existing results, construct a class of new estimators, and establish their asymptotic properties. These asymptotic results also apply to many existing semiparametric mixture regression estimators whose asymptotic properties have remained unknown because of the inherent difficulties in obtaining them. Using simulation studies, we demonstrate that the proposed estimators outperform the MLE when the normal error assumption is violated and perform comparably when the error is normal. An analysis of Equine Infectious Anemia Virus data newly collected in 2017 illustrates the usefulness of the new estimator.
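For readers who want a concrete picture of the normal-error baseline mentioned in the abstract, the following is a minimal sketch of an EM algorithm for a mixture of linear regressions with Gaussian errors. It is not the semiparametric estimator proposed in the paper. The function name `em_mixture_linreg`, its arguments, and the residual-rank initialization are illustrative assumptions, not taken from the article.

```python
import numpy as np

def em_mixture_linreg(X, y, k=2, n_iter=100):
    """Sketch: EM for a k-component mixture of linear regressions
    with normal errors (the parametric baseline, not the paper's method).
    X is an n x p design matrix (include a column of ones for an intercept)."""
    n, p = X.shape
    # Assumed initialization: split observations into k groups by the rank
    # of their pooled least-squares residuals (a heuristic choice).
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    ranks = np.argsort(np.argsort(y - X @ beta_ols))
    resp = np.eye(k)[ranks * k // n]          # n x k one-hot memberships
    for _ in range(n_iter):
        # M-step: mixing proportions, weighted least squares, error scales.
        pi = resp.mean(axis=0)
        beta = np.empty((k, p))
        sigma = np.empty(k)
        for j in range(k):
            w = resp[:, j]
            Xw = X * w[:, None]
            beta[j] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            r = y - X @ beta[j]
            sigma[j] = np.sqrt(max((w * r**2).sum() / w.sum(), 1e-12))
        # E-step: posterior membership probabilities under normal errors
        # (the constant -0.5*log(2*pi) cancels across components).
        resid = y[:, None] - X @ beta.T
        log_dens = np.log(pi) - np.log(sigma) - 0.5 * (resid / sigma) ** 2
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, beta, sigma
```

When the true errors are heavy-tailed or skewed, the E-step density above is misspecified, which is the situation the abstract describes as invalidating the MLE.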
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11749-020-00725-z/MediaObjects/11749_2020_725_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11749-020-00725-z/MediaObjects/11749_2020_725_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11749-020-00725-z/MediaObjects/11749_2020_725_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11749-020-00725-z/MediaObjects/11749_2020_725_Fig4_HTML.png)
References
Balabdaoui F et al (2017) Revisiting the Hodges–Lehmann estimator in a location mixture model: Is asymptotic normality good enough? Electron J Stat 11(2):4563–4595
Balabdaoui F, Doss CR et al (2018) Inference for a two-component mixture of symmetric distributions under log-concavity. Bernoulli 24(2):1053–1071
Balakrishnan S, Wainwright MJ, Yu B et al (2017) Statistical guarantees for the EM algorithm: from population to sample-based analysis. Ann Stat 45(1):77–120
Benaglia T, Chauveau D, Hunter D (2009) An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. J Comput Graph Stat 18:505–526
Bordes L, Mottelet S, Vandekerkhove P (2006) Semiparametric estimation of a two-component mixture model. Ann Stat 34:1204–1232
Bordes L, Chauveau D, Vandekerkhove P (2007) An EM algorithm for a semiparametric mixture model. Comput Stat Data Anal 51:5429–5443
Butucea C, Tzoumpe RN, Vandekerkhove P et al (2017) Semiparametric topographical mixture models with symmetric errors. Bernoulli 23(2):825–862
Chee C-S, Wang Y (2013) Estimation of finite mixtures with symmetric components. Stat Comput 23(2):233–249
Chen J, Li P, Fu Y (2012) Inference on the order of a normal mixture. J Am Stat Assoc 107(499):1096–1105
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Fan J, Zhang C, Zhang J et al (2001) Generalized likelihood ratio statistics and Wilks phenomenon. Ann Stat 29(1):153–193
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, Berlin
Hu H, Wu Y, Yao W (2016) Maximum likelihood estimation of the mixture of log-concave densities. Comput Stat Data Anal 101:137–147
Huang M, Yao W (2012) Mixture of regression models with varying mixing proportions: a semiparametric approach. J Am Stat Assoc 107:711–724
Huang M, Li R, Wang S (2013) Nonparametric mixture of regression models. J Am Stat Assoc 108:929–941
Huang M, Yao W, Wang S, Chen Y (2018) Statistical inference and applications of mixture of varying coefficient models. Scand J Stat 45(3):618–643
Huber P (1981) Robust statistics. Wiley, New York
Hunter DR, Young DS (2012a) Semiparametric mixtures of regressions. J Nonparametr Stat 24:19–38
Hunter DR, Wang S, Hettmansperger TP (2007) Inference for mixtures of symmetric distributions. Ann Stat 35:224–251
Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3:79–87
Jiang W, Tanner MA (1999) Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation. Ann Stat 27:987–1011
Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā Indian J Stat Ser A 62:49–66
Kwon J, Caramanis C (2019) EM converges for a mixture of many linear regressions. arXiv preprint arXiv:1905.12106
Levine M, Hunter DR, Chauveau D (2011) Maximum smoothed likelihood for multivariate mixtures. Biometrika 98:403–416
Li P, Chen J (2010) Testing the order of a finite mixture. J Am Stat Assoc 105:1084–1092
Lindsay BG (1995) Mixture models: theory, geometry, and applications. In: NSF-CBMS regional conference series in probability and statistics v 5, Hayward, CA. Institute of Mathematical Statistics
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Raykar VC, Duraiswami R (2006) Fast optimal bandwidth selection for kernel density estimation. In: Karlin S, Amemiya T, Goodman LA (eds) Proceedings of the 2006 SIAM international conference on data mining, pp 524–528. Society for Industrial and Applied Mathematics, USA
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53:683–690
Skrondal A, Rabe-Hesketh S (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC, Boca Raton
Wang S, Huang M, Wu X, Yao W (2016) Mixture of functional linear models and its application to CO2-GDP functional data. Comput Stat Data Anal 97:1–15
Wedel M, Kamakura WA (2000) Market segmentation: conceptual and methodological foundations. Springer, Berlin
Wu J, Yao W, Xiang S (2017) Computation of an efficient and robust estimator in a semiparametric mixture model. J Stat Comput Simul 87(11):2128–2137
Xiang S, Yao W (2018) Semiparametric mixtures of nonparametric regressions. Ann Inst Stat Math 70(1):131–154
Xiang S, Yao W, Seo B (2016) Semiparametric mixture: continuous scale mixture approach. Comput Stat Data Anal 103:413–425
Xiang S, Yao W, Yang G (2019) An overview of semiparametric extensions of finite mixture models. Stat Sci 34(3):391–404
Yao W, Zhao Z (2013) Kernel density-based linear regression estimate. Commun Stat Theory Methods 42(24):4499–4512
Zhao R, Li Y, Sun Y (2018) Statistical convergence of the EM algorithm on Gaussian mixture models. arXiv preprint arXiv:1810.04090
Acknowledgements
We thank the referees, the Associate Editor, and the Editor, whose comments and suggestions have helped us improve the paper significantly. Ma’s research is partially supported by grants from the National Institutes of Health. Xu’s research is supported by Zhejiang Provincial NSF of China Grant LY19A010006 and the First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics, Statistics). Yao’s research is partially supported by the Department of Energy under award DE-EE0008574.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ma, Y., Wang, S., Xu, L. et al. Semiparametric mixture regression with unspecified error distributions. TEST 30, 429–444 (2021). https://doi.org/10.1007/s11749-020-00725-z
DOI: https://doi.org/10.1007/s11749-020-00725-z