Abstract
When a population exhibits heterogeneity, we often model it with a finite mixture: we decompose the population into several distinct but homogeneous subpopulations. Contemporary practice favors learning mixtures by maximizing the likelihood, for its statistical efficiency and the convenience of the EM algorithm for numerical computation. Yet the maximum likelihood estimate (MLE) is not well defined for finite location-scale mixtures in general. We therefore investigate feasible alternatives to the MLE, such as minimum distance estimators. Recently, the Wasserstein distance has drawn increased attention in the machine learning community. It has an intuitive geometric interpretation and has been successfully employed in many new applications. Do we gain anything by learning finite location-scale mixtures via a minimum Wasserstein distance estimator (MWDE)? This chapter investigates this possibility in several respects. We show that the MWDE is consistent and derive a numerical solution under finite location-scale mixtures. We study its robustness against outliers and mild model mis-specifications. Our moderately sized simulation study shows that the MWDE generally suffers some efficiency loss relative to a penalized version of the MLE, without a noticeable gain in robustness. We reaffirm the general superiority of likelihood-based learning strategies, even for non-regular finite location-scale mixtures.
Acknowledgements
The authors would like to thank Richard Schonberg for proofreading the manuscript.
Appendix
Numerically Friendly Expression of \(W_2(F_N, F(\cdot|G))\)
To learn the finite mixture distribution through the MWDE, we must compute the squared distance
\[ W_2^2(F_N, F(\cdot|G)) = \int_0^1 \big\{ F_N^{-1}(t) - F^{-1}(t|G) \big\}^2 \, dt \]
for the finite location-scale mixture
\[ F(x|G) = \sum_{k=1}^K w_k F(x|\theta_k), \qquad F(x|\theta_k) = F_0\Big( \frac{x - \mu_k}{\sigma_k} \Big), \]
where \(F_0\) has density \(f_0\) and \(\theta_k = (\mu_k, \sigma_k)\).
We write \({\mathbb E}_k(\cdot)\) for the expectation under the distribution \(F(\cdot|\theta_k)\). For instance, \({\mathbb E}_k(X^2) = \int x^2 \, dF(x|\theta_k)\).
Let \(I_n = ((n-1)/N, \, n/N]\) for \(n = 1, 2, \ldots, N\), so that \(F_N^{-1}(t) = x_{(n)}\) when \(t \in I_n\), where \(x_{(n)}\) is the \(n\)th order statistic. For ease of notation, we write \(x_{(n)}\) as \(x_n\). Over this interval, we have
\[ \int_{I_n} \big\{ F_N^{-1}(t) - F^{-1}(t|G) \big\}^2 \, dt = \frac{x_n^2}{N} - 2 x_n \int_{I_n} F^{-1}(t|G) \, dt + \int_{I_n} \big\{ F^{-1}(t|G) \big\}^2 \, dt. \qquad (8) \]
The integration of the first term in (8), after summing over \(n\), is given by
\[ \sum_{n=1}^N \frac{x_n^2}{N} = \frac{1}{N} \sum_{n=1}^N x_n^2. \]
The integration of the third term in (8), summed over \(n\), is
\[ \int_0^1 \big\{ F^{-1}(t|G) \big\}^2 \, dt = \int_{-\infty}^{\infty} x^2 \, dF(x|G) = \sum_{k=1}^K w_k {\mathbb E}_k(X^2). \]
Let \(\xi_0 = -\infty\), \(\xi_{N+1} = \infty\), and \(\xi_n = F^{-1}(n/N \,|\, G)\) for \(n = 1, \ldots, N\). Denote
\[ T(z) = \int_{-\infty}^{z} t f_0(t) \, dt \]
and
\[ D_n = \int_{\xi_{n-1}}^{\xi_n} x \, dF(x|G) = \sum_{k=1}^K w_k \Big\{ \mu_k \big[ F(\xi_n|\theta_k) - F(\xi_{n-1}|\theta_k) \big] + \sigma_k \Big[ T\Big( \frac{\xi_n - \mu_k}{\sigma_k} \Big) - T\Big( \frac{\xi_{n-1} - \mu_k}{\sigma_k} \Big) \Big] \Big\}. \]
Then, by the substitution \(t = F(x|G)\),
\[ \int_{I_n} F^{-1}(t|G) \, dt = D_n. \]
These lead to the numerically convenient expression
\[ W_2^2(F_N, F(\cdot|G)) = \frac{1}{N} \sum_{n=1}^N x_n^2 - 2 \sum_{n=1}^N x_n D_n + \sum_{k=1}^K w_k {\mathbb E}_k(X^2). \]
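For illustration, here is a minimal Python sketch of this computation, assuming normal components so that \(f_0\) is the standard normal density and \(T(z) = -\phi(z)\) in closed form; the quantiles \(\xi_n\) are obtained by root finding with the bracket justified by Lemma 2 below, and the helper names are illustrative.

```python
# A minimal sketch, assuming normal components, so T(z) = -phi(z).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def w2_squared(x, w, mu, sigma):
    """W_2^2(F_N, F(.|G)) for a normal mixture via the expression above."""
    x = np.sort(x)
    N = len(x)

    def quantile(t):
        # Bracket by the component quantiles; see Lemma 2 below.
        q = norm.ppf(t, loc=mu, scale=sigma)
        g = lambda v: np.sum(w * norm.cdf((v - mu) / sigma)) - t
        return brentq(g, q.min() - 1e-8, q.max() + 1e-8)

    # xi_0 = -inf and xi_n = F^{-1}(n/N|G); the last quantile F^{-1}(1|G) is +inf.
    xi = np.concatenate(([-np.inf],
                         [quantile(n / N) for n in range(1, N)],
                         [np.inf]))
    z = (xi[:, None] - mu) / sigma        # (N+1) x K standardized grid
    F0, T = norm.cdf(z), -norm.pdf(z)     # T(z) = int_{-inf}^z t phi(t) dt
    # D_n = int_{xi_{n-1}}^{xi_n} x dF(x|G), in closed form component by component.
    D = (w * (mu * np.diff(F0, axis=0) + sigma * np.diff(T, axis=0))).sum(axis=1)
    # For normal components, E_k(X^2) = mu_k^2 + sigma_k^2.
    return np.mean(x ** 2) - 2.0 * np.sum(x * D) + np.sum(w * (mu ** 2 + sigma ** 2))
```

For a non-normal kernel, only `T` and the second moments \({\mathbb E}_k(X^2)\) need to change.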
To use the BFGS algorithm most effectively, it is best to supply the gradient of the objective function. Here are numerically friendly expressions for some of the required partial derivatives.
Lemma 1
Let \(\delta_{jk} = 1\) when \(j = k\) and \(\delta_{jk} = 0\) when \(j \neq k\). For \(n = 1, \ldots, N\) and \(j = 1, 2, \ldots, K\), we have
\[ \frac{\partial \xi_n}{\partial w_j} = - \frac{F(\xi_n|\theta_j)}{f(\xi_n|G)}, \qquad \frac{\partial \xi_n}{\partial \mu_j} = \frac{w_j f(\xi_n|\theta_j)}{f(\xi_n|G)}, \]
and
\[ \frac{\partial \xi_n}{\partial \sigma_j} = \frac{w_j f(\xi_n|\theta_j)}{f(\xi_n|G)} \cdot \frac{\xi_n - \mu_j}{\sigma_j}, \]
where \(f(x|\theta_k) = \sigma_k^{-1} f_0\big( (x - \mu_k)/\sigma_k \big)\) and \(f(x|G) = \sum_{k=1}^K w_k f(x|\theta_k)\). Furthermore, we have
\[ \frac{\partial}{\partial \mu_j} F(\xi_n|\theta_k) = f(\xi_n|\theta_k) \Big( \frac{\partial \xi_n}{\partial \mu_j} - \delta_{jk} \Big), \qquad \frac{\partial}{\partial \sigma_j} F(\xi_n|\theta_k) = f(\xi_n|\theta_k) \Big( \frac{\partial \xi_n}{\partial \sigma_j} - \delta_{jk} \frac{\xi_n - \mu_k}{\sigma_k} \Big). \]
Based on this lemma, the partial derivatives of \(W_2^2(F_N, F(\cdot|G))\) with respect to \(w_j\) can be written in terms of \(F(\xi_n|\theta_k)\) and \(T\big( \frac{\xi_n - \mu_k}{\sigma_k} \big)\), with \(F(\xi_0|\theta_k) = 0\), \(F(\xi_{N+1}|\theta_k) = 1\), \(T\big( \frac{\xi_{0}-\mu_k}{\sigma_k} \big) = 0\), and \(T\big( \frac{\xi_{N+1}-\mu_k}{\sigma_k} \big) = \int_{-\infty}^{\infty} t f_0(t) \, dt\), a constant that does not depend on any parameter. Substituting the partial derivatives in Lemma 1, we then obtain the gradient with respect to the mixing weights. The gradients with respect to \(\mu_j\) and \(\sigma_j\) follow similarly.
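To illustrate the full optimization, a sketch that fits a two-component normal mixture by passing an approximate \(W_2^2\) objective to BFGS through `scipy.optimize.minimize` is given below. It substitutes a midpoint-quantile approximation and finite-difference gradients for the exact expression and analytic gradient derived above, and the softmax/log reparameterization of the weights and scales is an assumption made only to keep the search unconstrained.

```python
# A sketch, assuming normal components, a midpoint-quantile approximation of
# W_2^2, finite-difference gradients, and a softmax/log reparameterization.
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.stats import norm

def unpack(theta, K):
    """Map an unconstrained vector to (w, mu, sigma)."""
    a, mu, b = theta[:K], theta[K:2 * K], theta[2 * K:]
    w = np.exp(a - a.max())
    return w / w.sum(), mu, np.exp(b)

def mix_quantile(t, w, mu, sigma):
    """F^{-1}(t|G) by root finding, bracketed by component quantiles (Lemma 2)."""
    q = norm.ppf(t, loc=mu, scale=sigma)
    g = lambda v: np.sum(w * norm.cdf((v - mu) / sigma)) - t
    return brentq(g, q.min() - 1e-8, q.max() + 1e-8)

def objective(theta, x, K):
    """Midpoint rule: W_2^2 ~ mean_n {x_(n) - F^{-1}((n - 1/2)/N | G)}^2."""
    w, mu, sigma = unpack(theta, K)
    N = len(x)
    t = (np.arange(1, N + 1) - 0.5) / N
    xi = np.array([mix_quantile(ti, w, mu, sigma) for ti in t])
    return np.mean((np.sort(x) - xi) ** 2)

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(4.0, 2.0, 150)])
theta0 = np.array([0.0, 0.0, x.min(), x.max(), 0.0, 0.0])  # rough start, K = 2
res = minimize(objective, theta0, args=(x, 2), method="BFGS")
print(unpack(res.x, 2))
```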
Computing the quantiles of the mixture distribution \(F(\cdot|G)\) for each \(G\) is one of the most demanding tasks. The property stated in the following lemma allows us to develop a bisection algorithm.
Lemma 2
Let \(F(x|G) = \sum_{k=1}^K w_k F(x|\mu_k, \sigma_k)\) be a \(K\)-component mixture, and let \(\xi(t) = F^{-1}(t|G)\) and \(\xi_k(t) = F^{-1}(t|\theta_k)\) be, respectively, the \(t\)-quantile of the mixture and of its \(k\)th subpopulation. For any \(t \in (0, 1)\),
\[ \min_{1 \le k \le K} \xi_k(t) \le \xi(t) \le \max_{1 \le k \le K} \xi_k(t). \qquad (9) \]
Proof
Since \(F(x|\theta_k)\) is a continuous CDF, we must have \(F(\xi_k(t)|\theta_k) = t\). By the monotonicity of the CDF \(F(\cdot|\theta_k)\), we have, for each \(k\),
\[ F\Big( \min_{j} \xi_j(t) \, \Big| \, \theta_k \Big) \le t \le F\Big( \max_{j} \xi_j(t) \, \Big| \, \theta_k \Big). \]
Multiplying by \(w_k\) and summing over \(k\) lead to
\[ F\Big( \min_{j} \xi_j(t) \, \Big| \, G \Big) \le t \le F\Big( \max_{j} \xi_j(t) \, \Big| \, G \Big). \]
This implies (9) and completes the proof. □
In view of this lemma, the easily computed quantiles of the \(F(\cdot|\theta_k)\) form an interval containing the target quantile of \(F(\cdot|G)\). The value of \(F^{-1}(t|G)\) can then be found quickly by a bisection algorithm.
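A direct transcription of this bracket-and-bisect scheme, again assuming normal components and with an illustrative function name, could read as follows.

```python
# A sketch of the Lemma 2 bracketing scheme, assuming normal components.
import numpy as np
from scipy.stats import norm

def mixture_quantile_bisect(t, w, mu, sigma, tol=1e-10):
    """Find F^{-1}(t|G) by bisection on [min_k xi_k(t), max_k xi_k(t)]."""
    q = norm.ppf(t, loc=mu, scale=sigma)   # component t-quantiles xi_k(t)
    lo, hi = q.min(), q.max()              # Lemma 2: F^{-1}(t|G) lies in [lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(w * norm.cdf((mid - mu) / sigma)) < t:
            lo = mid                       # mixture CDF at mid is below t: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = np.array([0.3, 0.7])
mu = np.array([0.0, 5.0])
sigma = np.array([1.0, 2.0])
print(mixture_quantile_bisect(0.5, w, mu, sigma))
```

Each iteration halves the bracket, so roughly \(\log_2((\text{hi} - \text{lo})/\text{tol})\) mixture-CDF evaluations suffice per quantile.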