Abstract
This chapter shows an application of the MDL principle to statistical model selection. First, existing model selection criteria such as AIC, BIC, MML, and cross-validation are reviewed. The MDL criterion is then introduced as an information-theoretic model selection criterion and justified in terms of consistency, estimation optimality, and rate of convergence. Variants of the MDL criterion, such as the sequential NML (SNML) criterion and the luckiness NML (LNML) criterion, are also introduced. We give examples of model selection with the MDL criterion, including histogram density estimation, non-negative matrix factorization, decision tree learning, word embedding, time series modeling, and linear regression.
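As a small illustration of the kind of criterion the chapter surveys, the sketch below picks the number of equal-width histogram bins by minimizing a simple two-part code length (negative log-likelihood plus a BIC-style parameter cost). This is a hedged stand-in for the exact NML/SNML formulas developed in the chapter, not a reproduction of them; the function name and the synthetic data are assumptions for illustration.

```python
# Minimal sketch: two-part MDL-style selection of a histogram bin count.
# The (k-1)/2 * log n parameter cost is the BIC-style approximation; the
# chapter's NML-based criteria refine this term.
import math
import random

def mdl_histogram_bins(data, max_bins=20):
    """Return the bin count minimizing description length =
    negative log-likelihood (nats) + 0.5*(k-1)*log n."""
    n = len(data)
    lo, hi = min(data), max(data)
    best_k, best_len = 1, float("inf")
    for k in range(1, max_bins + 1):
        width = (hi - lo) / k or 1.0
        counts = [0] * k
        for x in data:
            idx = min(int((x - lo) / width), k - 1)
            counts[idx] += 1
        # code length of the data under the histogram density
        nll = 0.0
        for c in counts:
            if c > 0:
                nll -= c * math.log(c / (n * width))
        total = nll + 0.5 * (k - 1) * math.log(n)
        if total < best_len:
            best_k, best_len = k, total
    return best_k

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
print(mdl_histogram_bins(sample))
```

Too few bins underfit (high data code length); too many bins overfit (high parameter cost), so the minimizer balances the two terms, which is the essence of two-part MDL model selection.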
Notes
1. The program code for dimensionality selection with SNML is available at https://github.com/truythu169/snml-skip-gram. A video explaining this technology can be viewed at https://encyclopedia.pub/video/video_detail/353.
References
H. Akaike, A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)
S. Amari, H. Nagaoka, Methods of Information Geometry, vol. 191 (American Mathematical Soc., 2000)
A. Barron, T. Cover, Minimum complexity density estimation. IEEE Trans. Inf. Theory 37(4), 1034–1054 (1991)
L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees (Wadsworth, 1984), https://doi.org/10.1201/9781315139470
C.D. Giurcaneanu, J. Rissanen, Estimation of AR and ARMA models by stochastic complexity. Inst. Math. Stat. Lect. Not. Monogr. Ser. 52, 48–59 (2006)
P.T. Hung, K. Yamanishi, Word2vec skip-gram dimensionality selection via sequential normalized maximum likelihood. Entropy 23(8), 997 (2021), https://doi.org/10.3390/e23080997
M. Kawakita, J. Takeuchi, Barron and Cover’s theory in supervised learning and its application to lasso, in Proceedings of the 33rd International Conference on Machine Learning (ICML’16), PMLR 48, pp. 1958–1966 (2016)
Y. Fu, S. Matsushima, K. Yamanishi, Model selection for non-negative tensor factorization with minimum description length. Entropy 21(7), 632 (2019), https://doi.org/10.3390/e21070632
P. Miettinen, J. Vreeken, MDL4BMF: minimum description length for Boolean matrix factorization. ACM Trans. Knowl. Discov. Data 8(4), 18:1–18:31 (2014)
T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems (NIPS’13), pp. 3111–3119 (2013)
M. Kearns, R. Schapire, Efficient distribution-free learning of probabilistic concepts, in Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS’90), pp. 382–391 (1990)
P.S. Laplace, Mémoires de mathématique et de physique, tome sixième. Stat. Sci. 1(3), 366–367 (1774)
D.D. Lee, H.S. Seung, Learning the parts of objects by nonnegative matrix factorization. Nature 401, 788–791 (1999)
J. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann, 1993)
J. Quinlan, R. Rivest, Inferring decision trees with the minimum description length principle. Inf. Comput. 80, 227–248 (1989)
J. Rissanen, Stochastic complexity. J. Roy. Stat. Soc. Ser. B 49(3), 223–239 (1987)
J. Rissanen, Fisher information and stochastic complexity. IEEE Trans. Inf. Theory 42(1), 40–47 (1996)
J. Rissanen, Stochastic complexity in learning. J. Comput. Syst. Sci. 55(1), 89–95 (1997)
J. Rissanen, MDL denoising. IEEE Trans. Inf. Theory 46(7), 2537–2543 (2000)
J. Rissanen, Information and Complexity in Statistical Modeling (Springer, New York, 2007)
J. Rissanen, Optimal Estimation of Parameters (Cambridge University Press, 2012)
J. Rissanen, T. Roos, P. Myllymäki, Model selection by sequentially normalized least squares. J. Multivar. Anal. 101(4), 839–849 (2010)
J. Rissanen, T. Speed, B. Yu, Density estimation by stochastic complexity. IEEE Trans. Inf. Theory 38(2), 315–323 (1992)
R.L. Rivest, Learning decision lists. Mach. Learn. 2, 229–246 (1987)
G. Schwarz, Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
R. Shibata, An optimal selection of regression variables. Biometrika 68(1), 45–54 (1981)
S. Squires, A. Prägel-Bennett, M. Niranjan, Rank selection in nonnegative matrix factorization using minimum description length. Neural Comput. 29(8), 2164–2176 (2017)
S. Tanaka, Y. Kawamura, M. Kawakita, N. Murata, J. Takeuchi, MDL criterion for NMF with application to botnet detection, in Proceedings of International Conference on Neural Information Processing (ICONIP 2016), pp. 570–578 (2016)
G.T. Walker, On periodicity in series of related terms. Proc. Roy. Soc. Lond. Ser. A 131, 518–532 (1931), https://doi.org/10.1098/rspa.1931.0069
C.S. Wallace, D.M. Boulton, An information measure for classification. Comput. J. 11(2), 185–194 (1968)
C.S. Wallace, P.R. Freeman, Estimation and inference by compact coding. J. Roy. Stat. Soc. Ser. B 49(3), 240–252 (1987)
C.S. Wallace, Statistical and Inductive Inference by Minimum Message Length (Springer, 2005)
A.J. Wilson, Volume of n-dimensional ellipsoid. Sci. Acta Xaveriana 1(1), 101–106 (2009). ISSN: 0976-1152
K. Yamanishi, A learning criterion for stochastic rules. Mach. Learn. 9, 165–203 (1992)
Z. Yin, Y. Shen, On the dimensionality of word embedding, in Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY, USA, 2018), pp. 895–906
M. Zhai, J. Tan, J.D. Choi, Intrinsic and extrinsic evaluations of word embeddings, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI’16), Phoenix, AZ, USA (2016)
Copyright information
© 2023 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Yamanishi, K. (2023). Model Selection. In: Learning with the Minimum Description Length Principle. Springer, Singapore. https://doi.org/10.1007/978-981-99-1790-7_3
DOI: https://doi.org/10.1007/978-981-99-1790-7_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1789-1
Online ISBN: 978-981-99-1790-7