
Abstract

This chapter shows an application of the MDL principle to statistical model selection. First, a number of existing model selection criteria, such as AIC, BIC, MML, and cross-validation, are introduced. The MDL criterion is then introduced as an information-theoretic model selection criterion and is justified in terms of consistency, estimation optimality, and rate of convergence. Some variants of the MDL criterion, such as the sequential NML (SNML) criterion and the luckiness NML criterion, are also introduced. We give examples of model selection with the MDL criterion, including histogram density estimation, non-negative matrix factorization, decision tree learning, word embedding, time series analysis, and linear regression.
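As a minimal illustration of the classical criteria mentioned in the abstract, the following sketch selects a polynomial order by AIC and BIC under a Gaussian noise model. The data-generating setup and function names are illustrative assumptions, not taken from the chapter; the chapter's own MDL-based criteria (NML and its variants) replace the AIC/BIC penalties with code-length terms.

```python
import numpy as np

def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood given the fitted residuals."""
    n = len(residuals)
    sigma2 = np.mean(residuals ** 2)  # ML estimate of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def select_order(x, y, max_order=8):
    """Pick a polynomial order for y ~ poly(x) by AIC and by BIC."""
    n = len(x)
    scores = {}
    for d in range(max_order + 1):
        coeffs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coeffs, x)
        ll = gaussian_log_likelihood(residuals)
        k = d + 2  # d+1 polynomial coefficients plus the noise variance
        aic = -2 * ll + 2 * k            # Akaike's criterion
        bic = -2 * ll + k * np.log(n)    # Schwarz's criterion
        scores[d] = (aic, bic)
    best_aic = min(scores, key=lambda d: scores[d][0])
    best_bic = min(scores, key=lambda d: scores[d][1])
    return best_aic, best_bic

# Synthetic data: a quadratic signal with small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + 0.1 * rng.standard_normal(200)
print(select_order(x, y))
```

Because BIC's per-parameter penalty grows with the sample size, it is consistent (it recovers the true order as n grows), whereas AIC may keep a slight tendency to overfit; this is the kind of trade-off the MDL criteria in the chapter address via code lengths.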


Notes

  1. The program code for dimensionality selection with SNML is available at https://github.com/truythu169/snml-skip-gram. A video explaining this technology can be viewed at https://encyclopedia.pub/video/video_detail/353.

  2. http://mattmahoney.net/dc/textdata.

  3. https://dumps.wikimedia.org/.



Author information


Corresponding author

Correspondence to Kenji Yamanishi.


Copyright information

© 2023 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Yamanishi, K. (2023). Model Selection. In: Learning with the Minimum Description Length Principle. Springer, Singapore. https://doi.org/10.1007/978-981-99-1790-7_3


  • DOI: https://doi.org/10.1007/978-981-99-1790-7_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1789-1

  • Online ISBN: 978-981-99-1790-7

  • eBook Packages: Computer Science (R0)
