
Abstract

This chapter shows an application of the MDL principle to statistical model selection. First, a number of existing model selection criteria, such as AIC, BIC, MML, and cross-validation, are introduced. The MDL criterion is then introduced as an information-theoretic model selection criterion and is justified in terms of consistency, estimation optimality, and rate of convergence. Some variants of the MDL criterion, such as the sequential NML (SNML) criterion and the luckiness NML criterion, are also introduced. We give examples of model selection with the MDL criterion, including histogram density estimation, non-negative matrix factorization, decision tree learning, word embedding, time series analysis, and linear regression.
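As a minimal illustration of the classical criteria mentioned in the abstract, the following sketch selects a polynomial order by AIC and BIC under a Gaussian noise model. The data-generating setup and function names are illustrative assumptions, not taken from the chapter; the chapter's own MDL-based criteria (NML and its variants) replace the AIC/BIC penalties with code-length terms.

```python
import numpy as np

def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood given the fitted residuals."""
    n = len(residuals)
    sigma2 = np.mean(residuals ** 2)  # ML estimate of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def select_order(x, y, max_order=8):
    """Pick a polynomial order for y ~ poly(x) by AIC and by BIC."""
    n = len(x)
    scores = {}
    for d in range(max_order + 1):
        coeffs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coeffs, x)
        ll = gaussian_log_likelihood(residuals)
        k = d + 2  # d+1 polynomial coefficients plus the noise variance
        aic = -2 * ll + 2 * k            # Akaike's criterion
        bic = -2 * ll + k * np.log(n)    # Schwarz's criterion
        scores[d] = (aic, bic)
    best_aic = min(scores, key=lambda d: scores[d][0])
    best_bic = min(scores, key=lambda d: scores[d][1])
    return best_aic, best_bic

# Synthetic data: a quadratic signal with small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + 0.1 * rng.standard_normal(200)
print(select_order(x, y))
```

Because BIC's per-parameter penalty grows with the sample size, it is consistent (it recovers the true order as n grows), whereas AIC may keep a slight tendency to overfit; this is the kind of trade-off the MDL criteria in the chapter address via code lengths.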


Notes

  1. The program code for dimensionality selection with SNML is available at https://github.com/truythu169/snml-skip-gram. A video explaining this technology can be viewed at https://encyclopedia.pub/video/video_detail/353.

  2. http://mattmahoney.net/dc/textdata.

  3. https://dumps.wikimedia.org/.



Author information


Corresponding author

Correspondence to Kenji Yamanishi.


Copyright information

© 2023 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Yamanishi, K. (2023). Model Selection. In: Learning with the Minimum Description Length Principle. Springer, Singapore. https://doi.org/10.1007/978-981-99-1790-7_3


  • DOI: https://doi.org/10.1007/978-981-99-1790-7_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1789-1

  • Online ISBN: 978-981-99-1790-7

  • eBook Packages: Computer Science (R0)
