Abstract
In this chapter, we discuss approaches to a problem called model selection. Model selection is needed whenever several candidate models could be used for a prediction task and the best among them must be chosen. For instance, for a classification problem, we may consider a support vector machine or a decision tree; similarly, for a regression analysis, there may be different options for the number of predictors in the model. In either case, one needs to decide which statistical model to select from the available candidates. A related topic is model assessment. Model selection and model assessment are frequently confused, although each focuses on a different goal. For this reason, we begin our discussion of model selection by clarifying how it differs from model assessment.
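To make the abstract's scenario concrete, the following is a minimal sketch (not taken from the chapter) of validation-based model selection: two hypothetical candidate models, a constant-mean predictor and a simple linear predictor, are fit on a training split and the one with the lower mean squared error on a held-out validation split is chosen. The data, model names, and split are illustrative assumptions only.

```python
# Hypothetical model-selection sketch: pick the candidate model with the
# lowest validation error. All data and candidates here are made up.

def fit_mean(xs, ys):
    # Constant model: always predict the training mean of y.
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Simple linear regression via the closed-form least-squares solution.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    # Mean squared prediction error of a fitted model on (xs, ys).
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Synthetic data: a linear trend plus a small alternating perturbation.
xs = list(range(20))
ys = [2.0 * x + 1.0 + (0.5 if x % 2 else -0.5) for x in xs]

# Hold out the last five points as a validation set.
train_x, val_x = xs[:15], xs[15:]
train_y, val_y = ys[:15], ys[15:]

candidates = {"mean": fit_mean, "linear": fit_linear}
scores = {name: mse(fit(train_x, train_y), val_x, val_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # the linear model wins on this trending data
```

Note that the validation error of the selected model is an optimistic estimate of its true error precisely because it was used for selection; obtaining an unbiased estimate is the separate task of model assessment that the chapter distinguishes from selection.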
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Emmert-Streib, F., Moutari, S., Dehmer, M. (2023). Model Selection. In: Elements of Data Science, Machine Learning, and Artificial Intelligence Using R. Springer, Cham. https://doi.org/10.1007/978-3-031-13339-8_12
Print ISBN: 978-3-031-13338-1
Online ISBN: 978-3-031-13339-8
eBook Packages: Intelligent Technologies and Robotics (R0)