Abstract
This chapter presents regularization and selection methods for linear and nonlinear (parametric) models. These are important Machine Learning techniques because they target three distinct objectives: (1) improving prediction; (2) identifying and estimating models for causal inference in high-dimensional data settings; (3) detecting feature importance. The chapter starts by presenting model selection as a tool for improving prediction accuracy and for model identification and estimation with high-dimensional data. It then addresses regularized linear models, focusing on the Lasso, Ridge, and Elastic-net estimators. Next, it covers regularized nonlinear models, which extend the linear ones to generalized linear models (GLMs). Subsequently, it illustrates optimal subset selection algorithms, which are purely computational approaches to optimal modeling and feature-importance extraction. After delving into the statistical properties of regularized regression, the chapter discusses causal inference in high-dimensional settings with both exogenous and endogenous treatments. The applied part of the chapter is fully dedicated to the Stata, R, and Python implementations of the methods presented in the theoretical part.
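To make the three penalties concrete: the Elastic-net minimizes the residual sum of squares plus a penalty of the form λ[α‖β‖₁ + (1 − α)‖β‖₂²], which reduces to the Lasso at α = 1 and to Ridge at α = 0. The following is a minimal illustrative sketch in Python, not the chapter's own code, assuming scikit-learn is available and using synthetic sparse data; the chapter's applied part provides its own Stata, R, and Python implementations.

    # Minimal sketch (assumes scikit-learn); illustrative only,
    # not the chapter's own code.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV

    # Synthetic sparse design: 100 observations, 50 regressors,
    # only 5 of which carry signal.
    X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                           noise=10.0, random_state=0)

    # Each estimator selects its penalty strength by cross-validation.
    lasso = LassoCV(cv=5).fit(X, y)                           # pure l1: shrinks and selects
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 50)).fit(X, y)  # pure l2: shrinks only
    enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)         # l1/l2 mixture

    # The Lasso's exact zeros expose the selected support directly.
    print("Lasso keeps", int(np.sum(lasso.coef_ != 0)), "of", X.shape[1], "regressors")

Because the Ridge penalty never produces exact zeros, only the Lasso and Elastic-net fits perform variable selection; this distinction is developed formally in the chapter.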
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cerulli, G. (2023). Model Selection and Regularization. In: Fundamentals of Supervised Machine Learning. Statistics and Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-41337-7_3
DOI: https://doi.org/10.1007/978-3-031-41337-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41336-0
Online ISBN: 978-3-031-41337-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)