Abstract
In sparse high-dimensional data, model selection tends to overestimate the number of nonzero variables. Indeed, an \(\ell _1\)-norm constraint on the minimisation of the sum of squared residuals shrinks all coefficients, so false positives carry only a small penalty and are therefore more likely to be included in the model. An \(\ell _0\) regularisation, on the other hand, is a non-convex problem, and finding its solution is a combinatorial challenge that becomes infeasible for more than 50 variables. To overcome this, one can perform selection via an \(\ell _1\) penalisation but estimate the selected components without shrinkage. This introduces an additional bias into the optimisation of an information criterion over the model size. Used as a stopping rule, this IC must be modified to take into account the deviation between the estimates with and without shrinkage. By looking into the difference between the prediction error and the expected Mallows's Cp, previous work has analysed a correction for the optimisation bias, and an expression can be found for a signal-plus-noise model under some assumptions. A focus on structured models, in particular grouped variables, shows similar results, though the bias is noticeably reduced.
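To make the two-stage procedure concrete, the following is a minimal sketch in the signal-plus-noise model: select by an \(\ell _1\)-type rule, re-estimate the selected components without shrinkage, and minimise Mallows's Cp over the model size. This is not the paper's implementation; it assumes an identity design (so the lasso reduces to soft thresholding), a known noise variance, and it uses the naive degrees of freedom in Cp, i.e. without the optimisation-bias correction the paper develops.

```python
import numpy as np

rng = np.random.default_rng(0)

# Signal-plus-noise model: y_i = beta_i + eps_i, with a sparse beta.
n = 200
k = 10                      # number of truly nonzero components
sigma = 1.0                 # noise level, assumed known here
beta = np.zeros(n)
beta[:k] = 3.0
y = beta + sigma * rng.standard_normal(n)

def cp_refit(y, lam, sigma):
    """Mallows's Cp for the refitted (shrinkage-free) estimator at threshold lam.

    Selection mimics the lasso (which is soft thresholding for an identity
    design); estimation keeps the selected observations unshrunken.
    """
    support = np.abs(y) > lam           # l1-type selection of the support
    beta_hat = np.where(support, y, 0)  # refit without shrinkage
    rss = np.sum((y - beta_hat) ** 2)
    df = support.sum()                  # naive degrees of freedom
    return rss / sigma**2 - n + 2 * df

# Minimising Cp over the threshold: the minimisation itself pushes Cp
# downward, which is the optimisation bias the corrected criterion removes.
lams = np.linspace(0.01, 5.0, 200)
cps = np.array([cp_refit(y, lam, sigma) for lam in lams])
lam_star = lams[np.argmin(cps)]
print(f"selected threshold: {lam_star:.2f}, "
      f"model size: {(np.abs(y) > lam_star).sum()}")
```

In this uncorrected form, the minimiser typically retains too large a model: optimising the criterion over the threshold exploits the noise, which is exactly the overestimation described above.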
Notes
1. A similar discussion would hold for any distance between the selected and the true model.
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: B.N. Petrov, F. Csáki (eds) Proceedings of the Second International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest (1973)
Donoho, D.L.: For most large underdetermined systems of equations, the minimal \(\ell _1\)-norm near-solution approximates the sparsest near-solution. Commun. Pure Appl. Math. 59, 907–934 (2006)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
Jansen, M.: Information criteria for variable selection under sparsity. Biometrika 101, 37–55 (2014)
Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973)
Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462 (2006)
Stein, C.: Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: J. Neyman (ed), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, pp. 197–206. University of California Press (1956)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B 67(1), 91–108 (2005)
Wainwright, M.J.: Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell _1\)-constrained quadratic programming (lasso). IEEE Trans. Inform. Theory 55(5), 2183–2202 (2009)
Ye, J.: On measuring and correcting the effects of data mining and model selection. J. Am. Stat. Assoc. 93, 120–131 (1998)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68(1), 49–67 (2006)
Zhao, P., Rocha, G., Yu, B.: The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat. 37, 3468–3497 (2009)
Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Marquis, B., Jansen, M. (2020). Correction for Optimisation Bias in Structured Sparse High-Dimensional Variable Selection. In: La Rocca, M., Liseo, B., Salmaso, L. (eds) Nonparametric Statistics. ISNPS 2018. Springer Proceedings in Mathematics & Statistics, vol 339. Springer, Cham. https://doi.org/10.1007/978-3-030-57306-5_32
DOI: https://doi.org/10.1007/978-3-030-57306-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57305-8
Online ISBN: 978-3-030-57306-5
eBook Packages: Mathematics and Statistics (R0)