Abstract
In sparse high-dimensional data, model selection tends to overestimate the number of nonzero variables. Indeed, an \(\ell _1\)-norm constraint on the minimisation of the sum of squared residuals shrinks all coefficients, so false positives carry only a small penalty and are therefore more likely to be included in the model. An \(\ell _0\) regularisation, on the other hand, is a non-convex problem, and finding its solution is a combinatorial challenge that becomes infeasible for more than 50 variables. To overcome this, one can perform selection via an \(\ell _1\) penalisation but estimate the selected components without shrinkage. This introduces an additional bias into the optimisation of an information criterion over the model size. Used as a stopping rule, this IC must be modified to take into account the deviation between the estimates with and without shrinkage. By looking into the difference between the prediction error and the expected Mallows's Cp, previous work has analysed a correction for the optimisation bias, and an expression can be found for a signal-plus-noise model under some assumptions. A focus on structured models, in particular grouped variables, shows similar results, though the bias is noticeably reduced.
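To make the two-stage procedure concrete, the following is a minimal sketch in the signal-plus-noise model: select by an \(\ell _1\)-type rule, re-estimate the selected components without shrinkage, and minimise Mallows's Cp over the model size. This is not the paper's implementation; it assumes an identity design (so the lasso reduces to soft thresholding), a known noise variance, and it uses the naive degrees of freedom in Cp, i.e. without the optimisation-bias correction the paper develops.

```python
import numpy as np

rng = np.random.default_rng(0)

# Signal-plus-noise model: y_i = beta_i + eps_i, with a sparse beta.
n = 200
k = 10                      # number of truly nonzero components
sigma = 1.0                 # noise level, assumed known here
beta = np.zeros(n)
beta[:k] = 3.0
y = beta + sigma * rng.standard_normal(n)

def cp_refit(y, lam, sigma):
    """Mallows's Cp for the refitted (shrinkage-free) estimator at threshold lam.

    Selection mimics the lasso (which is soft thresholding for an identity
    design); estimation keeps the selected observations unshrunken.
    """
    support = np.abs(y) > lam           # l1-type selection of the support
    beta_hat = np.where(support, y, 0)  # refit without shrinkage
    rss = np.sum((y - beta_hat) ** 2)
    df = support.sum()                  # naive degrees of freedom
    return rss / sigma**2 - n + 2 * df

# Minimising Cp over the threshold: the minimisation itself pushes Cp
# downward, which is the optimisation bias the corrected criterion removes.
lams = np.linspace(0.01, 5.0, 200)
cps = np.array([cp_refit(y, lam, sigma) for lam in lams])
lam_star = lams[np.argmin(cps)]
print(f"selected threshold: {lam_star:.2f}, "
      f"model size: {(np.abs(y) > lam_star).sum()}")
```

In this uncorrected form, the minimiser typically retains too large a model: optimising the criterion over the threshold exploits the noise, which is exactly the overestimation described above.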
Notes
1. A similar discussion would hold for any distance between the selected and the true model.
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: B.N. Petrov, F. Csáki (eds) Proceedings of the Second International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest (1973)
Donoho, D.L.: For most large underdetermined systems of equations, the minimal \(\ell _1\)-norm near-solution approximates the sparsest near-solution. Commun. Pure Appl. Math. 59, 907–934 (2006)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
Jansen, M.: Information criteria for variable selection under sparsity. Biometrika 101, 37–55 (2014)
Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973)
Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462 (2006)
Stein, C.: Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: J. Neyman (ed), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, pp. 197–206. University of California Press (1956)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B 67(1), 91–108 (2005)
Wainwright, M.J.: Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell _1\)-constrained quadratic programming (lasso). IEEE Trans. Inform. Theory 55(5), 2183–2202 (2009)
Ye, J.: On measuring and correcting the effects of data mining and model selection. J. Am. Stat. Assoc. 93, 120–131 (1998)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68(1), 49–67 (2006)
Zhao, P., Rocha, G., Yu, B.: The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat. 37, 3468–3497 (2009)
Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Marquis, B., Jansen, M. (2020). Correction for Optimisation Bias in Structured Sparse High-Dimensional Variable Selection. In: La Rocca, M., Liseo, B., Salmaso, L. (eds) Nonparametric Statistics. ISNPS 2018. Springer Proceedings in Mathematics & Statistics, vol 339. Springer, Cham. https://doi.org/10.1007/978-3-030-57306-5_32
DOI: https://doi.org/10.1007/978-3-030-57306-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57305-8
Online ISBN: 978-3-030-57306-5
eBook Packages: Mathematics and Statistics (R0)