Abstract
Boosting is a powerful statistical learning method that combines multiple weak learners into a strong learner by applying them sequentially to improve performance. Recently, boosting methods have been extended to handle variable selection. However, little work has addressed complex data features such as measurement error in covariates. In this paper, we adopt the boosting method for variable selection in the presence of measurement error. We develop two approximated correction approaches that handle different types of responses while eliminating the effects of measurement error. The proposed algorithms are easy to implement and yield precise estimators. In numerical studies under various settings, the proposed method outperforms competing approaches.
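To fix ideas, the following is a minimal sketch of generic componentwise L2 boosting, the kind of sequential weak-learner scheme the abstract describes: each iteration fits every covariate separately to the current residuals, updates only the best-fitting one by a small shrunken step, and covariates never selected keep a zero coefficient, giving implicit variable selection. This is a standard textbook variant, not the paper's proposed de-noising correction; the function name and parameters are illustrative.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Generic componentwise L2 boosting (illustrative sketch only).

    At each step, fit a simple least-squares coefficient for every
    covariate against the current residuals, pick the covariate that
    most reduces the residual sum of squares, and move its coefficient
    by a small shrunken step nu. Unselected covariates stay at zero.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_iter):
        # per-covariate least-squares coefficient against residuals
        num = X.T @ resid
        den = (X ** 2).sum(axis=0)
        coefs = num / den
        # residual sum of squares if each covariate were updated alone
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = np.argmin(sse)
        beta[j] += nu * coefs[j]
        resid = resid - nu * coefs[j] * X[:, j]
    return intercept, beta
```

With a strong signal in a single covariate, the corresponding coefficient is updated repeatedly while irrelevant coefficients remain near zero, which is the variable-selection behavior the boosting literature cited below exploits.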
References
Brown, B., Miller, C.J., Wolfson, J.: ThrEEBoost: thresholded boosting for variable selection and prediction via estimating equations. J. Comput. Graph. Stat. 26, 579–588 (2017)
Brown, B., Weaver, T., Wolfson, J.: MEBoost: variable selection in the presence of measurement error. Stat. Med. 38, 2705–2718 (2019)
Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 477–505 (2007)
Candes, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n (with discussion). Ann. Stat. 35, 2313–2404 (2007)
Carroll, R.J., Küchenhoff, H., Lombard, F., Stefanski, L.A.: Asymptotics for the SIMEX estimator in nonlinear measurement error models. J. Am. Stat. Assoc. 91, 242–250 (1996)
Carroll, R.J., Fan, J., Gijbels, I., Wand, M.P.: Generalized partially linear single-index models. J. Am. Stat. Assoc. 92, 477–489 (1997)
Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.: Measurement Error in Nonlinear Model. CRC Press, New York (2006)
Chen, L.-P.: Variable selection and estimation for the additive hazards model subject to left-truncation, right-censoring and measurement error in covariates. J. Stat. Comput. Simul. 90, 3261–3300 (2020)
Chen, L.-P.: Ultrahigh-dimensional sufficient dimension reduction with measurement error in covariates. Stat. Probab. Lett. 168, 108931 (2021)
Chen, L.-P., Yi, G.Y.: Model selection and model averaging for analysis of truncated and censored data with measurement error. Electron. J. Stat. 14, 4054–4109 (2020)
Chen, L.-P., Yi, G.Y.: Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics 77, 956–969 (2021a)
Chen, L.-P., Yi, G.Y.: Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Ann. Inst. Stat. Math. 73, 451–481 (2021b)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 409–499 (2004)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Hall, P., Li, K.-C.: On almost linearity of low-dimensional projections from high-dimensional data. Ann. Stat. 21, 867–889 (1993)
Hastie, T.: Comment: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 513–515 (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, New York (2009)
Küchenhoff, H., Carroll, R.J.: Segmented regression with errors in predictors: semi-parametric and parametric methods. Stat. Med. 16, 169–188 (1997)
Ma, Y., Li, R.: Variable selection in measurement error models. Bernoulli 16, 274–300 (2010)
Nghiem, L., Potgieter, C.: Simulation-selection-extrapolation: estimation in high-dimensional errors-in-variables models. Biometrics 75, 1133–1144 (2019)
Sørensen, Ø., Hellton, K.H., Frigessi, A., Thoresen, M.: Covariate selection in high-dimensional generalized linear models with measurement error. J. Comput. Graph. Stat. 27, 739–749 (2018)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Tutz, G., Binder, H.: Generalized additive modeling with implicit variable selection by likelihood-based boosting. Biometrics 62, 961–971 (2006)
Wang, C.Y.: Flexible regression calibration for covariate measurement error with longitudinal surrogate variables. Stat. Sin. 10, 905–921 (2000)
Wolfson, J.: EEBoost: a general method for prediction and variable selection based on estimating equation. J. Am. Stat. Assoc. 106, 296–305 (2011)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Acknowledgements
The author would like to thank the Editor, Associate Editor, and one referee for their useful comments, which significantly improved the presentation of the initial manuscript. Chen’s research was supported by the National Science and Technology Council (Grant ID 110-2118-M-004-006-MY2).
Author information
Authors and Affiliations
Contributions
Li-Pang Chen is the sole author for this manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The author declares no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, LP. De-noising boosting methods for variable selection and estimation subject to error-prone variables. Stat Comput 33, 38 (2023). https://doi.org/10.1007/s11222-023-10209-3