Abstract
Boosting is a powerful statistical learning method that combines multiple weak learners into a strong learner by applying them sequentially to improve performance. Recently, boosting methods have been extended to handle variable selection. However, little work has addressed complex data features such as measurement error in covariates. In this paper, we adopt the boosting method for variable selection in the presence of measurement error. We develop two approximated correction approaches that handle different types of responses while eliminating the effects of measurement error. The proposed algorithms are easy to implement and yield precise estimators. In numerical studies under various settings, the proposed method outperforms competing approaches.
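To fix ideas, the following is a minimal sketch of generic componentwise L2 boosting, the kind of sequential weak-learner scheme the abstract describes: each iteration fits every covariate separately to the current residuals, updates only the best-fitting one by a small shrunken step, and covariates never selected keep a zero coefficient, giving implicit variable selection. This is a standard textbook variant, not the paper's proposed de-noising correction; the function name and parameters are illustrative.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Generic componentwise L2 boosting (illustrative sketch only).

    At each step, fit a simple least-squares coefficient for every
    covariate against the current residuals, pick the covariate that
    most reduces the residual sum of squares, and move its coefficient
    by a small shrunken step nu. Unselected covariates stay at zero.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_iter):
        # per-covariate least-squares coefficient against residuals
        num = X.T @ resid
        den = (X ** 2).sum(axis=0)
        coefs = num / den
        # residual sum of squares if each covariate were updated alone
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = np.argmin(sse)
        beta[j] += nu * coefs[j]
        resid = resid - nu * coefs[j] * X[:, j]
    return intercept, beta
```

With a strong signal in a single covariate, the corresponding coefficient is updated repeatedly while irrelevant coefficients remain near zero, which is the variable-selection behavior the boosting literature cited below exploits.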
References
Brown, B., Miller, C.J., Wolfson, J.: ThrEEBoost: thresholded boosting for variable selection and prediction via estimating equations. J. Comput. Graph. Stat. 26, 579–588 (2017)
Brown, B., Weaver, T., Wolfson, J.: MEBoost: variable selection in the presence of measurement error. Stat. Med. 38, 2705–2718 (2019)
Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 477–505 (2007)
Candes, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n (with discussion). Ann. Stat. 35, 2313–2404 (2007)
Carroll, R.J., Küchenhoff, H., Lombard, F., Stefanski, L.A.: Asymptotics for the SIMEX estimator in nonlinear measurement error models. J. Am. Stat. Assoc. 91, 242–250 (1996)
Carroll, R.J., Fan, J., Gijbels, I., Wand, M.P.: Generalized partially linear single-index models. J. Am. Stat. Assoc. 92, 477–489 (1997)
Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.: Measurement Error in Nonlinear Model. CRC Press, New York (2006)
Chen, L.-P.: Variable selection and estimation for the additive hazards model subject to left-truncation, right-censoring and measurement error in covariates. J. Stat. Comput. Simul. 90, 3261–3300 (2020)
Chen, L.-P.: Ultrahigh-dimensional sufficient dimension reduction with measurement error in covariates. Stat. Probab. Lett. 168, 108931 (2021)
Chen, L.-P., Yi, G.Y.: Model selection and model averaging for analysis of truncated and censored data with measurement error. Electron. J. Stat. 14, 4054–4109 (2020)
Chen, L.-P., Yi, G.Y.: Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics 77, 956–969 (2021a)
Chen, L.-P., Yi, G.Y.: Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Ann. Inst. Stat. Math. 73, 451–481 (2021b)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 409–499 (2004)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Hall, P., Li, K.-C.: On almost linearity of low-dimensional projections from high-dimensional data. Ann. Stat. 21, 867–889 (1993)
Hastie, T.: Comment: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 513–515 (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, New York (2009)
Küchenhoff, H., Carroll, R.J.: Segmented regression with errors in predictors: semi-parametric and parametric methods. Stat. Med. 16, 169–188 (1997)
Ma, Y., Li, R.: Variable selection in measurement error models. Bernoulli 16, 274–300 (2010)
Nghiem, L., Potgieter, C.: Simulation-selection-extrapolation: estimation in high-dimensional errors-in-variables models. Biometrics 75, 1133–1144 (2019)
Sørensen, Ø., Hellton, K.H., Frigessi, A., Thoresen, M.: Covariate selection in high-dimensional generalized linear models with measurement error. J. Comput. Graph. Stat. 27, 739–749 (2018)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Tutz, G., Binder, H.: Generalized additive modeling with implicit variable selection by likelihood-based boosting. Biometrics 62, 961–971 (2006)
Wang, C.Y.: Flexible regression calibration for covariate measurement error with longitudinal surrogate variables. Stat. Sin. 10, 905–921 (2000)
Wolfson, J.: EEBoost: a general method for prediction and variable selection based on estimating equation. J. Am. Stat. Assoc. 106, 296–305 (2011)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Acknowledgements
The author would like to thank the Editor, Associate Editor, and one referee for their useful comments, which significantly improved the presentation of the initial manuscript. Chen’s research was supported by the National Science and Technology Council (Grant ID 110-2118-M-004-006-MY2).
Author information
Authors and Affiliations
Contributions
Li-Pang Chen is the sole author for this manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The author declares no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, LP. De-noising boosting methods for variable selection and estimation subject to error-prone variables. Stat Comput 33, 38 (2023). https://doi.org/10.1007/s11222-023-10209-3