High-dimensional robust inference for censored linear models

Huang, Jiayu; Wu, Yuanshan

doi:10.1007/s11425-022-2070-2

Published: 25 January 2024

Science China Mathematics, Volume 67, pages 891–918 (2024)

Abstract

Owing to its direct statistical interpretation, censored linear regression offers a valuable complement to Cox proportional hazards regression in survival analysis. We propose a rank-based high-dimensional inference procedure for censored linear regression that imposes no moment condition on the model error. We develop a theory of high-dimensional U-statistics, circumvent the challenges stemming from the non-smoothness of the loss function, and establish the convergence rate of the regularized estimator, the asymptotic normality of the resulting de-biased estimator, and the consistency of the asymptotic variance estimator. Because censoring can be viewed as a form of trimming, it strengthens the robustness of the rank-based high-dimensional inference, particularly against heavy-tailed model errors or outliers in the response. We evaluate the finite-sample performance of the proposed method through extensive simulation studies and demonstrate its utility by applying it to a subcohort study from The Cancer Genome Atlas (TCGA).
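
The abstract does not state the loss function. As an illustration only, the following Python sketch assumes the Gehan-type rank loss that is widely used for censored linear regression, with responses y on the (log) time scale and censoring indicators delta. The L1 penalty, the proximal subgradient solver, the simulated data, and all function names (gehan_loss, gehan_subgradient, lasso_gehan) are hypothetical choices, not the authors' procedure, and the de-biasing and variance-estimation steps are not shown. The assumed loss is a second-order U-statistic over pairs of observations and is non-differentiable wherever two residuals tie, which illustrates the kind of non-smoothness the abstract mentions.

```python
import numpy as np


def gehan_loss(beta, X, y, delta):
    """Assumed Gehan-type rank loss (not taken from the paper):
    L_n(beta) = 1/(n(n-1)) * sum_{i != j} delta_i * max(e_j - e_i, 0),
    with residuals e = y - X @ beta and delta_i = 1 for an observed failure.
    It is a second-order U-statistic and is convex but not differentiable.
    """
    n = len(y)
    e = y - X @ beta
    diff = e[None, :] - e[:, None]                   # diff[i, j] = e_j - e_i
    pairs = delta[:, None] * np.maximum(diff, 0.0)   # diagonal terms are 0
    return pairs.sum() / (n * (n - 1))


def gehan_subgradient(beta, X, y, delta):
    """A subgradient of gehan_loss (the Gehan estimating function):
    1/(n(n-1)) * sum_{i,j} delta_i * I(e_j > e_i) * (X_i - X_j)."""
    n = len(y)
    e = y - X @ beta
    w = delta[:, None] * (e[None, :] > e[:, None])   # w[i, j] = delta_i * I(e_j > e_i)
    # sum_{i,j} w_ij (X_i - X_j) = X^T (row sums of w) - X^T (column sums of w)
    return X.T @ (w.sum(axis=1) - w.sum(axis=0)) / (n * (n - 1))


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_gehan(X, y, delta, lam, n_iter=500, step0=1.0):
    """Hypothetical L1-penalized Gehan fit via proximal subgradient steps of
    size step0 / sqrt(k + 1); returns the best iterate by penalized objective."""

    def objective(b):
        return gehan_loss(b, X, y, delta) + lam * np.abs(b).sum()

    beta = np.zeros(X.shape[1])
    best, best_obj = beta.copy(), objective(beta)
    for k in range(n_iter):
        step = step0 / np.sqrt(k + 1)
        beta = soft_threshold(beta - step * gehan_subgradient(beta, X, y, delta), step * lam)
        obj = objective(beta)
        if obj < best_obj:
            best, best_obj = beta.copy(), obj
    return best


if __name__ == "__main__":
    # Simulated example: heavy-tailed t(2) errors, hypothetical uniform censoring.
    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:2] = [1.0, -1.0]
    log_t = X @ beta_true + rng.standard_t(df=2, size=n)
    log_c = rng.uniform(0.0, 3.0, size=n)
    y = np.minimum(log_t, log_c)                # observed (log) time
    delta = (log_t <= log_c).astype(float)      # 1 = failure observed
    beta_hat = lasso_gehan(X, y, delta, lam=0.05)
    print("censoring rate:", 1 - delta.mean())
    print("first five coefficients:", np.round(beta_hat, 2)[:5])
```

Because the assumed loss depends on the residuals only through pairwise differences, any intercept cancels, so the sketch fits slopes only. Since the observed response is the minimum of the failure and censoring times, an extreme failure time beyond the censoring time is replaced by the censoring time, which is one informal way to read the abstract's remark that censoring acts like trimming.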

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12071483). The authors thank two referees for their constructive comments, which led to significant improvements in the article.

Author information

Authors and Affiliations

School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
Jiayu Huang
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
Yuanshan Wu

Corresponding author

Correspondence to Yuanshan Wu.

About this article

Cite this article

Huang, J., Wu, Y. High-dimensional robust inference for censored linear models. Sci. China Math. 67, 891–918 (2024). https://doi.org/10.1007/s11425-022-2070-2

Received: 03 March 2022
Accepted: 05 December 2022
Published: 25 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11425-022-2070-2
