Abstract
Owing to its direct statistical interpretation, censored linear regression offers a valuable complement to Cox proportional hazards regression in survival analysis. We propose a rank-based high-dimensional inference procedure for censored linear regression that imposes no moment condition on the model error. We develop a theory for high-dimensional U-statistics, circumvent the challenges stemming from the non-smoothness of the loss function, and establish the convergence rate of the regularized estimator, the asymptotic normality of the resulting de-biased estimator, and the consistency of the asymptotic variance estimator. Because censoring can be viewed as a form of trimming, it strengthens the robustness of the rank-based high-dimensional inference, particularly against heavy-tailed model errors and outliers in the response. We evaluate the finite-sample performance of the proposed method via extensive simulation studies and demonstrate its utility by applying it to a subcohort study from The Cancer Genome Atlas (TCGA).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 12071483). The authors thank two referees for their constructive comments, which led to significant improvements in the article.
Cite this article
Huang, J., Wu, Y. High-dimensional robust inference for censored linear models. Sci. China Math. 67, 891–918 (2024). https://doi.org/10.1007/s11425-022-2070-2
DOI: https://doi.org/10.1007/s11425-022-2070-2