High-dimensional robust inference for censored linear models

Science China Mathematics

Abstract

Owing to its direct statistical interpretation, censored linear regression offers a valuable complement to Cox proportional hazards regression in survival analysis. We propose a rank-based high-dimensional inference procedure for censored linear regression that imposes no moment condition on the model error. We develop a theory for high-dimensional U-statistics, circumvent the challenges stemming from the non-smoothness of the loss function, and establish the convergence rate of the regularized estimator, the asymptotic normality of the resulting de-biased estimator, and the consistency of the asymptotic variance estimator. Because censoring can be viewed as a form of trimming, it strengthens the robustness of the rank-based high-dimensional inference, particularly against heavy-tailed model errors and outliers in the response. We evaluate the finite-sample performance of the proposed method via extensive simulation studies and demonstrate its utility by applying it to a subcohort study from The Cancer Genome Atlas (TCGA).
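
The abstract does not state which rank-based loss is used. As a minimal sketch, assume a Gehan-type loss, a standard rank-based choice for censored linear regression, combined with an L1 penalty. The function names, the tuning parameter `lam`, and the toy data below are hypothetical and serve only to illustrate why such a loss is a non-smooth, pairwise (U-statistic-type) objective.

```python
import numpy as np


def gehan_loss(beta, X, y, delta):
    """Gehan-type rank loss (an assumed choice; not confirmed by the abstract).

    X: (n, p) covariates; y: (n,) observed, possibly censored, responses;
    delta: (n,) indicators, 1 if the response is uncensored, 0 if censored.
    """
    e = y - X @ beta                      # residuals e_i(beta)
    diff = e[None, :] - e[:, None]        # diff[i, j] = e_j - e_i
    # Only pairs whose first member is uncensored contribute.
    return np.sum(delta[:, None] * np.maximum(diff, 0.0)) / len(y) ** 2


def penalized_objective(beta, X, y, delta, lam):
    """Rank loss plus an L1 penalty with hypothetical tuning parameter lam."""
    return gehan_loss(beta, X, y, delta) + lam * np.sum(np.abs(beta))


# Toy data, made up for illustration only.
X = np.array([[1.0], [2.0]])
y = np.array([1.0, 3.0])
delta = np.array([1.0, 0.0])
print(gehan_loss(np.zeros(1), X, y, delta))
```

Under this assumed form the loss is piecewise linear in `beta` and averages a kernel over pairs of observations, which is consistent with the non-smoothness and U-statistic structure the abstract mentions. It also depends on the responses only through comparisons of residuals, so no moment of the error distribution enters the objective.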

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12071483). The authors thank two referees for their constructive comments, which led to significant improvements in the article.

Author information

Corresponding author

Correspondence to Yuanshan Wu.

About this article

Cite this article

Huang, J., Wu, Y. High-dimensional robust inference for censored linear models. Sci. China Math. 67, 891–918 (2024). https://doi.org/10.1007/s11425-022-2070-2

  • DOI: https://doi.org/10.1007/s11425-022-2070-2
