Subsampling approach for least squares fitting of semi-parametric accelerated failure time models to massive survival data

Yang, Zehan; Wang, HaiYing; Yan, Jun

doi:10.1007/s11222-024-10391-y

Original Paper
Published: 14 February 2024

Volume 34, article number 77 (2024)

Statistics and Computing

Zehan Yang¹,
HaiYing Wang¹ &
Jun Yan¹

Abstract

Massive survival data are increasingly common in many research fields, and subsampling is a practical strategy for analyzing such data. Although optimal subsampling strategies have been developed for Cox models, little has been done for semi-parametric accelerated failure time (AFT) models because of the challenges posed by non-smooth estimating functions for the regression coefficients. We develop optimal subsampling algorithms for fitting semi-parametric AFT models using the least-squares approach. By efficiently estimating the slope matrix of the non-smooth estimating functions using a resampling approach, we construct optimal subsampling probabilities for the observations. For feasible point and interval estimation of the unknown coefficients, we propose a two-step method, drawing multiple subsamples in the second stage to correct for overestimation of the variance in higher censoring scenarios. We validate the performance of our estimators through a simulation study that compares single and multiple subsampling methods, and we apply the methods to analyze the survival time of lymphoma patients in the Surveillance, Epidemiology, and End Results program.
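
To make the two-step idea concrete, the following is a minimal sketch of generic two-step subsampling for ordinary (uncensored) linear least squares. It is not the authors' algorithm: it ignores censoring, does not estimate the slope matrix by resampling, and uses probabilities proportional to |residual| times the covariate norm as an assumed stand-in for the optimal probabilities derived in the paper. The function names `weighted_ls` and `two_step_subsample_ls`, and the parameters `r0` (pilot size) and `r` (second-stage size), are hypothetical.

```python
import numpy as np


def weighted_ls(X, y, w):
    """Solve min_b sum_i w_i * (y_i - x_i' b)^2 via a rescaled lstsq call."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def two_step_subsample_ls(X, y, r0, r, rng):
    """Two-step subsampling sketch for uncensored least squares.

    Step 1 draws a uniform pilot subsample to get a rough estimate.
    Step 2 samples with non-uniform probabilities built from that pilot
    and fits an inverse-probability-weighted least-squares estimate.
    """
    n = X.shape[0]

    # Step 1: uniform pilot subsample and unweighted pilot fit.
    pilot_idx = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_ls(X[pilot_idx], y[pilot_idx], np.ones(r0))

    # Assumed sampling probabilities: proportional to |residual| * ||x_i||.
    score = np.abs(y - X @ beta0) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()

    # Step 2: sample with replacement using pi, then reweight by 1 / pi
    # so the weighted subsample objective stays unbiased for the full one.
    idx = rng.choice(n, size=r, replace=True, p=pi)
    beta = weighted_ls(X[idx], y[idx], 1.0 / pi[idx])
    return beta, pi
```

Repeating the second step several times and combining the resulting estimates would loosely mirror the multiple-subsample variant described in the abstract, although the combination rule and variance correction used by the authors are not given here.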

Acknowledgements

Wang’s research was supported by NSF grant CCF 2105571 and UConn CLAS Research Funding in Academic Themes.

Author information

Authors and Affiliations

Department of Statistics, University of Connecticut, Storrs, CT, 06269-4120, USA
Zehan Yang, HaiYing Wang & Jun Yan

Corresponding authors

Correspondence to Zehan Yang or HaiYing Wang.

About this article

Cite this article

Yang, Z., Wang, H. & Yan, J. Subsampling approach for least squares fitting of semi-parametric accelerated failure time models to massive survival data. Stat Comput 34, 77 (2024). https://doi.org/10.1007/s11222-024-10391-y

Received: 27 April 2023
Accepted: 18 January 2024
Published: 14 February 2024
DOI: https://doi.org/10.1007/s11222-024-10391-y
