Abstract
In this paper, we propose a stochastic approximation to the well-studied expectation–maximization (EM) algorithm for finding the maximum likelihood (ML)-type estimates in situations where missing data arise naturally and a proportion of individuals are immune to the event of interest. A flexible family of three-parameter exponentiated Weibull (EW) distributions is assumed to characterize the lifetimes of the non-immune individuals, as it accommodates both monotone (increasing and decreasing) and non-monotone (unimodal and bathtub) hazard functions. To evaluate the performance of the proposed algorithm, an extensive simulation study is carried out under various parameter settings. Using likelihood ratio tests, we also carry out model discrimination within the EW family of distributions. Furthermore, we study the robustness of the proposed algorithm with respect to outliers in the data and the choice of initial values to start the algorithm. In particular, we show that our proposed algorithm is less sensitive to the choice of initial values when compared to the EM algorithm. For illustration, we analyze a real survival data set on cutaneous melanoma. Through these data, we illustrate the applicability of the likelihood ratio test toward rejecting several well-known lifetime distributions that are nested within the wider class of the proposed EW distributions.
Acknowledgements
The authors express their thanks to the Guest Editor and three anonymous reviewers for their careful reviews and useful comments and suggestions on an earlier version of this manuscript which led to this improved version.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendix: Development of the EM Algorithm
To implement the EM algorithm, we define the complete data likelihood function as:
where \({\varvec{\eta }}=\left( \eta _1, \dots , \eta _n\right) ^{\tiny \mathrm T}\) and \(\pi _0({\varvec{x}}^*_i; {\varvec{\beta }})=\left\{ 1+e^{{\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }}}\right\} ^{-1}\) is the cure rate. Equivalently, the expression for the complete data log-likelihood function is obtained as:
For the mixture cure rate model, the expression given in (33) takes the following form:
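As a sketch of what such a form looks like under the usual mixture cure construction (an assumption made here for illustration, not a quotation of the authors' display), write \(\Delta _1\) and \(\Delta _0\) for the index sets of observed and right-censored lifetimes, let \(\eta _i = 1\) mark a non-immune individual, and let \(f_u\) and \(S_u\) denote the EW density and survival function of the non-immune individuals (the symbols \(f_u\) and \(S_u\) are introduced only for this sketch). Each observed lifetime then contributes \(\log \{(1-\pi _0) f_u(t_i)\}\), and each censored one contributes \((1-\eta _i)\log \pi _0 + \eta _i \log \{(1-\pi _0) S_u(t_i)\}\), so that

$$\begin{aligned} l_{C}({\varvec{\theta }}; {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}}, {\varvec{\eta }})&= \sum _{i \in \Delta _1} \left[ \log \left\{ 1-\pi _0({\varvec{x}}^*_i; {\varvec{\beta }})\right\} + \log f_u(t_i; \alpha , k, \lambda ) \right] \\&\quad + \sum _{i \in \Delta _0} \left[ (1-\eta _i) \log \pi _0({\varvec{x}}^*_i; {\varvec{\beta }}) + \eta _i \log \left\{ 1-\pi _0({\varvec{x}}^*_i; {\varvec{\beta }})\right\} + \eta _i \log S_u(t_i; \alpha , k, \lambda ) \right] , \end{aligned}$$

with

$$\begin{aligned} f_u(t; \alpha , k, \lambda )&= \frac{\alpha k}{\lambda } \left( \frac{t}{\lambda }\right) ^{k-1} e^{-(t/\lambda )^k} \left\{ 1 - e^{-(t/\lambda )^k}\right\} ^{\alpha -1}, \\ S_u(t; \alpha , k, \lambda )&= 1 - \left\{ 1 - e^{-(t/\lambda )^k}\right\} ^{\alpha }. \end{aligned}$$

Because \(\log \pi _0 = -\log \left( 1+e^{{\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }}}\right) \) and \(\log (1-\pi _0) = {\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }} - \log \left( 1+e^{{\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }}}\right) \), this expression is linear in each \(\eta _i\), which is why the E-step only requires the conditional expectation of \(\eta _i\) for the censored observations.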
1.1 Steps Involved in the EM Algorithm
Begin the iterative process by considering an initial estimate \({\varvec{\theta }}^{(0)}=\left( {\varvec{\beta }}^{(0)}, \alpha ^{(0)}, k^{(0)}, \lambda ^{(0)}\right) ^{\tiny \mathrm T}\) of \({\varvec{\theta }}\). The choice of \({\varvec{\theta }}^{(0)}\) should be justified by background knowledge and some sample real-life data. For \(r=0, 1, 2, \dots \), let \({\varvec{\theta }}^{(r)}\) be the estimate of \({\varvec{\theta }}\) at the \(r\)-th step of the iteration. Then, \({\varvec{\theta }}^{(r+1)}\) is obtained using the following steps:
1. E-Step: Find the conditional expectation \(Q\left( {\varvec{\theta }}; {\varvec{\theta }}^{(r)} \right) = E\left\{ l_{C}({\varvec{\theta }}; {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}}, {\varvec{\eta }}) \big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}} \right) \right\} \), which is given by
$$\begin{aligned} Q\left( {\varvec{\theta }}; {\varvec{\theta }}^{(r)} \right)&= \text {constant}+ n_1 (\log \alpha +\log k - k \log \lambda ) + (k-1)\sum _{i \in \Delta _1} \log t_i \\&\quad - \sum _{i \in \Delta _1} \left( \frac{t_i}{\lambda }\right) ^k + \sum _{i \in \Delta _1} (\alpha -1) \log \left\{ 1 - e^{-(t_i/\lambda )^k}\right\} \\&\quad + \sum _{i \in \Delta _1} {\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }}- \sum _{i=1}^n \log \left( 1+ e^{{\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }}} \right) \\&\quad + \sum _{i \in \Delta _0} E\left\{ \eta _i \big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}} \right) \right\} {\varvec{x}}_i^{\tiny \mathrm T} {\varvec{\beta }} \\&\quad + \sum _{i \in \Delta _0} E\left\{ \eta _i \big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}} \right) \right\} \log \left\{ 1- \left[ 1 - e^{-(t_i/\lambda )^k}\right] ^{\alpha } \right\} , \end{aligned}$$ (35)
where
$$\begin{aligned}&E\left\{ \eta _i \big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}} \right) \right\} \\&\quad = P\left\{ \eta _i = 1\big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{t}}, {\varvec{\delta }}, {\varvec{X}} \right) \right\} \\&\quad =P\left\{ \eta _i = 1\big \vert \left( {\varvec{\theta }}^{(r)}, Y_i>t_i, {\varvec{x}}^*_i, i \in \Delta _0 \right) \right\} \\&\quad =\frac{P\left\{ Y_i>t_i\big \vert \left( \eta _i=1, {\varvec{\theta }}^{(r)}, {\varvec{x}}^*_i, i \in \Delta _0 \right) \right\} P\left\{ \eta _i=1 \big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{x}}^*_i, i \in \Delta _0\right) \right\} }{P\left\{ Y_i>t_i\big \vert \left( {\varvec{\theta }}^{(r)}, {\varvec{x}}^*_i, i \in \Delta _0 \right) \right\} } \\&\quad =\frac{S_p\left( t_i; {\varvec{\theta }}^{(r)}, \delta _i, {\varvec{x}}^*_i\right) - \pi _0\left( {\varvec{x}}^*_i; {\varvec{\beta }}^{(r)}\right) }{S_p\left( t_i; {\varvec{\theta }}^{(r)}, \delta _i, {\varvec{x}}^*_i\right) }\\&\quad =1- \frac{ \pi _0\left( {\varvec{x}}^*_i; {\varvec{\beta }}^{(r)}\right) }{S_p\left( t_i; {\varvec{\theta }}^{(r)}, \delta _i, {\varvec{x}}^*_i\right) }. \end{aligned}$$ (36)
2. M-Step: Find
$$\begin{aligned} {\varvec{\theta }}^{(r+1)}=\left( {\varvec{\beta }}^{(r+1)}, \alpha ^{(r+1)}, k^{(r+1)}, \lambda ^{(r+1)}\right) ^{\tiny \mathrm T} = \underset{{\varvec{\theta }}}{{\arg \max }} \text { }Q\left( {\varvec{\theta }}; {\varvec{\theta }}^{(r)} \right) . \end{aligned}$$ (37)
The maximization step can be carried out using multidimensional unconstrained optimization methods such as the Nelder–Mead simplex search algorithm or quasi-Newton methods such as the BFGS algorithm. These algorithms are available in the statistical software R (version 4.0.3) under the General Purpose Optimization package called optimr().
3. Convergence: Check whether the stopping or convergence criterion for the iterative process is met. For our analysis, we consider that the EM algorithm has converged to a local maximum if
$$\begin{aligned} \underset{1 \le k' \le d+4}{{\max }}\text { }\left| \frac{\theta ^{(r+1)}_{k'}- \theta ^{(r)}_{k'}}{\theta ^{(r)}_{k'}} \right| < \epsilon , \end{aligned}$$ (38)
where \(\theta ^{(r)}_{k'}\) and \(\theta ^{(r+1)}_{k'}\) are the \(k'\)-th components of \({\varvec{\theta }}^{(r)}\) and \({\varvec{\theta }}^{(r+1)}\), respectively, and \(\epsilon \) is a tolerance such as 0.001.
If the condition in (38) is satisfied, then the iterative process is stopped and \({\varvec{\theta }}^{(r)}\) is considered as the ML estimate of \({\varvec{\theta }}\).
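The E-step weights in (36) and the stopping rule in (38) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the status indicator is assumed to be 1 for an observed lifetime and 0 for a right-censored one, and the population survival function is taken to be \(S_p = \pi _0 + (1-\pi _0) S_u\), which is what the numerator in (36) implies for a mixture cure model. The M-step is omitted because it needs a numerical optimizer.

```python
import math


def ew_survival(t, alpha, k, lam):
    """EW survival function: 1 - [1 - exp(-(t/lam)^k)]^alpha."""
    return 1.0 - (1.0 - math.exp(-((t / lam) ** k))) ** alpha


def cure_prob(x, beta):
    """Cure rate pi_0 = 1 / (1 + exp(x' beta)) under the logistic link."""
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(xb))


def e_step_weights(times, status, covs, theta):
    """Conditional expectation of eta_i given the data, following (36).

    theta = (beta, alpha, k, lam). An observed lifetime (status 1) belongs
    to a non-immune individual with certainty, so its weight is 1.
    """
    beta, alpha, k, lam = theta
    weights = []
    for t, d, x in zip(times, status, covs):
        if d == 1:
            weights.append(1.0)
            continue
        pi0 = cure_prob(x, beta)
        # Assumed mixture form of the population survival function.
        s_pop = pi0 + (1.0 - pi0) * ew_survival(t, alpha, k, lam)
        weights.append(1.0 - pi0 / s_pop)
    return weights


def has_converged(theta_new, theta_old, eps=1e-3):
    """Relative-change criterion (38) on flattened parameter vectors.

    Assumes no component of theta_old is exactly zero.
    """
    return max(abs((n - o) / o) for n, o in zip(theta_new, theta_old)) < eps


# Hypothetical toy data: one observed and one censored lifetime,
# each with an intercept and a single covariate.
times = [1.0, 2.0]
status = [1, 0]
covs = [[1.0, 0.5], [1.0, -0.5]]
theta = ([0.2, 0.3], 1.5, 1.2, 2.0)
weights = e_step_weights(times, status, covs, theta)
```

In a full EM loop, these weights would be held fixed while \(Q\) is maximized over \({\varvec{\theta }}\), and `has_converged` would be applied to the flattened vectors \({\varvec{\theta }}^{(r+1)}\) and \({\varvec{\theta }}^{(r)}\) after each M-step.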
Pal, S., Barui, S., Davies, K. et al. A Stochastic Version of the EM Algorithm for Mixture Cure Model with Exponentiated Weibull Family of Lifetimes. J Stat Theory Pract 16, 48 (2022). https://doi.org/10.1007/s42519-022-00274-8
DOI: https://doi.org/10.1007/s42519-022-00274-8