Abstract
Case–cohort studies are commonly used in various investigations, and many methods have been proposed for their analysis. However, most of the available methods are for right-censored data or assume that the censoring is independent of the underlying failure time of interest. In addition, they usually apply only to a specific model, such as the Cox model, that may be restrictive or violated in practice. To relax these assumptions, we discuss regression analysis of interval-censored data, which arise more naturally in case–cohort studies than right-censored data and include the latter as a special case, and propose a two-step inverse probability weighting estimation procedure under a general class of semiparametric transformation models. Among other features, the approach allows for informative censoring. In addition, an EM algorithm is developed to compute the proposed estimators, and their asymptotic properties are established. Simulation results indicate that the approach works well in practical situations, and it is applied to an HIV vaccine trial that motivated this investigation.
References
Chen, K., Jin, Z., Ying, Z.: Semiparametric analysis of transformation models with censored data. Biometrika 89, 659–668 (2002)
Chen, K., Lo, S.H.: Case–cohort and case–control analysis with Cox’s model. Biometrika 86, 755–764 (1999)
Chen, K., Sun, L., Tong, X.: Analysis of cohort survival data with transformation model. Stat. Sin. 22, 489–508 (2012)
Chen, Y.H., Zucker, D.M.: Case–cohort analysis with semiparametric transformation models. J. Stat. Plan. Inference 139, 3706–3717 (2009)
Cheng, S.C., Wei, L.J., Ying, Z.: Analysis of transformation models with censored data. Biometrika 82, 835–845 (1995)
Du, M., Zhou, Q., Zhao, S., Sun, J.: Regression analysis of case–cohort studies in the presence of dependent interval censoring. J. Appl. Stat. 48(5), 846–865 (2021)
Fine, J.P., Ying, Z., Wei, L.J.: On the linear transformation model for censored data. Biometrika 85, 980–986 (1998)
Fong, Y., Shen, X., Ashley, V.C., et al.: Modification of the association between T-cell immune responses and human immunodeficiency virus type 1 infection risk by vaccine-induced antibody responses in the HVTN 505 trial. J. Infect. Dis. 217, 1280–1288 (2018)
Gilbert, P.B., Peterson, M.L., Follmann, D., Hudgens, M.G., Francis, D.P., Gurwith, M., Heyward, W.L., Jobes, D.V., Popovic, V., Self, S.G., et al.: Correlation between immunologic responses to a recombinant glycoprotein 120 vaccine and incidence of HIV-1 infection in a phase 3 HIV-1 preventive vaccine trial. J. Infect. Dis. 191, 666–677 (2005)
Hammer, S.M., Sobieszczyk, M.E., Janes, H., et al.: HVTN 505 study team: efficacy trial of a DNA/rAd5 HIV-1 preventive vaccine. N. Engl. J. Med. 369, 2083–2092 (2013)
Huang, X., Wolfe, R.: A frailty model for informative censoring. Biometrics 58, 510–520 (2002)
Janes, H.E., Cohen, K.W., Frahm, N., et al.: Higher T-cell responses induced by DNA/rAd5 HIV-1 preventive vaccine are associated with lower HIV-1 infection risk in an efficacy trial. J. Infect. Dis. 215, 1376–1385 (2017)
Jewell, N.P., van der Laan, M.J.: Case–control current status data. Biometrika 91, 529–541 (2004)
Kang, S., Cai, J.: Marginal hazards model for case–cohort studies with multiple disease outcomes. Biometrika 96, 887–901 (2009)
Keogh, R.H., White, I.R.: Using full-cohort data in nested case–control and case–cohort studies by multiple imputation. Stat. Med. 32, 4021–4043 (2013)
Kim, S., Cai, J., Lu, W.: More efficient estimators for case–cohort studies. Biometrika 100, 695–708 (2013)
Li, S., Hu, T., Zhao, S., Sun, J.: Regression analysis of multivariate current status data with semiparametric transformation frailty models. Stat. Sin. 30, 1117–1134 (2020)
Li, Z., Nan, B.: Relative risk regression for current status data in case–cohort studies. Can. J. Stat. 39, 557–577 (2011)
Lu, W.B., Liu, M.: On estimation of linear transformation models with nested case–control sampling. Lifetime Data Anal. 18, 80–93 (2012)
Lu, W.B., Tsiatis, A.A.: Semiparametric transformation models for the case–cohort study. Biometrika 93, 207–214 (2006)
Ma, L., Hu, T., Sun, J.: Cox regression analysis of dependent interval-censored failure time data. Comput. Stat. Data Anal. 103, 79–90 (2016)
Ma, S., Kosorok, M.R.: Robust semiparametric M-estimation and the weighted bootstrap. J. Multivar. Anal. 96, 190–217 (2005)
Marti, H., Chavance, M.: Multiple imputation analysis of case–cohort studies. Stat. Med. 30, 1595–1607 (2011)
Prentice, R.L.: A case–cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11 (1986)
Self, S.G., Prentice, R.L.: Asymptotic distribution theory and efficiency results for case–cohort studies. Ann. Stat. 16, 64–81 (1988)
Sun, J.: A nonparametric test for current status data with unequal censoring. J. R. Stat. Soc. B 61, 243–250 (1999)
Sun, J.: The Statistical Analysis of Interval-Censored Failure Time Data. Springer, New York (2006)
Wang, L.M., McMahan, C.S., Hudgens, M.G., Qureshi, Z.P.: A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72, 222–231 (2016)
Wang, M.C., Qin, J., Chiang, C.T.: Analyzing recurrent event data with informative censoring. J. Am. Stat. Assoc. 96, 1057–1065 (2001)
Wang, P., Zhao, H., Du, M.Y., Sun, J.: Inference on semiparametric transformation model with general interval-censored failure time data. J. Nonparametr. Stat. 30, 753–758 (2018)
Wang, P., Zhao, H., Sun, J.: Regression analysis of case K interval-censored failure time data in the presence of informative censoring. Biometrics 72, 1103–1112 (2016)
Zeng, D., Gao, F., Lin, D.Y.: Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104, 505–525 (2017)
Zeng, D., Mao, L., Lin, D.Y.: Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103, 253–271 (2016)
Zhao, X., Zhou, J., Sun, L.: Semiparametric transformation models with time-varying coefficients for recurrent and terminal events. Biometrics 67, 401–414 (2011)
Zhou, Q., Zhou, H., Cai, J.: Case–cohort studies with interval-censored failure time data. Biometrika 104, 17–29 (2017)
Acknowledgements
The authors wish to thank two reviewers for their many insightful and helpful comments and suggestions that greatly improved the paper. We also want to thank Dr. Peter Gilbert for providing the HVTN 505 Vaccine Trial data. The first author’s work was partially supported by the National Natural Science Foundation of China grant # 12101522 and the second author’s work was supported in part by the U.S. National Science Foundation grant DMS1916170.
Appendix: Proofs of Theorems 4.1 and 4.2
In the following, we will sketch the proofs of Theorems 4.1 and 4.2. Let \({\mathbb {P}}_n\) denote the empirical measure for n independent observations, \({\mathbb {P}}\) the true probability measure, and \({\mathbb {G}}_n=n^{1/2}({\mathbb {P}}_n-{\mathbb {P}})\) the empirical process. Let \(l(\beta ,\Lambda |u)\) be the log-likelihood for a single subject based on the complete data O, given by
and let \(l^w(\beta ,\Lambda |u) = w \,l(\beta ,\Lambda |u)\) be the weighted log-likelihood for a single subject based on the observed data \(O^\xi \) under the case–cohort design, where the weight w is given by \(w=\xi /\pi _q(\delta _1,\ldots ,\delta _{K+1})\). Since \(E(w|\delta _1,\ldots ,\delta _{K+1})=1\), we have \({\mathbb {P}}\{l^w(\beta ,\Lambda |u)\}={\mathbb {P}}\{l(\beta ,\Lambda |u)\}\).
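The unbiasedness of the inverse probability weights can be checked numerically. The following is a minimal sketch under simplifying assumptions that are not part of the paper: a single binary case indicator stands in for \((\delta _1,\ldots ,\delta _{K+1})\), all cases are sampled, non-cases are sampled with a fixed known probability, and a normal variable stands in for the per-subject contribution \(l(\beta ,\Lambda |u)\). The function name `simulate_ipw` and its parameters are hypothetical.

```python
import numpy as np


def simulate_ipw(n=200_000, case_rate=0.1, subcohort_rate=0.2, seed=0):
    """Toy check that w = xi / pi(delta) has conditional mean 1 given delta."""
    rng = np.random.default_rng(seed)

    # Hypothetical binary case indicator standing in for (delta_1, ..., delta_{K+1}).
    delta = rng.binomial(1, case_rate, size=n)

    # Assumed selection rule: every case is sampled, non-cases with
    # probability subcohort_rate. xi is the selection indicator.
    pi = np.where(delta == 1, 1.0, subcohort_rate)
    xi = rng.binomial(1, pi)
    w = xi / pi

    # Stand-in for the per-subject log-likelihood contribution; it depends
    # on delta so that an unweighted subsample mean would be biased.
    contrib = rng.normal(size=n) + 2.0 * delta

    return {
        "mean_w_cases": w[delta == 1].mean(),
        "mean_w_noncases": w[delta == 0].mean(),
        "full_mean": contrib.mean(),
        "weighted_mean": (w * contrib).mean(),
    }


if __name__ == "__main__":
    for name, value in simulate_ipw().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting the average weight within each stratum of the indicator is close to 1, and the weighted average over sampled subjects is close to the full-cohort average, which mirrors the identity \({\mathbb {P}}\{l^w\}={\mathbb {P}}\{l\}\).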
Proof of Theorem 4.1
We first show that \(\limsup _n{\hat{\Lambda }}(\tau _0-\epsilon )<\infty \) with probability 1 for any \(\epsilon >0\). By the definition of \(({{\hat{\beta }}},{{\hat{\Lambda }}})\), we have
From the consistency of \(({{\hat{\alpha }}},{\hat{\Lambda }}_{h})\) established by Wang et al. [29], we can show that
with probability 1. Define \(u(\alpha ,\Lambda _h;\tau ,K,z)=\log \{K/[\Lambda _h(\tau )\exp (\alpha ^T z)]\}\). Let \(\eta >0\) be such that \(\exp \{\beta _1^T z + \beta _2 u(\alpha ,\Lambda _h;\tau ,K,z)\}\ge \eta \) for \(\beta \in {\mathcal {B}}\), \(\alpha \in {\mathcal {A}}\), \(\tau \in [\zeta _0,\tau _0]\), \(1\le K \le k_0\), and nondecreasing functions \(\Lambda _h\) such that \(\Lambda _h(\zeta _0)\ge \Lambda _{0h}(\zeta _0)-c_0>0\) and \(\Lambda _h(\tau _0)\le 1\), where \(k_0>1\) and \(c_0\) are positive constants. Then, we have
Hence,
Note that as \(n\rightarrow \infty \), \({\mathbb {P}}_n \{w \delta _{K+1} I(1\le K\le k_0, U_K\ge \tau _0-\epsilon )\}\rightarrow {\mathbb {P}} \{w \delta _{K+1} I(1\le K\le k_0, U_K\ge \tau _0-\epsilon )\}\), which is positive under Condition (C3). Thus, by Condition (C4), \(\limsup _n{\hat{\Lambda }}(\tau _0-\epsilon )<\infty \) with probability 1 for any \(\epsilon >0\). By Helly’s selection theorem and arguing as in the proof of Theorem 4.1 of Zeng et al. [32], for any subsequence of \(({{\hat{\beta }}},{{\hat{\Lambda }}})\), we can choose a further subsequence such that \({{\hat{\Lambda }}}\) converges weakly to some function \(\Lambda ^*\) on \([0,\tau _0]\) almost everywhere and \({{\hat{\beta }}}\) converges to some constant \(\beta ^*\). It remains to show that \((\beta ^*,\Lambda ^*)=(\beta _0,\Lambda _0)\).
Define
where \(p(\beta ,\Lambda |u)=\exp (l(\beta ,\Lambda |u))\). Since \({\mathbb {P}}_n l^w({{\hat{\beta }}},{{\hat{\Lambda }}}|{\hat{u}})\ge {\mathbb {P}}_n l^w(\beta _0,\Lambda _0|{\hat{u}})\), we have
and thereby
Arguing as in Zeng et al. [32], we can show that \({\mathcal {M}} = \{m(\beta ,\Lambda |u(\alpha ,\Lambda _h;\tau ,K,z)):\, \beta \in {\mathcal {B}},\,\alpha \in {\mathcal {A}},\,\Lambda \in {\mathcal {L}},\,\Lambda _h\in {\mathcal {L}}_h\}\) is a Glivenko–Cantelli class, where \({\mathcal {L}}\) is the set of nondecreasing functions \(\Lambda \) on \([0,\tau _0]\) satisfying \(\Lambda (0)=0\) and \({\mathcal {L}}_h\) is the set of nondecreasing functions \(\Lambda _h\) on \([0,\tau _0]\) satisfying \(\Lambda _h(0)=0\), \(\Lambda _h(\zeta _0)\ge \Lambda _{0h}(\zeta _0)-c_0>0\) for some positive constant \(c_0\) and \(\Lambda _h(\tau _0)\le 1\). Furthermore, based on the asymptotic properties of \(({{\hat{\alpha }}},{\hat{\Lambda }}_h)\) established by Wang et al. [29], we can show that \({\mathbb {P}}_n m(\beta ,\Lambda |{\hat{u}})\) converges to \({\mathbb {P}} m(\beta ,\Lambda |u)\) almost surely for any fixed \((\beta ,\Lambda )\). Therefore, we have \({\mathbb {P}} m(\beta ^*,\Lambda ^*|u)\ge {\mathbb {P}} m(\beta _0,\Lambda _0|u)\) and further
By the properties of the Kullback–Leibler information, \(p(\beta ^*,\Lambda ^*|u)=p(\beta _0,\Lambda _0|u)\) with probability 1. Thus, for any \(t\in [0,\tau _0]\), \(\log \{\Lambda ^*(t)\}+\beta _1^{*T} z +\beta _2^* u=\log \{\Lambda _0(t)\}+\beta _{01}^T z +\beta _{02} u\). Under Condition (C2), we obtain \(\beta ^*=\beta _0\) and \(\Lambda ^*=\Lambda _0\). This completes the proof. \(\square \)
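The Kullback–Leibler step used above rests on a standard fact, sketched here in generic notation. As an assumption made only for this illustration, \(p_0\) and \(p^*\) abbreviate \(p(\beta _0,\Lambda _0|u)\) and \(p(\beta ^*,\Lambda ^*|u)\), both densities with respect to a common dominating measure \(\mu \), and \({\mathbb {P}}\) has density \(p_0\). This is the textbook Jensen argument rather than one of the paper's displayed equations.

```latex
\begin{aligned}
{\mathbb P}\log\frac{p^*}{p_0}
  &\le \log {\mathbb P}\frac{p^*}{p_0}
     && \text{(Jensen's inequality, since $\log$ is concave)} \\
  &= \log \int_{\{p_0>0\}} p^*\,\mathrm{d}\mu
   \;\le\; \log 1 \;=\; 0 .
\end{aligned}
```

Because \(\log \) is strictly concave, equality throughout forces \(p^*/p_0\) to be constant with probability 1, and the constant must equal 1 since both are densities. Hence an inequality of the form \({\mathbb {P}}\log (p^*/p_0)\ge 0\) can hold only if \(p^*=p_0\) with probability 1.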
Proof of Theorem 4.2
Let \(\beta =(\beta _1^T,\beta _2)^T\) and \(x=(z^T,u)^T\). The score function for \(\beta \) based on the log-likelihood \(l(\beta ,\Lambda |u)\) is
where
The score function for \(\beta \) based on the weighted log-likelihood \(l^w(\beta ,\Lambda |u)\) is given by
To obtain the score operator for \(\Lambda \), we consider a parametric submodel of \(\Lambda \) defined by \(d\Lambda _{\epsilon ,h}=(1+\epsilon h)d\Lambda \) for \(h\in L_2([0,\tau _0])\). The score function along this submodel based on the log-likelihood \(l(\beta ,\Lambda |u)\) is
The score function along this submodel based on the weighted log-likelihood \(l^w(\beta ,\Lambda |u)\) is
By the definition of \(({\hat{\beta }},{\hat{\Lambda }})\), we have \({\mathbb {P}}_n\{l_\beta ^w({\hat{\beta }},{\hat{\Lambda }}|{\hat{u}})\}=0\) and \({\mathbb {P}}_n\{l_\Lambda ^w({\hat{\beta }},{\hat{\Lambda }}|{\hat{u}})(h)\}=0\). Also, \({\mathbb {P}}\{l_\beta ^w(\beta _0,\Lambda _0|u)\}={\mathbb {P}}\{l_\beta (\beta _0,\Lambda _0|u)\}=0\) and \({\mathbb {P}}\{l_\Lambda ^w(\beta _0,\Lambda _0|u)(h)\}={\mathbb {P}}\{l_\Lambda (\beta _0,\Lambda _0|u)(h)\}=0\). Therefore,
and
We first consider \({\mathbb {P}}_n\{{l}_\beta ^w(\beta ,\Lambda |{\hat{u}})\}-{\mathbb {P}}\{l_\beta (\beta ,\Lambda |u)\}\) and \({\mathbb {P}}_n\{{l}_\Lambda ^w(\beta ,\Lambda |{\hat{u}})(h)\}-{\mathbb {P}}\{l_\Lambda (\beta ,\Lambda |u)(h)\}\) for fixed \((\beta ,\Lambda )\). Define the functions \(H(t)=E[\exp (u) I(\tau \ge t)]\), \(R(t)=H(t)\Lambda _{0h}(t)\), \(Q(t)=\int _0^t H(s)\textrm{d}\Lambda _{0h}(s)\), and for \(i=1,\ldots ,n\),
In addition, for \(i=1,\ldots ,n\), define
where \({\tilde{z}}_i=(1,z_i^T)^T\), \(\gamma =(\log \{E[\exp (u)]\},\alpha ^T)^T\), \({\tilde{w}}_i\) is the weight given in the estimating equations for \(\alpha \), and \({\mathcal {P}}(\cdot )\) denotes the joint probability measure of \(({\tilde{w}},{\tilde{z}},K,\tau )\). From Wang et al. [29], we have \({\hat{\Lambda }}_h(t)-\Lambda _{0h}(t)=n^{-1}\sum _{i=1}^n \Lambda _{0h}(t)b_i(t)+o_p(n^{-1/2})\) for \(\inf \{s:\Lambda _{0h}(s)>0\}<t<\tau _0\) and \({\hat{\alpha }}-\alpha _0=n^{-1}\sum _{i=1}^n f_i(\alpha _0)+o_p(n^{-1/2})\), where \(f_i(\alpha )=E[-\partial e_1/\partial \gamma ]^{-1}e_i\) without the first entry. Define the function \(u(\alpha ,\Lambda _h;\tau ,K,z)=\log \{K/[\Lambda _h(\tau )\exp (\alpha ^T z)]\}\). Then \({\hat{u}}=u({\hat{\alpha }},{\hat{\Lambda }}_h;\tau ,K,z)\). Furthermore, define
and
Then, we have
The \(c_{\beta i}(\beta ,\Lambda )\)’s are independent random variables because \(c_{\beta i}(\beta ,\Lambda )\) depends only on the observed data from the ith subject. It follows from the law of large numbers that for fixed \((\beta ,\Lambda )\), \({\mathbb {P}}_n\{{l}_\beta ^w(\beta ,\Lambda |{\hat{u}})\}-{\mathbb {P}}\{l_\beta (\beta ,\Lambda |u)\}\rightarrow 0\) almost surely as \(n\rightarrow \infty \). Furthermore, by the central limit theorem, \(n^{1/2}[{\mathbb {P}}_n\{{l}_\beta ^w(\beta ,\Lambda |{\hat{u}})\}-{\mathbb {P}}\{l_\beta (\beta ,\Lambda |u)\}]\) converges in distribution to a zero-mean normal random vector. Similarly, we can derive the asymptotic properties of \({\mathbb {P}}_n\{{l}_\Lambda ^w(\beta ,\Lambda |{\hat{u}})(h)\}-{\mathbb {P}}\{l_\Lambda (\beta ,\Lambda |u)(h)\}\). In particular, define
and
Then, we have
The \(c_{\Lambda i}(\beta ,\Lambda )(h)\)’s are independent random variables because \(c_{\Lambda i}(\beta ,\Lambda )(h)\) depends only on the observed data from the ith subject. By the law of large numbers, \({\mathbb {P}}_n\{{l}_\Lambda ^w(\beta ,\Lambda |{\hat{u}})(h)\}-{\mathbb {P}}\{l_\Lambda (\beta ,\Lambda |u)(h)\}\rightarrow 0\) almost surely as \(n\rightarrow \infty \), for fixed \((\beta ,\Lambda )\). By the central limit theorem, \(n^{1/2}[{\mathbb {P}}_n\{{l}_\Lambda ^w(\beta ,\Lambda |{\hat{u}})(h)\}-{\mathbb {P}}\{l_\Lambda (\beta ,\Lambda |u)(h)\}]\) converges in distribution to a zero-mean normal random vector.
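The plug-in quantity \({\hat{u}}=u({\hat{\alpha }},{\hat{\Lambda }}_h;\tau ,K,z)\) that appears throughout this proof is a direct transformation of the first-step estimates. A minimal sketch of that transformation follows, assuming the first-step estimates are already available as plain numbers or arrays; the function name `latent_effect` and its argument names are hypothetical, and the requirement \(K\ge 1\) reflects the restriction \(1\le K\le k_0\) used in the proofs.

```python
import numpy as np


def latent_effect(K, Lambda_h_tau, alpha, z):
    """Evaluate u = log{ K / [Lambda_h(tau) * exp(alpha^T z)] }.

    K            : number of observation times per subject (must be >= 1)
    Lambda_h_tau : value of Lambda_h at the follow-up time tau (must be > 0)
    alpha        : coefficient vector of length p
    z            : covariates, shape (p,) for one subject or (n, p) for n subjects
    """
    K = np.asarray(K, dtype=float)
    Lambda_h_tau = np.asarray(Lambda_h_tau, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    z = np.asarray(z, dtype=float)

    if np.any(K < 1) or np.any(Lambda_h_tau <= 0):
        raise ValueError("u is only defined for K >= 1 and Lambda_h(tau) > 0")

    # log{K / [Lambda_h(tau) exp(alpha^T z)]} = log K - log Lambda_h(tau) - alpha^T z
    return np.log(K) - np.log(Lambda_h_tau) - z @ alpha


if __name__ == "__main__":
    # Two hypothetical subjects with one covariate each.
    print(latent_effect(K=[1, 3], Lambda_h_tau=[0.2, 0.6],
                        alpha=[0.5], z=[[0.0], [2.0]]))
```

Writing the logarithm as a difference avoids forming the ratio explicitly, which keeps the computation stable when \(\Lambda _h(\tau )\) is small.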
On the other hand, arguing as in the proof of Theorem 2 of Zeng et al. [33], we can show that
where \(l_\beta =l_\beta (\beta _0,\Lambda _0|u)\), \(l_\Lambda (h^*)=l_\Lambda (\beta _0,\Lambda _0|u)(h^*)\), and \(h^*\) is the least favourable direction, a \((p+1)\)-vector with components in \(L_2([0,\tau _0])\), that solves the normal equation \(l_\Lambda ^*l_\Lambda (h^*)=l_\Lambda ^*l_\beta \) with \(l_\Lambda ^*\) being the adjoint operator of \(l_\Lambda \). The existence of \(h^*\) can be established as in Zeng et al. [33]. From Eqs. (8.3)–(8.5), the difference between (8.1) and (8.2) yields
The left-hand side of (8.6) can be written as \({\mathbb {G}}_n\{c_{\beta }({\hat{\beta }},{\hat{\Lambda }})-c_{\Lambda }({\hat{\beta }},{\hat{\Lambda }})(h^*)\}+o_p(1)\). As argued in Zeng et al. [33], we can show that \(h^*(t)\) is continuously differentiable on \([0,\tau _0]\), and further we are able to prove that \(c_{\beta }({\hat{\beta }},{\hat{\Lambda }})-c_{\Lambda }({\hat{\beta }},{\hat{\Lambda }})(h^*)\) belongs to a Donsker class and converges in the \(L_2({\mathbb {P}})\)-norm to \(c_{\beta }-c_{\Lambda }(h^*)\), where \(c_{\beta }\) and \(c_{\Lambda }(h^*)\) are evaluated at \((\beta _0,\Lambda _0)\). In addition, it is easy to show by contradiction that the matrix \(E[\{l_\beta -l_\Lambda (h^*)\}\{l_\beta -l_\Lambda (h^*)\}^T]\) is invertible. Therefore, (8.6) entails \(n^{1/2}({{\hat{\beta }}}-\beta _0)=O_p(1)\) and yields
This implies that \(n^{1/2}({{\hat{\beta }}}-\beta _0)\) converges in distribution to a zero-mean normal random vector. \(\square \)
About this article
Cite this article
Du, M., Zhou, Q. Analysis of Informatively Interval-Censored Case–Cohort Studies with the Application to HIV Vaccine Trials. Commun. Math. Stat. (2023). https://doi.org/10.1007/s40304-022-00322-6
DOI: https://doi.org/10.1007/s40304-022-00322-6