Abstract
In longitudinal studies, it is of interest to investigate how repeatedly measured markers are associated with time to an event. Joint models have received increasing attention for analyzing such complex longitudinal–survival data with multiple data features, but most of them are mean regression-based models. This paper formulates quantile regression (QR)-based joint models in general forms that account for left-censoring due to the limit of detection, covariates with measurement errors, and skewness. The joint models consist of three components: (i) a QR-based nonlinear mixed-effects Tobit model using the asymmetric Laplace distribution for the response dynamic process; (ii) a nonparametric linear mixed-effects model with a skew-normal distribution for the mismeasured covariate; and (iii) a Cox proportional hazards model for the event time. To estimate the model parameters simultaneously, we propose a Bayesian method that jointly models the three components, which are linked through the random effects. We apply the proposed modeling procedure to analyze the Multicenter AIDS Cohort Study data, and assess the performance of the proposed models and method through simulation studies. The findings suggest that our QR-based joint models may provide a comprehensive understanding of heterogeneous outcome trajectories at different quantiles, and more reliable and robust results when the data exhibit these features.
References
Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. J Multivar Anal 96(1):93–116
Arellano-Valle R, Bolfarine H, Lachos V (2007) Bayesian inference for skew-normal linear mixed models. J Appl Stat 34(6):663–682
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc 61(3):579–602
Brown ER (2009) Assessing the association between trends in a biomarker and risk of event with an application in pediatric HIV/AIDS. Ann Appl Stat 3(3):1163
Brown ER, Ibrahim JG (2003) A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics 59(2):221–228
Brown ER, Ibrahim JG, DeGruttola V (2005) A flexible b-spline model for multiple longitudinal biomarkers and survival. Biometrics 61(1):64–73
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. CRC Press, Boca Raton
Chen Q, May RC, Ibrahim JG, Chu H, Cole SR (2014) Joint modeling of longitudinal and survival data with missing and left-censored time-varying covariates. Stat Med 33(26):4560–4576
Clayton DG (1991) A Monte Carlo method for Bayesian inference in frailty models. Biometrics 47(2):467–485
Dagne GA, Huang Y (2011) Mixed-effects Tobit joint models for longitudinal data with skewness, detection limits, and measurement errors. J Probab Stat 2012:1–19
Dagne G, Huang Y (2012) Bayesian inference for a nonlinear mixed-effects Tobit model with multivariate skew-t distributions: application to AIDS studies. Int J Biostat 8(1)
Davidian M, Giltinan DM (1995) Nonlinear models for repeated measurement data, vol 62. CRC Press, Boca Raton
Davino C, Furno M, Vistocco D (2013) Quantile regression: theory and applications. Wiley, Hoboken
Elashoff RM, Li G, Li N (2008) A joint model for longitudinal measurements and survival data in the presence of multiple failure types. Biometrics 64(3):762–771
Farcomeni A (2012) Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat Comput 22(1):141–152
Farcomeni A, Viviani S (2015) Longitudinal quantile regression in the presence of informative dropout through longitudinal–survival joint modeling. Stat Med 34(7):1199–1213
Ganjali M, Baghfalaki T (2015) A copula approach to joint modeling of longitudinal measurements and survival times using Monte Carlo expectation-maximization with application to AIDS studies. J Biopharm Stat 25(5):1077–1099
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014) Bayesian data analysis, vol 2. CRC Press, Boca Raton
Genton MG (2004) Skew-elliptical distributions and their applications: a journey beyond normality. CRC Press, Boca Raton
Geraci M, Bottai M (2007) Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8(1):140–154
He X, Fu B, Fung WK (2003) Median regression for longitudinal data. Stat Med 22(23):3655–3669
Henderson R, Diggle P, Dobson A (2000) Joint modelling of longitudinal measurements and event time data. Biostatistics 1(4):465–480
Hu W, Li G, Li N (2009) A Bayesian approach to joint analysis of longitudinal measurements and competing risks failure time data. Stat Med 28(11):1601–1619
Huang Y (2016) Quantile regression-based Bayesian semiparametric mixed-effects models for longitudinal data with non-normal, missing and mismeasured covariate. J Stat Comput Simul 86(6):1183–1202
Huang Y, Chen J (2016) Bayesian quantile regression-based nonlinear mixed-effects joint models for time-to-event and longitudinal data with multiple features. Stat Med 35(30):5666–5685
Huang Y, Dagne G (2011) A Bayesian approach to joint mixed-effects models with a skew-normal distribution and measurement errors in covariates. Biometrics 67(1):260–269
Huang Y, Liu D, Wu H (2006) Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics 62(2):413–423
Huang Y, Dagne G, Wu L (2011) Bayesian inference on joint models of HIV dynamics for time-to-event and longitudinal data with skewness and covariate measurement errors. Stat Med 30(24):2930–2946
Jara A, Quintana F, San Martín E (2008) Linear mixed models with skew-elliptical distributions: a Bayesian approach. Comput Stat Data Anal 52(11):5033–5045
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, vol 2, 2nd edn. Wiley, New York
Kaslow RA, Ostrow DG, Detels R, Phair JP, Polk BF, Rinaldo CR (1987) The Multicenter AIDS Cohort Study: rationale, organization, and selected characteristics of the participants. Am J Epidemiol 126(2):310–318
Kim MO, Yang Y (2012) Semiparametric approach to a random effects quantile regression model. J Am Stat Assoc 106(496):1405–1417
Kobayashi G, Kozumi H (2012) Bayesian analysis of quantile regression for censored dynamic panel data. Comput Stat 27(2):359–380
Koenker R (2004) Quantile regression for longitudinal data. J Multivar Anal 91(1):74–89
Koenker R (2005) Quantile regression, vol 38. Cambridge University Press, Cambridge
Koenker R, Bassett G Jr (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, Machado JA (1999) Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94(448):1296–1310
Kotz S, Kozubowski TJ, Podgórski K (2002) Maximum likelihood estimation of asymmetric Laplace parameters. Ann Inst Stat Math 54(4):816–826
Kotz S, Kozubowski TJ, Podgórski K (2001) Asymmetric multivariate Laplace distribution. In: The Laplace distribution and generalizations. Springer, New York, pp 239–272
Kozumi H, Kobayashi G (2011) Gibbs sampling methods for Bayesian quantile regression. J Stat Comput Simul 81(11):1565–1578
Lipsitz SR, Fitzmaurice GM, Molenberghs G, Zhao LP (1997) Quantile regression methods for longitudinal data with drop-outs: application to CD4 cell counts of patients infected with the human immunodeficiency virus. J R Stat Soc 46(4):463–476
Liu Y, Bottai M (2009) Mixed-effects models for conditional quantiles with longitudinal data. Int J Biostat 5(1)
Liu W, Wu L (2007) Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics 63(2):342–350
Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS-a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 10(4):325–337
Luo Y, Lian H, Tian M (2012) Bayesian quantile regression for longitudinal data models. J Stat Comput Simul 82(11):1635–1649
Perelson AS, Essunger P, Cao Y, Vesanen M, Hurley A, Saksela K, Markowitz M, Ho DD (1997) Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387:188–191
Reich BJ, Fuentes M, Dunson DB (2012) Bayesian spatial quantile regression. J Am Stat Assoc 106:6–22
Rizopoulos D (2010) JM: an R package for the joint modelling of longitudinal and time-to-event data. J Stat Softw 35(9):1–33
Rizopoulos D (2011) Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 67(3):819–829
Rizopoulos D (2012) Joint models for longitudinal and time-to-event data: with applications in R. CRC Press, Boca Raton
Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. Can J Stat 31(2):129–150
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc 64(4):583–639
Tang AM, Tang NS, Zhu H (2017) Influence analysis for skew-normal semiparametric joint models of multivariate longitudinal and multivariate survival data. Stat Med 36(9):1476–1490
Tian Y, Tian M (2015) Bayesian joint quantile regression for mixed effects models with censoring and errors in covariates. Comput Stat 31(3):1031–1057
Tsiatis AA, Davidian M (2004) Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica 14:809–834
Wang HJ, Fygenson M (2009) Inference for censored quantile regression models in longitudinal studies. Ann Stat 37(2):756–781
Wang Y, Taylor JMG (2001) Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J Am Stat Assoc 96(455):895–905
Wu L (2002) A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. J Am Stat Assoc 97(460):955–964
Wu H, Ding AA (1999) Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55(2):410–418
Wu H, Zhang JT (2006) Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches, vol 515. Wiley, Hoboken
Wu L, Liu W, Hu X (2010) Joint inference on HIV viral dynamics and immune suppression in presence of measurement errors. Biometrics 66(2):327–335
Yi G, Liu W, Wu L (2011) Simultaneous inference and bias analysis for longitudinal data with covariate measurement error and missing responses. Biometrics 67(1):67–75
Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Probab Lett 54(4):437–447
Yu K, Stander J (2007) Bayesian analysis of a Tobit quantile regression model. J Econom 137(1):260–276
Yu K, Zhang J (2005) A three-parameter asymmetric Laplace distribution and its extension. Commun Stat Theory Methods 34(9–10):1867–1879
Yu K, Lu Z, Stander J (2003) Quantile regression: applications and current research areas. J R Stat Soc 52(3):331–350
Yuan Y, Yin G (2010) Bayesian quantile regression for longitudinal studies with nonignorable missing data. Biometrics 66(1):105–114
Ethics declarations
Conflicts of interest
The authors have declared no conflict of interest.
Appendices
Appendix A: Skew-normal distribution and asymmetric Laplace distribution
A.1: Skew-normal distribution
Different versions of multivariate skew distributions have been proposed and used in the literature (Arellano-Valle et al. 2007; Arellano-Valle and Genton 2005; Azzalini and Capitanio 1999; Jara et al. 2008; Sahu et al. 2003). A class of distributions obtained by introducing skewness into multivariate elliptical distributions, referred to as skew-elliptical (SE) distributions, was developed in the literature (Genton 2004; Sahu et al. 2003). This class, which is obtained by transformation and conditioning, contains many standard families, including the multivariate skew-normal (SN) distribution as a special case. A k-dimensional random vector \(\varvec{Y}\) follows a k-variate SE distribution if its probability density function (pdf), written in terms of the elliptical density \(f(\cdot \mid \varvec{\mu },\varvec{A}; m^{(k)}_{\nu })\) with location \(\varvec{\mu }\) and scale matrix \(\varvec{A}\), is given by
\[
f(\varvec{y}\mid \varvec{\mu },\varvec{\varSigma },\varvec{\varGamma }; m^{(k)}_{\nu }) = 2^k f(\varvec{y}\mid \varvec{\mu },\varvec{A}; m^{(k)}_{\nu })\,P(\varvec{V}>\varvec{0}),
\tag{A.1}
\]
where \(\varvec{A}=\varvec{\varSigma }+\varvec{\varGamma }^2\), \(\varvec{\mu }\) is a location parameter vector, \(\varvec{\varSigma }\) is a \(k \times k\) positive definite (diagonal) covariance matrix, \(\varvec{\varGamma }=\text {diag}(\delta _1, \delta _2,\ldots , \delta _k)\) is a \(k \times k\) skewness matrix with the skewness parameter vector \(\varvec{\delta }=(\delta _1,\delta _2,\ldots ,\delta _k)^T\), and \(\varvec{V}\) follows the elliptical distribution \(El\left( \varvec{\varGamma }\varvec{A}^{-1}(\varvec{y}-\varvec{\mu }), \varvec{I}_{k}-\varvec{\varGamma }\varvec{A}^{-1}\varvec{\varGamma }; m^{(k)}_{\nu }\right) \). The density generator function is \(m^{(k)}_{\nu }(u)=\frac{\varGamma (k/2)}{\pi ^{k/2}}\frac{m_{\nu }(u)}{\int _0^{\infty }r^{k/2-1}m_{\nu }(r)dr}\), where \(\varGamma (\cdot )\) with a scalar argument denotes the gamma function and \(m_{\nu }(u)\) is a function such that \(\int _0^{\infty }r^{k/2-1}m_{\nu }(r)dr\) exists. The function \(m_{\nu }(u)\) provides the kernel of the original elliptical density and may depend on the parameter \(\nu \). This SE distribution is denoted by \(SE(\varvec{\mu },\varvec{\varSigma },\varvec{\varGamma };m^{(k)})\). One example of \(m_{\nu }(u)\), leading to an important special case used throughout the paper, is \(m_{\nu }(u)=\exp (-u/2)\). This choice leads to the multivariate SN distribution.
A normal distribution is the special case of an SN distribution in which the skewness parameter is zero. For completeness, this Appendix briefly summarizes the multivariate SN distribution introduced by Sahu et al. (2003), which is suitable for Bayesian inference because it is built using the conditioning method. For detailed discussions of the properties of the SN distribution, see Sahu et al. (2003). Let \(\varvec{\mu }\) be a location vector, \(\varvec{\varSigma }\) a \(k \times k\) positive definite (diagonal) covariance matrix and \(\varvec{\varGamma }=\text {diag}(\delta _1, \delta _2,\ldots , \delta _k)\) a \(k \times k\) diagonal skewness matrix.
A k-dimensional random vector \({\varvec{Y}}\) follows a k-variate SN distribution if its pdf is given by
\[
f(\varvec{y}\mid \varvec{\mu },\varvec{\varSigma },\varvec{\varGamma }) = 2^k |\varvec{A}|^{-1/2}\,\phi _k\{\varvec{A}^{-1/2}(\varvec{y}-\varvec{\mu })\}\,P(\varvec{V}>\varvec{0}),
\tag{A.2}
\]
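where \( {\varvec{V}} \sim N_k\{\varvec{\varGamma }{\varvec{A}}^{-1}({\varvec{y}}-\varvec{\mu }), {\varvec{I}}_k-\varvec{\varGamma } {\varvec{A}}^{-1}\varvec{\varGamma }\}\), and \(\phi _k(\cdot )\) is the pdf of \(N_k(\mathbf{0 },{\varvec{I}}_k)\). We denote the above distribution by \(SN_k (\varvec{\mu },\varvec{\varSigma },\varvec{\varGamma })\). An appealing feature of Eq. (A.2) is that it gives independent marginals when \(\varvec{\varSigma }=diag(\sigma ^2_1, \sigma ^2_2,\ldots , \sigma ^2_k)\). Writing \(\varvec{y}=(y_1,\ldots ,y_k)^T\) and \(\varvec{\mu }=(\mu _1,\ldots ,\mu _k)^T\), the pdf (A.2) then simplifies to
\[
f(\varvec{y}\mid \varvec{\mu },\varvec{\varSigma },\varvec{\varGamma }) = \prod _{i=1}^{k}\left[ \frac{2}{\sqrt{\sigma _i^2+\delta _i^2}}\,\phi \left( \frac{y_i-\mu _i}{\sqrt{\sigma _i^2+\delta _i^2}}\right) \varPhi \left( \frac{\delta _i}{\sigma _i}\,\frac{y_i-\mu _i}{\sqrt{\sigma _i^2+\delta _i^2}}\right) \right] ,
\tag{A.3}
\]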
where \(\phi (\cdot )\) and \(\varPhi (\cdot )\) are the pdf and cdf of the standard normal distribution, respectively. The mean and covariance matrix are given by
\[
E(\varvec{Y})=\varvec{\mu }+\sqrt{2/\pi }\,\varvec{\delta },\qquad cov(\varvec{Y})=\varvec{\varSigma }+(1-2/\pi )\varvec{\varGamma }^2.
\tag{A.4}
\]
Note that when \(\varvec{\delta }=\mathbf{0 }\), the SN distribution reduces to the usual normal distribution. To obtain a zero mean vector, the location parameter must be set to \(\varvec{\mu }=-\sqrt{2/\pi }\varvec{\delta }\).
According to Arellano-Valle et al. (2007), if \({\varvec{Y}}\) follows \(SN_k(\varvec{\mu },\varvec{\varSigma },\varvec{\varGamma })\), it admits the following convenient stochastic representation:
\[
\varvec{Y}=\varvec{\mu }+\varvec{\varGamma }|\varvec{X}_0|+\varvec{\varSigma }^{1/2}\varvec{X}_1,
\tag{A.5}
\]
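where \({\varvec{X}}_0\) and \({\varvec{X}}_1\) are two independent \(N_k(\mathbf{0 },{\varvec{I}}_k)\) random vectors. Let \({\varvec{w}}=|{\varvec{X}}_0|\); then, \({\varvec{w}}\) follows a k-dimensional standard normal distribution \(N_k(\mathbf{0 },{\varvec{I}}_k)\) truncated to the space \({\varvec{w}}>\mathbf{0 }\). Thus, a two-level hierarchical representation of (A.5) is given by
\[
\varvec{Y}\mid \varvec{w}\sim N_k(\varvec{\mu }+\varvec{\varGamma }\varvec{w},\varvec{\varSigma }),\qquad \varvec{w}\sim N_k(\mathbf{0 },\varvec{I}_k)\,I(\varvec{w}>\mathbf{0 }).
\tag{A.6}
\]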
Note that when \(\varvec{\varGamma }=\varvec{0}\), the hierarchical expression (A.6) presented for the SN distribution \(SN_k(\varvec{\mu },\varvec{\varSigma },\varvec{\varGamma })\) reduces to its counterpart for the normal distribution \(N_k(\varvec{\mu },\varvec{\varSigma })\).
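As an illustrative check that is not part of the original analysis, the stochastic and hierarchical representations of the SN distribution can be simulated directly. The Python sketch below treats the univariate case (k = 1) and assumes the standard form \(Y=\mu +\delta |X_0|+\sigma X_1\); the function name `sample_skew_normal`, the parameter values and the seed are hypothetical choices made only for this example. It compares the sample mean and variance with the values implied by the zero-mean location shift \(\mu =-\sqrt{2/\pi }\delta \) and the variance \(\sigma ^2+(1-2/\pi )\delta ^2\).

```python
import math
import random
import statistics


def sample_skew_normal(mu, sigma2, delta, rng):
    """Draw one univariate SN variate through the two-level hierarchy.

    Level 1: w = |X0| is a standard normal truncated to w > 0.
    Level 2: Y | w ~ N(mu + delta * w, sigma2).
    """
    w = abs(rng.gauss(0.0, 1.0))
    return rng.gauss(mu + delta * w, math.sqrt(sigma2))


rng = random.Random(2020)  # fixed seed, chosen arbitrarily for reproducibility
sigma2, delta = 1.0, 2.0   # illustrative values, not taken from the paper
mu = -math.sqrt(2.0 / math.pi) * delta  # location shift that centres Y at zero

draws = [sample_skew_normal(mu, sigma2, delta, rng) for _ in range(200_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)
theory_var = sigma2 + (1.0 - 2.0 / math.pi) * delta ** 2

print(f"sample mean     : {sample_mean:.3f} (theory 0)")
print(f"sample variance : {sample_var:.3f} (theory {theory_var:.3f})")
```

Setting `delta = 0.0` in this sketch removes the half-normal term, so the draws come from an ordinary normal distribution, in line with the remark above about \(\varvec{\varGamma }=\varvec{0}\).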
A.2: Asymmetric Laplace distribution
An asymmetric distribution referred to as the asymmetric Laplace distribution (ALD), which is closely related to the check function for quantile regression (QR), has been discussed in the literature (Geraci and Bottai 2007; Koenker and Machado 1999; Yu and Moyeed 2001; Yu and Zhang 2005). A random variable Y is said to follow an ALD if its probability density function (pdf) with parameters \(\mu ,\sigma \) and \(\tau \) is given by
\[
f(y\mid \mu ,\sigma ,\tau )=\frac{\tau (1-\tau )}{\sigma }\exp \left\{ -\rho _{\tau }\left( \frac{y-\mu }{\sigma }\right) \right\} ,
\tag{A.7}
\]
where \(\rho _{\tau }(u)=u(\tau -I(u<0))\) is the check function, \(I(\cdot )\) is the indicator function, \(0<\tau <1\) is the skewness parameter, \(\sigma >0\) is the scale parameter and \(-\infty<\mu <\infty \) is the location parameter. The range of y is \((-\infty ,~\infty )\). We denote the above distribution by ALD\((\mu ,\sigma ,\tau )\). It should be noted that the check function \(\rho _{\tau }(\cdot )\) assigns weight \(\tau \) or \(1-\tau \) to the observations greater or less than \(\mu \), respectively, and that \(Pr(y\le \mu )=\tau \). Therefore, the distribution splits at the location parameter \(\mu \) into two parts, one with probability \(\tau \) to the left, and one with probability \(1-\tau \) to the right. That is, ALD\((\mu ,\sigma ,\tau )\) is skewed to the left when \(\tau >1/2\), and skewed to the right when \(\tau <1/2\). When \(\tau =1/2\), ALD\((\mu ,\sigma ,\tau )\) reduces to the double exponential (or symmetric Laplace) distribution, which has the following pdf.
\[
f(y\mid \mu ,\sigma ,1/2)=\frac{1}{4\sigma }\exp \left\{ -\frac{|y-\mu |}{2\sigma }\right\}
\tag{A.8}
\]
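The split of probability mass at \(\mu \) can be verified numerically. The following Python sketch is an illustration only: it assumes the usual ALD density \(\tau (1-\tau )\sigma ^{-1}\exp \{-\rho _{\tau }((y-\mu )/\sigma )\}\), and the helper names (`check_loss`, `ald_pdf`, `integrate`), the parameter values and the integration window are hypothetical choices. It integrates the density on each side of \(\mu \) with a simple trapezoidal rule.

```python
import math


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_pdf(y, mu, sigma, tau):
    """ALD density, assuming the form tau*(1-tau)/sigma * exp(-rho_tau((y-mu)/sigma))."""
    return tau * (1.0 - tau) / sigma * math.exp(-check_loss((y - mu) / sigma, tau))


def integrate(f, a, b, n=20_000):
    """Composite trapezoidal rule on [a, b] with n equal panels."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + inner)


mu, sigma, tau = 1.0, 2.0, 0.25  # illustrative values, not taken from the paper


def pdf(y):
    return ald_pdf(y, mu, sigma, tau)


# Both tails decay exponentially, so a wide finite window captures
# essentially all of the probability mass on each side of mu.
mass_left = integrate(pdf, mu - 200.0, mu)
mass_right = integrate(pdf, mu, mu + 200.0)

print(f"mass to the left of mu : {mass_left:.4f} (tau = {tau})")
print(f"mass to the right of mu: {mass_right:.4f} (1 - tau = {1 - tau})")
```

The kink of the density at \(\mu \) is placed on the boundary of both integration windows, which keeps the trapezoidal rule accurate on each smooth piece.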
If \(Y \sim \) ALD\((\mu ,\sigma ,\tau )\), then \(Pr(y\le \mu )=\tau \) and \(Pr(y>\mu )=1-\tau \), which shows that the location parameter \(\mu \) is the \(\tau \)th quantile of the distribution. This important feature of the ALD has been widely adopted for quantile inference (Geraci and Bottai 2007; Yu and Moyeed 2001; Yu et al. 2003) and has made it more popular than other asymmetric Laplace distributions (Johnson et al. 1995; Kotz et al. 2002). See Yu and Zhang (2005) for further properties and generalizations of this distribution. It can be shown that the mean and variance of Y are given by
\[
E(Y)=\mu +\frac{\sigma (1-2\tau )}{\tau (1-\tau )},\qquad Var(Y)=\frac{\sigma ^2(1-2\tau +2\tau ^2)}{\tau ^2(1-\tau )^2}.
\tag{A.9}
\]
However, the ALD density is not smooth at \(\mu \), which makes its likelihood function difficult to maximize. Fortunately, as shown by Kotz et al. (2001) and Kozumi and Kobayashi (2011), the ALD has various mixture representations. To develop Bayesian sampling algorithms for the QR model, we utilize a hierarchical mixture of exponential and normal distributions (Kotz et al. 2001; Kozumi and Kobayashi 2011). If \(Y \sim \) ALD\((\mu ,\sigma ,\tau )\), then Y admits the following mixture representation.
\[
Y=\mu +\vartheta _1X_1+\sqrt{\vartheta _2\sigma X_1}\,X_2,
\tag{A.10}
\]
where \(X_1\) and \(X_2\) are mutually independent, \(X_1 \sim Exp(\frac{1}{\sigma })\) with mean \(\sigma \) and \(X_2\sim N(0,1)\), \(\vartheta _1=(1-2\tau )/[\tau (1-\tau )]\) and \(\vartheta _2=2/[\tau (1-\tau )]\). This representation expresses the ALD through a smooth conditional normal distribution and has been used extensively in recent studies (Kobayashi and Kozumi 2012; Kozumi and Kobayashi 2011; Reich et al. 2012). Thus, a two-level hierarchical representation of (A.10) is given by
\[
Y\mid X_1\sim N(\mu +\vartheta _1X_1,\ \vartheta _2\sigma X_1),\qquad X_1\sim Exp\left( \frac{1}{\sigma }\right) .
\tag{A.11}
\]
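The exponential-normal mixture described above lends itself to a simple simulation check. The Python sketch below is illustrative only and assumes the Kozumi and Kobayashi (2011) form \(Y=\mu +\vartheta _1X_1+\sqrt{\vartheta _2\sigma X_1}X_2\); the function name `sample_ald`, the parameter values and the seed are hypothetical. It draws \(X_1\) from an exponential distribution with mean \(\sigma \), draws Y from the conditional normal, and then compares the empirical proportion of draws below \(\mu \), the sample mean and the sample variance with the standard ALD values.

```python
import math
import random
import statistics


def sample_ald(mu, sigma, tau, rng):
    """Draw one ALD(mu, sigma, tau) variate via the exponential-normal mixture."""
    theta1 = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    theta2 = 2.0 / (tau * (1.0 - tau))
    x1 = rng.expovariate(1.0 / sigma)  # rate 1/sigma, i.e. mean sigma
    # Conditional on x1, Y is normal with mean mu + theta1*x1 and variance theta2*sigma*x1.
    return rng.gauss(mu + theta1 * x1, math.sqrt(theta2 * sigma * x1))


rng = random.Random(339)         # fixed seed, chosen arbitrarily
mu, sigma, tau = 1.0, 2.0, 0.25  # illustrative values, not taken from the paper
draws = [sample_ald(mu, sigma, tau, rng) for _ in range(200_000)]

frac_below = sum(y <= mu for y in draws) / len(draws)
sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)

theory_mean = mu + sigma * (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
theory_var = sigma ** 2 * (1.0 - 2.0 * tau + 2.0 * tau ** 2) / (tau ** 2 * (1.0 - tau) ** 2)

print(f"Pr(Y <= mu): {frac_below:.4f} (theory {tau})")
print(f"mean       : {sample_mean:.3f} (theory {theory_mean:.3f})")
print(f"variance   : {sample_var:.2f} (theory {theory_var:.2f})")
```

In a Gibbs sampler the same hierarchy is typically used in the other direction: the latent \(X_1\) is sampled given the data, which leaves a conditionally normal likelihood for the regression parameters.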
A.3: Relationship between nonlinear quantile regression and ALD
Let \(y_i\) and \(\varvec{x}_i\) denote the outcome of interest and the corresponding covariate vector for subject i (\(i=1,\ldots ,n\)), where the \(y_i\) are independent scalar observations of a continuous random variable with cumulative distribution function (cdf) \(F_{y_i}(\cdot )\). The \(\tau \)th nonlinear QR model for the response \(y_i\) given \(\varvec{x}_i\) takes the form
\[
Q_{y_i}(\tau \mid \varvec{x}_i)=g(\varvec{x}_i,\varvec{\beta }),
\tag{A.12}
\]
where \(Q_{y_i}(\cdot )\equiv F^{-1}_{y_i}(\cdot ) \) is the inverse of the cdf of \(y_i\) given \(\varvec{x}_i\) evaluated at \(\tau \) with \(0<\tau <1\), and \(g(\cdot )\) is a known nonlinear function. The nonlinear regression coefficient vector \(\varvec{\beta }\) is estimated by minimizing
\[
\sum _{i=1}^{n}\rho _{\tau }\left( y_i-g(\varvec{x}_i,\varvec{\beta })\right) ,
\tag{A.13}
\]
where \(\rho _{\tau }(\cdot )\) is the check function defined by \(\rho _{\tau }(u)=u({\tau -I(u<0)})\) and \(I(\cdot )\) denotes the indicator function. To highlight the dependence on \(\tau \), the parameter vector \(\varvec{\beta }\) should be indexed by \(\tau \) (i.e., \(\varvec{\beta }(\tau )\)). For the sake of simplicity, however, we omit this notation in the remainder of the paper. The check function is closely related to the ALD; see Koenker and Machado (1999), Yu and Moyeed (2001) and Yu and Stander (2007) for details. The density function of an ALD, denoted by ALD(\(\mu , \sigma , \tau \)), is briefly discussed in Appendix A.2. Treating \(\sigma \) as a nuisance parameter, the log-likelihood of independent \(y_i\) from an ALD(\(\mu , \sigma ,\tau \)) with \(\mu =g(\varvec{x}_i,\varvec{\beta })\) depends on \(\varvec{\beta }\) only through \(-\sigma ^{-1}\sum _{i=1}^{n}\rho _{\tau }(y_i-g(\varvec{x}_i,\varvec{\beta }))\). Hence the minimization of Eq. (A.13) with respect to \(\varvec{\beta }\) is exactly equivalent to the maximization of this likelihood function.
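The equivalence between check-loss minimization and ALD likelihood maximization can be illustrated with a small numerical experiment. The Python sketch below is not the estimation procedure of the paper: the one-parameter decay curve `g`, the simulated data, the fixed value of \(\sigma \) and the grid search are all hypothetical choices for illustration. It evaluates both objectives on a common grid of \(\beta \) values and confirms that they select the same grid point.

```python
import math
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def g(x, beta):
    """Hypothetical nonlinear regression curve (exponential decay), for illustration only."""
    return math.exp(-beta * x)


def qr_objective(beta, xs, ys, tau):
    """Sum of check losses over all observations."""
    return sum(check_loss(y - g(x, beta), tau) for x, y in zip(xs, ys))


def ald_neg_loglik(beta, xs, ys, tau, sigma):
    """Negative ALD log-likelihood with location g(x, beta) and fixed scale sigma."""
    n = len(ys)
    return n * math.log(sigma / (tau * (1.0 - tau))) + qr_objective(beta, xs, ys, tau) / sigma


rng = random.Random(7)  # fixed seed, chosen arbitrarily
beta_true = 0.5         # value used only to simulate the toy data
xs = [i / 10.0 for i in range(1, 101)]
ys = [g(x, beta_true) + rng.gauss(0.0, 0.05) for x in xs]

tau, sigma = 0.5, 0.3   # median regression; sigma held fixed as a nuisance parameter
grid = [i / 1000.0 for i in range(100, 1001)]  # candidate beta values in [0.1, 1.0]

beta_qr = min(grid, key=lambda b: qr_objective(b, xs, ys, tau))
beta_ml = min(grid, key=lambda b: ald_neg_loglik(b, xs, ys, tau, sigma))

print(f"check-loss minimizer   : {beta_qr:.3f}")
print(f"ALD likelihood maximizer: {beta_ml:.3f}")
```

Because the negative log-likelihood is an increasing affine function of the check-loss sum for any fixed \(\sigma >0\), the agreement does not depend on the particular value of `sigma` used here.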
The relationship between the check function and the ALD can be used to reformulate the QR method in the likelihood framework. By utilizing this property, a large number of QR-based statistical models and associated analysis methods have been investigated in the literature for independent data. For example, Koenker and Machado (1999) proposed a likelihood-based goodness-of-fit test for QR, Yu and Moyeed (2001) developed Bayesian QR, and Yu and Stander (2007) and Kozumi and Kobayashi (2011) studied Bayesian estimation procedures for the Tobit QR model with censored data. More recently, QR-based linear mixed-effects models have been considered via different methods for longitudinal data (Farcomeni 2012; Geraci and Bottai 2007; Kim and Yang 2012; Koenker 2004; Lipsitz et al. 1997; Liu and Bottai 2009; Wang and Fygenson 2009; Yuan and Yin 2010).
Appendix B: R and WinBUGS program codes for Model SN
Cite this article
Zhang, H., Huang, Y. Quantile regression-based Bayesian joint modeling analysis of longitudinal–survival data, with application to an AIDS cohort study. Lifetime Data Anal 26, 339–368 (2020). https://doi.org/10.1007/s10985-019-09478-w