Abstract
Longitudinal studies record observations at scheduled visits or time points for individuals until a predetermined event occurs, such as satisfactory tumor shrinkage in cancer studies. Dropout in longitudinal studies leads to incomplete data, which substantially increases the risk of bias. We propose an amended joint shared random effects model (SREM) for mixed continuous and binary longitudinal measurements and a time-to-event (TTE) outcome that accommodates missing covariates. In the proposed model, the mixed continuous and binary longitudinal outcomes are handled with a conditional model: a linear mixed-effects model is used for the continuous longitudinal outcome and, given the continuous outcome, a logistic mixed-effects model is used for the binary longitudinal outcome. These models share common random effects with the model for the event time outcome. Estimation follows a Bayesian approach using Markov chain Monte Carlo (MCMC). The proposed joint model is applied to study the progression of prostate cancer (PCa), with generalized linear mixed-effects models for the time-varying covariates, whose missingness is assumed ignorable. The association of prostate-specific antigen (PSA) with alkaline phosphatase (ALP) and tumor status has been studied with mixed conclusions.
Article Highlights
1. The PSA and ALP biomarkers support precise assessment of PCa progression after treatment, giving clinicians comprehensive and accurate dynamic monitoring.
2. Accounting for missing Platelets and Bilirubin observations during intermittent follow-up improves the accuracy of the analysis and supports valid conclusions about PCa.
3. Statistical models that use prior information to update current knowledge are essential for extracting insights from disease data. Bayesian methods make this possible and give clinicians a powerful tool for informed decision-making.
1 Introduction
In medical sciences, two types of data are often observed simultaneously: repeated measurements and event time data. Repeated measurements are taken on the same subject at scheduled visits, and data arising from such a study design are known as longitudinal data. Event time data, also called survival data, record the time until a pre-specified event takes place. The repeated measurements are recorded until dropout or censoring occurs.
Separate modelling techniques are available for longitudinal and event time data [1, 2], but they do not consider the association that may exist among different outcomes recorded on the same individuals until dropout, censoring, or the event of interest occurs. Simultaneous modelling is recommended in this setting because it yields valid estimates by accounting for individual variability [3,4,5]. Joint modelling links the sub-models for all responses together [6,7,8].
Many studies observe more than one longitudinal outcome on the same individuals until an event of interest takes place, which leads to a multivariate setting [9, 10]. Li et al. [11] discussed binary, continuous, nominal, and ordinal longitudinal outcomes using a mixed joint modelling strategy; event time outcomes may also be observed repeatedly [12]. Such outcomes can be analyzed by combining two or more models, such as the Weibull-gamma-normal, probit-beta-normal, and Poisson-gamma-normal models.
Missing data in repeated measurements pose challenges for the analysis and for subsequent inference [13], because the relationship between outcomes and predictors may depend on the reason for missingness [14]. Missingness in longitudinal studies may arise for a variety of reasons, such as participants losing interest in follow-up, dropout or death of participants, administrative failure to collect data, and censoring [15]. When data are missing, the missingness should be taken into account in the modelling process to produce valid statistical inferences. This requires the analyst to have an idea of the process that generates the missing data, which is known as the missing data mechanism.
Rubin [16] described the different types of missing data mechanisms in detail through assumptions about the observed and missing data. Missingness is missing completely at random (MCAR) if the probability that a response is missing depends on neither the missing nor the observed data. It is missing at random (MAR) if this probability depends on the observed data but not on the unobserved data. It is not missing at random (NMAR) if the probability depends on the unobserved data, possibly in addition to the observed data. The MCAR and MAR mechanisms are ignorable under certain regularity conditions, in particular that the parameters governing the missingness process are distinct from those of the data model, whereas the NMAR mechanism is non-ignorable [17]. The ignorability assumption therefore requires the missingness-generating process to be MAR or MCAR. Philipson et al. [18] reviewed missing data handling techniques and briefly compared the methods.
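To make the distinction between mechanisms concrete, the Python sketch below (not part of the original analysis; every parameter value is an arbitrary illustration) simulates an always-observed covariate x and an outcome y = 2 + 1.5x + noise, then deletes y under each mechanism. A complete-case regression of y on x recovers the slope 1.5 under MCAR and MAR, because missingness depends at most on the observed x, but not under NMAR. Under MAR the complete-case mean of y is nevertheless biased, which is why the missingness must still be accounted for in the model.

```python
import math
import random


def complete_case_summary(mechanism, n=100_000, seed=7):
    """Simulate y = 2 + 1.5 x + e, delete y under a missingness mechanism,
    and return (complete-case mean of y, complete-case OLS slope of y on x)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                  # covariate, always observed
        y = 2.0 + 1.5 * x + rng.gauss(0.0, 1.0)  # outcome, possibly missing
        if mechanism == "MCAR":                  # depends on nothing
            p_miss = 0.6
        elif mechanism == "MAR":                 # depends only on the observed x
            p_miss = 1.0 / (1.0 + math.exp(-2.0 * x))
        elif mechanism == "NMAR":                # depends on the value of y itself
            p_miss = 1.0 / (1.0 + math.exp(-2.0 * (y - 2.0)))
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_miss:               # y is observed for this unit
            xs.append(x)
            ys.append(y)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return mean_y, sxy / sxx


for mech in ("MCAR", "MAR", "NMAR"):
    mean_y, slope = complete_case_summary(mech)
    print(f"{mech:5s} complete-case mean = {mean_y:.2f}, slope = {slope:.2f}")
```

With a true mean of 2.0 and a true slope of 1.5, both the MAR and NMAR complete-case means fall well below 2.0, while only the NMAR slope is visibly attenuated.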
Joint modelling of multivariate longitudinal and event time data with missingness is rarely discussed in the literature. Njaji et al. [19], however, developed an extended shared parameter joint model framework under MAR, building on the approach of Creemers et al. [20]. In this paper, we amend the class of joint SREMs proposed by Njaji et al. [19] to account for the association among continuous longitudinal, binary longitudinal, and event time responses while addressing MAR. To the best of our knowledge, no such model has been presented to date.
The main purpose of this article is to develop an amended SREM for jointly modelling mixed continuous-binary longitudinal data and one event time outcome with missing covariates. The proposed joint SREM uses mixed-effects models for the longitudinal responses and a proportional hazards model for the event time response, with random effects shared between the sub-models. The joint SREM assumes that the measurement and censoring processes are independent conditional on the random effects. We adopt a Bayesian approach and obtain parameter estimates with MCMC methods: each parameter is drawn iteratively from its full conditional posterior distribution with the Gibbs sampler, using R and OpenBUGS [21].
This research is motivated by a dataset collected from PCa patients at one of the most renowned public hospitals in Pakistan. Patients underwent various treatments and were followed until tumor shrinkage or censoring, with \({\text{PSA}}\) and \({\text{ALP}}\) measured during follow-up. Early-stage prostate cancer may not cause significant symptoms, whereas advanced disease can often be recognized from its symptoms. If PCa is suspected on the basis of symptoms, certain tests are required to confirm the diagnosis, and the \({\text{PSA}}\) level is a crucial biomarker for grade detection [22,23,24].
In our research, patients diagnosed with PCa are the study subjects and are followed until tumor shrinkage occurs, while receiving different treatments as directed by physicians. During this period, various time-dependent and time-independent covariates are observed. PSA is measured repeatedly as a continuous longitudinal response variable, and ALP is measured repeatedly as a binary response variable. Tumor shrinkage is the event of interest; it is not observed for all patients because of right censoring or dropout. We propose an amended joint SREM with a MAR characterization for the covariates. The SREM relies on a conditional independence assumption: the measurement and dropout processes are independent conditional on the random effects [19]. Following Papageorgiou and Rizopoulos [25], we extend this assumption to the censoring and measurement processes.
This paper is organized as follows: In Sect. 2, we introduce our motivational PCa dataset. In Sect. 3, we describe the proposed joint modelling strategy. Section 4 presents the application of the joint model to analyze the PCa dataset. The final Sect. 5 concludes with a discussion of our results.
2 Motivation: PCa dataset
The motivation of this study is the PCa data collected from Mayo Hospital, Lahore. Patients diagnosed with PCa as the primary disease were included as study subjects. Data were collected for n = 1504 patients on two longitudinal responses, \({\text{PSA}}\) and \({\text{ALP}}\), with a median follow-up of 3.00 per patient (range 1 to 5). We use log(PSA) as a continuous response variable and \({\text{ALP}}\) as a binary response variable. Figure 1 shows individual trajectories of \({\text{log}}({\text{PSA}})\) in the PCa dataset. Follow-up visits were scheduled by physicians based on the severity of the disease and other factors.
Consecutive follow-up visits typically have a gap of 28 to 30 days, with the first visit involving a complete check-up and the creation of patients' record files.
Physicians prescribed blood tests over time to monitor changes in time-varying factors such as \({\text{Platelets}}\) and \({\text{Bilirubin}}\) that could affect the outcome variables, including repeated measures of \({\text{PSA}}\) and \({\text{ALP}}\), as well as the event time outcome, which is the time-to-tumor shrinkage.
Our analysis investigates the association of \({\text{log}}({\text{PSA}})\) (mean ± sd: 1.96 ± 2.03) and \({\text{ALP}}\) (1 = high level, 0 = low level) with tumor shrinkage. The time-varying covariates \({\text{Platelets}}\) and \({\text{Bilirubin}}\) have missing values, with missingness rates of 61.61% and 61.57%, respectively. The primary event of interest is the individual's condition at the end of the study, observed through tumor status (1: tumor shrinkage, 0: right-censored). Of the 1504 patients, 960 experienced the event of interest and 544 were right-censored.
Figure 2 presents the Kaplan–Meier (KM) curves for event time, stratified by \({\text{Drug}}\) category: EBRT versus ADT, prostatectomy, and their combinations.
The primary goal of analyzing PCa data is to explore the combined impact of \({\text{PSA}}\) levels [26] and \({\text{ALP}}\) levels [27] on tumor shrinkage over time in response to \({\text{Drug}}\).
For the continuous covariates, the mean ± sd values are: \({\text{Age}}\) 44.69 ± 13.48, \({\text{Platelets}}\) 0.69 ± 0.46, \({\text{BMI}}\) 19.44 ± 2.14, and \({\text{Bilirubin}}\) 1.46 ± 1.25. Regarding \({\text{Drug}}\), 478 of the 1504 patients were prescribed EBRT.
For the analysis, categorical variables are coded as dichotomous variables. \({\text{Drug}}\) is coded as one for ADT, prostatectomy, and combinations, and zero for EBRT. Similarly, \(\text{Gleason Score}\) is coded as one for scores greater than or equal to (4 + 3) and zero for scores lower than or equal to (3 + 4).
The analysis explores the relationships among the outcomes of three sub-models: a linear mixed-effects model for \({\text{log}}({\text{PSA}})\), a logistic mixed-effects model for \({\text{ALP}}\), and a model for the occurrence of the event "shrinkage of tumor." A further objective is to assess the impact of covariates on the outcomes. The three sub-models share the same set of covariates, including two time-varying covariates with missing values: \({\text{Platelets}}\) per cubic ml/1000 and serum \({\text{Bilirubin}}\) mg/dl, both recorded for individuals. Baseline covariates, observed at the study's onset, are the patient's \({\text{Age}}\) in years, \({\text{BMI}}\) (body mass index) in kg/m2, \(\text{Gleason Score}\), and \({\text{Drug}}\). The study aims to elucidate the interplay between these variables and the specified models, providing insight into the factors that influence the outcomes of interest.
3 Model formulation
3.1 Joint modelling of bivariate longitudinal and event time outcomes
The model comprises three sub-models: the first two describe the longitudinal processes, and the third describes the event time process. Let \(y_{1ij}\) be the continuous longitudinal response for subject \(i\), \(i = 1, \dots, n\), at time \(t_{ij}\), \(j = 1, \dots, m\), where \(m\) is the number of continuous longitudinal measurements. We assume that \(y_{1ij}\) follows a linear mixed-effects model,
\[ y_{1ij} = x_{1ij}^{T}\beta_1 + w_{1ij}^{T} b_i + \varepsilon_{ij}, \]
where \(x_{1ij}^{T}\) is a \(p_1\)-dimensional vector of fixed-effects explanatory variables, \(\beta_1\) is the vector of \(p_1\) fixed-effects parameters, \(b_i\) is a \(q_1\)-dimensional vector of random effects, \(w_{1ij}^{T}\) is the \(q_1\)-dimensional design vector for the random effects, and \(\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)\) is the error term.
Let \(y_{2ij}\) be the binary repeated measurement for the \(i\)th individual, \(i = 1, \dots, n\), at time \(s_{ij}\), \(j = 1, \dots, m\), where \(s_{ij} = t_{ij}\). Given \(y_{1ij}\), \(y_{2ij}\) follows the logistic mixed-effects longitudinal model [28]
\[ \operatorname{logit} P(y_{2ij} = 1 \mid y_{1ij}, u_i) = x_{2ij}^{T}\beta_2 + w_{2ij}^{T} u_i + \gamma_j\, y_{1ij}, \]
where \(x_{2ij}^{T}\) is a \(p_2\)-dimensional vector of fixed-effects explanatory variables, \(\beta_2\) is the vector of \(p_2\) unknown fixed-effects parameters, \(u_i\) is a \(q_2\)-dimensional vector of unobserved random effects, and \(w_{2ij}^{T}\) is its design vector. The association parameter \(\gamma_j\) measures the effect of the continuous response on the binary response at time \(s_{ij}\) (\(s_{ij} = t_{ij}\)).
Let \(T_i^*\) be the true dropout time for the \(i\)th individual, \(i = 1, \dots, n\), so that \(T_i = \min(T_i^*, C_i)\) is the observed dropout or censoring time, where \(C_i\) is the censoring time. The event indicator \(\Delta_i = I(T_i^* \le C_i)\) equals 1 for an observed event and 0 for right censoring. The event time is assumed to follow a Weibull proportional hazards model,
\[ h(t \mid b_i^*, u_i^*) = r\, t^{\,r-1} \exp\left(x_{3i}^{T}\beta_3 + b_i^* + u_i^*\right), \]
where \(x_{3i}^{T}\) is a \(p_3\)-dimensional vector of fixed-effects explanatory variables, \(\beta_3\) is the vector of \(p_3\) fixed-effects parameters, and \(r\) is the shape parameter of the Weibull distribution. The term \(b_i^* + u_i^*\) is the shared parameter linking the event time to the random effects of the longitudinal outcomes: the random effects \((b_i, b_i^*)\) are assumed to follow a normal distribution with zero mean and variance–covariance matrix \(D_1\), independent of \(\varepsilon_{ij}\).
Also, \((u_i, u_i^*) \sim \text{iid } N(0, D_2)\), where
\[ D_1 = \begin{pmatrix} D_{1,11} & D_{1,12} \\ D_{1,12}^{T} & D_{1,22} \end{pmatrix}, \qquad D_2 = \begin{pmatrix} D_{2,11} & D_{2,12} \\ D_{2,12}^{T} & D_{2,22} \end{pmatrix}. \]
Here \(D_{1,11}\) and \(D_{2,11}\) are, respectively, \(q_1 \times q_1\) and \(q_2 \times q_2\) matrices, while \(D_{1,12}\) and \(D_{2,12}\) measure the degree of association between the longitudinal and event time outcomes.
The longitudinal outcome vector \(y_i = (y_{1ij}, y_{2ij})\) and the event time outcome \((T_i, \Delta_i)\) are assumed independent given the random effects. The observed data for subject \(i\) therefore consist of \(y_i\) together with \((T_i, \Delta_i)\).
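The conditional independence structure can be illustrated by simulating data from a simplified version of the model. The Python sketch below is an illustration under stated assumptions, not the authors' code: it uses scalar random intercepts (q1 = q2 = 1) with unit variance, a hypothetical monthly visit schedule, hypothetical regression coefficients, and uniform censoring. Given the random effects, y1 is drawn from the linear mixed model, y2 from the logistic model given y1, and the event time from a Weibull proportional hazards model with the shared term b* + u*; longitudinal measurements stop at the observed time T.

```python
import math
import random


def simulate_srem(n=500, rho_b=0.5, rho_u=0.5, sd_star=0.5, gamma=0.8,
                  r=1.5, seed=2024):
    """Simulate a simplified joint SREM (all parameter values hypothetical).

    Each subject gets correlated random-effect pairs (b_i, b_i*) and (u_i, u_i*).
    Given them, y1 follows a linear mixed model, y2 | y1 a logistic mixed model,
    and the event time a Weibull PH model with shared term b_i* + u_i*.
    """
    rng = random.Random(seed)
    visits = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical monthly visits
    subjects = []
    for _ in range(n):
        # Correlated pairs built from independent standard normals
        z1, z2, z3, z4 = (rng.gauss(0.0, 1.0) for _ in range(4))
        b = z1
        b_star = sd_star * (rho_b * z1 + math.sqrt(1.0 - rho_b ** 2) * z2)
        u = z3
        u_star = sd_star * (rho_u * z3 + math.sqrt(1.0 - rho_u ** 2) * z4)

        # Weibull PH event time by inverting S(t) = exp(-lam * t**r)
        lam = math.exp(-2.0 + b_star + u_star)
        t_event = (-math.log(1.0 - rng.random()) / lam) ** (1.0 / r)
        c = rng.uniform(1.0, 6.0)               # hypothetical censoring time
        T, delta = min(t_event, c), int(t_event <= c)

        # Longitudinal outcomes are only recorded at visits before T
        times, y1, y2 = [], [], []
        for t in visits:
            if t >= T:
                break
            y1_ij = 2.0 - 0.3 * t + b + rng.gauss(0.0, 0.5)
            eta2 = -1.0 + 0.1 * t + u + gamma * y1_ij   # logit of P(y2 = 1 | y1)
            p = 1.0 / (1.0 + math.exp(-eta2))
            times.append(t)
            y1.append(y1_ij)
            y2.append(int(rng.random() < p))
        subjects.append({"times": times, "y1": y1, "y2": y2,
                         "T": T, "delta": delta})
    return subjects


data = simulate_srem()
print(sum(s["delta"] for s in data), "events out of", len(data))
```

Because the longitudinal and event time draws use the random effects only through b, u, b*, and u*, the outcomes are independent once those are fixed, which is exactly the conditional independence assumption stated above.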
Let \(\varnothing\) be the vector of all unknown parameters in the joint model. The full joint distribution of the observed data is then written as
Equation (6) can also be written as
where \(p(T_i, \Delta_i \mid b_i^*, u_i^*, \beta_3, r) = p^{\Delta_i}(T_i \mid b_i^*, u_i^*, \beta_3, r)\, S^{1-\Delta_i}(T_i \mid b_i^*, u_i^*, \beta_3, r)\), and \(p(\cdot \mid b_i^*, u_i^*, \beta_3, r)\) and \(S(\cdot \mid b_i^*, u_i^*, \beta_3, r)\) are the density and survival functions, respectively.
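Because the density factorizes as p(T) = h(T)S(T), the contribution p^{Δ_i} S^{1−Δ_i} equals h(T_i)^{Δ_i} S(T_i), so its logarithm is Δ_i log h(T_i) + log S(T_i). The Python sketch below evaluates this log contribution, assuming the parameterization h(t) = r λ t^{r−1} and S(t) = exp(−λ t^r) with λ = exp(η_i) and η_i = x_{3i}^T β_3 + b_i^* + u_i^*. This matches the BUGS dweib(r, λ) convention, but it is our assumption about how the sub-model is parameterized.

```python
import math


def weibull_log_contribution(T, delta, eta, r):
    """log[p(T)^delta * S(T)^(1 - delta)] = delta * log h(T) + log S(T).

    Assumed parameterization: h(t) = r * lam * t**(r - 1), S(t) = exp(-lam * t**r),
    lam = exp(eta), where eta = x3' beta3 + b* + u* is the linear predictor.
    """
    log_h = math.log(r) + eta + (r - 1.0) * math.log(T)   # log hazard at T
    log_S = -math.exp(eta) * T ** r                      # log survival at T
    return delta * log_h + log_S


# Exponential special case (r = 1, lam = 2): event at T = 0.5 vs censored at 0.5
print(weibull_log_contribution(0.5, 1, math.log(2.0), 1.0))   # about -0.307
print(weibull_log_contribution(0.5, 0, math.log(2.0), 1.0))   # about -1.0
```

Summing these contributions over subjects, given the current random effects, gives the event time part of the conditional likelihood used inside each MCMC iteration.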
3.2 Bayesian computation of joint model
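Parameters are estimated in a Bayesian framework. Specifying priors for the unknown parameters yields the joint posterior density
\[ p\big(\varnothing, \{b_i, b_i^*, u_i, u_i^*\}_{i=1}^{n} \mid \text{data}\big) \propto \prod_{i=1}^{n} \left\{ \prod_{j=1}^{m} p(y_{1ij} \mid b_i, \beta_1, \sigma_\varepsilon^2)\, p(y_{2ij} \mid y_{1ij}, u_i, \beta_2, \gamma_j) \right\} p(T_i, \Delta_i \mid b_i^*, u_i^*, \beta_3, r)\, \Phi\big((b_i, b_i^*), 0, D_1\big)\, \Phi\big((u_i, u_i^*), 0, D_2\big) \times p(\varnothing), \]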
where \(\Phi(\cdot, \mu, D)\) is the density function of a multivariate normal distribution with mean \(\mu\) and covariance matrix \(D\), and \(p(\varnothing)\) is the joint prior distribution of the parameters. The prior distributions for the unknown parameters are
where \(\mathrm{I\Gamma}(a, b)\) and \(\Gamma(a, b)\) denote, respectively, the inverse gamma and gamma distributions with shape parameter \(a\) and scale parameter \(b\); \(\mathrm{IWishart}(\psi, v)\) denotes the inverse Wishart distribution with scale matrix \(\psi\) and \(v\) degrees of freedom; and \(N_p(\mu, \Sigma)\) denotes a \(p\)-variate normal distribution with mean vector \(\mu\) and covariance matrix \(\Sigma\). All hyperparameters are assumed known and all priors are proper. We assign low-informative prior distributions to all parameters because no previous knowledge is available for eliciting informative priors. MCMC methods, including the Gibbs sampler and the Metropolis–Hastings algorithm, are used to draw samples iteratively from the full conditional posterior distributions.
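As a minimal illustration of a Metropolis–Hastings update (the full model in this article is fitted in OpenBUGS, not with this code), the Python sketch below samples the posterior of a single normal mean μ with a known standard deviation and the same N(0, 1000) prior that Sect. 4 uses for the regression coefficients. The run length of 10,000 iterations with the first 5000 discarded as burn-in mirrors Sect. 4; the data, known standard deviation, and step size are hypothetical.

```python
import math
import random


def log_posterior(mu, data, sigma=2.0, prior_var=1000.0):
    """Unnormalized log posterior: N(mu, sigma^2) likelihood, N(0, prior_var) prior."""
    loglik = -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)
    logprior = -mu ** 2 / (2.0 * prior_var)
    return loglik + logprior


def metropolis(data, n_iter=10_000, burn_in=5_000, step=0.3, seed=11):
    """Random-walk Metropolis-Hastings; returns the post-burn-in draws of mu."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for it in range(n_iter):
        # Symmetric random-walk proposal: the proposal densities cancel
        # in the Hastings ratio, leaving only the posterior ratio.
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            mu, current = proposal, candidate
        if it >= burn_in:
            draws.append(mu)
    return draws


data_rng = random.Random(3)
data = [data_rng.gauss(1.96, 2.0) for _ in range(200)]   # hypothetical data
draws = metropolis(data)
print(sum(draws) / len(draws), sum(data) / len(data))
```

With a prior variance of 1000 the prior is nearly flat here, so the posterior mean of μ essentially equals the sample mean; a Gibbs step would instead draw μ directly from its normal full conditional.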
3.3 Joint modelling with missing covariates
Specifying a model for missing covariates is essential, yet the literature on this topic is limited [29, 30]. For ignorable missing covariates, Hartley and Hocking [31] applied the likelihood factorization method, while Schafer [32] discussed the standard techniques for handling incomplete multivariate data. Little and Rubin [17] addressed missing data problems using observed-data likelihood techniques.
Let \({{\text{z}}}_{{\text{ik}}},\mathrm{ k}=1,\dots ,{{\text{K}}}_{1}\) be the time-invariant covariates with missing values. A generalized linear model (GLM) is considered for modelling them, as follows
where \({\uptau }_{1{\text{k}}}\) is the dispersion parameter, and \({{\text{h}}}_{1{\text{k}}}\)(.) and \({{\text{C}}}_{1{\text{k}}}\)(.,.) are known functions. The formulation of the generalized linear model is completed by,
where the link function \(h_{1k}'(\cdot)\) in (10) is the derivative of the function \(h_{1k}(\cdot)\) in (9), and \(x_{4ik}^{T}\) is a vector of covariates with regression coefficients \(\vartheta_{1k}\). Also, let \(s_{ijk}\), \(k = 1, \dots, K_2\), be the time-varying covariates with missing values. For these, a generalized linear mixed-effects model (GLME) is considered as follows,
where \(\tau_{2k}\) is the dispersion parameter, and \(h_{2k}(\cdot)\) and \(C_{2k}(\cdot, \cdot)\) are known functions. The formulation of the generalized linear mixed model is completed by,
where the link function \(h_{2k}'(\cdot)\) is the derivative of the function \(h_{2k}(\cdot)\), \(x_{5ijk}^{T}\) is a vector of covariates with regression coefficients \(\vartheta_{2k}\), \(v_{ik} \sim \text{iid } N(0, D_{3k})\) are the random effects, and \(w_{3ijk}^{T}\) is the corresponding design vector.
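For concreteness, one standard way to write these covariate models, assuming the canonical exponential-family form implied by the definitions above (this is our reading, not necessarily the exact notation of the original equations), is:

```latex
\begin{aligned}
p(z_{ik} \mid \theta_{ik}, \tau_{1k})
  &= \exp\left\{ \frac{z_{ik}\,\theta_{ik} - h_{1k}(\theta_{ik})}{\tau_{1k}}
     + C_{1k}(z_{ik}, \tau_{1k}) \right\},
  & E(z_{ik}) &= h_{1k}'(\theta_{ik}),
  & \theta_{ik} &= x_{4ik}^{T}\vartheta_{1k}, \\
p(s_{ijk} \mid \theta_{ijk}, \tau_{2k})
  &= \exp\left\{ \frac{s_{ijk}\,\theta_{ijk} - h_{2k}(\theta_{ijk})}{\tau_{2k}}
     + C_{2k}(s_{ijk}, \tau_{2k}) \right\},
  & E(s_{ijk} \mid v_{ik}) &= h_{2k}'(\theta_{ijk}),
  & \theta_{ijk} &= x_{5ijk}^{T}\vartheta_{2k} + w_{3ijk}^{T} v_{ik}.
\end{aligned}
```

For a normally distributed covariate, \(h(\theta) = \theta^2/2\), \(\tau = \sigma^2\), and \(C(z, \tau) = -z^2/(2\tau) - \tfrac{1}{2}\log(2\pi\tau)\), so \(E(z) = \theta\) and the model reduces to a linear (mixed) model, which is the case used for Platelets and Bilirubin in Sect. 4.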
In the longitudinal sub-models, a covariate with missing values is replaced by its linear predictor: for a time-invariant covariate, the linear predictor itself, and for a time-varying covariate, the value of its linear predictor at the current time. For the event time sub-model, a time-varying covariate is replaced by the value of its linear predictor at the first observed time. In addition, the following term must be included in the conditional posterior distribution (Eq. 8),
where \(\Theta\) is the vector of all unknown parameters in the missing covariate modelling.
4 Analysis of PCa data
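This section illustrates the analysis of the PCa data described in Sect. 2. We first specify models for the time-varying covariates with missing values, \({\text{Platelets}}\) and \({\text{Bilirubin}}\): two linear mixed-effects models with random intercepts and time as a fixed effect,
\[ \text{Platelets}_{ij} = \varsigma_{11} + \varsigma_{12}\, t_{ij} + v_{1i} + \varepsilon_{ij}^{\text{Platelets}}, \tag{11} \]
\[ \text{Bilirubin}_{ij} = \varsigma_{21} + \varsigma_{22}\, t_{ij} + v_{2i} + \varepsilon_{ij}^{\text{Bilirubin}}, \tag{12} \]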
where \(\varepsilon_{ij}^{\text{Platelets}} \sim N(0, \sigma_{\text{Platelets}}^2)\), \(\varepsilon_{ij}^{\text{Bilirubin}} \sim N(0, \sigma_{\text{Bilirubin}}^2)\), \(v_{1i} \sim N(0, \tau_{\text{Platelets}}^2)\), \(v_{2i} \sim N(0, \tau_{\text{Bilirubin}}^2)\), \(\boldsymbol{\varsigma}_1 = (\varsigma_{11}, \varsigma_{12})\) and \(\boldsymbol{\varsigma}_2 = (\varsigma_{21}, \varsigma_{22})\) with \(\boldsymbol{\varsigma}_1, \boldsymbol{\varsigma}_2 \sim N_2(\mathbf{0}, 1000\,\boldsymbol{I}_2)\), where \(\boldsymbol{I}_2\) is the 2 × 2 identity matrix, and \(\sigma_{\text{Platelets}}^2, \sigma_{\text{Bilirubin}}^2, \tau_{\text{Platelets}}^2, \tau_{\text{Bilirubin}}^2 \sim \mathrm{I\Gamma}(0.1, 0.1)\).
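Within each MCMC iteration, the covariates with missing values therefore enter the outcome sub-models through these linear predictors: at every visit in the longitudinal sub-models, and at the first observed time in the event time sub-model. The hypothetical Python helpers below show this bookkeeping for one subject and one posterior draw; the function names and numeric values are illustrative only and are not estimates from Table 3.

```python
def linear_predictor(varsigma, v_i, t):
    """Random-intercept LMM predictor: varsigma_1 + varsigma_2 * t + v_i."""
    intercept, slope = varsigma
    return intercept + slope * t + v_i


def covariate_inputs(varsigma, v_i, visit_times):
    """Return (values for the longitudinal sub-models at each visit,
    value for the event time sub-model at the first observed time)."""
    per_visit = [linear_predictor(varsigma, v_i, t) for t in visit_times]
    baseline = linear_predictor(varsigma, v_i, visit_times[0])
    return per_visit, baseline


# Hypothetical posterior draw for the Platelets model and one subject's visits
per_visit, baseline = covariate_inputs((0.70, -0.02), 0.05, [0.0, 1.0, 2.0])
print(per_visit, baseline)
```

Using the predicted values rather than the raw measurements lets every subject contribute at every visit, with the uncertainty in the imputed covariate propagated through the posterior draws of \(\boldsymbol{\varsigma}\) and \(v_i\).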
The joint model comprises three sub-models. For the continuous outcome \({\text{log}}({\text{PSA}})\), a linear mixed-effects model (LMM) with a random intercept and random slope is specified:
where \(\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)\). Because \({\text{Platelets}}\) and \({\text{Bilirubin}}\) contain missing values, they cannot be used directly in this model; instead, we use the values of their linear predictors at time j from models (11) and (12), respectively. The second longitudinal outcome, \({\text{ALP}}\), is binary, and a logistic mixed-effects regression model is applied as follows:
As in the model for the continuous outcome, the values of the linear predictors of \({\text{Platelets}}\) and \({\text{Bilirubin}}\) enter this model as time-varying covariates.
A Weibull model is specified for time to tumor shrinkage, such that \({{\text{T}}}_{{\text{i}}}\sim {\text{Weibull}}\left({\uplambda }_{{\text{i}}}^{{\text{t}}},{\text{r}}\right),\) where \({\uplambda }_{{\text{i}}}^{{\text{t}}}\) is given as follows:
In the event time sub-model, the values of the linear predictors of \({\text{Platelets}}\) and \({\text{Bilirubin}}\) at baseline are used in place of the observed covariates.
Also, \(b_i = (b_{1i}, b_{2i}, b_{3i}) \sim \mathrm{MVN}((0, 0, 0), \Sigma_b)\) and \(u_i = (u_{1i}, u_{2i}, u_{3i}) \sim \mathrm{MVN}((0, 0, 0), \Sigma_u)\), \(i = 1, \dots, n\), where \(\Sigma_b \sim \mathrm{IWishart}(\psi_{\Sigma_b}, v_{\Sigma_b})\) and \(\Sigma_u \sim \mathrm{IWishart}(\psi_{\Sigma_u}, v_{\Sigma_u})\), with hyperparameters \(\psi_{\Sigma_b} = \psi_{\Sigma_u} = I_3\) and \(v_{\Sigma_b} = v_{\Sigma_u} = 3\), which lead to low-informative priors.
For the prior distributions, we assume \(\sigma_\varepsilon^2 \sim \mathrm{I\Gamma}(0.1, 0.1)\). The regression coefficients \(\beta_{11}, \dots, \beta_{18}\), \(\beta_{21}, \dots, \beta_{28}\), and \(\beta_{31}, \dots, \beta_{37}\) are unknown fixed-effects parameters with N(0, 1000) priors. The association parameter \(\gamma_j\), \(j = 1, \dots, m\), which captures the effect of the continuous longitudinal outcome \({\text{log}}({\text{PSA}})\) on the binary longitudinal outcome \({\text{ALP}}\) at time \(j\), also has an N(0, 1000) prior.
In addition to the proposed joint model, separate models are fitted to the data. Models were estimated using R2OpenBUGS with two parallel MCMC chains of 10,000 iterations each, the first 5000 being discarded as burn-in. Models are compared with the deviance information criterion (DIC), for which lower values are better: the DIC is 39,920 for the proposed joint model and 51,290 for the separate models, so the proposed joint model performs better. The results for the two models are summarized in Tables 1, 2 and 3.
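For reference, the DIC as usually computed by BUGS software combines the posterior mean deviance D̄ with the deviance evaluated at the posterior means, D̂: the effective number of parameters is p_D = D̄ − D̂ and DIC = D̄ + p_D. The Python sketch below reproduces that arithmetic on toy deviance values (not values from this analysis) and applies the lower-is-better rule to the two DIC values reported above.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = Dbar + pD, with pD = Dbar - Dhat (effective number of parameters)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)   # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean            # complexity penalty
    return d_bar + p_d, p_d


def preferred(dic_by_model):
    """Name of the model with the smallest DIC."""
    return min(dic_by_model, key=dic_by_model.get)


print(dic([100.0, 102.0, 104.0], 98.0))                   # (106.0, 4.0)
print(preferred({"joint": 39_920, "separate": 51_290}))   # joint
```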
Table 1 presents the posterior mean, standard deviation (sd), and 95% credible interval (CI) from the PCa data analysis. \({\text{PSA}}\) decreases over time in both the joint and separate models. \({\text{Age}}\) has a significant effect on \({\text{PSA}}\): \({\text{PSA}}\) levels decrease with increasing \({\text{Age}}\) during follow-up. An increase in \({\text{Platelets}}\) is associated with an increase in \({\text{PSA}}\). A one-unit increase in \({\text{BMI}}\) decreases \({\text{log}}({\text{PSA}})\) by 0.041. \({\text{PSA}}\) is higher among patients with a \(\text{Gleason Score}\) greater than or equal to (4 + 3) than among those with a \(\text{Gleason Score}\) lower than or equal to (3 + 4). Patients who received ADT, prostatectomy, and combinations have significantly higher \({\text{PSA}}\) levels than those who received EBRT.
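Because the continuous sub-model is fitted on the log scale, its coefficients act multiplicatively on PSA itself. Assuming natural logarithms, a coefficient β corresponds to a 100(exp(βΔx) − 1)% change in PSA for a change Δx in the covariate, so the BMI coefficient of −0.041 amounts to roughly a 4% decrease in PSA per unit of BMI. A minimal Python helper for this conversion (the function name is ours):

```python
import math


def percent_change(beta, delta_x=1.0):
    """Percent change in PSA for a delta_x change in a covariate whose
    coefficient beta is on the natural-log(PSA) scale."""
    return 100.0 * (math.exp(beta * delta_x) - 1.0)


print(round(percent_change(-0.041), 2))   # -4.02
```

If log(PSA) were instead a base-10 logarithm, the factor would be 10**(beta * delta_x) rather than exp(beta * delta_x).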
For the binary outcome, a higher \({\text{BMI}}\) is associated with a significantly lower probability of a high \({\text{ALP}}\) level, while a higher \({\text{Bilirubin}}\) is associated with a significantly higher probability of a high \({\text{ALP}}\) level. The shared parameters indicate a positive association between the continuous and binary longitudinal outcomes.
For the event time model, the risk of tumor shrinkage over follow-up is higher for patients with larger values of \({\text{BMI}}\) and \({\text{Bilirubin}}\). The hazard of tumor shrinkage is also higher among patients who received EBRT than among those who received ADT, prostatectomy, and combinations.
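To see how these event time effects read as hazard ratios, assume the Weibull proportional hazards form \(h(t) = r\,t^{r-1}\exp(x_{3i}^{T}\beta_3 + b_i^* + u_i^*)\). For two patients with the same random effects who differ only in a binary covariate \(x_{3ik}\), the baseline term and the shared random effects cancel, so \(\exp(\beta_{3k})\) is a subject-specific hazard ratio that does not change with time. Here \(\eta_{i,-k}\) is our shorthand for the remaining terms of \(x_{3i}^{T}\beta_3\):

```latex
\frac{h(t \mid x_{3ik} = 1)}{h(t \mid x_{3ik} = 0)}
  = \frac{r\,t^{r-1}\exp\left(\eta_{i,-k} + \beta_{3k} + b_i^* + u_i^*\right)}
         {r\,t^{r-1}\exp\left(\eta_{i,-k} + b_i^* + u_i^*\right)}
  = \exp(\beta_{3k}).
```

With Drug coded 1 for ADT, prostatectomy, and combinations and 0 for EBRT, a higher hazard of tumor shrinkage among EBRT patients corresponds to \(\exp(\beta_{\text{Drug}}) < 1\).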
To assess the dependence between the longitudinal markers and the event time outcome, we examine the significance of \(\sigma_{u13}\), \(\sigma_{u33}\), \(\sigma_{b23}\), and \(\sigma_{b33}\) (see Table 2). The significance of \(\sigma_{u33}\) supports an association between the longitudinal markers and the event time outcome. Table 3 presents the parameter estimates of the missing covariate models, which account for the missing data.
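In a Bayesian analysis, "significant" is commonly read as the 95% credible interval excluding zero; we assume that reading here. The Python sketch below computes an empirical equal-tailed interval from posterior draws (for example, the retained MCMC draws of a covariance such as \(\sigma_{u13}\), whose sign is unrestricted) and checks whether it excludes zero. The draws in the usage lines are synthetic.

```python
import math


def equal_tailed_interval(draws, level=0.95):
    """Empirical equal-tailed credible interval from posterior draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[int(math.floor((1.0 - level) / 2.0 * (n - 1)))]
    hi = s[int(math.ceil((1.0 + level) / 2.0 * (n - 1)))]
    return lo, hi


def excludes_zero(draws, level=0.95):
    """True if the equal-tailed credible interval lies entirely on one side of 0."""
    lo, hi = equal_tailed_interval(draws, level)
    return lo > 0.0 or hi < 0.0


positive = [k / 100 for k in range(1, 101)]     # synthetic draws, all > 0
centred = [k / 100 - 0.5 for k in range(101)]   # synthetic draws around 0
print(excludes_zero(positive), excludes_zero(centred))   # True False
```

In practice the draws from both chains would be pooled after burn-in before computing the interval.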
5 Conclusions
We have proposed an amended joint SREM with model-based handling of MAR, under the assumptions of non-informative censoring and ignorability. The approach models longitudinal outcomes of different types (continuous and binary) together with an event time outcome, and fits this joint model of multivariate mixed longitudinal measurements and event time within a Bayesian framework using MCMC.
Our main purpose is to contribute to the understanding of the progression of PCa, which affects a large percentage of men. Accounting for missing observations in time-varying covariates adds complexity to the analysis. In particular, this work aims to determine which factors affect the shrinkage of PCa tumors during treatment and how these factors interact over time [33]. The two longitudinal outcomes, \({\text{PSA}}\) and \({\text{ALP}}\), are modeled together with tumor shrinkage in a joint modelling framework that accounts for the association among the three responses. The analysis examines the effects, on both the longitudinal and event time responses, of time-invariant covariates, which are measured completely for all patients, and of time-varying covariates, which are not observed at every time point for every patient. The joint modelling strategy captures the joint evolution of the repeated \({\text{PSA}}\) and \({\text{ALP}}\) measurements and the \({\text{TTE}}\) process at the individual level, whereas separate analyses may produce biased and inefficient estimates [6].
The linear mixed-effects models presented here assume that dropout is ignorable (MAR). The results provide substantial evidence of an association between \({\text{PSA}}\) and \({\text{ALP}}\) levels and suggest increased tumor shrinkage following treatment.
This article focuses on the MAR characterization; non-ignorable missingness (NMAR) is left for future work. Instead of an SREM, alternatives such as pattern-mixture and selection models can be explored for handling non-random missing data [18]. A probit mixed-effects model could also replace the logistic mixed-effects model for the \({\text{ALP}}\) response. For the random effects and error terms, distributional choices other than the normal distribution are available. When historical data or strong prior information is available, informative priors can be used.
Data availability
The data used in this study are part of a larger project conducted by the authors. Owing to the sensitive nature of the information, the data cannot be shared through a public repository. Interested researchers may request access by contacting the Corresponding Author. In the interest of transparency and reproducibility, simulated data can be made available upon reasonable request, and the code scripts used for the data analysis in this study can be provided to interested readers.
References
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 1997.
Enderlein G. Cox DR, Oakes D: Analysis of survival data. Chapman and Hall, London-New York 1984, 201 S. Biom J. 1987;29:114.
Faucett CL, Thomas DC. Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Stat Med. 1996;15(15):1663–85.
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.
Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Stat Sin. 2004;14:809–834.
Tseng YK, Hsieh F, Wang JL. Joint modelling of accelerated failure time and longitudinal data. Biometrika. 2005;92(3):587–603.
Diggle PJ, Sousa I, Chetwynd AG. Joint modelling of repeated measurements and time-to-event outcomes: the fourth Armitage lecture. Stat Med. 2008;27(16):2981–2998.
Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000;1(4):465–480.
Rizopoulos D, Ghosh P. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Stat Med. 2011;30(12):1366–1380.
Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC Med Res Methodol. 2016;16:1–5.
Li H, Zhang Y, Carroll RJ, Keadle SK, Sampson JN, Matthews CE. A joint modeling and estimation method for multivariate longitudinal data with mixed types of responses to analyze physical activity data generated by accelerometers. Stat Med. 2017;36(25):4028–4040.
Dendale P, De Keulenaer G, Troisfontaines P, Weytjens C, Mullens W, Elegeert I, Ector B, Houbrechts M, Willekens K, Hansen D. Effect of a telemonitoring-facilitated collaboration between general practitioner and heart failure clinic on mortality and rehospitalization rates in severe heart failure: the TEMA-HF 1 (TElemonitoring in the MAnagement of heart failure) study. Eur J Heart Fail. 2012;14(3):333–340.
Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. TEST. 2009;18(1):1–43.
Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol Methods. 2002;7(2):147.
Shardell M, Hicks GE, Ferrucci L. Doubly robust estimation and causal inference in longitudinal studies with dropout and truncation by death. Biostatistics. 2015;16(1):155–168.
Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592.
Little RJ, Rubin DB. Statistical analysis with missing data, vol. 793. Hoboken: Wiley; 2019.
Philipson PM, Ho WK, Henderson R. Comparative review of methods for handling drop-out in longitudinal studies. Stat Med. 2008;27(30):6276–6298.
Njagi EN, Molenberghs G, Kenward MG, Verbeke G, Rizopoulos D. A characterization of missingness at random in a generalized shared-parameter joint modeling framework for longitudinal and time-to-event data, and sensitivity analysis. Biom J. 2014;56(6):1001–1015.
Creemers A, Hens N, Aerts M, Molenberghs G, Verbeke G, Kenward MG. Generalized shared-parameter models and missingness at random. Stat Model. 2011;11(4):279–310.
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639.
Proust-Lima C, Taylor JM. Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics. 2009;10(3):535–549.
Ferrer L, Rondeau V, Dignam J, Pickles T, Jacqmin-Gadda H, Proust-Lima C. Joint modelling of longitudinal and multi-state processes: application to clinical progressions in prostate cancer. Stat Med. 2016;35(22):3933–3948.
Sheikh MT, Ibrahim JG, Gelfond JA, Sun W, Chen MH. Joint modelling of longitudinal and survival data in the presence of competing risks with applications to prostate cancer data. Stat Model. 2021;21(1–2):72–94.
Papageorgiou G, Rizopoulos D. An alternative characterization of MAR in shared parameter models for incomplete longitudinal data and its utilization for sensitivity analysis. Stat Model. 2021;21(1–2):95–114.
Carvalhal GF, Daudi SN, Kan D, Mondo D, Roehl KA, Loeb S, Catalona WJ. Correlation between serum prostate-specific antigen and cancer volume in prostate glands of different sizes. Urology. 2010;76(5):1072–1076.
Li D, Lv H, Hao X, Hu B, Song Y. Prognostic value of serum alkaline phosphatase in the survival of prostate cancer: evidence from a meta-analysis. Cancer Manag Res. 2018;10:3125–3139.
Parzen M, Ghosh S, Lipsitz S, Sinha D, Fitzmaurice GM, Mallick BK, Ibrahim JG. A generalized linear mixed model for longitudinal binary data with a marginal logit link function. Ann Appl Stat. 2011;5(1):449.
Lipsitz SR, Ibrahim JG, Fitzmaurice GM. Likelihood methods for incomplete longitudinal binary responses with incomplete categorical covariates. Biometrics. 1999;55(1):214–223.
Ibrahim JG, Chen MH, Lipsitz SR. Monte Carlo EM for missing covariates in parametric regression models. Biometrics. 1999;55(2):591–596.
Hartley HO, Hocking RR. The analysis of incomplete data. Biometrics. 1971;27(4):783–823.
Schafer JL. Analysis of incomplete multivariate data. Boca Raton: CRC Press; 1997.
Liaqat M, Kamal S, Fischer F. Illustration of association between change in prostate-specific antigen (PSA) values and time to tumor status after treatment for prostate cancer patients: a joint modelling approach. BMC Urol. 2023;23(1):202.
Acknowledgements
We express our gratitude to the staff of the Oncology and Radiology Department at Mayo Hospital, Lahore, for their valuable support in data collection. We also appreciate the exceptional efforts of Dr. Abbas Khokar (MBBS, FCPS), Head of the Oncology Department at Mayo Hospital, Lahore, Pakistan, in meticulously organizing patients' records. We further acknowledge Dr. Taban Baghfalaki for providing valuable assistance with coding and data analysis. Their support has been instrumental in enhancing the quality of our work.
Author information
Contributions
ML conceived the original idea of the study, designed the study, analyzed the data and drafted the manuscript. SK supervised the study design. SK and RAK revised the manuscript critically for important intellectual content. All authors approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
This study received formal approval from the Advanced Studies and Review Board at the University of the Punjab, Lahore, Pakistan, under the authorization of the Dean of Sciences, Prof. Dr. Shahid Kamal. The official ethics approval letter was drafted and endorsed by Prof. Dr. Shahid Kamal and Ms. Madiha Liaqat. A collaborative understanding between Mayo Hospital and the University of the Punjab facilitated the ethics committee's approval of the letter, which granted explicit permission to collect data from patients' records for use in the study. To maintain confidentiality, all collected data were anonymized, and no personal identifiers were linked to any individual patient. Because the dataset was anonymized, Mayo Hospital formally waived the requirement for informed consent.
Competing interests
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liaqat, M., Khan, R.A. & Kamal, S. Comprehensive modelling of prostate cancer progression: integrating continuous and binary biomarkers with event time data and missing covariates. Discov Appl Sci 6, 71 (2024). https://doi.org/10.1007/s42452-024-05727-2
DOI: https://doi.org/10.1007/s42452-024-05727-2