Introduction

Discrete probability distributions are used to model count events such as earthquakes, car accidents, and landslides, as well as the number of people dying of diseases like breast cancer, PBC2, etc. These events are characterized differently in different study areas, and several authors have introduced flexible new distributions to model them. Mixed-Poisson type discrete distributions and the survival discretization method are two popular approaches, in which the Poisson distribution serves as a baseline for characterizing the events mentioned here (Eliwa et al.1).

Recently introduced methods based on the Poisson distribution include the following.

Chakraborty and Ong2 introduced an extension of the negative binomial (NB) model that corresponds to the COM-Poisson distribution, a member of the modified power series family. This generalized form of the NB is also a member of the generalized hypergeometric and exponential families. These authors also discussed the weighted NB and COM-Poisson distributions.

Aryal and Yousof3 introduced the exponentiated generalized-G Poisson family of distributions by combining the truncated Poisson and the exponentiated generalized-G distributions.

Wenhao et al.4 introduced a new compound distribution, named the Lindley-Poisson distribution, and studied its most important statistical properties. Nasir et al.5 introduced a new class of univariate continuous distributions called the Odd Burr-G Poisson (OBGP) class and studied four special sub-models of the family.

Joshi and Kumar6 generated a new model, named the Poisson inverted Lomax distribution, using the Poisson-G family with the inverted Lomax distribution as the baseline, and studied some of its distributional properties.

Another important study, on which part of this work is based, is that of Alkarni and Oraby7. They proposed a survival family with a decreasing hazard function, obtained by compounding a truncated Poisson distribution with a lifetime distribution.

Niyomdecha and Srisuradetchai8 built on Alkarni and Oraby7 to propose a survival distribution called the complementary gamma zero-truncated Poisson (CGZTP) distribution. Muhammad and Liu9 also built on Alkarni and Oraby7 and proposed a class of lifetime distributions called the complementary exponentiated BurrXII Poisson (CEBXIIP) distribution.

More recent developments based on the Poisson, Lomax, and inverse Lomax distributions are as follows. Sapkota et al.10 focused on the inverse exponential power distribution applied to COVID-19 and medical datasets. Hassan et al. introduced the inverse exponentiated Lomax power series (IELoPS) class of distributions, obtained by compounding the inverse exponentiated Lomax and power series distributions. Srisuradetchai and Niyomdecha11 presented Bayesian estimation methods for the gamma zero-truncated Poisson (GZTP) and the complementary gamma zero-truncated Poisson (CGZTP) distributions. The CGZTP distribution of Niyomdecha and Srisuradetchai8 is a continuous three-parameter lifetime distribution that combines the distribution of the maximum of a series of independent and identically gamma-distributed random variables with zero-truncated Poisson random variables.

Yousof et al.12 also worked with the mixture Poisson distribution. They introduced a new family of continuous distributions, called the generalized Poisson family, which extends the quadratic rank transmutation map.

Having briefly reviewed approaches for proposing new models for time-to-event and lifetime data, we next highlight models for repeatedly measured outcomes, for survival outcomes, and for both outcomes jointly.

The simultaneous modelling of longitudinal and survival outcomes is known as joint modelling13. This approach has been growing for two decades in health and clinical applications. In clinical studies, observations are frequently recorded for patients at each follow-up visit, which generates repeatedly measured data. Times to one or more clinically significant events can also be recorded. The repeatedly measured data may be censored by such events, for example when the event is death or treatment failure.

Njagi et al.14 proposed a flexible joint modelling framework for longitudinal and survival data with extra variability, combining shared and Gaussian random effects in a joint model for responses of which at least one is non-normal. The specific focus is the case in which one of the outcomes is of time-to-event type.

Krahn et al.15 proposed an efficient estimation method for the joint modelling of repeatedly measured and time-to-event data. They discussed pretest and shrinkage estimation approaches for the case in which some selected covariates in the repeated-measurement and time-to-event sub-models may not be relevant for predicting event times, and noted that joint models usually link linear mixed-effects models for the longitudinal data with semi-parametric models for the event time.

More recent studies related to joint models are as follows. Wongvibulsin et al.16 developed novel tools to support improved clinical decision making through methods for individual-level risk prediction that can jointly handle multiple variables, their interactions, and time-varying values. Parr et al.17 reviewed a dynamic prediction joint model that updates patients’ risk as new information becomes available by incorporating prostate-specific antigen (PSA) measurements over time. Wang et al.18 developed the R package JMcmprsk for joint modelling of longitudinal and survival data with competing risks. Cong et al.19 presented the R package JSM, which performs joint statistical modelling of survival and longitudinal data.

This study has two parts. The first part introduces a newly proposed distribution, the inverse Lomax-uniform Poisson (ILUP) distribution, for health data with different patterns. This part is presented in “Introduction”, “An inverse Lomax-uniform and a compound family of Poisson distributions”, “An inverse Lomax-uniform Poisson distribution model”, “Statistical properties of ILUP distribution”, “Estimation and simulation study” and “An application to breast cancer patients data” in detail. The second part discusses joint models for repeatedly measured and time-to-event data in “Joint models for repeatedly measured and time-to-event data”.

The remainder of the study is structured as follows. “An inverse Lomax-uniform and a compound family of Poisson distributions” and “An inverse Lomax-uniform Poisson distribution model” introduce the ILUP distribution, and “Statistical properties of ILUP distribution” discusses its basic statistical properties. “Estimation and simulation study” presents the estimation of the model parameters, and “An application to breast cancer patients data” illustrates the application of the ILUP to the new dataset. “Joint models for repeatedly measured and time-to-event data” presents joint models with applications, and “Concluding remarks” summarizes the main points.

An inverse Lomax-uniform and a compound family of Poisson distributions

The derivation of the inverse Lomax-Uniform (ILU) distribution model and the combination of ILU with the compound class of Poisson are given in this section.

Let \(X\sim U(0,\theta )\) with the cumulative distribution function (CDF) and the probability density function (PDF) which are given by:

$$\begin{aligned} F\left( x;\theta \right) = \frac{x}{\theta }, \quad x,\theta >0, \end{aligned}$$
(1)

and

$$\begin{aligned} f\left( x;\theta \right) = \frac{1}{\theta }, \quad x,\theta >0, \end{aligned}$$
(2)

respectively.

And further, suppose X has an inverse Lomax (IL) model with CDF and PDF which are given as follows:

$$\begin{aligned} F\left( x;\lambda ,\beta \right) = \left( 1+\frac{\lambda }{x}\right) ^{-\beta }, \quad x,\lambda ,\beta >0, \end{aligned}$$
(3)

and

$$\begin{aligned} f\left( x;\lambda ,\beta \right) =\beta \lambda x^{-2} \left( 1+\frac{\lambda }{x}\right) ^{-(\beta +1)}, \quad x,\lambda ,\beta >0, \end{aligned}$$
(4)

respectively.

Hence, by combining Eqs. (1–4), we get the CDF and PDF of ILU as follows:

$$\begin{aligned} F_{ILU}\left( x;\pmb {\nu }\right) =\left( 1+\frac{\lambda \theta }{x}\right) ^{-\beta }, \quad x,\lambda ,\beta ,\theta >0, \end{aligned}$$
(5)

and

$$\begin{aligned} f_{ILU}\left( x;\pmb {\nu }\right) =\beta \lambda \theta x^{-2}\left( 1+\frac{\lambda \theta }{x}\right) ^{-(\beta +1)}, \quad x,\lambda ,\beta ,\theta >0, \end{aligned}$$
(6)

respectively.

Hereafter, we follow the compound class of Poisson and lifetime distributions approach7. Thus, let \(Y_{1}, Y_{2},\ldots , Y_{Q}\) be independent and identically distributed (iid) random variables, each with PDF f, and let Q be a discrete random variable with a zero-truncated Poisson distribution whose probability mass function (PMF) is given by:

$$\begin{aligned} P_{Q}\left( q;\lambda \right) =\frac{e^{-\lambda }\lambda ^{q}}{\Gamma (q+1)(1-e^{-\lambda })}, {\; \; \; \; \;} q \in \{1, 2,\ldots \}, \lambda >0. \end{aligned}$$

Let \(X=min\{Y_{1}, Y_{2}, \ldots , Y_{Q} \}\), where the \(Y_{i}\)s and Q are independent and each \(Y_{i}\) follows the ILU distribution. Hence, the conditional PDF of X given \(Q=q\) is:

$$\begin{aligned} f_{X\mid Q}\left( x\mid q\right) =qf_{ILU}(x;\pmb {\nu })\left[ 1-F_{ILU}(x;\pmb {\nu }) \right] ^{q-1}, \end{aligned}$$

and the marginal PDF and CDF of X, obtained by summing the joint density of X and Q over q, are computed as:

$$\begin{aligned} f_{X}\left( x;\lambda \right)&=\frac{\lambda e^{-\lambda }}{1-e^{-\lambda }}f_{ILU}(x;\pmb {\nu })\sum _{q=1}^{\infty } \frac{[\lambda (1-F_{ILU}(x;\pmb {\nu }))]^{q-1}}{\Gamma (q)} \nonumber \\&= \frac{\lambda f_{ILU}(x;\pmb {\nu })e^{-\lambda F_{ILU}(x;\pmb {\nu })}}{1-e^{-\lambda }}, \end{aligned}$$
(7)

and

$$\begin{aligned} F_{X}\left( x;\lambda \right)&=\int _{0}^{x}\frac{\lambda f_{ILU}(x;\pmb {\nu }) e^{-\lambda F_{ILU}(x;\pmb {\nu })}}{1-e^{-\lambda }} dx \nonumber \\&= \frac{1-e^{-\lambda F_{ILU}(x;\pmb {\nu })}}{1-e^{-\lambda }}, \end{aligned}$$
(8)

respectively.
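The compounding construction can be checked numerically. The following is a minimal sketch, not part of the original analysis: it draws Q from a zero-truncated Poisson distribution, takes the minimum of Q ILU draws, and compares the empirical CDF with the closed form in Eq. (8). The parameter values, the seed, the evaluation points, and all function names are illustrative assumptions; the same \(\lambda\) is used for the Poisson rate and the ILU scale, as in Eq. (9).

```python
import math
import random


def ilu_cdf(x, beta, lam, theta):
    """ILU CDF, Eq. (5)."""
    return (1.0 + lam * theta / x) ** (-beta)


def ilu_draw(rng, beta, lam, theta):
    """One ILU draw by inverting Eq. (5): x = lam*theta / (u**(-1/beta) - 1)."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined, so redraw
        u = rng.random()
    # u**(-1/beta) - 1 written with expm1 for accuracy when u is close to 1
    return lam * theta / math.expm1(-math.log(u) / beta)


def zero_truncated_poisson_draw(rng, lam):
    """Poisson draw by Knuth's product method, repeated until it is non-zero."""
    limit = math.exp(-lam)
    while True:
        q, prod = 0, rng.random()
        while prod > limit:
            q += 1
            prod *= rng.random()
        if q > 0:
            return q


def compound_draw(rng, beta, lam, theta):
    """X = min(Y_1, ..., Y_Q) with Q zero-truncated Poisson and Y_i ~ ILU."""
    q = zero_truncated_poisson_draw(rng, lam)
    return min(ilu_draw(rng, beta, lam, theta) for _ in range(q))


def compound_cdf(x, beta, lam, theta):
    """Closed-form CDF of X, Eq. (8)."""
    g = ilu_cdf(x, beta, lam, theta)
    return (1.0 - math.exp(-lam * g)) / (1.0 - math.exp(-lam))


def max_cdf_gap(n_draws=20000, seed=1, beta=2.0, lam=1.5, theta=1.0):
    """Largest gap between empirical and closed-form CDF at a few points."""
    rng = random.Random(seed)
    draws = [compound_draw(rng, beta, lam, theta) for _ in range(n_draws)]
    gaps = []
    for x0 in (0.5, 1.0, 2.0, 5.0, 20.0):
        empirical = sum(d <= x0 for d in draws) / n_draws
        gaps.append(abs(empirical - compound_cdf(x0, beta, lam, theta)))
    return max(gaps)


if __name__ == "__main__":
    print("max |empirical CDF - Eq. (8)| =", max_cdf_gap())
```

With a few tens of thousands of draws the gap should be of the order of the Monte Carlo error, which supports the closed form without proving it.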

An inverse Lomax-uniform Poisson distribution model

This section presents the proposed ILUP model, obtained by applying the compound class of Poisson and lifetime distributions approach7 to the ILU model (see Eqs. 5 and 6). Substituting Eqs. (5 and 6) into the marginal distributions in Eqs. (7 and 8) and simplifying gives the CDF and PDF of the inverse Lomax-uniform Poisson (ILUP) distribution:

$$\begin{aligned} F_{ILUP}\left( x;\pmb {\nu }\right) = \frac{1-e^{-\lambda \left[ 1+\frac{\lambda \theta }{x}\right] ^{-\beta }}}{1-e^{-\lambda }}, {\; \; \; \; \;} \end{aligned}$$
(9)

and

$$\begin{aligned} f_{ILUP}\left( x;\pmb {\nu }\right) = \frac{\lambda ^{2}\beta \theta x^{-2}\left( 1+\frac{\lambda \theta }{x} \right) ^{-(\beta +1)} e^{-\lambda \left[ 1+\frac{\lambda \theta }{x}\right] ^{-\beta }}}{1-e^{-\lambda }}, {\; \; \; \; \;} \end{aligned}$$
(10)

respectively, where \(\pmb {\nu }=\left( \beta ,\lambda ,\theta \right)>0; x>0\).

For \(\lambda , \beta , \theta , x >0\), the survival function S(x) , hazard function h(x) , cumulative hazard rate function H(x) , and reverse failure rate function r(x) of the ILUP are computed as:

$$\begin{aligned} & S\left( x;\pmb {\nu }\right) = \frac{e^{-\lambda \left[ 1+\frac{\lambda \theta }{x}\right] ^{-\beta }}-e^{-\lambda }}{1-e^{-\lambda }},\\ & h\left( x;\pmb {\nu }\right) = \dfrac{\lambda ^{2}\beta \theta x^{-2}\left( 1+\frac{\lambda \theta }{x}\right) ^{-(\beta +1)} e^{-\lambda \left[ 1+\frac{\lambda \theta }{x} \right] ^{-\beta }}}{e^{-\lambda \left[ 1+\frac{\lambda \theta }{x} \right] ^{-\beta }}-e^{-\lambda }},\\ & H\left( x;\pmb {\nu }\right) = -\log S\left( x;\pmb {\nu }\right) = \log \left( 1-e^{-\lambda }\right) -\log \left( e^{-\lambda \left[ 1+\frac{\lambda \theta }{x} \right] ^{-\beta }}-e^{-\lambda }\right) , \end{aligned}$$

and

$$\begin{aligned} r\left( x;\pmb {\nu }\right) = \dfrac{\lambda ^{2}\beta \theta x^{-2}\left( 1+\frac{\lambda \theta }{x} \right) ^{-(\beta +1)} e^{-\lambda \left[ 1+\frac{\lambda \theta }{x} \right] ^{-\beta }}}{1-e^{-\lambda \left[ 1+\frac{\lambda \theta }{x} \right] ^{-\beta }}}, \end{aligned}$$

respectively.
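The functions above are straightforward to evaluate. The sketch below is an illustration under assumed parameter values, with hypothetical function names: it codes the CDF in Eq. (9), the PDF in Eq. (10), and the survival, hazard, and reverse hazard functions, and computes the cumulative hazard through the standard identity \(H(x)=-\log S(x)\). A central finite difference of the CDF can then be compared with the PDF as a consistency check.

```python
import math


def _g(x, beta, lam, theta):
    """Inner ILU term (1 + lam*theta/x)**(-beta) shared by all functions."""
    return (1.0 + lam * theta / x) ** (-beta)


def ilup_cdf(x, beta, lam, theta):
    """ILUP CDF, Eq. (9)."""
    return (1.0 - math.exp(-lam * _g(x, beta, lam, theta))) / (1.0 - math.exp(-lam))


def ilup_pdf(x, beta, lam, theta):
    """ILUP PDF, Eq. (10)."""
    z = 1.0 + lam * theta / x
    num = lam ** 2 * beta * theta * x ** (-2) * z ** (-(beta + 1.0))
    return num * math.exp(-lam * z ** (-beta)) / (1.0 - math.exp(-lam))


def ilup_survival(x, beta, lam, theta):
    """Survival function S(x) = 1 - F(x)."""
    g = _g(x, beta, lam, theta)
    return (math.exp(-lam * g) - math.exp(-lam)) / (1.0 - math.exp(-lam))


def ilup_hazard(x, beta, lam, theta):
    """Hazard function h(x) = f(x) / S(x)."""
    return ilup_pdf(x, beta, lam, theta) / ilup_survival(x, beta, lam, theta)


def ilup_reverse_hazard(x, beta, lam, theta):
    """Reverse failure rate r(x) = f(x) / F(x)."""
    return ilup_pdf(x, beta, lam, theta) / ilup_cdf(x, beta, lam, theta)


def ilup_cum_hazard(x, beta, lam, theta):
    """Cumulative hazard via the identity H(x) = -log S(x)."""
    return -math.log(ilup_survival(x, beta, lam, theta))


if __name__ == "__main__":
    beta, lam, theta = 2.0, 1.5, 1.0  # illustrative values only
    x0, h = 1.3, 1e-5
    slope = (ilup_cdf(x0 + h, beta, lam, theta) - ilup_cdf(x0 - h, beta, lam, theta)) / (2 * h)
    print("finite-difference dF/dx:", slope)
    print("PDF from Eq. (10):      ", ilup_pdf(x0, beta, lam, theta))
```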

The shapes of the ILUP distribution model are displayed graphically below.

Figure 1 Plots of the PDF, CDF, hazard rate, and survival functions of the ILUP distribution model.

Figure 1 displays the PDF, CDF, hazard rate, and survival functions of the ILUP distribution model for different parameter values, illustrating the different shapes the distribution can take and how flexible it is.

Figure 2 Plot of hazard function for the ILUP distribution model.

Figure 2 shows additional plots of the hazard function of the ILUP distribution model. The hazard function takes increasing, decreasing, bathtub, and right-skewed shapes, which indicates the flexibility of the newly proposed model.

Statistical properties of ILUP distribution

In this section, some of the important statistical properties of the newly proposed family of distributions are discussed.

Quantile function

The quantile function serves many purposes in statistical theory and numerical applications; for example, it can be used to generate simulated samples. The quantile function of the ILUP distribution is obtained by inverting the CDF. Thus,

$$\begin{aligned} Kx_{n}(u)=F_{ILUP}^{-1}(u;\pmb {\nu }),\\ \dfrac{1-e^{-\lambda [ 1+\frac{\lambda \theta }{x}]^{-\beta }}}{1-e^{-\lambda }}=u, \quad 0<u<1. \end{aligned}$$

Solving this non-linear equation for x gives:

$$\begin{aligned} Kx_{n}\left( u\right) =\dfrac{\lambda \theta }{\left\{ \frac{-1}{\lambda } \log \left[ 1-u(1-e^{-\lambda })\right] \right\} ^{-1/\beta }-1}. \end{aligned}$$
(11)

Here \(U \sim\) uniform(0, 1). The median (Med) is obtained by substituting \(u=1/2\) in Eq. (11):

$$\begin{aligned} Med=\dfrac{\lambda \theta }{\left\{ \frac{-1}{\lambda } \log \left[ 0.5(1+e^{-\lambda })\right] \right\} ^{-1/\beta }-1}. \end{aligned}$$

In the same way, we can get the lower and upper quartiles by substituting \(u=1/4\) and \(u= 3/4\) in Eq. (11), respectively.
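As an illustration, Eq. (11) can be coded directly and used for inversion sampling. This is a minimal sketch with assumed parameter values; the function names are hypothetical. A round trip through the CDF in Eq. (9) should return the original probability.

```python
import math
import random


def ilup_cdf(x, beta, lam, theta):
    """ILUP CDF, Eq. (9)."""
    g = (1.0 + lam * theta / x) ** (-beta)
    return (1.0 - math.exp(-lam * g)) / (1.0 - math.exp(-lam))


def ilup_quantile(u, beta, lam, theta):
    """ILUP quantile function, Eq. (11), for 0 < u < 1."""
    g = -math.log(1.0 - u * (1.0 - math.exp(-lam))) / lam
    return lam * theta / (g ** (-1.0 / beta) - 1.0)


def ilup_sample(n, beta, lam, theta, seed=0):
    """Inversion sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # Eq. (11) is defined on the open interval (0, 1)
            out.append(ilup_quantile(u, beta, lam, theta))
    return out


if __name__ == "__main__":
    beta, lam, theta = 2.0, 1.5, 1.0  # illustrative values only
    for u in (0.25, 0.5, 0.75):
        q = ilup_quantile(u, beta, lam, theta)
        print(u, q, ilup_cdf(q, beta, lam, theta))
```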

Skewness and Kurtosis

Using the quantile-based formulas proposed by Galton20 and Moors21, the skewness (Sk) and kurtosis (Ku) can be computed numerically from the quantile function in Eq. (11) as

$$\begin{aligned} Sk=\dfrac{(K_{3}-2K_{2}+K_{1})}{(K_{3}-K_{1})} =\dfrac{(k_{(0.75)}-2k_{(0.5)}+k_{(0.25)})}{(k_{(0.75)}-k_{(0.25)})}, \end{aligned}$$

and

$$\begin{aligned} Ku=\dfrac{K_{\frac{7}{8}}-K_{\frac{5}{8}}+K_{\frac{3}{8} }-K_{\frac{1}{8}}}{K_{\frac{6}{8}}-K_{\frac{2}{8}}} =\dfrac{k_{(0.875)}-k_{(0.625)}+k_{(0.375)} -k_{(0.125)}}{k_{(0.75)}-k_{(0.25)}}, \end{aligned}$$

respectively.
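A short sketch of these two measures, built on the quantile function in Eq. (11), is given below. The parameter values are illustrative assumptions and the function names are hypothetical.

```python
import math


def ilup_quantile(u, beta, lam, theta):
    """ILUP quantile function, Eq. (11), for 0 < u < 1."""
    g = -math.log(1.0 - u * (1.0 - math.exp(-lam))) / lam
    return lam * theta / (g ** (-1.0 / beta) - 1.0)


def galton_skewness(beta, lam, theta):
    """Quartile-based skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = (ilup_quantile(u, beta, lam, theta) for u in (0.25, 0.5, 0.75))
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)


def moors_kurtosis(beta, lam, theta):
    """Octile-based kurtosis: (E7 - E5 + E3 - E1) / (E6 - E2)."""
    e = {k: ilup_quantile(k / 8.0, beta, lam, theta) for k in (1, 2, 3, 5, 6, 7)}
    return (e[7] - e[5] + e[3] - e[1]) / (e[6] - e[2])


if __name__ == "__main__":
    beta, lam, theta = 2.0, 1.5, 1.0  # illustrative values only
    print("Sk =", galton_skewness(beta, lam, theta))
    print("Ku =", moors_kurtosis(beta, lam, theta))
```

Because both measures depend only on quantiles, they remain well defined even when moments of the distribution are large or hard to compute.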

Order statistics

Order statistics are widely used in applied statistics, for example in reliability and lifetime testing. Suppose that \(X_{1},X_{2},\ldots , X_{n}\) is a random sample of size n from the ILUP distribution with CDF and PDF given in Eqs. (9 and 10), respectively, with parameters \(\lambda , \beta , \theta\), and let \(X_{1:n},X_{2:n},\ldots , X_{n:n}\) be the corresponding order statistics. Then the density of \(X_{i:n}\), for \(i=1, 2,\ldots , n\), is given by:

$$\begin{aligned} f_{i:n}\left( x\right) =\dfrac{n!}{(i-1)!(n-i)!}\sum _{j=0}^{n-i}(-1)^{j} \left( \begin{array}{c} n-i \\ j \end{array} \right) f_{ILUP}\left( x;\pmb {\nu }\right) \left[ F_{ILUP}\left( x;\pmb {\nu }\right) \right] ^{i+j-1}. \end{aligned}$$

By substituting the respective CDF and PDF in the above equation, we can get the following:

$$\begin{aligned} f_{i:n}\left( x\right)&=\dfrac{n!}{(i-1)!(n-i)!}\sum _{j=0}^{n-i}(-1)^{j} \left( \begin{array}{c} n-i \\ j \end{array} \right) \dfrac{\lambda ^{2}\beta \theta x^{-2}\left( 1+\frac{\lambda \theta }{x}\right) ^{-(\beta +1)} e^{-\lambda \left( 1+\frac{\lambda \theta }{x} \right) ^{-\beta }}}{(1-e^{-\lambda })} \nonumber \\ &\quad \times \left[ \dfrac{1-e^{-\lambda \left( 1+\frac{\lambda \theta }{x} \right) ^{-\beta }}}{1-e^{-\lambda }} \right] ^{j+i-1}. \end{aligned}$$

Collecting the powers of \(1-e^{-\lambda }\) gives

$$\begin{aligned} f_{i:n}\left( x\right)&=\dfrac{n!}{(i-1)!(n-i)!}\sum _{j=0}^{n-i}(-1)^{j} \left( \begin{array}{c} n-i \\ j \end{array}\right) \left( 1-e^{-\lambda }\right) ^{-(i+j)}\lambda ^{2}\beta \theta x^{-2}\left( 1+\frac{\lambda \theta }{x} \right) ^{-(\beta +1)} \nonumber \\&\quad \times e^{-\lambda \left( 1+\frac{\lambda \theta }{x}\right) ^{-\beta }} \left[ 1-e^{-\lambda \left( 1+\frac{\lambda \theta }{x}\right) ^{-\beta }}\right] ^{j+i-1}. \end{aligned}$$

The simplified form of the density of the ith order statistic is given by

$$\begin{aligned} f_{i:n}\left( x\right) =\sum _{j=0}^{n-i}\eta _{j}f_{ILUP}^{*}(x;\pmb {\nu }), \end{aligned}$$

where

$$\begin{aligned} \eta _{j}=\dfrac{n!}{(i-1)!(n-i)!}(-1)^{j} \left( \begin{array}{c} n-i \\ j \end{array}\right) , \end{aligned}$$
(12)

and \(f_{ILUP}^{*}(x;\pmb {\nu })=f_{ILUP}(x;\pmb {\nu })\left[ F_{ILUP}(x;\pmb {\nu })\right] ^{i+j-1}\) is the product of the ILUP density and the corresponding power of the ILUP CDF.
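The series form of the order-statistic density can be checked against the familiar closed form \(\frac{n!}{(i-1)!(n-i)!}f(x)F(x)^{i-1}[1-F(x)]^{n-i}\), since the sum over j is the binomial expansion of \([1-F(x)]^{n-i}\). The following sketch does this for assumed parameter values; the function names are hypothetical.

```python
import math


def ilup_cdf(x, beta, lam, theta):
    """ILUP CDF, Eq. (9)."""
    g = (1.0 + lam * theta / x) ** (-beta)
    return (1.0 - math.exp(-lam * g)) / (1.0 - math.exp(-lam))


def ilup_pdf(x, beta, lam, theta):
    """ILUP PDF, Eq. (10)."""
    z = 1.0 + lam * theta / x
    num = lam ** 2 * beta * theta * x ** (-2) * z ** (-(beta + 1.0))
    return num * math.exp(-lam * z ** (-beta)) / (1.0 - math.exp(-lam))


def _norming_constant(i, n):
    """n! / ((i-1)! (n-i)!)."""
    return math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))


def order_stat_pdf_series(x, i, n, beta, lam, theta):
    """Density of X_{i:n} written as the alternating binomial series."""
    f = ilup_pdf(x, beta, lam, theta)
    big_f = ilup_cdf(x, beta, lam, theta)
    total = sum(
        (-1) ** j * math.comb(n - i, j) * big_f ** (i + j - 1)
        for j in range(n - i + 1)
    )
    return _norming_constant(i, n) * f * total


def order_stat_pdf_closed(x, i, n, beta, lam, theta):
    """Density of X_{i:n} in the closed form f * F^(i-1) * (1-F)^(n-i)."""
    f = ilup_pdf(x, beta, lam, theta)
    big_f = ilup_cdf(x, beta, lam, theta)
    return _norming_constant(i, n) * f * big_f ** (i - 1) * (1.0 - big_f) ** (n - i)


if __name__ == "__main__":
    args = (1.0, 2, 5, 2.0, 1.5, 1.0)  # x, i, n, beta, lam, theta (illustrative)
    print(order_stat_pdf_series(*args), order_stat_pdf_closed(*args))
```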

The density of the order statistics of the ILUP can be obtained from Eq. (12). The rth moments and the moment generating function \(M_{x}\left( t\right)\) are given in the next section.

Moments and moment generating functions

By using Eq. (10), the rth raw moments of the newly proposed model can be obtained as follows:

$$\begin{aligned} E\left( x^{r}\right) =\int _{0}^{\infty } \dfrac{x^{r}\lambda ^{2}\beta \theta x^{-2}\left( 1+\frac{\lambda \theta }{x} \right) ^{-(\beta +1)} e^{-\lambda \left( 1+\frac{\lambda \theta }{x}\right) ^{-\beta }}}{1-e^{-\lambda }} dx, \end{aligned}$$
(13)

Let

$$\begin{aligned} y=1+\frac{\lambda \theta }{x} \Rightarrow x=\frac{\lambda \theta }{y-1}, \end{aligned}$$

then

$$\begin{aligned} \frac{dy}{dx} =-\lambda \theta x^{-2},\\\Rightarrow dy=-\lambda \theta x^{-2}dx.\end{aligned}$$

Substituting this in Eq. (13) gives

$$\begin{aligned} E\left( x^{r}\right)&= \int _{1}^{\infty }-\left( 1-e^{-\lambda }\right) ^{-1}\lambda \beta (\lambda \theta )^{r}(y-1)^{-r} y^{-(\beta +1)}e^{-\lambda y^{-\beta }} dy,\\&=\left( e^{-\lambda }-1\right) ^{-1}\lambda ^{(1+r)}\theta ^{r}\beta \int _{1}^{\infty }(y-1)^{-r} y^{-(\beta +1)}e^{-\lambda y^{-\beta }} dy, \end{aligned}$$

Now let’s consider the following expressions: For \(k \in N^{+}\),

$$\begin{aligned} (1-w)^{k}=\sum _{j=0}^{k} \left( \begin{array}{c} k \\ j \end{array}\right) (-1)^{j} w^{j}, \end{aligned}$$

and

$$\begin{aligned} e^{x}=\sum _{n=0}^{\infty }\frac{x^{n}}{n!}. \end{aligned}$$

By adapting these expressions for our case, we get

$$\begin{aligned} (1-y)^{r}=\sum _{j=0}^{r} (-1)^{r-j}\left( \begin{array}{c} r \\ j \end{array}\right) y^{j-\beta -1}, \end{aligned}$$

and

$$\begin{aligned} e^{-\lambda y^{-\beta }}=\sum _{n=0}^{\infty }\frac{(-\lambda y^{-\beta })^{n}}{n!}. \end{aligned}$$

Therefore, \(E(x^{r})=\mu _{r}^{'}\) is given by:

$$\begin{aligned} \mu _r^{'}&= \left( e^{-\lambda }-1 \right) ^{-1}\lambda ^{(1+r)}\theta ^{r}\beta \int _{1}^{\infty }\sum _{j=0}^{r}(-1)^{r-j}\left( \begin{array}{c} r \\ j \end{array}\right) y^{j-\beta -1}\sum _{n=0}^{\infty }\frac{(-\lambda y^{-\beta })^{n}}{n!} dy,\\&=\left( e^{-\lambda }-1 \right) ^{-1}\lambda ^{(1+r)}\theta ^{r}\beta \int _{1}^{\infty }\sum _{n=0}^{\infty }\sum _{j=0}^{r}\left( \begin{array}{c} r \\ j \end{array}\right) \lambda ^{(1+n)}\frac{(-1)^{n+r-j}y^{j-\beta (1+n)-1}}{n!} dy,\\&=\left( e^{-\lambda }-1 \right) ^{-1}\lambda ^{(1+r)}\theta ^{r}\beta \sum _{n=0} ^{\infty }\sum _{j=0}^{r}\left( \begin{array}{c} r \\ j \end{array}\right) \frac{\lambda ^{(1+n)}(-1)^{n+r-j}}{n!}\int _{1}^{\infty }y^{j-\beta (1+n)-1} dy,\\&= \left( e^{-\lambda }-1 \right) ^{-1}\beta \lambda ^{(1+r)}\theta ^{r}\sum _{n=0}^{\infty }\sum _{j=0}^{r}\left( \begin{array}{c} r \\ j \end{array}\right) \frac{\lambda ^{(1+n)}(-1)^{n+r-j+1}}{n![j-\beta (1+n)]}, \end{aligned}$$

since the integral is with respect to y, it integrates only \(y^{j-\beta (1+n)-1}\) and the rest remain constant.

The moment generating function of the ILUP can be obtained by using the last result of \(\mu _r^{'}\) in \(M_{x}\left( t\right)\) as:

$$\begin{aligned} M_{x}\left( t\right) =\sum _{m=0}^{\infty } \dfrac{t^{m}}{m!}\mu _m^{'}. \end{aligned}$$
(14)

Hence, the moment generating function of the ILUP can easily be obtained from Eq. (14).

Mean deviation, Bonferroni and Lorenz curves

Let X \(\sim\) ILUP(\(\pmb {\nu }\)), the mean deviation about the mean and the median are defined by

$$\begin{aligned}\Delta _{1}(X)=\int _{0}^{+\infty }\mid x-\mu \mid f_{ILUP}(x)dx, \end{aligned}$$

and

$$\begin{aligned}\Delta _{2}(X)=\int _{0}^{+\infty }\mid x-M \mid f_{ILUP}(x)dx, \end{aligned}$$

respectively, where x \(\in {{\mathbb {R}}}\), \(\mu =E(X)\) and \(M= Median(X)\) denotes the median.

These can further be expressed as:

$$\begin{aligned} \Delta _{1}(X)=2\mu F_{ILUP}(\mu )-2\int _{0}^{\mu }x f_{ILUP}(x)dx, \end{aligned}$$

and

$$\begin{aligned} \Delta _{2}(X)=\mu -2\int _{0}^{M}x f_{ILUP}(x)dx, \end{aligned}$$

respectively. Here, the first incomplete moment is given by \(m(z)=\int _{0}^{z}x f_{ILUP}(x)dx\). These measures have been applied in a wide variety of fields, such as reliability, demography, insurance, and medicine (Oluyede et al. 2018).

Moreover, let X \(\sim\) ILUP(\(\pmb {\nu }\)), the Bonferroni and Lorenz curves are defined by:

$$\begin{aligned} B(p)=\frac{1}{p\mu }\int _{0}^{q}x f_{ILUP}(x)dx, \end{aligned}$$

and

$$\begin{aligned} L(p)=\frac{1}{\mu }\int _{0}^{q}x f_{ILUP}(x)dx, \end{aligned}$$

respectively, where \(\mu =E(X)\) and \(q=F^{-1}_{ILUP}(p)\).

The next section deals with estimation and simulation study.

Estimation and simulation study

In this section, the maximum likelihood estimation method and the simulation study for different settings are discussed.

Maximum likelihood estimation

This section deals with the computation of the maximum likelihood estimators (MLEs) of the model parameters of the ILUP. Let \(x_{1},x_{2}, \ldots , x_{n}\) be observed values of a random sample drawn from the ILUP with parameters \(\lambda ,\beta\), and \(\theta\). Given the PDF of the ILUP in Eq. (10) and the likelihood function

$$\begin{aligned} L\left( \pmb {\nu }\mid x_{1}, x_{2},\ldots , x_{n}\right)&=(\lambda ^{2}\beta \theta )^{n} \prod _{i=1}^{n}x_{i}^{-2}\prod _{i=1}^{n}\left( 1+\frac{\lambda \theta }{x_{i}}\right) ^{-(\beta +1)}\\&\quad \times e^{-\lambda \sum _{i=1}^{n}\left( 1+\frac{\lambda \theta }{x_{i}} \right) ^{-\beta }}(1-e^{-\lambda })^{-n}, \end{aligned}$$

the log-likelihood function (\(\log L(x;\lambda ,\beta ,\theta )\)) is given by:

$$\begin{aligned} \log L\left( x;\lambda ,\beta ,\theta \right)&=n\log (\lambda ^{2}\beta \theta ) -2\sum _{i=1}^{n}\log (x_{i})-(1+\beta )\sum _{i=1}^{n}\log \left( 1+\frac{\lambda \theta }{x_{i}}\right) \\&\quad -\lambda \sum _{i=1}^{n}\left( 1+\frac{\lambda \theta }{x_{i}}\right) ^{-\beta }-n\log (1-e^{-\lambda }). \end{aligned}$$

The model parameters are estimated by setting the first partial derivatives of \(\log L(x;\lambda ,\beta ,\theta )\) with respect to each model parameter equal to zero.

The partial derivatives of \(\log L(x;\lambda ,\beta ,\theta )\) with respect to each parameter are given by:

$$\begin{aligned} \dfrac{\partial \log L(x;\lambda ,\beta ,\theta )}{\partial \lambda }&=\frac{2n}{\lambda }-\sum _{i=1}^{n}\left( \frac{\theta (1+\beta )}{x_{i}+\lambda \theta } \right) +\sum _{i=1}^{n}\left\{ \frac{\lambda \beta \theta }{x_{i}}\left( 1+\frac{\lambda \theta }{x_{i}}\right) ^{-(\beta +1)}-\left( 1+\frac{\lambda \theta }{x_{i}} \right) ^{-\beta }\right\} \nonumber \\&\quad -\frac{n e^{-\lambda }}{1-e^{-\lambda }}, \end{aligned}$$
$$\begin{aligned} \dfrac{\partial \log L(x;\lambda ,\beta ,\theta )}{\partial \beta }= \frac{n}{\beta }-\sum _{i=1}^{n}\log \left( 1+\frac{\lambda \theta }{x_{i}}\right) +\lambda \sum _{i=1}^{n}\left( 1+\frac{\lambda \theta }{x_{i}}\right) ^{-\beta }\log \left( 1+\frac{\lambda \theta }{x_{i}}\right) , \end{aligned}$$

and

$$\begin{aligned} \dfrac{\partial \log L(x;\lambda ,\beta ,\theta )}{\partial \theta }= \frac{n}{\theta }-(1+\beta )\sum _{i=1}^{n}\frac{\lambda }{x_{i}+\lambda \theta } +\lambda ^{2}\beta \sum _{i=1}^{n}\frac{1}{x_{i}}\left( 1+\frac{\lambda \theta }{x_{i}}\right) ^{-(\beta +1)}, \end{aligned}$$

respectively.

Afterwards, the MLEs of the parameters \(\lambda , \beta\), and \(\theta\) can be obtained by solving the non-linear equation

$$\begin{aligned} \pmb {U_{n}}=\left( \dfrac{\partial \log L(x;\lambda ,\beta ,\theta )}{\partial \lambda },\dfrac{\partial \log L(x;\lambda ,\beta ,\theta )}{\partial \beta }, \dfrac{\partial \log L(x;\lambda ,\beta ,\theta )}{\partial \theta }\right) ^{T}=0, \end{aligned}$$

using numerical methods like Newton–Raphson or Broyden’s methods.
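Before passing the score equations to a numerical solver, it is useful to confirm that the analytical score agrees with a finite-difference gradient of the log-likelihood. The sketch below does this; the score components are obtained by differentiating the log-likelihood above term by term. The data values and parameter values are hypothetical and serve only to exercise the code, and the function names are illustrative.

```python
import math


def log_likelihood(params, data):
    """ILUP log-likelihood for params = (lam, beta, theta)."""
    lam, beta, theta = params
    n = len(data)
    ll = n * math.log(lam ** 2 * beta * theta) - n * math.log(1.0 - math.exp(-lam))
    for x in data:
        z = 1.0 + lam * theta / x
        ll += -2.0 * math.log(x) - (1.0 + beta) * math.log(z) - lam * z ** (-beta)
    return ll


def score(params, data):
    """Analytical partial derivatives with respect to (lam, beta, theta)."""
    lam, beta, theta = params
    n = len(data)
    d_lam = 2.0 * n / lam - n * math.exp(-lam) / (1.0 - math.exp(-lam))
    d_beta = n / beta
    d_theta = n / theta
    for x in data:
        z = 1.0 + lam * theta / x
        d_lam += (-(1.0 + beta) * theta / (x + lam * theta)
                  + lam * beta * theta / x * z ** (-(beta + 1.0))
                  - z ** (-beta))
        d_beta += -math.log(z) + lam * z ** (-beta) * math.log(z)
        d_theta += (-(1.0 + beta) * lam / (x + lam * theta)
                    + lam ** 2 * beta / x * z ** (-(beta + 1.0)))
    return (d_lam, d_beta, d_theta)


def numerical_gradient(params, data, h=1e-6):
    """Central finite-difference gradient of the log-likelihood."""
    grad = []
    for k in range(3):
        up, down = list(params), list(params)
        up[k] += h
        down[k] -= h
        grad.append((log_likelihood(up, data) - log_likelihood(down, data)) / (2.0 * h))
    return tuple(grad)


if __name__ == "__main__":
    toy_data = [0.4, 1.1, 2.5, 3.7, 8.0]  # hypothetical observations
    point = (1.5, 2.0, 1.0)               # hypothetical (lam, beta, theta)
    print("analytical:", score(point, toy_data))
    print("numerical: ", numerical_gradient(point, toy_data))
```

In practice the log-likelihood would be handed to a general-purpose optimizer, with the analytical score supplied as the gradient.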

Simulation study

In this section, we evaluate the performance of the MLEs for fixed sample sizes n. A simulation experiment is carried out to study the behaviour of the MLEs for the ILUP model. The quantities below are computed for each sample size (n). The estimates, biases, and empirical mean square errors (MSEs) are computed using the R software program. The simulation steps are as follows:

  i. Random samples \(X_{1}, X_{2},\ldots , X_{n}\) of sizes n=25, 175,..., 800 and 1000 are generated from the ILUP distribution by the inversion method. In total, 40 random sequences of samples are drawn and used to compute the quantities below.

  ii. Four scenarios are considered for the three parameters of the proposed model to evaluate the MLEs for each parameter and sample size iteratively.

  iii. Each setting is repeated 1000 times to compute the bias and the MSE for each parameter.

  iv. The bias and MSE are computed from the mean and variance of the estimates over the repetitions as follows:

    $$\begin{aligned} & {\hat{\eta }}=\dfrac{1}{1000}\sum _{i=1}^{1000}\eta _{i},\\ & Bias({\hat{\eta }}) ={\hat{\eta }}-\eta ,\\ & Var({\hat{\eta }})=\dfrac{1}{1000}\sum _{i=1}^{1000} \left( \eta _{i}- {\hat{\eta }}\right) ^{2}, \end{aligned}$$

    and

    $$\begin{aligned} MSE({\hat{\eta }})=Var({\hat{\eta }})+(Bias({\hat{\eta }}))^{2}, \end{aligned}$$

where \(\eta _{i}\) is the estimate obtained in the ith repetition, \(\eta\) is the true parameter value, and \({\hat{\eta }}=({\hat{\lambda }},{\hat{\beta }},{\hat{\theta }})\).
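The summary quantities in step iv can be sketched as follows, using the usual definitions of empirical bias, variance, and MSE over replications. The replicate estimates in the example are hypothetical numbers chosen for illustration, not results from this study.

```python
def bias_var_mse(estimates, true_value):
    """Empirical bias, variance, and MSE of replicate estimates of one parameter."""
    n_rep = len(estimates)
    mean_est = sum(estimates) / n_rep
    bias = mean_est - true_value
    var = sum((e - mean_est) ** 2 for e in estimates) / n_rep
    mse = var + bias ** 2
    return bias, var, mse


if __name__ == "__main__":
    # Hypothetical estimates of a parameter whose true value is 2.0
    replicate_estimates = [1.9, 2.1, 2.2, 1.8, 2.3]
    print(bias_var_mse(replicate_estimates, true_value=2.0))
```

With these definitions the MSE equals the average squared distance of the replicate estimates from the true value, which gives a simple check on any implementation.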

The numerical results of the simulation study for the different settings are displayed in Table 1, and the plots of the parameter estimates, MSEs, absolute bias, and bias are presented in the “Supplementary material”. From these results, it can be observed that the estimated parameter values are quite stable and approach the true parameter values as the sample size increases. It is also clear that the error decreases as the sample size increases, as expected.

Table 1 Summary of simulation study for different simulation settings.

An application to breast cancer patients data

Breast cancer is one of the most severe diseases in the world and has become a daily public concern in both developed and developing countries. The new data on the time-to-recovery of 686 breast cancer patients were taken from the medical record cards of patients enrolled from October 2012 to April 2017 at Nigist Elleni Mohamad Memorial Referral Comprehensive Hospital (NEMMRCH), Hossana, south Ethiopia.

We illustrate the fitting capacity of the ILUP model to the data by comparing it with the competing models listed below.

The CDFs of the competing models are given below.

  i. Complementary exponentiated BurrXII Poisson (CEBP)22,

    $$\begin{aligned} F\left( x;\alpha ,\lambda ,\beta ,\theta \right) =\frac{e^{\lambda \left( 1-(1+x^{\alpha })^{-\beta } \right) ^{\theta }}-1}{e^{\lambda }-1}, x>0. \end{aligned}$$

  ii. Pareto Poisson7,

    $$\begin{aligned} F\left( x;\alpha ,\lambda \right) =\dfrac{1-e^{-\lambda \left[ 1-\frac{1}{(1+x)^{\alpha }}\right] }}{(1-e^{-\lambda })}, x>0. \end{aligned}$$

  iii. Poisson–Lomax23,

    $$\begin{aligned} F\left( x;\alpha ,\beta ,\lambda \right) =\frac{1-e^{-\lambda \left( 1+\beta x \right) ^{-\alpha }}}{1-e^{-\lambda }}, x>0. \end{aligned}$$

  iv. Marshall-Olkin Inverse Lomax (MO-IL)24,

    $$\begin{aligned} F\left( x;\alpha ,\beta ,\theta \right) =\dfrac{(1+\frac{\beta }{x}) ^{-\alpha }}{1-(1-\theta )[1-(1+\frac{\beta }{x})^{-\alpha }]}, x>0. \end{aligned}$$

  v. Exponentiated Weibull-Poisson25,

    $$\begin{aligned} F\left( x;\alpha ,\beta ,\gamma ,\theta \right) =\frac{e^{\theta \left( 1-e^{-(\beta x)^{\gamma }}\right) ^{\alpha }}-1}{e^{\theta }-1}. \end{aligned}$$

  vi. Complementary Weibull Poisson22,

    $$\begin{aligned} F\left( x;\alpha ,\beta ,\gamma ,\theta \right) =\frac{e^{\theta \left( 1-e^{-(\beta x)^{\gamma }}\right) }-1}{e^{\theta }-1}. \end{aligned}$$

  vii. Inverse Lomax Weibull26,

    $$\begin{aligned} F_{ILW}(x)=\left( 1+\frac{\beta }{e^{\lambda x^{\theta }}-1}\right) ^{-\alpha }, x>0. \end{aligned}$$

Information criteria (IC), namely (i) AIC27, (ii) CAIC28, (iii) BIC29, and (iv) HQIC30, are used to select the best model. In addition to the IC, other goodness-of-fit (g-o-f) measures such as (i) the Cramér-von Mises (CM) test statistic, (ii) the Hannan-Quinn information criterion (HQIC) test statistic, and (iii) the Kolmogorov-Smirnov (KS) test statistic are also considered. The value of \(-2\log L\) for each fitted model is calculated as well. For all these measures, the model with the smallest value is taken to be the best fit to the data.
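The information criteria can be computed from the maximized log-likelihood, the number of parameters k, and the sample size n. The sketch below uses the commonly quoted forms; in particular, CAIC is assumed here to be the small-sample corrected AIC, \(-2\log L+2kn/(n-k-1)\), which may differ from the definition used in the cited reference. The log-likelihood value in the example is hypothetical, while n=686 matches the size of the breast cancer dataset.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, CAIC (assumed corrected AIC), BIC and HQIC from a fitted model.

    loglik : maximized log-likelihood
    k      : number of estimated parameters
    n      : sample size
    """
    minus_two_ll = -2.0 * loglik
    return {
        "AIC": minus_two_ll + 2.0 * k,
        "CAIC": minus_two_ll + 2.0 * k * n / (n - k - 1.0),
        "BIC": minus_two_ll + k * math.log(n),
        "HQIC": minus_two_ll + 2.0 * k * math.log(math.log(n)),
    }


if __name__ == "__main__":
    # Hypothetical log-likelihood for a three-parameter model fitted to n = 686 cases
    for name, value in information_criteria(loglik=-100.0, k=3, n=686).items():
        print(name, round(value, 3))
```

Smaller values indicate a better trade-off between fit and complexity, so candidate models are ranked by each criterion in ascending order.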

The MLE of the parameters with their corresponding standard errors and the model adequacy measures for the fitted models are given in Tables 2 and 3, respectively.

Table 2 MLE estimates of the parameters and the corresponding standard errors (SE. in the parentheses) for the fitted models

Table 2 displays the MLEs and standard errors of the ILUP model along with those of the competing models (ILW, EWP, PoL, Weib, CEBP, ILU, MO-IL, and PaP).

Table 3 Model adequacy measures for the fitted models

Table 3 gives the model comparison results (model adequacy measures) for all models considered in this section. Based on the model comparison criteria, the newly proposed ILUP model is shown to be the best-performing model among the competing models. This suggests that the proposed model outperforms the set of similar competing models on these data and is applicable to data from the health, biomedical, and biological sectors.

The next section discusses the joint modelling of the repeatedly measured outcome and the time-to-event outcome.

Joint models for repeatedly measured and time-to-event data

In this section we deal with joint modelling of the longitudinal and survival processes applied to the practical real data set.

Introduction

Although the Cox proportional hazards (PH) model is widely used to analyze survival data, only a limited number of probability distributions for the time-to-event data can be used with it. In such cases, the accelerated failure time (AFT) model is an alternative to the Cox PH model for the analysis of time-to-event data. Under an AFT model we measure the direct effect of the covariates on the time-to-event rather than on the hazard, as is done in the Cox PH model. Both the Cox PH and AFT models describe the relationship between survival probabilities and a set of predictor variables, but neither takes into account the unmeasured variability among subjects beyond that explained by the measured covariates31. This motivates adding a longitudinal component to capture the part of the variability that the survival models leave unexplained.

Longitudinal data are repeated measurements taken on a group of individuals, ordered in time or space. Each individual is observed several times, at different time points or under different test conditions, so the resulting data share features of both time series and cross-sectional data. In many practical studies, participants are followed up repeatedly to obtain longitudinal data. Event time data record the time elapsed from a defined starting point to the occurrence of the event32.

Recently, many researchers have become interested in jointly modelling repeatedly measured and time-to-event processes. There are several approaches to modelling the two processes jointly. The most commonly used is the two-stage approach. However, several studies have shown that this method yields biased estimates33,34,35,36. The computational cost of fitting a joint model is another difficult problem. With these constraints in mind, researchers have developed R packages to reduce the burden. Rizopoulos37 introduced two R packages, JM and JMbayes, for maximum likelihood and Bayesian estimation, respectively.

Model Formulation

The joint model is formulated in the following steps (based on Rizopoulos37):

1. Suppose that \(c_{i}(t)\), the true value of the longitudinal measurement (for instance, serum bilirubin) at time t in years, is known. For simplicity, we then define the standard proportional hazards model, which can be extended further:

$$\begin{aligned} h_{i}\left( t\mid C_{i}(t) \right) =h_{0}(t) e^{\gamma ^{T}w_{i}+\alpha c_{i}(t)}, \end{aligned}$$
(15)

where \(C_{i}(t)=\{c_{i}(s),0\le s<t\}\) is the longitudinal history up to time t (for example, the serum bilirubin history), \(h_{0}(t)\) is the baseline hazard, \(w_{i}\) is a vector of baseline covariates with coefficient vector \(\gamma\), and \(\alpha\) describes the effect of the repeatedly measured outcome on the hazard of the event. Here, it quantifies the effect of the serum bilirubin measurements on the hazard of death for the patient.

2. Use the observed repeated measurements \(y_{i}(t)\) to reconstruct the covariate history \(C_{i}(t)\) for each patient. The time-dependent mixed-effects model can be written as

$$\begin{aligned} y_{i}\left( t\mid \pmb {b}_{i}\right)&= c_{i}(t)+\varepsilon _{i}(t) \nonumber \\ &=\pmb {x}^{T}_{i}(t)\pmb {\xi }+\pmb {z}^{T}_{i}(t)\pmb {b}_{i}+\varepsilon _{i}(t), \end{aligned}$$
(16)

where \(\varepsilon _{i}(t)\sim N(0,\sigma ^{2})\); \(\pmb {x}^{T}_{i}(t)\) and \(\pmb {\xi }\) are the covariate and parameter vectors for the fixed-effects part, while \(\pmb {z}^{T}_{i}(t)\) and \(\pmb {b}_{i}\) are the covariate and parameter vectors for the random-effects part.

3. Link the two processes, Eqs. (15) and (16), to construct a joint distribution for them. The joint distribution for these two processes can be written as follows

$$\begin{aligned} f( y_{i},T_{i},\delta _{i})=\int f( y_{i}\mid \pmb {b}_{i}){\Bigg \lbrace h_{i}( T_{i}\mid \pmb {b}_{i})^{\delta _{i}} S( T_{i}\mid \pmb {b}_{i}) \Bigg \rbrace } f(\pmb {b}_{i})d\pmb {b}_{i}, \end{aligned}$$
(17)

where \(y_{i}\) is the repeatedly measured (longitudinal) outcome, \(T_{i}\) is the observed survival time, \(\delta _{i}\) is the event indicator, \(\pmb {b}_{i}\) is the vector of random effects that accounts for the interdependence of the two processes (it is shared by both), f(.) is a density function, and S(.) is the survival function of the time-to-event part37.

The underlying assumption is full conditional independence of the outcomes: given the random effects \(\pmb {b}_{i}\), the longitudinal outcome is independent of the time-to-event outcome (Kalema, 2014).
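To make the three steps concrete, the following Python sketch evaluates the hazard of Eq. (15), the corresponding survival function, and the integral of Eq. (17) for one subject. It is a simplified illustration under stated assumptions, not the algorithm used by the JM package: the baseline hazard `h0` is constant, there are no baseline covariates, the random effect is a single random intercept `b`, and the integral is approximated with the trapezoidal rule. All function names and numerical values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def true_value(t, b, xi0, xi1):
    """c_i(t) = xi0 + xi1 * t + b: fixed intercept and slope plus a random intercept."""
    return xi0 + xi1 * t + b

def hazard(t, b, h0, alpha, xi0, xi1):
    """Eq. (15) with a constant baseline hazard h0 and no baseline covariates."""
    return h0 * math.exp(alpha * true_value(t, b, xi0, xi1))

def survival(t, b, h0, alpha, xi0, xi1):
    """S(t | b) = exp(-integral_0^t h(s | b) ds), in closed form for this setup."""
    rate = alpha * xi1
    scale = h0 * math.exp(alpha * (xi0 + b))
    if abs(rate) < 1e-12:
        cum_hazard = scale * t
    else:
        cum_hazard = scale * math.expm1(rate * t) / rate
    return math.exp(-cum_hazard)

def joint_density(times, ys, T, delta, h0, alpha, xi0, xi1, sigma, sigma_b, n_grid=2001):
    """Trapezoidal approximation of the integral in Eq. (17) over a scalar b."""
    lo, hi = -8.0 * sigma_b, 8.0 * sigma_b
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * step
        # Longitudinal part: independent normal errors around c_i(t), given b
        f_y = 1.0
        for t, y in zip(times, ys):
            f_y *= normal_pdf(y, true_value(t, b, xi0, xi1), sigma)
        # Survival part: hazard^delta times survival, given b
        f_t = hazard(T, b, h0, alpha, xi0, xi1) ** delta * survival(T, b, h0, alpha, xi0, xi1)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * f_y * f_t * normal_pdf(b, 0.0, sigma_b)
    return total * step

# Hypothetical subject: three measurements, event observed at T = 3 years
value = joint_density(times=[0.0, 1.0, 2.0], ys=[1.1, 1.4, 1.6], T=3.0, delta=1,
                      h0=0.1, alpha=0.5, xi0=1.0, xi1=0.2, sigma=0.3, sigma_b=0.5)
```

When `alpha` is set to zero the survival part no longer depends on `b`, so the integral factorises into the marginal density of the longitudinal data times the survival contribution. This is a convenient check on the numerical integration.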

Description of the PBC2 data

This section uses primary biliary cirrhosis data on 312 subjects. As stated by Chen38, the data consist of longitudinal covariates and a time-to-event outcome, survival time, recorded in a randomized trial conducted by the Mayo Clinic39. The data set is available in the R40 package JMbayes41. The original purpose of the study was to assess the effect of the drug D-penicillamine (DPCA) on primary biliary cirrhosis by comparing the survival times of treated and untreated subjects.

The data include 3 time-fixed and 13 time-varying covariates. The time-fixed covariates are age in years at the beginning of the trial (numeric), sex (binary), and the treatment indicator (binary). The time-varying covariates were recorded at every visit and include serum bilirubin in milligram per deciliter (numeric), serum cholesterol in milligram per deciliter (numeric), albumin in milligram per deciliter (numeric), alkaline phosphatase in units per liter (integer), serum glutamic-oxaloacetic transaminase (SGOT) in units per milliliter (numeric), platelets per cubic milliliter per thousand (integer), prothrombin time in seconds (numeric), histologic stage of disease (4-level), the presence of ascites (binary), hepatomegaly (binary), and spiders (binary), edema (3-level), and the time of each visit in years (numeric). For the survival process, the response is the time in years between registration and the earlier of death, transplantation, or censoring (numeric), together with the status indicator (3-level), whereas for the longitudinal process the response is serum bilirubin in milligram per deciliter.

General descriptive results for the data and its exploration

The median survival time in the primary biliary cirrhosis data is 13.9 years (95% CI [11.2, 13.9]). The mean serum bilirubin is 4.21 milligram per deciliter. By sex, the mean serum bilirubin is 4.77 milligram per deciliter for males and 3.52 milligram per deciliter for females. By histologic stage of the disease, the means for levels 1–4 are 1.02, 2.01, 2.98, and 4.82 milligram per deciliter, respectively. The mean and median ages of the patients are 48.87 and 49.26 years.

Figure 3

The individual profile plot of serum bilirubin by age for the PBC2 data set.

Fig. 3 depicts the individual profile plot of serum bilirubin for the PBC2 data set. Serum bilirubin tends to rise and then fall with the age of the patient. However, this trend is not stable, and it is more pronounced in females than in males, as shown in Fig. 4.

Figure 4

The individual variance plot of serum bilirubin by sex for the PBC2 data set.

Fig. 4 shows the individual variation of serum bilirubin by sex for the PBC2 data set. The variation for females (red circles) is higher than that for males (blue triangles); that is, serum bilirubin is more erratic for females. For both groups, the variation is smaller at the beginning than at the end. Thus, both the within-subject and the between-subject variation are high.

Figure 5

The mean plot of serum bilirubin for the PBC2 data set.

The mean plot of serum bilirubin is shown in Fig. 5. The figure shows a clear linear pattern in serum bilirubin. This suggests that a linear fixed effect, rather than a quadratic or other form, should be included in the model.

Figure 6

Kaplan–Meier estimate of time-to-death of biliary cirrhosis disease patients by sex for the PBC2 data set.

Fig. 6 compares the Kaplan-Meier estimates of time-to-death by sex for biliary cirrhosis patients in the PBC2 data set. The vertical axis shows the survival function and the horizontal axis shows baseline age in years. The survival curve for female patients (red line) lies somewhat above the curve for male patients, so female patients tend to survive longer than male patients. Furthermore, the curves are not parallel to each other, which suggests that the proportional hazards assumption is violated. Hence, the Weibull parametric AFT model is used in the further analysis.
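For readers unfamiliar with the estimator behind such survival curves, the following Python sketch computes the Kaplan-Meier product-limit estimate from a list of follow-up times and event indicators. It is a bare-bones illustration with made-up data, not the PBC2 analysis; in practice the function would be applied separately to the male and female subgroups to obtain the two curves.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 for an observed event and 0 for a censored time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t
        at_risk -= removed
    return curve

# Hypothetical follow-up times (years) and event indicators
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set up to their censoring time but do not produce a step in the curve, which is why the last (censored) time point adds no entry.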

Separate analysis of longitudinal process

In this sub-section, the longitudinal model (a linear mixed model) is built by selecting the more appropriate estimation method for these data.

Table 4 Model comparison based on estimation methods for separate longitudinal process.

Table 4 compares the estimation methods for the separate longitudinal process, namely maximum likelihood (ML) and restricted maximum likelihood (REML). Based on the model comparison techniques, the ML method gives the better estimates and is chosen for further analysis.

Table 5 Summary linear mixed model result based on the ML estimation method for the separate longitudinal process.

Table 5 summarizes the linear mixed model for the separate longitudinal process. It displays the longitudinal relationship between the serum bilirubin measurements (in milligram per deciliter) and the associated covariates. Of the five covariates entered in the linear model, year (the time of each visit in years) and albumin in milligram per deciliter, together with the intercept, have a significant longitudinal effect on the serum bilirubin measurements. The remaining three covariates, including sex, have no significant longitudinal effect on the serum bilirubin measurements.

Separate analysis of survival process

In this sub-section, a separate AFT survival model is built by selecting the best model among the five AFT models (Exponential, Weibull, Loglogistic, Loggaussian, Lognormal) based on the model comparison techniques.

Table 6 Comparison of AFT models.

Table 6 shows that the Weibull AFT model has the smallest values of the comparison criteria among the AFT models considered. As a result, the Weibull AFT model is selected and the following analysis is based on it.

Table 7 Summary result for separate Weibull accelerated failure time model.

The fitted model for the survival process is given by:

$$\begin{aligned} \lambda (t \mid {\textbf {X}})= \lambda _{0}(t) e^{\sum _{i=0}^{k}\xi _{i}x_{i}}, \end{aligned}$$

where \(i=0, 1,\dots , k\) indexes the covariates in the model. Hence,

$$\begin{aligned} \lambda (t \mid {\textbf {X}})= \lambda _{0}(t) e^{-0.717\,\text {sexfemale}-1.344\,\text {albumin}+0.570\,\text {histologic}}. \end{aligned}$$

From Table 7 and the fitted model, the covariates sex (male), histologic stage of disease, and albumin in milligram per deciliter contribute significantly to the survival response, time to death in years, while the covariate D-penicillamine, as indicated in other results, has no significant effect on the response. The 95% CI is reported for the exponentiated coefficients (exp(coef) or exp(\(\xi _{i}\))), not for the coefficients themselves.
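Because the intervals are reported on the exp(coef) scale, it may help to see how the coefficients of the fitted model translate to that scale. The short Python sketch below exponentiates the three coefficients quoted in the fitted equation above. The helper `exp_interval` shows how a Wald-type 95% interval would be mapped to the same scale; it is an assumption about how such intervals are typically formed, and its standard-error argument would have to be taken from Table 7, since no standard errors are quoted in the text.

```python
import math

# Coefficients copied from the fitted model in the text above
coefficients = {"sexfemale": -0.717, "albumin": -1.344, "histologic": 0.570}

# exp(coef): multiplicative change in lambda(t | X) per one-unit increase
multipliers = {name: math.exp(value) for name, value in coefficients.items()}

def exp_interval(estimate, std_error, z=1.96):
    """Wald-type 95% interval on the exp(coef) scale.

    std_error is a placeholder argument: supply the value reported in Table 7.
    """
    return math.exp(estimate - z * std_error), math.exp(estimate + z * std_error)

for name, m in multipliers.items():
    print(f"{name}: exp(coef) = {m:.3f}")
```

Under the hazard form written above, a value of exp(coef) below 1 corresponds to a lower value of \(\lambda (t \mid {\textbf {X}})\) for a one-unit increase in the covariate, and a value above 1 to a higher one.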

Joint model analysis

Table 8 summarizes the joint model of the repeatedly measured and time-to-event processes.

Table 8 Summary result for joint model.

Table 8 displays the summary of the joint model of the repeatedly measured and time-to-event processes for the PBC2 data set. The result is obtained from the R function jointModel(). In this joint model, the intercept has a positive association with the longitudinal and survival responses. The covariates sex, drug D-penicillamine, and histologic stage of disease have no significant effect on the longitudinal process in either the separate or the joint analysis. In contrast, the covariates albumin − 0.168 [95% CI − 0.226, − 0.109] and year 0.195 [95% CI 0.161, 0.230], like the intercept, have a significant effect on the longitudinal response in both the separate and joint analyses. The covariates sex 0.550 [95% CI 0.185, 0.915] (unlike in the longitudinal process), histologic stage of disease − 0.419 [95% CI − 0.614, 0.225] (unlike in the longitudinal process), and albumin − 0.168 [95% CI 0.371, 1.052], like the intercept, have a significant effect on the survival response in both the separate and joint analyses. In addition, the association parameter assoct − 0.762 [95% CI − 0.916, − 0.608] and log(shape) have a significant effect on the survival response in the joint analysis. The covariate drug D-penicillamine has no significant effect on the longitudinal or survival responses in any of the three analyses.

Concluding remarks

In this paper we have studied the joint modelling of the repeatedly measured and time-to-event processes on the PBC2 dataset. Joint modelling is one of several special features of data modelling, alongside overdispersion, correlation, and zero-inflation. The data are analysed in three ways: two separate analyses and one joint analysis. The covariate drug D-penicillamine (DPCA) has no significant effect on the respective responses in any of the analyses.

Furthermore, a flexible new model called ILUP, in the compound class of Poisson distributions, is studied in this paper. Some basic statistical properties are obtained and the model is compared with six potential competitor models. Based on all the model comparison techniques, the ILUP model outperforms all six models. The practical capability of the model is examined on a real dataset, and a simulation study is also conducted. The simulation study shows that the parameter estimates approach the true parameter values as the sample size increases. The newly proposed model is well suited to analysing data from the health, biomedical, biological, and related sectors.

A multivariate extension of joint models to more than one longitudinal and survival process in the Bayesian setting, accompanied by a sensitivity analysis, is a future direction for scholars in this area (Supplementary material S1).