Background

Many clinical trials have a primary outcome which is measured on a categorical scale. In these situations, the outcome categories are often ordered, either by a direct relationship to either a continuous measurement or a clinical score through pre-defined cut-points, or by a natural interpretation by the physician of a health condition or body reaction.

An overview and recent developments in the analysis of ordered categorical data have been presented in [14]. More specific developments for analyzing ordinal data in the context of meta-analysis of clinical trials have been proposed initially for summary statistics [5, 6], secondarily extended to meta-analysis with individual patient's data [7, 8].

The main objective of this paper was to adapt and implement the above-proposed methods in the specific context of malaria. A more specific objective was to study the effect of both different treatments and a set of covariates on the primary outcome.

In antimalarial trials, the WHO (World Health Organization) established a clinical study protocol [9, 10] to assess the efficacy of antimalarial drugs for uncomplicated malaria, in which the primary outcome is a four-category outcome, i) ACPR (adequate clinical and parasitological response); ii) LPF (late parasitological failure); iii) LCF (late clinical failure) and; iv) ETF (early treatment failure). Patient status reflects the ordinal character of the WHO criteria. Danger signs appearing on day 1, 2 or 3 with a persistent parasitaemia correspond to an ETF. The presence of fever and parasitaemia between day 4 and day 14 without an ETF is considered as LCF, whereas the presence of parasitaemia on day 14 without ETF and LCF is a LPF. ACPR corresponds to the absence of parasite regardless the body temperature on day 14 without being classified as either ETF, or LCF or a LPF previously. This suggests that there is an explicit order between the categories. A multi-centre randomized controlled trial of three antimalarial drugs, amodiaquine (AQ), sulfadoxine-pyrimethamine (SP) and their combination (AQ + SP), which was conducted in 2003 by OCEAC-IRD in patients with uncomplicated P. falciparum malaria from 3 health centers in Cameroon, according to the WHO protocol [9, 10] will be used to illustrate the statistical approach.

Briefly, the 3 centers were selected as they were known to be associated with different levels of transmission. Affected children, 0-5 years old, were the target population, according to the WHO recommendations for assessing antimalarial drug efficacy. The primary outcome of efficacy was the four category-variable: C1 = ACPR, C2 = LPF, C3 = LCF, and C4 = ETF, the first category being the best category and the last category being the worst, in terms of health condition. Table 1 describes the results for each arm of anti-malarial drug, in the three centers. The majority of patient outcomes were assigned in the first category, with very few treatment outcomes in the other categories, corresponding to various degrees of therapeutic failures. The protocol was approved by both the Ministry of Public Health and the Ethical National Committee. The person legally in charge of the children gave an informed consent before inclusion in the study. Study investigators were OCEAC field workers. A global analysis of the data, based on the rate of ACPR can be found in [11].

Table 1 Counts of treatment response by category stratified in all centers and treatments.

Methods

Statistical methods

The proportional odds model

The statistical approach is based on the proportional odds model [12]. Consider the K = 3 centers. Each patient had a response into one of the m = 4 categories, (C 1, .., C m ), with n i patients in center i. Let be the total number of patients overall.

Denote Z the response and π ijc , the probability that the jth subject in study i had a response in category c; π ijc = Pr(Z ij = c). Let also Q ijc = Pr(Z ij c) be the cumulative probability of a response in category c or better, so that Q ijc = π ij 1 + ... + π ijc and Q ijm = 1; c = 1, ..., m-1. Hence c = m corresponding to the worst outcome (here an ETF) is the baseline level. The proportional odds model is defined by

(1)

where α c represents the cth intercept and η ij = γ 1 x 1 ij + ...+ γ p x pij , is a linear combination of explanatory variables. The model assumes 1) an increasing order for the α c : α 1 < α 2 < ... < α m-1; 2) α 0 = -∞ and α m = ∞.

The assumption of proportional odds signifies that the ratio of the odds of the event Z ij c for any pair of sets of explanatory variables is independent of the category c. The cumulative logit model (1) results from the existence of a latent random variable with a logistic distribution, the (α c , c = 0, ..., m) corresponding to the cut-off thresholds for categorizing the latent variable into m classes. Alternatives to the proportional odds model can be developed based on different assumptions regarding the latent variable, like a normal- or an extreme value distribution associated with the probit or complementary log- log links, respectively.

Fixed effect model

The basic model extends the proportional odds model (PO) proposed in [7] to a situation with L treatments (L > 2). For the sake of simplicity, the presentation will be restricted to the situation of L = 3 treatments corresponding to the real data set but the extension to more than 3 treatments is straightforward. Assuming that one particular treatment (here, AQ) was selected as the reference treatment, the treatment covariate is modeled using L-1 = 2 dummy variables T 1 and T 2, where T l (l = 1, L-1) yields 1 if subject received the treatment l, and 0 otherwise (here, l = 1 corresponds to SP, l = 2 corresponds to AQ + SP)

(2)

where {j = 1, ..., n i ,} corresponds to the individuals within each arm, {i = 1, .., K}, corresponds to the centers and {c = 1, ..., m-1} to the categories. The parameter γ 1 represents the log-odds ratio of efficacy associated to treatment 1 versus the reference treatment, and γ 2 the log-odds ratio of efficacy associated to treatment 2 versus the reference treatment.

The center effects are modeled by γ 0i where γ 01 = 0 for identifiability constraint.

Testing the proportional odds assumptions

It is worth noting that model (2) assumes that, within each center and for each outcome category c, (c = 1, .., m-1), there is a common log-odds ratio, γ 1 for treatment l versus the reference treatment. Similarly, model (2) assumes also that, within each treatment group, and for each outcome category c there is a common log-odds ratio, γ 0i - γ 0i , between two centers i and i', independent of the outcome category.

Additional models were introduced to test the PO assumption. Model (3) allows for testing a global PO between treatments, without a center effect:

(3)

However, each treatment can have a different effect according to the outcome category.

Assuming a global PO for T 1, the model (4) allows for testing PO between T 2 and the reference treatment

(4)

where δ hc = 1 if h = c and 0, otherwise.

Similarly, the model (5) assumes a global PO for T 2, and allows for testing PO between T1 and the reference treatment (here AQ)

(5)
Mixed effects model

Discarding a center effect, and still assuming proportional odds between treatments, 2 random effects were added to the treatment variables.

where: β 1i = γ 1 + δ 1i and β 2i = γ 2 + δ 2i . Both δ 1i and δ 2i are normally distributed random effects with expected value 0 and variance , respectively.

By grouping separately the fixed and the random-effects, the model can be rewritten as follows:

(6)

where , where G', the transposed matrix of G, needs to be estimated from the data.

Estimation methods

The parameters of the fixed effects models were estimated using three different methods, i.e. maximum likelihood (ML), iterative generalized least squares (IGLS) and generalized estimating equations (GEE) [7]. In the IGLS and GEE methods, the ordinal response was considered as a set of correlated binary responses. Thus the proportional odds model is viewed as a member of the multivariate generalized linear models [13].

All these methods were implemented under the free Statistical Software R http://cran.r-project.org/. The various codes for the ML, GEE and IGLS methods are in the additional file 1. Models (3), (4) and (5) were implemented for the test of the proportional odds assumption between centers and treatments. The parameters of the random effects model were estimated using a Bayesian approach for hierarchical models. In the Bayesian paradigm, prior distributions for all unknown parameters θ need to be specified. The intercept α c were given non-informative N(0,104) priors. Parameters β 1i and β 2i were assumed to be from N(γ 1, σ 1 2) and N(γ 2, σ 2 2) respectively. We gave γ 1, γ 2 non-informative N(0,10-6). The difficulty in the Bayesian paradigm is the choice of the a priori distribution for the inverse of the variances σ 1 2 and σ 2 2. Several forms of the a priori are available. We chose an a priori of the form Gamma(0.5η, 0.5ηζ 2), where η and ζ 2 are either arbitrary chosen or calibrated from the data. Here we chose η = 2. The initial estimated values of ζ 2 (0,007, 0,001) were obtained from our data, by adjusting a generalized linear mixed model. So 1/σ 1 2 and 1/σ 2 2 were given Gamma(1,0.007) and Gamma(1,0.001), respectively. All the Bayesian analyses were performed using 3 chains of 50000 samples, out of which the first 25000 were removed to allow for convergence. Thus, each posterior statistic was based on 75000 samples. Finally the 95% credibility interval was obtained for each statistic.

Simulations

To assess the performance of the fixed effect methods, we carried out analyses on simulated data based on model (6). The aim was to compare the methods when the treatment covariate was taken into consideration and random effects were introduced to account for center variability. We generated a data set of 3 centers, each containing 100 patients, comparing 3 treatments A, B, C, with equal size. We supposed that the treatment effects were the same in the 3 centers, and were set to be γ 1 = -1 for B comparing to A and, γ 2 = 1 for C comparing to A, respectively. Initial values for α c were α 01 = -2, α 02 = -1, α 03 = 0. The initial value of the matrix G' was a diagonal matrix with diagonal elements σ 1 = 0.01 and σ 1 = 0.02, corresponding to the between center deviation.

The choice of the treatment initial values was guided by preliminary results from real data. The choice for the cut-point values was derived from [14].

Cumulative probabilities Q ijc were then obtained using model (6). For a subject j in study i, with vector of cumulative probabilities (0, Q ij1, Q ij2, Q ij3, 1), the simulated outcome was generated using the quartile range category, to which an uniformly (0,1) distributed random value belonged. To check that the conclusions were not sensitive to the choice of the starting values, we repeated the simulations, for each method, with different initial values. An example of the simulated data set can be found in additional file 2.

For each method, bias and confidence levels were estimated through N 1000 = replications. For each estimation method, the treatment effect was assessed using the calculated coverage probability and the type I error.

In the Bayesian analysis, model (6) was implemented using the Bayesian Software WinBUGS. A program is provided in the additional file 3, showing how model (6) can be written in BUGS language.

Results

Simulated data set

Results are summarized in Table 2. The ML-, IGLS- and GEE- methods yielded similar results. For the detection bias, the ML estimate appeared less biased than the two other methods. Considering their coverage probabilities (CP = 1- β), all the three approaches appeared to be able to detect a significant treatment effect with a given type I error α.

Table 2 Simulation results.

Analysis of the Multi-centre trial

Results based on treatment and center effect only

Among the n = 538 patients from the three centers, 19 patients were lost to follow-up by day 14 and hence could not contribute to the analyses.

Results from the randomized trial are displayed in Table 3 with a general view of the fixed and the Bayesian approach. All the methods showed a significant decrease of the SP effect as compared with AQ. The SP treatment appeared less efficacious than AQ, by all the three fixed methods (ML: OR = e -1.93= 0.14; 95% CI 0.04-0.50; IGLS: OR = e -1.77= 0.17; 95% CI 0.04-0.58; GEE: OR = e -1.80= 0.16; 95% CI 0.04-0.57). The combination AQ-SP was not significantly different by the three approaches. The additional file 4 shows how practically these fixed estimates can be obtained with an example of the different output given in additional file 5.

Table 3 summary of parameter estimates using both the fixed effect models and the bayesian approach in the anti-malarial trial.

The between center variability was

= 0.023 (95% credibility interval 0.001-0.107) for the SP/AQ treatment, and

= 0.020 (95% credibility interval 0.0001-0.024) for the AS+SP/AQ treatment. The assumption of proportional odds between treatments and centers was tested using the ML method by comparing the change in deviance of the various models (Table 4) to the chi-squared χ 1-α 2 (2 df) which equals 5.99. The results indicate that the assumption was satisfactory.

Table 4 Test of the proportional odds between centers and treatments on the anti-malarial trial data set.

Results after covariates adjustment

Covariates known at the time of inclusion were the parasitaemia, the type of treatment, the gender, the patient's age, the bodyweight (kg), a previous treatment before the study enrollment, the haematocrit and the baseline body temperature. The patient's bodyweight was log-transformed; the initial parasitaemia counts were transformed to normality using a Box-Cox transformation (estimated power = 0.1028, p-value = 0.0056). The baseline haematocrit was categorized into two classes, below (A) and above (B) 25%. The patient's age was also categorized into 3 categories: i) less than 1 year old (A); ii) between 2 and 4 years old (B); iii) more than 5 years old (C).

First, a univariate analysis including each covariate of interest, in addition to the treatment covariates and the centre covariate was performed. Then only covariates with a p-value under 10% were selected for a multivariate analysis. Only the baseline temperature reached a statistical significance (Table 5). Multivariate analysis did not change the magnitude of the treatment effects.

Table 5 Effect of individual covariates on the ordinal outcome.

As all the estimated intercept values were strictly positive, the assumption of a logistic distribution for the latent variable could be replaced with an extreme value distribution, using a model with a complementary log-log (cloglog) link function [2]. In Table 6, the corresponding results show that the intercept value-estimates were reduced as compared to the corresponding ones using the logistic distribution. However, the treatment parameter estimates did not change although the interpretation is not strictly the same. Indeed, as pointed out by Agresti [2], trying a complementary log-log link function is equivalent to fitting a proportional hazards model. Hence regarding the comparison of the SP treatment to the reference treatment AQ, the parameter e -0.502 = 0.605 (95% CI 0.44-0.82) is interpreted as a hazards ratio which is assumed constant regardless the response category. It suggests that the proportion of patients having positive response was more decreased with SP than with AQ. Between AQSP and AQ, the hazards ratio was 1.40 (95% CI 0.88-2.18), suggesting that there was no difference in efficacy.

Table 6 Model fitting with the complementary log-log link function and results comparisons.

Discussion

The present work applied methods, initially presented in the context of a meta-analysis comparing 2 treatment effects with a categorical outcome, to the analysis of multi-center-antimalarial trials, comparing 3 treatments, with a 4-categorical primary outcome, using individual patient data. All the methods were implemented using free available softwares. Their performances were assessed using simulated data sets. Both a frequentist and Bayesian approaches were implemented. In the simulation settings, we considered a general case of a mixed effects model. The ML method was faster than the IGLS and GEE methods. All the methods produced minimal biases. Another R function can be used like lrm (Hmisc library using the library Design) which is more restricted at present in the choice of the link functions.

Our results from the multi-centre trial showed a significant decrease in efficacy in the sulfadoxine-pyrimethamine-arm as compared with the amodiaquine-arm. This means that there was less ACPR and more failures with SP. No significant difference was observed between amodiaquine and the combination AQ + SP. Similar results hold for both the frequentist and Bayesian approaches. The proportional odds assumption between treatments was not rejected. The proposed model can be a useful way when dealing with a small number of centers with more than 2 treatments.

Regarding the dependence between the outcome and the explanatory variables, a recent study [15] reported patient's age as a significant factor associated with antimalarial treatment failure, more specifically with artemisinin derivatives and quinine. Indeed, the study showed that children less than 5 years old had a significant risk of treatment failure as compared with children more than 5 years. In our analysis, no significant age effect was found, but our data concerned mostly children less than 5 years. Covariates, like haematocrit, gender, auto-medication before study enrollment, were not significant. In contrast, the baseline temperature was significantly associated with the outcome, showing that patients with a low temperature at inclusion were more likely to have a good treatment response. For the SP treatment, the decrease could be related to a gene mutation, responsible for a resistance towards SP [1618]. The SP resistance was also confirmed in a recent study in South Africa [19]. The emergence of resistance led to the use of Artemisinin-based Combination Therapies (ACTs) which play an important role in the reduction of malaria transmission [20]. Thus, ACTs are widely used as first-line treatment of malaria in Africa [2128].

The present work presented some statistical methods to handle the WHO criteria globally. They appear adapted for assessing antimalarial treatment in clinical trials, or in other studies with an ordinal outcome. It contrasts with previous methods [15], which reduced the outcome into a binary variable, for instance ACPR versus non ACPR. Only 3.5% were missing data. By eliminating drop-out or excluded patients in the final analysis, we assumed that they were missing completely at random (MCAR). Incorporating these into the analysis would require an imputation of the missing category. For such imputation, wide-range methods have been studied [29], but remain a topic to study in the case of malaria.

Other study designs are presently available in which patients have repeated standardized visits during their follow-up. Thus repeated categorical responses are available for each subject. All methods used in the case of a single categorical response, can however, be extended to repeated categorical responses [30].

Conclusion

The approach of modeling the WHO criterion as an ordinal criterion is innovative in the sense that, it gives a gain of information and precision on the estimate of the treatment effect using a global analysis of the four-category outcome instead of either comparing the rate of ACPR only, without being able to differentiate treatments when they present an identical percentages in the ACPR category but, different repartitions in the remaining three other categories, or repeating a similar binary analysis for each category, then exposing to the increase of the type-1 error. This analysis could help in avoiding a conclusion on an equal treatment effect whereas there is not or, vice-versa. In addition, this approach is able to take into account individual's covariates to improve the prediction of efficacy.