Key messages regarding feasibility

  • 1) What uncertainties existed regarding the feasibility?

    Not applicable

  • 2) What are the key feasibility findings?

    Not applicable

  • 3) What are the implications of the feasibility findings for the design of the main study?

    Not applicable

Introduction

Preliminary studies are commonly used to inform the design of clinical trials. In the past decade, there has been increasing emphasis on conducting preliminary trials prior to a definitive large-scale trial in order to increase efficiency and reduce research waste [1]. These studies are often called “pilot studies” or “feasibility studies” and have been shown to reduce research waste such as over-spending [2]. Although the terms “pilot study” and “feasibility study” are often used interchangeably, there are some key differences. A study or trial is labeled a “pilot” when it is a small-scale study conducted prior to the large-scale study, mimicking the design of the main study, and designed to test and refine a protocol (e.g., ensure recruitment procedures are efficient, and provide training and experience in running randomization, treatments, and follow-up assessments). In contrast, feasibility studies are designed to evaluate whether a larger-scale study could be performed and to estimate important parameters required to design the main study (e.g., willingness of participants to be randomized, number of people eligible, response rates, and follow-up rates) [3, 4].

Pilot and feasibility studies may use progression criteria to determine if a larger study is feasible. Progression criteria are one or more feasibility outcomes that must meet a pre-defined threshold for feasibility to be declared. They inform the decision to move forward to a larger trial, modify the larger trial, or abandon it altogether [5]. For example, investigators could determine that a larger trial is feasible if they are able to recruit 75% of the people approached. Progression criteria remain underused in pilot studies [6, 7], despite the requirement to declare them in the CONSORT extension for pilot randomized controlled trials (RCTs) [3]. This creates challenges in interpreting pilot studies and in making decisions about a larger trial.

Pilot studies are particularly useful in HIV research because of the numerous challenges in recruiting and retaining participants, who may be experiencing social stigma and discrimination. Moreover, people living with HIV (PLWH) may belong to other minority groups associated with discrimination (e.g., Black people, people who inject drugs [PWID], and men who have sex with men [MSM]) [2]. Given the over-representation of intersectional discrimination in HIV studies, pilot studies can be invaluable for identifying potential recruitment challenges in these specific population groups. In a sample of 248 pilot studies in HIV research, the authors noted that pilot studies are increasingly being used [2]. However, several design, analysis, and reporting issues exist, including limited use of progression criteria and missing justifications for trial sample sizes [2].

Researchers may face challenges in defining feasibility outcomes and developing progression criteria because of the lack of empirical data on credible and reasonable thresholds for frequently used outcomes such as recruitment, compliance, and dropout. There is limited guidance on how to set these thresholds: in one methodological study, only 28% of publications provided a rationale for their progression criteria [7]. Existing guidance cites the use of prevalence or incidence rates and pre-existing observational data for recruitment rates [8]. However, observational data may not always be available and, even when they are, they may not reflect estimates that would hold in a randomized trial. A potential solution is to summarize the estimates from completed full-scale trials.

The purpose of this study is to inform the design of HIV clinical trials by providing credible, evidence-based estimates for determining progression criteria thresholds when planning feasibility outcomes in HIV pilot randomized trials.

Methods

Data collection

We conducted a methodological study of HIV clinical trials indexed in the past 5 years (2017–2021) in the PubMed database using the following search strategy (LM):

  • ((((randomized controlled trial [pt]) OR (controlled clinical trial [pt]) OR (randomized [tiab]) OR (placebo [tiab]) OR (clinical trials as topic [mesh: noexp]) OR (randomly [tiab]) OR (trial [ti])) NOT (animals [mh] NOT humans [mh])) AND ((HIV) OR (human-immunodeficiency-virus) OR (human immunodeficiency virus)) NOT ((pilot [ti]) OR (feasibility [ti]) OR (protocol [ti])))

The results of our search were collected in the EndNote reference manager. Reviewers working independently screened all the titles and abstracts for eligibility (LC, EA, MSU, ACJE, MCG, LS, TAJ, NR). To be eligible, a trial had to include only people living with HIV, individually randomized to any type of intervention. We excluded pilot or feasibility RCTs, trials with cluster randomization, trials in which participants were enrolled as couples (dyads), and trials published only as abstracts.

Data extraction

Full text articles were retrieved for potentially eligible articles and screened in duplicate. Data were extracted by one reviewer and verified by a second independent reviewer for quality control (LC, EA, MSU, ACJE, MCG, LS, TAJ, NR). The following data were extracted: basic bibliometric information (author name, author contact information, year of publication, and journal), country of origin, country’s income level (based on the World Bank Classification as high, upper middle, lower middle and low) [9], World Health Organization (WHO) region (Africa, Americas, Eastern Mediterranean, Europe, South East Asia, Western Pacific) [10], source of funding (industry, non-industry), trial duration in months, trial design (crossover, multi arm, factorial), follow-up duration, number of trial sites, use of medication (pharmacological versus non pharmacological), intervention type (Educational, Mobile health, Counselling, Electronic, Change in healthcare delivery, Incentives, Peer support, Psychotherapy, Outreach), population type known to be at higher risk of HIV infection and morbidity (Black people, MSM, women, youth, PWID, people in prisons, transgender people and children) [11, 12], comorbidities (tuberculosis, mental health, substance use, cancer). The following metrics were extracted from the CONSORT flow diagram, tables, or manuscript text: number of participants who were assessed for eligibility, recruited, randomized, who did not receive the intervention as planned, lost to follow-up, who discontinued intervention, and the number analyzed. Data extraction was conducted using DistillerSR (Evidence Partners, Ottawa, Canada).

Data analysis

We computed the following metrics as percentages:

  • Recruitment: number enrolled divided by the number approached or assessed

  • Randomization: number randomized divided by the number enrolled

  • Non-compliance: number who did not receive the intervention as planned divided by the number randomized

  • Lost to follow-up: number lost to follow-up divided by the number randomized

  • Discontinuation: number who discontinued the intervention divided by the number randomized

  • Proportion analyzed: number analyzed divided by the number randomized
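As an illustration, these six metrics can be computed directly from CONSORT-style participant counts. The counts and field names below are hypothetical, chosen only to show the denominators each metric uses:

```python
# Hypothetical CONSORT-style counts for one trial; field names are illustrative.
counts = {
    "assessed": 500,          # assessed for eligibility (or approached)
    "enrolled": 320,
    "randomized": 311,
    "did_not_receive": 12,    # did not receive the intervention as planned
    "lost_to_follow_up": 18,
    "discontinued": 20,
    "analyzed": 293,
}

def pct(num, den):
    """Express num/den as a percentage, rounded to one decimal place."""
    return round(100 * num / den, 1)

metrics = {
    "recruitment": pct(counts["enrolled"], counts["assessed"]),
    "randomization": pct(counts["randomized"], counts["enrolled"]),
    # The remaining metrics all use the number randomized as denominator.
    "non_compliance": pct(counts["did_not_receive"], counts["randomized"]),
    "lost_to_follow_up": pct(counts["lost_to_follow_up"], counts["randomized"]),
    "discontinuation": pct(counts["discontinued"], counts["randomized"]),
    "proportion_analyzed": pct(counts["analyzed"], counts["randomized"]),
}
print(metrics)
```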

Analyses were performed using Stata Statistical Software, Release 17 (StataCorp LLC, College Station, TX). These proportions were pooled using random-effects models. We used the Freeman-Tukey double arcsine transformation to stabilize the variances. Pooled estimates were computed on the transformed scale using the inverse variance method and then back-transformed. Exact 95% confidence intervals (CI) were calculated from the binomial distribution using the Clopper-Pearson approach. We conducted subgroup analyses based on the use of medication, intervention type, study design, country income level, WHO region, participant type, participant comorbidities, and source of funding. These data are meant to be descriptive, and therefore no interaction analyses were conducted. We also conducted a sensitivity analysis restricted to the studies that reported all the metrics. The number of studies, pooled estimates, and 95% CIs are reported. Inferences for subgroups are made only when there are at least two studies.
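The pooling procedure above can be sketched as follows. This is a minimal illustration only: it uses DerSimonian-Laird random effects on the Freeman-Tukey double arcsine scale with a simple sin² back-transformation, whereas the study used exact Clopper-Pearson intervals; the three study counts are hypothetical.

```python
import math

def freeman_tukey(x, n):
    """Freeman-Tukey double arcsine transform of x events out of n,
    with its approximate variance 1/(n + 0.5)."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)

def pool_random_effects(studies):
    """DerSimonian-Laird random-effects pooling on the transformed scale.
    `studies` is a list of (events, total) pairs. Returns the back-transformed
    pooled proportion and a 95% CI (normal approximation on that scale)."""
    ts, vs = zip(*(freeman_tukey(x, n) for x, n in studies))
    w = [1.0 / v for v in vs]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sw
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, ts))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)   # between-study variance
    w_star = [1.0 / (v + tau2) for v in vs]
    t_pooled = sum(wi * ti for wi, ti in zip(w_star, ts)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # Simple back-transform p = sin(t/2)^2, clamped to the valid range.
    back = lambda t: math.sin(max(0.0, min(math.pi, t)) / 2) ** 2
    return back(t_pooled), back(t_pooled - 1.96 * se), back(t_pooled + 1.96 * se)

# Example: (enrolled, approached) counts from three hypothetical trials.
est, lo, hi = pool_random_effects([(60, 100), (75, 110), (40, 90)])
print(f"pooled recruitment: {est:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```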

Results

Our search retrieved 2122 articles, of which 83 were duplicates. Of the remaining articles, 701 were deemed relevant after title and abstract screening. After full-text screening, we included 394 articles. The flow of study selection is shown in Fig. 1.

Fig. 1
figure 1

Flow diagram of study selection

About half of the included trials were of pharmacological interventions (212; 53.8%). The largest group of trials involved changes in healthcare delivery, such as changes in the number of pills, home-based care, and task-shifting (182; 46.2%). Seventy-nine (20.1%) were multi-arm trials. The largest proportions were conducted in high-income countries (164; 42.6%) and in the Africa region (127; 32.2%). The largest group of people studied were women (65; 16.5%), followed by Black people (39; 10.0%). The most common comorbidity studied was substance use (34; 8.6%). Most trials were non-industry funded (300; 76.1%). Two thirds (66.3%) were multicenter trials, with a median of 3 sites (quartile 1 to quartile 3: 1 to 7). The mean (standard deviation) duration of follow-up was 11.7 (9.2) months.

These results are summarized in Table 1.

Table 1 Characteristics of included studies

Recruitment

One hundred and fifty-six (156) studies had sufficient data to compute recruitment. The overall recruitment rate was 64.1% (95% CI 57.7 to 70.3). The lowest recruitment rate was in trials of participants with mental health comorbidities (42.9%; 95% CI 22.9 to 64.3; 8 trials) and the highest in trials conducted in more than one WHO region (80.2%; 95% CI 73.1 to 86.4; 20 trials).

Randomization

One hundred and eighty-seven (187) studies had sufficient data to compute randomization. The overall randomization rate was 97.1% (95% CI 95.8 to 98.3). The lowest randomization rate was observed in the trials that used incentives as the intervention (86.8%; 95% CI 54.5 to 100.0; 8 trials), and the highest was in the trials conducted in more than one WHO region (99.9%; 95% CI 99.7 to 100.0; 27 trials).

Non-compliance

Two hundred and sixteen (216) studies had sufficient data to compute non-compliance. The overall non-compliance was 3.8% (95% CI 2.8 to 4.9). The lowest non-compliance was in factorial trials (0.5%; 95% CI 0.0 to 1.6; 7 trials), and the highest non-compliance was in trials with a psychotherapy intervention (16.1%; 95% CI 5.9 to 30.0; 16 trials).

Lost to follow-up

Two hundred and fifty-one (251) studies had sufficient data to compute lost to follow-up. The overall lost to follow-up was 5.8% (95% CI 4.9 to 6.8). The lowest lost to follow-up was in the trials conducted with industry funding (1.8%; 95% CI 1.1 to 2.7; 34 trials), and the highest lost to follow-up was in the trials with an educational intervention (15.0%; 95% CI 10.9 to 19.6; 29 trials).

Discontinuation

Two hundred and fifteen (215) trials had sufficient data to compute discontinuation. The overall discontinuation was 6.5% (95% CI 5.5 to 7.5). The lowest discontinuation was in the trials conducted in the South East Asia region (0.6%; 95% CI 0.0 to 2.5; 8 trials), and the highest discontinuation was in the trials with patients who had cancer (16.1%; 95% CI 13.2 to 19.2; 2 trials).

Analyzed

Three hundred and sixty-seven (367) trials had sufficient data to estimate the proportion analyzed. The overall proportion analyzed was 94.2% (95% CI 92.9 to 95.3). The lowest proportion analyzed was in the studies with an electronic intervention (89.0%; 95% CI 81.9 to 94.6; 15 trials), and the highest proportion analyzed was in studies with transgender people (99.6%; 95% CI 98.8 to 100.0; 2 trials).

All the results are summarized in Table 2.

Table 2 Summary of estimates for feasibility outcomes

In our sensitivity analyses, 62 studies reported data on all the outcomes with the following estimates for recruitment (66.9%; 95% CI 58.5 to 74.8), randomization (97.3%; 95% CI 95.1 to 98.9), non-compliance (3.2%; 95% CI 1.3 to 5.6), lost to follow-up (4.9%; 95% CI 3.3 to 6.7), discontinuation (5.0%; 95% CI 3.4 to 6.9), and proportion analyzed (95.8%; 95% CI 93.7 to 97.5).

Discussion

In this methodological study, we have provided empirical data to use in determining progression criteria thresholds when planning feasibility outcomes in HIV pilot randomized trials. We have also demonstrated that these estimates may vary based on the use of medication in the trials, the type of intervention, study design, income level of the countries in which the trial is conducted, region of the world, type of participants included, the comorbidities they may have, and the source of funding.

This is the first study of its kind to provide estimates intended to inform the design of pilot and feasibility trials in HIV. The estimates and their confidence intervals can be used for sample size calculations for feasibility outcomes and to set thresholds for feasibility. For example, in a study of an electronic intervention, the investigators can expect a loss to follow-up of 11.0%, which may be as low as 4.4% or as high as 20.1%. Likewise, for a non-pharmacological intervention, an investigator could estimate the sample size required to estimate a recruitment rate of 59.3% with a confidence interval approximately 16.5 percentage points wide.
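The worked example above can be sketched with a standard normal-approximation sample size formula for a proportion. This is a simplified illustration (an exact binomial calculation would differ slightly); the 59.3% recruitment rate and the 16.5-point interval width are taken from the example, so the half-width is 8.25 points.

```python
import math

def n_for_proportion(p, half_width, z=1.96):
    """Sample size so that a normal-approximation 95% CI for an
    anticipated proportion p has the requested half-width."""
    return math.ceil((z / half_width) ** 2 * p * (1 - p))

# Recruitment of 59.3% estimated to within +/- 8.25 percentage points,
# i.e., a confidence interval about 16.5 points wide:
n = n_for_proportion(0.593, 0.0825)
print(n)  # -> 137 participants approached
```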

Many of our findings are not surprising. It is reasonable to expect challenges in recruiting people with mental health issues. Other studies have highlighted these concerns and proposed solutions in the broader population [13] and for specific co-existing conditions [14].

In principle, if the study is carefully explained to participants, few enrolled participants should withdraw before randomization. While randomization rates were generally high, one could speculate that in studies that used incentives, participants may have viewed a 50% chance of receiving the intervention unfavorably and chosen to withdraw. If used appropriately, run-in periods might be an effective strategy to identify participants who are likely to drop out [15]. Alternatively, investigators could identify the factors linked to pre-randomization withdrawals in the pilot trial and take measures to address them in the design of the larger trial [16].

The lower non-compliance we found in factorial trials is somewhat unexpected, given that participants in factorial trials experience a higher burden, especially if they are randomized to more than one active treatment [17]. It is possible that the 7 trials included in these analyses had other characteristics that may have enhanced compliance. Other studies have reported low compliance with psychotherapy interventions, albeit in disparate population groups. For example, in one systematic review, the authors report on adherence to online psychological interventions [18]. In another, compliance is investigated only in group interventions in patients with psychosis [19]. In a third systematic review, compliance is explored from the therapists' perspective for children and adolescents [20].

With regard to loss to follow-up, other studies have found that industry-funded studies may be methodologically different from others [21, 22]. This may be linked to the level of resources available, which may be deployed in this case to enhance follow-up. Educational interventions may require more engagement from participants and therefore be more inconvenient and harder to accommodate in their broader lives [23], leading to higher rates of loss to follow-up.

Discontinuation was low in trials from certain regions. This may have to do with local factors such as proximity to the health facility or rural dwelling, which have been shown to be linked with discontinuation [24, 25]. Discontinuation may also be high in people with HIV who also have cancer owing to the higher burden of disease, burden of treatment, and risk of death before the trial end date [26].

Studies using electronic interventions had the lowest proportion of people analyzed. This may be because of challenges in ascertaining participants' status (given the virtual nature of the interventions) and difficulties in determining the causes of missing data. In this context, discontinuation may be related to non-use of the electronic device, precluding further meaningful participation in the trial. In skin cancer prevention research, dropout rates are higher for digital interventions than for others [27].

Region-specific differences in outcomes are not uncommon in methodological research but are sometimes challenging to explain. We found high recruitment and randomization rates in studies conducted in more than one (mixed) WHO region. Larger multicenter and multi-country studies are likely to have more resources including access to methodologists and the means to ensure higher participation in trials. If conducting a trial across multiple sites or countries is indicative of study size, the literature suggests that larger studies are reported more clearly [21, 22, 28] and may have additional methodological strengths. We also found the lowest discontinuation in the South East Asia region. The implications of this finding are unclear.

There are several caveats to the use of these data. First, the availability of data was not uniform across studies and therefore not all studies contributed to all the estimates. However, we conducted a sensitivity analysis pooling data from the 62 studies that contributed data to all the outcomes and found consistent results. The second caveat is that outcomes may have been defined differently, especially in the studies that did not display a CONSORT flow diagram, and in some instances, adjudication was required to determine if participants were lost to follow-up or had discontinued. Third, our measure of non-compliance does not capture the reason for non-compliance (it could be because the intervention was not delivered appropriately, the participants did not adhere to the intervention, or there were technical and logistic issues that precluded compliance). While the result is the same, the reasons may be of value to investigators of pilot and feasibility trials. Fourth, the numbers analyzed were extracted as reported by the authors and may reflect additional approaches used to ensure complete data, including imputation techniques. Fifth, some outcomes may be influenced by time. For example, it is possible that participants are more likely to drop out from or discontinue longer studies. We invite investigators to consider this as they use these data.

Conclusion

We have presented a large body of evidence on credible estimates for feasibility outcomes in HIV clinical trials and shown that key study characteristics may influence these estimates. These data should be used to inform the choice of thresholds for feasibility outcomes and the development of progression criteria in HIV pilot randomized trials.