Key Points for Decision Makers

There is widespread non-compliance with agreed data requests following conditional NICE technology appraisal recommendations.

The majority of evidence submitted for re-appraisal was judged to be at serious, high, moderate or unclear risk of bias.

Quality standards ought to be stipulated in respect of evidence contributing to re-appraisals following conditional NICE approval recommendations.

1 Introduction

Pharmaceutical purchasing decisions have historically been based on prices set by their manufacturers and the evidence available at the time of application for reimbursement [1]. More recently, mechanisms for approving medicines conditional on the generation of further evidence have gained traction as a means of providing earlier patient access to new and sometimes innovative medicines [2,3,4,5,6,7]. Approaches to conditional approval include only in research (OiR) and only with research (OwR) recommendations. The main distinction is that under OiR, coverage is limited to participants enrolled in research studies, whereas under OwR the whole eligible population is covered, but pre-specified data are collected [1, 4, 8].

The National Institute for Health and Care Excellence (NICE) has always had the option to recommend health technologies OiR where there is uncertainty in clinical parameters (along with ‘recommended’, ‘optimised’ [that is, recommended for a sub-group of patients for whom the medicine is clinically indicated] and ‘not recommended’ appraisal decision outcomes [9]). OiR recommendations were more common in the early years of the NICE appraisal process, whereas OwR has become more commonplace following provisions for performance-based risk-sharing arrangements in the Pharmaceutical Price Regulation Scheme and, more recently, the Voluntary Scheme for Branded Medicines Pricing and Access. Most notably, OwR recommendations have featured extensively in managed access agreements (MAAs) for Cancer Drugs Fund (CDF) medicines, a trend set to continue with the Innovative Medicines Fund for England. MAAs consist of two main components: a data collection arrangement to address clinical uncertainty and a commercial agreement that controls drug costs whilst data are collected [10]. Medicines included in the CDF or the Innovative Medicines Fund list are funded for a specified period and considered for routine commissioning upon review of additional data in a subsequent appraisal.

One source of concern about NICE’s conditional approval process is the quality and reliability of the methods of evidence generation, analysis and reporting [3, 11]. Morrell et al. [12] identified the most common sources of uncertainty among a sample of technology appraisals (TAs) of cancer medicines as being the immaturity of survival data and issues relating to comparators. They commented that these could not be readily resolved by real-world data collection where there are no ongoing trials to provide longer-term data. However, subsequent reviews of cancer medicines appraised by NICE found more extensive use of later follow-up data from clinical trials than from real-world data for addressing uncertainty related to survival estimates [13, 14].

A second area of concern is the increasing focus on OwR in preference to OiR. While the research following an OiR recommendation is predominantly funded by manufacturers, OwR incurs a cost to the NHS. The relative value of the two approaches ought to be established when conditional approval recommendations are made. Central to this is the need to consider explicitly the health gain foregone by patients denied access to a medicine until the research is completed (OiR) and the health gain foregone by other patients when a medicine is reimbursed that is subsequently found to be insufficiently effective (OwR) [15].

NICE’s conditional approval mechanisms facilitate patient access to specific categories of medicines based on immature evidence and lower certainty around their clinical and cost effectiveness. Providing access to ineffective treatments or treatments with questionable evidence imposes an opportunity cost by restricting access to medicines and services in other areas of the NHS [16]. The drive to expedite patient access has also led to a wider acceptance of inferior evidence based on non-randomised research [17]. While the collection of further evidence can reduce the uncertainty of re-appraisal decisions, the reversibility of such decisions is potentially limited, although in practice treatment would continue for responders and would not be initiated for new patients.

Recognising the limitations of conditional approval, and the variable nature of the evidence available for re-appraisal, the aim of this study was to critically appraise the methods, quality and risk of bias of evidence generated in response to OiR and OwR recommendations by NICE in TAs and evaluations of highly specialised technologies (HSTs).

2 Methods

2.1 Data Sources

All medicines appraised by NICE between March 2000 and September 2020 were identified from a published list of TAs [18] and HSTs up to October 2023 [19]. The final appraisal determinations (or, if unavailable, the assessment reports) for TAs and the final evaluation determinations for HSTs were reviewed to establish whether they were approved conditionally (OiR or OwR). Relevant committee papers and MAAs (where applicable) were retrieved from NICE’s website or, in the case of historical TAs (which had been deleted from the website), from the UK Web Archive online or the UK government National Web Archive (see Supplementary Table 1 in the electronic supplementary material [ESM]).

2.2 Inclusion Criteria

The review was limited to pharmacological interventions, which represent the majority of technologies with an MAA and all technologies on the CDF list. For the purposes of this review, TAs and HSTs were classified as either OiR or OwR based on the following definitions of Walker et al. [8]: “OiR—Coverage of a technology is available only to patients involved in research. OwR—A positive coverage decision is conditioned upon the collection of additional evidence to support continued, expanded or withdrawal of coverage.”

2.3 Exclusion Criteria

TAs and HSTs were excluded from the review if they were not recommended or had been terminated due to non-submission by the pharmaceutical company.

2.4 Data Extraction

A database was constructed in MS Excel to record details of each TA and HST, including the name of the medicine, clinical indication, NICE recommendation, CDF and MAA status and whether the recommendation was OiR or OwR. Data were extracted from NICE TA and HST guidance documents on the areas of uncertainty highlighted by the appraisal committees and the recommendations made for further research (study type and data sources). MAAs were also checked to identify any agreement or recommendation that may not have been mentioned in the TA or HST.

Medicines that were re-appraised after conditional approval were identified and the TAs or HSTs reviewed for reference to the new evidence generated. The original (X) and re-appraised (Y) TAs (HSTs) are referenced as TAX → TAY (HSTX → HSTY). Further information about the evidence was then identified from published journal articles and NICE committee papers. Data were subsequently extracted on the source of the clinical evidence (e.g. trials, audits, registries, experts), the research design (randomised controlled trials [RCTs], observational studies, opinion) and methods (randomisation, blinding, consideration of confounding, outcome reporting, data missingness and adherence to intervention). The principal findings of the additional research studies were summarised.

2.5 Analysis

The risk of bias from randomised evidence identified for re-appraised medicines was analysed using the Cochrane Collaboration’s tool for assessing risk of bias in randomised trials [20]. The tool considers six domains of bias: selection bias, performance bias, detection bias, attrition bias, reporting bias and other biases. Each domain of bias could be categorised as low risk, high risk, or unclear risk.
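Domain-level judgements can be rolled up into the study-level groupings used in Section 3.4.1 (low risk in all domains, at least one domain unclear, at least one domain high). The minimal Python sketch below assumes a "worst domain wins" ordering (high over unclear over low) and stores each trial as a dictionary keyed by hypothetical domain labels; neither the labels nor the data structure are part of the Cochrane tool itself.

```python
from enum import IntEnum


class RoB(IntEnum):
    """Cochrane risk of bias judgements, ordered so that larger is worse."""
    LOW = 0
    UNCLEAR = 1
    HIGH = 2


# The six domains assessed by the tool (illustrative labels).
ROB1_DOMAINS = (
    "selection",
    "performance",
    "detection",
    "attrition",
    "reporting",
    "other",
)


def summarise_trial(judgements: dict[str, RoB]) -> RoB:
    """Return the worst judgement across all six domains.

    Raises ValueError if a domain is missing, so that an incomplete
    assessment is not silently treated as low risk.
    """
    missing = [d for d in ROB1_DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    return max(judgements[d] for d in ROB1_DOMAINS)


# Hypothetical open-label trial: participants and outcome assessors unblinded.
example = {d: RoB.LOW for d in ROB1_DOMAINS}
example["performance"] = RoB.HIGH
example["detection"] = RoB.UNCLEAR
print(summarise_trial(example).name)  # HIGH
```

Under this ordering a trial with both unclear and high domains is reported once, as high risk; a review that lists such a trial under both headings would need a different tally.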

Non-randomised evidence was analysed using the Cochrane Collaboration’s Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool [21], which considers seven domains of bias: two pre-intervention (confounding, selection of participants), one at intervention (classification of intervention) and four post-intervention (deviations from intended interventions, missing data, measurement of outcomes and selection of the reported result). Each domain could be categorised as low, moderate, serious, or critical risk, or no information provided.
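The overall ROBINS-I judgement for a study is derived from its domain judgements. The published guidance for the tool sets out the rules: low only if every domain is low; moderate if every domain is low or moderate; serious if at least one domain is serious but none is critical; critical if any domain is critical; and no information where there is no clear indication of serious or critical risk but information is lacking in one or more key domains. A minimal Python sketch of those rules follows; the domain labels are illustrative, and the sketch simplifies "key domains" to any domain.

```python
from enum import Enum


class Robins(Enum):
    LOW = "low"
    MODERATE = "moderate"
    SERIOUS = "serious"
    CRITICAL = "critical"
    NO_INFO = "no information"


# Seven ROBINS-I domains: two pre-intervention, one at intervention,
# four post-intervention (labels are illustrative).
ROBINS_I_DOMAINS = (
    "confounding",
    "selection of participants",
    "classification of interventions",
    "deviations from intended interventions",
    "missing data",
    "measurement of outcomes",
    "selection of the reported result",
)


def overall_judgement(domains: dict[str, Robins]) -> Robins:
    """Combine seven domain judgements into one overall judgement."""
    values = [domains[d] for d in ROBINS_I_DOMAINS]  # KeyError if a domain is absent
    if Robins.CRITICAL in values:
        return Robins.CRITICAL
    if Robins.SERIOUS in values:
        return Robins.SERIOUS
    if Robins.NO_INFO in values:
        # Simplification: any domain lacking information is treated as a key domain.
        return Robins.NO_INFO
    if Robins.MODERATE in values:
        return Robins.MODERATE
    return Robins.LOW


# Hypothetical single-arm study with serious confounding.
study = {d: Robins.LOW for d in ROBINS_I_DOMAINS}
study["confounding"] = Robins.SERIOUS
study["selection of participants"] = Robins.MODERATE
print(overall_judgement(study).value)  # serious
```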

Risk of bias assessments were undertaken by YPP and reviewed by DAH.

3 Results

3.1 Included Appraisals

For the period of analysis, NICE conducted 651 TAs and made 968 individual recommendations. Six hundred TAs were of medicines, of which 88 resulted in a not-recommended decision and 61 were terminated appraisals. Of the remaining TAs, 53 were included for analysis. These made 54 recommendations: 13 were OiR and 41 satisfied the criteria for classification as OwR. Additionally, some TAs were misclassified in NICE’s database, including TA070, which was listed as recommended but with a condition of further data collection; TA397, which was listed as optimised and not highlighted as OiR; and TA588, which was an optimised recommendation but linked to an MAA with conditions for further data collection.

NICE also conducted 28 HST evaluations, of which five were conditionally approved OwR. HST1 was classified as OwR because one of the conditions for approval included a research programme to evaluate when to stop the treatment, although it was the only HST conditionally approved without an MAA (see Fig. 1 and Supplementary Table 2 [in the ESM]).

Fig. 1

Flow diagram of Technology Appraisals (TAs) and Highly Specialised Technologies (HSTs) included in the review

3.2 NICE Recommendations for Further Research

NICE specified the need to address uncertainty in clinical and cost effectiveness in all conditionally approved TAs and HSTs. Recommendations for further research included RCTs, observational studies, or a combination of both (Table 1). For example, among OiR recommendations, in TA093 the committee recommended obtaining data from well-designed (unspecified method) clinical studies and an audit, while in TA111 the committee recommended research, “preferably in the form of RCTs”, and an audit. For TAs with OwR recommendations, data were requested mainly from ongoing trials, with the Systemic Anti-Cancer Therapy (SACT) and Blueteq registries as secondary sources of information. Only TA581 recommended a new RCT, linked to a post-marketing condition for approval by the European Medicines Agency. Recommendations for all conditionally approved HSTs included the collection of auditable data through specific databases. For instance, for HST1 and HST2, NICE recommended the creation of a national registry; for HST3, data would be collected via the NorthStar database; for HST6, through a disease-specific MAA database; and for HST12, through the creation of an Excel SharePoint repository of registered patients.

Table 1 Study type and data sources

3.3 NICE Technology Appraisals (TAs) and Highly Specialised Technologies (HSTs) Subjected to Re-appraisal

Sixteen TAs that included 17 individual recommendations were subsequently re-appraised by NICE.

Nine TAs were originally recommended OiR (including TA193 with two individual recommendations) and of these, NICE requested RCT evidence for three TAs (TA033, TA037 and TA111) and further research from unspecified sources in six TAs (TA065, TA072, TA093, TA193 [for both indications], TA244 and TA467).

There were seven re-appraised TAs that were originally recommended OwR (TA070, TA416, TA446, TA447, TA465, TA472 and TA483). For these, NICE mainly requested evidence from ongoing RCTs, SACT and/or Blueteq databases, with the exception of TA070, which made reference to a national registry, an audit and “good quality studies”; and TA446 which referred to a retrospective, non-interventional trial with data from SACT and Blueteq.

Three HSTs were subjected to NICE re-appraisal. Following HST1, for example, NICE requested national monitoring systems to record the number of people diagnosed; and for HST2, NICE specified that the marketing authorisation holder had to create a patient registry accessible to NHS England.

3.3.1 Evidence Considered in Re-appraisal Following Only in Research (OiR) Recommendation

There were some inconsistencies in what research NICE requested in OiR recommendations and what was subsequently considered during re-appraisal. While two TAs complied fully with NICE’s request for further evidence (TA033 → TA093 and TA288 → TA418), four were only partially compliant (TA072 → CG79, TA093 → CG131, TA111 → TA217 and TA244 → TA461).

RCT evidence was provided in three instances following NICE recommendations (TA033 → TA093, TA111 → TA217 and TA244 → TA461), and RCT data were also considered in three other re-appraisals: TA072 → CG79, TA093 → CG131 and TA288 → TA418 (Table 2). However, while TA461 included evidence from two RCTs, there were no data on the comparison of interest that NICE specified in TA244. NICE specified audit data along with additional studies in their OiR recommendations for TA065, TA072 and TA093, but evidence from audits was not available in any re-appraisal. TA217 did not include the evidence requested in TA111 on how the technology contributed to health-related quality of life.

Table 2 Summary of the clinical and economic evidence (unreferenced evidence was extracted from NICE committee papers or final appraisal determinations) considered upon re-appraisal

No new evidence was presented in three re-appraisals (TA037 → TA137, TA065 and TA193). NICE commented in TA137 that it would be difficult to gather relevant information for the OiR indication in question because the medicine was approved as a third-line treatment. NICE decided TA065 was no longer relevant to clinical practice; and a review of TA193 did not find any new evidence.

3.3.2 Evidence Considered in Re-appraisal Following Only with Research (OwR) Recommendation

Of the seven TAs identified as OwR, four complied with NICE’s request for further evidence on re-appraisal (TA416 → TA653, TA446 → TA524, TA465 and TA483 → TA655) and three TAs complied partially (TA070 → TA241, TA447 → TA531 and TA472 → TA629). TA241 did not present the audit data requested and, while TA531 and TA629 included SACT data, this was immature or not considered by the committee to be sufficiently robust.
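The compliance counts in Sections 3.3.1 and 3.3.2 can be reconciled with the totals reported later (six TAs fully compliant, seven partially compliant and three presenting no new evidence) by tabulating each re-appraisal. A minimal Python sketch, using the TA pairs exactly as named in those two sections; the tuple layout is illustrative only.

```python
from collections import Counter

# (original route, original -> re-appraisal, compliance with NICE's request)
REAPPRAISALS = [
    ("OiR", "TA033 -> TA093", "full"),
    ("OiR", "TA288 -> TA418", "full"),
    ("OiR", "TA072 -> CG79", "partial"),
    ("OiR", "TA093 -> CG131", "partial"),
    ("OiR", "TA111 -> TA217", "partial"),
    ("OiR", "TA244 -> TA461", "partial"),
    ("OiR", "TA037 -> TA137", "none"),
    ("OiR", "TA065", "none"),
    ("OiR", "TA193", "none"),
    ("OwR", "TA416 -> TA653", "full"),
    ("OwR", "TA446 -> TA524", "full"),
    ("OwR", "TA465", "full"),
    ("OwR", "TA483 -> TA655", "full"),
    ("OwR", "TA070 -> TA241", "partial"),
    ("OwR", "TA447 -> TA531", "partial"),
    ("OwR", "TA472 -> TA629", "partial"),
]

# Tally compliance overall and by original recommendation route.
overall = Counter(status for _, _, status in REAPPRAISALS)
by_route = Counter((route, status) for route, _, status in REAPPRAISALS)

print(dict(overall))  # {'full': 6, 'partial': 7, 'none': 3}
print(by_route[("OiR", "full")], by_route[("OwR", "full")])  # 2 4
```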

RCT evidence was available on re-appraisal for five TAs for which NICE requested such data following OwR recommendations (TA416 → TA653, TA447 → TA531, TA465, TA472 → TA629 and TA483 → TA655) (Table 2). TA653 included a survival analysis based on SACT data relating to CDF patients, in addition to the evidence from the two studies requested in TA416. For TA465, NICE requested data from an ongoing trial and from SACT, but the results from the trial indicated no survival advantage, and the TA was withdrawn. TA655 presented a combination of data from an RCT, follow-up data from a previous study and a cohort study.

NICE’s re-appraisal of TA070 in TA241 included RCT evidence in addition to non-randomised studies, but no audit. The re-appraisal of TA446 in TA524 provided the evidence requested and, in addition, a single-arm, open-label, multicentre study (Table 3).

Table 3 Risk of bias assessment for randomised evidence in updated TAs and HSTs

All re-appraised HSTs complied with NICE’s request for further evidence. In HST6 → HST23, however, while the company presented descriptive MAA data on clinical outcomes, these were not initially analysed in accordance with NICE’s request (a comparison with registry data), on the basis that there were major baseline differences between populations. A subsequent analysis of the registry data, which compared treatment-naïve and treatment-experienced patients, was redacted.

3.4 Risks of Bias in Evidence Presented in Re-appraisals

Among technology re-appraisals where new data were considered, TA461 and TA465 included evidence that was all judged to be of low risk of bias, two TAs (TA217 and TA418) included at least some evidence that was of low risk of bias and 11 TAs (TA093, TA217, TA241, TA418, TA524, TA531, TA629, TA653 and TA655, including evidence from CG79, CG131) contained evidence with a critical, serious, high, moderate or unclear risk of bias.

3.4.1 Randomised Evidence

Based on the Cochrane risk of bias tool, randomised studies were susceptible mostly to biases due to a lack of blinding of participants and researchers, or blinding of outcome assessment (see Fig. 2).

Fig. 2

Risk of bias from (A) randomised controlled trials and (B) non-randomised studies, that were considered during re-appraisal of Technology Appraisals (TAs) and Highly Specialised Technologies (HSTs) with Only in Research (OiR) or Only with Research (OwR) conditional approvals

Eight RCTs reported in four TAs (three TAs previously recommended OiR and one TA previously recommended OwR) were categorised as having low risk of bias in all domains: TA217 [41,42,43,44], TA418 [53], TA461 [50, 51] and TA465 [63].

Eight RCTs from four TAs (all previously recommended OiR) had at least one domain categorised as unclear: TA093 [22], CG79 [25,26,27,28], CG131 [29, 30] and TA418 [52]. Selection, performance and detection bias were identified as the three main sources of risk of bias that led to this classification.

Ten RCTs from seven TAs (two previously recommended OiR and five previously recommended OwR) were categorised in at least one domain as a high risk of bias: TA093 [23, 24], TA241 [54, 55], TA217 [39, 40], TA531 [62], TA629 [64], TA653 [59] and TA655 [65]. Performance, detection and other bias were identified as the domains that contributed the most to this categorisation.

Evidence presented in HSTs was categorised as unclear or high risk. For instance, trial MOR-004 [67] in HST2 had most domains as low risk of bias, but random sequence generation and allocation concealment were unclear. Trials MOR-05 [68] in HST2 and ENB-009-10 [72] in HST6 had more than two domains classified as high risk of bias.

3.4.2 Non-randomised Evidence

There were six re-appraised TAs with evidence from 22 non-randomised studies—primarily single-arm studies, with the exception of Atri et al. [46] and Lopez et al. [49] in TA217. The evidence came from a combination of studies, including, for example, a case series [56], a phase II single-arm study [60], three phase IV or post-marketing surveillance studies that used patient-reported outcomes [45, 48, 57], a real-world UK observational study (TA524) and evidence from SACT data (TA653 and TA655) (Table 4).

Table 4 Risk of bias assessment for non-randomised evidence in updated technology appraisals

Based on the ROBINS-I tool, the non-randomised studies were susceptible mostly to biases due to the selection of participants and to confounding (Fig. 2). One study (Aparicio et al. [36]) presented in CG131 (previously recommended OiR) had one domain classified as a critical risk of bias. Thirteen studies from five TAs (two previously recommended OiR and three OwR) presented evidence categorised as high risk of bias in at least one domain: TA241 [56,57,58], CG131 [31, 32, 35, 37], TA217 [45,46,47,48,49], TA524 (real-world observational study) and TA655 [66]. All the other evidence was categorised as a moderate risk of bias (one TA previously recommended OiR and three TAs OwR).

Evidence submitted in re-appraised HSTs was also categorised as moderate or serious risk of bias. All evidence except one study (ENB-003-08 [73] in HST6) was considered to have serious risk of bias in more than one domain.

3.5 NICE Recommendations Following Re-appraisal

There were nine TAs originally recommended OiR, of which three (TA037 → TA137, TA111 → TA217 and TA288 → TA418) were recommended in full for at least one indication. Two TAs were optimised (TA111 → TA217 and TA244 → TA461), four remained recommended OiR (TA033 → TA093, TA072 → CG79, TA093 → CG131 and TA193) and TA065 was withdrawn. Of the seven TAs originally recommended OwR, two were subsequently recommended in full (TA472 → TA629 and TA447 → TA531), three were optimised (TA416 → TA653, TA446 → TA524 and TA483 → TA655), one was not recommended (TA070 → TA241) and TA465 was withdrawn.

Of the six TAs that complied fully with NICE’s recommendations for further research (either OiR or OwR), one (TA418) was recommended in line with marketing authorisation, three TAs (TA524, TA653 and TA655) were recommended optimised, TA093 remained OiR and TA465 was withdrawn. Among the seven TAs that complied partially with NICE’s recommendations, TA531 and TA629 were recommended in line with marketing authorisation and TA461 was recommended optimised. TA217 received full recommendation for patients with severe disease but optimised for those with moderate disease. CG79 and CG131 remained OiR and TA241 was not recommended. Three TAs did not provide new evidence, but of those, TA137 was recommended, TA193 remained OiR for both indications and TA065 was withdrawn.

All HSTs were recommended after re-appraisal, with HST19 and HST2 recommended according to marketing authorisation, while HST23 was recommended optimised.

4 Discussion

4.1 Statement of Principal Findings

Over the period of analysis, NICE made conditional recommendations in more technology appraisals OwR (41) than OiR (12). However, more TAs with OiR recommendations were subsequently re-appraised (9) than TAs with OwR recommendations (7). Upon re-appraisal, only six TAs (two originally with OiR and four with OwR recommendations) complied fully with NICE’s request for further evidence, while three TAs presented no new evidence.

The majority of re-appraised TAs included evidence that was deemed to be at serious, high, moderate or unclear risk of bias, with Aparicio et al. [36] in CG131 categorised as critical. Only eight randomised studies, from four appraisals, were judged to have low risk of bias in all domains. Reliance on observational studies and audits was commonplace among TAs that were subsequently re-appraised. However, the majority of these studies were judged as having high or critical risk of bias. The increasing use of SACT and Blueteq for real-world evidence of benefit from cancer and high-cost medicines is fraught with issues resulting from missing data and confounding, which could be mitigated through well-designed RCTs.

All five conditionally approved HSTs were recommended OwR. Of these, three were re-appraised and were compliant with NICE’s request for further information. However, all the evidence submitted was categorised as unclear, moderate or high/serious risk of bias.

Overall, the quality of evidence submitted in response to OiR and OwR recommendations was poor.

4.2 Comparison with Other Studies

To the best of our knowledge, this is the first systematic study of the quality and risk of bias of evidence resulting from NICE OiR and OwR recommendations. Other researchers, however, have (1) highlighted the challenges for health technology assessment in the face of higher clinical uncertainty relating to conditional marketing authorisation pathways [76]; (2) investigated the use of real-world data within single technology appraisals of cancer medicines [77] or as part of MAAs associated with CDF medicines [12, 13]; and (3) compared real-world data relating to targeted and non-targeted cancer therapies [14].

4.3 Strengths and Limitations

This review provides a comprehensive assessment of the nature and quality of evidence requested and subsequently evaluated by NICE following conditional recommendation. It benefits from the systematic application of two widely utilised tools for the determination of the risks of bias in randomised and observational data. Our findings should be of value in informing the application of the NICE real-world evidence framework [78].

One limitation of this review has been our reliance on the availability of documents on NICE’s website. Some superseded documents have been removed from the website, while many documents are highlighted as being commercially sensitive, with text redacted [79]. The lack of transparency in aspects of the CDF has been noted previously [80], although some historical documents could be retrieved from UK web archives [81, 82]. Additionally, as NICE does not explicitly label OwR recommendations as such, assumptions were necessary in relation to those TAs for medicines conditionally recommended via processes other than OiR [8, 83,84,85]. A third limitation was that our application of the risk of bias tools was restricted to the information publicly available for the studies reviewed. This was compounded by analyses of SACT and Blueteq data not being available as publications in peer-reviewed journals. Finally, as our study was limited to NICE, its findings may not be fully generalisable to other jurisdictions.

4.4 Policy and Research Implications

There are four important policy implications of our research. Firstly, while OwR recommendations have become the more common route to conditional approval, they rely heavily on data from potentially biased studies. The increasing use of non-randomised studies for providing evidence on clinical effectiveness is problematic, not least because confounding, selection and information bias can undermine the reliability of such data. The cost of incorrect decisions upon re-appraisal can be significant; there is a fallacy in expediting wider access to treatment through reliance on observational data. The limitations of non-randomised data are acknowledged in the NICE real-world evidence framework, which has the specific aim of improving the quality of real-world evidence used to inform NICE’s guidance.

Secondly, non-compliance with NICE-requested evidence generation is widespread. The reasons for this were not apparent in most instances; however, it is concerning both that manufacturers do not always provide the evidence requested and that NICE recommendations are seemingly unaffected by this.

Thirdly, NICE has resisted using value of information methods to help select the specific design and sample size of a proposed study, as well as to weigh the value of OiR versus OwR [86]. Instead, NICE assesses the nature and methods of evidence generation (in respect of MAAs) according to whether data collection and analyses are feasible in a reasonable timeframe, do not represent an unreasonable burden on patients and the NHS, and are likely to support the case for a positive recommendation upon re-appraisal [87]. While conceptually attractive, value of information analyses require detailed specification of the decision problem and quantification of the associated uncertainty. This typically entails a level of complexity that may obscure rather than inform, and a requirement for data that may not be reliably available, although a valid outcome from such an analysis would be evidence to inform the choice of whether a recommendation ought to be OiR or OwR.
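To illustrate the kind of calculation involved, the expected value of perfect information (EVPI) compares the expected net benefit of deciding with perfect knowledge of the uncertain parameters against deciding on current information: EVPI = E_θ[max_d NB(d, θ)] − max_d E_θ[NB(d, θ)]. The minimal Monte Carlo sketch below assumes only two options (adopt or reject a medicine), a normally distributed incremental QALY gain, a fixed incremental cost and a fixed willingness-to-pay threshold. The function name and all parameter values are hypothetical and are not drawn from any NICE appraisal.

```python
import random
import statistics


def evpi_per_patient(
    mean_qaly_gain: float,
    sd_qaly_gain: float,
    incremental_cost: float,
    threshold: float,
    n_draws: int = 100_000,
    seed: int = 1,
) -> float:
    """Monte Carlo EVPI for a two-option decision (adopt vs reject).

    The net monetary benefit (NMB) of rejecting is fixed at 0; adopting
    yields threshold * QALY gain - incremental cost for each draw.
    """
    rng = random.Random(seed)
    nmb_adopt = [
        threshold * rng.gauss(mean_qaly_gain, sd_qaly_gain) - incremental_cost
        for _ in range(n_draws)
    ]
    # Current information: choose the option with the higher expected NMB.
    value_current_info = max(statistics.fmean(nmb_adopt), 0.0)
    # Perfect information: choose the better option separately for every draw.
    value_perfect_info = statistics.fmean(max(nmb, 0.0) for nmb in nmb_adopt)
    return value_perfect_info - value_current_info


evpi = evpi_per_patient(
    mean_qaly_gain=0.4, sd_qaly_gain=0.3, incremental_cost=10_000.0, threshold=30_000.0
)
print(f"Per-patient EVPI: £{evpi:,.0f}")
```

A population EVPI would multiply the per-patient value by the number of patients expected to be affected over the lifetime of the decision. A positive value suggests only that further research could be worthwhile; choosing a particular design or sample size would need further analyses (for example, of the expected value of sample information), which this sketch does not attempt.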

Fourthly, there is the issue of the timeliness of data. Others have commented on the delays and costs in acquiring evidence following OiR recommendations, such as in the extreme example of treatments for multiple sclerosis [88]. In some re-appraised TAs following OwR recommendations, our review found that the evidence was either immature or had been superseded by RCTs. While the duration of data collection is agreed within data collection agreements, it is well recognised that some medicines with MAAs remain on the CDF for extended periods of time [80].

5 Conclusions

This review highlights important variation in the quality of evidence submitted in response to NICE conditional approval (OiR and OwR) recommendations. It further identifies non-compliance with agreed data requests and discusses the implications in respect of pharmaceutical policy. The increased reliance on real-world evidence raises concerns about the risks of incorrect decisions being made based on inaccurate data. This may be mitigated through more careful consideration of the limitations of observational evidence, and potentially pre-empted by conducting value of information analyses that could inform the nature and choice of additional data collection methods. As a minimum, quality standards ought to be stipulated in respect of evidence contributing to NICE re-appraisals following conditional approval recommendations.