2.1 Development of the EQ-VT Protocol

Over the past 25 years, approaches to valuing the EQ-5D-3L have not changed much from those used in Dolan (1997). While issues had been noted in regard to valuing the EQ-5D-3L, the desire to produce new EQ-5D-3L value sets with the same approaches as before lessened the impetus for change. The introduction of the EQ-5D-5L offered an opportunity to explore how methods for health state valuation could be improved to produce an updated valuation protocol (Oppe et al. 2014). To arrive at a broadly supported protocol, the initial development, and later refinement, of that protocol coincided with an extensive programme of methodological research within the EuroQol Group. This chapter summarises the research that was undertaken, the results that underpinned the initial version of the EQ-5D-5L valuation protocol and its later modifications, and the main lessons learned from the international EQ-5D-5L valuation work.

While the research programme had a broader scope, it focused on two methods for eliciting preferences for health states: time trade-off (TTO) and discrete choice experiments (DCEs). TTO had emerged as the method of first choice in earlier valuation studies, and the introduction of the EQ-5D-5L did not change that. Yet concerns had been expressed about the extremely low values that could be produced for states worse than dead (WTD), which required arbitrary rescaling (Janssen et al. 2013); refinement of the TTO method was therefore pursued within the research programme. Lead time TTO (LT-TTO) had been identified as a possible TTO approach that could mitigate issues in valuing states WTD (Robinson and Spencer 2006; Tilling et al. 2010; Devlin et al. 2011), and the relative merits of that approach were therefore explored (Attema and Versteegh 2013; Devlin et al. 2013; Versteegh et al. 2013). At that time, DCE was recognised as a promising new method for health state valuation (e.g. Salomon 2003; McCabe et al. 2006) and had become more widely used in other areas of health economics (Ryan 2004). DCE was therefore the second focus of the research programme and was studied both as a potential alternative to TTO and as a complement to it. The DCE task is generally simpler than TTO, requiring simple choices rather than completion of an iterative process, which potentially brings significant benefits for data collection. Questions about how to collect and model DCE data were therefore also addressed. Driven by the desire to replace the props used in TTO interviews (e.g. the TTO board) with computers and to develop a computerised TTO procedure, all tasks were integrated into a digital aid (the EuroQol-Valuation Technology, EQ-VT), which was developed in conjunction with the protocol. As a result, the protocol is commonly referred to as the EQ-VT protocol.

We will not cover all findings of the research programme in this chapter. However, several findings require particular highlighting:

  1. while the research conducted on LT-TTO produced ample proof of concept for the use of LT-TTO in health state valuation, values for states better than dead (BTD) seemed to be subject to a downward bias. Therefore, composite TTO (cTTO) was introduced (Janssen et al. 2013), which uses conventional TTO for the valuation of states BTD and LT-TTO for states WTD;

  2. while implementation of the protocol in the EQ-VT software in theory allows the tasks to be self-completed, such an approach to administration leads to significant data quality issues; therefore, computer-assisted personal interviews (CAPI) remained the preferred mode of administration (Shah et al. 2013);

  3. DCE tasks that produce values on a latent scale are straightforward to generate, but many open questions remain regarding tasks that can produce DCE values anchored on the full health–dead scale (Norman et al. 2016).

Further results obtained in the methodological research programme have been documented in 19 journal articles. Oppe et al. (2014) described how those results supported the development of the EQ-VT protocol version 1.0.

2.2 Description of the EQ-VT Protocol

2.2.1 Contents of the Protocol

From its origins in 2012, some elements of the EQ-VT protocol have evolved but the overall structure has been retained, comprising the following six parts:

  1. general welcome,

  2. self-reported health using EQ-5D-5L and background questions,

  3. introduction to the cTTO valuation task,

  4. health state valuation using cTTO,

  5. health state valuation using DCE,

  6. general thank you and goodbye.

After a general welcome and an explanation of the purpose of the study, respondents report their own health using the EQ-5D-5L, including the EQ VAS, and answer background questions on age, gender and experience with illness. The third section then introduces respondents to the cTTO valuation tasks (see Figs. 2.1a and 2.1b). The interviewer uses the example health state “being in a wheelchair” to explain how to interpret and carry out the cTTO tasks. Once the respondent’s understanding of the cTTO task has been confirmed, respondents go on to value ten EQ-5D-5L health states and answer three debriefing questions about the cTTO tasks. In the next part of the interview, the interviewer explains how to carry out the DCE. Respondents are asked to complete seven forced-choice paired comparisons of EQ-5D-5L health states without a “duration” attribute (see Fig. 2.2); that is, the choice is simply between two EQ-5D-5L health states, independent of time. Following this, respondents answer three debriefing questions about the DCE tasks. In the concluding part of the interview, respondents can leave feedback and are thanked for their participation.

Fig. 2.1a Presentation of the composite time trade-off used in the EQ-VT protocol: better than dead task. (© 2021 EuroQol Research Foundation. Reprinted with permission)

Fig. 2.1b Presentation of the composite time trade-off used in the EQ-VT protocol: worse than dead task. (© 2021 EuroQol Research Foundation. Reprinted with permission)

Fig. 2.2 Presentation of a discrete choice experiment task used in the EQ-VT protocol. (© 2021 EuroQol Research Foundation. Reprinted with permission)

The cTTO approach begins with a ‘conventional’ TTO, in which the first question compares ten years in the health state being valued with ten years in full health (see Fig. 2.1a), and shifts to an LT-TTO only when the respondent considers the health state to be WTD. In that case, the subsequent LT-TTO task uses a twenty-year time frame: ten years of lead time in full health followed by ten years in the EQ-5D-5L health state being valued (see Fig. 2.1b). The resulting cTTO values range from −1 (trading all of the lead time) to 1 (trading no years in full health) in increments of 0.05. The exact iteration scheme is reported elsewhere (Oppe et al. 2016). The underlying experimental design, including the selection of health states for both the cTTO and DCE tasks, and other study design considerations such as sample size requirements are addressed in Chap. 3.
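The mapping from a cTTO indifference point to a value can be sketched as follows. This is a minimal illustration, not the EQ-VT implementation: the helper `ctto_value`, the frame labels `"BTD"`/`"WTD"` and the half-year step size are assumptions (the step size is inferred from the 0.05 value increments over a ten-year frame), and the iteration scheme itself (Oppe et al. 2016) is not reproduced.

```python
from fractions import Fraction

# Assumed step size: half a year, which yields the 0.05 value increments in the text.
STEP_YEARS = Fraction(1, 2)


def ctto_value(frame: str, t) -> Fraction:
    """Convert a cTTO indifference point into a health state value (hypothetical helper).

    frame="BTD": conventional TTO, t years in full health vs ten years in the state.
    frame="WTD": LT-TTO, t years in full health vs ten years of lead time followed
                 by ten years in the state.
    In both frames t runs from 0 to 10 years.
    """
    t = Fraction(t)
    if not 0 <= t <= 10:
        raise ValueError("t must lie between 0 and 10 years")
    if t % STEP_YEARS != 0:
        raise ValueError("t must be a multiple of the step size")
    if frame == "BTD":
        return t / 10              # x = t/10, in [0, 1]
    if frame == "WTD":
        return (t - 10) / 10       # solves 10 + 10x = t, in [-1, 0]
    raise ValueError(f"unknown frame: {frame!r}")


# Every value the response scale can produce, from -1 to 1 in steps of 1/20.
grid = sorted({ctto_value(frame, k * STEP_YEARS)
               for frame in ("BTD", "WTD") for k in range(21)})
print(len(grid), float(grid[0]), float(grid[-1]))   # 41 -1.0 1.0
```

Exact fractions are used so that the 0.05 grid is represented without floating-point rounding; the two frames meet at 0, giving 41 distinct values in total.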

To ensure that respondents can give valid and meaningful responses during the cTTO task, they are first given the opportunity to experience the task by completing the wheelchair example and are made aware that they will be asked to value a set of other health states in the same way. After that, still within the wheelchair example, they learn, among other things, how their responses will be interpreted, what the range of possible answers is, how the task proceeds slightly differently when a state WTD is encountered, and how they should interpret the health states. Delivering these instructions is challenging for the interviewer: most of them are not prompted on screen, the task is difficult for some respondents, and the interview needs to be conducted in a standardised and neutral way. Furthermore, a high level of task engagement is expected from the respondent, which depends on the engagement demonstrated by the interviewer and on the quality of the interactions with the respondent. Because the wheelchair example is the point in the interview where all of this needs to be discussed, this section is key to the successful implementation of the EQ-VT protocol, especially the cTTO part.

2.2.2 Why the cTTO Task was Adopted

Concern about the way in which values for states WTD were produced in EQ-5D-3L value sets motivated much of the research carried out to develop the new protocol. It is well known that a standard TTO task, which contrasts a ten-year life in a disease state with a shorter life of t years in full health, can only produce non-negative values. In this task, the value x of the disease state is given by t/10 at the point where the respondent is indifferent between the two options. Since a lifespan cannot be negative, t cannot be negative, and so only values of x in the [0, 1] range can be observed. If respondents indicated that they preferred immediate death to living for ten years in the disease state, they were offered a modified task in which a health profile of t years in full health followed by 10−t years in ill health was compared with immediate death. Here the value x of the disease state is given by −t/(10−t), which follows from the indifference condition t + x(10−t) = 0. As the difference between t and 10 can be made arbitrarily small (e.g. by counting in years, months, weeks, days or smaller units), this negative ratio can become extremely large in absolute value; for example, t = 9.75 yields x = −39. To counteract the effect on mean values, an arbitrary transformation was applied to bound the negative values at −1. Various transformations have been proposed; however, the choice between them remained arbitrary, yet could affect the results substantially.
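The behaviour of the two standard TTO formulas can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and `bounded_at_minus_one` applies one monotonic transformation of the kind described, x/(1−x) for negative x, which reduces algebraically to −t/10. It is used here purely as an example; the text does not state which transformation was applied in any particular value set.

```python
def tto_value_btd(t: float) -> float:
    """Standard TTO: ten years in the state vs t years in full health (0 <= t <= 10)."""
    if not 0 <= t <= 10:
        raise ValueError("t must lie between 0 and 10")
    return t / 10


def tto_value_wtd(t: float) -> float:
    """Modified WTD task: t years in full health followed by 10 - t years in the
    state, compared with immediate death. Indifference: t + x * (10 - t) = 0."""
    if not 0 <= t < 10:
        raise ValueError("t must satisfy 0 <= t < 10")
    return -t / (10 - t)


def bounded_at_minus_one(x: float) -> float:
    """Illustrative monotonic transformation x / (1 - x) for negative x.
    For x = -t/(10 - t) it reduces to -t/10, so it never falls below -1."""
    return x / (1 - x) if x < 0 else x


# The raw WTD value diverges as t approaches 10; the transformed value stays above -1.
for t in (5, 9, 9.5, 9.75, 9.9):
    raw = tto_value_wtd(t)
    print(f"t = {t:<5} raw x = {raw:8.2f}   transformed = {bounded_at_minus_one(raw):.3f}")
```

Because the transformation is monotonic it preserves each respondent's ordering of states, but it changes the distances between negative values, which is why the choice among such transformations could affect mean values.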

In theory, LT-TTO offers a unified approach to the valuation of states BTD and WTD. As in standard TTO, respondents consider how good or bad it would be to spend ten years in a state of impaired health. However, the period of impaired health does not start now but ten years from now, so that the total remaining lifespan is 20 years. This is compared with a life of t years in full health, and the duration t is varied between 0 and 20 to identify the point of indifference. The value x of the disease state can be computed by solving 10 + 10x = t, which gives a positive value for all t > 10 and a negative value for all t < 10 (Footnote 1). However, owing to the presence of a bias, described below, the cTTO approach was preferred to LT-TTO.

Regarding LT-TTO, larger lead times, ceteris paribus, extend the range of negative values that can be observed, but practical limits on the maximum hypothetical lifespan and on how small the trade-off unit can be also need to be considered. Key findings were that values for states BTD seemed to be affected by a downward bias in the LT-TTO task, and that larger ratios of lead time to disease time amplified this problem. A possible explanation is that respondents considered what portion of their remaining years to trade off without recognising that trading into the lead time implied a WTD response. The decision was therefore made to use standard TTO for the valuation of states BTD and to adopt LT-TTO only for the valuation of states WTD. Consistent with previous valuation studies, the standard TTO was again specified with a ten-year time frame, and in the LT-TTO frame a lead time of ten years was offered, so that the trade-off unit (one year) corresponds to the same change in value on the BTD and WTD sides of the scale. The term ‘composite’ TTO was adopted for this protocol, which uses standard TTO for the valuation of states BTD and lead time TTO for states WTD.
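The effect of the lead time and disease time on the observable range can be made concrete with the general LT-TTO indifference condition, lead + disease·x = t. The sketch below assumes that condition (it generalises the 10 + 10x = t case in the text); `lt_tto_value` and `describe_frame` are hypothetical helpers. It shows that the lowest attainable value is −lead/disease and that each year traded moves the value by 1/disease.

```python
from fractions import Fraction


def lt_tto_value(t, lead=10, disease=10) -> Fraction:
    """LT-TTO value: `lead` years in full health followed by `disease` years in
    the state, versus t years in full health. Indifference: lead + disease*x = t."""
    t = Fraction(t)
    if not 0 <= t <= lead + disease:
        raise ValueError("t must lie between 0 and lead + disease")
    return (t - lead) / disease


def describe_frame(lead, disease):
    """Return the lowest attainable value and the value change per year traded."""
    lowest = lt_tto_value(0, lead, disease)      # all lead time traded: -lead/disease
    per_year = Fraction(1, disease)              # value change per year traded
    return lowest, per_year


for lead, disease in ((10, 10), (20, 10), (10, 5)):
    lowest, per_year = describe_frame(lead, disease)
    print(f"lead={lead:>2}, disease={disease:>2}: lowest value {lowest}, "
          f"value per year traded {per_year}")
```

With a ten-year lead time and ten years of disease time, the lowest value is −1 and each year traded is worth 0.1, matching the 0.1 per year of the ten-year standard TTO used for states BTD; a longer lead time extends the negative range at the cost of a longer hypothetical lifespan.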

2.2.3 Why the DCE Task was Adopted

In most EQ-5D-3L valuation studies, respondents completed multiple valuation tasks of increasing complexity, and from the start it was assumed that the EQ-VT protocol would also include at least two types of stated preference task. But which tasks?

The non-standardised protocols for EQ-5D-3L valuation (see Chap. 1) supplied researchers with ranking, VAS (visual analogue scale) and TTO responses. Study teams used the collected data in various ways, at their discretion. In the early years, both VAS- and TTO-based value sets were developed, while the ranking task was seen as a useful precursor. Gradually, however, views on these methods began to shift: TTO became the method of first choice, and the use of VAS valuation declined. Alternatives to VAS valuation were considered for the EQ-VT protocol. At the same time, the underused potential of ranking data began to be recognised (e.g. Salomon 2003; Craig and Busschbach 2009; Craig et al. 2009). In the EQ-VT protocol, the ranking and VAS valuation tasks were eventually replaced by DCE, a method akin to a ranking task.

There were several reasons for choosing DCE. First, the instrument being valued was different: the EQ-5D-5L rather than the EQ-5D-3L. The subtler differences between levels, especially at the mild end of the descriptive system, meant that some people might not be willing to trade off any life years, whereas a DCE could still elicit preferences between mild states. Second, DCEs were widely recognised as a promising new method for valuing health and had been shown to be feasible for the EQ-5D (Stolk et al. 2010). Third, a DCE task can be set up in different ways; depending on the chosen configuration, it can produce values (a) on a latent scale or (b) anchored directly on the QALY (quality-adjusted life year) scale, if either the attribute “duration” or the alternative “dead” is included in the DCE (Norman et al. 2016). In the latter case, DCEs yield values that can have the same cardinal measurement properties as TTO, but with a more straightforward and less costly data collection process. Finally, anticipating future developments, it was considered important to include DCE (instead of VAS or ranking) at this stage, to familiarise more researchers with the DCE method and promote learning.

The DCE task included in the EQ-VT protocol was a basic one, comparing two EQ-5D-5L health states without reference to lifespan, i.e. the number of years lived in each state. The methodological research that guided this decision had suggested that this basic approach produces robust results, whereas the approaches that could produce values on the QALY scale initially suffered from unexplained high variability in the results, and researchers held different views on how to make these more advanced tasks work. It therefore seemed unwise to push for a harmonised method when the protocol for EQ-5D-5L valuation was introduced. However, it was agreed to continue research on other DCE approaches to see whether the issues with those approaches could be resolved (for further discussion, see Chap. 7).

2.2.4 Value Set Generation

Once data have been collected with the EQ-VT protocol, value set generation can be based either on hybrid models that draw on both types of data simultaneously (i.e. cTTO and DCE) or on cTTO data only. The DCE data cannot be used on their own as a basis for value set generation, because DCE values are estimated on a latent scale and lack the interpretation of health state values anchored at 0 (dead) and 1 (full health). Generating a value set from two types of data has the benefit of providing extra assurance that a value set can be constructed from the data collected in the valuation study.
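Why DCE data alone lack the 0 (dead) to 1 (full health) anchoring can be illustrated with a binary logit choice model. In the sketch below, the latent coefficients in `LATENT_STEP` are invented numbers and the additive, one-decrement-per-level coding is a simplifying assumption; it is neither the hybrid model nor the DCE model used in EQ-5D-5L valuation studies. Adding any constant to all latent utilities leaves every choice probability unchanged, so paired choices alone cannot locate "dead" on the scale. `anchored_value` shows one possible way of rescaling, assuming 11111 = 1 and taking the value of 55555 from an external source such as cTTO data.

```python
import math

DIMS = ("MO", "SC", "UA", "PD", "AD")
# Invented latent disutility per level above 1 on each dimension (illustration only).
LATENT_STEP = {"MO": 0.40, "SC": 0.30, "UA": 0.30, "PD": 0.50, "AD": 0.45}


def latent_utility(state: str, shift: float = 0.0) -> float:
    """Latent utility of a five-digit EQ-5D-5L state such as '21343'.
    `shift` is an arbitrary constant that paired choices cannot identify."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not an EQ-5D-5L state: {state!r}")
    return shift - sum(LATENT_STEP[d] * (int(level) - 1) for d, level in zip(DIMS, state))


def p_choose_first(a: str, b: str, shift: float = 0.0) -> float:
    """Binary logit probability that state a is chosen over state b."""
    diff = latent_utility(a, shift) - latent_utility(b, shift)
    return 1.0 / (1.0 + math.exp(-diff))


def anchored_value(state: str, value_55555: float) -> float:
    """Map latent decrements onto the dead = 0 / full health = 1 scale by assuming
    11111 = 1 and taking the value of 55555 from an external source (e.g. cTTO)."""
    span = latent_utility("11111") - latent_utility("55555")
    decrement = latent_utility("11111") - latent_utility(state)
    return 1.0 - (1.0 - value_55555) * decrement / span


# Shifting all latent utilities leaves the choice probability unchanged ...
print(p_choose_first("21111", "11112"), p_choose_first("21111", "11112", shift=5.0))
# ... so an external anchor is needed before values can be read on the QALY scale.
print(anchored_value("33333", value_55555=-0.3))
```

Anchoring on the worst state is only one of several possible choices; hybrid models instead estimate the cTTO and DCE components jointly.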

While cTTO and DCE results provide two measures of the same construct (preferences for health), perfect agreement between them is not to be expected, owing to differences between the methods:

  • results are derived from two different tasks that may evoke different respondent behaviour and can be subject to different biases;

  • the theoretical models used to infer health state values from observable responses differ;

  • values are estimated on different scales.

The choice between methods for value set generation must reflect judgements about the relative merits of each method, given theoretical considerations and/or the properties of the empirical data. If the two data sources agree, that could be an argument for including all data and deriving a value set with greater precision. Conversely, if there are discrepancies, one might ask which source is the “correct” one and consider it problematic to combine them. However, this latter argument might be considered fallacious, because there is no gold standard against which values derived from cTTO or DCE can be judged. Discrepancies can therefore also be regarded as providing complementary information.

As neither line of reasoning will be universally accepted, the EQ-VT protocol sets the framework for eliciting health state preferences, and the local research team decides how the value set is generated (e.g. which data are included and how they are modelled).

2.3 How the EQ-VT Protocol Updates Evolved

After the first wave of EQ-5D-5L valuation studies using the new EQ-VT protocol (Canada, China, England, the Netherlands and Spain) had been completed, it became apparent that there was scope to improve on the first version of the protocol, especially by strengthening its implementation. In some of those initial studies, issues with the cTTO data were observed, such as strong clustering effects, limited coverage of the value range and a high number of inconsistencies (Footnote 2). The data issues seemed to reflect low levels of task engagement among respondents and/or interviewers, which had detrimental effects on the quality of cTTO valuations. Because these issues clustered within interviewers and were not universally present, it was hypothesised that they represented interviewer effects. This motivated the development and integration of a procedure that allows the data to be monitored in real time, so that any issues are detected and timely interventions are possible: a quality control (QC) procedure (Ramos-Goñi et al. 2017). In addition, the introduction of three practice cTTO tasks following the wheelchair example and of confirmatory pop-ups that ask respondents to validate each cTTO answer before it is stored led to EQ-VT version 1.1 (see Table 2.1).

Table 2.1 Overview of EQ-VT elements by protocol versions

In addition, a comprehensive EQ-VT research programme was launched to test a range of strategies that could help to prevent data quality issues and interviewer effects from occurring. Shah et al. (2015) described the seven studies that aimed at remedying cTTO data issues and improving EQ-VT. All studies were set up as experiments with at least two arms, allowing results obtained from a modified version of the protocol (experimental arm) to be compared with those from EQ-VT version 1.1 (control arm). The battery of tests included:

  • introduction of a ranking task for warm-up purposes,

  • testing whether the definition of the comparator state matters for the results (‘full health’ or state 11111, i.e. no problems on any dimension),

  • modified iteration procedures,

  • separation of the BTD and WTD task in cTTO,

  • presenting respondents with rank ordering implied by cTTO valuations (feedback module).

The collected data provided strong support for integrating a QC module, as it improved data quality markedly. They also supported implementation of the feedback module (see Fig. 2.3), since respondents frequently appreciated having the option to review and, if needed, reconsider their own responses. The other tested modifications did not produce clear benefits (Shah et al. 2015). Interviewer effects, clustering of cTTO values and inconsistencies were strongly reduced in valuation studies that applied the updated EQ-VT protocol (Ramos-Goñi et al. 2017; Stolk et al. 2019). Guided by the results of this work, the EQ-VT received two updates: in 2013 (EQ-VT 1.1) and in 2014 (EQ-VT 2.0). A further update followed in 2017 (EQ-VT 2.1), which altered the flow of the wheelchair example to include more prompts for interviewers (Stolk et al. 2019). Box 2.1 provides further details on the QC procedure as implemented from protocol version 1.1 onwards.

Fig. 2.3 Example of the feedback module used in the EQ-VT protocol since version 2.0. (© 2021 EuroQol Research Foundation. Reprinted with permission)

Box 2.1: QC Procedure Since EQ-VT Protocol 1.1

A QC procedure was introduced to monitor interviewers’ protocol compliance and interviewer effects, as well as the face validity of the data. Four QC criteria are used to determine whether an individual interview is of “suspect” quality. An interview is flagged if any of the following four criteria is met:

  1. no explanation of the WTD task (LT-TTO) during the wheelchair example;

  2. too short a time spent on the wheelchair example (i.e. less than three minutes);

  3. clear inconsistency in the cTTO ratings (the value of the worst possible health state, 55555, is not the lowest and is at least 0.5 higher than the lowest value given to any state);

  4. too short a time spent on the cTTO tasks (i.e. a total time for the ten cTTO tasks of less than five minutes).
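The four criteria can be expressed as a per-interview check. This sketch is not the QC tool provided by the EuroQol Executive Office: the `Interview` record, its field names and the `qc_flags` helper are hypothetical, and criterion 3 is only evaluated when 55555 appears among the valued states.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interview:
    """Hypothetical per-interview record; the field names are illustrative."""
    wtd_explained: bool                 # was the WTD (LT-TTO) part shown in the wheelchair example?
    wheelchair_seconds: float           # time spent on the wheelchair example
    ctto_seconds: float                 # total time spent on the ten cTTO tasks
    ctto_values: Dict[str, float] = field(default_factory=dict)  # state -> cTTO value


def qc_flags(iv: Interview) -> List[str]:
    """Return the QC criteria met by one interview (an empty list means not flagged)."""
    flags = []
    if not iv.wtd_explained:
        flags.append("no explanation of the WTD task")
    if iv.wheelchair_seconds < 3 * 60:
        flags.append("wheelchair example under three minutes")
    # Criterion 3 is only checked when 55555 was among the valued states.
    if "55555" in iv.ctto_values:
        lowest = min(iv.ctto_values.values())
        # Small tolerance because values on a 0.05 grid are stored as floats.
        if iv.ctto_values["55555"] - lowest >= 0.5 - 1e-9:
            flags.append("55555 valued at least 0.5 above the lowest state")
    if iv.ctto_seconds < 5 * 60:
        flags.append("cTTO tasks under five minutes")
    return flags
```

An interview is of suspect quality when `qc_flags` returns a non-empty list; returning the list rather than a single boolean keeps the reasons available for feedback to the interviewer.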

Initial QC reports are used to evaluate whether interviewers meet the minimum quality requirements. If 40% or more of an interviewer’s first ten interviews are flagged as being of suspect quality by the QC tool provided by the EuroQol Executive Office, all interviews conducted by that interviewer so far are removed and the interviewer is retrained. After a further ten interviews, the interviewer’s performance and compliance are re-evaluated. If again 40% or more of the interviews are flagged, these interviews are also removed and the interviewer is removed from the interviewer team. A threshold of 40% was selected because flagged interviews could contain genuine responses (e.g. from respondents who form their opinions quickly and complete the cTTO tasks rapidly). The threshold also allows interviewers to grow into their roles as they build up experience with valuation interviews following the EQ-VT protocol.
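The two-stage decision rule can be sketched as a small helper. This is an illustrative reading of the rule, assuming it is applied to batches of exactly ten interviews, each already marked as flagged or not; `batch_fails` and `review_interviewer` are hypothetical names, and the returned action strings are paraphrases rather than terms used by the EuroQol Executive Office.

```python
from typing import List, Optional


def batch_fails(flagged: List[bool]) -> bool:
    """True if 40% or more of a batch of ten interviews were flagged by QC."""
    if len(flagged) != 10:
        raise ValueError("the rule is applied to batches of ten interviews")
    return 10 * sum(flagged) >= 4 * len(flagged)   # integer form of ">= 40%"


def review_interviewer(first_ten: List[bool], next_ten: Optional[List[bool]] = None) -> str:
    """Hypothetical decision helper following the two-stage rule described above."""
    if not batch_fails(first_ten):
        return "continue"
    if next_ten is None:
        return "remove interviews so far and retrain"
    if batch_fails(next_ten):
        return "remove these interviews and remove interviewer from team"
    return "continue after retraining"


print(review_interviewer([True] * 4 + [False] * 6))
```

The comparison is done in integer arithmetic so that exactly four flagged interviews out of ten reliably meets the 40% threshold.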

Throughout the study, the local study team continuously monitors data quality. Later QC reports allow the team to reflect on interviewers’ performance, discuss possible improvements and intervene when an interviewer’s performance worsens.

2.4 Lessons Learned

The evidence from the valuation studies and the comprehensive EQ-VT research programme (Shah et al. 2015) led to increased awareness of how challenging the interview is, both for the respondent and for the interviewer. Data issues driven by interviewer effects showed that the interviewer and his/her skills are pivotal to the success of the interview, especially for the cTTO tasks. The amount of guidance given to respondents affects their engagement and task understanding, and thereby the accuracy and reliability of their responses. The DCE task may be more robust to interviewer effects, but it may also be that data issues are simply more visible in cTTO data. While technical aspects of the tasks were the key focus before the first valuation studies were launched, after the first wave the focus shifted to the human interaction side of the task, which is equally important and clearly needed more attention.

The changes made to the protocol can be categorised into (a) monitoring and supporting the interviewers and (b) supporting the respondents. As outlined above, the QC procedure introduced in version 1.1, with its accompanying QC tool, enables monitoring of interviewers’ protocol compliance and of interviewer effects. Moreover, it facilitates support for interviewers by providing data-based feedback. The items measured are reported elsewhere (Ramos-Goñi et al. 2017). To support the respondents, it was recognised that an extended introduction and practice with cTTO are necessary before the valuation tasks can be carried out in the intended way: three additional practice states and a dynamic question after the wheelchair example were added as EQ-VT elements. In the dynamic question, depending on the respondent’s answer to the wheelchair task, the respondent is asked to imagine a health state that is much better or much worse than being in a wheelchair, so that they move to the other part of the cTTO valuation space. Moreover, as mistakes and/or learning effects can still occur, confirmatory pop-ups after each task and the feedback module presented in Fig. 2.3 were also included. The feedback module presents respondents with the rank ordering implied by their cTTO valuations and gives them the opportunity to flag problematic valuations for removal from the data. Further details on the EQ-VT elements and their changes are provided elsewhere (Stolk et al. 2019).
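The core logic of the feedback module, deriving a rank order from a respondent's cTTO values and dropping the valuations the respondent flags, can be sketched as below. This is only an illustration of the idea, not the EQ-VT software: `implied_ranking` and `drop_flagged` are hypothetical helpers, the tie-breaking by state label is an arbitrary choice, and the example values are invented.

```python
from typing import Dict, Iterable, List, Tuple


def implied_ranking(values: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Rank states from best to worst by their cTTO values (ties broken by state label)."""
    ordered = sorted(values.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, state, value) for rank, (state, value) in enumerate(ordered, start=1)]


def drop_flagged(values: Dict[str, float], flagged: Iterable[str]) -> Dict[str, float]:
    """Remove the valuations a respondent flagged as problematic."""
    flagged = set(flagged)
    unknown = flagged - set(values)
    if unknown:
        raise ValueError(f"flagged states were not valued: {sorted(unknown)}")
    return {state: value for state, value in values.items() if state not in flagged}


# Invented responses for one respondent.
responses = {"11211": 0.95, "32413": 0.30, "55555": -0.40, "21312": 0.10}
for rank, state, value in implied_ranking(responses):
    print(rank, state, value)
kept = drop_flagged(responses, ["21312"])
print(sorted(kept))
```

Showing the implied order, rather than the raw values, lets respondents spot valuations that contradict their own preferences (e.g. a mild state ranked below a severe one) without re-running the cTTO tasks.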

To prepare interviewers for their role in a study, the EuroQol Executive Office began to work more closely with study teams. Besides providing an interviewer script and EQ-VT software tailored to the needs of each team, the EuroQol Executive Office now also offers training for the local research team, who in turn train their interviewers (a ‘train the trainer’ approach). While this training helps, it will not prevent all issues, owing to the complexity of the interview and because the topics taught remain abstract until the interviewers start conducting interviews. Learning on the job, as supported by the QC process, therefore contributes substantially to interviewer performance, as information on an interviewer’s behaviour is used to tailor and deliver a personalised set of additional instructions. The initial training therefore addresses a mix of topics related to content and process, to build up interviewer skills and to agree on how to collaborate while the study is ongoing.

Related topics that need consideration are the selection of interviewers, the logistics of data collection and, more broadly, how investigators and interviewers can work together most effectively. This part of the study is not standardised, but the EuroQol Executive Office can offer recommendations. To date, an example of good practice is working with a small team of dedicated students who travel together with a data coordinator throughout the country and collect data in weekly rounds of ten interviews per interviewer (e.g. Pickard et al. 2019; Shafie et al. 2019; Welie et al. 2020). Students have relevant background knowledge, are familiar with the concepts of validity and bias, are keen to learn, want to do well and do not mind having their performance assessed. Working as a group allows the data coordinator to deliver feedback effectively, and individuals are likely to be receptive to it, since they see other interviewers working on similar issues. Undertaking the work as a group also keeps everyone more focused on the goal of the study.

2.5 Concluding Remarks

Over the last ten years, the accumulation of extensive multinational evidence has supported the development and subsequent refinement of a standardised EQ-VT protocol for conducting national EQ-5D-5L valuation studies. A multinational research programme examined alternative approaches for eliciting health state preferences, developed methods to improve data quality and demonstrated the robustness of these approaches across languages and countries.

The EQ-VT protocol was developed in a way that anticipates evidence-based refinements. Across the different versions of the protocol, from EQ-VT 1.0 to the current version 2.1, the valuation tasks have remained the same, but later versions pay more attention to the optimal implementation of these tasks, combined with a QC procedure. The refinements of the EQ-VT protocol have been shown to improve data quality and minimise interviewer effects.

To date, the EQ-VT protocol has been successfully applied in about 30 countries worldwide and, at the time of writing, 25 of the resulting value sets have been published. These 25 value sets are summarised in Chap. 4, and their similarities and differences are described in Chap. 6. Even though the improved valuation protocol with its QC process provides a solid basis for estimating national EQ-5D-5L value sets, methodological questions remain that can be addressed in future research (see Chap. 7), which might further improve the EQ-VT protocol.