1 Introduction

In a discrete choice experiment (DCE), respondents are assigned sets of alternatives and asked to make choices under a hypothetical situation. The alternatives in each set are typically described using a descriptive system of multi-level attributes. Depending on the hypothetical situation, the choice sets may include a referent alternative (i.e., opt-out), for example, “watchful waiting” (e.g., Campbell & Erdem, 2019). In health preference research (HPR), the primary purpose of a DCE is to test and estimate the effects of attribute levels and different alternatives on preferential choice behaviors (Gonzalez, 2019). A common mantra in HPR is that choice defines value (Craig et al., 2017).

In a DCE, the effects of different attribute levels and alternatives on choice behaviors may vary between respondents due to differences in their preferences (i.e., preference heterogeneity). Such differences may be systematically associated with observable or unobservable factors. Observable heterogeneity refers to observed (by the researcher) factors associated with individuals’ choices such as age or gender (e.g., Craig et al., 2022). On the other hand, unobservable heterogeneity refers to latent factors associated with the choices of individuals (e.g., risk perception).

Unobserved heterogeneity may be described using individual-specific parameters, which may or may not be correlated. Uninformative heterogeneity implies that individual-specific parameters vary but are uncorrelated within the sample. Informative heterogeneity means that individual parameters are correlated through latent factors; therefore, fixing one parameter for a respondent affects the distribution of another. Jumamyradov et al. (2023) simulated DCE data under uninformative and informative heterogeneity and compared the conditional logit (CL) estimates to their true values. They found that the extent and form of unobservable heterogeneity may result in biased CL estimates, and these biases substantively affect willingness-to-pay estimates. It is not typically feasible to estimate individual-specific parameters directly; however, some researchers allow for random parameters, namely mixed logit specifications. As a natural extension of this prior simulation study, this paper investigates the effects of alternative mixed logit specifications on the biases caused by unobservable heterogeneity.

Informative heterogeneity may be separated into two subtypes based on which parameters are correlated. Substitution patterns (e.g., Brownstone & Train, 1999) refer to the presence of a latent factor that causes an association between alternative-specific constants (ASCs). For example, the classic red-bus-blue-bus problem refers to a positive correlation between the two bus alternatives, where a red bus is a substitute for a blue one (e.g., Quandt, 1970). An ASC is not the same as a nominal attribute. In this classic problem, every set includes three fixed alternatives, namely a car, a red bus, and a blue bus. If the color and mode were nominal attributes and varied independently, some sets might include three buses. In health economics, a more common example is a choice set with multiple fixed treatment options (e.g., a brand-name medication and a generic medication) as well as “watchful waiting” (i.e., no medication).

Taste patterns (e.g., Revelt & Train, 1998) refer to the presence of a latent factor that causes an association between attribute coefficients more generally (i.e., attribute importance). In health economics, each alternative may vary by out-of-pocket price, wait times, and other attributes. The coefficients of these attributes may be correlated. For example, individuals who are willing to pay more for their health care may also prefer more luxurious services (e.g., a brand-name medication).

Imagine a specific alternative (e.g., a brand-name medication). Under uninformative interpersonal heterogeneity, the effect of a specific brand varies between persons, but this effect is not associated with another alternative’s effect (i.e., substitution pattern) or the effects of any attributes (i.e., taste pattern). Alternatively, latent factors may induce informative heterogeneity, such as a substitution pattern between brands or a taste pattern between a brand and out-of-pocket cost.

For simplicity, we generated preferential choice behaviors for a standard DCE with a full factorial design (see Appendix A) and individual-specific unobserved coefficients drawn from a lognormal distribution. Each task represents a choice between three alternatives to illustrate uninformative and informative heterogeneity, including substitution and taste patterns. Each alternative is described using a nominal attribute and a numeric one, namely: medication and out-of-pocket price (see Appendix A for out-of-pocket price). For example, the correlation between the effects of brand and generic is a substitution pattern, and the correlation between the effects of medication and price is a taste pattern. Willingness-to-pay for a brand-name medication is the ratio of the brand-name medication and price effects. This DCE was constructed to mimic the classic red-bus-blue-bus example.

Generally, a mixed logit is the same as a conditional logit, except that it allows for random parameters (i.e., it is more flexible) and is estimated using maximum simulated likelihood (MSL). The MSL estimator has become a widely used simulation-based method among applied economists for models whose likelihood function cannot be expressed in closed form and, therefore, cannot be maximized directly using the maximum likelihood (ML) estimator. However, the MSL estimator is biased for a fixed number of simulation draws, which is a result of the log transformation of the simulated likelihood (see Gourieroux & Monfort, 1990; Hajivassiliou et al., 1994; Lee, 1995; Train, 2009). Furthermore, Jumamyradov and Munkin (2022) recently showed that the MSL estimator produces significant biases when applied to the bivariate normal and bivariate Poisson-lognormal models. We expect that similar biases might appear in the MSL estimation of the mixed logit model within the context of a DCE.

Apart from simulation bias and other nuances (e.g., number of Halton draws), determining which parameters should be allowed to be random in the mixed logit model depends on the researcher’s assumptions. Knowing the specification of the unobservable heterogeneity in advance can be challenging. Therefore, this simulation study examines the performance of the MSL estimation of the mixed logit model targeting three questions:

  1. What happens when the mixed logit model is correctly specified (simulation bias)?

  2. What happens when the mixed logit model erroneously restricts a correlation (under-specification)?

  3. What happens when the mixed logit model erroneously allows a correlation (over-specification)?

More specifically, we compare the mixed logit estimates and true parameters under uninformative and informative heterogeneity to assess differences due to simulation and misspecification. In this paper, we emphasize the estimation of the correlations, not the variances. Note that increasing the variances only increases the scale of the biases and has no effect on the correlations.

Depending on the experimental design, analysts may take two general approaches when accounting for interpersonal heterogeneity in panel data analysis. First, in rare cases, the analyst can estimate individual-specific parameters per respondent if the DCE has a large number of tasks per respondent relative to the number of parameters (i.e., fixed effects). When this is not the case, the analyst may estimate a random parameter model by making distributional assumptions about the prevalence of parameters within the population (e.g., mixed logit). To estimate a random-parameter model, the analyst will need to specify the correlation structure between the random parameters, often with little prior information.

Since its advent in the 1980s, mixed logit estimation has been increasingly utilized in the analysis of DCE responses (e.g., air travel: Hossain et al., 2018; healthcare: Soekhai et al., 2019). For instance, Clark et al. (2014) report that while only 3% of health-related DCE papers published during 1990–2000 utilized the mixed logit model in their estimations, that number increased to 21% during 2009–2012. Our null hypothesis is that the mixed logit estimation accurately describes the unobservable heterogeneity in DCE responses. The rest of the paper is organized as follows. Section 2 presents the mixed logit model, the DCE simulations, MSL estimation, and comparison procedure details. Section 3 presents the results and concludes.

2 Methods

a. The Mixed Logit Model

The theoretical framework of choice analysis is based on the random utility maximization (RUM) theory (e.g., Marschak, 1960). According to RUM theory, the utility function \({U}_{itj}={V}_{itj}+{\epsilon }_{itj}\) of individual \(i=1,\dots ,N\) for alternative \(j=1,\dots ,J\) at choice task \(t=1,\dots ,T\) can be decomposed into a deterministic part \({V}_{itj}\) (representative utility), which is observed by the researcher, and a random part \({\epsilon }_{itj}\), which is not observed by the researcher.

Following McFadden (1974), individual \(i\) will choose alternative \(j\) if and only if the utility associated with alternative \(j\) is higher than the utilities associated with all other alternatives. The probability of this choice is therefore

$${P}_{itj}=P\left({U}_{itj}>{U}_{itk}, \forall k\ne j\right)$$
$${P}_{itj}=P\left({V}_{itj}+{\epsilon }_{itj}>{V}_{itk}+{\epsilon }_{itk}, \forall k\ne j\right)$$
$${P}_{itj}=P\left({\epsilon }_{itk}-{\epsilon }_{itj}<{V}_{itj}-{V}_{itk}, \forall k\ne j\right)$$
(1)

To derive the choice probabilities, we need to make distributional assumptions about the random part of utility. The conditional logit (CL) model is derived under the assumption that \({\epsilon }_{itj}\) is independently and identically distributed (IID) with extreme value type 1 (EV1) distribution (e.g., Revelt & Train, 1998; Brownstone & Train, 1999; McFadden & Train, 2000). As a result, the difference of two IID EV1 random error terms \({(\epsilon }_{itk}-{\epsilon }_{itj})\) has a logistic distribution. This implies that the choice probabilities of the conditional logit model can be expressed in terms of logistic distribution with cumulative distribution function

$${P}_{itj}=\frac{1}{1+\sum _{k\ne j}\text{exp}\left({V}_{itk}-{V}_{itj}\right)}$$
(2)
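As a numerical illustration, the following Python sketch evaluates Eq. (2), with the sum taken over the alternatives \(k\ne j\), and checks that it agrees with the familiar form \(\text{exp}\left({V}_{itj}\right)/\sum _{k}\text{exp}\left({V}_{itk}\right)\). It is not part of the original analysis; the function names and the utility values are hypothetical.

```python
import numpy as np

def cl_prob_eq2(V, j):
    """Probability of alternative j as in Eq. (2): 1 / (1 + sum_{k != j} exp(V_k - V_j))."""
    others = np.delete(V, j)          # utilities of all alternatives except j
    return 1.0 / (1.0 + np.exp(others - V[j]).sum())

def cl_probs_softmax(V):
    """The same probabilities written as exp(V_j) / sum_k exp(V_k)."""
    expV = np.exp(V - V.max())        # subtracting the maximum avoids overflow
    return expV / expV.sum()

# Hypothetical representative utilities for three alternatives (third = opt-out, V = 0)
V = np.array([0.5, -0.2, 0.0])
p_eq2 = np.array([cl_prob_eq2(V, j) for j in range(V.size)])
p_soft = cl_probs_softmax(V)
```

Both forms return the same probability vector, which sums to one across the three alternatives.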

Under the conditional logit, representative utility \({V}_{itj}\) is a function of alternative attributes, typically linear in parameters. However, the mixed logit extends the conditional logit through individual-specific parameters (e.g., Hensher et al., 2005). The mixed logit model can account for various kinds of unobservable heterogeneity, but its flexibility comes with a cost. The choice probabilities of the mixed logit model do not have a closed-form expression. Therefore, the estimation of the mixed logit model relies on simulation-based methods such as the maximum simulated likelihood (MSL) estimator (e.g., Greene, 2012). In this paper, simulation bias refers to the difference between estimated and true parameter values when the estimated mixed logit model is correctly specified.
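To make the idea of a simulated choice probability concrete, the sketch below averages the logit probability over draws of the individual-specific coefficients. It is a minimal illustration under stated assumptions rather than the estimation code used in this study: the function name, the attribute matrix, and the pseudo-random lognormal draws are all hypothetical.

```python
import numpy as np

def simulated_choice_prob(X, chosen, coef_draws):
    """Average the logit probability of `chosen` over draws of the random coefficients.

    X          : (J, K) attribute matrix of one choice task
    chosen     : index of the chosen alternative
    coef_draws : (D, K) draws of the individual-specific coefficients
    """
    V = coef_draws @ X.T                         # (D, J) utilities, one row per draw
    V = V - V.max(axis=1, keepdims=True)         # numerical stability
    P = np.exp(V)
    P /= P.sum(axis=1, keepdims=True)            # logit probabilities for each draw
    return P[:, chosen].mean()                   # simulated probability

rng = np.random.default_rng(0)
# Hypothetical task: columns are ASC 1, ASC 2 and minus price; the last row is the opt-out.
X = np.array([[1.0, 0.0, -2.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0,  0.0]])
# Illustrative lognormal draws (pseudo-random here, not Halton).
draws = np.exp(1.0 + np.sqrt(0.15) * rng.standard_normal((500, 3)))
p_sim = [simulated_choice_prob(X, j, draws) for j in range(3)]
```

With panel data, the simulated likelihood of a respondent averages, over draws, the product of these logit probabilities across that respondent's tasks; the MSL estimator maximizes the sum of the logs of these averages.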

Furthermore, we examine biases due to misspecification of the mixed logit models. In particular, a model may be under-specified when the researcher erroneously restricts a correlation between random parameters (i.e., under-specification bias). Similarly, a model may be over-specified when the researcher erroneously assumes that uncorrelated random parameters are correlated with each other (i.e., over-specification bias). We conducted a series of simulations to assess differences between mixed logit estimates and true parameters due to simulation and misspecification biases.

b. Simulation

Using a standard DCE, we generate choices between three alternatives, where the third alternative serves as a referent (i.e., opt-out; Campbell & Erdem, 2019). For identification of the parameters, we normalize the model by assuming that the coefficients of the opt-out alternative are equal to zero. Therefore, the deterministic part of the opt-out alternative will be represented as \({V}_{it3}=0\). For example, imagine the decision between a brand-name medication, generic medication or doing nothing. Hence, we assume that the representative utilities of three alternatives will take the following form

$${V}_{it1}={\alpha }_{i1}-{\beta }_{i}{x}_{t1}$$
$${V}_{it2}={\alpha }_{i2}-{\beta }_{i}{x}_{t2}$$
$${V}_{it3}=0$$
(3)

and therefore, utility differences between each alternative and opt-out will be

$${Z}_{it1}={\alpha }_{i1}-{\beta }_{i}{x}_{t1}+{\eta }_{it1}, \quad {\eta }_{it1}\sim Logistic\left(0,1\right)$$
$${Z}_{it2}={\alpha }_{i2}-{\beta }_{i}{x}_{t2}+{\eta }_{it2}, \quad {\eta }_{it2}\sim Logistic\left(0,1\right)$$
(4)

where \({x}_{t1}\) and \({x}_{t2}\) are out-of-pocket prices (see the Appendix A for the values of out-of-pocket price), \({\alpha }_{i1}\) and \({\alpha }_{i2}\) are alternative-specific constants (ASCs), \({\beta }_{i}\) is a random price coefficient, \({Z}_{it1}={U}_{it1}-{U}_{it3}\) and \({Z}_{it2}={U}_{it2}-{U}_{it3}\) are two utility differences, and finally \({\eta }_{it1}={\epsilon }_{it1}-{\epsilon }_{it3}\) and \({\eta }_{it2}={\epsilon }_{it2}-{\epsilon }_{it3}\) are differences in IID EV1 error terms. Under this specification, the willingness-to-pay for each brand is the ratio of the ASC and price coefficient, \({\alpha }_{i1}/{\beta }_{i}\) and \({\alpha }_{i2}/{\beta }_{i}\). Notice that we have a negative sign in front of the price coefficient \({\beta }_{i}\) to represent the fact that people dislike paying more.
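The data generation implied by Eqs. (3) and (4) can be sketched as follows for a single respondent. This is an illustrative sketch only: the function name is hypothetical, and the prices are drawn at random here rather than taken from the design in Appendix A.

```python
import numpy as np

def simulate_choices(alpha1, alpha2, beta, prices, rng):
    """Simulate one respondent's choices among brand (0), generic (1) and opt-out (2).

    prices : (T, 2) out-of-pocket prices of alternatives 1 and 2 in each task
    """
    T = prices.shape[0]
    eps = rng.gumbel(size=(T, 3))                 # IID EV1 errors for the three alternatives
    U = np.column_stack([
        alpha1 - beta * prices[:, 0] + eps[:, 0],  # brand-name medication
        alpha2 - beta * prices[:, 1] + eps[:, 1],  # generic medication
        eps[:, 2],                                 # opt-out: representative utility is 0
    ])
    return U.argmax(axis=1)                       # the alternative with the highest utility

rng = np.random.default_rng(1)
prices = rng.uniform(0.0, 3.0, size=(25, 2))      # hypothetical prices, not those of Appendix A
choices = simulate_choices(np.exp(1.0), np.exp(1.0), np.exp(1.0), prices, rng)
```

Choosing the alternative with the highest utility is equivalent to comparing the utility differences \({Z}_{it1}\) and \({Z}_{it2}\) with each other and with zero.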

In our model specification, we use two ASCs (\({\alpha }_{i1}\) and \({\alpha }_{i2}\)) to allow for their correlation. Alternatively, a nominal attribute for opt-in (\({\alpha }_{i}\)) may be included instead of two ASCs. However, that specification would deviate from the red-bus-blue-bus example. A similar case in health economics is choosing among three ways to seek heartburn relief, where the alternatives are a brand-name medication, a generic medication, or doing nothing. It seems appropriate to allow the ASCs for the brand-name and generic medications to be correlated, and not combined as a treatment attribute.

Since the random out-of-pocket price coefficient \({\beta }_{i}\) enters the denominator of the WTP ratio, careful consideration of its distribution is important. Daly et al. (2012) show that whether the distribution of WTP has finite moments depends on the distribution of \({\beta }_{i}\). In this paper, we assume that all three random coefficients have a lognormal distribution:

$${\beta }_{i}=\text{exp}\left(\beta +{v}_{{\beta }_{i}}\right)$$
$${\alpha }_{1i}=\text{exp}\left({\alpha }_{1}+{v}_{{\alpha }_{1i}}\right)$$
$${\alpha }_{2i}=\text{exp}\left({\alpha }_{2}+{v}_{{\alpha }_{2i}}\right)$$

where \(\left({v}_{{\alpha }_{1i}},{v}_{{\alpha }_{2i}},{v}_{{\beta }_{i}}\right)\sim N\left(\left(0,0,0\right),{\Sigma }\right)\) has a multivariate normal distribution with the following covariance matrix

$${\Sigma }=\left[\begin{array}{ccc}{\sigma }_{1}^{2}& {\sigma }_{1}{\sigma }_{2}{\rho }_{\alpha }& {\sigma }_{1}{\sigma }_{\beta }{\rho }_{\beta }\\ {\sigma }_{1}{\sigma }_{2}{\rho }_{\alpha }& {\sigma }_{2}^{2}& 0\\ {\sigma }_{1}{\sigma }_{\beta }{\rho }_{\beta }& 0& {\sigma }_{\beta }^{2}\end{array}\right]$$
(5)

with \({\sigma }_{1}^{2}=Var\left({v}_{{\alpha }_{1i}}\right)\), \({\sigma }_{2}^{2}=Var\left({v}_{{\alpha }_{2i}}\right)\), \({\sigma }_{\beta }^{2}=Var\left({v}_{{\beta }_{i}}\right)\), \({\rho }_{\alpha }=Corr\left({v}_{{\alpha }_{1i}},{v}_{{\alpha }_{2i}}\right)\), and \({\rho }_{\beta }=Corr\left({v}_{{\alpha }_{1i}},{v}_{{\beta }_{i}}\right)\). In this covariance matrix, informative heterogeneity can further be separated based on whether the correlation is between the alternative-specific constants, \({\rho }_{\alpha }\) (i.e., substitution patterns), or between an alternative-specific constant and the price coefficient, \({\rho }_{\beta }\) (i.e., taste patterns). Notice that we do not generate data with a nonzero \(Corr\left({v}_{{\alpha }_{2i}},{v}_{{\beta }_{i}}\right)\), because we found no substantive differences between the data generated with \(Corr\left({v}_{{\alpha }_{1i}},{v}_{{\beta }_{i}}\right)\) or \(Corr\left({v}_{{\alpha }_{2i}},{v}_{{\beta }_{i}}\right)\).
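The covariance matrix in Eq. (5) and the lognormal draws of the three random coefficients can be sketched as follows. The function names and the random seed are hypothetical, and the sketch is meant only to illustrate the structure of the data generation process, not to reproduce the authors' code.

```python
import numpy as np

def build_sigma(var1, var2, var_beta, rho_alpha=0.0, rho_beta=0.0):
    """Covariance matrix of (v_alpha1, v_alpha2, v_beta) with the structure of Eq. (5)."""
    s1, s2, sb = np.sqrt([var1, var2, var_beta])
    return np.array([
        [s1 * s1,             s1 * s2 * rho_alpha, s1 * sb * rho_beta],
        [s1 * s2 * rho_alpha, s2 * s2,             0.0],
        [s1 * sb * rho_beta,  0.0,                 sb * sb],
    ])

def draw_coefficients(means, sigma, n, rng):
    """Return (n, 3) lognormal draws ordered as (alpha_1i, alpha_2i, beta_i)."""
    v = rng.multivariate_normal(np.zeros(3), sigma, size=n)
    return np.exp(np.asarray(means) + v)

rng = np.random.default_rng(2)
sigma = build_sigma(0.15, 0.15, 0.15, rho_beta=0.75)    # a taste-pattern example
coefs = draw_coefficients([1.0, 1.0, 1.0], sigma, n=200, rng=rng)
```

Because the third correlation is fixed at zero, this matrix is positive definite whenever \({\rho }_{\alpha }^{2}+{\rho }_{\beta }^{2}<1\), which always holds when only one of the two correlations is nonzero.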

The mean and variance of lognormal random variables \({\beta }_{i}\) and \({\alpha }_{1i}\) are

$$E\left({\beta }_{i}\right)=\text{exp}\left(\beta +{\sigma }_{\beta }^{2}/2\right)$$
$$E\left({\alpha }_{1i}\right)=\text{exp}\left({\alpha }_{1}+{\sigma }_{1}^{2}/2\right)$$
$$Var\left({\beta }_{i}\right)=\text{exp}\left(2\beta +{\sigma }_{\beta }^{2}\right)\left[\text{exp}\left({\sigma }_{\beta }^{2}\right)-1\right]$$
$$Var\left({\alpha }_{1i}\right)=\text{exp}\left(2{\alpha }_{1}+{\sigma }_{1}^{2}\right)\left[\text{exp}\left({\sigma }_{1}^{2}\right)-1\right]$$

Notice that the WTP, which is a ratio of two lognormal random variables, also has a lognormal distribution

$$WT{P}_{{\alpha }_{1}}=\frac{{\alpha }_{1i}}{{\beta }_{i}}=\frac{\text{exp}\left({\alpha }_{1}+{v}_{{\alpha }_{1i}}\right)}{\text{exp}\left(\beta +{v}_{{\beta }_{i}}\right)}=\text{exp}\left({\alpha }_{1}-\beta +{v}_{{\alpha }_{1i}}-{v}_{{\beta }_{i}}\right)$$

with the following mean and variance

$$E\left(WT{P}_{{\alpha }_{1}}\right)=\text{exp}\left({\alpha }_{1}-\beta +\frac{{\sigma }_{1}^{2}-2{\sigma }_{1}{\sigma }_{\beta }{\rho }_{\beta }+{\sigma }_{\beta }^{2}}{2}\right)$$
$$Var\left(WT{P}_{{\alpha }_{1}}\right)=\text{exp}\left(2\left({\alpha }_{1}-\beta \right)+{\sigma }_{1}^{2}-2{\sigma }_{1}{\sigma }_{\beta }{\rho }_{\beta }+{\sigma }_{\beta }^{2}\right)\left[\text{exp}\left({\sigma }_{1}^{2}-2{\sigma }_{1}{\sigma }_{\beta }{\rho }_{\beta }+{\sigma }_{\beta }^{2}\right)-1\right]$$
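As a worked example, the sketch below evaluates these two formulas for \({\alpha }_{1}=\beta =1\), \({\sigma }_{1}^{2}=0.05\), \({\sigma }_{\beta }^{2}=0.15\) and \({\rho }_{\beta }=0\). The function name is hypothetical; the formulas are those stated above.

```python
import math

def wtp_moments(alpha1, beta, var1, var_beta, rho_beta):
    """Mean and variance of the lognormal WTP = alpha_1i / beta_i."""
    # Variance of v_alpha1 - v_beta, the normal term inside the exponent
    s2 = var1 - 2.0 * math.sqrt(var1 * var_beta) * rho_beta + var_beta
    mu = alpha1 - beta                      # mean of the normal term inside the exponent
    mean = math.exp(mu + s2 / 2.0)
    var = math.exp(2.0 * mu + s2) * (math.exp(s2) - 1.0)
    return mean, var

mean_wtp, var_wtp = wtp_moments(1.0, 1.0, 0.05, 0.15, 0.0)
```

The result is a mean of about 1.105 and a variance of about 0.270. These agree in magnitude with the true values quoted in Section 3.1 for the first column of Table 3, where the mean is reported with a negative sign.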

In the data generation process, the true means of the representative utility parameters (Eq. 3) were fixed at \({\alpha }_{1}={\alpha }_{2}=\beta =1\) (i.e., for the brand-name medication, generic medication, and out-of-pocket price, respectively). Furthermore, we generate various covariance matrices (Eq. 5) based on the true values we choose for the variances and correlations, and we refer to each distinct covariance matrix as a “parameter set” (see Table 1). For each parameter set, we generated 100 datasets (\(R=100\)), each including 25 choices (\(T=25\)) for 200 respondents (\(N=200\)) given a standard DCE with a full factorial experimental design. We acknowledge that this design may include dominated alternatives. However, since this is a simulation study, the existence of dominated alternatives should not be a concern.

The simulated covariance matrices fall into three distinct groups: the first group (i.e., Group 1) represents uninformative heterogeneity (i.e., uncorrelated random parameters; \({\rho }_{\alpha }={\rho }_{\beta }=0\)) with different variances, and the last two groups (i.e., Group 2 and Group 3) represent informative heterogeneity (i.e., correlated random parameters; \({\rho }_{\alpha }\ne 0\) or \({\rho }_{\beta }\ne 0\)) where the variances are fixed at 0.15. For instance, in Group 1 we assume that \({\sigma }_{1}^{2}\), \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\) each range from 0.05 to 0.25 in increments of 0.1 (i.e., \({\sigma }_{1}^{2},{\sigma }_{2}^{2},{\sigma }_{\beta }^{2}\in \{0.05, 0.15, 0.25\}\), or \(\{0.05:0.1:0.25\}\) in range notation) and that \({\rho }_{\alpha }={\rho }_{\beta }=0\). Since there are 3 different values for each of \({\sigma }_{1}^{2}\), \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\), we can create \({3}^{3}=27\) parameter sets for the true covariance matrix.

In Group 2, we generate the data with fixed variances \({\sigma }_{1}^{2}={\sigma }_{2}^{2}={\sigma }_{\beta }^{2}=0.15\) and different values for \({\rho }_{\beta }\) ranging from \(-0.95\) to 0.95 in increments of 0.05 (i.e., \({\rho }_{\beta }=\{-0.95:0.05:0.95\}\)). In Group 3, we generate the data with fixed variances \({\sigma }_{1}^{2}={\sigma }_{2}^{2}={\sigma }_{\beta }^{2}=0.15\) and different values for \({\rho }_{\alpha }\) ranging from 0.05 to 0.95 in increments of 0.05 (i.e., \({\rho }_{\alpha }=\{0.05:0.05:0.95\}\)). Therefore, excluding \({\rho }_{\beta }=0\), Group 2 and Group 3 contain 38 and 19 nonzero correlation values, respectively. For brevity, we define \({\sigma }^{2}=\{{\sigma }_{1}^{2}, {\sigma }_{2}^{2},{\sigma }_{\beta }^{2}\}\) and \(\rho =\{{\rho }_{\alpha },{\rho }_{\beta }\}\).
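The three groups of parameter sets can be enumerated with a short sketch, which also confirms the counts stated above (27, 38 and 19). The dictionary keys are hypothetical names for the variances and correlations of Eq. (5).

```python
from itertools import product

variances = [0.05, 0.15, 0.25]

# Group 1: uninformative heterogeneity, all combinations of the three variances
group1 = [dict(var1=a, var2=b, var_beta=c, rho_alpha=0.0, rho_beta=0.0)
          for a, b, c in product(variances, repeat=3)]

# Group 2: taste patterns, rho_beta from -0.95 to 0.95 in steps of 0.05, excluding zero
group2 = [dict(var1=0.15, var2=0.15, var_beta=0.15, rho_alpha=0.0, rho_beta=round(k / 20, 2))
          for k in range(-19, 20) if k != 0]

# Group 3: substitution patterns, rho_alpha from 0.05 to 0.95 in steps of 0.05
group3 = [dict(var1=0.15, var2=0.15, var_beta=0.15, rho_alpha=round(k / 20, 2), rho_beta=0.0)
          for k in range(1, 20)]

counts = (len(group1), len(group2), len(group3))
```

Building the correlation grids from integers divided by 20 avoids the floating-point drift that repeated addition of 0.05 would introduce.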

Table 1 Parameter Sets for the Simulation of Informative and Uninformative Heterogeneity

c. Estimation

To analyze simulation and misspecification biases in each simulated dataset with various parameter sets, we estimated different specifications of the mixed logit model using 500 Halton draws. For simulation bias, we estimate the mixed logit model assuming that the researcher knows the accurate specification of the data generation process. For instance, when the data are generated using Group 1, we estimate the mixed logit model assuming \({\rho }_{\alpha }={\rho }_{\beta }=0\). When the data are generated using Group 2, we estimate the mixed logit model assuming \({\rho }_{\alpha }=0\) and \({\rho }_{\beta }\ne 0\). Lastly, when the data are generated using Group 3, we estimate the mixed logit model assuming \({\rho }_{\beta }=0\) and \({\rho }_{\alpha }\ne 0\).
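For readers unfamiliar with Halton draws, the sketch below generates a Halton sequence and maps it to standard normal draws for three random parameters. The primes, the absence of a burn-in, and the function names are assumptions made for illustration; the text does not state which software or Halton settings were used.

```python
from statistics import NormalDist

def van_der_corput(n, base):
    """n-th element (n >= 1) of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)   # peel off the lowest digit of n in this base
        denom *= base
        q += digit / denom           # mirror the digits around the radix point
    return q

def halton_normal_draws(n_draws, primes=(2, 3, 5), skip=0):
    """Halton points mapped to standard normal draws, one column per prime."""
    inv = NormalDist().inv_cdf       # inverse standard normal CDF
    return [[inv(van_der_corput(i + 1 + skip, p)) for p in primes]
            for i in range(n_draws)]

draws = halton_normal_draws(500)     # 500 draws for three random parameters
```

Correlated draws can then be obtained by multiplying each row by the Cholesky factor of the covariance matrix under the model being estimated.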

To analyze misspecification bias, we estimate inaccurate specifications of the mixed logit model by either restricting (i.e., under-specification) or freeing (i.e., over-specification) correlations in the estimation process. For instance, to analyze the under-specification bias, we generate the data using Group 2 or Group 3 but estimate the mixed logit model with the restriction \({\rho }_{\alpha }={\rho }_{\beta }=0\). To analyze the over-specification bias, we generate the data using Group 1 but estimate the model allowing \({\rho }_{\beta }\ne 0\) or \({\rho }_{\alpha }\ne 0\). The goal of these analyses is to examine whether MSL estimation of the mixed logit model produces accurate estimates and, more importantly, accurate WTP.

Overall, in this paper we estimate three different mixed logit models, which differ based on correlation restrictions. The first model, Model 1, assumes that \({\rho }_{\alpha }={\rho }_{\beta }=0\). The second model, Model 2, assumes that \({\rho }_{\beta }\ne 0\) but \({\rho }_{\alpha }=0\). In the third model, Model 3, we assume that \({\rho }_{\alpha }\ne 0\) but \({\rho }_{\beta }=0\).

In Table 2, we summarize the different biases we analyze in this paper (i.e., simulation, over-specification and under-specification biases). Notice that if we generate the data using Group 1 and estimate the mixed logit model using Model 1, the resulting bias in the estimated parameters represents the simulation bias. Similarly, if we generate the data using Group 1 and estimate Model 2, the resulting bias in the estimated parameters represents the over-specification bias.
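The mapping from a data generation group and an estimated model to the type of bias can be written compactly, as in the sketch below. The names are hypothetical, and combinations that the text does not discuss (e.g., Group 2 data with Model 3) are labelled as such rather than classified.

```python
# Correlations that are nonzero in each data generation group
GROUP_TRUE = {1: set(), 2: {"rho_beta"}, 3: {"rho_alpha"}}
# Correlations that are freely estimated in each model (all others restricted to zero)
MODEL_FREE = {1: set(), 2: {"rho_beta"}, 3: {"rho_alpha"}}

def bias_type(group, model):
    """Classify the bias studied when `model` is estimated on data from `group`."""
    missing = GROUP_TRUE[group] - MODEL_FREE[model]   # true correlations that are restricted
    extra = MODEL_FREE[model] - GROUP_TRUE[group]     # freed correlations that are truly zero
    if not missing and not extra:
        return "simulation"
    if missing and not extra:
        return "under-specification"
    if extra and not missing:
        return "over-specification"
    return "not discussed in the text"

table = {(g, m): bias_type(g, m) for g in (1, 2, 3) for m in (1, 2, 3)}
```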

Table 2 Different models and groups of data generation processes

d. Comparison between True and Estimated Values

First, we describe simulation bias by comparing the true and estimated values obtained from accurate specification of uninformative (i.e., Group 1) and informative heterogeneity (i.e., Group 2 and Group 3). As a measure of bias, we use the Wald distance between the estimated and true values based on the estimated standard errors (see Greene, 2012). Second, we describe under-specification bias by comparing the true and estimated values obtained from Group 2 (i.e., restricted \({\rho }_{\beta }=0\)) and Group 3 (i.e., restricted \({\rho }_{\alpha }=0\)). Lastly, we describe over-specification bias by comparing the true and estimated values obtained from Group 1 (i.e., unrestricted \({\rho }_{\beta }\ne 0\) or \({\rho }_{\alpha }\ne 0\)).

3 Results

This section summarizes the results that address the questions posed in the introduction regarding simulation bias, under-specification (i.e., restricted correlation) and over-specification (i.e., unrestricted correlation) in MSL estimation of the mixed logit model. The standard error of each parameter was calculated by dividing its standard deviation across the 100 simulated datasets by the square root of 100. As a benchmark, in our interpretation of results, we analyze the distance between the true and estimated parameters relative to their respective standard errors. After interpreting these distances, we repeated the analysis with 500 simulations (Appendix B), which rendered similar results and further strengthened the conclusions.
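The benchmark described above can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which divisor was used.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Mean estimate, Monte Carlo standard error, and distance from the truth in SE units."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    se = est.std(ddof=1) / np.sqrt(est.size)   # standard deviation over sqrt(R); ddof=1 is assumed
    return mean, se, abs(mean - true_value) / se

def distance_in_se(estimate, se, true_value):
    """Distance between a reported estimate and its true value, in standard errors."""
    return abs(estimate - true_value) / se

# Reported estimate 1.117 (0.003) against a true value of 1, as quoted in Section 3.1
d = distance_in_se(1.117, 0.003, 1.0)

# Hypothetical estimates from four replications, for illustration only
mean, se, dist = mc_summary([1.0, 1.2, 1.1, 1.3], true_value=1.0)
```

For the reported estimate of 1.117 with a standard error of 0.003, the distance from the true value of 1 is about 39 standard errors, in line with the "approximately 40" noted in Section 3.1.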

3.1 Simulation Bias

As mentioned earlier, simulation bias represents the bias occurring when the mixed logit model is correctly specified. To analyze the simulation bias, we present Table 3 where we show MSL results of the mixed logit model with the Group 1 data generation process. Since there are 27 parameter sets with different true variances, in Table 3 we present the results of only nine combinations of variances. The rest of the results can be found in the Supplementary Materials. The true values of variances in Table 3 are chosen such that \({\sigma }_{1}^{2}\) and \({\sigma }_{2}^{2}\) each has three values \(\left\{0.05, 0.15, 0.25\right\}\), and \({\sigma }_{\beta }^{2}\) does not vary (\({\sigma }_{\beta }^{2}=0.15\)). In Table 3, we show the estimated parameters as well as the true and estimated WTP with their respective standard errors in brackets.

There are three observations that we would like to point out about the simulation bias in Group 1. First, notice that the MSL estimator produces biased results for \({\alpha }_{1}\), \({\alpha }_{2}\) and \(\beta\) in all nine specifications presented in the table. Specifically, when the true coefficients are \({\alpha }_{1}=1\), \({\alpha }_{2}=1\), \(\beta =1\), with variances \({\sigma }_{1}^{2}=0.05\), \({\sigma }_{2}^{2}=0.05\) and \({\sigma }_{\beta }^{2}=0.15\) (i.e., first column), the estimated coefficients are equal to 1.117 (0.003), 1.118 (0.003) and 1.121 (0.003), respectively, all three of which are separated from their true values by approximately 40 standard errors.

The second observation is that the estimated variances are biased as well. For instance, when the true variances are \({\sigma }_{1}^{2}=0.05\), \({\sigma }_{2}^{2}=0.05\) and \({\sigma }_{\beta }^{2}=0.15\) (i.e., first column), the MSL estimates are 0.222 (0.003), 0.219 (0.003) and 0.378 (0.003), which are separated from their true values by more than 57, 56 and 43 standard errors, respectively. Therefore, the null hypothesis that the estimated coefficients and variances are equal to their respective true values may be rejected for all these parameters.

The third observation is that even though the parameter estimates are biased, the estimated WTP results are close to their true values. For instance, when the true variances are \({\sigma }_{1}^{2}=0.05\), \({\sigma }_{2}^{2}=0.05\) and \({\sigma }_{\beta }^{2}=0.15\) (i.e., first column), the estimated \(E\left(WT{P}_{{\alpha }_{1}}\right)\) and \(Var\left(WT{P}_{{\alpha }_{1}}\right)\) are −1.097 (0.004) and 0.259 (0.004), which are separated from their respective true values −1.105 and 0.270 by less than three standard errors (i.e., the typical margin of error). Furthermore, all 36 estimated WTP parameters (i.e., 9 columns, 2 mean values and 2 variances each) are within the typical margin of error. Therefore, we conclude that although there is a simulation bias in the coefficients and variances of Group 1, it does not substantively affect the estimated WTP. Similar patterns can be seen in results presented in the Supplementary Materials.

Table 3 Model 1 Estimation for Group 1 (i.e., Simulation Bias)

In Table 4, we present MSL parameter estimates for Group 2 (taste patterns). Notice that we do not estimate \({\sigma }_{1}^{2}\) because we restrict it at its true value 0.15 for identification. Since in Group 2 there are 38 parameter sets with different true correlations (see Table 1), in Table 4 we present only the results of eight parameter sets. The rest of the results can be found in the Supplementary Materials.

Similar to the results we found in Table 3, the MSL estimator produces biased results in the Group 2 as well. Specifically, in all eight specifications presented in Table 4, the estimated mean coefficients (i.e., \({\alpha }_{1}\), \({\alpha }_{2}\), \(\beta\)) are separated from their true values by approximately 40 standard errors. Furthermore, in all eight columns of Table 4, the estimated variances (i.e., \({\sigma }_{2}^{2}\), \({\sigma }_{\beta }^{2}\)) are separated from their true values by approximately 70 standard errors. Therefore, the null hypothesis that the estimated coefficients and variances are equal to their respective true values may be rejected.

In contrast, Table 4 shows that the MSL estimates of \({\rho }_{\beta }\) are relatively more accurate than the estimated coefficients and variances. For instance, the true and estimated \({\rho }_{\beta }\) are separated from each other by more than three standard errors in only two out of eight columns. Specifically, when the true correlations are \({\rho }_{\beta }=-0.75\) and \({\rho }_{\beta }=-0.35\), the estimated values are \(-0.798 \left(0.013\right)\) and \(-0.398 \left(0.013\right)\), respectively, both of which are separated from their true values by 3.7 standard errors.

However, similar to the results we observed in Table 3, even though there are biases in the estimated coefficients, variances and correlation, the estimated mean and variance of WTP are mostly close to their true values. There are only a few instances where the WTP parameters are separated from their true values by more than three standard errors (i.e., biased). For instance, the estimated \(E\left(WT{P}_{{\alpha }_{1}}\right)\) is biased when the true correlation is \({\rho }_{\beta }=0.55\) or \({\rho }_{\beta }=0.75\), the estimated \(E\left(WT{P}_{{\alpha }_{2}}\right)\) is biased when the true correlation is \({\rho }_{\beta }=-0.75\), and finally the estimated \(Var\left(WT{P}_{{\alpha }_{2}}\right)\) is biased when the true correlation is \({\rho }_{\beta }=0.35\) or \({\rho }_{\beta }=0.55\). Therefore, the simulation bias in Group 2 (taste patterns) does not seem to substantively affect the estimated WTP.

Table 4 Model 2 Estimation for Group 2 (i.e., Simulation Bias)

In Table 5, we present the results for the Group 3 (substitution patterns) where we find similar patterns in terms of the estimated parameters and WTP. Specifically, in all nine columns presented in Table 5, the estimates of coefficients and variances are separated from their true values by more than three standard errors, and therefore, the MSL estimator produces biased results for the coefficients \({\alpha }_{1}\), \({\alpha }_{2}\) and \(\beta\), and variances \({\sigma }_{1}^{2}\) and \({\sigma }_{2}^{2}\).

However, similar to the results in Table 4, the MSL estimator produces relatively more accurate results for the correlation parameter \({\rho }_{\alpha }\). For instance, in Table 5, MSL produces a biased result for \({\rho }_{\alpha }\) only when its true value is \({\rho }_{\alpha }=0.5\), which is separated from its estimated value 0.532 (0.009) by 3.5 standard errors. However, regardless of these simulation biases occurring in the estimation of parameters, they do not substantively affect the estimated WTP.

Table 5 Model 3 Estimation for Group 3 (i.e., Simulation Bias)

3.2 Under-Specification Bias

Next, we analyze under-specification bias. Recall that in Table 4 we present the Model 2 estimation (i.e., correct specification) results of the Group 2 and analyze simulation bias. In Table 6, we present the Model 1 estimation (i.e., restricted \({\rho }_{\beta }=0\)) results for the Group 2. By comparing Tables 4 and 6, we infer the effects of restricting \({\rho }_{\beta }\) on the mixed logit estimates apart from simulation bias.

There are three observations that we would like to point out. The first observation relates to the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\), and \(\beta\). The second observation is related to the estimated variances \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\). Finally, the third observation relates to the estimated WTP parameters.

First, restricting a positive \({\rho }_{\beta }\) has a different effect on the estimated coefficients than restricting a negative \({\rho }_{\beta }\). Specifically, restricting a negative \({\rho }_{\beta }\) slightly increases the bias in the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\), and \(\beta\). However, restricting a positive \({\rho }_{\beta }\) substantively reduces the bias in the coefficients, countering the simulation bias. For instance, in Table 6, when the true \({\rho }_{\beta }=-0.95\), the estimated \({\alpha }_{1}\), \({\alpha }_{2}\), and \(\beta\) are 1.122 (0.004), 1.118 (0.004) and 1.130 (0.004), which are separated from their true values relatively more than their counterparts in Table 4, implying that restricting \({\rho }_{\beta }\) increases the under-specification bias. However, when the true value is \({\rho }_{\beta }=0.95\), the estimated \({\alpha }_{1}\), \({\alpha }_{2}\), and \(\beta\) are 1.104 (0.004), 1.085 (0.004) and 1.097 (0.004), which are substantively closer to their true values than the estimated coefficients in Table 4.

Second, when we compare the estimated variances, there are a few non-negligible differences between the estimates we find in Tables 4 and 6. For instance, among all nine columns presented in Table 4, the largest and smallest estimated \({\sigma }_{2}^{2}\) are 0.391 (0.004) and 0.378 (0.004), which occur when the true correlations are \({\rho }_{\beta }=0.35\) and \({\rho }_{\beta }=-0.35\), respectively. However, in Table 6, the estimated \({\sigma }_{2}^{2}\) monotonically increases from 0.351 (0.003) to 0.480 (0.004) when the true correlations are \({\rho }_{\beta }=-0.95\) and \({\rho }_{\beta }=0.95\), respectively. Given that the true value is \({\sigma }_{2}^{2}=0.15\), the results in Tables 4 and 6 suggest that restricting \({\rho }_{\beta }\) increases the bias in the estimated \({\sigma }_{2}^{2}\).

Similarly, the smallest and largest estimated \({\sigma }_{\beta }^{2}\) in Table 4 are 0.368 (0.003) and 0.385 (0.003), respectively. However, the estimated \({\sigma }_{\beta }^{2}\) in Table 6 monotonically decreases from 0.434 (0.003) to 0.150 (0.006) when the true correlations are \({\rho }_{\beta }=-0.95\) and \({\rho }_{\beta }=0.95\), respectively. Therefore, we can conclude that restricting \({\rho }_{\beta }=0\) adds more variation to the estimated variances \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\).

Finally, recall that the estimated mean and variance of WTP in Table 4 are mostly close to their true values. Specifically, in Table 4, only 4 out of 36 estimated WTP parameters are separated from their respective true values by more than 3 standard errors (i.e., 9 columns with 4 estimated WTP parameters each). However, when we compare the WTP results from Table 4 to the results we observe in Table 6, we note that restricting \({\rho }_{\beta }\) increased the bias in the mean and variance of WTP. Specifically, 25 out of 36 estimated WTP parameters are separated from their respective true values by more than 3 standard errors. Therefore, we can conclude that although mis-specifying the mixed logit model by restricting \({\rho }_{\beta }\) may sometimes positively affect the estimated coefficients and variances, it negatively affects the estimated WTP parameters.
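One way to see why an omitted correlation can distort WTP even when the coefficients look acceptable is to consider the ratio of two correlated lognormal coefficients. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: WTP is treated as a simple ratio `a / b`, the means, variances and correlation are defined on the log scale, and all names and numerical values are hypothetical. Under those assumptions the ratio is itself lognormal, so its mean and variance have closed forms that depend directly on the correlation.

```python
import math
import random


def lognormal_ratio_moments(mu_a, mu_b, var_a, var_b, rho):
    """Mean and variance of a/b when (log a, log b) is bivariate normal.

    All arguments are log-scale quantities (an assumption of this sketch).
    """
    m = mu_a - mu_b
    v = var_a + var_b - 2.0 * rho * math.sqrt(var_a * var_b)
    mean = math.exp(m + 0.5 * v)
    variance = (math.exp(v) - 1.0) * math.exp(2.0 * m + v)
    return mean, variance


def simulate_ratio_mean(mu_a, mu_b, var_a, var_b, rho, n=100_000, seed=1):
    """Monte Carlo check of the closed-form mean using pseudo-random draws."""
    rng = random.Random(seed)
    sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        log_a = mu_a + sd_a * z1
        # Build the second normal so that corr(log a, log b) equals rho.
        log_b = mu_b + sd_b * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        total += math.exp(log_a - log_b)
    return total / n


# Hypothetical log-scale settings chosen for illustration only.
for rho in (-0.95, 0.0, 0.95):
    mean, variance = lognormal_ratio_moments(0.0, 0.0, 0.15, 0.15, rho)
    print(f"rho = {rho:+.2f}: E(ratio) = {mean:.3f}, Var(ratio) = {variance:.3f}")
```

In this simplified setting, fixing the correlation at zero when it is in fact positive overstates both the mean and the variance of the ratio, and the reverse holds for a negative correlation. The paper's own data generation process may differ from this sketch.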

Table 6 Under-specification Bias for Group 2 (Restricted \({\rho }_{\beta }\))

As with Tables 4 and 6, Tables 5 and 7 present the Model 3 estimation (i.e., correct specification) results and the Model 1 estimation (i.e., restricted \({\rho }_{\alpha }=0\)) results of the Group 3, respectively. By comparing Tables 5 and 7, we infer the effects of restricting \({\rho }_{\alpha }\) on the mixed logit estimates apart from simulation bias. Similarly, there are three observations that we would like to point out.

First, we observe that restricting \({\rho }_{\alpha }\) does not have a substantive effect on the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\), and \(\beta\). The estimated coefficients in Tables 5 and 7 are mostly similar to each other. The second observation is that in Table 7, when we increase the true correlation \({\rho }_{\alpha }\) from 0.1 to 0.9, the estimated values of \({\sigma }_{1}^{2}\) and \({\sigma }_{2}^{2}\) both monotonically decrease from 0.369 (0.004) and 0.374 (0.004) to 0.172 (0.004) and 0.179 (0.003), respectively. However, in Table 5, the largest and smallest estimated \({\sigma }_{2}^{2}\) are 0.392 (0.004) and 0.383 (0.003), respectively, a fairly constant range compared to the estimated \({\sigma }_{2}^{2}\) in Table 7. Given that the true value in Table 7 is \({\sigma }_{2}^{2}=0.15\), these results suggest that restricting \({\rho }_{\alpha }=0\) decreases the bias in the estimated \({\sigma }_{2}^{2}\), especially when the true \({\rho }_{\alpha }\) is high.

However, when we compare the estimated \({\sigma }_{\beta }^{2}\) in Tables 5 and 7, the results suggest that restricting \({\rho }_{\alpha }\) in the estimation of the mixed logit model negatively affects the estimated \({\sigma }_{\beta }^{2}\). Specifically, the estimated \({\sigma }_{\beta }^{2}\) in Table 7 ranges from 0.377 (0.003) to 0.450 (0.003), whereas in Table 5 it only ranges from 0.366 (0.003) to 0.382 (0.003). Given that the true value is \({\sigma }_{\beta }^{2}=0.15\), these results suggest that restricting \({\rho }_{\alpha }=0\) increases the bias in the estimated \({\sigma }_{\beta }^{2}\).

Finally, when we compare the estimated WTP parameters between Tables 5 and 7, we see that restricting \({\rho }_{\alpha }\) substantively increases the bias in the mean and variance of WTP. For instance, in Table 5, only 7 out of 36 WTP parameters are separated from their respective true values by more than 3 standard errors, with an average distance of 2.1 standard errors. However, in Table 7, 32 of the 36 estimated WTP parameters are biased, with an average distance of 8.7 standard errors between the true and estimated values. Therefore, we can conclude that omitting \({\rho }_{\alpha }\) in MSL estimation of the mixed logit model negatively affects the mean and variance of WTP.

Table 7 Under-specification Bias for Group 3 (Restricted \({\rho }_{\alpha }\))

3.3 Over-specification Bias

Next, we analyze over-specification bias. Recall that in Table 3 we present the Model 1 estimation (i.e., correct specification) results of the Group 1 and analyze simulation bias. In Table 8, we present the Model 2 estimation (i.e., unrestricted \({\rho }_{\beta }\ne 0\)) results for the Group 1. By comparing Tables 3 and 8, we infer the effects of unrestricting \({\rho }_{\beta }\) on the mixed logit estimates apart from simulation bias.

There are three observations that we would like to point out. The first observation relates to the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\), and \(\beta\). The second observation is related to the estimated variances \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\), as well as to the estimated correlation \({\rho }_{\beta }\). Finally, we analyze the effects on the estimated WTP parameters.

First, notice that the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\) and \(\beta\) are biased in all nine columns presented in Table 8. However, these biases are not notably different from those presented in Table 3, which suggests that leaving \({\rho }_{\beta }\) unrestricted (i.e., allowing \({\rho }_{\beta }\ne 0\)) does not substantively affect the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\) and \(\beta\).

Second, when we compare the estimated variances \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\) between Tables 3 and 8, we can see that the estimated values in the two tables share similar ranges. For instance, the estimated \({\sigma }_{2}^{2}\) in Table 8 ranges between 0.206 (0.010) and 0.503 (0.004). In Table 3, the estimated \({\sigma }_{2}^{2}\) ranges between 0.213 (0.005) and 0.500 (0.004). Similarly, the estimated \({\sigma }_{\beta }^{2}\) in Table 8 ranges from 0.340 (0.015) to 0.377 (0.004), while in Table 3 it ranges from 0.364 (0.011) to 0.380 (0.003). These results suggest that there may be slight differences in the estimated variances when we allow \({\rho }_{\beta }\ne 0\).

When we analyze the estimated \({\rho }_{\beta }\), we see that MSL produces biased results in some columns of Table 8. Specifically, in 5 out of 9 columns of Table 8, the estimated \({\rho }_{\beta }\) is separated from 0 by more than 3 standard errors. For instance, when the true variances are \({\sigma }_{1}^{2}={\sigma }_{\beta }^{2}=0.15\) and \({\sigma }_{2}^{2}=0.05\), the estimated \({\rho }_{\beta }\) is \(-0.029 \left(0.012\right)\), which is within 3 standard errors of its true value \({\rho }_{\beta }=0\). However, when the true variances are \({\sigma }_{1}^{2}=0.05,{\sigma }_{2}^{2}=0.25\) and \({\sigma }_{\beta }^{2}=0.15\), the estimated \({\rho }_{\beta }\) is \(-0.123 \left(0.020\right)\), which is separated from its true value by more than 6 standard errors. Analysts may therefore erroneously infer a correlation that is an artifact of over-specification.

Finally, although there appear to be only slight differences in the estimated coefficients and variances between Tables 3 and 8, those small differences add up and substantively affect the estimated WTP parameters in Table 8. Specifically, all 36 estimated WTP parameters in Table 3 are within 3 standard errors of their respective true values. However, in Table 8, 13 out of 36 estimated WTP parameters are separated from their true values by more than 3 standard errors, with an average distance of 5.83 standard errors between the true and estimated values. In other words, over-specification may cause biases in WTP estimates.

As with Table 8, Table 9 presents the Model 3 estimation (i.e., unrestricted \({\rho }_{\alpha }\ne 0\)) results of the Group 1. By comparing Tables 3 and 9, we infer the effects of unrestricting \({\rho }_{\alpha }\) on the mixed logit estimates apart from simulation bias. The results generally confirm the findings from Table 8, namely that allowing \({\rho }_{\alpha }\ne 0\) does not substantively affect the estimated coefficients \({\alpha }_{1}\), \({\alpha }_{2}\) and \(\beta\) or the estimated variances \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\), but over-specification may cause biases in the WTP estimates.

Table 8 Over-specification Bias for Group 1 (Unrestricted \({\rho }_{\beta }\))
Table 9 Over-specification Bias for Group 1 (Unrestricted \({\rho }_{\alpha }\))

4 Discussion

The evidence from this simulation study suggests that the MSL estimator will produce biased estimates of the mean coefficients, their variances and correlations irrespective of the underlying specification of the unobserved heterogeneity. These results suggest that the correct specification of the mixed logit model (i.e., simulation specification is the same as estimator specification) does not substantively mitigate biases in the estimated coefficients. However, a correctly specified mixed logit model can provide relatively accurate estimates of the WTP parameters. Restricting or unrestricting correlations erroneously can lead to further biases, particularly in the estimated variances (i.e., \({\sigma }_{2}^{2}\) and \({\sigma }_{\beta }^{2}\) ) as well as the estimated mean and variance of WTP.

Before estimating the mixed logit model on empirical DCE data, we highly recommend that choice modellers conduct simulations based on their intended specifications to assess the potential extent of simulation biases in the MSL estimates of mixed logit parameters, particularly their variances and correlations. For example, some analysts may allow all parameters to be random and correlated; however, the findings from this paper cast doubt on whether MSL estimators can reproduce such results accurately.

To better illustrate the implications of these findings, we created a series of three plots comparing the true and estimated \(E\left(WT{P}_{{\alpha }_{1}}\right)\) for 500 Halton draws. In health preference research, the primary aim is often to estimate maximum acceptable risk (MAR) parameters or value sets on a quality-adjusted life year scale, instead of monetary units. These statistics are mathematically equivalent to \(E\left(WT{P}_{{\alpha }_{1}}\right)\).

Starting with Group 1 (No Patterns; Fig. 1), the true \(E\left(WT{P}_{{\alpha }_{1}}\right)\) is represented with a black dot, and the estimated \(E\left(WT{P}_{{\alpha }_{1}}\right)\) for the Model 1, Model 2 and Model 3 methods are represented with blue, red and green dots, respectively. This plot illustrates that the correct specification of the mixed logit (i.e., Model 1) produces relatively better estimates of \(E\left(WT{P}_{{\alpha }_{1}}\right)\) than the two over-specified models (Models 2 and 3). In Group 2 (Fig. 2), the correct specification of the mixed logit (i.e., Model 2) again produces relatively better estimates of \(E\left(WT{P}_{{\alpha }_{1}}\right)\) than the under-specified model (Model 1), depending on the extent of the correlation \({\rho }_{\beta }\).

The simulation biases may seem negligible in Groups 1 and 2, but the bias increases in Group 3 (substitution patterns), depending on the extent of the correlation \({\rho }_{\alpha }\). Furthermore, erroneously restricting the correlation to zero (Model 1) leads to substantive under-specification biases in \(E\left(WT{P}_{{\alpha }_{1}}\right)\) regardless of the correlation. For analysts, we infer two additional lessons from Fig. 3. First, its simulation bias may be due to the binary nature of the data: even when the model is correctly specified, controlling for the correlation between alternative-specific constants may be challenging due to insufficient variability. Second, any choice task that includes an opt-out, status quo, or referent alternative likely has substitution patterns (e.g., a red bus may substitute for a blue bus along the same route), and added care in the estimation of WTP is warranted due to potential simulation and under-specification biases.

According to Palma et al. (2020), 72% of 150 papers indexed in the Research Papers in Economics (RePEc) and produced in the period 2008–2018 use fewer than 500 Halton draws, and only 5.6% use more than 2000 Halton draws. In this paper, we likewise used 500 Halton draws to be consistent with the levels used in the leading MSL applications.

Czajkowski and Budziński (2019) found that Halton draws are the second best performing quasi-random numbers after Sobol draws, and they suggested that approximately 2000 Halton draws (or 1170 Sobol draws) are necessary to have a Minimum Tolerance Level of 5% in a setting with 400 individuals, 4 choice tasks per individual, 3 alternatives, and 5 random parameters. Our DCE design consists of 200 individuals, 25 choice tasks per individual, 3 alternatives and only 2 random parameters. Therefore, we believe that 3000 Halton draws are sufficient for a robustness test. Our analysis with 3000 Halton draws did not identify substantive differences, which may indicate that our initial choice of 500 Halton draws is sufficient. The results with 3000 Halton draws can be found in the Supplementary Materials.
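For readers unfamiliar with Halton draws, the following minimal sketch generates a two-dimensional Halton sequence with the standard radical-inverse construction and maps it to standard normal draws. It is not the code used in this study. The choice of bases 2 and 3 (one prime per random parameter), the decision to skip the first 10 elements, and the names `halton` and `halton_normal_draws` are assumptions made for illustration.

```python
from statistics import NormalDist


def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence for a prime `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_normal_draws(n_draws, bases=(2, 3), skip=10):
    """Standard normal draws obtained by inverting the normal CDF at Halton points.

    `skip` discards the first elements of each sequence (an assumed convention).
    """
    std_normal = NormalDist()
    return [
        [std_normal.inv_cdf(halton(i, base)) for base in bases]
        for i in range(skip + 1, skip + n_draws + 1)
    ]


# 500 draws for 2 random parameters, matching the counts mentioned in the text.
draws = halton_normal_draws(500)
print(len(draws), len(draws[0]))
```

Because the points fill the unit interval more evenly than pseudo-random numbers, fewer draws are typically needed for a given simulation accuracy, which is the motivation for comparing 500 and 3000 draws above.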

In summary, Table 10 provides the average percentage distance between the true and estimated \(E\left(WT{P}_{{\alpha }_{1}}\right)\) and their ranges for 500 and 3000 Halton draws. On average, the simulation bias is less than 1% and misspecification biases range from 2.5% to 3.6%. In our previous simulation study (Jumamyradov et al., 2023), we showed that the conditional logit parameter estimates (\({\alpha }_{1}\), \({\alpha }_{2}\) and \(\beta\)) may be biased depending on the structure of unobservable heterogeneity. Consequently, biases in the estimated parameters affect the estimated WTP, causing the WTP to be underestimated by up to 20%. In combination with the results from this paper, some researchers may estimate the conditional logit model when their main interest is the coefficients, and estimate the mixed logit model when their main interest is WTP or similar parameters (e.g., quality-adjusted life years).
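As a rough illustration of how such a summary could be computed, the sketch below calculates an average percentage distance and its range across scenarios. It assumes the distance is the absolute difference relative to the true value, which the text does not state explicitly. The function names and the input values are hypothetical and are not taken from Table 10.

```python
def pct_distance(true_value, estimate):
    """Absolute distance between estimate and truth, as a percentage of the truth."""
    return 100.0 * abs(estimate - true_value) / abs(true_value)


def summarize(true_values, estimates):
    """Average, minimum, and maximum percentage distance across scenarios."""
    distances = [pct_distance(t, e) for t, e in zip(true_values, estimates)]
    return sum(distances) / len(distances), min(distances), max(distances)


# Hypothetical values for illustration only; they are not results from this study.
true_wtp = [1.00, 1.00, 1.00]
estimated_wtp = [1.01, 0.98, 1.03]
average, lowest, highest = summarize(true_wtp, estimated_wtp)
print(f"average {average:.1f}%, range {lowest:.1f}% to {highest:.1f}%")
```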

Table 10 Average Percentage Difference between the True and Estimated \(E\left(WT{P}_{{\alpha }_{1}}\right)\) and Their Ranges

There are three primary limitations of this simulation study. First, the design of this DCE was selected for its simplicity and ability to mimic uninformative and informative heterogeneity, including taste and substitution patterns. The DCE is based on a single full-factorial design with three alternatives, 200 respondents and 25 choice sets. Its parsimonious model contained a constant term and a single variable for each alternative. It is unclear what would come out of a much larger study of a more complicated design and model, similar to the ones researchers ultimately intend to use. Nevertheless, the results suggest that the mixed logit (when correctly specified) estimates WTP well, but not variances and correlations. Further MSL implications for other purposes, such as market shares and substitution effects, are left for future work. Second, the data generation process in this study used only lognormal random parameters. Alternative distributions (e.g., symmetric or discrete) may lead to different results. Further work is necessary to investigate the implications of different distributional assumptions. Third, our specifications of unobservable heterogeneity allow only for interpersonal heterogeneity; however, some analysts may wish to account for both interpersonal heterogeneity and intrapersonal variability (i.e., variability in a person’s parameters across tasks).

5 Conclusion

In this study, we illustrate the extent of biases in MSL estimates of mixed logit coefficients, variances and correlations using a standard DCE. We found that while estimating the mixed logit model with its correct specification may have little to no effect on the estimated WTP parameters, erroneously restricting or unrestricting correlations may lead to differences of up to 9% between the true and estimated WTP. Simulation and under-specification biases are particularly detrimental to WTP estimates when alternative-specific constants are correlated (substitution patterns). Based on these findings, choice modellers may conduct similar simulations of their own DCE and assess the potential implications of these biases before relying on the MSL estimates of mixed logit parameters, particularly their variances and correlations.

Fig. 1 \(E\left(WT{P}_{{\alpha }_{1}}\right)\) by Model Specification in the Group 1 (No Patterns)

Fig. 2 \(E\left(WT{P}_{{\alpha }_{1}}\right)\) by Model Specification in the Group 2 (Taste Patterns)

Fig. 3 \(E\left(WT{P}_{{\alpha }_{1}}\right)\) by Model Specification in the Group 3 (Substitution Patterns)