2.1 Introduction

The Bayes factor can assist forensic scientists in the evaluation of findings when recipients of expert information need help in discriminating between propositions concerning, for example, a parameter of interest. A typical example is the discrimination between competing propositions regarding the concentration of a controlled substance (e.g., drugs in blood) with respect to a given threshold. This chapter will approach one-sided hypothesis testing involving model parameters in the form of a proportion (Sect. 2.2) and a mean (Sect. 2.3). In both situations, additional factors, such as errors (Sects. 2.2.2 and 2.3.2), are considered. Aspects of decision-making are also considered (Sects. 2.2.3 and 2.3.3).

Throughout this chapter, the Bayes factor will be obtained as a ratio of marginal likelihoods following the ideas described in Sect. 1.4. The greater marginal likelihood will support the respective proposition against the other. This, along with other aspects, such as the decision maker’s preferences among adverse consequences, has an impact on the decision-making process.

2.2 Proportion

A common problem in forensic practice is the investigation of the proportion of items or individuals that present a characteristic of interest, e.g., the proportion of seized pills containing a controlled substance or the proportion of counterfeit medicines in a given population. A consignment of items is considered a random sample from a super-population of items of the same type, and the parameter θ is the proportion of units in the super-population that present the target characteristic. Note that for consignments of very large size (i.e., several thousands), a finite number of units will correspond to each positive value of θ. For consignments of small size (i.e., smaller than 50), the parameter θ is a nuisance parameter (i.e., one that is not of primary interest) that can be integrated out, leaving a probability distribution for the unknown number of items having the target characteristic. For consignments of intermediate size, θ can be treated as a continuous value in the interval (0, 1) (e.g., Aitken et al., 2021). As an example, consider the following pair of propositions:

H 1 ::

The proportion θ of items having the characteristic of interest is larger than θ 0.

H 2 ::

The proportion θ of items having the characteristic of interest is smaller than or equal to θ 0,

where θ 0 ∈ (0, 1) is a given threshold of legal interest.Footnote 1 Note that applications of this type of proposition are broad and include, for example, quality control of food (and other consumer products), the analysis of levels of contamination of laboratory equipment, and the extent of environmental pollution.

This section covers three main topics: (1) inference about an unknown proportion θ (Sect. 2.2.1), (2) inference about θ when background elements may affect the counting process (Sect. 2.2.2), and (3) decision regarding competing propositions about θ (Sect. 2.2.3).

2.2.1 Inference About a Proportion

Consider a case of inference about a population parameter based on a sample of size n. Aitken (1999) and Aitken et al. (2021) discuss how to choose a sample size. Suppose that among the n items, x shows a characteristic that is of interest from a legal point of view. The question then is how such an analytical result supports one or the other of the competing propositions regarding the proportion of items in the population that have the target characteristic.

Experiments of this kind can be regarded as Bernoulli trials (after the Swiss mathematician Jacob Bernoulli, 1654–1705), where trials are independent and give rise to one of the two mutually exclusive outcomes, conventionally labeled success and failure, with constant probability of success in each trial. The binomial distribution Bin(n, θ) is a statistical model for data that arise from a sequence of Bernoulli trials:

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(x\mid n,\theta)=\binom{n}{x} \theta^x(1-\theta)^{n-x}, \qquad \qquad x=0,1,\dots,n. \end{array} \end{aligned} $$

In the Bayesian perspective, the most common prior distribution for the parameter of interest θ is the beta distribution Be(α, β):

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(\theta\mid\alpha,\beta)=\theta^{\alpha-1}(1-\theta)^{\beta-1}/\mbox{B}(\alpha,\beta), \qquad \qquad 0<\theta <1\; ; \; \alpha,\beta >0, \end{array} \end{aligned} $$

with \(\mbox{B}(\alpha ,\beta )=\frac {\varGamma (\alpha )\varGamma (\beta )}{\varGamma (\alpha +\beta )}\).

The beta prior distribution and the binomial likelihood are conjugate (see Sect. 1.10): after inspecting a sample, one can easily compute the posterior distribution, which is still beta, Be(α∗, β∗), with parameters updated according to well-known updating rules, α∗ = α + x, β∗ = β + n − x (e.g., Lee, 2012). The prior odds, the posterior odds, and the Bayes factor can be easily computed, as discussed in Sect. 1.4, by means of standard routines.

Example 2.1 (Counterfeit Medicines)

Consider a case in which a large batch of medicines (say, N > 50) is seized, suspected to contain counterfeit items. The following propositions are of interest:

H 1 ::

The proportion θ of counterfeit medicines is greater than 0.2.

H 2 ::

The proportion θ of counterfeit medicines is not greater than 0.2.

Suppose that, initially, limited information is available so that a uniform prior distribution is chosen over the interval (0, 1), that is, θ ∼Be(1, 1). Note that although a prior distribution Be(1, 1) is often called uninformative, it is in fact informative (see Sect. 1.10 and de Finetti (1993b)). It conveys the view that every value of θ in the interval (0, 1) is considered equally probable. The prior odds can then easily be obtained.
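With a Be(1, 1) prior and θ 0 = 0.2, a possible R sketch is the following (object names are illustrative):

    alpha <- 1; beta <- 1; theta0 <- 0.2
    pi1 <- 1 - pbeta(theta0, alpha, beta)   # prior probability of H1: theta > 0.2
    pi2 <- pbeta(theta0, alpha, beta)       # prior probability of H2: theta <= 0.2
    pi1 / pi2
    # [1] 4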

A uniform prior distribution clearly favors, a priori, hypothesis H 1, that θ is greater than 0.2. Next, suppose that a sample of size 40 is analyzed and 12 out of 40 items are found to be positive (counterfeit). The posterior distribution follows immediately, and so do the posterior odds and the Bayes factor.
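Continuing the sketch above, the posterior is Be(α + x, β + n − x), and the posterior odds follow from its cumulative distribution function:

    n <- 40; x <- 12
    a1 <- 1 - pbeta(theta0, alpha + x, beta + n - x)   # posterior probability of H1
    a2 <- pbeta(theta0, alpha + x, beta + n - x)       # posterior probability of H2
    a1 / a2
    # approximately 18 (cf. text)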

The posterior probability of proposition H 1 is, therefore, approximately 18 times greater than the posterior probability of the alternative proposition H 2. Thus, the Bayes factor can be obtained as
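For instance, continuing the same sketch:

    BF <- (a1 / a2) / (pi1 / pi2)
    BF
    # approximately 4.5 (the ratio of roughly 18 to 4)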

The Bayes factor indicates that the evidence is in favor of proposition H 1 that the proportion of counterfeit medicines is greater than 0.2, rather than proposition H 2 (i.e., θ ≤ 0.2). According to the verbal scale presented in Table 1.2, the BF weakly supports proposition H 1 over H 2.

To help specify the prior distribution, information in the form of data regarding similar consignments from cases with comparable circumstances may be used. Such data may suggest a distribution other than the uniform distribution used in the above example. An example of how to elicit a subjective prior distribution about a proportion is provided in Sect. 1.10. For a more extensive discussion about prior elicitation for a proportion, the reader can refer to O’Hagan et al. (2006). Forensically relevant applications of prior elicitation for θ are discussed in Aitken (1999). Note, however, that in certain practical applications, analytical results may be affected by further factors that cannot be dissociated from the observational process. An example of such a factor is considered in Sect. 2.2.2.

The analysis pursued above focused on the problem of inference about a proportion for a large batch. Consider now the case where the size N of the consignment is small (less than 50). Suppose a sample of size n is inspected and x items are found to present the target characteristic (e.g., yield a positive test result), so that θ ∼Be(α + x, β + n − x). Denote by Y the unknown number of positive items in the uninspected part of the consignment. This random variable still has a binomial distribution, Y ∼Bin(m, θ), where m = N − n represents the number of units that have not been inspected. The probability distribution for the unknown number of positive units can be obtained by integrating out the parameter θ. This turns out to be a beta-binomial distribution Be-Bin(n, m, x, α, β):

$$\displaystyle \begin{aligned} &\Pr(Y=y\mid n,m,x,\alpha,\beta)\\&\ =\frac{\varGamma(n+\alpha+\beta)\binom{m}{y}\varGamma(y+x+\alpha)\varGamma(n+m-x-y+\beta)}{\varGamma(x+\alpha)\varGamma(n-x+\beta)\varGamma(n+m+\alpha+\beta)} \qquad (y=0,1,\dots,m) \end{aligned} $$
(2.1)

(Aitken, 1999).

Example 2.2 (Counterfeit Medicines—Small Consignment)

Consider Example 2.1 and suppose now that the consignment is small, say N = 40. Suppose further that a sample of size n = 10 has been inspected and that 2 items are found to be counterfeit. Starting from a uniform prior distribution θ ∼Be(1, 1), the beta posterior distribution becomes θ ∼Be(3, 9).

The distribution of Y then is Be-Bin(10, 30, 2, 1, 1). The probability of observing a given number of counterfeit items (e.g., y = 1) in the remainder of the consignment can be obtained using the function dbbinom that is available in the package extraDistr (Wolodzko, 2020).
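In the parametrization used by extraDistr, this corresponds to size = m = 30 with the parameters of the posterior Be(3, 9) obtained above, so a possible sketch is:

    # install.packages("extraDistr") if the package is not already installed
    library(extraDistr)
    dbbinom(1, size = 30, alpha = 3, beta = 9)   # probability of exactly y = 1 counterfeit item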

One can also use the function pbbinom that allows one to compute the cumulative distribution function of the beta-binomial random variable in (2.1). For example, the probability of observing at most 2 counterfeit items can be obtained as
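For instance, continuing the sketch above:

    pbbinom(2, size = 30, alpha = 3, beta = 9)   # probability of at most 2 counterfeit items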

A Bayesian network for inference about a proportion of a small consignment has been developed in Biedermann et al. (2008). Posterior probabilities for θ can easily be calculated with such models.

2.2.2 Background Elements Affecting Counting Processes

In many real-world applications, counting processes performed in forensic laboratories cannot be considered error-free. Examinations may be affected by inefficiencies and perturbing factors. For example, it may be that items are lost or missed during counting or that background elements are present, i.e., objects observationally indistinguishable from the target objects. This section addresses inferential challenges due to such background elements.

Suppose that x is the number of recorded successes, i.e., the number of times that the target characteristic is detected. However, the number x may not correspond to the number x s of items actually showing the characteristic of interest but be affected by a certain number of background elements, x b, that are wrongly counted as successes. This complication may typically arise in applications where the items of interest are small particles. Consider, for example, the assessment of rice quality in a context of food quality control . Rice quality can be measured by means of several features, such as the percentage of cracked or immature grains. For example, there may be legal provisions regarding the maximum tolerated amount of cracked grains.Footnote 2 It might then be of interest to compare alternative propositions according to which the percentage of cracked grains is above or below a given regulatory threshold. A key question is how to conduct such a comparison when the counting process may be affected by background elements, e.g., oil seeds in the example here.

While the number of elements actually showing the target characteristic is modeled as the outcome of a binomial distribution , X s ∼Bin(n, θ), the amount of background elements affecting the counting process, x b, can be modeled by a Poisson distribution , X b ∼Pn(λ), where λ is the mean number of background elements (D’Agostini, 2004). The total number of recorded successes is therefore X = X s + X b. The graphical model (see e.g. Cowell et al., 1999) in Fig. 2.1 offers a schematic representation of the probabilistic relationship among the variables.

Fig. 2.1 Graphical representation of the statistical model for inference about a population proportion based on count data in the presence of background elements affecting counting processes

It can be shownFootnote 3 that X has the following probability distribution:

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(x\mid n,\theta,\lambda)=\sum_{x_b=0}^{x}\binom{n}{x-x_b}\theta^{x-x_b}(1-\theta)^{n-x+x_b}\mbox{e}^{-\lambda}\lambda^{x_b}/x_b! \end{array} \end{aligned} $$

Recall that prior uncertainty about θ can be modeled by a beta distribution Be(α, β). The posterior distribution is then given by

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} f(\theta\mid n,x,\lambda)=\frac{\sum_{x_b=0}^{x}\binom{n}{x-x_b}\theta^{x-x_b}(1-\theta)^{n-x+x_b}e^{-\lambda}\lambda^{x_b}/x_b!\theta^{\alpha-1}(1-\theta)^{\beta-1}}{f(x\mid n,\lambda)B(\alpha,\beta)},\qquad \end{array} \end{aligned} $$
(2.2)

where the normalizing constant f(x ∣ n, λ) in the denominator is

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} f(x\mid n,\lambda)=\int f(x\mid n,\theta,\lambda)f(\theta)d\theta. \end{array} \end{aligned} $$
(2.3)

The posterior distribution (2.2) cannot be obtained in closed form as the integral characterizing the normalizing constant f(x ∣ n, λ) is not tractable analytically. However, since it is possible to draw values from the beta distribution, the integral in (2.3) can be computed by Monte Carlo approximation as in (1.30), that is,

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat{f}(x\mid n,\lambda)=\frac 1 N \sum_{i=1}^{N}f(x\mid n,\theta^{(i)},\lambda), \end{array} \end{aligned} $$
(2.4)

where θ (i) ∼Be(α, β).

Example 2.3 (Rice Quality)

Consider a consignment of rice and suppose that it is of interest to assess whether the proportion of cracked grains is below a given level of tolerance. The following competing propositions may be of interest:

H 1 ::

The proportion θ of cracked grains is greater than 0.025.

H 2 ::

The proportion θ of cracked grains is smaller than or equal to 0.025.

In a sample of 1000 grains, a total of 28 cracked grains are observed.

The beta prior distribution for θ needs to be elicited . Suppose that available knowledge indicates that it is implausible that the proportion of cracked grains is greater than 5%. An asymmetric prior distribution with a prior mass concentrated over values lower than 0.05 can be elicited as follows. Start with α = 1 and β = 1, then increment β by 1 until the shape of the beta distribution is such that \(\Pr (\theta >0.05)\) is small, e.g., equal to 0.1.
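A possible R sketch of this simple search (object names are illustrative):

    alpha <- 1
    beta <- 1
    while (1 - pbeta(0.05, alpha, beta) > 0.1) {   # stop when Pr(theta > 0.05) <= 0.1
      beta <- beta + 1
    }
    beta
    # [1] 45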

The parameters α and β can thus be set equal to 1 and 45, respectively. Figure 2.2 (left) can be obtained with
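A plot of the Be(1, 45) prior density may be sketched as follows (the plotting range is an assumption):

    curve(dbeta(x, 1, 45), from = 0, to = 0.1,
          xlab = expression(theta), ylab = "Density")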

Fig. 2.2 Left: beta prior distribution Be(1, 45) of the unknown proportion θ of cracked grains (Example 2.3). Right: approximated posterior distribution of θ, \(\hat f(\theta \mid n,x,\lambda )\). The gray shaded area shows the posterior probability of the hypothesis H 1 (θ > 0.025)

The prior odds can now be computed in a straightforward manner.
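For instance:

    theta0 <- 0.025
    pi1 <- 1 - pbeta(theta0, 1, 45)   # prior probability of H1: theta > 0.025
    pi2 <- pbeta(theta0, 1, 45)       # prior probability of H2
    pi1 / pi2
    # approximately 0.47 (cf. text)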

This value, approximately 0.5, means that the probability of hypothesis H 2 is, a priori, approximately 2 times greater than the probability of hypothesis H 1.

Suppose that when inspecting a sample of 1000 rice grains, on average, 1 grain (e.g., oil seed) is wrongly counted as cracked. Parameter λ can thus be taken to be equal to 0.001.

First, we write a function dbinpois that computes the product between a binomial likelihood Bin(n, θ) at x − x b and a Poisson likelihood Pn(λ) at x b.
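A possible sketch, using the data of this example (n = 1000, x = 28, λ = 0.001); as indicated in Sect. 2.4, the function relies on n, x, θ and λ being available in the workspace when it is called:

    n <- 1000; x <- 28; lambda <- 0.001
    dbinpois <- function(xb) {
      dbinom(x - xb, size = n, prob = theta) * dpois(xb, lambda = lambda)
    }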

The unnormalized posterior distribution in (2.2)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\sum_{x_b=0}^{x}\binom{n}{x-x_b}\theta^{x-x_b}(1-\theta)^{n-x+x_b}e^{-\lambda}\lambda^{x_b}/x_b!\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} \end{array} \end{aligned} $$

is computed as
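A possible sketch (equivalent to summing dbinpois over 0, …, x for each value of θ and multiplying by the beta prior density):

    alpha <- 1; beta <- 45
    post_unnorm <- function(theta) {
      sapply(theta, function(t) {
        sum(dbinom(x - 0:x, size = n, prob = t) * dpois(0:x, lambda)) *
          dbeta(t, alpha, beta)
      })
    }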

The normalizing constant f(x ∣ n, λ) can be approximated as in (2.4)
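For instance:

    set.seed(1)                          # assumption, for reproducibility
    N <- 10000
    theta_sim <- rbeta(N, alpha, beta)   # draws from the Be(1, 45) prior
    fxnl <- mean(sapply(theta_sim, function(t) {
      sum(dbinom(x - 0:x, size = n, prob = t) * dpois(0:x, lambda))
    }))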

and the approximated posterior density, represented in Fig. 2.2 (right), can be obtained as
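For instance, using post_unnorm and fxnl from above (the plotting range is an assumption):

    theta_grid <- seq(0, 0.06, length.out = 300)
    plot(theta_grid, post_unnorm(theta_grid) / fxnl, type = "l",
         xlab = expression(theta), ylab = "Density")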

To calculate the BF, we need to obtain the posterior probabilities of the competing propositions H 1 and H 2. Consider proposition H 2. The (approximate) posterior probability of proposition H 2 can be obtained by Monte Carlo integration as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat\alpha_2& =&\displaystyle \frac{1}{\hat f (x\mid n,\lambda)}\int_0^{\theta_0} f(x\mid n,\theta,\lambda)f(\theta)d\theta\\ & =&\displaystyle \frac{\theta_0}{\hat f (x\mid n,\lambda)}\int_0^{\theta_0} f(x\mid n,\theta,\lambda)f(\theta)\frac{1}{\theta_0}d\theta\\ & \approx &\displaystyle \frac{\theta_0}{\hat f (x\mid n,\lambda)}\cdot \frac{1}{N}\sum_{i=1}^{N}f(x\mid n,\theta^i,\lambda)f(\theta^i), \end{array} \end{aligned} $$
(2.5)

where θ i is sampled from a uniform distribution in the interval (0, θ 0), θ i ∼Unif(0, θ 0), and the normalizing constant f(x ∣ n, λ) is approximated as in (2.4). The (approximate) posterior probability of hypothesis H 1 is \(1-\hat {\alpha }_{2}\). The (approximate) BF then is

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \widehat{\mbox{BF }}=\frac{\widehat{\alpha_1}/\widehat{\alpha_2}}{\pi_1/\pi_2}. \end{array} \end{aligned} $$
(2.6)

Example 2.4 (Rice Quality—Continued)

Consider the scenario described in Example 2.3, and compute the (approximate) posterior probability of the hypothesis H 2: the proportion θ of cracked grains is smaller than or equal to 0.025 (as in 2.5).
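A possible sketch, reusing the quantities defined in Example 2.3 (n, x, λ, α, β, θ 0 and the approximation fxnl of the normalizing constant):

    set.seed(1)
    theta_u <- runif(N, 0, theta0)       # draws from Unif(0, theta0)
    a2 <- theta0 / fxnl * mean(sapply(theta_u, function(t) {
      sum(dbinom(x - 0:x, size = n, prob = t) * dpois(0:x, lambda)) *
        dbeta(t, alpha, beta)
    }))
    a2
    # approximately 0.31, so that 1 - a2 is approximately 0.69 (cf. text)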

The (approximate) posterior probability of hypothesis H 1 then is \(\hat \alpha _1=0.6925\). This is highlighted by the gray shaded area in Fig. 2.2 (right). The posterior odds and the BF therefore are
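For instance, continuing the sketch above and using the prior odds computed earlier:

    a1 <- 1 - a2
    post_odds <- a1 / a2
    BF <- post_odds / (pi1 / pi2)
    c(post_odds, BF)
    # posterior odds of about 2.25 and a BF of about 4.8 (cf. text)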

The Bayes factor indicates that the evidence favors hypothesis H 1, i.e., θ > 0.025, over H 2, i.e., θ ≤ 0.025. A BF of approximately 5 provides limited support for the hypothesis H 1. Note that the results obtained by the laboratory analyses clearly affect our belief about θ. The analytical results change prior odds in favor of H 1 (0.47) to posterior odds of approximately 2.25 in favor of H 1.

2.2.2.1 Sensitivity to Monte Carlo Approximation

The Monte Carlo estimate of the Bayes factor obtained in (2.6) is subject to variability, which may be a source of concern. Figure 2.3 provides an illustration of BF variability. The figure shows 500 realizations of the BF approximation in (2.6).

Fig. 2.3 Histogram of 500 realizations of the BF approximation in (2.6), where the posterior probability of hypothesis H 2 is obtained as in (2.5). The solid line represents the fitted Normal density

The purpose of the graphical representation in Fig. 2.3 is to illustrate that the repeated application of the procedure leads to a distribution of BFs. While the Monte Carlo estimate is not an exact value, it can be shown that the approximation error can be made arbitrarily small by generating a sufficiently large number of observations. For a large number of simulations, it can also be proven, by the Central Limit Theorem, that the scaled error \(\sqrt {N}\,(\hat f(x)-f(x))\) is approximately normally distributed. This can be used to analyze the variability of the Monte Carlo estimate (see, e.g., Marin and Robert (2014)). Note that the shape of the histogram is roughly symmetric and bell-shaped, as shown in Fig. 2.3.

It is worth noting that other, more efficient ways than traditional Monte Carlo methods may be implemented to compute the integrals related to the posterior probabilities of the competing hypotheses. Importance sampling (see Sect. 1.8), for example, can improve the integral approximation. It can also be used when the target density is unnormalized. Consider again the posterior probability of hypothesis H 2:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \alpha_2=\int_0^{\theta_0} \frac{f(x\mid n,\theta,\lambda)f(\theta)}{f (x\mid n,\lambda)}d\theta. \end{array} \end{aligned} $$

This can be rewritten as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \alpha_2& =&\displaystyle \frac{1}{f (x\mid n,\lambda)}\int_0^{1} h(\theta)f(x\mid n,\theta,\lambda)f(\theta)\frac{g(\theta)}{g(\theta)}d\theta\\ {} & =&\displaystyle \frac{1}{f (x\mid n,\lambda)}\int_0^{1} h(\theta)w(\theta)g(\theta)d\theta, \end{array} \end{aligned} $$

where

$$\displaystyle \begin{aligned} h(\theta)=\left\{ \begin{array}{ccc} 1 & \mbox{if} & 0<\theta<\theta_0\\ &&\\ 0 & \mbox{if} & \theta_0\le \theta<1, \end{array} \right. \end{aligned}$$

w(θ) = f(x ∣ n, θ, λ)f(θ)∕g(θ), and g(θ) is the importance sampling function.

The posterior probability α 2 can be approximated as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat\alpha_2=\frac{\frac{1}{N}\sum_{i=1}^{N}h(\theta^i)w(\theta^i)}{\frac{1}{N}\sum_{i=1}^{N}w(\theta^i)}, \end{array} \end{aligned} $$
(2.7)

where θ i ∼ g(θ).

Example 2.5 (Rice Quality—Continued)

A Be(20, 780) is chosen as importance sampling function g(θ). It can be readily verified that it is centered at 0.025 and that the density rapidly collapses toward zero for values greater than 0.04. This will avoid the generation of points for which the integrand is close to zero, with a very modest contribution to the approximation. Next, sample 10000 values from this distribution.
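For instance:

    set.seed(1)                          # assumption, for reproducibility
    theta_g <- rbeta(10000, 20, 780)     # draws from the importance density g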

The posterior probability α 2 of hypothesis H 2 can be obtained as in (2.7)
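A possible sketch (self-normalized importance sampling, reusing n, x, λ, α, β and θ 0 from Example 2.3):

    w <- sapply(theta_g, function(t) {
      sum(dbinom(x - 0:x, size = n, prob = t) * dpois(0:x, lambda)) *
        dbeta(t, alpha, beta) / dbeta(t, 20, 780)
    })
    h <- as.numeric(theta_g < theta0)    # indicator of theta < theta0
    a2_is <- sum(h * w) / sum(w)
    a2_is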

Figure 2.4 provides an illustration of BF variability. Notice that while the BFs in Figs. 2.3 and 2.4 have roughly the same location, the importance sampling in (2.7) produced an increase in precision.

Fig. 2.4 Histogram of 500 realizations of the BF approximation in (2.6), where the posterior probability of hypothesis H 2 is obtained as in (2.7). The solid line represents the fitted Normal density

It is important to understand that the resulting distribution does not mean that there is a distribution for a given BF because the BF, by definition, is a single number. See, e.g., Taroni et al. (2016) and Biedermann et al. (2017a) for discussions of this topic among forensic statisticians and forensic scientists. The error resulting from the implementation of numerical techniques is an important source of information about which the scientist should be transparent. Following ideas presented in Tanner (1996), recently reconsidered by Ommen et al. (2017) in a forensic context, the numerical precision in the overall approximated value can be estimated by the associated Monte Carlo standard error .

2.2.2.2 Unknown Expected Value of the Number of Background Elements

It is important to note that, contrary to what was assumed in Example 2.3, the expected value λ of the number of background events is generally unknown. The uncertainty about λ can be modeled by means of a gamma distribution, λ ∼Ga(a, b). The marginal posterior distribution of the parameter θ, written f(θ ∣ n, x), now takes a more complicated form, as one needs to handle the joint posterior distribution, which is proportional to

$$\displaystyle \begin{aligned} &f(\theta,\lambda\mid n,x)\\ & \qquad \propto \sum_{x_b=0}^{x}\binom{n}{x-x_b}\theta^{x-x_b}(1-\theta)^{n-x+x_b}\frac{\mbox{ e}^{-\lambda}\lambda^{x_b}}{x_b!}\theta^{\alpha-1}(1-\theta)^{\beta-1}\lambda^{a-1}\mbox{e}^{-b\lambda}. \end{aligned} $$
(2.8)

Following ideas described in Taroni et al. (2010), a two-block M–H algorithm (Sect. 1.8) can be implemented in order to draw a sample from the joint posterior distribution in (2.8). For each block, the candidate generating density is taken to be Normal with the mean equal to the current value of the parameter and the variance chosen so as to obtain a good acceptance rate (Gamerman & Lopes, 2006).

Consider the parameter θ first. The full conditional density of θ is proportional to

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_1(\theta\mid\lambda,n,x)\propto \sum_{x_b=0}^{x}\binom{n}{x-x_b}\theta^{x-x_b}(1-\theta)^{n-x+x_b}\frac{\lambda^{x_b}}{x_b!}\theta^{\alpha-1}(1-\theta)^{\beta-1}. \end{array} \end{aligned} $$

Starting from the current value for θ, say θ (i−1), a candidate value θ prop for θ can be obtained as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \theta^{\text{prop}}=\frac{\mbox{e}^{\psi^{\text{prop}}}}{1+\mbox{e}^{\psi^{\text{prop}}}}, \qquad \mbox{where}\;\; \psi^{\text{prop}}\sim\mbox{ N}\left(\psi^{(i-1)},\tau_1^2\right), \end{array} \end{aligned} $$

and \(\psi ^{(i-1)}= \log \left (\frac {\theta ^{(i-1)}}{1-\theta ^{(i-1)}}\right )\). In this way, the proposed value θ prop will be defined in the interval (0, 1). The candidate value θ prop is accepted with probability

$$\displaystyle \begin{aligned} \begin{array}{rcl} \alpha(\psi^{(i-1)},\psi^{\text{prop}})=\min\left\{1,\frac{f(\psi^{\text{prop}}\mid\lambda^{(i-1)})}{f(\psi^{(i-1)}\mid\lambda^{(i-1)})}\right\}, \end{array} \end{aligned} $$

where f(ψλ) is the reparametrized full conditional density of parameter θ and can be obtained as

$$\displaystyle \begin{aligned} f(\psi\mid\lambda)=\frac{\mbox{e}^{\psi}}{(1+\mbox{e}^{\psi})^2}f_1\left(\frac{\mbox{e}^{\psi}}{1+\mbox{e}^{\psi}}\mid\lambda,n,x\right). \end{aligned}$$

See, e.g., Casella and Berger (2002) for distributions of functions of random variables.

If the candidate θ prop is accepted, it becomes the current value of the chain, i.e., θ (i) = θ prop; otherwise θ (i) = θ (i−1).

The second block refers to parameter λ. The full conditional density of parameter λ is proportional to

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_2(\lambda\mid\theta, n,x)\propto \sum_{x_b=0}^{x}\binom{n}{x-x_b}\theta^{x-x_b}(1-\theta)^{n-x+x_b}\frac{\mbox{e}^{-\lambda}\lambda^{x_b}}{x_b!}\lambda^{a-1}\mbox{ e}^{-b\lambda}. \end{array} \end{aligned} $$

Starting from the current value for λ, say λ (i−1), a candidate value λ prop for λ can be obtained as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \lambda^{\text{prop}}=\mbox{e}^{\phi^{\text{prop}}}, \qquad \mbox{where}\; \phi^{\text{prop}}\sim\mbox{N}\left(\phi^{(i-1)},\tau_2^2\right), \end{array} \end{aligned} $$

and \(\phi ^{(i-1)}=\log \lambda ^{(i-1)}\). In this way, the proposed value λ prop will be defined in the interval (0, ∞). The candidate value λ prop is accepted with probability

$$\displaystyle \begin{aligned} \begin{array}{rcl} \alpha(\phi^{(i-1)},\phi^{\text{prop}})=\min\left\{1,\frac{f(\phi^{\text{prop}}\mid\theta^{(i)})}{f(\phi^{(i-1)}\mid\theta^{(i)})}\right\}, \end{array} \end{aligned} $$

where f(ϕθ) is the reparametrized full conditional density of parameter λ and can be obtained as

$$\displaystyle \begin{aligned} f(\phi\mid\theta)=\mbox{e}^{\phi}f_2(\mbox{e}^{\phi}\mid\theta,n,x). \end{aligned}$$

If the candidate λ prop is accepted, it becomes the current value of the chain, i.e., λ (i) = λ prop; otherwise λ (i) = λ (i−1).

The two-block M–H algorithm can be summarized as follows:

Initialization: start with arbitrary values θ (0) and λ (0)

Iteration i:

  1.

    Given θ (i−1) and λ (i−1),

    • Generate θ prop according to f 1(θ ∣ λ (i−1), n, x).

    • With probability α(θ (i−1), θ prop) accept θ prop and set θ (i) = θ prop; otherwise reject θ prop and set θ (i) = θ (i−1).

  2.

    Given θ (i) and λ (i−1),

    • Generate λ prop according to f 2(λ ∣ θ (i), n, x).

    • With probability α(λ (i−1), λ prop) accept λ prop and set λ (i) = λ prop; otherwise reject λ prop and set λ (i) = λ (i−1).

Return \(\{\theta ^{(n_b+1)},\dots ,\theta ^{(N)}\}\) and \(\{\lambda ^{(n_b+1)},\dots ,\lambda ^{(N)}\}\),

where n b is the burn-in period and N is the number of iterations.

Example 2.6 (Rice Quality—Continued)

Consider again Example 2.3 where prior uncertainty about θ was modeled by a Be(1, 45) distribution, and the parameter λ was set equal to 0.001. For the purpose of the example here, a gamma distribution with parameters a = 2 and b = 1000 is used to model prior uncertainty about λ. The prior density Ga(2, 1000) is shown in Fig. 2.5. It can be observed that the prior mass is concentrated at very small values of λ.

Fig. 2.5 Gamma prior distribution Ga(2, 1000) over λ for λ ∈ (0, 0.01)

Let the starting values for θ and λ be θ (0) = 0.1 and λ (0) = 0.001, and the variances \(\tau ^2_1\) and \(\tau ^2_2\) of the proposal densities be set equal to 0.7 and 3, respectively.

Current values of the parameters θ and λ will be stored in vectors called thetav and lambdav, respectively.

Before running the algorithm, it is useful to introduce the following functions: mh1 is used to obtain the candidate (current) value θ prop (θ curr); mh2 is used to calculate the probability of acceptance of the candidate value θ prop; dbinpois computes the product between a binomial likelihood Bin(n, θ) at x − x b and a Poisson likelihood at x b.
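Following the definitions given in Sect. 2.4, mh1 and mh2 may be sketched as follows (dbinpois was introduced in Example 2.3):

    mh1 <- function(x) x / (1 + x)       # applied to exp(psi), returns theta = exp(psi)/(1 + exp(psi))
    mh2 <- function(x) x / (1 + x)^2     # applied to exp(psi), returns the Jacobian exp(psi)/(1 + exp(psi))^2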

The MCMC algorithm is run over 15000 iterations, with a burn-in range of 5000 iterations.
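A possible sketch of the two-block sampler described above (it uses mh1 and mh2 as just defined; the random seed is an assumption, and the exact output will vary with it):

    set.seed(1)
    n <- 1000; x <- 28                              # data
    alpha <- 1; beta <- 45                          # Be(alpha, beta) prior for theta
    a <- 2; b <- 1000                               # Ga(a, b) prior for lambda
    nsim <- 15000; burnin <- 5000
    tau1 <- sqrt(0.7); tau2 <- sqrt(3)              # proposal standard deviations
    thetav <- numeric(nsim); lambdav <- numeric(nsim)
    thetav[1] <- 0.1; lambdav[1] <- 0.001           # starting values
    acct <- accl <- 0
    # unnormalized full conditionals f1 and f2
    f1 <- function(t, lam) sum(dbinom(x - 0:x, n, t) * dpois(0:x, lam)) * dbeta(t, alpha, beta)
    f2 <- function(lam, t) sum(dbinom(x - 0:x, n, t) * dpois(0:x, lam)) * dgamma(lam, a, rate = b)
    for (i in 2:nsim) {
      # Block 1: update theta on the logit scale
      psic <- log(thetav[i - 1] / (1 - thetav[i - 1]))
      psip <- rnorm(1, psic, tau1)
      num <- mh2(exp(psip)) * f1(mh1(exp(psip)), lambdav[i - 1])
      den <- mh2(exp(psic)) * f1(thetav[i - 1], lambdav[i - 1])
      if (runif(1) < min(1, num / den)) {
        thetav[i] <- mh1(exp(psip)); acct <- acct + 1
      } else thetav[i] <- thetav[i - 1]
      # Block 2: update lambda on the log scale
      phic <- log(lambdav[i - 1])
      phip <- rnorm(1, phic, tau2)
      num <- exp(phip) * f2(exp(phip), thetav[i])
      den <- lambdav[i - 1] * f2(lambdav[i - 1], thetav[i])
      if (runif(1) < min(1, num / den)) {
        lambdav[i] <- exp(phip); accl <- accl + 1
      } else lambdav[i] <- lambdav[i - 1]
    }
    c(acct, accl) / (nsim - 1)                      # acceptance rates, roughly 0.3 each (cf. text)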

These values represent the acceptance rates for θ and λ, respectively.

The output of the simulation run is shown in Fig. 2.6, representing the trace-plot , the autocorrelation plot (showing the correlation structure of the sequences), and the histogram of the simulated draws for θ (left column) and λ (right column). The simulated draws have an acceptance rate of approximately 31% for θ and 30% for λ. The trace-plots of simulated draws look like random noise and the autocorrelation decreases rapidly as the time lag at which it is calculated increases.

Fig. 2.6 MCMC diagnostics with trace-plots of simulated draws of θ (top left) and λ (top right), autocorrelation plots over the last 10000 iterations (center), and histograms over the last 10000 iterations (bottom)

Note that the argument ci=0 in the function acf for computing and plotting the estimate of the autocorrelation function suppresses the plot of the confidence interval.
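For instance, for θ (analogous code applies to λ; the objects come from the sampler above):

    keep <- (burnin + 1):nsim
    plot(thetav[keep], type = "l", ylab = expression(theta))   # trace-plot
    acf(thetav[keep], ci = 0)                                  # autocorrelation without the CI band
    hist(thetav[keep], main = "", xlab = expression(theta))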

The simulated values \(\theta ^{(n_b+1)},\dots ,\theta ^{(N)}\) can serve as draws from the posterior distribution f 1(θ ∣ λ, n, x). The posterior probability of hypothesis H 1 can then be approximated as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \widehat{\alpha_1}=\frac{\#\{i:\theta^{(i)}>0.025\}}{N-n_b}, \end{array} \end{aligned} $$
(2.9)

and the BF can be obtained straightforwardly.

Example 2.7 (Rice Quality—Continued)

Using a burn-in range of 5000 iterations, the average value of parameter θ over the last 10000 iterations can be computed as
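For instance:

    mean(thetav[(burnin + 1):nsim])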

The posterior probability of hypothesis H 1 can be approximated as in (2.9):
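For instance:

    a1 <- mean(thetav[(burnin + 1):nsim] > 0.025)   # fraction of retained draws above 0.025
    a1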

Recall that the prior odds have been quantified previously as approximately 0.47. The Bayes factor then is
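For instance:

    prior_odds <- 0.47
    (a1 / (1 - a1)) / prior_odds
    # approximately 5.2 (cf. text)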

The uncertainty about the presence of background elements, modeled by λ, modifies the value of the BF from approximately 4.77 to 5.2. This change is small. The BF still provides only weak support for the hypothesis H 1 that θ > 0.025, compared to H 2.

2.2.3 Decision for a Proportion

The normative framework for decision-making introduced in Chap. 1 is well suited for addressing problems of statistical inference presented in this chapter. Consider again a pair of competing propositions as defined in Sect. 2.2 regarding the question of whether the proportion of items showing a target characteristic of interest is greater (H 1) or not greater (H 2) than a given threshold θ 0. From a decision-theoretic point of view, two courses of action are possible: d 1 and d 2. Decision d 1 amounts to accepting the view that the proportion θ is greater than a given (legal) threshold , θ 0. Decision d 2 amounts to accepting the view that θ is smaller than or equal to the threshold θ 0. A possible loss function L(⋅) for such a two-action decision problem is

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mbox{L}(d_1,\theta)=\left\{ \begin{array}{ll} 0 & \mbox{if } \theta\in\varTheta_1,\\ &\\ l_1(\theta_0-\theta) & \mbox{if } \theta\in\varTheta_2. \\ \end{array} \right. \;\;\;\; \mbox{L}(d_2,\theta)=\left\{ \begin{array}{ll} 0 & \mbox{if } \theta\in\varTheta_2,\\ &\\ l_2(\theta-\theta_0) & \mbox{if } \theta\in\varTheta_1.\\ \end{array} \right. \end{array} \end{aligned} $$
(2.10)

This is a linear loss function where the loss is proportional to the magnitude of the error (e.g., θ 0 − θ). An example is shown in Fig. 2.7, where θ 0 = 0.2, and loss values l 1 and l 2 are equal to 1.

Fig. 2.7 Linear loss functions L(d 1, θ) (solid line) and L(d 2, θ) (dashed line) in (2.10) for θ 0 = 0.2, l 1 = 1, l 2 = 1, Θ 1 = (0.2, 1), Θ 2 = (0, 0.2]

Given this loss function, the Bayesian posterior expected loss for d 1, that is accepting H 1 : θ > θ 0, is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{EL} (d_1\mid x)=\int_{\varTheta_2}l_1\theta_0f(\theta\mid x)\mbox{d}\theta-\int_{\varTheta_2}l_1\theta f(\theta\mid x)\mbox{d}\theta, \end{array} \end{aligned} $$

where f(θ ∣ x) = Be(α∗ = α + x, β∗ = β + n − x). Similarly, the Bayesian posterior expected loss for d 2, that is accepting H 2 : θ ≤ θ 0, is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{EL} (d_2\mid x)=\int_{\varTheta_1}l_2\theta f(\theta\mid x)\mbox{d}\theta-\int_{\varTheta_1}l_2\theta_0f(\theta\mid x)\mbox{d}\theta. \end{array} \end{aligned} $$

After some algebra, it can be shown (Taroni et al., 2010) that

$$\displaystyle \begin{aligned} \mbox{EL} (d_1\mid x)=l_1\theta_0\Pr(\theta<\theta_0\mid \alpha^*,\beta^*) -l_1\frac{\alpha+x}{\alpha+\beta+n}\Pr(\theta<\theta_0\mid\alpha^*+1,\beta^*), \end{aligned} $$
(2.11)

and

$$\displaystyle \begin{aligned} \mbox{EL} (d_2\mid x)=l_2\frac{\alpha+x}{\alpha+\beta+n}\Pr(\theta>\theta_0\mid\alpha^*+1,\beta^*) -l_2\theta_0\Pr(\theta>\theta_0\mid \alpha^*,\beta^*). \end{aligned} $$
(2.12)

The decision criterion then is to decide d 1 (d 2) whenever EL(d 1) is smaller (greater) than EL(d 2).

Example 2.8 (Counterfeit Medicines—Continued)

Recall Example 2.1 where the competing propositions refer to the proportion of counterfeit medicines that may be either greater or not greater than a given limiting value, e.g., θ 0 = 0.2. Consider a uniform prior Be(1, 1) for θ and the finding that 12 out of 40 items are positive. Consider a linear loss function as in (2.10), with l 1 = 1 and l 2 = 1. This is a symmetric loss, reflecting the idea that falsely deciding that the proportion is greater than the threshold is as undesirable, and hence as severely penalized, as falsely deciding that the proportion is smaller than the threshold. The expected losses of decisions d 1 and d 2 are computed as in (2.11) and (2.12).
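A possible sketch of this computation:

    alpha <- 1; beta <- 1; n <- 40; x <- 12; theta0 <- 0.2
    l1 <- 1; l2 <- 1
    apost <- alpha + x; bpost <- beta + n - x            # posterior parameters
    EL1 <- l1 * theta0 * pbeta(theta0, apost, bpost) -
      l1 * (alpha + x) / (alpha + beta + n) * pbeta(theta0, apost + 1, bpost)
    EL2 <- l2 * (alpha + x) / (alpha + beta + n) * (1 - pbeta(theta0, apost + 1, bpost)) -
      l2 * theta0 * (1 - pbeta(theta0, apost, bpost))
    c(EL1, EL2)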

The optimal decision thus is d 1, since it minimizes the expected loss. Given prior beliefs, the observed data, and personal loss assignments, the optimal course of action is to decide in favor of proposition H 1 according to which the proportion of counterfeit medicines is greater than 0.2.

A decision maker may find a “0 − l i” loss function, as shown in Table 1.4, more appropriate. Consider again the case discussed in Sect. 2.2.1 where it was of interest to compare the hypotheses that the proportion of counterfeit medicines in a seizure was greater (H 1) or not greater (H 2) than a given threshold θ 0. In such a context, the loss l 1 (i.e., the loss incurred when deciding d 1 while H 2 is true) could amount to the net loss represented by the expenses incurred by issuing legal proceedings in a non-priority case (i.e., falsely considering θ > θ 0). In turn, the loss l 2 could amount to the monetary value of property that could have been confiscated by investigative authorities in a meritorious case. Following results in Sect. 1.9, the decision criterion becomes

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{decide}\; d_1 \;\mbox{if}\quad \frac{\alpha_1}{\alpha_2}>\frac{l_1}{l_2} \quad \text{or}\quad \mbox{BF }>\frac{l_1/l_2}{\pi_1/\pi_2}. \end{array} \end{aligned} $$

Decision d 1 is to be preferred to decision d 2 if and only if the posterior odds in favor of H 1 are greater than the ratio of the losses of adverse outcomes or, alternatively, if the BF is greater than the ratio between the loss ratio of adverse outcomes and the prior odds.

Decision makers may find it difficult to assign losses l 1 and l 2. Note, however, that when adverse outcomes are considered equally undesirable, then the loss ratio simplifies to 1, and the decision criterion becomes to decide d 1 whenever the posterior odds are larger than 1, i.e., the posterior probability of hypothesis H 1 is greater than the posterior probability of hypothesis H 2. In turn, when adverse consequences are not equally undesirable, a decision maker may consider how much more (less) undesirable one adverse outcome is compared to the other. This can be expressed as l 1 = kl 2, i.e., by specifying how much worse deciding d 1 is when θ ≤ θ 0 is true, compared to deciding d 2 when θ > θ 0 is true (Biedermann et al., 2016b). A sensitivity analysis can be performed for different values of k.

2.3 Normal Mean

Toxicology laboratories are frequently asked to quantify the amount of target substance (e.g., alcohol, illegal drugs, particular metabolites, etc.) in samples such as blood, urine, and hair in order to help assess whether an unknown target quantity θ (e.g., the level of alcohol in blood) exceeds a given value (e.g., a legal threshold). Competing propositions of interest may be specified as follows:

H 1 ::

The target quantity θ exceeds a given level θ 0.

H 2 ::

The target quantity θ is equal to or smaller than a given level θ 0.

This section considers three main topics: (1) inference about an unknown quantity θ (Sect. 2.3.1), (2) inference about θ in presence of factors influencing the measurement process (Sect. 2.3.2), and (3) decision about competing propositions regarding θ (Sect. 2.3.3).

2.3.1 Inference About a Normal Mean

Consider the hypothetical case of a person, Mr. X, stopped by traffic police because of suspicion of driving under the influence of a given substance (e.g., alcohol or THC). A blood sample is taken and a series of analyses are performed by a forensic laboratory. The propositions of interest may be, for example, that “The quantity θ of target substance in Mr. X’s blood exceeds the legal threshold θ 0” (H 1) versus the alternative proposition “The quantity θ of target substance in Mr. X’s blood is smaller than or equal to the legal threshold θ 0” (H 2). A series of measurements x are obtained. It is often reasonable to assume that such measurements follow a Normal distribution N(θ, σ 2):

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(x\mid\theta,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}(x-\theta)^2\right\}, \end{array} \end{aligned} $$

where the mean θ is the unknown quantity of target substance. The variance σ 2 can be approximated from previous ad hoc calibrations (see discussion by Howson and Urbach (1996)). The most common prior distribution for the Normal mean θ is itself a Normal distribution N(μ, τ 2):

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(\theta\mid\mu,\tau^2)=\frac{1}{\sqrt{2\pi\tau^2}}\exp\left\{-\frac{1}{2\tau^2}(\theta-\mu)^2\right\}, \end{array} \end{aligned} $$

where the hyperparameters μ and τ 2 are often called prior mean and prior variance, respectively.

The posterior distribution of the target quantity θ is still a Normal distribution, denoted \(\mbox{N}(\mu _x,\tau ^2_x)\), because the Normal prior and the Normal likelihood are conjugate . Generalizing the updating formulae (1.19) and (1.20) to the case where a vector of n measurements (x 1, …, x n) is available leads to

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mu_x = \frac{\sigma^2/n}{\sigma^2/n +\tau^2} \mu+ \frac{\tau^2}{\sigma^2/n +\tau^2}\bar x \end{array} \end{aligned} $$
(2.13)

and

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \tau^2_x=\frac{\tau^2\sigma^2/n}{\sigma^2/n+\tau^2}, \end{array} \end{aligned} $$
(2.14)

where \(\bar x=\sum _{i=1}^{n}x_i/n\).

The posterior mean μ x and the posterior variance \(\tau ^2_x\) can be calculated by means of the function post_distr.
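A possible definition, consistent with the description of post_distr in Sect. 2.4 (here sigma denotes the variance σ 2 of the observations):

    post_distr <- function(sigma, n, barx, pm, pv) {
      mu_x <- (sigma / n) / (sigma / n + pv) * pm + pv / (sigma / n + pv) * barx   # (2.13)
      tau_x <- (pv * sigma / n) / (sigma / n + pv)                                 # (2.14)
      c(mu_x, tau_x)
    }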

The prior odds, the posterior odds, and the Bayes factor can be easily computed, as discussed in Sect. 1.4, by means of standard routines (see Example 2.9). The case where the population variance σ 2 is unknown and a prior distribution must be specified for both parameters (θ, σ 2) will be addressed in Sect. 3.3.2.

Example 2.9 (Alcohol Concentration in Blood)

A person is stopped by traffic police because of suspicion of driving under the influence of alcohol. Two measurements are obtained by the laboratory, 0.4866 g/kg and 0.5078 g/kg. The population variance σ 2 is known and is taken to be equal to 0.023². Available information, e.g., the fact that the person has been stopped by traffic police while driving late at night, exceeding the speed limit, etc., suggests a prior mean equal to 0.8 and a prior variance equal to 0.15², say θ ∼ N(μ = 0.8, τ 2 = 0.15²). This amounts to saying that, a priori, values for the alcohol level in blood lower than 0.35 and larger than 1.25 are considered extremely implausible (prior probabilities for values outside this range are on the order of 0.01).

The propositions of interest are the following:

H 1 ::

The alcohol level θ in the blood of Mr. X exceeds the legal threshold θ 0 = 0.5 (θ > 0.5).

H 2 ::

The alcohol level in the blood of Mr. X is smaller than or equal to the legal threshold θ 0 = 0.5 (θ ≤ 0.5).

The prior odds can be easily computed as follows:
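For instance:

    mu <- 0.8; tau <- 0.15; theta0 <- 0.5
    pi1 <- 1 - pnorm(theta0, mu, tau)   # prior probability of H1: theta > 0.5
    pi2 <- pnorm(theta0, mu, tau)
    pi1 / pi2
    # approximately 43 (cf. text)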

The probability of hypothesis H 1 is, a priori, approximately 43 times greater than the probability of the alternative hypothesis H 2. Consider now the effect of the measurements made on the blood sample.
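For instance, using the function post_distr introduced above:

    xm <- c(0.4866, 0.5078)
    post <- post_distr(sigma = 0.023^2, n = 2, barx = mean(xm), pm = 0.8, pv = 0.15^2)
    post
    # posterior mean of about 0.5007 and posterior variance of about 0.00026 (cf. text)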

The posterior distribution of the quantity of alcohol in blood θ is, therefore, N(0.5007, 3e − 04). The posterior odds are
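For instance, continuing the sketch above:

    a1 <- 1 - pnorm(theta0, post[1], sqrt(post[2]))   # posterior probability of H1
    a2 <- pnorm(theta0, post[1], sqrt(post[2]))
    a1 / a2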

The ratio between posterior and prior odds gives the Bayes factor:
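For instance:

    BF <- (a1 / a2) / (pi1 / pi2)
    BF
    # approximately 0.02 (cf. text), i.e. about 40 in favor of H2 when inverted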

The probability of obtaining the two measurements if Mr. X’s alcohol level in blood does not exceed the legal threshold θ 0 = 0.5 is approximately 40 times greater than the probability of obtaining them if the blood alcohol level is greater than the legal threshold. The evidence thus provides moderate support for the hypothesis H 2, compared to H 1.

2.3.1.1 Choosing the Parameters of the Normal Prior for the Mean

If the experimenter has no reason to consider the distribution describing prior uncertainty about the unknown quantity θ to be asymmetric, then a choice may be made in the family of Normal distributions. When choosing a member from this family, the analyst will need to assign a value to the prior mean μ and a value to the prior standard deviation τ. To elicit a Normal prior, it is useful to recall that for a Normal distribution θ ∼N(μ, τ 2), approximately 99.7% of values are within 3 standard deviations of the mean; thus

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \Pr\left\{\mu-3\tau\le\theta\le\mu+3\tau\right\}\approx 0.997. \end{array} \end{aligned} $$

Hence, if the practitioner can assign a measure of location μ and a pair of values that define the upper and lower bounds of an interval that covers a range of plausible values of the unknown quantity θ, then the standard deviation can be assigned as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \tau=\frac{l_{\mbox{up}}-\mu}{3}, \end{array} \end{aligned} $$
(2.15)

where l up is the upper bound mentioned above. In Example 2.9, a prior location was fixed at μ = 0.8. Moreover, prior probabilities for values smaller than 0.35 and greater than 1.25 were extremely small (i.e., on the order of 0.01). The standard deviation has been elicited as in (2.15).
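For instance:

    (1.25 - 0.8) / 3
    # [1] 0.15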

It may be worthwhile to inspect the reasonableness of the elicited prior. This includes, as highlighted in Sect. 1.10, producing a graphical representation to see whether the amount of available information is suitably conveyed. Consider a random sample of size n e from a Normal population providing an amount of information equivalent to that conveyed by the prior. The equivalent sample size n e can be found by matching the prior variance τ 2 to the dispersion of the sample mean, σ 2∕n e, and solving for n e. The smaller n e, the weaker the prior beliefs, and the more the posterior distribution will be influenced by even a modest amount of data. Vice versa, the larger n e, the stronger the prior beliefs, and the more the posterior distribution will be dominated by the prior. Thus, more data will be necessary to make a substantial impact on prior beliefs.
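As an illustration, with the values of Example 2.9 the equivalent sample size would be obtained as follows (a sketch):

    sigma2 <- 0.023^2; tau2 <- 0.15^2
    n_e <- sigma2 / tau2
    n_e
    # a value well below 1: the prior carries less information than a single measurement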

Whenever the state of information is such as to consider all possible values of θ equally plausible, a locally uniform prior can be defined:

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(\theta)\propto \mbox{constant}. \end{array} \end{aligned} $$

In the latter case, the posterior distribution of θ is a Normal distribution centered at the sample mean \(\bar x\) with spread parameter equal to σ 2∕n (e.g., Bolstad & Curran, 2017).

2.3.1.2 Sensitivity to the Choice of the Prior Distribution

As noted in Sect. 1.11, the marginal likelihood is highly sensitive to the choice of the prior distribution and so is the Bayes factor. Thus, it should be emphasized that the BF obtained in Example 2.9, the value 0.02, does not depend on the data alone. It also depends on the choice of the prior distribution on θ.

For the purpose of illustration, consider a sensitivity analysis for the hyperparameters that characterize the prior distribution for the unknown level of alcohol in blood . Let values of μ range from 0.4 to 1 and the prior variance τ 2 be fixed and equal to 0.0225.

The prior odds, the posterior odds, and the BF can be calculated for all possible values of the prior mean μ (pm). Note that computing the posterior Normal distribution with the function post_distr, using several possible values for the prior mean μ, returns an output vector of length n = 61 whose first n − 1 = 60 elements represent the posterior mean, while the last element represents the posterior variance.
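A possible sketch (the grid of prior means is an assumption; post_distr is the function introduced in Sect. 2.3.1):

    pm <- seq(0.4, 1, length.out = 60)       # grid of prior means
    tau2 <- 0.0225
    barx <- mean(c(0.4866, 0.5078))
    pi1 <- 1 - pnorm(0.5, pm, sqrt(tau2))
    prior_odds <- pi1 / (1 - pi1)
    post <- post_distr(sigma = 0.023^2, n = 2, barx = barx, pm = pm, pv = tau2)
    mu_x <- post[-length(post)]              # posterior means, one per prior mean
    tau2_x <- post[length(post)]             # common posterior variance
    a1 <- 1 - pnorm(0.5, mu_x, sqrt(tau2_x))
    BF <- (a1 / (1 - a1)) / prior_odds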

Figure 2.8 shows the prior probability π 1 of proposition H 1, the posterior probability α 1, and the BF in favor of proposition H 1 for values of the prior mean μ ranging from 0.4 to 1.

Fig. 2.8 Sensitivity analysis of the prior probability π 1 (dot-dashed line), posterior probability α 1 (dashed line), and BF (solid line) for values of μ ranging from 0.4 to 1 and τ 2 = 0.0225 (Example 2.9). Note that for a BF of 1 (dotted line), the lines of the prior and posterior probabilities intersect

Note that the BF favors proposition H 1 (i.e., a BF greater than 1) over H 2 only for values of μ smaller than 0.47. Most importantly, one can observe the impact of the prior assessments (i.e., different choices of the prior mean μ) on the value of the BF. The higher the prior probability of proposition H 1, the lower the value of the measurements x = (0.4866, 0.5078) in terms of the BF in favor of H 1 over H 2. Note, however, that the BF in the latter case represents strong support for H 2 over H 1.

2.3.2 Continuous Measurements Affected by Errors

As noted in Sect. 2.2.2, a measurement process or observations may be affected by background noise. Consider a case in which it is of interest to assess the height of an individual based on video recordings made by a surveillance camera during a bank robbery. Propositions of interest may be as follows:

H 1 ::

The height of the individual is less than 180 cm.

H 2 ::

The height of the individual is equal to or greater than 180 cm.

Assume that the height measurements x of an individual are normally distributed, X ∼N(θ, σ 2), where θ represents the true height of the individual and σ 2 represents the variance of the measurement device. Assume also that the variance σ 2 is inferred from previous ad hoc experiments. However, the measured height is, generally, affected by an error ξ, related to the circumstances under which the recording was made. Factors of interest here include the posture and movements of the person, the type of clothing (including headwear and shoes) and lighting conditions. Such circumstances represent a further source of variation δ 2, unrelated to σ 2. The measured height is therefore X ∼N(θ + ξ, σ 2 + δ 2). A conjugate Normal prior distribution N(μ, τ 2) is taken to model prior uncertainty about θ. The values of the parameters ξ and δ 2 are case-specific assignments. It can be shown that the posterior distribution of the true height θ is still Normal with mean

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mu_x=\frac{\tau^2(\bar x-\xi)+\mu(\sigma^2+\delta^2)/n}{\tau^2+(\sigma^2+\delta^2)/n} \end{array} \end{aligned} $$
(2.16)

and variance

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \tau^2_x=\frac{\tau^2(\sigma^2+\delta^2)/n}{\tau^2+(\sigma^2+\delta^2)/n}. \end{array} \end{aligned} $$
(2.17)

Example 2.10 (Image Analysis)

Consider the hypothetical case introduced above and assume that, according to eyewitness testimony, the height of the perpetrator is approximately between 175 cm and 185 cm. This allows one to define a prior probability distribution for the height θ centered at 180 cm with variance equal to 2.79 cm², i.e., θ ∼N(180, 2.79). The standard deviation can be quantified as in (2.15):
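For instance:

    (185 - 180) / 3
    # approximately 1.67, so that the prior variance is about 1.67^2 = 2.79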

Thus, the two hypotheses H 1 and H 2 introduced above are, a priori, equally probable (hence, the prior odds equal 1).

The available recordings depict an individual appearing in n = 10 images. Height measurements yield the sample mean \(\bar x=180.25\). The variance of the measurement procedure is known and equal to σ 2 = 0.12. The experimental setting is such that the values for the parameters of the Normal distribution of the error can be set to ξ = 0.5 and δ 2 = 1.

The posterior mean and the posterior variance of θ can be computed as in (2.16) and (2.17), respectively.
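A possible sketch, with the values of this example:

    n <- 10; barx <- 180.25; sigma2 <- 0.12; delta2 <- 1; xi <- 0.5
    mu <- 180; tau2 <- 2.79
    mu_x <- (tau2 * (barx - xi) + mu * (sigma2 + delta2) / n) / (tau2 + (sigma2 + delta2) / n)   # (2.16)
    tau2_x <- (tau2 * (sigma2 + delta2) / n) / (tau2 + (sigma2 + delta2) / n)                    # (2.17)
    c(mu_x, tau2_x)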

The gray shaded area in Fig. 2.9 shows the posterior probability of the hypothesis H 1. The posterior odds and the Bayes factor can be obtained straightforwardly
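For instance, continuing the sketch above:

    a1 <- pnorm(180, mu_x, sqrt(tau2_x))   # posterior probability of H1: theta < 180
    BF <- (a1 / (1 - a1)) / 1              # prior odds equal 1
    BF
    # approximately 3 (cf. text)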

Fig. 2.9 Posterior distribution f(θ ∣ x) for the true height θ in Example 2.10. The gray shaded area shows the posterior probability of the hypothesis H 1 (θ < 180 cm)

Given that the prior odds are 1, the BF is numerically equivalent to the posterior odds. This value represents support for the hypothesis H 1 (the height of the individual is lower than 180 cm) over H 2. Specifically, the BF indicates that it is approximately 3 times more probable to obtain such height measurements if the height of the individual is less than 180 cm than if the height is equal to or greater than 180 cm.

2.3.3 Decision for a Mean

The previous sections focused on how to draw a probabilistic inference about a Normal mean, using the Bayes factor. Recall that the competing propositions were:

H 1 ::

The target quantity θ exceeds a given level θ 0.

H 2 ::

The target quantity θ is equal to or smaller than a given level θ 0.

A related question is how to decide about whether or not a quantity of interest is above a given (legal) threshold , i.e., accepting either H 1 or H 2. In order to address this question, it is necessary to introduce a loss function to take into account the decision maker’s preferences. Suppose a linear loss function is considered as in (2.18):

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mbox{L}(d_1,\theta)=\left\{ \begin{array}{ll} 0 & \mbox{if } \theta >\theta_0,\\ &\\ l_1(\theta_0-\theta) & \mbox{if } \theta\le \theta_0. \\ \end{array} \right. \;\;\;\; \mbox{L}(d_2,\theta)=\left\{ \begin{array}{ll} 0 & \mbox{if } \theta\le \theta_0,\\ &\\ l_2(\theta-\theta_0) & \mbox{if } \theta>\theta_0.\\ \end{array} \right. \end{array} \end{aligned} $$
(2.18)

The Bayesian posterior expected loss of decision d 1 can be computed as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mbox{EL}(d_1\mid x)& =&\displaystyle l_1\int_{\theta\le \theta_0}(\theta_0-\theta)f(\theta\mid x)d\theta\\ & =&\displaystyle l_1\tau_x\left[\phi(t)+t\int_{-\infty}^t \phi(s)ds\right], \end{array} \end{aligned} $$
(2.19)

where f(θ ∣ x) is a Normal posterior distribution with parameters μ x and \(\tau ^2_x\) as in (2.13) and (2.14), t = (θ 0 − μ x)∕τ x, while ϕ(⋅) denotes the probability density of a standardized Normal distribution (Bernardo & Smith, 2000).

In turn, the Bayesian posterior expected loss of decision d 2 can be computed as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mbox{EL}(d_2\mid x)& =&\displaystyle l_2\int_{\theta>\theta_0}(\theta-\theta_0)f(\theta\mid x)d\theta\\ & =&\displaystyle l_2\tau_x\left[\phi(t)-t\int_t^{\infty}\phi(s)ds\right]. \end{array} \end{aligned} $$
(2.20)

Again, the decision criterion amounts to deciding d 1 (d 2) whenever EL(d 1x) is smaller (greater) than EL(d 2x).

Example 2.11 (Alcohol Concentration in Blood—Continued)

Recall Example 2.9 where the posterior distribution of the alcohol level θ was N(0.50072, 0.00026), and the legal threshold was equal to 0.5.

Consider a symmetric linear loss function as in (2.18) with l 1 = l 2 = 1. The Bayesian posterior expected losses in (2.19) and (2.20) can be obtained as
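A possible sketch of these computations:

    mu_x <- 0.50072; tau2_x <- 0.00026; theta0 <- 0.5
    l1 <- 1; l2 <- 1
    tau_x <- sqrt(tau2_x)
    t <- (theta0 - mu_x) / tau_x
    EL1 <- l1 * tau_x * (dnorm(t) + t * pnorm(t))          # (2.19)
    EL2 <- l2 * tau_x * (dnorm(t) - t * (1 - pnorm(t)))    # (2.20)
    c(EL1, EL2)
    # EL1 is slightly smaller than EL2 (cf. text)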

The optimal decision thus is to consider that the alcohol level is greater than the legal threshold because this decision has a lower expected loss, though the difference between the two expected losses is, in the example here, extremely small.

Note that this result crucially depends on the decision maker’s value assessments (i.e., the chosen loss function).

When expected losses for rival decisions are very similar, as is the case in Example 2.11, a sensitivity analysis should be performed as suggested, for example, in legal literature (Edwards, 1988). The sensitivity analysis should evaluate the effect of changes in the prior parameters and the loss values. See also Sect. 2.3.1 for a sensitivity analysis of the BF for evaluating the impact of changes in hyperparameters characterizing the prior distribution for the unknown level of alcohol in blood.

It is also worth reflecting on the choice of the loss function. A symmetric loss function, as previously suggested, may not realistically reflect the decision maker’s preferences. For example, a decision maker who is concerned about road safety may consider that falsely concluding that an individual’s blood alcohol concentration is below the legal limit is a more serious error than falsely concluding that an individual’s blood alcohol concentration is above the legal threshold. Therefore, l 2 may be taken to be larger than l 1, reflecting the greater inconvenience associated with underestimating the alcohol concentration. For example, when l 1 = 1 and l 2 = 2, meaning that underestimating the alcohol level is considered twice as serious as overestimating it, the expected loss of decision d 2 will increase. One can verify that for any reasonable value of l 2 greater than l 1, decision d 1 will be the one with the smaller expected loss.

2.4 Summary of R Functions

The R functions outlined below have been used in this chapter.

Functions Available in the Base Package

  • apply: applies a function to the margins (either rows or columns) of a matrix

  • acf: computes and plots estimates of the autocorrelation function

  • d<name of distribution>, p<name of distribution>, r<name of distribution> (e.g., dbeta, pbeta, rbeta): calculate the density, compute the cumulative probability, and generate random numbers for various parametric distributions

  • rowSums: forms row sums for numeric arrays (or data frames)

  • Further details can be found in the Help menu, help.start().

Functions Available in Other Packages

dbbinom and pbbinom in package extraDistr: calculate the density and the cumulative probability of a beta-binomial distribution

Functions Developed in This Chapter

dbinpois: computes the product between a binomial likelihood Bin(n, θ) at x − x b and a Poisson likelihood Pn(λ) at x b where x represents the number of items counted as presenting a given target characteristic and x b represents the number of background elements affecting the counting process

  • Usage: dbinpois(xb)

  • Arguments: xb: a vector of integers ranging from 0 to x

  • Output: a vector of values, where each value represents the probability of the product between the binomial and the Poisson likelihood at a given value of the input argument xb

  • mh1: computes the function x∕(1 + x)

  • Usage: mh1(x)

  • Arguments: x: a scalar value x

  • Output: the value of x∕(1 + x)

  • mh2: computes the function x∕(1 + x)2

  • Usage: mh2(x)

  • Arguments: x: a scalar value x

  • Output: the value of x∕(1 + x)2

  • post_distr: computes the posterior distribution \(\mbox{N}(\mu _x,\tau ^2_x)\) of a Normal mean θ, with X ∼N(θ, σ 2) and θ ∼N(μ, τ 2)

  • Usage: post_distr(sigma,n,barx,pm,pv)

  • Arguments: sigma, the variance σ 2 of the observations; n, the number of observations; barx, the sample mean \(\bar x\) of the observations; pm, the mean μ of the prior distribution N(μ, τ 2) and pv, the variance τ 2 of the prior distribution N(μ, τ 2)

  • Output: a vector of two values: the first is the posterior mean μ x and the second is the posterior variance \(\tau ^2_x\)

Published with the support of the Swiss National Science Foundation (Grant no. 10BP12_208532/1).