1 Introduction

Consider a series of q independent pairs of independent binomial random variables \((Y_{i1},Y_{i2})\), with \(Y_{i1}\sim \textrm{Bi}(1,\pi _{i1})\) and \(Y_{i2}\sim \textrm{Bi}(m_{i},\pi _{i2})\) as in Section 4 of Lunardon (2018). Let the success probabilities satisfy \(\pi _{i1}=\exp (\psi +\lambda _{i})/\{1+\exp (\psi +\lambda _{i})\}\) and \(\pi _{i2}=\exp (\lambda _{i})/\{1+\exp (\lambda _{i})\}\), where \(\psi =\log \{\pi _{i1}/(1-\pi _{i1})\}-\log \{\pi _{i2}/(1-\pi _{i2})\}\), the log odds ratio, is the parameter of interest and \(\lambda _{i}=\log \{\pi _{i2}/(1-\pi _{i2})\}\) is the nuisance parameter, \(i=1,\ldots ,q\). This is a stratified setting as in Sartori (2003), where q is the number of strata, \(m_{i}\) is the ith stratum sample size and the total sample size is \(n=\sum _{i=1}^{q}m_{i}\). We will refer to this exponential family model in canonical form as the binomial matched pairs model, and to the model with \(m_{i}=m=1\) as the binary matched pairs model. This model often arises in case–control studies in medical contexts, where \(y_{i1}\) and \(y_{i2}\) may represent, for example, the numbers of exposed or treated persons among one case and \(m_{i}\) controls in the ith stratum, and where interest lies in studying the influence of some risk factor or the effect of some treatment. For example, suppose that we have data from a clinical trial evaluating the effectiveness of a new Covid vaccine in preventing Covid disease. The data were collected according to a matched case–control design with one case and \(m_{i}\) controls in each of \(q=10\) strata, \(i=1,\ldots ,10\), where the number of controls \(m_{i}\) takes various values from 3 to 7. On each of 10 days, patients testing negative for Covid at a specified time in a hospital served as subjects. On each day one patient chosen at random formed the experimental group and the remainder were controls. The binary response was whether the patient tested positive or not at the end of a specified period, where testing negative is taken as a “success” and the observed numbers \(y_{i1}\) and \(y_{i2}\) are therefore the numbers of patients in the two groups testing negative for Covid. The object of the analysis is to assess the effect of the vaccine on the probabilities of success \(\pi _{i1}\) and \(\pi _{i2}\) in the case and control groups, respectively. The parameter of interest \(\psi \) is the difference between the log odds of success in the case and control groups (for more examples, see Section 1.2 of Cox and Snell 1989).

It is well known, since Neyman and Scott (1948), that in stratified modelling settings the maximum likelihood estimator, derived from the profile log-likelihood, is not, in general, a consistent or unbiased estimator of the parameter of interest as the dimension of the nuisance parameter increases while the stratum sample size is kept fixed; this is known as the incidental parameter problem. The problem can be solved in some cases when the model has a particular structure, such as exponential families in canonical form, using conditional or marginal log-likelihoods (see Sections 4.4 and 4.5 of Pace and Salvan 1997). However, these are not always available, so an alternative is to work with the approximate conditional profile log-likelihood of Cox and Reid (1987), which requires orthogonality of the parameter of interest and the nuisance parameter, or the modified profile log-likelihood of Barndorff-Nielsen (1983), whose computation requires a sample space derivative. These are often simple to compute for exponential and composite group families, involving only the observed information matrix for the components of the nuisance parameter, which is readily available from direct differentiation (see Section 4.7 of Pace and Salvan 1997), and they often provide accurate approximations to conditional or marginal log-likelihoods when these exist. It has been shown through many examples that when the profile log-likelihood performs poorly, approximate conditional and modified profile log-likelihoods can perform much better (see Section 3.1, Example 1 of McCullagh and Tibshirani 1990).

An alternative family of estimators in regular parametric problems was developed in Firth (1993), where the first order term in the asymptotic bias of the maximum likelihood estimator is removed by solving a set of adjusted score equations. Firth (1993) considered the case of exponential families with canonical parametrisation, amongst others, and showed that for this family of models his method is equivalent to maximising a penalised likelihood where the penalty function is the Jeffreys invariant prior. Lunardon (2018) showed that the bias reduction approach of Firth (1993) provides an inferential framework which is, from an asymptotic perspective, equivalent to that for the approximate conditional and modified profile log-likelihoods when dealing with nuisance parameters. The advantage of bias reduction of Firth (1993) is that it can handle the problem of monotone likelihoods for stratified models with categorical responses. Nevertheless, the approach of Firth (1993) is not in general invariant under interest-respecting reparameterizations.

Indirect inference is another class of inferential procedures; it appeared in the econometrics literature in Gourieroux et al. (1993) and has been used for bias reduction of the maximum likelihood estimator and of other estimators. Its simplest version subtracts from the maximum likelihood estimator its bias function evaluated at the indirect inference estimator, which therefore becomes the solution of an implicit equation. Kuk (1995) describes a simulation-based approach to indirect inference that iteratively bias-corrects any suitably defined initial estimator, yielding an estimator which is asymptotically unbiased and consistent.

In this paper, we review in Sect. 2 the profile, conditional, modified profile and Firth (1993) penalised likelihood estimators of the log odds ratio. Since the maximum likelihood, conditional and modified profile likelihood estimators inherit the problem of infinite estimates of the log odds ratio, we propose in Sect. 3 a penalised log-likelihood function based on adjusted responses which always yields finite point estimates of the parameter of interest. The probability limit of the adjusted log-likelihood estimator is derived and it is shown that in certain settings the maximum likelihood, conditional and modified profile log-likelihood estimators drop out as special cases of the new estimator. In Sect. 4 we implement indirect inference to reduce the bias of the adjusted log-likelihood estimator, by adapting the method of Kuk (1995) to nuisance parameter settings. The finite-sample properties of all the above estimators are compared through a complete enumeration study in Sect. 5, which, as in Lunardon (2018), requires no simulation; a discussion follows. Finally, a real data set is analysed in Sect. 6.

2 Review of point estimation of the log odds ratio

2.1 Maximum likelihood

Several estimators of the common log odds ratio \(\psi \) have been proposed in the literature. These include the Mantel–Haenszel, empirical logit and Birch estimators (see Breslow 1981; Gart 1971 for a review of these estimators and their properties). In this section, however, we only consider estimators of the log odds ratio that depend on the data only through the sufficient statistic.

The log-likelihood function for \(\theta =(\psi ,\lambda _{1},\ldots ,\lambda _{q})^{\intercal }\) for the above binomial matched pairs model is (Lunardon 2018, Section 4.1)

$$\begin{aligned} l(\theta )=\sum _{i=1}^{q}\psi y_{i1}+\sum _{i=1}^{q}\lambda _{i}(y_{i1}+y_{i2})-\sum _{i=1}^{q}\big [\log \{1+\exp (\psi +\lambda _{i})\}+m_{i}\log \{1+\exp (\lambda _{i})\}\big ]. \end{aligned}$$
(1)

This is a linear full exponential family in canonical form (Pace and Salvan 1997, Section 5) with jointly sufficient statistics \(t=\sum _{i=1}^{q}y_{i1}\) and \(s_{i}=y_{i1}+y_{i2}\) for \(\psi \) and \(\lambda _{i}\), respectively. Throughout this section, we consider for simplicity \(m_{i}=m\) with totals \(s_{i}=(m+1)/2\), as in Sartori (2003, Example 3). In this setting, the constrained maximum likelihood estimator of \(\lambda _{i}\) for a fixed value of \(\psi \), denoted by \({\hat{\lambda }}_{i,\psi }\), is identical for all \(i=1,\ldots ,q\) and so we set \({\hat{\lambda }}_{i,\psi }={\hat{\gamma }}_{\psi }\). Similarly, denote the constrained maximum likelihood estimator of \(\psi \) for a fixed value of \(\gamma \) by \({\hat{\psi }}_{\gamma }\). The score equations for the log-likelihood function with respect to \(\gamma \) and \(\psi \) are respectively

$$\begin{aligned} & \frac{m+1}{2}-\frac{\exp (\psi +\gamma )}{1+\exp (\psi +\gamma )}-m\frac{\exp (\gamma )}{1+\exp (\gamma )}=0, \end{aligned}$$
(2)
$$\begin{aligned} & t-q\frac{\exp (\psi +\gamma )}{1+\exp (\psi +\gamma )}=0. \end{aligned}$$
(3)

The solution of (3) is \({\hat{\psi }}_{\gamma }=\log \{t/(q-t)\}-\gamma \), and on substituting this in (2) and solving for \(\gamma \) we get the maximum likelihood estimator of the nuisance parameter \({\hat{\gamma }}=\log \big [\{q(m+1)-2t\}/\{q(m-1)+2t\}\big ]\). Substituting \({\hat{\gamma }}\) in \({\hat{\psi }}_{\gamma }\) we get the maximum likelihood estimator of the parameter of interest

$$\begin{aligned} {\hat{\psi }}=\log \bigg (\frac{t\{q(m-1)+2t\}}{\{q-t\}\{q(m+1)-2t\}}\bigg ). \end{aligned}$$
(4)

Note that when \(t=0\) or \(t=q\), \({\hat{\psi }}\) is \(-\infty \) or \(+\infty \), respectively. This is problematic because it means that in these extreme situations, where all cases ‘fail’ (\(t=0\)) or all ‘succeed’ (\(t=q\)), we cannot estimate \(\psi \). Since \(\psi \) is defined as the logarithm of an odds ratio, infinite estimates arise when the argument of the logarithm, i.e. the odds ratio, is zero or diverges. Therefore, an estimator of \(\psi \) that avoids infinite values needs to keep the argument of the logarithm bounded away from zero and infinity.
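To make this concrete, the following minimal R sketch (the function name and the illustrative inputs are ours) evaluates (4) and shows how the boundary values of t produce infinite estimates.

```r
# Maximum likelihood estimator (4) in the setting m_i = m, s_i = (m + 1)/2;
# t is the observed value of the sufficient statistic T.
psi_hat <- function(t, q, m) {
  log((t * (q * (m - 1) + 2 * t)) / ((q - t) * (q * (m + 1) - 2 * t)))
}
psi_hat(t = 20, q = 30, m = 3)   # finite estimate
psi_hat(t = 0,  q = 30, m = 3)   # -Inf: all cases 'fail'
psi_hat(t = 30, q = 30, m = 3)   # +Inf: all cases 'succeed'
```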

Using the weak law of large numbers, Slutsky’s theorem and the continuous mapping theorem (Florescu 2014, Section 7), we find that \({\hat{\psi }}\) converges in probability to \(\psi +\log [\{(m+1)\exp (\psi )+m-1\}/\{(m-1)\exp (\psi )+m+1\}]\) as \(q\rightarrow \infty \), and so it is inconsistent for fixed m. When m is also allowed to increase to \(\infty \), the second term vanishes and \({\hat{\psi }}\) tends to \(\psi \). This means that \({\hat{\psi }}\) is consistent only when both m and q diverge.

Given that the totals \(s_{i}\) are fixed, the maximum likelihood estimator of \(\psi \) depends on the data only through the sufficient statistic \(T=\sum _{i=1}^{q}Y_{i1}\) and so its bias and variance can be calculated exactly using

$$\begin{aligned} \textrm{E}_{\psi }\{{\hat{\psi }}(T)\}= & \sum _{t=1}^{q-1}{\hat{\psi }}(t)\frac{\textrm{pr}(T=t\vert S_{i}=s_{i})}{1-\textrm{pr}(T=0\vert S_{i}=s_{i})-\textrm{pr}(T=q\vert S_{i}=s_{i})}, \end{aligned}$$
(5)
$$\begin{aligned} \textrm{var}_{\psi }\{{\hat{\psi }}(T)\}= & \textrm{E}_{\psi }[\{{\hat{\psi }}(T)\}^2]-[\textrm{E}_{\psi }\{{\hat{\psi }}(T)\}]^2. \end{aligned}$$
(6)

2.2 Conditional maximum likelihood

The conditional log-likelihood function is based on the distribution of \(Y_{i1}\) given \(S_{i}=s_{i}\) in each stratum. Davison (1988) and Gart (1970) noted that the conditional density of \(Y_{i1}\) given \(S_{i}\) is

$$\begin{aligned} \textrm{pr}(Y_{i1}=y_{i1}\vert S_{i}=s_{i})=\frac{{1\atopwithdelims ()y_{i1}}{m_{i}\atopwithdelims ()s_{i}-y_{i1}}\exp (\psi y_{i1})}{\sum _{u=0}^{\min (1,s_{i})}{1\atopwithdelims ()u}{m_{i}\atopwithdelims ()s_{i}-u}\exp (\psi u)}. \end{aligned}$$
(7)

This is the noncentral hypergeometric distribution, which is obtained by rewriting the left hand side of (7) as \(\textrm{pr}(Y_{i1}=y_{i1},S_{i}=s_{i})/\textrm{pr}(S_{i}=s_{i})=\textrm{pr}(Y_{i1}=y_{i1})\textrm{pr}(Y_{i2}=s_{i}-y_{i1})/\textrm{pr}(Y_{i1}+Y_{i2}=s_{i})\), by independence of \(Y_{i1}\) and \(Y_{i2}\), and noting that \(\textrm{pr}(Y_{i1}+Y_{i2}=s_{i})=\sum _{u=0}^{s_{i}}\textrm{pr}(Y_{i1}=u)\textrm{pr}(Y_{i2}=s_{i}-u)\). Because \(y_{i1}\) can only be 0 or 1, the right hand side of (7) can be further simplified to

$$\begin{aligned} \textrm{pr}(Y_{i1}=y_{i1}\vert S_{i}=s_{i})=\bigg [\frac{s_{i}\exp (\psi )}{m_{i}+1+s_{i}\{\exp (\psi )-1\}}\bigg ]^{y_{i1}}\bigg [\frac{m_{i}-s_{i}+1}{m_{i}+1+s_{i}\{\exp (\psi )-1\}}\bigg ]^{1-y_{i1}}. \end{aligned}$$
(8)

This shows that \(Y_{i1}\vert S_{i}\) has a Bernoulli distribution with success probability given by the first bracketed term on the right hand side of (8). Taking the logarithm of the product of (8) over strata gives the conditional log-likelihood function, which simplifies to

$$\begin{aligned} l_{c}(\psi )=\sum _{i=1}^{q}\psi y_{i1}-\sum _{i=1}^{q}\log \big [m_{i}+1+s_{i}\{\exp (\psi )-1\}\big ]. \end{aligned}$$
(9)

Letting \(m_{i}=m\), \(s_{i}=(m+1)/2\) and differentiating (9) gives the conditional maximum likelihood estimator

$$\begin{aligned} {\hat{\psi }}_{c}=\log \bigg (\frac{t}{q-t}\bigg ). \end{aligned}$$
(10)

When \(t=0\) or \(t=q\), \({\hat{\psi }}_{c}\) is \(-\infty \) or \(+\infty \), respectively. In the setting of Lunardon (2018), i.e. when \(m_{i}=m\) and \(s_{i}=(m+1)/2\), the success probability of the Bernoulli random variable \(Y_{i1}\vert S_{i}\) simplifies to \(\pi =\exp (\psi )/\{\exp (\psi )+1\}\). The distribution of the sufficient statistic T given \(S_{1},\ldots ,S_{q}\) is therefore binomial with denominator q and success probability \(\pi \). The conditional distribution of T can also be obtained using the convolution method of Butler and Stephens (2017, Section 2). This will be particularly useful for general \(m_{i}\) and \(s_{i}\), where the binomial conditional distribution of T no longer holds; in that case the conditional distribution of T is Poisson binomial.

By noting that T/q converges in probability to \(\pi \) by the weak law of large numbers, we find that \(\log \{t/(q-t)\}\xrightarrow {p}\psi \) by Slutsky’s theorem and the continuous mapping theorem, so \({\hat{\psi }}_{c}\) is consistent. As the conditional maximum likelihood estimator depends on the data only through the sufficient statistic, its bias and variance can be calculated using (5) and (6) but replacing \({\hat{\psi }}(T)\) by \({\hat{\psi }}_{c}(T)\).
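As an illustration, the sketch below (helper name ours) computes the exact conditional bias and variance of \({\hat{\psi }}_{c}\) from (5) and (6), using the \(\textrm{Bi}(q,\pi )\) distribution of T and conditioning on \(t\in \{1,\ldots ,q-1\}\).

```r
# Exact conditional bias and variance of psi_c = log(t/(q - t)) via
# (5) and (6), with T | S ~ Bi(q, prob), prob = exp(psi)/(1 + exp(psi)),
# conditioning on t in {1, ..., q - 1}.
bias_var_psi_c <- function(psi, q) {
  prob <- plogis(psi)
  t <- 1:(q - 1)
  w <- dbinom(t, q, prob) / (1 - dbinom(0, q, prob) - dbinom(q, q, prob))
  est <- log(t / (q - t))
  m1 <- sum(est * w)                                     # (5)
  c(bias = m1 - psi, variance = sum(est^2 * w) - m1^2)   # (6)
}
bias_var_psi_c(psi = 1, q = 30)
```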

2.3 Modified profile maximum likelihood

Davison (2003, Section 12) showed that, for a linear exponential family in canonical form, the modified profile log-likelihood function of Barndorff-Nielsen (1983) reduces to

$$\begin{aligned} l_{mp}(\psi )=l(\psi ,{\hat{\lambda }}_{\psi })+\frac{1}{2}\log \left\{ \det j_{\lambda \lambda }(\psi ,{\hat{\lambda }}_{\psi })\right\} , \end{aligned}$$
(11)

where \(l(\psi ,{\hat{\lambda }}_{\psi })\) is the profile log-likelihood obtained from (1) by substituting the constrained maximum likelihood estimator of \(\lambda \), and \(j_{\lambda \lambda }(\psi ,\lambda )\) is the observed information block for the \(\lambda \) components, given by the negative of the matrix of second derivatives of the log-likelihood function with respect to \(\lambda \). In the setting \(m_{i}=m\) and \(s_{i}=(m+1)/2\), \(j_{\lambda \lambda }(\psi ,\lambda )\) becomes the \(q\times q\) matrix with ith diagonal element

$$\begin{aligned} -\frac{\partial ^2l(\psi ,\gamma )}{\partial \gamma ^2}=\frac{\exp (\psi +\gamma )}{\big (1+\exp (\psi +\gamma )\big )^2}+m\frac{\exp (\gamma )}{\big (1+\exp (\gamma )\big )^2}, \end{aligned}$$
(12)

and zero elsewhere. The solution of (2) for fixed \(\psi \), \({\hat{\lambda }}_{\psi }={\hat{\gamma }}_{\psi }\), has no convenient closed form, so there is no closed form expression for (11); we therefore compute the maximum modified profile log-likelihood estimator, \({\hat{\psi }}_{mp}\), numerically and evaluate its bias and variance using (5) and (6), respectively, replacing \({\hat{\psi }}(T)\) with \({\hat{\psi }}_{mp}(T)\).
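A minimal numerical sketch of this computation in R is given below (function names and the illustrative data are ours): the constrained estimate \({\hat{\gamma }}_{\psi }\) is obtained by solving (2) with uniroot, and \(l_{mp}\) is then maximised with optimize.

```r
# Modified profile log-likelihood (11) for m_i = m, s_i = (m + 1)/2.
gamma_hat <- function(psi, m) {  # solves the score equation (2)
  uniroot(function(g) (m + 1) / 2 - plogis(psi + g) - m * plogis(g),
          interval = c(-20, 20))$root
}
l_mp <- function(psi, t, q, m) {
  g <- gamma_hat(psi, m)
  lp <- psi * t + g * q * (m + 1) / 2 -
        q * (log1p(exp(psi + g)) + m * log1p(exp(g)))   # profile of (1)
  jgg <- plogis(psi + g) * (1 - plogis(psi + g)) +
         m * plogis(g) * (1 - plogis(g))                # diagonal entry (12)
  lp + 0.5 * q * log(jgg)  # (11): j_{lambda lambda} is diagonal here
}
optimize(l_mp, c(-10, 10), t = 20, q = 30, m = 3, maximum = TRUE)$maximum
```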

2.4 Firth penalized likelihood

When \(\theta =(\psi , \lambda _{1},\ldots ,\lambda _{q})\) is the canonical parameter of an exponential family, as in the model considered here, Firth (1993) showed that the adjusted score equations estimator of \(\theta \) is equivalent to the maximiser of the penalised log-likelihood function

$$\begin{aligned} l_{*}(\theta )=l(\theta )+\frac{1}{2}\log \{\det i(\theta )\}, \end{aligned}$$
(13)

where \(i(\theta )=\textrm{E}\{j(\theta )\}\) is the Fisher information matrix. In the setting \(m_{i}=m\) and \(s_{i}=(m+1)/2\), the second order partial derivatives of \(l(\psi ,\lambda _{i})\) are

$$\begin{aligned} \frac{\partial ^2l(\psi ,\lambda _{i})}{\partial \psi ^2}= & -\sum _{i=1}^{q}\frac{\exp (\psi +\lambda _{i})}{\big (1+\exp (\psi +\lambda _{i})\big )^2} \end{aligned}$$
(14)
$$\begin{aligned} \frac{\partial ^2l(\psi ,\lambda _{i})}{\partial \psi \partial \lambda _{i}}= & -\frac{\exp (\psi +\lambda _{i})}{\big (1+\exp (\psi +\lambda _{i})\big )^2} \end{aligned}$$
(15)
$$\begin{aligned} \frac{\partial ^2l(\psi ,\lambda _{i})}{\partial \lambda _{i}^2}= & -\frac{\exp (\psi +\lambda _{i})}{\big (1+\exp (\psi +\lambda _{i})\big )^2}-m\frac{\exp (\lambda _{i})}{\big (1+\exp (\lambda _{i})\big )^2}. \end{aligned}$$
(16)

Since the above derivatives do not depend on the data, the Fisher information matrix coincides with the observed information and is given by

$$\begin{aligned} i(\psi ,\lambda _{i})=\begin{pmatrix} \sum _{i=1}^{q}V_{i1} & V_{11} & \cdots & V_{q1}\\ V_{11} & (V_{11}+V_{12}) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ V_{q1} & 0 & \cdots & (V_{q1}+V_{q2}) \end{pmatrix}, \end{aligned}$$
(17)

where \(V_{i1}=\exp (\psi +\lambda _{i})/\big (1+\exp (\psi +\lambda _{i})\big )^2\) and \(V_{i2}=m\exp (\lambda _{i})/\big (1+\exp (\lambda _{i})\big )^2\), \(i=1,\ldots ,q\). The determinant of \(i(\psi ,\lambda _{i})\) is obtained using the standard identity for the determinant of a partitioned matrix (see Magnus and Neudecker 2019, Chapter 1, page 28) and simplifies to

$$\begin{aligned} \det i(\psi ,\lambda _{i})=\bigg (\prod _{i=1}^{q}(V_{i1}+V_{i2})\bigg )\bigg (\sum _{i=1}^{q}\frac{V_{i1}V_{i2}}{V_{i1}+V_{i2}}\bigg ). \end{aligned}$$
(18)

Therefore the penalty function that needs to be added to the log-likelihood function is

$$\begin{aligned} \frac{1}{2}\log \{\det i(\psi ,\lambda _{i})\}=\frac{1}{2}\sum _{i=1}^{q}\log (V_{i1}+V_{i2})+\frac{1}{2}\log \bigg (\sum _{i=1}^{q}\frac{V_{i1}V_{i2}}{V_{i1}+V_{i2}}\bigg ). \end{aligned}$$
(19)

The score equations for the penalised log-likelihood of Firth (1993), \(l_{*}(\psi ,\lambda _{i})\), with respect to \(\lambda _{i}\) and \(\psi \) involve cumbersome expressions and have no closed form solution, so the Firth (1993) penalised likelihood estimator of \(\psi \), denoted by \({\hat{\psi }}_{*}\), is obtained numerically. This estimator is always finite, as shown in Section 2.1 of Kosmidis and Firth (2021). The bias and variance of \({\hat{\psi }}_{*}\) are calculated using (5) and (6), respectively.
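The following R sketch (function names and illustrative data ours) shows one way to carry out this numerical maximisation, exploiting the symmetry of the strata in this setting to work with a common nuisance parameter \(\gamma \) (an assumption justified by the equal stratum totals).

```r
# Firth penalised log-likelihood (13) with penalty (19), written with a
# common nuisance parameter gamma for all strata.
l_star <- function(theta, t, q, m) {
  psi <- theta[1]; g <- theta[2]
  V1 <- plogis(psi + g) * (1 - plogis(psi + g))
  V2 <- m * plogis(g) * (1 - plogis(g))
  ll <- psi * t + g * q * (m + 1) / 2 -
        q * (log1p(exp(psi + g)) + m * log1p(exp(g)))               # (1)
  ll + 0.5 * q * log(V1 + V2) + 0.5 * log(q * V1 * V2 / (V1 + V2))  # (19)
}
# psi_hat_star by joint numerical maximisation:
optim(c(0, 0), l_star, t = 20, q = 30, m = 3,
      control = list(fnscale = -1))$par[1]
```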

2.5 Binary matched pairs model

The binary matched pairs model is a special case of the binomial matched pairs model when \(m=1\). This implies that in the setting of Lunardon (2018), \(s_{i}=1\), and so \(a=d=0\), where a, b, c and d denote the numbers of pairs of the form (0, 0), (0, 1), (1, 0) and (1, 1), respectively, with \(a+b+c+d=q\). Note that \(\sum _{i=1}^{q}y_{i1}=c+d\), \(\sum _{i=1}^{q}y_{i2}=b+d\) and \(\sum _{i=1}^{q}(y_{i1}+y_{i2})=b+c+2d\). We call pairs of the form (0, 0) and (1, 1) concordant, while pairs of the form (0, 1) and (1, 0) are called discordant. In this case, \({\hat{\gamma }}_{\psi }=-\psi /2\) and so the profile log-likelihood for \(\psi \) is (see Davison 2003, Example 12.23)

$$\begin{aligned} l_{p}(\psi )=\psi t-2q\log \{1+\exp (\psi /2)\}, \end{aligned}$$
(20)

which is maximised at

$$\begin{aligned} {\hat{\psi }}= & 2\log \bigg (\frac{t}{q-t}\bigg )\nonumber \\= & 2\log \bigg (\frac{c}{b}\bigg ). \end{aligned}$$
(21)

Alternatively, \({\hat{\psi }}\) can be obtained by substituting \(m=1\) in (4). Davison (2003) showed that \({\hat{\psi }}\) converges in probability to \(2\psi \) as \(q\rightarrow \infty \), thus it is inconsistent.

Only the discordant pairs enter the conditional log-likelihood, which is given by

$$\begin{aligned} l_{c}(\psi )=c\psi -(b+c)\log \{\exp (\psi )+1\}, \end{aligned}$$
(22)

which is maximised at

$$\begin{aligned} {\hat{\psi }}_{c}=\log \bigg (\frac{c}{b}\bigg ), \end{aligned}$$
(23)

which converges in probability to \(\psi \) as \(q\rightarrow \infty \), as noted by Davison (2003, Example 12.23), so it is consistent.

Substituting \({\hat{\gamma }}_{\psi }=-\psi /2\) in (12) gives \( j_{\lambda \lambda }(\psi ,{\hat{\lambda }}_{\psi })=2\exp (\psi /2)/\big (1+\exp (\psi /2)\big )^2\), and in this case Davison (2003, Example 12.23) showed that

$$\begin{aligned} l_{mp}(\psi )=\frac{1}{4}\psi (q+4t)-3q\log \{1+\exp (\psi /2)\}, \end{aligned}$$
(24)

which is maximised at

$$\begin{aligned} {\hat{\psi }}_{mp}= & 2\log \bigg (\frac{q+4t}{5q-4t}\bigg )\nonumber \\= & 2\log \bigg (\frac{b+5c}{c+5b}\bigg ), \end{aligned}$$
(25)

which converges in probability to \(2\log \big [\{1+5\exp (\psi )\}/\{5+\exp (\psi )\}\big ]\) as \(q\rightarrow \infty \). Note that when \(c=0\) or \(b=0\), \({\hat{\psi }}_{mp}\) is \(2\log (1/5)\) or \(2\log 5\), respectively, i.e. \({\hat{\psi }}_{mp}\) is finite. Although \({\hat{\psi }}_{mp}\) is inconsistent, Davison (2003) showed that it is less biased than \({\hat{\psi }}\).
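For reference, the closed-form estimators of this subsection can be computed directly from the discordant counts b and c, as in the following sketch (function names and the illustrative counts are ours).

```r
# Closed-form estimators in the binary matched pairs model (m = 1),
# as functions of the discordant pair counts b and c:
psi_profile <- function(b, c) 2 * log(c / b)                      # (21)
psi_cond    <- function(b, c) log(c / b)                          # (23)
psi_modprof <- function(b, c) 2 * log((b + 5 * c) / (c + 5 * b))  # (25)
# With b = 5 and c = 10 the profile estimate is twice the conditional
# one, and the modified profile estimate lies between them:
c(psi_profile(5, 10), psi_cond(5, 10), psi_modprof(5, 10))
```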

3 Adjusted likelihood method

3.1 Penalised likelihood based on adjusted responses

In order to avoid infinite estimates of \(\psi \), as is the case with \({\hat{\psi }}\), \({\hat{\psi }}_{c}\) and \({\hat{\psi }}_{mp}\) (for \(m\ne 1\)), when all of the \(y_{i1}\) observations are zero or one, we propose to adjust the log-likelihood function by adding a small number \(\delta >0\) to each success, \(y_{i1}\) and \(y_{i2}\), and to each failure, \(1-y_{i1}\) and \(m_{i}-y_{i2}\). The penalised log-likelihood function based on adjusted responses for \(\theta =(\psi ,\lambda _{1},\ldots ,\lambda _{q})^{\intercal }\) becomes

$$\begin{aligned} l_{a}(\theta )= & \sum _{i=1}^{q}\psi (y_{i1}+\delta )+\sum _{i=1}^{q}\lambda _{i}(y_{i1}+y_{i2}+2\delta )\nonumber \\ & -\sum _{i=1}^{q}\big [(1+2\delta )\log \{1+\exp (\psi +\lambda _{i})\}+(m_{i}+2\delta )\log \{1+\exp (\lambda _{i})\}\big ].\nonumber \\ \end{aligned}$$
(26)

When \(m_{i}=m\) and \(s_{i}=(m+1)/2\), the score equations for the above log-likelihood function with respect to \(\gamma \) and \(\psi \) simplify respectively to

$$\begin{aligned} & \frac{m+1+4\delta }{2}-(1+2\delta )\frac{\exp (\psi +\gamma )}{1+\exp (\psi +\gamma )}-(m+2\delta )\frac{\exp (\gamma )}{1+\exp (\gamma )}=0, \end{aligned}$$
(27)
$$\begin{aligned} & t+q\delta -q(1+2\delta )\frac{\exp (\psi +\gamma )}{1+\exp (\psi +\gamma )}=0, \end{aligned}$$
(28)

where we write \({\hat{\lambda }}_{i,\psi ,a}={\hat{\gamma }}_{\psi ,a}\) for the constrained penalised maximum likelihood estimator of \(\lambda _{i}\), based on adjusted responses, for a fixed value of \(\psi \), since it is identical for all \(i=1,\ldots ,q\). The simultaneous solution of (27) and (28) gives the penalised maximum likelihood estimators of \(\gamma \) and \(\psi \), based on adjusted responses,

$$\begin{aligned} {\hat{\gamma }}_{a}= & \log \bigg [\frac{q(m+1+2\delta )-2t}{q(m-1+2\delta )+2t}\bigg ], \end{aligned}$$
(29)
$$\begin{aligned} {\hat{\psi }}_{a}= & \log \bigg [\frac{(t+q\delta )\{q(m-1+2\delta )+2t\}}{\{q(m+1+2\delta )-2t\}\{q(1+\delta )-t\}}\bigg ]. \end{aligned}$$
(30)

Note that when \(\delta =0\), \({\hat{\psi }}_{a}={\hat{\psi }}\). Note also that when \(t=0\) or \(t=q\), \({\hat{\psi }}_{a}\) is finite, while when \(t=q/2\), \({\hat{\psi }}_{a}=0\). When \(m=1\),

$$\begin{aligned} {\hat{\psi }}_{a}=2\log \Bigg [\frac{t+q\delta }{q(1+\delta )-t}\Bigg ]. \end{aligned}$$
(31)

Since \({\hat{\psi }}_{a}\) depends on the data only through the sufficient statistic t, its bias and variance are computed using (5) and (6), respectively.
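A minimal sketch of this computation is given below (function names ours); the weights are those of the \(\textrm{Bi}(q,\pi )\) conditional distribution of T from Sect. 2.2, truncated to \(t\in \{1,\ldots ,q-1\}\) as in (5).

```r
# Adjusted-responses estimator (30) and its exact conditional bias and
# variance, using the Bi(q, prob) distribution of T from Sect. 2.2.
psi_hat_a <- function(t, q, m, delta) {
  log(((t + q * delta) * (q * (m - 1 + 2 * delta) + 2 * t)) /
      ((q * (m + 1 + 2 * delta) - 2 * t) * (q * (1 + delta) - t)))
}
bias_var_psi_a <- function(psi, q, m, delta) {
  prob <- plogis(psi)
  t <- 1:(q - 1)
  w <- dbinom(t, q, prob) / (1 - dbinom(0, q, prob) - dbinom(q, q, prob))
  est <- psi_hat_a(t, q, m, delta)
  m1 <- sum(est * w)
  c(bias = m1 - psi, variance = sum(est^2 * w) - m1^2)
}
bias_var_psi_a(psi = 1, q = 30, m = 1, delta = 0.45)  # delta from Sect. 5
```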

3.2 Probability limit of the penalised likelihood estimator based on adjusted responses

In this section we obtain the probability limit of the penalised log-likelihood estimator based on adjusted responses and derive the relationship that \(\delta \) should satisfy in order to make this estimator consistent. We also show how the modified profile log-likelihood estimator (when \(m=1\)) and the conditional log-likelihood estimator can be recovered for particular values of \(\delta \).

When \(m=1\), as \(q\rightarrow \infty \), \({\hat{\psi }}_{a}\) converges in probability to

$$\begin{aligned} 2\log \Bigg (\frac{\delta \{\exp (\psi )+1\}+\exp (\psi )}{\delta \{\exp (\psi )+1\}+1}\Bigg ), \end{aligned}$$
(32)

while for a general m, we find that as \(q\rightarrow \infty \), \({\hat{\psi }}_{a}\) converges in probability to

$$\begin{aligned} \log \Bigg (\frac{\big [\delta \{\exp (\psi )+1\}+\exp (\psi )\big ]\big [\{m-1+2\delta \}\{\exp (\psi )+1\}+2\exp (\psi )\big ]}{\big [\delta \{\exp (\psi )+1\}+1\big ]\big [\{m+1+2\delta \}\{\exp (\psi )+1\}-2\exp (\psi )\big ]}\Bigg ). \end{aligned}$$
(33)

By analogy with Table 12.3 of Davison (2003), Table 1 compares the limiting values of \({\hat{\psi }}\), \({\hat{\psi }}_{c}\), \({\hat{\psi }}_{mp}\) and \({\hat{\psi }}_{a}\) when \(m=1\) for a set of values of \(\psi \) ranging from 0 to 5 and a set of values of \(\delta \) ranging from 0.05 to 0.50. We note that for any given \(\psi \), there exists a value of \(\delta \) for which the limit of \({\hat{\psi }}_{a}\) is closer to the truth than that of \({\hat{\psi }}_{mp}\); in other words, an appropriately chosen \(\delta \) yields a smaller asymptotic bias than that of \({\hat{\psi }}_{mp}\). These values of \(\delta \) decrease as the true value of \(\psi \) increases. Observe also that when \(\delta =0.25\), the limiting value of \({\hat{\psi }}_{a}\) coincides with that of \({\hat{\psi }}_{mp}\). In fact, substituting \(\delta =0.25\) in (32), we find that \({\hat{\psi }}_{a}\) converges in probability to the same limit as \({\hat{\psi }}_{mp}\) given in Sect. 2.5; moreover, setting \(\delta =0.25\) in (31) reproduces (25) exactly. This means that the penalised log-likelihood estimator based on adjusted responses recovers the modified profile log-likelihood estimator when \(m=1\) and \(\delta =0.25\), i.e. \({\hat{\psi }}_{a}={\hat{\psi }}_{mp}\).

When \(m=1\), in order to make \({\hat{\psi }}_{a}\) consistent we need to equate the ratio inside the logarithm of (32) with \(\sqrt{\exp (\psi )}\), which simplifies to the equation

$$\begin{aligned} \big [\exp (\psi )-1\big ]\big [\delta ^2\{\exp (\psi )\}^2+\{2\delta ^2-1\}\{\exp (\psi )\}+\delta ^2\big ]=0. \end{aligned}$$
(34)

When \(\psi =0\), no adjustment is needed because there is no bias; this corresponds to the factor \(\exp (\psi )-1\) in (34). We therefore consider the positive solution in \(\delta \) of the quadratic factor \(\delta ^2\{\exp (\psi )\}^2+\{2\delta ^2-1\}\exp (\psi )+\delta ^2=0\), which simplifies to

$$\begin{aligned} \delta =\frac{\sqrt{\exp (\psi )}}{1+\exp (\psi )}. \end{aligned}$$
(35)

Substituting (35) into (31) and solving for \(\psi \) gives the same estimate as \({\hat{\psi }}_{c}\). This means that the value of \(\delta \) that achieves consistency of \({\hat{\psi }}_{a}\) is the one that recovers \({\hat{\psi }}_{c}\). This is disadvantageous because, if we attempt to tune \(\delta \) to make \({\hat{\psi }}_{a}\) consistent, we inherit exactly the same problem of infinite estimates as the conditional log-likelihood. The value of \(\delta \) in terms of t and q that recovers \({\hat{\psi }}_{c}\) is obtained by equating (31) with \({\hat{\psi }}_{c}\) and simplifies to

$$\begin{aligned} \delta =\sqrt{\frac{t(3qt-2t^2-q^2)}{q^2(2t-q)}}=\frac{\sqrt{t(q-t)}}{q}, \end{aligned}$$
(36)

where the second equality follows from the factorisation \(3qt-2t^2-q^2=(2t-q)(q-t)\). Observe that when \(t=0\) or \(t=q\), \(\delta =0\) and so \({\hat{\psi }}_{a}={\hat{\psi }}\), while when \(t=q/2\), \({\hat{\psi }}_{a}=0={\hat{\psi }}_{c}\) for every value of \(\delta \), so no particular adjustment is required.
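The recovery of \({\hat{\psi }}_{c}\) can be verified numerically, as in the following sketch (the values of t and q are illustrative).

```r
# Check that delta = sqrt(t * (q - t)) / q makes the adjusted
# estimator (31) reproduce the conditional estimate (10) when m = 1.
q <- 30; t <- 12                       # illustrative values
delta <- sqrt(t * (q - t)) / q
psi_a <- 2 * log((t + q * delta) / (q * (1 + delta) - t))  # (31)
psi_c <- log(t / (q - t))                                  # (10)
all.equal(psi_a, psi_c)                                    # TRUE
```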

For a general m, the relationship that \(\delta \) should satisfy in order to make \({\hat{\psi }}_{a}\) consistent is found by equating the ratio inside the logarithm of (33) with \(\exp (\psi )\), which simplifies to the equation

$$\begin{aligned} & \delta \big \{\exp (\psi )\big \}^3\big \{m-1+2\delta \big \}-\big \{\exp (\psi )\big \}^2\big \{2+\delta (1-m-2\delta )\big \}\nonumber \\ & \quad \quad +\big \{\exp (\psi )\big \}\big \{2+\delta (1-m-2\delta )\big \}-\delta (m-1+2\delta )=0,\nonumber \\ \end{aligned}$$
(37)

Since \(\exp (\psi )-1\) factors out of the left hand side of (37), for \(\psi \ne 0\) the required \(\delta \) is the positive root of the quadratic \(2\delta ^{2}+(m-1)\delta -2\exp (\psi )/\{1+\exp (\psi )\}^{2}=0\), which reduces to (35) when \(m=1\). The value of \(\delta \) in terms of t, q and m that recovers \({\hat{\psi }}_{c}\) satisfies the equation

$$\begin{aligned} 2q^2\delta ^2(q-2t)+q^2\delta \big \{q(m-1)+2t(1-m)\big \}-2t(q^2-3tq+2t^2)=0. \end{aligned}$$
(38)

Fig. 1 shows a plot of \(\delta \), the root of (37), against \(\psi \) for \(m=1\), \(m=3\), \(m=11\) and \(m=39\). The plot shows a scaled logistic-type relationship between \(\delta \) and \(\psi \), with the best choice of \(\delta \) lying in the range \(0<\delta <0.5\). Given the true value of \(\psi \), as m increases, the value of \(\delta \) that makes \({\hat{\psi }}_{a}\) consistent decreases to zero. This is expected because standard asymptotic theory tells us that the maximum likelihood estimator, recovered at \(\delta =0\), is consistent when the stratum sample size diverges.
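Under the factorisation of (37) noted above, the consistency-achieving \(\delta \) is the positive root of a quadratic; the sketch below (function name ours) reproduces the value 0.4434 quoted in Sect. 5 for \(m=1\) and \(\psi =1\).

```r
# delta achieving consistency of psi_hat_a: positive root of
# 2*delta^2 + (m - 1)*delta - 2*exp(psi)/(1 + exp(psi))^2 = 0,
# the quadratic factor of (37).
delta_consistent <- function(psi, m) {
  cc <- 2 * exp(psi) / (1 + exp(psi))^2
  (-(m - 1) + sqrt((m - 1)^2 + 8 * cc)) / 4
}
delta_consistent(psi = 1, m = 1)  # 0.4434, in agreement with (35)
curve(delta_consistent(x, m = 1), from = -5, to = 5, xlab = "psi",
      ylab = "delta")  # reproduces the shape of the m = 1 curve in Fig. 1
```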

Table 1 Probability limits of profile, conditional, modified profile and adjusted log-likelihood based on adjusted responses estimators of the log odds ratio \(\psi \) in the binary matched pairs model when \(m=1\). For each \(\psi \), the value in bold face corresponds to the limiting value closest to the truth if we ignore \({\hat{\psi }}_{c}\)
Fig. 1 Plot of the value of \(\delta \) that makes the penalised log-likelihood based on adjusted responses estimator, \({\hat{\psi }}_{a}\), consistent, against \(\psi \) for \(m=1\), \(m=3\), \(m=11\) and \(m=39\)

4 Indirect inference estimation of the log odds ratio

Suppose that \({\hat{\psi }}\) is some initial estimator of \(\psi \), not necessarily the maximum likelihood estimator. The simplest method of bias reduction of \({\hat{\psi }}\) via indirect inference relies on solving the equation (see Kuk 1995)

$$\begin{aligned} {\tilde{\psi }}={\hat{\psi }}-\textrm{B}_{{\hat{\psi }}}({\tilde{\psi }},\lambda ), \end{aligned}$$
(39)

with respect to \({\tilde{\psi }}\), where \(\textrm{B}_{{\hat{\psi }}}({\tilde{\psi }},\lambda )=\textrm{E}_{{\tilde{\psi }},\lambda }({\hat{\psi }})-{\tilde{\psi }}\) is the bias function of \({\hat{\psi }}\) evaluated at \({\tilde{\psi }}\) and \(\lambda =(\lambda _{1},\ldots ,\lambda _{q})\). We call \({\tilde{\psi }}\) the indirect inference estimator of \(\psi \). Alternatively, (39) can be written as

$$\begin{aligned} {\hat{\psi }}=\textrm{E}_{{\tilde{\psi }},\lambda }({\hat{\psi }}). \end{aligned}$$
(40)

Since we want to reduce the bias of \({\hat{\psi }}_{a}\) when \(m_{i}=m\) and \(s_{i}=(m+1)/2\), our indirect inference estimator \({\tilde{\psi }}\) is the solution of

$$\begin{aligned} {\hat{\psi }}_{a}=\textrm{E}_{{\tilde{\psi }},\gamma }({\hat{\psi }}_{a}). \end{aligned}$$
(41)

As the expectation of \({\hat{\psi }}_{a}\) can be obtained by complete enumeration, one version of \({\tilde{\psi }}\), denoted \({\tilde{\psi }}_{a*}\) and independent of \(\gamma \), can be defined using the conditional density \(\textrm{pr}(T=u\vert S_{i};\psi )\), which is binomial with denominator q and success probability \(\exp (\psi )/\{\exp (\psi )+1\}\), as the solution of

$$\begin{aligned} {\hat{\psi }}_{a}(T)=\sum _{u=0}^{q}{\hat{\psi }}_{a}(u) \textrm{pr}(T=u\vert S_{i};{\tilde{\psi }}_{a*}). \end{aligned}$$
(42)

No closed form solution exists for \({\tilde{\psi }}_{a*}\), so we solve the above equation numerically and calculate the expectation and variance of \({\tilde{\psi }}_{a*}\) using (5) and (6). For \(t\in \{0,q\}\), (42) has no solution: the expectation on the right hand side is bounded strictly below by \({\hat{\psi }}_{a}(0)\) and above by \({\hat{\psi }}_{a}(q)\). This means that when the binary observations are all zero or all one, the indirect inference estimator is not defined. This is unfortunate because, even though the penalised likelihood estimator based on adjusted responses overcomes the problem of infinite estimates at \(t=0\) and \(t=q\), the same problem reappears at those boundary values of t when we attempt to reduce the bias of the latter.
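A sketch of this computation (function names ours; \({\hat{\psi }}_{a}\) repeated from Sect. 3.1 for completeness) solves (42) with uniroot, guarding against the boundary cases \(t\in \{0,q\}\) where no solution exists.

```r
# Indirect inference estimator solving (42), with psi_hat_a as in (30).
psi_hat_a <- function(t, q, m, delta) {
  log(((t + q * delta) * (q * (m - 1 + 2 * delta) + 2 * t)) /
      ((q * (m + 1 + 2 * delta) - 2 * t) * (q * (1 + delta) - t)))
}
psi_tilde_a <- function(t, q, m, delta) {
  stopifnot(t > 0, t < q)  # no solution at the boundary values of t
  u <- 0:q
  target <- function(psi) {
    sum(psi_hat_a(u, q, m, delta) * dbinom(u, q, plogis(psi))) -
      psi_hat_a(t, q, m, delta)
  }
  uniroot(target, interval = c(-15, 15))$root
}
psi_tilde_a(t = 20, q = 30, m = 3, delta = 0.25)
```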

5 Complete enumeration study

We reproduce the complete enumeration study in (Lunardon 2018, Table 1), which compares the finite sample bias and variance of estimators derived from the profile, conditional, modified profile and penalised (Firth 1993) likelihoods, denoted by \({\hat{\psi }}\), \({\hat{\psi }}_{c}\), \({\hat{\psi }}_{mp}\) and \({\hat{\psi }}_{*}\), respectively. We enrich this study by adding the penalised maximum likelihood estimator based on adjusted responses, \({\hat{\psi }}_{a}\), and the indirect inference estimator based on \({\hat{\psi }}_{a}\) using the conditional model, denoted by \({\tilde{\psi }}_{a*}\), for a set of 20 values of \(\delta \) ranging from 0.05 to 1.00. The comparison in (Lunardon 2018, Table 1) also assesses the coverage probability and length of 95% confidence intervals for \(\psi \) based on the chi-squared approximation to the distribution of \(W_{*}(\psi )\) and to the distributions of the profile, conditional and modified profile log-likelihood ratios, denoted by \(W(\psi )\), \(W_{c}(\psi )\) and \(W_{mp}(\psi )\), respectively. We extend this comparison by adding the coverage probability and length of 95% confidence intervals for \(\psi \) based on the chi-squared approximation to the distribution of the penalised log-likelihood ratio based on adjusted responses, denoted by \(W_{a}(\psi )\). The exact bias and variance of the estimators and the exact coverage probability and average length of the confidence intervals are obtained through complete enumeration: the estimators and confidence intervals all depend on the sufficient statistic \(T=\sum _{i=1}^{q}Y_{i1}\), so the distribution of T given \(S_{1}=s_{1},\ldots ,S_{q}=s_{q}\) can be computed numerically following Butler and Stephens (2017). These summaries, however, are computed only for \(t\in \{1,\ldots ,q-1\}\), since when \(t\in \{0,q\}\), \({\hat{\psi }}\), \({\hat{\psi }}_{c}\) and \({\hat{\psi }}_{mp}\) (for \(m\ne 1\)) are infinite.

Tables 5, 6, 7, 8, 9 and 10 report the bias and variance of estimators, while Tables 11, 12, 13, 14, 15 and 16 report the coverage probability and average length of confidence intervals with nominal level 95% when the true log odds ratio \(\psi \) is unity, with \(m\in \{1,3,11,39\}\) and \(q\in \{30,100,1000\}\).

Overall, for fixed \(\delta \) the numerical values of the bias and variance of the estimator \({\hat{\psi }}_{a}\) decrease as m increases. In many cases, however, this means that the bias becomes more negative, i.e. the magnitude of the bias increases with m. This suggests that for any combination of q and m, there exists a particular value of \(\delta \) above which the bias of \({\hat{\psi }}_{a}\) does not improve. In fact, for any combination of m and q, there exists a value of \(\delta \) at which the bias of \({\hat{\psi }}_{a}\) is minimised and is smaller than the bias of the estimators \({\hat{\psi }}\), \({\hat{\psi }}_{c}\), \({\hat{\psi }}_{mp}\) and \({\hat{\psi }}_{*}\); for example, for \(q=30\), \(m=1\) this optimal \(\delta \) in terms of bias is 0.45, while for the combinations \(q=30, m=39\) and \(q\ge 100, m\ge 11\), the optimal value of \(\delta \) becomes smaller than 0.05. For \(q=30\), at the optimum \(\delta \) value, the estimator \({\hat{\psi }}_{a}\) has smaller bias and variance than \({\hat{\psi }}_{c}\). This is also true for other values of q, although it is not immediately apparent from Tables 5, 6, 7, 8, 9 and 10 because we consider a specific grid of values for \(\delta \); for example, for \(q=100\) and \(m=11\), at \(\delta =0.045\), \({\hat{\psi }}_{a}\) has bias and variance − 0.03 and 0.47, respectively (both multiplied by 10), while for \(q=1000\) and \(m=1\), at \(\delta =0.444\), \({\hat{\psi }}_{a}\) has bias and variance 0.00 and 0.04, respectively (both multiplied by 10). This optimal value of \(\delta \) decreases as m increases, but for fixed m, as q increases above 30, the optimum \(\delta \) value remains constant. This is expected because the theoretical derivations in equations (35) and (37) tell us that the optimum value of \(\delta \) in terms of consistency is independent of q. For example, the optimum value of \(\delta \) when \(m=1\) and \(\psi =1\) is given by (35) and is equal to \(\delta =0.4434\). Similarly, (37) gives the optimum value of \(\delta \) in terms of consistency for general m and \(\psi \); for example, when \(m=3\) and \(\psi =1\), the optimum \(\delta \) is 0.168.

So in terms of the incidental parameter problem, for a fixed value of m and increasing q, there is an optimum value of \(\delta \) that makes \({\hat{\psi }}_{a}\) less biased, with smaller variance, than all the other estimators. This behaviour can be seen more clearly in Table 17, where the optimum value of \(\delta \) that minimises the bias of \({\hat{\psi }}_{a}\) was chosen from a finer grid of \(\delta \) values. For the values of m and q chosen in Table 17, the effect of fixing m and allowing q to increase is a larger optimal \(\delta \), while fixing q and increasing m decreases this optimal \(\delta \) value. However, the pattern in Table 17 suggests that as both q and m are allowed to diverge, the optimal \(\delta \) value becomes close to zero; this makes intuitive sense because \(\delta =0\) gives rise to the maximum likelihood estimator \({\hat{\psi }}\), which is asymptotically unbiased when both m and q tend to \(\infty \).

Nevertheless, the bias results for the estimator \({\tilde{\psi }}_{a*}\) show a marked improvement over \({\hat{\psi }}\), \({\hat{\psi }}_{mp}\) and \({\hat{\psi }}_{c}\) for all values of q and m. When \({\tilde{\psi }}_{a*}\) is compared with \({\hat{\psi }}_{*}\), the former has smaller bias and variance for \(m=1\) and \(m=3\) for all values of q. When compared with \({\hat{\psi }}_{a}\), the bias of \({\tilde{\psi }}_{a*}\) is reduced for all values of \(\delta \) except the optimal one. The interesting result to note here is that as \(\delta \) increases, the bias and variance of \({\tilde{\psi }}_{a*}\) approach those of \({\hat{\psi }}_{c}\) for all combinations of q and m considered. To conclude this comparison of estimators, if we were to choose a \(\delta \) based on \({\tilde{\psi }}_{a*}\), it would be one close to (but not equal to) zero, because this gives the smallest bias and variance. However, \({\tilde{\psi }}_{a*}\) does not always perform better than \({\hat{\psi }}_{a}\). In fact, if we were to choose a \(\delta \) based on \({\hat{\psi }}_{a}\), it would be the optimal \(\delta \) that minimises the bias of \({\hat{\psi }}_{a}\), because the bias and variance of \({\hat{\psi }}_{a}\) at this optimal \(\delta \) are smaller than those of any other estimator in the tables for any combination of q and m. Having said that, in practice we are given a single data set and do not know the optimal \(\delta \) for it, so \({\tilde{\psi }}_{a*}\) is a good choice because its bias and variance are very competitive regardless of the value of \(\delta \).

Concerning the coverage probability and average length of 95% confidence intervals, Lunardon (2018) noted that intervals derived from \(W_{mp}(\psi )\) and \(W_{*}(\psi )\) are consistent with those from \(W_{c}(\psi )\) for \(m\ge 3\). The coverage probability and average length of confidence intervals derived from \(W_{a}(\psi )\) show an improvement over those derived from \(W_{c}(\psi )\) for particular values of \(\delta \) (shown in bold face in Tables 11, 12, 13, 14, 15, 16). We considered 20 values of \(\delta \) ranging from 0.31 to 0.50 for \(m=1\), 0.06 to 0.25 for \(m=3\), and 0.01 to 0.20 for \(m=11\) and \(m=39\). For those particular values of \(\delta \) in bold face, the coverage probability derived from \(W_{a}(\psi )\) is closer to the nominal level of 95% than that derived from \(W_{c}(\psi )\). This agrees with the fact that \({\hat{\psi }}_{a}\) performs better than \({\hat{\psi }}_{c}\) in terms of bias and variance for some optimal value of \(\delta \). As the value of \(\delta \) increases, the average length of confidence intervals derived from \(W_{a}(\psi )\) decreases, which is expected because the variance (and hence standard error) of \({\hat{\psi }}_{a}\) becomes smaller as \(\delta \) becomes larger.

In conclusion, it has been shown how, in the binomial matched pairs model, finite estimates of the log odds ratio are produced in the cases where the observations are either all equal to zero or all equal to one, by penalising the log-likelihood function through the additive adjustment of a tuning parameter \(\delta >0\) to each success and failure. This \(\delta \) was then used as a parameter that could be tuned to improve the bias and/or variance of the penalised log-likelihood estimator based on adjusted responses. The indirect inference method was applied to reduce the bias of \({\hat{\psi }}_{a}\) further; the method can in principle be applied to any parametric model, and we are free to use estimators other than \({\hat{\psi }}_{a}\) as initial estimates. Indeed, the setting considered here, where \(m_{i}\) and \(s_{i}\) are fixed at m and \((m+1)/2\) respectively, is a special case, and a large scale simulation study would be useful to account for different stratum sizes and totals.

6 Analysis of crying babies data

In this section we illustrate the methods discussed above in a general setting by providing a real-data example with different stratum sizes. We re-analyse the crying babies data set given in (Cox and Snell 1989, Example 1.2). The data come from an experiment intended to assess the effectiveness of rocking motion on the crying of babies and were collected according to a matched case–control design with one case and \(m_{i}\) controls per stratum, where \(i=1,\ldots ,18\) and \(m_{i}\) takes various values from 5 to 9. On each of 18 days, babies not crying at a specified time in a hospital served as subjects. On each day one baby chosen at random formed the experimental group and the remainder were controls. The binary response was whether the baby was crying or not at the end of a specified period. In (Cox and Snell 1989, Example 1.2), not crying is taken as a “success” and the observed numbers \(y_{i2}\) and \(y_{i1}\) are therefore the numbers of babies in the two groups not crying. The number of non-crying babies in the experimental group is \(t=15\).

The estimates of the log odds ratio \(\psi \) and its standard error are reported in Table 2. Davison (1988) obtained the maximum likelihood and conditional maximum likelihood estimates, while Lunardon (2018) obtained the estimates of \(\psi \) derived from the modified profile and penalised (Firth) log-likelihoods. We computed the penalised log-likelihood based on adjusted responses and indirect inference estimates for 20 values of \(\delta \) ranging from 0.05 to 1.00. As the true log odds ratio \(\psi \) is unknown, it is difficult to decide which estimator should be preferred; however, there is an important observation to note. As \(\delta \) approaches 1, the indirect inference estimates of \(\psi \) approach the conditional log-likelihood estimates and the standard errors of \({\tilde{\psi }}_{a*}\) approach that of \({\hat{\psi }}_{c}\). This observation was made before, in terms of bias and variance, in the complete enumeration study in the specific setting where the ith stratum sample size was fixed. Even though this observation has not been proved analytically, the results based on the crying babies data show that, at least numerically, it may also hold for a general binomial matched pairs model with different stratum sizes.

The standard errors of the estimates in Table 2, except for \({\hat{\psi }}_{c}\), are obtained using the Fisher information matrix evaluated at the given estimate of \(\psi \). In particular, for the maximum likelihood estimator the standard error is obtained using \(\sqrt{\textrm{diag}\{i^{-1}({\hat{\psi }},{\hat{\lambda }}_{1},\ldots ,{\hat{\lambda }}_{18})\}}\), where i is the Fisher information matrix. For the conditional maximum likelihood estimator the standard error is obtained using \(\sqrt{(-\partial ^{2}l_{c}(\psi )/\partial \psi ^{2})^{-1}}\), evaluated at \({\hat{\psi }}_{c}\). For the modified profile likelihood estimator, \({\hat{\psi }}_{mp}\), the standard error is obtained using \(\sqrt{(-\partial ^{2}l_{mp}(\psi )/\partial \psi ^{2})^{-1}}\), evaluated at \({\hat{\psi }}_{mp}\), i.e. using the Hessian. For the penalised (Firth) and penalised based on adjusted responses likelihood estimators, the standard errors are obtained using \(\sqrt{\textrm{diag}\{i^{-1}({\hat{\psi }}_{*},{\hat{\lambda }}_{1,*},\ldots ,{\hat{\lambda }}_{18,*})\}}\) and \(\sqrt{\textrm{diag}\{i^{-1}({\hat{\psi }}_{a},{\hat{\lambda }}_{1,a},\ldots ,{\hat{\lambda }}_{18,a})\}}\), respectively. Finally, for the indirect inference estimator the standard error is obtained using \(\sqrt{(-\partial ^{2}l_{c}(\psi )/\partial \psi ^{2})^{-1}}\), evaluated at \({\tilde{\psi }}_{a*}\). However, according to Kuk (1995), to obtain the estimated standard error of the indirect inference estimator a further correction of the Fisher information is needed, using sandwich estimators of the variance based on the Godambe information, because the second Bartlett identity no longer holds. Similarly, the Godambe information matrix would be preferable for the estimated standard error of the modified profile likelihood estimator. The estimated standard error reported in Lunardon (2018) for \({\hat{\psi }}_{*}\) is obtained using the Hessian matrix and is slightly different to our result.

To assess the reliability of the estimators we compute their actual bias and variance, conditioning on the observed totals as in (Lunardon 2018, Table 2), for a set of values of \(\psi \) ranging from \(-3\) to 3. The conditional distribution of the sufficient statistic \(T=\sum _{i=1}^{18}Y_{i1}\), which represents the total number of babies not crying in the experimental group, is the distribution of a sum of independent Bernoulli random variables with different probabilities. To calculate this distribution we use the R function dkbinom, which gives the mass function of the sum of k independent binomial random variables with possibly different probabilities. This function implements the convolution algorithm for k binomials described in Butler and Stephens (2017).
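The required Poisson binomial distribution can also be computed with a short convolution, as in the sketch below (our own implementation, shown for transparency alongside dkbinom; the probabilities in the example are illustrative, not the crying babies values).

```r
# Poisson binomial mass function of T = sum of independent Bernoulli
# variables with stratum probabilities p, by direct convolution
# (our own implementation; the dkbinom function mentioned in the text
# provides equivalent functionality for sums of binomials).
poisbinom <- function(p) {
  f <- 1  # point mass at 0 for the empty sum
  for (pr1 in p) f <- c(f * (1 - pr1), 0) + c(0, f * pr1)
  f       # f[t + 1] = pr(T = t) for t = 0, ..., length(p)
}
p <- c(0.7, 0.55, 0.6, 0.8)       # illustrative stratum probabilities
sum(poisbinom(p) * 0:length(p))   # equals sum(p), the mean of T
```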

The results reported in Table 2 of Lunardon (2018) are incorrect because Lunardon (2018) computes the conditional bias and variance of \({\hat{\psi }}_{c}\), \({\hat{\psi }}\), \({\hat{\psi }}_{mp}\) and \({\hat{\psi }}_{*}\) for \(t\in \{1,\ldots ,17\}\) without rescaling and normalising the conditional distribution of T. The corrected conditional summaries are reported in Table 3, with the addition of the penalised log-likelihood estimator based on adjusted responses, \({\hat{\psi }}_{a}\), and the indirect inference estimator, \({\tilde{\psi }}_{a*}\), for various values of \(\delta \). We observe that for \(\psi \in \{-2,-1,0,1,2\}\) there exists a value of \(\delta \) in the range \(0.01<\delta <0.20\) such that \({\hat{\psi }}_{a}\) is less biased than any of \({\hat{\psi }}\), \({\hat{\psi }}_{c}\), \({\hat{\psi }}_{mp}\) and \({\hat{\psi }}_{*}\). In fact, the effect of increasing the absolute value of \(\psi \) is to decrease the optimal \(\delta \) value in terms of the bias of \({\hat{\psi }}_{a}\). For \(\psi =3\), the maximum likelihood estimator seems to be the least biased, but this is due to the effect of infinite estimates at \(t=0\) and \(t=18\): removing these infinite estimates significantly lowers the average of the estimates for \(t\in \{1,\ldots ,17\}\). This is also true for the conditional log-likelihood estimator, \({\hat{\psi }}_{c}\).

Overall, for \(\psi \in \{-2,-1,0,1\}\) the indirect inference estimator improves on the penalised log-likelihood estimator based on adjusted responses in terms of bias for all values of \(\delta \) considered. We notice that there exists a \(\delta \) value in the range \(0.01<\delta <0.20\) such that \({\tilde{\psi }}_{a*}\) is less biased than \({\hat{\psi }}_{a}\). In fact, the indirect inference estimator for \(\psi \in \{-2,-1,0,1\}\) is competitive for all \(\delta \) values considered, which makes it the best choice amongst the estimators. However, for \(\psi \in \{-3,2,3\}\) the best choice of \(\delta \) is the largest possible, in this case \(\delta =1\). It is worth noting that, as in the complete enumeration study in Tables 5, 6, 7, 8, 9 and 10, the bias and variance of the indirect inference estimator approach those of the conditional log-likelihood estimator as \(\delta \) increases to 1.

Table 4 reports the unconditional bias and variance of the modified profile, penalised (Firth) and penalised based on adjusted responses log-likelihood estimators for \(t\in \{0,\ldots ,18\}\). Even though \({\hat{\psi }}_{*}\) has smaller bias and variance than \({\hat{\psi }}_{mp}\) for all values of \(\psi \), we again observe, as in Table 3, that there exists a value of \(\delta \) such that \({\hat{\psi }}_{a}\) is less biased than \({\hat{\psi }}_{*}\) for all values of \(\psi \). We may conclude that \({\hat{\psi }}_{a}\) is preferable to \({\hat{\psi }}_{*}\), as its variance is also reduced at the optimal \(\delta \) value for all values of \(\psi \) considered (except \(\psi =0\)). The relationship between the optimal \(\delta \) value and the true value of \(\psi \) coincides with that in the conditional case (Table 3), i.e. the optimal \(\delta \), in terms of bias, decreases as the absolute value of \(\psi \) increases. Note that the indirect inference estimator has no solution at \(t=0\) and \(t=18\), and so it is excluded from the unconditional summaries in Table 4.

Table 2 Crying babies real data example: estimates of \(\psi \) and its standard error (in parentheses) derived from profile, conditional, modified profile, penalised (Firth), penalised based on adjusted responses log-likelihoods and indirect inference, respectively
Table 3 Crying babies real data example: conditional bias and variance (in parentheses) of estimators as the true log odds ratio \(\psi \) varies; estimators derived from profile, conditional, modified profile, penalised (Firth), penalised based on adjusted responses log-likelihoods and indirect inference are denoted by \({\hat{\psi }}\), \({\hat{\psi }}_{c}\), \({\hat{\psi }}_{mp}\), \({\hat{\psi }}_{*}\), \({\hat{\psi }}_{a}\), \({\tilde{\psi }}_{a*}\), respectively, and all entries are multiplied by 10. For \({\hat{\psi }}_{a}\) and \({\tilde{\psi }}_{a*}\), the smallest bias value in each column is given in bold face
Table 4 Crying babies real data example: unconditional bias and variance (in parentheses) of estimators as the true log odds ratio \(\psi \) varies; estimators derived from modified profile, penalised (Firth) and penalised based on adjusted responses log-likelihoods are denoted by \({\hat{\psi }}_{mp}\), \({\hat{\psi }}_{*}\), \({\hat{\psi }}_{a}\), respectively, and all entries are multiplied by 10. For \({\hat{\psi }}_{a}\), the smallest bias value in each column is given in bold face

7 Conclusions

For the binomial matched pairs model, we evaluated the performance of a new penalised log-likelihood estimator of the log odds ratio, based on an additive adjustment \(\delta >0\) to the responses, which avoids the infinite estimates that afflict the maximum likelihood and conditional likelihood estimators. We calculated the probability limit of this estimator and showed that the maximum likelihood, conditional and modified profile log-likelihood estimators (the latter when \(m=1\)) can be retrieved from it for certain values of \(\delta \). It was found that indirect inference estimation based on the new estimator is competitive for a wide range of values of \(\delta \).

It is worth investigating numerically whether, for a general value of m, there exists a \(\delta \) that recovers the modified profile log-likelihood estimator of \(\psi \) from the penalised log-likelihood estimator based on adjusted responses, because this would imply that \({\hat{\psi }}_{mp}\) could be retrieved from \({\hat{\psi }}_{a}\) for any value of m, not just in the special case of the binary matched pairs model where \(m=1\). One future direction is to investigate the performance of the estimators of the log odds ratio outside the setting of Lunardon (2018), for general \(m_{i}\) and \(s_{i}\). This is possible since in this general setting the distribution of the sufficient statistic \(T=\sum _{i=1}^{q}Y_{i1}\) is Poisson binomial and can be obtained using the convolution method of Butler and Stephens (2017). Obtaining the probability limit of the indirect inference estimator in the setting of Lunardon (2018) is also desirable. The complete enumeration study in Sect. 5 may be expanded by considering other values of \(\psi \), e.g. \(\psi \in \{-3,-2,-1,0,1,2,3\}\). Finally, yet another possible future direction would be to investigate the performance of an alternative adjustment to the log-likelihood function where a small number \(\delta >0\) is added to each success but subtracted from each failure.