Introduction

Analyzing a study with a crossed or factorial arrangement of treatments that includes a zero level is an underestimated challenge because a zero amount of each level of a qualitative (categorical) factor is often essentially the same treatment. Consider a study in which p different pesticides are applied at r different rates, with one of those rates being zero. Since it does not matter which pesticide is applied at zero rate, there are p(r − 1) + 1 distinct treatment combinations rather than p × r. Thus, a standard two-way linear model cannot be applied as is: the sums of squares (SS) and degrees of freedom (df) of the analysis require adjustment to account for the incomplete factorial arrangement of treatments. This situation is common in the literature examining applications (e.g., pesticides, adhesive dental cements), enrichment (e.g., fertilization, isotope enrichment), inoculations (e.g., growth hormones, vaccines), pollutants, length of storage, etc.
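The treatment count can be checked by enumeration. The short Python sketch below is illustrative only: the pesticide labels, the rates and the function name `distinct_treatments` are hypothetical and are not taken from any of the studies cited here.

```python
from itertools import product


def distinct_treatments(pesticides, rates):
    """Return the set of distinct treatments when a zero rate is shared."""
    combos = set()
    for pesticide, rate in product(pesticides, rates):
        # At zero rate the pesticide identity is irrelevant, so every
        # (pesticide, 0) pair collapses into one shared control treatment.
        combos.add(("control", 0) if rate == 0 else (pesticide, rate))
    return combos


pesticides = ["P1", "P2", "P3"]   # hypothetical labels, p = 3
rates = [0, 25, 50, 100]          # hypothetical rates, r = 4, including zero

treatments = distinct_treatments(pesticides, rates)
n_distinct = len(treatments)                  # p(r - 1) + 1
n_crossed = len(pesticides) * len(rates)      # p x r
```

With p = 3 and r = 4, the enumeration gives 10 distinct treatments, whereas a full crossing would imply 12.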

Quenouille (1953) and Addelman (1974) independently proposed a solution to this problem for linear models, hereinafter referred to as the Quenouille-Addelman (QA) solution. In short, the QA solution amalgamates the SS of two models into a single model without inflating the Type I or Type II error rates (see below). However, despite the existence of this solution, it is practically never applied when necessary. To support this claim, a literature search of recent publications with a qualitative factor and a quantitative factor including a zero level is presented in Additional file 1: Review of the frequency of use of the Quenouille-Addelman solution in the literature. This search showed that none of the reviewed studies applied the QA solution. Instead, 11.4% of the studies used an erroneous factorial linear model, 2.5% excluded control treatment data from analysis, 36.7% performed a one-way test on a variable combining the qualitative and quantitative factors, and 49.4% used other inadequate approaches such as leaving out the comparison between qualitative treatments (Additional file 1). All of these approaches are either biased or lose information, as explained below. It follows that the QA solution has been largely forgotten, perhaps owing to a misunderstanding of the impact of zero levels in factorial arrangements. An update on the subject appears overdue. Herein, I (1) demonstrate how noncompliance with the QA solution alters the conclusions of a study, (2) describe how to achieve the solution using current statistical packages, and (3) examine how the solution can be adapted to situations not considered by Quenouille (1953) and Addelman (1974).

The Quenouille-Addelman solution and substitute (flawed) approaches

To date, the adverse effects of not using the QA solution when warranted have not been demonstrated. In a review paper, Gates (1991) discusses the solution using an example where the adjustment imparted is subtle, which does not do justice to the effect this solution has in most publications where it was used (Quenouille 1953; Green et al. 1976, 1977; Conrad et al. 1993; Lu and Nielsen 1993; Cushman et al. 1998; Olivier et al. 2000; Gong et al. 2001; Moreau and Bauce 2001, 2003). Using simulated data (available in Additional file 2: Simulated data used to produce Fig. 1 and Table 1) inspired by the aforementioned studies and literature review, I determined that the trends followed by the different levels of the qualitative variable as the quantitative variable increases can predict the effects of the solution. In the uncommon situation where the relationships between the response and the quantitative variable extend linearly from level zero for all qualitative treatments (e.g., Fig. 1a), the adjustment provided by the solution is at its lowest. The interpretation of the results is only slightly modified, although the output of the linear model fit can change substantially (Table 1). Only one of the published studies using the QA solution (i.e., Gong et al. 2001) reported such data. In all other situations, the results of an unadjusted model differ markedly from those of the QA solution (i.e., Quenouille 1953; Green et al. 1976, 1977; Conrad et al. 1993; Lu and Nielsen 1993; Cushman et al. 1998; Olivier et al. 2000; Moreau and Bauce 2001, 2003). If the relationships with and without the zero level are different (e.g., Fig. 1b), the adjustment conferred by the solution is at its highest (Table 1). In the latter example, the unadjusted model and the QA solution offer contrasting results, one indicating a strong interaction and the other not. Changing the scale (i.e., applying a data transformation) does not help. Thus, in most cases, the QA solution increases the SS associated with the qualitative variable at the expense of the interaction term. Unadjusted models have an inflated Type II error rate when evaluating the main effect of the qualitative variable and an inflated Type I error rate when evaluating interactions.

Fig. 1. Theoretical examples of factorial arrangements of treatments involving one qualitative factor with three levels and one quantitative factor with four levels that includes a zero amount. In (a), the relation radiates linearly from the zero level, while in (b) the relation does not radiate in the same way.

Table 1 Inadequate and Quenouille-Addelman solutions for the data presented in Fig. 1a, b

Other approaches have often been used instead of the QA solution (Additional file 1: Review of the frequency of use of the Quenouille-Addelman solution in the literature). For instance, some authors omitted control treatment data from the analysis. While this can approximate the QA solution in some situations, it can also alter the results because the main effect of the quantitative variable is not evaluated over its entire range. Other authors reused the control treatment in a series of analyses with the other treatments, which inflates the Type I error rate. In many cases, authors combined the qualitative and quantitative variables into a single variable and performed a one-way test followed by multiple comparison (post hoc) tests, a procedure comparing each treatment to a single control (e.g., Dunnett or Williams test), or orthogonal contrasts. For example, doses of 0, 25 and 100 ml of a given pesticide could be classified as control, low dose and high dose, respectively. This, however, does not change the fact that a control dose of two different pesticides is the same treatment. In addition, with this approach the interaction between the factors cannot be examined and trend analysis (see below) is impossible. The same would apply to an ANCOVA or a regression model. A Dunnett or Williams test is also considerably less informative than the QA solution. For example, using the data in Fig. 1a, a Dunnett test only reveals that the zero level is different from all but one of the treatment combinations (i.e., Level A at the value 1 of the quantitative factor). The Dunnett or Williams test also precludes subsequent tests without the zero level because these would inflate the Type I error rate. It is possible that contrasts could be used to derive main effect and interaction test statistics approximating the QA solution but, to my knowledge, no one has yet investigated this avenue.

The Quenouille-Addelman solution for fixed-effects two-way linear models

Quenouille (1953) and Addelman (1974) presented a hand calculation solution for a two-way fixed-effects linear model. Although some steps of the solution can be performed using statistical packages (Gates 1991), the solution is generally hard to perform in one execution (Hocking 2013). Suspecting that the lack of a detailed example may have contributed to the underutilization of the QA solution, I describe below a step-by-step approach to achieving the solution with most packages.

  1. Using the whole dataset, calculate the unadjusted SS and df for all sources of variation using a two-way linear model.

  2. Remove the zero level from the dataset and run the same analysis as in step 1.

  3. Create a table of the SS and df by combining the two linear models. Take the SS and df of the quantitative variable, error and total from the first model. Take the SS and df of the qualitative factor and interaction from the second model.

  4. Increase the number of df associated with the error term to incorporate the df lost by the interaction term, because differences among the levels of the qualitative factor at the zero level can only be chance differences (Quenouille 1953). The sum of the df of the treatments (A + Z + [A × Z]; Table 1) is now equal to the number of distinct treatments minus one. Of course, if there is any reason to suspect a difference among the levels of the qualitative factor at the zero level, their SS can also be calculated separately according to the method presented by Quenouille (1953). The unadjusted two-way linear model and the QA solution both yield the same total SS if the design is balanced [i.e., no missing data; see Addelman (1974) and Gates (1991)].

  5. Calculate the mean squares (MS = SS ÷ df), F-values (F = MS ÷ MSError) and P-values (obtained from spreadsheet functions or probability tables) of the adjusted model. For examples of this last step, the reader is invited to refer to a statistical textbook or to the QA solution presented on the right side of Table 1.
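The five steps above can be sketched in code for a balanced design. The Python below is a minimal illustration rather than a reference implementation: the function names (`two_way_ss`, `qa_table`), the data layout and the numbers in `data` are all hypothetical. It also assumes one reading of step 4, namely that the error SS is obtained by subtraction from the total SS, so that the variation among qualitative levels at the zero amount is pooled into the error term together with its df. P-values are left to an F table or a statistics library.

```python
from statistics import mean


def two_way_ss(cells):
    """Balanced two-way fixed-effects ANOVA; returns {source: (SS, df)}."""
    a_levels = sorted({a for a, _ in cells})
    z_levels = sorted({z for _, z in cells})
    na, nz = len(a_levels), len(z_levels)
    n = len(next(iter(cells.values())))  # replicates per cell, assumed equal
    grand = mean(y for ys in cells.values() for y in ys)
    a_mean = {a: mean(y for z in z_levels for y in cells[(a, z)]) for a in a_levels}
    z_mean = {z: mean(y for a in a_levels for y in cells[(a, z)]) for z in z_levels}
    ss_a = nz * n * sum((m - grand) ** 2 for m in a_mean.values())
    ss_z = na * n * sum((m - grand) ** 2 for m in z_mean.values())
    ss_cells = n * sum((mean(ys) - grand) ** 2 for ys in cells.values())
    ss_total = sum((y - grand) ** 2 for ys in cells.values() for y in ys)
    return {
        "A": (ss_a, na - 1),
        "Z": (ss_z, nz - 1),
        "AxZ": (ss_cells - ss_a - ss_z, (na - 1) * (nz - 1)),
        "Error": (ss_total - ss_cells, na * nz * (n - 1)),
        "Total": (ss_total, na * nz * n - 1),
    }


def qa_table(cells, zero=0):
    """Combine two fits following steps 1 to 5 (balanced designs only)."""
    full = two_way_ss(cells)                                    # step 1
    nonzero = two_way_ss(
        {k: v for k, v in cells.items() if k[1] != zero})       # step 2
    # Step 3: Z and total from the first fit, A and A x Z from the second.
    table = {"A": nonzero["A"], "Z": full["Z"], "AxZ": nonzero["AxZ"]}
    # Step 4: df lost by the interaction are transferred to the error term.
    lost_df = full["AxZ"][1] - nonzero["AxZ"][1]
    # ASSUMPTION: error SS by subtraction, i.e. the among-A variation at the
    # zero level is pooled into error along with its df.
    ss_treatments = sum(ss for ss, _ in table.values())
    table["Error"] = (full["Total"][0] - ss_treatments,
                      full["Error"][1] + lost_df)
    table["Total"] = full["Total"]
    # Step 5: mean squares and F-values against the adjusted error term.
    ms_error = table["Error"][0] / table["Error"][1]
    f_values = {s: (table[s][0] / table[s][1]) / ms_error
                for s in ("A", "Z", "AxZ")}
    return table, f_values


# Hypothetical balanced data: 3 qualitative levels x 4 amounts x 2 replicates.
data = {
    ("A", 0): [10.0, 12.0], ("B", 0): [11.0, 9.0], ("C", 0): [10.0, 11.0],
    ("A", 1): [14.0, 15.0], ("B", 1): [18.0, 17.0], ("C", 1): [22.0, 21.0],
    ("A", 2): [15.0, 16.0], ("B", 2): [20.0, 22.0], ("C", 2): [27.0, 26.0],
    ("A", 3): [16.0, 15.0], ("B", 3): [23.0, 22.0], ("C", 3): [30.0, 31.0],
}
table, f_values = qa_table(data)
```

In this layout the treatment df are 2 + 3 + 4 = 9, which equals the 10 distinct treatments minus one, and the error term receives the 2 df given up by the interaction. If the error SS is instead meant to be carried over unchanged from the first model, only the line marked ASSUMPTION needs to change.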

The Quenouille-Addelman solution in other experimental situations

Below, I identify solutions, if possible, for analytical and experimental situations that cannot be solved using the calculations provided by Quenouille (1953) and Addelman (1974) and have not been previously addressed in the literature.

Polynomial contrasts

Because a linear model does not identify which means differ when a factor has more than two levels, additional tests are often required. When quantitative variables with fixed intervals are used, an effective alternative to post hoc tests is a trend analysis using polynomial contrasts (Keppel 1982). For example, the data in Fig. 1b allow for the third-order polynomial contrast model presented at the bottom of Table 1. Note that the polynomial model contains one fewer level for the interaction term than for the main effect of the quantitative variable because of the adjustment associated with the QA solution (Table 1).
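As an illustration of trend analysis for the main effect of a quantitative factor with four equally spaced levels, the Python sketch below partitions the SS among level means into linear, quadratic and cubic components using the standard tabulated orthogonal polynomial coefficients. The means, the number of observations per mean and the function name `contrast_ss` are hypothetical, and the QA adjustment of the interaction term is not shown.

```python
# Orthogonal polynomial coefficients for four equally spaced levels.
COEFFS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def contrast_ss(means, n_per_mean, coeffs):
    """SS of one contrast: n * (sum c_i * mean_i)^2 / sum c_i^2."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return n_per_mean * estimate ** 2 / sum(c * c for c in coeffs)


z_means = [10.0, 14.0, 15.0, 15.5]  # hypothetical means at amounts 0, 1, 2, 3
n_per_mean = 6                      # hypothetical observations behind each mean

trend_ss = {name: contrast_ss(z_means, n_per_mean, c)
            for name, c in COEFFS.items()}

# The three 1-df components add up to the 3-df SS among the level means.
grand = sum(z_means) / len(z_means)
ss_z = n_per_mean * sum((m - grand) ** 2 for m in z_means)
```

Each component carries one df and is tested against the error mean square of the model in use.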

Mixed models, maximum likelihood and REML

Models with both fixed and random effects are nowadays analyzed using mixed model procedures based on Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML) (Zuur et al. 2009). Most software programs performing mixed model analyses now use REML estimation as the default option (Gurka 2006). While several statistical packages do not display a complete table of SS when performing these analyses, the MS and df can be obtained and allow the error term to be back-calculated. For example, with a REML fit of the data from Gates (1991) obtained with the lmer function in the lme4 package of R (R Core Team 2021), the MS of the error term can be calculated by dividing the MS of any fixed effect by its F-value. Once these terms are secured through cross-multiplication, the QA solution presented above can be applied.
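The back-calculation described above is simple arithmetic, sketched here in Python. The function names and the numerical values are hypothetical and do not come from the Gates (1991) data or from any software output.

```python
def ms_error_from_fixed_effect(ms_effect, f_value):
    """Recover the error mean square from F = MS_effect / MS_error."""
    if f_value <= 0:
        raise ValueError("F-value must be positive to back-calculate MS_error")
    return ms_effect / f_value


def ss_from_ms(ms, df):
    """Recover a sum of squares from its mean square and df (SS = MS x df)."""
    return ms * df


# Hypothetical values read from a mixed-model ANOVA table.
ms_effect = 48.0   # MS reported for one fixed effect
f_value = 12.0     # F-value reported for the same effect
error_df = 14      # denominator df for that test

ms_error = ms_error_from_fixed_effect(ms_effect, f_value)
ss_error = ss_from_ms(ms_error, error_df)
```

Any fixed effect tested against the same error term should return the same MS of the error, which offers a quick consistency check before the SS are recombined.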

Unbalanced designs

An unbalanced dataset with missing data presents a challenge because the SS cannot be estimated independently and do not add up with the error term(s) to the total SS as they would in a balanced design. This non-orthogonality means that the Type I SS are affected by the order in which the terms are included in the model. One way to deal with this situation is to remove missing cells, randomly remove samples from the dataset until balance is reached, and apply the QA solution. Because most researchers are unwilling to throw data away, another approach is to fill the missing cells using imputation techniques (reviewed in van Ginkel et al. 2007), and then apply the QA solution. A third method is to apply the QA solution using Type III SS, but if treatments exhibit different levels of imbalance (e.g., if the control has fewer missing data than the other treatments), this leads to biases in SS estimation and results in a different total SS for the unadjusted linear model and the QA solution. Ultimately, the choice between strategies to deal with missing data should depend upon the situation at hand (see review by Graham 2009).
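The first strategy, random removal of observations until every cell has the same number of replicates, can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the function name, the data and the fixed seed are hypothetical, and completely empty cells are left for the analyst to handle separately.

```python
import random


def balance_by_random_removal(cells, seed=0):
    """Randomly subsample every cell down to the smallest cell size."""
    if any(len(obs) == 0 for obs in cells.values()):
        raise ValueError("empty cells must be handled before balancing")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    n_min = min(len(obs) for obs in cells.values())
    return {key: rng.sample(obs, n_min) for key, obs in cells.items()}


# Hypothetical unbalanced data: two cells have lost one observation each.
unbalanced = {
    ("A", 0): [10.0, 12.0, 11.0], ("B", 0): [11.0, 9.0, 10.5],
    ("A", 1): [14.0, 15.0],       ("B", 1): [18.0, 17.0, 16.5],
    ("A", 2): [15.0, 16.0, 17.0], ("B", 2): [20.0, 22.0],
}
balanced = balance_by_random_removal(unbalanced)
```

Because the retained observations depend on the random draw, repeating the analysis with several seeds gives an idea of how sensitive the conclusions are to the data that were discarded.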

Three-way ANOVAs and higher-order models

The methodology presented by Quenouille (1953) and Addelman (1974) does not apply to higher-order models such as three-, four- or five-way linear models. Although the potential for an inflated rate of Type I error increases as the order of a model increases (Cohen 2001), these models are frequently applied and need to be addressed.

In the case of a three-way linear model with one quantitative variable that includes a zero level and two qualitative variables, the solution is similar to the two-way fixed-effects linear model presented above. The SS of the error term and the quantitative variable are retrieved as usual, while the SS of the two qualitative variables and all interactions are calculated only for non-zero quantities of the quantitative factor. The degrees of freedom associated with the main effects are not modified, but the degrees of freedom associated with all four interaction terms are reduced and transferred to the error term. An example of a three-way ANOVA calculation is shown in Additional file 3: Data and solutions for a 3-way analysis. Fitting four-way and higher-order models with a single quantitative variable follows the same procedure.

A higher-order linear model with at least one qualitative variable could also include more than one quantitative variable with a zero level. An example of this situation would be a study of the effect of tillage (i.e., a qualitative variable), nitrogen fertilization (i.e., a quantitative variable that includes a zero level), and pesticide application (i.e., a quantitative variable that includes a zero level) on the biomass of a given crop. However, to my knowledge, no mathematical solution has been developed for this situation, which cannot be solved using the QA solution as discussed here.

GLMs, GAMs and Bayesian models

The QA solution has not been developed for generalized, additive and Bayesian models. Considering the usefulness of these approaches, I stress the need to develop an equivalent of the QA solution for these models in the near future. However, it is important to note that in the literature search presented in Additional file 1: Review of the frequency of use of the Quenouille-Addelman solution in the literature, no study used any of these approaches to handle the data; the solution for linear models presented herein therefore remains relevant.

Discussion/conclusion

In this article, the emphasis has been placed primarily on hypothesis testing for the QA solution but many contemporary analyses focus instead on estimating the variability associated with the mean or median. As the sum of the degrees of freedom differs when the QA solution is applied, the eventual calculations of confidence intervals or error estimates will be impacted. For fixed-effects models, these calculations can be easily adjusted by following the methods described in standard statistical textbooks. On the other hand, more complex models (e.g., REML) require a mathematical solution that is beyond the scope of this article.

Statistical errors are generally not intentional. In most cases where the QA solution was not used when needed, the study authors were probably unaware that they were making a mistake. It is likely that a lack of statistical literacy also contributes to this situation. Likewise, poor statistical literacy among editors probably exacerbates this problem. As a peer reviewer, I have suggested that some authors employ the QA solution. However, the suggested changes have never been enforced by the editorial board, perhaps due to a lack of awareness that noncompliance with the QA solution inflates Type I and Type II error rates. In their defense, the QA solution has practically fallen into oblivion: since 2010, only a single review article (Moreau et al. 2015) has cited Addelman (1974). Quenouille (1953) was cited 27 times in this same period but not for an application of the solution discussed herein. My aspiration with this article is to rectify this situation and reduce the incidence of this recurring error in future publications.