13.1. Introduction

We will employ the same notations as in the previous chapters. Lower-case letters x, y, … will denote real scalar variables, whether mathematical or random. Capital letters X, Y, … will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as \(\tilde {x},\tilde {y},\tilde {X},\tilde {Y}\) to denote variables in the complex domain. Constant matrices will for instance be denoted by A, B, C. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix A will be denoted by |A| or det(A) and, in the complex case, the absolute value or modulus of the determinant of A will be denoted as |det(A)|. When matrices are square, their order will be taken as p × p, unless specified otherwise. When A is a full rank matrix in the complex domain, then \(AA^{*}\) is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, dX will indicate the wedge product of all the distinct differentials of the elements of the matrix X. Thus, letting the p × q matrix X = (x ij) where the x ij’s are distinct real scalar variables, \(\text{d}X=\wedge _{i=1}^p\wedge _{j=1}^q\text{d}x_{ij}\). For the complex matrix \(\tilde {X}=X_1+iX_2,\ i=\sqrt {-1}\), where X 1 and X 2 are real, \(\text{d}\tilde {X}=\text{d}X_1\wedge \text{d}X_2\).

In this chapter, we only consider analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) problems involving real populations. Even though all the steps involved in the following discussion focusing on the real variable case can readily be extended to the complex domain, it does not appear that a parallel development of analysis of variance methodologies in the complex domain has yet been considered. In order to elucidate the various steps in the procedures, we will first review the univariate case. For a detailed exposition of the analysis of variance technique in the scalar variable case, the reader may refer to Mathai and Haubold (2017). We will consider the cases of one-way classification or completely randomized design as well as two-way classification without and with interaction or randomized block design. With this groundwork in place, the derivations of the results in the multivariate setting ought to prove easier to follow.

In the early nineteenth century, Gauss and Laplace utilized methodologies that may be regarded as forerunners to ANOVA in their analyses of astronomical data. However, this technique came to full fruition in Ronald Fisher’s classic book titled “Statistical Methods for Research Workers”, which was initially published in 1925. The principle behind ANOVA consists of partitioning the total variation present in the data into variations attributable to different sources. It is actually the total variation that is split rather than the total variance, the latter being a fraction of the former. Accordingly, the procedure could be more appropriately referred to as “analysis of variation”. As has already been mentioned, we will initially consider the one-way classification model, which will then be extended to the multivariate situation.

Let us first focus on an experimental design called a completely randomized experiment. In this setting, the subject matter was originally developed for agricultural experiments, which influenced its terminology. For example, the basic experimental unit is referred to as a “plot”, which is a piece of land in an agricultural context. When an experiment is performed on human beings, a plot translates into an individual. If the experiment is carried out on some machinery, then a machine corresponds to a plot. In a completely randomized experiment, a set of n 1 + n 2 + ⋯ + n k plots, which are homogeneous with respect to all factors of variation, are selected. Then, k treatments are applied at random to these plots, the first treatment to n 1 plots, the second treatment to n 2 plots, up to the k-th treatment being applied to n k plots. For instance, if the effects of k different fertilizers on the yield of a certain crop are to be studied, then the treatments consist of these k fertilizers, the first treatment meaning one of the fertilizers, the second treatment, another one and so on, with the k-th treatment corresponding to the last fertilizer. If the experiment involves studying the yield of corn among k different varieties of corn, then a treatment coincides with a particular variety. If an experiment consists of comparing k teaching methods, then a treatment refers to a method of teaching and a plot corresponds to a student. When an experiment compares the effect of k different medications in curing a certain ailment, then a treatment is a medication, and so on. If the treatments are denoted by t 1, …, t k, then treatment t j is applied at random to n j homogeneous plots or n j homogeneous plots are selected at random and treatment t j is applied to them, for j = 1, …, k. Random assignment is done to avoid possible biases or the influence of confounding factors, if any. Then, observations measuring the effect of these treatments on the experimental units are made. For example, in the case of various methods of teaching, the observation x ij could be the final grade obtained by the j-th student who was subjected to the i-th teaching method. In the case of comparing k different varieties of corn, the observation x ij could consist of the yield of corn observed at harvest time in the j-th plot which received the i-th variety of corn. Thus, in this instance, i stands for the treatment number and j represents the serial number of the plot receiving the i-th treatment, x ij being the final observation. Then, the corresponding linear additive fixed effect model is the following:

$$\displaystyle \begin{aligned} x_{ij}=\mu+\alpha_i+e_{ij},\ j=1,\ldots,n_i, \ i=1,\ldots,k, {} \end{aligned} $$
(13.1.1)

where μ is a general effect, α i is the deviation from the general effect due to treatment t i and e ij is the random component, which includes the sum total of contributions originating from unknown or uncontrolled factors. When the experiment is designed, the plots are selected so that they be homogeneous with respect to all possible factors of variation. The general effect μ can be interpreted as the grand average or the expected value of x ij when α i is not present, that is, when treatment t i is not applied or has no effect. The simplest assumption that we will make is that E[e ij] = 0 and Var(e ij) = σ 2 > 0 for all i and j, for some positive constant σ 2, where E[  ⋅ ] denotes the expected value of [  ⋅ ]. It is further assumed that μ, α 1, …, α k are all unknown constants. When α 1, …, α k are assumed to be random variables, model (13.1.1) is referred to as a “random effect model”. In the following discussion, we will solely consider fixed effect models. The first step consists of estimating the unknown quantities from the data. Since no distribution is assumed on the e ij’s, and thereby on the x ij’s, we will employ the method of least squares for estimating the parameters. In that case, one has to minimize the error sum of squares, which is

$$\displaystyle \begin{aligned}\sum_{ij}e_{ij}^2=\sum_{ij}[x_{ij}-\mu-\alpha_i]^2.\end{aligned}$$

Applying calculus principles, we equate the partial derivatives of \(\sum _{ij}e_{ij}^2\) with respect to μ to zero and then, equate the partial derivatives of \(\sum _{ij}e_{ij}^2\) with respect to α 1, …, α k to zero and solve these equations. A convenient notation in this area is to represent a summation by a dot. As an example, if the subscript j is summed up, it is replaced by a dot, so that ∑j x ij ≡ x i. ; similarly, ∑ij x ij ≡ x .. . Thus,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial\mu}\Big[\sum_{ij}e_{ij}^2\Big]=0\Rightarrow -2\sum_{ij}[x_{ij}-\mu-\alpha_i]=0\Rightarrow \sum_i\Big(\sum_j[x_{ij}-\mu-\alpha_i]\Big)=0 \end{aligned}$$

that is,

$$\displaystyle \begin{aligned}\sum_i[x_{i.}-n_i\mu-n_i\alpha_i]=0\Rightarrow x_{..}-n_{.}\,\mu-\sum_{i=1}^kn_i\alpha_i=0, \end{aligned}$$

and since we have taken α i as a deviation from the general effect due to treatment t i, we can let ∑i n i α i = 0 without any loss of generality. Then, \(x_{..}/n_{.}\) is an estimate of μ, and denoting estimates/estimators by a hat, we write \(\hat {\mu }={x_{..}}/{n_{.}}\). Now, note that, for example, α 1 appears in the terms \((x_{11}-\mu -\alpha _1)^2+\cdots +(x_{1n_1}-\mu -\alpha _1)^2=\sum _j(x_{1j}-\mu -\alpha _1)^2\) but does not appear in the other terms of the error sum of squares. Accordingly, for a specific i,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial\alpha_i}\Big[\sum_{ij}e_{ij}^2\Big]=0\Rightarrow \sum_j[x_{ij}-\mu-\alpha_i]=0\Rightarrow x_{i.}-n_i\hat{\mu}-n_i\hat{\alpha}_i=0, \end{aligned}$$

that is, \(\hat {\alpha }_i =\frac {x_{i.}}{n_i}-\hat {\mu }\,\). Thus,

$$\displaystyle \begin{aligned} \hat{\mu}=\frac{1}{n_{.}}x_{..}\ \ \text{and} \ \ \hat{\alpha}_i=\frac{1}{n_i}x_{i.}-\hat{\mu}\,. {} \end{aligned} $$
(13.1.2)

The least squares minimum is obtained by substituting the least squares estimates of μ and α i, i = 1, …, k, in the error sum of squares. Denoting the least squares minimum by s 2,

$$\displaystyle \begin{aligned} s^2&=\sum_{ij}(x_{ij}-\hat{\mu}-\hat{\alpha}_i)^2=\sum_{ij}\Big[x_{ij}-\frac{x_{..}}{n_{.}}-\Big(\frac{x_{i.}}{n_i}-\frac{x_{..}}{n_{.}}\Big)\Big]^2\\ &=\sum_{ij}\Big[x_{ij}-\frac{x_{i.}}{n_i}\Big]^2=\sum_{ij}\Big[x_{ij}-\frac{x_{..}}{n_{.}}\Big]^2-\sum_{ij}\Big[\frac{x_{i.}}{n_i}-\frac{x_{..}}{n_{.}}\Big]^2.{} \end{aligned} $$
(13.1.3)

When the square is expanded, the middle term will become \(-2\sum _{ij}(\frac {x_{i.}}{n_i}-\frac {x_{..}}{n_{.}})^2\), thus yielding the expression given in (13.1.3). As well, we have the following identity:

$$\displaystyle \begin{aligned}\sum_{ij}\Big(\frac{x_{i.}}{n_i}-\frac{x_{..}}{n_{.}}\Big)^2=\sum_{i=1}^{k}n_i\Big(\frac{x_{i.}}{n_i}-\frac{x_{..}}{n_{.}}\Big)^2 =\sum_i\frac{x_{i.}^2}{n_i}-\frac{x_{..}^2}{n_{.}}.\end{aligned}$$

Now, let us consider the hypothesis H o : α 1 = α 2 = ⋯ = α k, which is equivalent to the hypothesis α 1 = α 2 = ⋯ = α k = 0 since, by assumption, ∑i n i α i = 0. Proceeding as before, the least squares minimum, under the null hypothesis H o, denoted by \(s_0^2\), is the following:

$$\displaystyle \begin{aligned}s_0^2=\sum_{ij}\Big(x_{ij}-\frac{x_{..}}{n_{.}}\Big)^2\end{aligned}$$

and hence the sum of squares due to the hypothesis or due to the presence of the α j’s, is given by \(s_0^2-s^2=\sum _{ij}(\frac {x_{i.}}{n_i}-\frac {x_{..}}{n_{.}})^2\). Thus, the total variation is partitioned as follows:

$$\displaystyle \begin{aligned} s_0^2&=[s_0^2-s^2]+[s^2]\\ \sum_{ij}\Big(x_{ij}-\frac{x_{..}}{n_{.}}\Big)^2&=\Big[\sum_in_i\Big(\frac{x_{i.}}{n_i}-\frac{x_{..}}{n_{.}}\Big)^2\Big]+\Big[\sum_{ij}\Big(x_{ij}-\frac{x_{i.}}{n_i}\Big)^2\Big],\ \, \text{that}\ \text{is,}\\ \text{Total variation }(s_0^2)&=\text{variation due to the }\alpha_j\text{'s }(s_0^2-s^2)+\text{the residual variation }(s^2),\end{aligned} $$

which is the analysis of variation principle. If \(e_{ij}\overset {iid}{\sim } N_1(0,\sigma ^2)\) for all i and j where σ 2 > 0 is a constant, it follows from the chisquaredness and independence of quadratic forms, as discussed in Chaps. 2 and 3, that \(\frac {s_0^2}{\sigma ^2}\sim \chi ^2_{n_{.}-1}\), a real chisquare variable having n . − 1 degrees of freedom, \(\frac {[s_0^2-s^2]}{\sigma ^2}\sim \chi ^2_{k-1}\) under the hypothesis H o and \(\frac {s^2}{\sigma ^2}\sim \chi ^2_{n_{.}-k},\) where the sum of squares due to the α j’s, namely \(s_0^2-s^2,\) and the residual sum of squares, namely s 2, are independently distributed under the hypothesis. Usually, these findings are put into a tabular form known as the analysis of variation table or ANOVA table. The usual format is as follows:

ANOVA Table for the One-Way Classification

Variation due to | df | SS | MS
(1) | (2) | (3) | (3)/(2)
treatments | k − 1 | \(\sum _in_i(\frac {x_{i.}}{n_i}-\frac {x_{..}}{n_.})^2\) | \((s_0^2-s^2)/(k-1)\)
residuals | n . − k | \(\sum _{ij}(x_{ij}-\frac {x_{i.}}{n_i})^2\) | \(s^2/(n_{.}-k)\)
total | n . − 1 | \(\sum _{ij}(x_{ij}-\frac {x_{..}}{n_{.}})^2\) |

where df denotes the number of degrees of freedom, SS means sum of squares and MS stands for mean squares or the average of the squares. There is usually a last column which contains the F-ratio, that is, the ratio of the treatments MS to the residuals MS, and enables one to determine whether to reject the null hypothesis, in which case the test statistic is said to be “significant”, or not to reject the null hypothesis, when the test statistic is “not significant”. Further details on the real scalar variable case are available from Mathai and Haubold (2017).
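To make these computations concrete, the following minimal Python sketch (assuming only numpy is available and using small hypothetical group data) produces the entries of the above table: the estimates \(\hat{\mu}\) and \(\hat{\alpha}_i\), the three sums of squares, the mean squares and the F-ratio.

```python
import numpy as np

# hypothetical one-way classification data: k = 3 treatments with unequal group sizes
groups = [np.array([3., 5., 4.]),
          np.array([6., 7., 8., 7.]),
          np.array([2., 3., 4.])]

n_i = np.array([len(g) for g in groups])        # n_1, ..., n_k
n_dot, k = n_i.sum(), len(groups)               # n_. and k

x_dot_dot = sum(g.sum() for g in groups)        # x_..
mu_hat = x_dot_dot / n_dot                      # mu-hat = x_.. / n_.
alpha_hat = np.array([g.mean() - mu_hat for g in groups])   # alpha_i-hat = x_i./n_i - mu-hat

s0_sq = sum(((g - mu_hat) ** 2).sum() for g in groups)      # total variation s_0^2
s_sq = sum(((g - g.mean()) ** 2).sum() for g in groups)     # residual variation s^2
treat_ss = s0_sq - s_sq                                     # variation due to the alpha_i's

MS_treat, MS_resid = treat_ss / (k - 1), s_sq / (n_dot - k)
F_ratio = MS_treat / MS_resid
print(mu_hat, alpha_hat, treat_ss, s_sq, F_ratio)
```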

In light of this brief review of the scalar variable case of one-way classification data or univariate data secured from a completely randomized design, the concepts will now be extended to the multivariate setting.

13.2. Multivariate Case of One-Way Classification Data Analysis

Extension of the results to the multivariate case is parallel to the scalar variable case. Consider a model of the type

$$\displaystyle \begin{aligned} X_{ij}=M+A_i+E_{ij},\ j=1,\ldots,n_i, \ i=1,\ldots,k, {} \end{aligned} $$
(13.2.1)

with X ij, M, A i and E ij all being p × 1 real vectors where X ij denotes the j-th observation vector in the i-th group or the observed vectors obtained from the n i plots receiving the i-th vector of treatments, M is a general effect vector, A i is a vector of deviations from M due to the i-th treatment vector so that ∑i n i A i = O since we are taking deviations from the general effect M, and E ij is a vector of random components assumed to be normally distributed as follows: \(E_{ij}\overset {iid}{\sim } N_p(O,\varSigma ),\ \varSigma >O,\) for all i and j where Σ is a positive definite covariance matrix, that is,

$$\displaystyle \begin{aligned}\text{Cov}(E_{ij})=E[(E_{ij}-O)(E_{ij}-O)']=E[E_{ij}E_{ij}^{\prime}]=\varSigma>O\ \text{ for all }i\text{ and }j,\end{aligned}$$

where E[  ⋅ ] denotes the expected value operator. This normality assumption will be needed for testing hypotheses and developing certain distributional aspects. However, the multivariate analysis of variation can be set up without having to resort to any distributional assumption. In the real scalar variable case, we minimized the sum of the squares of the errors since the variations only involved single scalar variables. In the vector case, if we take the sum of squares of the elements in E ij, that is, \(E_{ij}^{\prime }E_{ij}\) and its sum over all i and j, then we are only considering the variations in the individual elements of E ij’s; however, in the vector case, there is joint variation among the elements of the vector and that is also to be taken into account. Hence, we should be considering all squared terms and cross product terms or the whole matrix of squared and cross product terms. This is given by \(E_{ij}E_{ij}^{\prime }\) and so, we should consider this matrix and carry out some type of minimization. Consider

$$\displaystyle \begin{aligned} \sum_{ij}E_{ij}E_{ij}^{\prime}=\sum_{ij}[X_{ij}-M-A_i][X_{ij}-M-A_i]'. {} \end{aligned} $$
(13.2.2)

For obtaining estimates of M and A i, i = 1, …, k, we will minimize the trace of \(\sum _{ij}E_{ij}E_{ij}^{\prime }\) as a criterion. There are terms of the type [X ij − M − A i]′[X ij − M − A i] in this trace. Thus,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial M}\Big[\text{tr}\Big(\sum_{ij}E_{ij}E_{ij}^{\prime}\Big)\Big]&=\sum_{ij}\frac{\partial}{\partial M}[X_{ij}-M-A_i]'[X_{ij}-M-A_i]=O\\ &\Rightarrow\sum_{ij}X_{ij}-n_{.}M-\sum_in_iA_i=O\Rightarrow \hat{M}=\frac{1}{n_{.}}X_{..}\ ,\end{aligned} $$

noting that we assumed that ∑i n i A i = O. Now, on differentiating the trace of \(E_{ij}E_{ij}^{\prime }\) with respect to A i for a specific i, we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial A_i}\text{tr}\Big[\sum_{ij}E_{ij}E_{ij}^{\prime}\Big]&=\frac{\partial}{\partial A_i}\sum_{ij}[X_{ij}-M-A_i]'[X_{ij}-M-A_i]=O\\ &\Rightarrow\sum_{j}[X_{ij}-M-A_i]=O\Rightarrow \hat{A}_i=\frac{1}{n_i}X_{i.}-\hat{M}=\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\ .\end{aligned} $$

Observe that there is only one critical vector for \(\hat {M}\) and for \(\hat {A}_i,\ i=1,\ldots ,k\). Accordingly, the critical point will either correspond to a minimum or a maximum of the trace. But for arbitrary M and A i, the maximum occurs at plus infinity and hence, the critical point \((\hat {M},\hat {A}_i,\ i=1,\ldots ,k)\) corresponds to a minimum. Once evaluated at these estimates, the sum of squares and cross products matrix, denoted by S, is the following:

$$\displaystyle \begin{aligned} S&=\sum_{ij}[X_{ij}-\hat{M}-\hat{A}_i][X_{ij}-\hat{M}-\hat{A}_i]'=\sum_{ij}\Big[X_{ij}-\frac{1}{n_i}X_{i.}\Big]\Big[X_{ij}-\frac{1}{n_i}X_{i.}\Big]'\\ &=\sum_{ij}\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]' -\sum_in_i\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]' {} \end{aligned} $$
(13.2.3)

Note that as in the scalar case, the middle terms and the last term will combine into the second term above. Now, let us impose the hypothesis H o : A 1 = A 2 = ⋯ = A k = O. Note that equality of the A j’s will automatically imply that each one is null because the weighted sum is null as per our initial assumption in the model (13.2.1). Under this hypothesis, the model will be X ij = M + E ij, and then proceeding as in the univariate case, we end up with the following sum of squares and cross products matrix, denoted by S 0:

$$\displaystyle \begin{aligned} S_0=\sum_{ij}\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]', {} \end{aligned} $$
(13.2.4)

so that the sum of squares and cross products matrix due to the A i’s is the difference

$$\displaystyle \begin{aligned} S_0-S=\sum_{ij}\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]'. {} \end{aligned} $$
(13.2.5)

Thus, we have the following partitioning of the total variation in the multivariate data:

$$\displaystyle \begin{aligned} S_0&=[S_0-S]+{S}\\ \text{Total variation}&=[\text{Variation due to the }A_i\text{'s}]+[\text{Residual variation}]\\ \sum_{ij}\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]'&=\sum_{ij}\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big] \Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]'\\ &\ \ \ \ +\sum_{ij}\Big[X_{ij}-\frac{1}{n_i}X_{i.}\Big]\Big[X_{ij}-\frac{1}{n_i}X_{i.}\Big]'.\end{aligned} $$
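As a numerical check of this partition, here is a minimal Python sketch (assuming numpy and artificially generated group data; the dimension and group sizes are hypothetical) that computes the treatment, residual and total sum of squares and cross products matrices and verifies that they add up:

```python
import numpy as np

rng = np.random.default_rng(0)
p, group_sizes = 3, [5, 4, 6]                        # hypothetical dimension and group sizes
groups = [rng.standard_normal((n, p)) for n in group_sizes]   # rows play the role of the X_ij'

X_all = np.vstack(groups)
grand_mean = X_all.mean(axis=0)                      # (1/n_.) X_..

def ssp(Y, center):
    """Sum of squares and cross products of the rows of Y about 'center'."""
    D = Y - center
    return D.T @ D

S0 = ssp(X_all, grand_mean)                          # total SSP matrix
V = sum(ssp(G, G.mean(axis=0)) for G in groups)      # residual SSP matrix (S)
U = sum(len(G) * np.outer(G.mean(axis=0) - grand_mean,
                          G.mean(axis=0) - grand_mean) for G in groups)   # treatment SSP (S0 - S)

assert np.allclose(S0, U + V)                        # S_0 = [S_0 - S] + S
```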

Under the normality assumption for the random component \(E_{ij}\overset {iid}{\sim } N_p(O,\varSigma ),\ \varSigma >O,\) we have the following properties, which follow from results derived in Chap. 5, the notation W p(ν, Σ) standing for a Wishart distribution having ν degrees of freedom and parameter matrix Σ:

$$\displaystyle \begin{aligned} \text{Total variation}&=S_0=\sum_{ij}\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]\Big[X_{ij}-\frac{1}{n_{.}}X_{..}\Big]',\\ S_0&\sim W_p(n_{.}-1,\varSigma);\\ \text{Variation due to the }A_i\text{'s}&=S_0-S=\sum_{ij}\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.}-\frac{1}{n_{.}}X_{..}\Big]',\\ S_0-S\sim W_p&(k-1,\varSigma)\ \text{ under the hypothesis }A_1=A_2=\cdots=A_k=O;\\ \text{Residual variation}&=S=\sum_{ij}\Big[X_{ij}-\frac{1}{n_i}X_{i.}\Big]\Big[X_{ij}-\frac{1}{n_i}X_{i.}\Big]',\\ S&\sim W_p(n_.-k,\varSigma).\end{aligned} $$

We can summarize these findings in a tabular form known as the multivariate analysis of variation table or MANOVA table, where df means degrees of freedom in the corresponding Wishart distribution, and SSP represents the sum of squares and cross products matrix.

Multivariate Analysis of Variation (MANOVA) Table

Variation due to | df | SSP
treatments | k − 1 | \(\sum _{ij}[\frac {1}{n_i}X_{i.}-\frac {1}{n_{.}}X_{..}][\frac {1}{n_i}X_{i.}-\frac {1}{n_{.}}X_{..}]'\)
residuals | n . − k | \(\sum _{ij}[X_{ij}-\frac {1}{n_i}X_{i.}][X_{ij}-\frac {1}{n_i}X_{i.}]'\)
total | n . − 1 | \(\sum _{ij}[X_{ij}-\frac {1}{n_{.}}X_{..}][X_{ij}-\frac {1}{n_{.}}X_{..}]'\)

13.2.1. Some properties

The sample values from the i-th sample or the i-th group or the plots receiving the i-th treatment are \(X_{i1},X_{i2},\ldots ,X_{in_i}\). In this case, the average is \(\sum _{j=1}^{n_i}\frac {X_{ij}}{n_i}=\frac {X_{i.}}{n_i}\) and the i-th sample sum of squares and products matrix is

$$\displaystyle \begin{aligned}S_i=\sum_{j=1}^{n_i}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'.\end{aligned}$$

As well, it follows from Chap. 5 that S i ∼ W p(n i − 1, Σ) when \(E_{ij}\overset {iid}{\sim } N_p(O,\varSigma ),\ \varSigma >O\). Then, the residual sum of squares and products matrix can be written as follows, denoting it by the matrix V :

$$\displaystyle \begin{aligned} V&= \sum_{ij}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'=\sum_{i=1}^{k}\Big[\sum_{j=1}^{n_i}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'\Big]\\ &=\sum_{i=1}^kS_i=S_1+S_2+\cdots+S_k{} \end{aligned} $$
(13.2.6)

where S i ∼ W p(n i − 1, Σ), i = 1, …, k, and the S i’s are independently distributed since the sample values from the k groups are independently distributed among themselves (within the group) and between groups. Hence, S ∼ W p(ν, Σ), ν = (n 1 − 1) + (n 2 − 1) + ⋯ + (n k − 1) = n . − k. Note that \(\bar {X}_i=\frac {X_{i.}}{n_i}\) has \(\text{Cov}(\bar {X}_i)=\frac {1}{n_i}\varSigma \), so that \(\sqrt {n_i}(\bar {X}_i-\bar {X})\) are iid N p(O, Σ) where \(\bar {X}={X_{..}}/{n_{.}}\) . Then, the sum of squares and products matrix due to the treatments or due to the A i’s is the following, denoting it by U:

$$\displaystyle \begin{aligned} U=\sum_{i=1}^kn_i\Big[\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big]\Big[\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big]'\sim W_p(k-1,\varSigma) {} \end{aligned} $$
(13.2.7)

under the null hypothesis; when the hypothesis is violated, it has a noncentral Wishart distribution. Further, the sum of squares and products matrix due to the treatments and the residual sum of squares and products matrix are independently distributed. Thus, by comparing U and V , we should be able to reach a decision regarding the hypothesis. One procedure that is followed consists of taking the determinants of U and V  and comparing them. This does not have much of a basis and, as previously explained, determinants should not be called “generalized variances” since the determinant violates a basic condition of a norm. The rationale for comparing determinants will become clear from the point of view of testing hypotheses by applying the likelihood ratio criterion, which is discussed next.

13.3. The Likelihood Ratio Criterion

Let \(E_{ij}\overset {iid}{\sim } N_p(O,\varSigma ),\ \varSigma >O\), and suppose that we have simple random samples of sizes n 1, …, n k from the k groups relating to the k treatments. Then, the likelihood function, denoted by L, is the following:

$$\displaystyle \begin{aligned} L&=\prod_{ij}\frac{\text{e}^{-\frac{1}{2}(X_{ij}-M-A_i)'\varSigma^{-1}(X_{ij}-M-A_i)}}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}\\ &=\frac{\text{e}^{-\frac{1}{2}\sum_{ij}(X_{ij}-M-A_i)'\varSigma^{-1}(X_{ij}-M-A_i)}}{(2\pi)^{\frac{pn_{.}}{2}}|\varSigma|{}^{\frac{n_{.}}{2}}}.{} \end{aligned} $$
(13.3.1)

The maximum likelihood estimator/estimate (MLE) of M is \(\hat {M}=\frac {X_{..}}{n_{.}}=\bar {X}\) and that of A i is \(\hat {A}_i=\frac {X_{i.}}{n_i}-\hat {M}\). With a view to obtaining the MLE of Σ, we first note that the exponent is a real scalar quantity which is thus equal to its trace, so that we can express the exponent as follows, after substituting the MLE’s of M and A i:

$$\displaystyle \begin{aligned} -\frac{1}{2}\sum_{ij}&[X_{ij}-\hat{M}-\hat{A}_i]'\varSigma^{-1}[X_{ij}-\hat{M}-\hat{A}_i]\\ &=-\frac{1}{2}\sum_{ij}\text{tr}\Big(\Big[X_{ij}-\frac{X_{i.}}{n_i}\Big]'\varSigma^{-1}\Big[X_{ij}-\frac{X_{i.}}{n_i}\Big]\Big)\\ &=-\frac{1}{2}\sum_{ij}\text{tr}\Big(\varSigma^{-1}\Big[X_{ij}-\frac{X_{i.}}{n_i}\Big]\Big[X_{ij}-\frac{X_{i.}}{n_i}\Big]'\Big).\end{aligned} $$

Now, following through the estimation procedure of the MLE included in Chap. 3, we obtain the MLE of Σ as

$$\displaystyle \begin{aligned} \hat{\varSigma}=\frac{1}{n_{.}}\sum_{ij}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'. {} \end{aligned} $$
(13.3.2)

After substituting \(\hat {M},\ \hat {A}_i\) and \(\hat {\varSigma }\), the exponent of the likelihood function becomes \(-\frac {1}{2}n_{.}\text{ tr}(I_p)=-\frac {1}{2}n_{.}p\). Hence, the maximum value of the likelihood function L under the general model becomes

$$\displaystyle \begin{aligned} \max L =\frac{\text{e}^{-\frac{1}{2}(n_{.}p)}n_{.}^{\frac{n_{.}p}{2}}}{(2\pi)^{\frac{n_{.}p}{2}}|\sum_{ij}(X_{ij}-\frac{X_{i.}}{n_i}) (X_{ij}-\frac{X_{i.}}{n_i})'|{}^{\frac{n_{.}}{2}}}.{} \end{aligned} $$
(13.3.3)

Under the hypothesis H o : A 1 = A 2 = ⋯ = A k = O, the model is X ij = M + E ij and the MLE of M under H o is still \(\frac {1}{n_{.}}X_{..}\) and \(\hat {\varSigma }\) under H o is \(\frac {1}{n_{.}}\sum _{ij}(X_{ij}-\frac {1}{n_{.}}X_{..})(X_{ij}-\frac {1}{n_{.}}X_{..})^{\prime }\), so that \(\max L\) under H o, denoted by \(\max L_o\), is

$$\displaystyle \begin{aligned}\max L_o=\frac{\text{e}^{-\frac{1}{2}n_{.}p}n_{.}^{\frac{n_{.}p}{2}}}{(2\pi)^{\frac{n_{.}p}{2}}|\sum_{ij}(X_{ij}-\frac{1}{n_{.}}X_{..}) (X_{ij}-\frac{1}{n_{.}}X_{..})'|{}^{\frac{n_{.}}{2}}}.{}\end{aligned} $$
(13.3.4)

Therefore, the λ-criterion is the following:

$$\displaystyle \begin{aligned} \lambda &=\frac{\max L_o}{\max L}=\frac{|\sum_{ij}(X_{ij}-\frac{X_{i.}}{n_i})(X_{ij}-\frac{X_{i.}}{n_i})'|{}^{\frac{n_{.}}{2}}} {|\sum_{ij}(X_{ij}-\frac{X_{..}}{n_{.}})(X_{ij}-\frac{X_{..}}{n_{.}})'|{}^{\frac{n_{.}}{2}}}\\ &=\frac{|V|{}^{\frac{n_{.}}{2}}}{|U+V|{}^{\frac{n_{.}}{2}}}{} \end{aligned} $$
(13.3.5)

where

$$\displaystyle \begin{aligned}U=\sum_{ij}\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)',\ V=\sum_{ij}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'\end{aligned}$$

and U ∼ W p(k − 1, Σ) under H o is the sum of squares and cross products matrix due to the A i’s and V ∼ W p(n . − k, Σ) is the residual sum of squares and cross products matrix. It has already been shown that U and V  are independently distributed. Then \(W_1=(U+V)^{-\frac {1}{2}}V(U+V)^{-\frac {1}{2}}\), with the determinant \(\frac {|V|}{|U+V|}\), is a real matrix-variate type-1 beta with parameters \((\frac {n_{.}-k}{2},\frac {k-1}{2})\), as defined in Chap. 5, and \(W_2=V^{-\frac {1}{2}}UV^{-\frac {1}{2}}\) is a real matrix-variate type-2 beta with the parameters \((\frac {k-1}{2},\frac {n_{.}-k}{2})\). Moreover, \(Y_1=I-W_1=(U+V)^{-\frac {1}{2}}U(U+V)^{-\frac {1}{2}}\), with the determinant \(\frac {|U|}{|U+V|}\), is a real matrix-variate type-1 beta random variable with parameters \((\frac {k-1}{2},\frac {n_{.}-k}{2})\). Given the properties of independent real matrix-variate gamma random variables, we have seen in Chap. 5 that W 1 and Y 2 = U + V  are independently distributed. Similarly, Y 1 = I − W 1 and Y 2 are independently distributed. Further, \(W_2^{-1}=V^{\frac {1}{2}}U^{-1}V^{\frac {1}{2}}\) is a real matrix-variate type-2 beta random variable with the parameters \((\frac {n_{.}-k}{2},\frac {k-1}{2})\). Observe that

$$\displaystyle \begin{aligned}|W_1|=\frac{|V|}{|U+V|}=\frac{1}{|V^{-\frac{1}{2}}UV^{-\frac{1}{2}}+I|}=\frac{1}{|W_2+I|},\ W_1=(I+W_2)^{-1}. \end{aligned}$$

A one-to-one function of λ is

$$\displaystyle \begin{aligned} w=\lambda^{\frac{2}{n_{.}}}=\frac{|V|}{|U+V|}=|W_1|. {} \end{aligned} $$
(13.3.6)
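In practice, w can be computed directly from the two sum of squares and cross products matrices. A minimal Python sketch (the matrices U and V below are arbitrary positive definite stand-ins, not data from the text):

```python
import numpy as np

def wilks_w(U, V):
    """w = |V| / |U + V|, computed through log-determinants for numerical stability."""
    _, logdet_v = np.linalg.slogdet(V)
    _, logdet_t = np.linalg.slogdet(U + V)
    return float(np.exp(logdet_v - logdet_t))

# toy positive definite matrices standing in for the treatment and residual SSP matrices
rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((10, 3))
U, V = A.T @ A, B.T @ B
w = wilks_w(U, V)            # 0 < w <= 1
lam = w ** (14 / 2)          # lambda = w**(n_./2), with n_. = 14 as a placeholder
```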

13.3.1. Arbitrary moments of the likelihood ratio criterion

For an arbitrary h, the h-th moment of w as well as that of λ can be obtained from the normalizing constant of a real matrix-variate type-1 beta density with the parameters \((\frac {n_{.}-k}{2},\frac {k-1}{2})\). That is,

$$\displaystyle \begin{aligned} E[w^h]&=\frac{\varGamma_p(\frac{n_{.}-k}{2}+h)}{\varGamma_p(\frac{n_{.}-k}{2})}\frac{\varGamma_p(\frac{n_{.}-1}{2})} {\varGamma_p(\frac{n_{.}-1}{2}+h)},\ (n_{.}-k)+(k-1)=n_{.}-1,\\ &=\left\{\prod_{j=1}^p\frac{\varGamma(\frac{n_{.}-1}{2}-\frac{j-1}{2})}{\varGamma(\frac{n_{.}-k}{2}-\frac{j-1}{2})}\right\} \left\{\prod_{j=1}^p\frac{\varGamma(\frac{n_{.}-k}{2}-\frac{j-1}{2}+h)}{\varGamma(\frac{n_{.}-1}{2}-\frac{j-1}{2}+h)}\right\}.{} \end{aligned} $$
(13.3.7)

As \(E[\lambda ^h]=E[(w^{\frac {n_{.}}{2}})^h]=E[w^{(\frac {n_{.}}{2})h}]\), the h-th moment of λ is obtained by replacing h by \((\frac {n_{.}}{2})h\) in (13.3.7). That is,

$$\displaystyle \begin{aligned}E[\lambda^h]=C_{p,k}\left\{\prod_{j=1}^p\frac{\varGamma(\frac{n_{.}-k}{2}-\frac{j-1}{2}+(\frac{n_{.}}{2})h)} {\varGamma(\frac{n_{.}-1}{2}-\frac{j-1}{2}+(\frac{n_{.}}{2})h)}\right\}{}\end{aligned} $$
(13.3.8)

where

$$\displaystyle \begin{aligned}C_{p,k}=\left\{\prod_{j=1}^p\frac{\varGamma(\frac{n_{.}-1}{2}-\frac{j-1}{2})}{\varGamma(\frac{n_{.}-k}{2}-\frac{j-1}{2})}\right\}.\end{aligned}$$
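For numerical work, the moment expression (13.3.7) is conveniently evaluated on the logarithmic scale. A small Python sketch (assuming scipy is available; the values of p, k and n . supplied below are hypothetical):

```python
import numpy as np
from scipy.special import gammaln

def log_E_w_h(h, p, k, n_dot):
    """log E[w^h] from (13.3.7), evaluated through log-gamma functions."""
    j = np.arange(1, p + 1)
    a = (n_dot - k) / 2 - (j - 1) / 2      # arguments (n_. - k)/2 - (j-1)/2
    b = (n_dot - 1) / 2 - (j - 1) / 2      # arguments (n_. - 1)/2 - (j-1)/2
    return float(np.sum(gammaln(a + h) - gammaln(a) + gammaln(b) - gammaln(b + h)))

E_w = np.exp(log_E_w_h(1, p=2, k=3, n_dot=15))   # first moment E[w] for hypothetical p, k, n_.
```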

13.3.2. Structural representation of the likelihood ratio criterion

It can readily be seen from (13.3.7) that the h-th moment of w is of the form of the h-th moment of a product of independently distributed real scalar type-1 beta random variables. That is,

$$\displaystyle \begin{aligned} E[w^h]=E[(w_1w_2\cdots w_p)^h],\ w=w_1w_2\cdots w_p, {} \end{aligned} $$
(13.3.9)

where w 1, …, w p are independently distributed and w j is a real scalar type-1 beta random variable with the parameters \((\frac {n_{.}-k}{2}-\frac {j-1}{2},\frac {k-1}{2}),\ j=1,\ldots ,p,\) for n . − k > p − 1 and n . > k + p − 1. Hence the exact density of w is available by constructing the density of a product of independently distributed real scalar type-1 beta random variables. For special values of p and k, one can obtain the exact densities in the forms of elementary functions. However, for the general case, the exact density corresponding to E[w h] as specified in (13.3.7) can be expressed in terms of a G-function and, in the case of E[λ h] as given in (13.3.8), the exact density can be represented in terms of an H-function. These representations are as follows, denoting the densities of w and λ as f w(w) and f λ(λ), respectively:

$$\displaystyle \begin{aligned} f_w(w)&=C_{p,k}\,G^{p,0}_{p,p}\left[w\Big\vert_{\frac{n_{.}-k}{2}-\frac{j-1}{2}-1,\ j=1,\ldots,p}^{\frac{n_{.}-1}{2}-\frac{j-1}{2}-1,\ j=1,\ldots,p}\right],\ 0<w\le 1, {} \end{aligned} $$
(13.3.10)
$$\displaystyle \begin{aligned} f_{\lambda}(\lambda)&=C_{p,k}\,H^{p,0}_{p,p}\left[\lambda\Big\vert_{(\frac{n_{.}-k}{2}-\frac{j-1}{2}-\frac{n_{.}}{2}, \frac{n_{.}}{2}),\ j=1,\ldots,p}^{(\frac{n_{.}-1}{2}-\frac{j-1}{2}-\frac{n_{.}}{2},\frac{n_{.}}{2}),\ j=1,\ldots,p}\right],\ 0<\lambda\le 1,{} \end{aligned} $$
(13.3.11)

for n . > p + k − 1, p ≥ 1 and f w(w) = 0, f λ(λ) = 0, elsewhere. The evaluation of G and H-functions can be carried out with the help of symbolic computing packages such as Mathematica and MAPLE. Theoretical considerations, applications and several special cases of the G and H-functions are, for instance, available from Mathai (1993) and Mathai, Saxena and Haubold (2010). The special cases listed therein can also be utilized to work out the densities for particular cases of (13.3.10) and (13.3.11). Explicit structures of the densities for certain special cases are listed in the next section.
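Besides the G- and H-function representations, the structural form (13.3.9) can be exploited numerically: the null distribution of w can be simulated to any desired accuracy as a product of independent real scalar type-1 beta variables. A minimal sketch, assuming numpy and hypothetical values of p, k and n .:

```python
import numpy as np

def simulate_w(p, k, n_dot, size=100_000, seed=0):
    """Simulate w = w_1 w_2 ... w_p with independent w_j ~ type-1 beta((n_.-k)/2-(j-1)/2, (k-1)/2)."""
    rng = np.random.default_rng(seed)
    w = np.ones(size)
    for j in range(1, p + 1):
        a = (n_dot - k) / 2 - (j - 1) / 2
        b = (k - 1) / 2
        w *= rng.beta(a, b, size)
    return w

w_sim = simulate_w(p=2, k=3, n_dot=15)     # hypothetical p, k, n_.
print(w_sim.mean())                        # can be compared with E[w] from (13.3.7) at h = 1
```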

13.3.3. Some special cases

Several particular cases can be worked out by examining the moment expressions in (13.3.7) and (13.3.8). The h-th moment of \(w=\lambda ^{\frac {2}{n_{.}}}\), where λ is the likelihood ratio criterion, is available from (13.3.7) as

$$\displaystyle \begin{aligned} E[w^h]=C_{p,k}\frac{\varGamma(\frac{n_{.}-k}{2}+h)\varGamma(\frac{n_{.}-k}{2}-\frac{1}{2}+h)\cdots\varGamma(\frac{n_{.}-k}{2}-\frac{p-1}{2}+h)} {\varGamma(\frac{n_{.}-1}{2}+h)\varGamma(\frac{n_{.}-1}{2}-\frac{1}{2}+h)\cdots\varGamma(\frac{n_{.}-1}{2}-\frac{p-1}{2}+h)}. \end{aligned}$$
(i)

Case (1): p = 1

In this case, from (i),

$$\displaystyle \begin{aligned}E[w^h]=C_{1,k}\frac{\varGamma(\frac{n_{.}-k}{2}+h)}{\varGamma(\frac{n_{.}-1}{2}+h)},\end{aligned}$$

which is the h-th moment of a real scalar type-1 beta random variable with the parameters \((\frac {n_{.}-k}{2},\frac {k-1}{2})\); thus, in this case, w is simply a real scalar type-1 beta random variable with those parameters. We reject the null hypothesis H o : A 1 = A 2 = ⋯ = A k = O for small values of the λ-criterion and, accordingly, for small values of w; that is, the hypothesis is rejected when the observed value of w ≤ w α where w α is such that \(\int _0^{w_{\alpha }}f_w(w)\text{d}w=\alpha \) for the preassigned size α of the critical region, f w(w) denoting the density of w for p = 1 and n . > k.

Case (2): p = 2

From (i), we have

$$\displaystyle \begin{aligned}E[w^h]=C_{2,k}\frac{\varGamma(\frac{n_{.}-k}{2}+h)\varGamma(\frac{n_{.}-k}{2}-\frac{1}{2}+h)}{\varGamma(\frac{n_{.}-1}{2}+h) \varGamma(\frac{n_{.}-1}{2}-\frac{1}{2}+h)}\end{aligned}$$

and therefore

$$\displaystyle \begin{aligned} E[(w^{\frac{1}{2}})^h]=C_{2,k}\frac{\varGamma(\frac{n_{.}-k}{2}+\frac{h}{2})\varGamma(\frac{n_{.}-k}{2}-\frac{1}{2}+\frac{h}{2})} {\varGamma(\frac{n_{.}-1}{2}+\frac{h}{2})\varGamma(\frac{n_{.}-1}{2}-\frac{1}{2}+\frac{h}{2})}. \end{aligned}$$
(ii)

The gamma functions in (ii) can be combined by making use of a duplication formula for gamma functions, namely,

$$\displaystyle \begin{aligned} \varGamma(z)\varGamma(z+{1}/{2})=\pi^{\frac{1}{2}}2^{1-2z}\varGamma(2z). {} \end{aligned} $$
(13.3.12)

Take \(z=\frac {n_{.}-k}{2}-\frac {1}{2}+\frac {h}{2}\) and \(z=\frac {n_{.}-1}{2}-\frac {1}{2}+\frac {h}{2}\) in the part containing h and in the constant part wherein h = 0, and then apply formula (13.3.12) to obtain

$$\displaystyle \begin{aligned}E[(w^{\frac{1}{2}})^h]=\frac{\varGamma(n_{.}-2)}{\varGamma(n_{.}-k-1)}\frac{\varGamma(n_{.}-k-1+h)}{\varGamma(n_{.}-2+h)}, \end{aligned}$$

which is, for an arbitrary h, the h-th moment of a real scalar type-1 beta random variable with parameters (n . − k − 1, k − 1) for n . − k − 1 > 0, k > 1. Thus, \(y=w^{\frac {1}{2}}\) is a real scalar type-1 beta random variable with the parameters (n . − k − 1, k − 1). We would then reject H o for small values of w, that is, for small values of y or when the observed value of y ≤ y α with y α such that \(\int _0^{y_{\alpha }}f_y(y)\text{d}y=\alpha \) for a preassigned probability of type-I error which is the error of rejecting H o when H o is true, where f y(y) is the density of y for p = 2 whenever n . > k + 1.

Case (3): k = 2, p ≥ 1, n . > p + 1

In this case, the h-th moment of w as specified in (13.3.7) is the following:

$$\displaystyle \begin{aligned} E[w^h]&=C_{p,2}\frac{\varGamma(\frac{n_{.}-2}{2}+h)\varGamma(\frac{n_{.}-2}{2}-\frac{1}{2}+h)\cdots\varGamma(\frac{n_{.}-2}{2}-\frac{p-1}{2}+h)} {\varGamma(\frac{n_{.}-1}{2}+h)\varGamma(\frac{n_{.}-1}{2}-\frac{1}{2}+h)\cdots\varGamma(\frac{n_{.}-1}{2}-\frac{p-1}{2}+h)}\\ &=C_{p,2}\frac{\varGamma(\frac{n_{.}-1}{2}-\frac{p}{2}+h)}{\varGamma(\frac{n_{.}-1}{2}+h)}\end{aligned} $$

since the numerator gamma functions, except the last one, cancel with the denominator gamma functions except the first one. This expression happens to be the h-th moment of a real scalar type-1 beta random variable with the parameters \((\frac {n_{.}-1-p}{2},\frac {p}{2})\) and hence, for k = 2, n . − 1 − p > 0 and p ≥ 1, w is a real scalar type-1 beta random variable. Then, we reject the null hypothesis H o for small values of w or when the observed value of w ≤ w α, with w α such that \(\int _0^{w_{\alpha }}f_w(w)\text{d}w=\alpha \) for a preassigned significance level α, f w(w) being the density of w for this case. We will use the same notation f w(w) for the density of w in all the special cases.

Case (4): k = 3, p ≥ 1

Proceeding as in Case (3), we see that all the gammas in the h-th moment of w cancel out except the last two in the numerator and the first two in the denominator. Thus,

$$\displaystyle \begin{aligned} E[w^h]&=C_{p,3}\frac{\varGamma(\frac{n_{.}-3}{2}+\frac{1}{2}-\frac{p-1}{2}+h)\varGamma(\frac{n_{.}-3}{2}-\frac{p-1}{2}+h)}{\varGamma(\frac{n_{.}-1}{2}+h) \varGamma(\frac{n_{.}-1}{2}-\frac{1}{2}+h)}\\ &=C_{p,3}\frac{\varGamma(\frac{n_{.}-1}{2}-\frac{p}{2}+h)\varGamma(\frac{n_{.}-1}{2}-\frac{p}{2}-\frac{1}{2}+h)}{\varGamma(\frac{n_{.}-1}{2}+h) \varGamma(\frac{n_{.}-1}{2}-\frac{1}{2}+h)}. \end{aligned} $$

After combining the gammas in \(y=w^{\frac {1}{2}}\) with the help of the duplication formula (13.3.12), we have the following:

$$\displaystyle \begin{aligned}E[y^h]=\frac{\varGamma(n_{.}-2)}{\varGamma(n_{.}-2-p)}\frac{\varGamma(n_{.}-p-2+h)}{\varGamma(n_{.}-2+h)}. \end{aligned}$$

Therefore, \(y=w^{\frac {1}{2}}\) is a real scalar type-1 beta random variable with the parameters (n . − p − 2, p). We reject the null hypothesis for small values of y or when the observed value of y ≤ y α, with y α such that \(\int _0^{y_{\alpha }}f_y(y)\text{d}y=\alpha \) for a preassigned significance level α. We will use the same notation f y(y) for the density of y in all special cases.

We can also obtain some special cases for \(t_1=\frac {1-w}{w}\) and \(t_2=\frac {1-y}{y},\) with \( y=\sqrt {w}\). With this transformation, t 1 and t 2 will be available in terms of type-2 beta variables in the real scalar case, which conveniently enables us to relate this distribution to real scalar F random variables so that an F table can be used for testing the null hypothesis and reaching a decision. We have noted that

$$\displaystyle \begin{aligned} w&=\frac{|V|}{|U+V|}=|(U+V)^{-\frac{1}{2}}V(U+V)^{-\frac{1}{2}}|=|W_1|\\ &=\frac{1}{|V^{-\frac{1}{2}}UV^{-\frac{1}{2}}+I|}=\frac{1}{|W_2+I|}\end{aligned} $$

where W 1 is a real matrix-variate type-1 beta random variable with the parameters \((\frac {n_{.}-k}{2},\frac {k-1}{2})\) and W 2 is a real matrix-variate type-2 beta random variable with the parameters \((\frac {k-1}{2},\frac {n_{.}-k}{2})\). When p = 1, W 1 and W 2 are real scalar variables, denoted by w 1 and w 2, respectively; in that case, there is a single gamma ratio involving h in the general h-th moment (13.3.7), and

$$\displaystyle \begin{aligned}t_1=\frac{1-w}{w}=\frac{1}{w}-1=(w_2+1)-1=w_2 \end{aligned}$$

where w 2 is a real scalar type-2 beta random variable with the parameters \((\frac {k-1}{2}, \frac {n_{.}-k}{2})\). As well, in general, for a real matrix-variate type-2 beta matrix W 2 with the parameters \((\frac {\nu _1}{2},\frac {\nu _2}{2}),\) we have \(\frac {\nu _2}{\nu _1}W_2=F_{\nu _1,\nu _2}\) where \(F_{\nu _1,\nu _2}\) is a real matrix-variate F matrix random variable with degrees of freedom ν 1 and ν 2. When p = 1 or in the real scalar case, \(\frac {\nu _2}{\nu _1}W_2=F_{\nu _1,\nu _2}\) where, in this case, F is a real scalar F random variable with ν 1 and ν 2 degrees of freedom. We have used F for the scalar and matrix-variate case in order to avoid too many symbols. For p = 2, we combine the gamma functions in the numerator and denominator by applying the duplication formula for gamma functions (13.3.12); then, for \(t_2=\frac {1-y}{y}\) the situation turns out to be the same as in the case of t 1, the only difference being that in the real scalar type-2 beta w 2, the parameters are (k − 1, n . − k − 1). Note that the original \(\frac {k-1}{2}\) has become k − 1 and the original \(\frac {n_{.}-k}{2}\) has become n . − k − 1. Thus, we can state the following two special cases.

Case (5): \(p=1,\ t_1=\frac {1-w}{w}\)

As was explained, t 1 is a real type-2 beta random variable with the parameters \((\frac {k-1}{2},\frac {n_{.}-k}{2})\), so that

$$\displaystyle \begin{aligned}\frac{n_{.}-k}{k-1}\,t_1\simeq F_{k-1,n_{.}-k}, \end{aligned}$$

which is a real scalar F random variable with k − 1 and n . − k degrees of freedom. Accordingly, we reject H o for small values of w and y, which corresponds to large values of F. Thus, we reject the null hypothesis H o whenever the observed value of \(F_{k-1,n_{.}-k}\ge F_{k-1,n_{.}-k,\alpha }\) where \(F_{k-1,n_{.}-k,\alpha }\) is the upper 100 α% percentage point of the F distribution or \(\int _{a}^{\infty }g(F)\text{d}F=\alpha \) where \(a=F_{k-1,n_{.}-k,\alpha }\) and g(F) is the density of F in this case.

Case (6): \(p=2,\ t_2=\frac {1-y}{y},\ y=\sqrt {w}\)

As previously explained, t 2 is a real scalar type-2 beta random variable with the parameters (k − 1, n . − k − 1) or

$$\displaystyle \begin{aligned}\frac{n_{.}-k-1}{k-1}\,t_2\simeq F_{2(k-1),2(n_{.}-k-1)}, \end{aligned}$$

which is a real scalar F random variable having 2(k − 1) and 2(n . − k − 1) degrees of freedom. We reject the null hypothesis for large values of t 2 or when the observed value of \([\frac {n_{.}-k-1}{k-1}]t_2\ge b\) with b such that \(\int _{b}^{\infty }g(F){\mathrm{d}}F=\alpha \), g(F) denoting in this case the density of a real scalar random variable F with degrees of freedom 2(k − 1) and 2(n . − k − 1), and \(b=F_{2(k-1),2(n_{.}-k-1),\alpha }\).

Case (7): \(k=2,\ p\ge 1,\ t_1=\frac {1-w}{w}\)

For the case k = 2, we have already seen that the gamma functions with h in their arguments cancel out, leaving only one gamma in the numerator and one gamma in the denominator, so that w is distributed as a real scalar type-1 beta random variable with the parameters \((\frac {n_{.}-1-p}{2}, \frac {p}{2})\). Thus, \(t_1=\frac {1-w}{w}\) is a real scalar type-2 beta with the parameters \((\frac {p}{2},\frac {n_.-p-1}{2})\), and

$$\displaystyle \begin{aligned}\Big[\frac{n_{.}-1-p}{p}\Big]t_1\simeq F_{p,n_{.}-1-p}, \end{aligned}$$

which is a real scalar F random variable having p and n . − 1 − p degrees of freedom. We reject H o for large values of t 1 or when the observed value of \([\frac {n_{.}-1-p}{p}]t_1\ge b\) where b is such that \(\int _b^{\infty }g(F){\mathrm{d}}F=\alpha \) with g(F) being the density of an F random variable with degrees of freedom p and n . − 1 − p in this special case.

Case (8): \(k=3,p\ge 1,\ t_2=\frac {1-y}{y},\ y=\sqrt {w}\)

On combining Cases (4) and (6), it is seen that t 2 is a real scalar type-2 beta random variable with the parameters (p, n . − p − 2), so that

$$\displaystyle \begin{aligned}\frac{n_{.}-p-2}{p}\,t_2\simeq F_{2p,2(n_{.}-p-2)}, \end{aligned}$$

which is a real scalar F random variable with 2p and 2(n . − p − 2) degrees of freedom. Thus, we reject the hypothesis for large values of this F random variable. For a test at significance level α or with α as the size of its critical region, the hypothesis H o : A 1 = A 2 = ⋯ = A k = O is rejected when the observed value of this \(F\ge F_{2p,2(n_{.}-p-2),\alpha }\) where \(F_{2p,2(n_{.}-p-2),\alpha }\) is the upper 100 α% percentage point of the F distribution.
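The four special cases can be collected into a single helper. The following Python sketch (assuming scipy is available) is only a convenience wrapper around the conversions of Cases (5) through (8); it turns an observed w into the corresponding exact F statistic and its degrees of freedom:

```python
import math
from scipy.stats import f

def manova_exact_F(w, p, k, n_dot):
    """Exact F statistic and degrees of freedom for the special cases of Sect. 13.3.3."""
    if p == 1:                                       # Case (5)
        t1 = (1 - w) / w
        return (n_dot - k) / (k - 1) * t1, k - 1, n_dot - k
    if p == 2:                                       # Case (6)
        y = math.sqrt(w)
        t2 = (1 - y) / y
        return (n_dot - k - 1) / (k - 1) * t2, 2 * (k - 1), 2 * (n_dot - k - 1)
    if k == 2:                                       # Case (7)
        t1 = (1 - w) / w
        return (n_dot - 1 - p) / p * t1, p, n_dot - 1 - p
    if k == 3:                                       # Case (8)
        y = math.sqrt(w)
        t2 = (1 - y) / y
        return (n_dot - p - 2) / p * t2, 2 * p, 2 * (n_dot - p - 2)
    raise ValueError("no elementary special case listed for this (p, k)")

# example: an observed w of 0.75 with p = 2, k = 3 and n_. = 15 (hypothetical values)
F_obs, df1, df2 = manova_exact_F(0.75, 2, 3, 15)
reject = F_obs >= f.ppf(0.95, df1, df2)              # decision at the 5% level
```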

Example 13.3.1

In a dieting experiment, three different diets D 1, D 2 and D 3 are tried for a period of one month. The variables monitored are weight in kilograms (kg), waist circumference in centimeters (cm) and right mid-thigh circumference in centimeters. The measurements are x 1 =  final weight minus initial weight, x 2 =  final waist circumference minus initial waist reading and x 3 =  final minus initial thigh circumference. Diet D 1 is administered to a group of 5 randomly selected individuals (n 1 = 5), D 2, to 4 randomly selected persons (n 2 = 4), and 6 randomly selected individuals (n 3 = 6) are subjected to D 3. Since three variables are monitored, p = 3. As well, there are three treatments or three diets, so that k = 3. In our notation,

the observation vector is \(X_{ij}=[x_{1ij},x_{2ij},x_{3ij}]'\), where i corresponds to the diet number and j stands for the sample serial number. For example, the observation vector on individual # 3 within the group subjected to diet D 2 is denoted by X 23. The following are the data on x 1, x 2, x 3:

Diet D 1 :  X 1j, j = 1, 2, 3, 4, 5 : 

Diet D 2 :  X 2j, j = 1, 2, 3, 4 : 

Diet D 3 :  X 3j, j = 1, 2, 3, 4, 5, 6 : 

(1): Perform an ANOVA test on the first component consisting of weight measurements; (2): Carry out a MANOVA test on the first two components, weight and waist measurements; (3): Do a MANOVA test on all the three variables, weight, waist and thigh measurements.

Solution 13.3.1

We first compute the vectors \(X_{1.},\bar {X}_1,X_{2.},\bar {X}_2,X_{3.},\bar {X}_3,X_{..}\) and \( \bar {X}\):

Problem (1): ANOVA on the first component x 1. The first components of the observations are x 1ij. The first components under diet D 1 are

$$\displaystyle \begin{aligned}{}[x_{111},x_{112},x_{113},x_{114},x_{115}]=[2,4,-1,-1,1] \ \text{with}\ x_{11.}=5; \end{aligned}$$

the first components of observations under diet D 2 are

$$\displaystyle \begin{aligned}{}[x_{121},x_{122},x_{123}, x_{124}]=[1,3,-1,1] \ \text{with}\ x_{12.}=4;\end{aligned}$$

and the first components under diet D 3 are

$$\displaystyle \begin{aligned}{}[x_{131},x_{132},x_{133},x_{134},x_{135},x_{136}]=[2,1,-1,2,2,0] \ \text{with}\ x_{13.}=6.\end{aligned}$$

Hence, the total on the first component x 1.. = 15, and \(\bar {x}_1=\frac {x_{1..}}{n_{.}}=\frac {15}{15}=1\). The first component model is the following:

$$\displaystyle \begin{aligned}x_{1ij}=\mu+\alpha_i+e_{1ij}, \ j=1,\ldots,n_i,\ i=1,\ldots,k. \end{aligned}$$

Note again that estimators and estimates will be denoted by a hat. As previously mentioned, the same symbols will be used for the variables and the observations on those variables in order to avoid using too many symbols; however, the notations will be clear from the context. If the discussion pertains to distributions, then variables are involved, and if we are referring to numbers, then we are dealing with observations.

The least squares estimates are \(\hat {\mu }=\frac {x_{1..}}{n_{.}}=1,\ \hat {\alpha }_1=\frac {x_{11.}}{5}-\hat{\mu}=\frac {5}{5}-1=0\), \(\hat {\alpha }_2=\frac {x_{12.}}{4}-\hat{\mu}=\frac {4}{4}-1=0\), \(\hat {\alpha }_3=\frac {x_{13.}}{6}-\hat{\mu}=\frac {6}{6}-1=0\). The first component hypothesis is α 1 = α 2 = α 3 = 0. The total sum of squares is

$$\displaystyle \begin{aligned} \sum_{ij}(x_{1ij}-\bar{x}_{1})^2&=\sum_{ij}x_{1ij}^2-\frac{x_{1..}^2}{n_{.}}\\ &=(2-1)^2+(4-1)^2+(-1-1)^2+(-1-1)^2+(1-1)^2\\ &\ \ \ \ +(1-1)^2+(3-1)^2+(-1-1)^2+(1-1)^2\\ &\ \ \ \ +(2-1)^2+(1-1)^2+(-1-1)^2+(2-1)^2+(2-1)^2+(0-1)^2\\ &=34. \end{aligned} $$

The sum of squares due to the α i’s is available from

$$\displaystyle \begin{aligned} \sum_in_i\Big(\frac{x_{1i.}}{n_i}-\frac{x_{1..}}{n_{.}}\Big)^2&=\sum_i\frac{x_{1i.}^2}{n_i}-\frac{x_{1..}^2}{n_{.}}\\ &=5\Big(\frac{5}{5}-\frac{15}{15}\,\Big)^2+4\Big(\frac{4}{4}-\frac{15}{15}\,\Big)^2+6\Big(\frac{6}{6}-\frac{15}{15}\,\Big)^2=0. \end{aligned} $$

Hence the following table:

ANOVA Table

Variation due to | df | SS | MS | F-ratio
diets | 2 | 0 | 0 | 0
residuals | 12 | 34 |  | 
total | 14 | 34 |  | 

Since the sum of squares due to the α i’s is null, the hypothesis is not rejected at any level.
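Since the first-component data are listed above, this ANOVA can be verified directly. A short Python check (assuming numpy) reproducing the entries 34 and 0 of the table:

```python
import numpy as np

d1 = np.array([2., 4., -1., -1., 1.])     # first components under diet D1
d2 = np.array([1., 3., -1., 1.])          # under D2
d3 = np.array([2., 1., -1., 2., 2., 0.])  # under D3
groups = [d1, d2, d3]

x = np.concatenate(groups)
grand = x.mean()                                                     # mu-hat = 1
total_ss = ((x - grand) ** 2).sum()                                  # 34
between_ss = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # 0
residual_ss = total_ss - between_ss                                  # 34
print(grand, total_ss, between_ss, residual_ss)
```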

Problem (2): MANOVA on the first two components. We are still using the notation X ij for the two and three-component cases since our general notation does not depend on the number of components in the vector concerned; as well, we can make use of the computations pertaining to the first component in Problem (1). The relevant quantities computed from the data on the first two components are the following:

In this case, the grand total, denoted by X .., and the grand average, denoted by \(\bar {X}\), are the following:

Note that the total sum of squares and cross products matrix can be written as follows:

$$\displaystyle \begin{aligned} \sum_{ij}(X_{ij}-\bar{X})(X_{ij}-\bar{X})'&=\sum_{j=1}^5(X_{1j}-\bar{X})(X_{1j}-\bar{X})'+\sum_{j=1}^4(X_{2j}-\bar{X})(X_{2j}-\bar{X})'\\ &\ \ \ \ +\sum_{j=1}^6(X_{3j}-\bar{X})(X_{3j}-\bar{X})'.\end{aligned} $$

Then,

Now, consider the residual sum of squares and cross products matrix:

$$\displaystyle \begin{aligned} \sum_{ij}(X_{ij}-\tfrac{X_{i.}}{n_i})&(X_{ij}-\tfrac{X_{i.}}{n_i})'=\sum_{j=1}^5(X_{1j}-\tfrac{X_{1.}}{n_1})(X_{1j}-\tfrac{X_{1.}}{n_1})'\\ &+\sum_{j=1}^4(X_{2j}-\tfrac{X_{2.}}{n_2})(X_{2j}-\tfrac{X_{2.}}{n_2})'+\sum_{j=1}^6(X_{3j}-\tfrac{X_{3.}}{n_3})(X_{3j}-\tfrac{X_{3.}}{n_3})'.\end{aligned} $$

That is,

Hence,

Therefore, the observed w is given by

$$\displaystyle \begin{aligned}w=\frac{1120}{1491.73}=0.7508,\ \sqrt{w}=0.8665. \end{aligned}$$

This is the case p = 2, k = 3, which falls under our special Case (6) (equivalently, Case (8)). Then, the observed value of

$$\displaystyle \begin{aligned}t_2=\tfrac{1-\sqrt{w}}{\sqrt{w}}=\frac{0.1335}{0.8665}=0.1540, \end{aligned}$$

and

$$\displaystyle \begin{aligned}\frac{n_{.}-k-1}{k-1}\ t_2=\frac{15-3-1}{2}(0.1540)=0.8470. \end{aligned}$$

Our F-statistic is \(F_{2(k-1),2(n_{.}-k-1)}=F_{4,22}\). Let us test the hypothesis A 1 = A 2 = A 3 = O at the 5% significance level or α = 0.05. Since the observed value 0.8470 < 2.82 ≈ F 4,22,0.05 which is available from F-tables, we do not reject the hypothesis.
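This decision can also be reproduced without an F-table; a short Python check (assuming scipy), starting from the observed determinant ratio:

```python
import math
from scipy.stats import f

n_dot, p, k = 15, 2, 3
w = 1120 / 1491.73                          # observed w = |S_r| / |S_t| from the text
y = math.sqrt(w)
t2 = (1 - y) / y                            # about 0.154
F_obs = (n_dot - k - 1) / (k - 1) * t2      # Case (6): about 0.85
df1, df2 = 2 * (k - 1), 2 * (n_dot - k - 1) # 4 and 22
critical = f.ppf(0.95, df1, df2)            # about 2.82 at the 5% level
print(F_obs, critical, F_obs >= critical)   # the hypothesis is not rejected
```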

Verification of the calculations

Denoting the total sum of squares and cross products matrix by S t, the residual sum of squares and cross products matrix by S r and the sum of squares and cross products matrix due to the hypothesis or due to the effects A i’s by S h, we should have S t = S r + S h where

as previously determined. Let us compute

$$\displaystyle \begin{aligned}S_h=\sum_{i=1}^kn_i\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)'.\end{aligned}$$

For the first two components, we already have the following:

Hence,

As \(34+\frac {164}{15}=\frac {674}{15}\), S t = S r + S h, that is,

Thus, the result is verified.

Problem (3): Data on all the three variables. In this case, we have p = 3, k = 3. We will first use \(X_{1.},\bar {X}_1,X_{2.},\bar {X}_2,X_{3.},\bar {X}_3,X_{..}\) and \(\bar {X}\) which have already been evaluated, to compute the residual sum of squares and cross product matrix. Since all the matrices are symmetric, for convenience, we will only display the diagonal elements and those above the diagonal. As in the case of two components, we compute the following, making use of the calculations already done for the 2-component case (the notations remaining the same since our general notation does not involve p):

Then,

whose determinant is equal to

The total sum of squares and cross products matrix is the following:

$$\displaystyle \begin{aligned} \sum_{ij}(X_{ij}-\bar{X})(X_{ij}-\bar{X})'&=\sum_{j=1}^5(X_{1j}-\bar{X})(X_{1j}-\bar{X})'+\sum_{j=1}^4(X_{2j}-\bar{X})(X_{2j}-\bar{X})'\\ &\ \ \ \ +\sum_{j=1}^6(X_{3j}-\bar{X})(X_{3j}-\bar{X})',\end{aligned} $$

with

Hence the total sum of squares and cross products matrix is

Then, the observed value of

$$\displaystyle \begin{aligned}w=\frac{15870\times 15}{342126}=0.6958,\ \sqrt{w}=0.8341.\end{aligned}$$

Since p = 3 and k = 3, an exact distribution is available from our special Case (8) for \(t_2=\frac {1-\sqrt {w}}{\sqrt {w}}\) and an observed value of t 2 = 0.1989. Then,

$$\displaystyle \begin{aligned}\frac{n_{.}-p-2}{p}\,t_2=\frac{15-3-2}{3}\,t_2=\frac{10}{3}\,t_2\sim F_{2p,2(n_{.}-p-2)}=F_{6,20}.\end{aligned}$$

The critical value obtained from an F-table at the 5% significance level is F 6,20,.05 ≈ 2.60. Since the observed value of F 6,20 is \(\frac {10}{3}(0.1989)=0.6630<2.60\), the hypothesis A 1 = A 2 = A 3 = O is not rejected. It can also be verified that S t = S r + S h.

13.3.4. Asymptotic distribution of the λ-criterion

We can obtain an asymptotic real chisquare distribution as n . →∞. To this end, consider the general h-th moment of λ or E[λ h] from (13.3.8), that is,

$$\displaystyle \begin{aligned} E[\lambda^h]&=C_{p,k}\prod_{j=1}^p\Big[\varGamma\Big(\frac{n_{.}-k}{2}-\frac{j-1}{2}+\frac{n_{.}}{2}h\Big)\Big/\varGamma\Big(\frac{n_{.}-1}{2}-\frac{j-1}{2}+\frac{n_{.}}{2}h\Big)\Big]\\ &=C_{p,k}\prod_{j=1}^p\Big[\varGamma\Big(\frac{n_{.}}{2}(1+h)-\frac{j-1}{2}-\frac{k}{2}\Big)/ \varGamma\Big(\frac{n_{.}}{2}(1+h)-\frac{j-1}{2}-\frac{1}{2}\Big)\Big].\end{aligned} $$

Let us expand all the gamma functions in E[λ h] by using the first term in the asymptotic expansion of a gamma function or by making use of Stirling’s approximation formula, namely,

$$\displaystyle \begin{aligned} \varGamma(z+\delta)\approx \sqrt{(2\pi)}z^{z+\delta-\frac{1}{2}}\text{e}^{-z} {} \end{aligned} $$
(13.3.13)

for |z|→ when δ is a bounded quantity. Taking \(\frac {n_{.}}{2}\to \infty \) in the constant part and \(\frac {n_{.}}{2}(1+h)\to \infty \) in the part containing h, we have

$$\displaystyle \begin{aligned} \varGamma\Big(\frac{n_{.}}{2}(1+h)-\frac{j-1}{2}-\frac{k}{2}\Big)&\Big/\varGamma\Big(\frac{n_{.}}{2}(1+h)-\frac{j-1}{2}-\frac{1}{2}\Big)\\ &\approx \big\{\sqrt{(2\pi)}[\tfrac{n_{.}}{2}(1+h)]^{\frac{n_{.}}{2}(1+h)-\frac{j-1}{2}-\frac{k}{2}-\frac{1}{2}}\text{e}^{-\frac{n_{.}}{2}(1+h)}\\ &\ \ \ \ \ \ \, \big/ \sqrt{(2\pi)}[\tfrac{n_{.}}{2}(1+h)]^{\frac{n_{.}}{2}(1+h)-\frac{j-1}{2}-\frac{1}{2}-\frac{1}{2}}\text{e}^{-\frac{n_{.}}{2}(1+h)}\big\}\\ &=(\tfrac{n_{.}}{2})^{-(\frac{k-1}{2})}(1+h)^{-(\frac{k-1}{2})}.\end{aligned} $$

The factor \((\tfrac {n_{.}}{2})^{-(\frac {k-1}{2})}\) is canceled by the corresponding factor arising from the constant part. Then, taking the product over j = 1, …, p, we have

$$\displaystyle \begin{aligned}E[\lambda^{h}]\to (1+h)^{-p(k-1)/2}\ \text{ or }\ E[\lambda^{-2h}]\to (1-2h)^{-p(k-1)/2}\ \text{ for } 1-2h>0,\end{aligned}$$

which is the moment generating function (mgf) of a real scalar chisquare with p(k − 1) degrees of freedom. Hence, we have the following result:

Theorem 13.3.1

Letting λ be the likelihood ratio criterion for testing the hypothesis H o : A 1 = A 2 = ⋯ = A k = O, the asymptotic distribution of \(-2\ln \lambda \) is a real chisquare random variable having p(k − 1) degrees of freedom as n . →∞, that is,

$$\displaystyle \begin{aligned} -2\ln\lambda\to\chi^2_{p(k-1)}\ \text{ as }\ n_{.}\to\infty. {} \end{aligned} $$
(13.3.14)

Observe that we only require the sum of the sample sizes n 1 + ⋯ + n k = n . to go to infinity, and not that the individual n j’s be large. This chisquare approximation can be utilized for testing the hypothesis for large values of n ., and we then reject H o for small values of λ, which means for large values of \(-2\ln \lambda \) or large values of \(\chi ^2_{p(k-1)}\), that is, when the observed \(-2\ln \lambda \ge \chi ^2_{p(k-1),\alpha }\) where \(\chi ^2_{p(k-1),\alpha }\) denotes the upper 100 α% percentage point of the chisquare distribution.
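As an illustration of how the approximation is used in practice, here is a minimal Python sketch (assuming scipy; w, p, k and n . are the inputs):

```python
import math
from scipy.stats import chi2

def asymptotic_lrt(w, p, k, n_dot, alpha=0.05):
    """-2 ln(lambda), with lambda = w**(n_./2), referred to chi-square with p(k-1) df."""
    statistic = -n_dot * math.log(w)                 # -2 ln(lambda) = -n_. ln(w)
    critical = chi2.ppf(1 - alpha, p * (k - 1))
    return statistic, critical, statistic >= critical

print(asymptotic_lrt(0.7508, p=2, k=3, n_dot=15))    # data of Problem (2)
```

For the data of Problem (2), for instance, this gives \(-15\ln (0.7508)\approx 4.30\), well below \(\chi ^2_{4,0.05}\approx 9.49\), in agreement with the exact test.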

13.3.5. MANOVA and testing the equality of population mean values

In a one-way classification model, we have the following for the p-variate case:

$$\displaystyle \begin{aligned} X_{ij}=M+A_i+E_{ij}\ \text{ or }\ X_{ij}=M_i+E_{ij},\ \text{with}\ M_i=M+A_i, {} \end{aligned} $$
(13.3.15)

for j = 1, …, n i, i = 1, …, k. When the error vector is assumed to have a null expected value, that is, E[E ij] = O for all i and j, we have E[X ij] = M i for all i and j. Thus, this assumption, in conjunction with the hypothesis A 1 = A 2 = ⋯ = A k = O, implies that M 1 = M 2 = ⋯ = M k; that is, the test is equivalent to testing the equality of the population mean value vectors in k independent populations sharing a common covariance matrix Σ > O. We have already tackled this problem in Chap. 6 under the assumptions that Σ is known and that Σ is unknown, when the populations are Gaussian, that is, X ij ∼ N p(M i, Σ), Σ > O. Thus, the hypothesis made in a one-way classification MANOVA setting and the hypothesis of testing the equality of mean value vectors in MANOVA are one and the same. In the scalar case too, the ANOVA of one-way classification data coincides with testing the equality of population mean values in k independent univariate populations. In the ANOVA case, we are comparing the sum of squares attributable to the hypothesis to the residual sum of squares. If the hypothesis really holds true, then the sum of squares due to the hypothesis or to the α j’s (the deviations from the general effect due to the treatments) should be negligible; hence, for large values of the sum of squares due to the presence of the α j’s, as compared to the residual sum of squares, we reject the hypothesis. In MANOVA, we are comparing two sums of squares and cross product matrices, namely,

$$\displaystyle \begin{aligned}U=\sum_{ij}\Big[\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big]\Big[\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big]'\ \text{ and }\ V =\sum_{ij}\Big[X_{ij}-\frac{X_{i.}}{n_i}\Big]\Big[X_{ij}-\frac{X_{i.}}{n_i}\Big]'.\end{aligned}$$

We have the following distributional properties:

$$\displaystyle \begin{aligned} T_1&=(U+V)^{-\tfrac{1}{2}}U(U+V)^{-\tfrac{1}{2}}\sim \text{ real matrix-variate type-1 beta with parameters}\\ &\qquad \qquad \qquad \qquad \qquad \qquad \quad \ (\tfrac{k-1}{2},\tfrac{n_{.}-k}{2});\\ T_2&=(U+V)^{-\tfrac{1}{2}}V(U+V)^{-\tfrac{1}{2}}\sim \text{ real matrix-variate type-1 beta with parameters}\\ &\qquad \qquad \qquad \qquad \qquad \qquad \quad \ (\tfrac{n_{.}-k}{2},\tfrac{k-1}{2});\\ T_3&=V^{-\tfrac{1}{2}}UV^{-\tfrac{1}{2}}\sim \text{ real matrix-variate type-2 beta with parameters }(\tfrac{k-1}{2},\tfrac{n_{.}-k}{2});\\ T_4&=U^{-\frac{1}{2}}VU^{-\frac{1}{2}}\sim \text{ real matrix-variate type-2 beta with parameters }(\tfrac{n_{.}-k}{2},\tfrac{k-1}{2}).{} \end{aligned} $$
(13.3.16)

The likelihood ratio criterion is

$$\displaystyle \begin{aligned} \lambda=\frac{|V|}{|U+V|}=|T_2|=\frac{1}{|T_3+I|}=\frac{1}{\prod_{j=1}^p(1+\eta_j)} {} \end{aligned} $$
(13.3.17)

where the η j’s are the eigenvalues of T 3. We reject H o for small values of λ, which means for large values of \(\prod _{j=1}^p[1+\eta _j]\). The basic objective in MANOVA consists of comparing U and V, the matrices due to the presence of treatment effects and due to the residuals, respectively. We can carry out this comparison by using the type-1 beta matrices T 1 and T 2 or the type-2 beta matrices T 3 and T 4 or by making use of the eigenvalues of these matrices. In the type-1 beta case, the eigenvalues will be between 0 and 1, whereas in the type-2 beta case, the eigenvalues will be positive. We may also note that the eigenvalues of T 1 and its nonsymmetric forms U(U + V )−1 or (U + V )−1 U are identical. Similarly, the eigenvalues of the symmetric form T 2 and V (U + V )−1 or (U + V )−1 V  are one and the same. As well, the eigenvalues of the symmetric form T 3 and the nonsymmetric forms UV −1 or V −1 U are the same. Again, the eigenvalues of the symmetric form T 4 and its nonsymmetric forms U −1 V  or VU −1 are the same. Several researchers have constructed tests based on the matrices T 1, T 2, T 3, T 4 or their nonsymmetric forms or their eigenvalues. Some of the well-known test statistics are the following:

$$\displaystyle \begin{aligned} \text{Lawley-Hotelling trace }&=\text{tr}(T_3)\\ \text{Roy's largest root }&=\text{the largest eigenvalue of }T_3\\ \text{Pillai's trace }&=\text{tr}(T_1)\\ \text{Wilks' lambda }&=|T_2|=\text{ the likelihood ratio statistic}.\end{aligned} $$

For example, when the hypothesis is true, we expect the eigenvalues of T 3 to be small and hence we may reject the hypothesis when its smallest eigenvalue is large or the trace of T 3 is large. If we are using T 4, then when the hypothesis is true, we expect T 4 to be large in the sense that the eigenvalues will be large, and therefore we may reject the hypothesis for small values of its largest eigenvalue or its trace. If we are utilizing T 1, we are actually comparing the contribution attributable to the treatments to the total variation. We expect this to be small under the hypothesis and hence, we may reject the hypothesis for large values of its smallest eigenvalue or its trace. If we are using T 2, we are comparing the residual part to the total variation. If the hypothesis is true, then we can expect a substantial contribution from the residual part so that we may reject the hypothesis for small values of the largest eigenvalue or the trace in this case. These are the main ideas in connection with constructing statistics for testing the hypothesis on the basis of the eigenvalues of the matrices T 1, T 2, T 3 and T 4.
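The following is a small numerical sketch, not taken from the text, of how U, V and the above criteria could be computed from k samples; the function name manova_statistics and the simulated data are our own, and Roy's largest root is taken here as the largest eigenvalue of T 3.

```python
# Sketch of the U, V matrices and the classical MANOVA criteria; `samples[i]` is an
# (n_i x p) array containing the n_i observation vectors on the i-th treatment.
import numpy as np

def manova_statistics(samples):
    X = np.vstack(samples)                      # pooled n_. x p data matrix
    grand_mean = X.mean(axis=0)                 # X_{..}/n_.
    p = X.shape[1]
    U = np.zeros((p, p))                        # SSP matrix due to treatments
    V = np.zeros((p, p))                        # residual SSP matrix
    for S in samples:
        d = (S.mean(axis=0) - grand_mean).reshape(-1, 1)
        U += S.shape[0] * (d @ d.T)
        R = S - S.mean(axis=0)
        V += R.T @ R
    eta = np.linalg.eigvals(np.linalg.solve(V, U)).real    # eigenvalues of T_3
    return {"Wilks lambda": 1.0 / np.prod(1.0 + eta),      # |V|/|U+V|, see (13.3.17)
            "Pillai trace": np.sum(eta / (1.0 + eta)),     # tr(T_1)
            "Lawley-Hotelling trace": np.sum(eta),         # tr(T_3)
            "Roy largest root": eta.max()}                 # largest eigenvalue of T_3

# Illustration on simulated trivariate samples of sizes 5, 4 and 5
rng = np.random.default_rng(0)
print(manova_statistics([rng.normal(size=(n, 3)) for n in (5, 4, 5)]))
```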

13.3.6. When H o is rejected

When H o : A 1 = ⋯ = A k = O is rejected, it is plausible that some of the differences may be non-null, that is, A i − A j ≠ O for some i and j, i ≠ j. We may then test individual hypotheses of the type H o1 : A i = A j for i ≠ j. There are k(k − 1)∕2 such differences. This type of test is equivalent to testing the equality of the mean value vectors in two independent p-variate Gaussian populations with the same covariance matrix Σ > O. This has already been discussed in Chap. 6 for the cases Σ known and Σ unknown. In this instance, we can use the special Case (7) with k = 2, where the statistic t 1 is real scalar type-2 beta distributed with the parameters \((\frac {p}{2}, \frac {n_{.}-1-p}{2})\), so that

$$\displaystyle \begin{aligned} \frac{n_{.}-1-p}{p}\,t_1\sim F_{p,n_{.}-1-p} {} \end{aligned} $$
(13.3.18)

where n . = n i + n j for some specific i and j. We can make use of (13.3.18) for testing individual hypotheses. By utilizing Special Case (8) for k = 3, we can also test a hypothesis of the type A i = A j = A m for different i, j, m. Instead of comparing the results of all the k(k − 1)∕2 individual hypotheses, we may examine the estimates of A i, namely, \(\hat {A}_i=\frac {X_{i.}}{n_i}-\frac {X_{..}}{n_{.}},\ i=1,\ldots ,k\). Consider the norms \(\Vert \frac {X_{i.}}{n_i}-\frac {X_{j.}}{n_j}\Vert ,\ i\ne j\) (the Euclidean norm may be taken for convenience). Start with the individual test corresponding to the maximum value of these norms. If this test is not rejected, it is likely that tests on all other differences will not be rejected either. If it is rejected, we then take the next largest difference and continue testing.
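As an illustration of (13.3.18), here is a hedged sketch, under our own naming, of the pairwise test of H o1 : A i = A j based on the two selected samples; it only uses quantities defined in the text (U, V, w and t 1).

```python
# Sketch of the pairwise comparison (13.3.18): form U and V from the two samples,
# compute w = |V|/|U+V|, t_1 = (1-w)/w and the F statistic with (p, n_. - 1 - p) df.
import numpy as np
from scipy.stats import f as f_dist

def pairwise_test(Si, Sj, alpha=0.05):
    n_i, p = Si.shape
    n_dot = n_i + Sj.shape[0]
    grand_mean = np.vstack([Si, Sj]).mean(axis=0)
    U = np.zeros((p, p)); V = np.zeros((p, p))
    for S in (Si, Sj):
        d = (S.mean(axis=0) - grand_mean).reshape(-1, 1)
        U += S.shape[0] * (d @ d.T)
        R = S - S.mean(axis=0)
        V += R.T @ R
    w = np.linalg.det(V) / np.linalg.det(U + V)
    t1 = (1.0 - w) / w
    F = (n_dot - 1 - p) / p * t1
    crit = f_dist.ppf(1.0 - alpha, p, n_dot - 1 - p)   # F_{p, n_.-1-p, alpha}
    return F, crit, F >= crit
```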

Note 13.3.1

Usually, before initiating a MANOVA, the assumption that the covariance matrices associated with the k populations or treatments are equal is tested. It may happen that the error vectors E 1j, j = 1, …, n 1, have the common covariance matrix Σ 1, the E 2j, j = 1, …, n 2, have the common covariance matrix Σ 2, and so on, where not all the Σ j’s are equal. In this instance, we may first test the hypothesis H o : Σ 1 = Σ 2 = ⋯ = Σ k. This test is already described in Chap. 6. If this hypothesis is not rejected, we may carry out the MANOVA analysis of the data. If this hypothesis is rejected, then some of the Σ j’s may not be equal. In this case, we test individual hypotheses of the type Σ i = Σ j for some specific i and j, i ≠ j. We include all the treatments for which the individual hypotheses of equality are not rejected and exclude the data on the treatments whose covariance matrices appear to differ from those of the retained treatments. We then continue with the MANOVA analysis of the data on the treatments which are retained, that is, those for which the Σ j’s may be regarded as equal in the sense that the corresponding tests of equality of covariance matrices did not reject the hypotheses.

Example 13.3.2

For the sake of illustration, test the hypothesis H o : A 1 = A 2 with the data provided in Example 13.3.1.

Solution 13.3.2

We can utilize some of the computations done in the solution to Example 13.3.1. Here, n 1 = 5, n 2 = 4 and n . = n 1 + n 2 = 9. We disregard the third sample. The residual sum of squares and cross products matrix in the present case is available from the Solution 13.3.1 by omitting the matrix corresponding to the third sample. Then,

$$\displaystyle \begin{aligned} \sum_{i=1}^2\sum_{j=1}^{n_i}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'&=\left[\begin{array}{rrr}18&-1&-2\\ &18&2\\ &&4\end{array}\right]+\left[\begin{array}{rrr}8&-6&-6\\ &6&7\\ &&10\end{array}\right]\\ &=\left[\begin{array}{rrr}26&-7&-8\\ -7&24&9\\ -8&9&14\end{array}\right]\end{aligned} $$

whose determinant is |V| = 5416.

Let us compute \(\sum _{i=1}^2\sum _{j=1}^{n_i}(X_{ij}-\frac {X_{..}}{n_{.}})(X_{ij}-\frac {X_{..}}{n_{.}})^{\prime }\):

Hence the sum

whose determinant is |U + V| = 8380.055.

So, the observed values are as follows:

$$\displaystyle \begin{aligned} w&=\frac{|V|}{|U+V|}=\frac{5416}{8380.055}=0.6463\\ t_1&=\frac{1-w}{w}=\frac{0.3537}{0.6463}=0.5473\\ \frac{n_{.}-1-p}{p}t_1&=\frac{5}{3}t_1=\frac{5}{3}(0.5473)=0.9122,\end{aligned} $$

and \(F_{p,n_{.}-1-p}=F_{3,5}\). Let us test the hypothesis at the 5% significance level. The critical value obtained from F-tables is F 3,5,0.05 = 5.41. Since the observed value of F is 0.9122 < 5.41, the hypothesis is not rejected. We expected this result because the hypothesis A 1 = A 2 = A 3 was not rejected. This example was mainly presented to illustrate the steps.
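The arithmetic above can be cross-checked with a few lines of code; the determinant values 5416 and 8380.055 are those reported in the solution, and the printed figures are meant only as a numerical verification.

```python
# Numerical cross-check of Solution 13.3.2 using the determinants given in the text.
from scipy.stats import f as f_dist

w = 5416 / 8380.055                  # |V| / |U + V|
t1 = (1 - w) / w
F = (9 - 1 - 3) / 3 * t1             # (n_. - 1 - p)/p * t_1 with n_. = 9, p = 3
print(round(w, 4), round(t1, 4), round(F, 4), round(f_dist.ppf(0.95, 3, 5), 2))
# w ≈ 0.646, t1 ≈ 0.547, F ≈ 0.912, F_{3,5,0.05} ≈ 5.41: H_o is not rejected
```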

13.4. MANOVA for Two-Way Classification Data

As was done previously for the one-way classification, we will revisit the real scalar variable case first. Thus, we consider the case of two sets of treatments, instead of the single set analyzed in Sect. 13.3. In an agricultural experiment, suppose that we are considering r fertilizers as the first set of treatments, say F 1, …, F r, along with a set of s different varieties of corn, V 1, …, V s, as the second set of treatments. A randomized block experiment belongs to this category. In this case, r blocks of land, which are homogeneous with respect to all factors that may affect the yield of corn, such as precipitation, fertility of the soil, exposure to sunlight, drainage, and so on, are selected. Fertilizers F 1, …, F r are applied to these r blocks at random, the first block receiving any one of F 1, …, F r, and so on. Each block is divided into s equivalent plots, all the plots being of the same size, shape, and so on. Then, the s varieties of corn are applied to each block at random, with one variety to each plot. Such an experiment is called a randomized block experiment. This experiment is then replicated t times. This replication is done so that possible interaction between fertilizers and varieties of corn could be tested. If the randomized block experiment is carried out only once, no interaction can be tested from such data because each plot will have only one observation. Interaction between the i-th fertilizer and j-th variety is a joint effect for the (F i, V j) combination, that is, the effect of F i on the yield varies with the variety of corn. For instance, an interaction will be present if the effect of F 1 is different when combined with V 1 or V 2. In other words, there are individual effects and joint effects, a joint effect being referred to as an interaction between the two sets of treatments. As an example, consider one set of treatments consisting of r different methods of teaching and a second set of treatments that could be s levels of previous exposure of the students to the subject matter.

13.4.1. The model in a two-way classification

The additive, fixed effect, two-way classification or two-way layout model with interaction is the following:

$$\displaystyle \begin{aligned} x_{ijk}=\mu+\alpha_i+\beta_j+\gamma_{ij}+e_{ijk},\ i=1,\ldots,r,\ j=1,\ldots,s,\ k=1,\ldots,t, {} \end{aligned} $$
(13.4.1)

where μ is a general effect, α i is the deviation from the general effect due to the i-th treatment of the first set, β j is the deviation from the general effect due to the j-th treatment of the second set, and γ ij is the interaction effect, that is, the joint effect of the i-th treatment of the first set and the j-th treatment of the second set. In a randomized block experiment, the treatments belonging to the first set are called “blocks” or “rows” and the treatments belonging to the second set are called “treatments” or “columns”; thus, the two sets correspond to rows, say R 1, …, R r, and columns, say C 1, …, C s. Then, γ ij is the deviation from the general effect due to the combination (R i, C j). The random component e ijk is the sum total of the contributions coming from all unknown factors, and x ijk is the observation resulting from the effect of the combination of treatments (R i, C j) at the k-th replication or k-th identical repetition of the experiment. In an agricultural setting, the observation may be the yield of corn whereas, in a teaching experiment, the observation may be the grade obtained by the “(i, j, k)”-th student. In a fixed effect model, all the parameters μ, α 1, …, α r, β 1, …, β s and the γ ij’s are assumed to be unknown constants. In a random effect model, α 1, …, α r or β 1, …, β s or both sets are assumed to be random variables. We assume that E[e ijk] = 0 and Var(e ijk) = σ 2 > 0 for all i, j, k, where E(⋅) denotes the expected value of (⋅). In the present discussion, we will only consider the fixed effect model. Under this model, the data are called two-way classification data or two-way layout data because they can be classified according to the two sets of treatments, “rows” and “columns”. Since we are not making any assumption about the distribution of the e ijk’s, and thereby that of the x ijk’s, we will apply the method of least squares to estimate the parameters.

13.4.2. Estimation of parameters in a two-way classification

The error sum of squares is

$$\displaystyle \begin{aligned}\sum_{ijk}e_{ijk}^2=\sum_{ijk}(x_{ijk}-\mu-\alpha_i-\beta_j-\gamma_{ij})^2. \end{aligned}$$

Our first objective consists of isolating the sum of squares due to interaction and testing the hypothesis of no interaction, that is, H o : γ ij = 0 for all i and j. If γ ij ≠ 0, part of the effect of the i-th row R i is mixed up with the interaction and similarly, part of the effect of the j-th column, C j, is intermingled with γ ij, so that no hypothesis can be tested on the α i’s and β j’s unless γ ij is zero or negligibly small or the hypothesis γ ij = 0 is not rejected. As well, on noting that in [μ + α i + β j + γ ij] the subscripts i and j appear either not at all, one at a time or both together, we may write μ + α i + β j + γ ij = m ij. Thus,

$$\displaystyle \begin{aligned} \sum_{ijk}e_{ijk}^2&=\sum_{ijk}(x_{ijk}-m_{ij})^2\Rightarrow \frac{\partial}{\partial m_{ij}}[e_{ijk}^2]=0\\ &\Rightarrow \sum_k(x_{ijk}-m_{ij})=0\Rightarrow x_{ij.}-t\,\hat{m}_{ij}=0{ or }\hat{m}_{ij}=\frac{x_{ij.}}{t}.\end{aligned} $$

We employ the standard notation in this area, namely that a summation over a subscript is denoted by a dot. Then, the least squares minimum under the general model or the residual sum of squares, denoted by s 2, is given by

$$\displaystyle \begin{aligned} s^2=\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{t}\Big)^2. {} \end{aligned} $$
(13.4.2)

Now, consider the hypothesis H o : γ ij = 0 for all i and j. Under this H o, the model becomes

$$\displaystyle \begin{aligned}x_{ijk}=\mu+\alpha_i+\beta_j+e_{ijk}{ or }\sum_{ijk}e_{ijk}^2=\sum_{ijk}(x_{ijk}-\mu-\alpha_i-\beta_j)^2.\end{aligned}$$

We differentiate this partially with respect to μ and α i for a specific i, and to β j for a specific j, and then equate the results to zero and solve to obtain estimates for μ, α i and β j. Since we have taken α i, β j and γ ij as deviations from the general effect μ, we may let α . = α 1 + ⋯ + α r = 0, β . = β 1 + ⋯ + β s = 0 and γ i. = 0, for each i and γ .j = 0 for each j, without any loss of generality. Then,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \mu}[e_{ijk}^2]=0&\Rightarrow \Big(\sum_{ijk} x_{ijk}\Big)-rst\mu-st\alpha_{.}-rt\beta_{.}=0\Rightarrow \hat{\mu}=\frac{x_{...}}{rst}\\ \frac{\partial}{\partial \alpha_i}[e_{ijk}^2]=0&\Rightarrow \sum_{jk}[x_{ijk}-\mu-\alpha_i-\beta_j]=0\\ &\Rightarrow x_{i..}-st\mu-st\alpha_i-t\beta_{.}=0\Rightarrow \hat{\alpha}_i=\frac{x_{i..}}{st}-\hat{\mu}\\ \frac{\partial}{\partial\beta_j}[e_{ijk}^2]=0&\Rightarrow \sum_{ik}[x_{ijk}-\mu-\alpha_i-\beta_j]=0\\ &\Rightarrow x_{.j.}-rt\mu-t\alpha_{.}-rt\beta_j=0\Rightarrow \hat{\beta}_j=\frac{x_{.j.}}{rt}-\hat{\mu}\ .\end{aligned} $$

Hence, the least squares minimum under the hypothesis H o, denoted by \(s_0^2\), is

$$\displaystyle \begin{aligned} s_0^2&=\sum_{ijk}\Big[\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)-\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)-\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)\Big]^2\\ &=\sum_{ijk}\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)^2-st\sum_i\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2-rt\sum_j\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2, \end{aligned} $$

the simplifications resulting from properties of summations with respect to subscripts. Thus, the sum of squares due to the hypothesis H o : γ ij = 0 for all i and j or the interaction sum of squares, denoted by \(s^2_{\gamma }\) is the following:

$$\displaystyle \begin{aligned} s_{\gamma}^2=s_0^2-s^2=\sum_{ijk}&\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)^2-st\sum_i\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2\\ &-rt\sum_j\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2-\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{t}\Big)^2,\end{aligned} $$

and since

$$\displaystyle \begin{aligned}\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{t}\Big)^2=\sum_{ijk}\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)^2-t\sum_{ij}\Big(\frac{x_{ij.}}{t}-\frac{x_{...}}{rst}\Big)^2,\end{aligned}$$

the sum of squares due to the hypothesis or attributable to the γ ij’s, that is, due to interaction is

$$\displaystyle \begin{aligned}s^2_{\gamma}=t\sum_{ij}\Big(\frac{x_{ij.}}{t}-\frac{x_{...}}{rst}\Big)^2-st\sum_i\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2 -rt\sum_j\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2.{}\end{aligned} $$
(13.4.3)

If the hypothesis γ ij = 0 is not rejected, the effects of the γ ij’s are deemed insignificant. Then, taking γ ij = 0 and testing the hypothesis α i = 0, i = 1, …, r, the sum of squares due to the α i’s, or sum of squares due to the rows, denoted by \(s^2_r \), is

$$\displaystyle \begin{aligned} s^2_r=\sum_{ijk}\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2=st\sum_{i=1}^r\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2. {} \end{aligned} $$
(13.4.4)

Similarly, the sum of squares attributable to the β j’s or due to the columns, denoted as \(s^2_c\), is

$$\displaystyle \begin{aligned} s^2_c=rt\sum_{j=1}^s\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2. {} \end{aligned} $$
(13.4.5)

Observe that the sum of squares due to rows plus the sum of squares due to columns, once added to the interaction sum of squares, is the subtotal sum of squares, denoted by \(s^2_{rc} =t\sum _{ij}\big (\frac {x_{ij.}}{t}-\frac {x_{...}}{rst}\big )^2\) or this subtotal sum of squares is partitioned into the sum of squares due to the rows, due to the columns and due to interaction. This is equivalent to an ANOVA on the subtotals ∑k x ijk or an ANOVA on a two-way classification with a single observation per cell. As has been pointed out, in that case, we cannot test for interaction, and moreover, this subtotal sum of squares plus the residual sum of squares is the grand total sum of squares. If we assume a normal distribution for the error terms, that is, \(e_{ijk}\overset {iid}{\sim }N_1(0,\sigma ^2),\ \sigma ^2>0\), for all i, j, k, then under the hypothesis H o : γ ij = 0, it can be shown that

$$\displaystyle \begin{aligned} \frac{s^2_{\gamma}}{\sigma^2}\sim\chi^2_{\nu},\ \ \nu =(rs-1)-(r-1)-(s-1)=(r-1)(s-1), {} \end{aligned} $$
(13.4.6)

and the residual variation s 2 has the following distribution whether H o holds or not:

$$\displaystyle \begin{aligned} \frac{s^2}{\sigma^2}\sim \chi^2_{\nu_1}, \ \ \nu_1=rst-1-(rs-1)=rs(t-1), {} \end{aligned} $$
(13.4.7)

where \(s^2_{\gamma }\) and s 2 are independently distributed. Then, under the hypothesis γ ij = 0 for all i and j or when this hypothesis is not rejected, it can be established that

$$\displaystyle \begin{aligned} \frac{s_r^2}{\sigma^2}\sim\chi^2_{r-1},\ \,\frac{s_c^2}{\sigma^2}\sim \chi^2_{s-1} {} \end{aligned} $$
(13.4.8)

and \(s_r^2\) and s 2 as well as \(s_c^2\) and s 2 are independently distributed whenever H o : γ ij = 0 is not rejected. Hence, under the hypothesis,

$$\displaystyle \begin{aligned} \frac{s^2_{\gamma}/(r-1)(s-1)}{s^2/(rs(t-1))}\sim F_{\nu,\,\nu_1},\,\ \nu=(r-1)(s-1),\,\nu_1=rs(t-1). {} \end{aligned} $$
(13.4.9)

The total sum of squares is \(\sum _{ijk}\big (x_{ijk}-\frac {x_{...}}{rst}\big )^2\). Thus, the first decomposition and the first part of ANOVA in this two-way classification scheme is the following:

$$\displaystyle \begin{aligned}{Total variation }={Variation due to the subtotals }+{Residual variation, }\end{aligned}$$

the second stage being

$$\displaystyle \begin{aligned} &{Variation due to the subtotals }={Variation due to the rows }\\&\qquad \qquad \qquad \qquad +{Variation due to the columns } +{Variation due to interaction},\end{aligned} $$

and the resulting ANOVA table is the following:

ANOVA Table for the Two-Way Classification

 

Variation due to | df (1) | SS (2) | MS (3) = (2)/(1)
rows | r − 1 | \(s^2_r=st\sum _{i=1}^r(\frac {x_{i..}}{st}-\frac {x_{...}}{rst})^2\) | \(s_r^2/(r-1)=D_1\)
columns | s − 1 | \(s^2_c=rt\sum _{j=1}^s(\frac {x_{.j.}}{rt}-\frac {x_{...}}{rst})^2\) | \(s^2_c/(s-1)=D_2\)
interaction | (r − 1)(s − 1) | \(s^2_{\gamma }\) | \(s_{\gamma }^2/[(r-1)(s-1)]=D_3\)
subtotal | rs − 1 | \(t\sum _{ij}(\frac {x_{ij.}}{t}-\frac {x_{...}}{rst})^2\) |
residuals | rs(t − 1) | s 2 | s 2∕[rs(t − 1)] = D
total | rst − 1 | \(\sum _{ijk}(x_{ijk}-\frac {x_{...}}{rst})^2\) |

where df designates the number of degrees of freedom, SS means sum of squares, MS stands for mean squares, the expressions for the residual sum of squares is given in (13.4.2), that for the interaction in (13.4.3), that for the rows in (13.4.4) and that for columns in (13.4.5), respectively. Note that we test the hypothesis on the α i’s and β j’s or row effects and column effects, only if the hypothesis γ ij = 0 is not rejected; otherwise there is no point in testing hypotheses on the α i’s and β j’s because they are confounded with the γ ij’s.
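The computations leading to this table are easily mechanized. The following is a minimal sketch, under assumed names of our own (the function two_way_anova and a data array x of shape (r, s, t)), of the decomposition in (13.4.2)–(13.4.5) and of the F-ratio (13.4.9).

```python
# Sketch of the two-way ANOVA decomposition with t replicates per cell;
# x[i, j, k] holds the observation x_{ijk}.
import numpy as np
from scipy.stats import f as f_dist

def two_way_anova(x, alpha=0.05):
    r, s, t = x.shape
    grand = x.mean()                              # x_{...}/(rst)
    row_means = x.mean(axis=(1, 2))               # x_{i..}/(st)
    col_means = x.mean(axis=(0, 2))               # x_{.j.}/(rt)
    cell_means = x.mean(axis=2)                   # x_{ij.}/t
    ss_row = s * t * np.sum((row_means - grand) ** 2)
    ss_col = r * t * np.sum((col_means - grand) ** 2)
    ss_sub = t * np.sum((cell_means - grand) ** 2)
    ss_int = ss_sub - ss_row - ss_col             # interaction SS, see (13.4.3)
    ss_res = np.sum((x - cell_means[:, :, None]) ** 2)
    F_int = (ss_int / ((r - 1) * (s - 1))) / (ss_res / (r * s * (t - 1)))
    crit = f_dist.ppf(1 - alpha, (r - 1) * (s - 1), r * s * (t - 1))
    return dict(ss_row=ss_row, ss_col=ss_col, ss_int=ss_int, ss_res=ss_res,
                F_interaction=F_int, F_critical=crit)
```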

13.5. Multivariate Extension of the Two-Way Layout

Instead of a single real scalar variable being studied, we consider a p × 1 vector of real scalar variables. In the multivariate two-way classification, the fixed effect model is the following:

$$\displaystyle \begin{aligned} X_{ijk}=M+A_i+B_j+\varGamma_{ij}+E_{ijk}, {} \end{aligned} $$
(13.5.1)

for i = 1, …, r, j = 1, …, s, k = 1, …, t, where M, A i, B j, Γ ij and E ijk are all p × 1 vectors. In this case, M is a general effect, A i is the deviation from the general effect due to the i-th row, B j is the deviation from the general effect due to the j-th column, Γ ij is the deviation from the general effect due to interaction between the rows and the columns, and E ijk is the random or error component vector. In a two-way layout, two sets of treatments are tested; for convenience, the first set of treatments is referred to as rows and the second set, as columns. As in the scalar case of Sect. 13.4, we can assume, without any loss of generality, that \(\sum _iA_i=A_1+\cdots +A_r=A_{.}=O,\ B_{.}=O,\ \sum _{i=1}^r\varGamma _{ij}=\varGamma _{.j}=O \) and \( \sum _{j=1}^s\varGamma _{ij}=\varGamma _{i.}=O\). At this juncture, the procedures are parallel to those developed in Sect. 13.4 for the real scalar variable case. Instead of sums of squares, we now have sums of squares and cross products matrices. As before, we may write M ij = M + A i + B j + Γ ij. Then, the trace of the error sum of squares and cross products matrix \(\sum_{ijk}E_{ijk}E_{ijk}^{\prime }\) is minimized. Using the vector derivative operator, we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial M_{ij}}\text{tr}\Big[\sum_{ijk}E_{ijk}E_{ijk}^{\prime}\Big]=O&\Rightarrow \sum_k(X_{ijk}-M_{ij})=O\\ &\Rightarrow \hat{M}_{ij}=\frac{1}{t}X_{ij.}\,,\end{aligned} $$

so that the residual sum of squares and cross products matrix, denoted by S res, is

$$\displaystyle \begin{aligned} S_{res}=\sum_{ijk}\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)'. {} \end{aligned} $$
(13.5.2)

All other derivations are analogous to those provided in the real scalar case. The sum of squares and cross products matrix due to interaction, denoted by S int is the following:

$$\displaystyle \begin{aligned} S_{int}&=t\sum_{ij}\Big(\frac{X_{ij.}}{t}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{ij.}}{t}-\frac{X_{...}}{rst}\Big)'\\ &\ \ \ \ -st\sum_i\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)'\\ &\ \ \ \ -rt\sum_j\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)'.{} \end{aligned} $$
(13.5.3)

The sum of squares and cross products matrices due to the rows and columns are respectively given by

$$\displaystyle \begin{aligned} S_{row}&=st\sum_{i=1}^r\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)',{} \end{aligned} $$
(13.5.4)
$$\displaystyle \begin{aligned} S_{col}&=rt\sum_{j=1}^s\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)'.{} \end{aligned} $$
(13.5.5)

The sum of squares and cross products matrix for the subtotal is denoted by S sub = S row + S col + S int. The total sum of squares and cross products matrix, denoted by S tot, is the following:

$$\displaystyle \begin{aligned} S_{tot}=\sum_{ijk}\Big(X_{ijk}-\frac{X_{...}}{rst}\Big)\Big(X_{ijk}-\frac{X_{...}}{rst}\Big)'. {} \end{aligned} $$
(13.5.6)
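The SSP matrices (13.5.2)–(13.5.6) can be computed directly from the data; the following is a small sketch under our own conventions (a function two_way_manova_ssp and an array X of shape (r, s, t, p) with X[i, j, k] = X_{ijk}).

```python
# Sketch of the two-way MANOVA sums of squares and cross products matrices.
import numpy as np

def two_way_manova_ssp(X):
    r, s, t, p = X.shape
    grand = X.mean(axis=(0, 1, 2))                      # X_{...}/(rst)
    row_m = X.mean(axis=(1, 2)) - grand                 # deviations X_{i..}/(st) - grand
    col_m = X.mean(axis=(0, 2)) - grand                 # deviations X_{.j.}/(rt) - grand
    cell_m = X.mean(axis=2)                             # X_{ij.}/t
    S_row = s * t * (row_m.T @ row_m)                   # (13.5.4)
    S_col = r * t * (col_m.T @ col_m)                   # (13.5.5)
    D = (cell_m - grand).reshape(-1, p)
    S_sub = t * (D.T @ D)                               # subtotal SSP matrix
    S_int = S_sub - S_row - S_col                       # equivalent to (13.5.3)
    R = (X - cell_m[:, :, None, :]).reshape(-1, p)
    S_res = R.T @ R                                     # (13.5.2)
    T = (X - grand).reshape(-1, p)
    S_tot = T.T @ T                                     # (13.5.6)
    return S_row, S_col, S_int, S_res, S_tot
```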

We may now construct the MANOVA table. The following abbreviations are used: df stands for degrees of freedom of the corresponding Wishart matrix, SSP means the sum of squares and cross products matrix, MS stands for mean squares and is equal to SSP/df, and S row, S col, S res and S tot are respectively specified in (13.5.4), (13.5.5), (13.5.2) and (13.5.6).

MANOVA Table for a Two-Way Layout

 

Variation due to | df (1) | SSP (2) | MS (3) = (2)/(1)
rows | r − 1 | S row | S row∕(r − 1)
columns | s − 1 | S col | S col∕(s − 1)
interaction | (r − 1)(s − 1) | S int | S int∕[(r − 1)(s − 1)]
subtotal | rs − 1 | S sub |
residuals | rs(t − 1) | S res | S res∕[rs(t − 1)]
total | rst − 1 | S tot |

13.5.1. Likelihood ratio test for multivariate two-way layout

Under the assumption that the error or random components \(E_{ijk}\overset {iid}{\sim } N_p(O,\varSigma ),\ \varSigma >O\) for all i, j and k, the exponential part of the multivariate normal density excluding \(-\frac {1}{2}\) is obtained as follows:

$$\displaystyle \begin{aligned} E_{ijk}^{\prime}\varSigma^{-1}E_{ijk}&=(X_{ijk}-M-A_i-B_j-\varGamma_{ij})'\varSigma^{-1}(X_{ijk}-M-A_i-B_j-\varGamma_{ij})\\ &=\text{tr}[\varSigma^{-1}(X_{ijk}-M-A_i-B_j-\varGamma_{ij})(X_{ijk}-M-A_i-B_j-\varGamma_{ij})']\Rightarrow\\ \sum_{ijk}E_{ijk}^{\prime}\varSigma^{-1}E_{ijk}&=\text{tr}\Big\{\varSigma^{-1}\Big[\sum_{ijk}(X_{ijk}-M-A_i-B_j-\varGamma_{ij})(X_{ijk}-M-A_i-B_j-\varGamma_{ij})'\Big]\Big\}.\end{aligned} $$

Thus, the joint density of all the X ijk’s, denoted by L, is

$$\displaystyle \begin{aligned} L&=\frac{1}{(2\pi)^{\frac{prst}{2}}|\varSigma|{}^{\frac{rst}{2}}}\\ &\ \ \ \ \times\text{e}^{-\frac{1}{2}\text{tr}[\varSigma^{-1}\sum_{ijk}(X_{ijk}-M-A_i-B_j-\varGamma_{ij})(X_{ijk}-M-A_i-B_j-\varGamma_{ij})']}.\end{aligned} $$

The maximum likelihood estimates of M, A i, B j and Γ ij are the same as the least squares estimates and hence, the maximum likelihood estimator (MLE) of Σ is S res∕(rst), where the least squares minimum S res is the residual sum of squares and cross products matrix (in the present notation), that is,

$$\displaystyle \begin{aligned} S_{res}&=\sum_{ijk}(X_{ijk}-\hat{M}-\hat{A}_i-\hat{B}_j-\hat{\varGamma}_{ij})(X_{ijk}-\hat{M}-\hat{A}_i-\hat{B}_j-\hat{\varGamma}_{ij})'\\ &=\sum_{ijk}\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)'.{} \end{aligned} $$
(13.5.7)

This is the sample sum of squares and cross products matrix under the general model and its determinant raised to the power of \(\frac {rst}{2}\) is the quantity appearing in the numerator of the likelihood ratio criterion λ. Consider the hypothesis H o : Γ ij = O for all i and j. Then, under this hypothesis, the MLE of Σ is S 0∕(rst), where

$$\displaystyle \begin{aligned} S_0&=\sum_{ijk}\Big[\Big(X_{ijk}-\frac{X_{...}}{rst}\Big)-\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)-\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big]\\ &\ \ \ \ \ \, \times\Big[\Big(X_{ijk}-\frac{X_{...}}{rst}\Big)-\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)-\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big]'\end{aligned} $$

and \(|S_0|{ }^{\frac {rst}{2}}\) is the quantity appearing in the denominator of λ. However, S 0 − S res = S int is the sum of squares and cross products matrix due to the interaction terms Γ ij’s or to the hypothesis, so that S 0 = S res + S int. Therefore, λ is given by

$$\displaystyle \begin{aligned} \lambda=\frac{|S_{res}|{}^{\frac{rst}{2}}}{|S_{res}+S_{int}|{}^{\frac{rst}{2}}} {} \end{aligned} $$
(13.5.8)

Letting \(w=\lambda ^{\frac {2}{rst}}\),

$$\displaystyle \begin{aligned} w=\frac{|S_{res}|}{|S_{res}+S_{int}|}. {} \end{aligned} $$
(13.5.9)

It follows from results derived in Chap. 5 that S res ∼ W p(rs(t − 1), Σ), S int ∼ W p((r − 1)(s − 1), Σ) under the hypothesis and S res and S int are independently distributed and hence, under H o,

$$\displaystyle \begin{aligned} W=(S_{res}+S_{int})^{-\frac{1}{2}}S_{res}(S_{res}+S_{int})^{-\frac{1}{2}}\sim { real matrix-variate type-1 beta random variable} \end{aligned}$$

with the parameters \((\frac {rs(t-1)}{2}, \frac {(r-1)(s-1)}{2})\). As well,

$$\displaystyle \begin{aligned}W_1=S_{res}^{-\frac{1}{2}}\,S_{int}\,S_{res}^{-\frac{1}{2}}\sim { real matrix-variate type-2 beta random variable}\end{aligned}$$

with the parameters \((\frac {(r-1)(s-1)}{2},\frac {rs(t-1)}{2})\). Under H o, the h-th arbitrary moments of w and λ, which are readily obtained from those of a real matrix-variate type-1 beta variable, are

$$\displaystyle \begin{aligned} E[w^h]&=\Big\{\prod_{j=1}^p\frac{\varGamma(\nu_1+\nu_2-\frac{j-1}{2})}{\varGamma(\nu_1-\frac{j-1}{2})}\Big\} \Big\{\prod_{j=1}^p\frac{\varGamma(\nu_1+h-\frac{j-1}{2})}{\varGamma(\nu_1+\nu_2+h-\frac{j-1}{2})}\Big\}{} \end{aligned} $$
(13.5.10)
$$\displaystyle \begin{aligned} E[\lambda^h]&=\Big\{\prod_{j=1}^p\frac{\varGamma(\nu_1+\nu_2-\frac{j-1}{2})}{\varGamma(\nu_1-\frac{j-1}{2})}\Big\} \Big\{\prod_{j=1}^p\frac{\varGamma(\nu_1+\frac{rst}{2}h-\frac{j-1}{2})}{\varGamma(\nu_1+\nu_2+\frac{rst}{2}h-\frac{j-1}{2})}\Big\} {} \end{aligned} $$
(13.5.11)

where \(\nu _1=\frac {rs(t-1)}{2} \) and \( \nu _2=\frac {(r-1)(s-1)}{2}\). Note that we reject the null hypothesis H o : Γ ij = O, i = 1, …, r, j = 1, …, s, for small values of w and λ. As explained in Sect. 13.3, the exact general density of w, whose h-th moment is given in (13.5.10), can be expressed in terms of a G-function, and the exact general density of λ, whose h-th moment is given in (13.5.11), can be written in terms of an H-function. For the theory and applications of the G-function and the H-function, the reader may respectively refer to Mathai (1993) and Mathai et al. (2010).

13.5.2. Asymptotic distribution of λ in the MANOVA two-way layout

Consider the arbitrary h-th moment specified in (13.5.11). On expanding all the gamma functions for large values of rst in the constant part and for large values of rst(1 + h) in the functional part, by applying Stirling’s formula, that is, the first term in the asymptotic expansion of a gamma function as given in (13.3.13), it can be verified that the h-th moment of λ behaves asymptotically as follows:

$$\displaystyle \begin{aligned} E[\lambda^h]\to (1+h)^{-\frac{p(r-1)(s-1)}{2}}\Rightarrow -2\ln \lambda\to\chi^2_{p(r-1)(s-1)}{ as }rst\to\infty. {} \end{aligned} $$
(13.5.12)

Thus, for large values of rst, one can utilize this real scalar chisquare approximation for testing the hypothesis H o : Γ ij = O for all i and j. The exact distribution of w can also be worked out for numerous special values of r, s, t and p. Observe that

$$\displaystyle \begin{aligned} E[w^h]=C\prod_{j=1}^p\frac{\varGamma(\frac{rs(t-1)}{2}-\frac{j-1}{2}+h)}{\varGamma(\frac{rs(t-1)}{2}+\frac{(r-1)(s-1)}{2}-\frac{j-1}{2}+h)} {} \end{aligned} $$
(13.5.13)

where C is the normalizing constant such that E[w h] = 1 when h = 0. Thus, when (r − 1)(s − 1)∕2 is a positive integer, that is, when r or s is odd, the gamma functions cancel out, leaving a number of factors in the denominator which can be written as a sum by applying the partial fractions technique. For small values of p, the exact density will then be expressible as a sum involving only a few terms. For larger values of p, there will be repeated factors in the denominator, which complicates matters.
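As a computational note, and under our own naming, the interaction test based on (13.5.9) and the approximation (13.5.12) can be sketched as follows; for moderate rst, the exact special cases treated in the next subsection may be preferable.

```python
# Sketch of the approximate interaction test: w = |S_res|/|S_res + S_int| and
# -2 ln(lambda) = -rst ln(w), compared with the chi^2_{p(r-1)(s-1)} critical value.
import numpy as np
from scipy.stats import chi2

def interaction_test(S_res, S_int, r, s, t, alpha=0.05):
    p = S_res.shape[0]
    w = np.linalg.det(S_res) / np.linalg.det(S_res + S_int)
    stat = -r * s * t * np.log(w)
    crit = chi2.ppf(1 - alpha, p * (r - 1) * (s - 1))
    return w, stat, crit, stat >= crit
```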

13.5.3. Exact densities of w in some special cases

We will consider several special cases of the h-th moment of w as given in (13.5.13).

Case (1): p = 1. In this case, the h-th moment becomes

$$\displaystyle \begin{aligned}E[w^h]=C_1\frac{\varGamma(\frac{rs(t-1)}{2}+h)}{\varGamma(\frac{rs(t-1)}{2}+\frac{(r-1)(s-1)}{2}+h)}\end{aligned}$$

where C 1 is the associated normalizing constant. This is the h-th moment of a real scalar type-1 beta random variable with the parameters \((\frac {rs(t-1)}{2},\frac {(r-1)(s-1)}{2})\). Hence \(y=\frac {1-w}{w}\) is a real scalar type-2 beta with parameters \((\frac {(r-1)(s-1)}{2}, \frac {rs(t-1)}{2})\), and

$$\displaystyle \begin{aligned}\frac{rs(t-1)}{(r-1)(s-1)}\, y\sim F_{(r-1)(s-1),rs(t-1)}.\end{aligned}$$

Accordingly, the test can be carried out by using this F-statistic. One would reject the null hypothesis H o : Γ ij = O if the observed F ≥ F (r−1)(s−1),rs(t−1),α where F (r−1)(s−1),rs(t−1),α is the upper 100 α% percentile of this F-density. For example, for r = 2, s = 3, t = 3 and α = 0.05, we have F 2,12,0.05 = 3.89 from F-tables so that H o would be rejected if the observed value of F 2,12 ≥ 3.89 at the specified significance level.
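For instance, Case (1) can be coded in a few lines; the function below is our own illustration, and the last line reproduces the critical value quoted above.

```python
# Sketch of the Case (1) test (p = 1): convert w into an F statistic with
# (r-1)(s-1) and rs(t-1) degrees of freedom.
from scipy.stats import f as f_dist

def case1_F_test(w, r, s, t, alpha=0.05):
    y = (1 - w) / w
    F = r * s * (t - 1) / ((r - 1) * (s - 1)) * y
    crit = f_dist.ppf(1 - alpha, (r - 1) * (s - 1), r * s * (t - 1))
    return F, crit, F >= crit

print(round(f_dist.ppf(0.95, 2, 12), 2))   # F_{2,12,0.05} ≈ 3.89 for r = 2, s = 3, t = 3
```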

Case (2): p = 2. In this case, we have a ratio of two gamma functions differing by \(\frac {1}{2}\). Combining the gamma functions in the numerator and in the denominator by using the duplication formula and proceeding as in Sect. 13.3 for the one-way layout, one arrives at the statistic \(t_1=\frac {1-\sqrt {w}}{\sqrt {w}}\) for which

$$\displaystyle \begin{aligned}\frac{rs(t-1)-1}{(r-1)(s-1)}\,t_1\sim F_{2(r-1)(s-1),2(rs(t-1)-1)}, \end{aligned}$$

so that the decision can be made as in Case (1).

Case (3): (r − 1)(s − 1) = 1 ⇒ r = 2, s = 2. In this case, all the gamma functions in (13.5.13) cancel out except the last one in the numerator and the first one in the denominator. This gamma ratio is that of a real scalar type-1 beta random variable with the parameters \((\frac {rs(t-1)+1-p}{2},\frac {p}{2})\), and hence \(y=\frac {1-w}{w}\) is a real scalar type-2 beta so that

$$\displaystyle \begin{aligned}\frac{rs(t-1)+1-p}{p}\,y\sim F_{p,rs(t-1)+1-p}, \end{aligned}$$

and a decision can be made by making use of this F distribution as in Case (1).

Case (4): (r − 1)(s − 1) = 2. In this case,

$$\displaystyle \begin{aligned}E[w^h]=C_1\prod_{j=1}^p\frac{1}{\frac{rs(t-1)}{2}-\frac{j-1}{2}+h}\end{aligned}$$

with the corresponding normalizing constant C 1. This product of p factors can be expressed as a sum by using partial fractions. That is,

$$\displaystyle \begin{aligned} E[w^h]=C_1\sum_{j=0}^{p-1}\frac{b_j}{a+h-\frac{j}{2}} \end{aligned}$$
(i)

where

$$\displaystyle \begin{aligned} b_j&=\lim_{a+h\to\tfrac{j}{2}}\big[(a+h)(a+h-\tfrac{1}{2})\cdots (a+h-\tfrac{j-1}{2})(a+h-\tfrac{j+1}{2})\cdots (a+h-\tfrac{p-1}{2})\big]^{-1}, \end{aligned} $$
(ii)
$$\displaystyle \begin{aligned} a&=\frac{rs(t-1)}{2}.\end{aligned} $$

Thus, the density of w, denoted by f w(w), which is available from (i) and (ii), is the following:

$$\displaystyle \begin{aligned}f_w(w)=C_1\sum_{j=0}^{p-1}b_jw^{a-\frac{j}{2}-1},\ 0\le w\le 1,\end{aligned}$$

and zero elsewhere. Some additional special cases could be worked out but the expressions would become complicated. For large values of rst, one can apply the asymptotic chisquare result given in (13.5.12) for testing the hypothesis H o : Γ ij = O.
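A short numerical check of this partial fraction representation, with hypothetical values r = 2, s = 3, t = 4 and p = 3 of our own choosing, confirms that the density integrates to one.

```python
# Check of the Case (4) density: a = rs(t-1)/2, b_j = 1/prod_{i != j}((j - i)/2),
# and C_1 is fixed by the requirement E[w^0] = 1.
import numpy as np

p, r, s, t = 3, 2, 3, 4
a = r * s * (t - 1) / 2.0
C1 = np.prod([a - j / 2.0 for j in range(p)])          # normalizing constant
b = [1.0 / np.prod([(j - i) / 2.0 for i in range(p) if i != j]) for j in range(p)]
integral = C1 * sum(b[j] / (a - j / 2.0) for j in range(p))
print(b, integral)                                      # integral is 1 up to rounding
```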

Example 13.5.1

An experiment is conducted among heart patients to stabilize their systolic pressure, diastolic pressure and heart rate or pulse around the standard numbers which are 120, 80 and 60, respectively. A random sample of 24 patients who may be considered homogeneous with respect to all factors of variation, such as age, weight group, race, gender, dietary habits, and so on, is selected. These 24 individuals are randomly divided into two groups of equal size. One group of 12 subjects is given the medication combination Med-1 and the other 12 are administered the medication combination Med-2. Then, the Med-1 group is randomly divided into three subgroups of 4 subjects. These subgroups are assigned the exercise routines Ex-1, Ex-2, Ex-3. Similarly, the Med-2 group is also divided at random into 3 subgroups of 4 individuals who are respectively subjected to the exercise routines Ex-1, Ex-2, Ex-3. After one week, the following observations are made: x 1 =  current reading on systolic pressure minus 120, x 2 =  current reading on diastolic pressure minus 80, x 3 =  current reading on heart rate minus 60. The two-way layout thus consists of r = 2 rows (medication combinations) and s = 3 columns (exercise routines), with t = 4 observation vectors per cell.

Let X ijk be the k-th vector in the i-th row (i-th medication) and j-th column (j-th exercise routine). For convenience, the data are presented in matrix form:

$$\displaystyle \begin{aligned}A_{11}&=[X_{111},X_{112},X_{113},X_{114}],\ A_{12}=[X_{121},X_{122},X_{123},X_{124}],\\ A_{13}&=[X_{131},X_{132},X_{133},X_{134}],\,\, A_{21}=[X_{211},X_{212},X_{213},X_{214}],\\ A_{22}&=[X_{221},X_{222},X_{223},X_{224}],\,\ \, A_{23}=[X_{231},X_{232},X_{233},X_{234}];\end{aligned} $$
$$\displaystyle \begin{aligned} A_{11}&=\begin{array}{rrrr}-2&3&2&5\\ 1&-1&1&-1\\ 2&-1&-1&0\end{array},~\ A_{12}=\begin{array}{rrrr}1&4&-1&4\\ -2&-2&-3&3\\ 3&-2&-1&0\end{array},~\ A_{13}=\,\ \begin{array}{rrrr}4&-3&3&4\\ 2&-3&2&3\\ 1&-1&1&-1\end{array},\\ A_{21}&=\ \ \begin{array}{rrrr}2&0&-2&0\\ 1&4&1&2\\ 2&1&-1&-2\end{array},~\ \ \ \ A_{22}=\ \ \begin{array}{rrrr}3&-1&-1&3\\ 4&4&0&0\\ 0&1&-1&4\end{array},~\ A_{23}=\begin{array}{rrrr}-2&-1&0&-1\\ 1&4&0&3\\ -2&0&-2&0\end{array}.\end{aligned} $$

(1) Perform a two-way ANOVA on the first component, namely, x 1, the current systolic pressure reading minus 120; (2) Carry out a MANOVA on the full data.

Solution 13.5.1

We need the following quantities:

By using the first elements in all these vectors, we will carry out a two-way ANOVA and answer the first question. Since these are all observations on real scalar variables, we will utilize lower-case letters to indicate scalar quantities. Thus, we have the following values:

$$\displaystyle \begin{aligned} s^2_{tot}&=\sum_{ijk}(x_{ijk}-\bar{x})^2=(-2-1)^2+(3-1)^2+\cdots+(-1-1)^2=136,\\ s^2_{row}&=12\sum_{i=1}^2\Big(\frac{x_{i..}}{12}-1\Big)^2=12[(2-1)^2+(0-1)^2]=24,\\ s^2_{col}&=8\sum_{j=1}^3\Big(\frac{x_{.j.}}{8}-1\Big)^2=8\Big[(1-1)^2+\Big(\frac{3}{2}-1\Big)^2+\Big(\frac{1}{2}-1\Big)^2\Big]=4,\\ s^2_{int}&=4\sum_{ij}\Big(\frac{x_{ij.}}{4}-\frac{x_{i..}}{12}-\frac{x_{.j.}}{8}+1\Big)^2\\ &=4\Big[\Big(\frac{8}{4}-\frac{24}{12}-\frac{8}{8}+1\Big)^2+\cdots+\Big(-\frac{4}{4}-0-\frac{4}{8}+1\Big)^2\Big]=4,\\ s^2_{sub}&=4\sum_{ij}\Big(\frac{x_{ij.}}{4}-\frac{x_{...}}{24}\Big)^2=4\Big[\Big(\frac{8}{4}-1\Big)^2+\cdots+\Big(-\frac{4}{4}-1\Big)^2\Big]=32,\end{aligned} $$
$$\displaystyle \begin{aligned} s^2_{res}&=\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{4}\Big)^2=(-2-2)^2+\cdots+(-1+1)^2=104,\\ s^2_{tot}&=\sum_{ijk}\Big(x_{ijk}-\frac{x_{...}}{24}\Big)^2=(-2-1)^2+ (3-1)^2 +\cdots+(-1-1)^2=136.\end{aligned} $$

All quantities have been calculated separately in order to verify the computations. We could have obtained the interaction sum of squares from the subtotal sum of squares minus the sum of squares due to rows and columns. Similarly, we could have obtained the residual sum of squares from the total sum of squares minus the subtotal sum of squares. We will set up the ANOVA table, where, as usual, df stands for degrees of freedom, SS means sum of squares and MS denotes mean squares:

ANOVA Table for a Two-Way Layout with Interaction

 

Variation due to | df (1) | SS (2) | MS (3) = (2)/(1) | F-ratio
rows | 1 | 24 | 24 | 24/5.78
columns | 2 | 4 | 2 | 2/5.78
interaction | 2 | 4 | 2 | 2/5.78
subtotal | 5 | 32 | |
residuals | 18 | 104 | 5.78 |
total | 23 | 136 | |

For testing the hypothesis of no interaction, the F critical value at the 5% significance level is F 2,18,0.05 = 3.55. The observed value of this F 2,18 being \(\frac {2}{5.78}\approx 0.35<3.55\), the hypothesis of no interaction is not rejected. Thus, we can test for the significance of the row and column effects. Consider the hypothesis α 1 = α 2 = 0. Then, under this hypothesis and the no interaction hypothesis, the F-ratio for the row sum of squares is 24∕5.78 ≈ 4.15 < 4.41 = F 1,18,0.05, the tabulated value of F 1,18 at α = 0.05. Therefore, this hypothesis is not rejected. Now, consider the hypothesis β 1 = β 2 = β 3 = 0. Since under this hypothesis and the hypothesis of no interaction, the F-ratio for the column sum of squares is \(\frac {2}{5.78}\approx 0.35<3.55=F_{2,18,0.05}\), it is not rejected either. Thus, the data show no significant interaction between exercise routine and medication, and no significant effect of the exercise routines or the two combinations of medications in bringing the systolic pressures closer to the standard value of 120.
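The sums of squares in the table above can be reproduced directly from the first components of the data matrices; the following check, written by us, uses only the figures given in the example.

```python
# Numerical check of the first-component (x_1) ANOVA in Example 13.5.1.
import numpy as np

# First rows of A_11, A_12, A_13 (Med-1) and A_21, A_22, A_23 (Med-2)
x = np.array([[[-2, 3, 2, 5], [1, 4, -1, 4], [4, -3, 3, 4]],
              [[ 2, 0, -2, 0], [3, -1, -1, 3], [-2, -1, 0, -1]]], dtype=float)
r, s, t = x.shape
grand = x.mean()
cell = x.mean(axis=2); row = x.mean(axis=(1, 2)); col = x.mean(axis=(0, 2))
ss_row = s * t * np.sum((row - grand) ** 2)          # 24
ss_col = r * t * np.sum((col - grand) ** 2)          # 4
ss_sub = t * np.sum((cell - grand) ** 2)             # 32
ss_int = ss_sub - ss_row - ss_col                    # 4
ss_res = np.sum((x - cell[:, :, None]) ** 2)         # 104
ss_tot = np.sum((x - grand) ** 2)                    # 136
print(ss_row, ss_col, ss_int, ss_res, ss_tot)
```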

We now carry out the computations needed to perform a MANOVA on the full data. We employ our standard notation by denoting vectors and matrices by capital letters. The sum of squares and cross products matrices for the rows and columns are the following, respectively denoted by S row and S col :

$$\displaystyle \begin{aligned} S_{col}&=rt\sum_{j=1}^3\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)'\\ &=8\Big\{O+\frac{1}{4}\left[\begin{array}{rrr}1&-1&1\\ -1&1&-1\\ 1&-1&1\end{array}\right]+\frac{1}{4}\left[\begin{array}{rrr}1&-1&1\\ -1&1&-1\\ 1&-1&1\end{array}\right]\Big\}=\left[\begin{array}{rrr}4&-4&4\\ -4&4&-4\\ 4&-4&4\end{array}\right],\end{aligned} $$

We can verify the computations done so far as follows. The sum of squares and cross product matrices ought to be such that S row + S col + S int = S sub. These are

Hence the result is verified. Now, the total and residual sums of squares and cross product matrices are

and

Then,

The above results are included in the following MANOVA table where df means degrees of freedom, SSP denotes a sum of squares and cross products matrix and MS is equal to SSP divided by the corresponding degrees of freedom:

MANOVA Table for a Two-Way Layout with Interaction

 

Variation due to | df (1) | SSP (2) | MS (3) = (2)/(1)
rows | 1 | S row | S row
columns | 2 | S col | \(\frac {1}{2}S_{col}\)
interaction | 2 | S int | \(\frac {1}{2}S_{int}\)
subtotal | 5 | S sub |
residuals | 18 | S res | \(\frac {1}{18}S_{res}\)
total | 23 | S tot |

Then, the λ-criterion is

$$\displaystyle \begin{aligned}\lambda=\frac{|S_{res}|{}^{\frac{rst}{2}}}{|S_{res}+S_{int}|{}^{\frac{rst}{2}}}\ \Rightarrow\ w=\frac{|S_{res}|}{|S_{res}+S_{int}|}. \end{aligned}$$

The determinants are as follows:

Therefore,

$$\displaystyle \begin{aligned}w=\frac{363436}{409320}=0.888\ \Rightarrow\ \ln w=-0.118783\ \Rightarrow \ -2\ln \lambda =24(0.118783)=2.8508.\end{aligned}$$

Explicit simple representations of the exact density of w were obtained in Sect. 13.5.3 for certain special cases. For the present data, we will rely on the chisquare approximation, which is available for large values of rst, even though our rst is only equal to 24. In this instance, \(-2\ln \lambda \to \chi ^2_{p(r-1)(s-1)}=\chi ^2_{6}\) as rst →∞. However, since the observed value of \(-2\ln \lambda =2.8508\) happens to be much smaller than the critical value resulting from the asymptotic distribution, which is \(\chi ^2_{6,0.05}=12.59\) in this case, we can still safely decide not to reject the hypothesis H o : Γ ij = O for all i and j, and go ahead and test for the main row and column effects, that is, the main effects of the medication combinations Med-1 and Med-2 and the main effects of the exercise routines Ex-1, Ex-2 and Ex-3. For testing the row effect, our hypothesis is A 1 = A 2 = O and for testing the column effect, it is B 1 = B 2 = B 3 = O, given that Γ ij = O for all i and j. The corresponding likelihood ratio criteria are respectively,

$$\displaystyle \begin{aligned}\lambda_1=\Big(\frac{|S_{res}|}{|S_{res}+S_{row}|}\Big)^{\frac{rst}{2}}\ \text{and}\ \ \lambda_2=\Big(\frac{|S_{res}|}{|S_{res}+S_{col}|}\Big)^{\frac{rst}{2}}, \end{aligned}$$

and we may utilize \(w_j=\lambda _j^{\frac {2}{rst}},\ j=1,2\). From previous calculations, we have

The required determinants are as follows:

$$\displaystyle \begin{aligned} |S_{res}|&=363436,\ |S_{res}+S_{row}|=675724,\ |S_{res}+S_{col}|=443176\Rightarrow\\ w_1&=\frac{363436}{675724}=0.5378468 \ \text{and }\ w_2=\frac{363436}{443176}=0.8200714,\\ -2\ln\lambda_1&=-24\ln 0.5378468=24(0.62018)=14.88,\\ &\chi^2_{p(r-1),\alpha}=\chi^2_{3,0.05}=7.81<14.88; \end{aligned} $$
(i)
$$\displaystyle \begin{aligned} -2\ln\lambda_2&=-24\ln 0.8200714=24(0.19836)=4.76,\\ &\chi^2_{p(s-1),\alpha}=\chi^2_{6,0.05}=12.59>4.76.\end{aligned} $$
(ii)

When \(rst\to \infty ,-2\ln \lambda _1\to \chi ^2_{p(r-1)}\) and \( -2\ln \lambda _2\to \chi ^2_{p(s-1)}\), referring to Exercises 13.9 and 13.10, respectively. These results follow from the asymptotic expansion provided in Sect. 12.5.2. Even though rst = 24 is not that large, we may use these chisquare approximations for making decisions as the exact densities of w 1 and w 2 do not fall into the special cases previously discussed. When making use of the likelihood ratio criterion, we reject the hypotheses A 1 = A 2 = O and B 1 = B 2 = B 3 = O for small values of λ 1 and λ 2, respectively, which translates into large values of the approximate chisquare statistics. It is seen from (i) that the observed value of \(-2\ln \lambda _1\) is larger than the tabulated critical value and hence we reject the hypothesis A 1 = A 2 = O at the 5% level. However, the hypothesis B 1 = B 2 = B 3 = O is not rejected since the observed value is less than the critical value. We may conclude that the present data do not show any evidence of interaction between the exercise routines and medication combinations, that the exercise routines do not contribute significantly to bringing the subjects’ initial readings closer to the standard values (120, 80, 60), whereas there is a possibility that the medication combinations Med-1 and Med-2 are effective in significantly causing the subjects’ initial readings to approach the standard values.
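The three chisquare comparisons above are easily verified numerically; the determinants used below are the ones reported in the solution.

```python
# Check of the -2 ln(lambda) statistics for the interaction, row and column tests.
import math
from scipy.stats import chi2

rst, p, r, s = 24, 3, 2, 3
tests = [("interaction", 363436 / 409320, p * (r - 1) * (s - 1)),
         ("rows",        363436 / 675724, p * (r - 1)),
         ("columns",     363436 / 443176, p * (s - 1))]
for name, w, df in tests:
    stat = -rst * math.log(w)                     # -2 ln(lambda) = -rst ln(w)
    crit = chi2.ppf(0.95, df)
    print(name, round(stat, 2), round(crit, 2), stat >= crit)
# Expected output: roughly 2.85 < 12.59, 14.88 >= 7.81, 4.76 < 12.59
```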

Note 13.5.1

It may be noticed from the MANOVA table that the second stage analysis will involve one observation per cell in a two-way layout, that is, the (i, j)-th cell will contain only one observation vector X ij. for the second stage analysis. Thus, S sub = S int + S row + S col (the corresponding sum of squares in the real scalar case), and in this analysis with a single observation per cell, S int acts as the residual sum of squares and cross products matrix (the residual sum of squares in the real scalar case). Accordingly, “interaction” cannot be tested when there is only a single observation per cell.

Exercises

13.1

In the ANOVA table obtained in Example 13.5.1, prove that (1) the sum of squares due to interaction and the residual sum of squares, (2) the sum of squares due to rows and residual sum of squares, (3) the sum of squares due to columns and residual sum of squares, are independently distributed under the normality assumption for the error variables, that is, \(e_{ijk}\overset {iid}{\sim } N_1(0,\sigma ^2),\ \sigma ^2>0\).

13.2

In the MANOVA table obtained in Example 13.5.1, prove that (1) S int and S res, (2) S row and S res, (3) S col and S res, are independently distributed Wishart matrices when \(E_{ijk}\overset {iid}{\sim } N_p(O,\varSigma ),\ \varSigma >O\).

13.3

In a one-way layout, the following are the data on four treatments. (1) Carry out a complete ANOVA on the first component (including individual comparisons if the hypothesis of equality of treatment effects is rejected). (2) Perform a full MANOVA on the full data.

13.4

Carry out a full one-way MANOVA on the following data:

13.5

The following are the data on a two-way layout where A ij denotes the data on the i-th row and j-th column cell. (1) Perform a complete ANOVA on the first component. (2) Carry out a full MANOVA on the full data. (3) Verify that S row + S col + S int = S sub and S sub + S res = S tot, (4) Evaluate the exact density of w.

$$\displaystyle \begin{aligned} A_{11}&=\begin{array}{rrrr}2&1&2&-1\\ 1&2&1&0\\ -1&3&1&4\end{array},\ A_{12}=\begin{array}{rrrr}1&3&1&1\\ 4&4&1&1\\ 6&3&-1&0\end{array},\ A_{13}=\begin{array}{rrrr}1&-1&1&-1\\ -2&1&2&1\\ 1&-2&1&-2\end{array},\\ A_{21}&=\begin{array}{rrrr}3&4&2&3\\ 2&-1&2&3\\ 3&5&2&2\end{array},\ A_{22}=\begin{array}{rrrr}1&-1&1&-1\\ 0&1&-1&1\\ 1&1&1&1\end{array},\ A_{23}=\begin{array}{rrrr}2&3&3&0\\ 3&2&-1&2\\ 3&-3&2&4\end{array}. \end{aligned} $$

13.6

Carry out a complete MANOVA on the following data where A ij indicates the data in the i-th row and j-th column cell.

$$\displaystyle \begin{aligned} &A_{11}=\begin{array}{rrrrr}1&-1&1&1&1\\ 3&2&1&4&2\end{array},\ A_{12}=\begin{array}{rrrrr}4&3&4&2&2\\ 5&6&7&2&1\end{array},\ A_{13}=\begin{array}{rrrrr}5&3&5&4&5\\ 5&4&2&4&5\end{array},\\ &A_{21}=\begin{array}{rrrrr}0&1&1&0&-2\\ 2&2&0&-1&-1\end{array}, \ A_{22}=\begin{array}{rrrrr}6&7&6&4&5\\ 4&5&2&3&4\end{array},\ A_{23}=\begin{array}{rrrrr}1&0&1&-1&-1\\ 1&-1&2&1&2\end{array}.\end{aligned} $$

13.7

Under the hypothesis A 1 = ⋯ = A r = O, prove that \(U_1=(S_{res}+S_{row})^{-\frac {1}{2}}S_{res}(S_{res}+S_{row})^{-\frac {1}{2}}\) is a real matrix-variate type-1 beta with the parameters \((\frac {rs(t-1)}{2},\frac {r-1}{2})\) for r ≥ p, rs(t − 1) ≥ p, when the hypothesis Γ ij = O for all i and j is not rejected or assuming that Γ ij = O. The determinant of U 1 appears in the likelihood ratio criterion in this case.

13.8

Under the hypothesis B 1 = ⋯ = B s = O when the hypothesis Γ ij = O is not rejected, or assuming that Γ ij = O, prove that \(U_2=(S_{res}+S_{col})^{-\frac {1}{2}}S_{res}(S_{res}+S_{col})^{-\frac {1}{2}}\) is a real matrix-variate type-1 beta random variable with the parameters \((\frac {rs(t-1)}{2},\ \frac {s-1}{2})\) for s ≥ p, rs(t − 1) ≥ p. The determinant of U 2 appears in the likelihood ratio criterion for testing the main effect B j = O, j = 1, …, s.

13.9

Show that when \(rst\to \infty , -2\ln \lambda _1\to \chi ^2_{p(r-1)}\), that is, \( -2\ln \lambda _1\) asymptotically tends to a real scalar chisquare having p(r − 1) degrees of freedom, where λ 1 = |U 1| and U 1 is as defined in Exercise 13.7. [Hint: Look into the general h-th moment of λ 1 in this case, which can be evaluated by using the density of U 1]. Hence for large values of rst, one can use this approximate chisquare distribution for testing the hypothesis A 1 = ⋯ = A r = O.

13.10

Show that when \(rst\to \infty , -2\ln \lambda _2\to \chi ^2_{p(s-1)},\) that is, \(-2\ln \lambda _2\) asymptotically converges to a real scalar chisquare having p(s − 1) degrees of freedom, where λ 2 = |U 2| with U 2 as defined in Exercise 13.8. [Hint: Look at the h-th moment of λ 2]. For large values of rst, one can utilize this approximate chisquare distribution for testing the hypothesis B 1 = ⋯ = B s = O.