Finding a significant *F*-ratio when you are comparing more
than two groups only tells you that at least one of the groups is significantly
different from at least one other group. To find out which groups are different
from which other groups, you have to conduct additional analyses, which are
referred to as **probing** (short for probing the results of the ANOVA).

There are two general approaches to probing the results of a
significant ANOVA. The first is to decide before the study is conducted, based on a theoretical
analysis of the study, which means should be significantly different from which other means.
This approach involves performing **planned comparisons**.
This is the preferred approach if you have a strong theoretical reason to expect
a specific pattern of results. The other approach is **post hoc tests**,
which are not planned in advance, but rather conducted "after the
fact" to see which means are different from which other means. We will
cover both in this section.

Planned comparisons also go by the name of **contrasts**,
a term that technically refers to the weighted sum of means that defines the planned
comparison. Let's use an example to explain what that means. Suppose that you
have four groups in your study and that one of your hypotheses is that groups 3
and 4 should differ from one another. This hypothesis, formulated before the
study is ever run, is the basis for a planned comparison. To test this
hypothesis, we need to set up a contrast in the following form. We will use the
letter C with a caret over it to indicate our contrast. The caret simply means that our contrast is based on estimates of population means
from our samples, rather than being based on the actual population means. So
the equation for the contrast (for four groups) looks like
this. Each *w* in the equation is the weight that is multiplied by its
particular mean.
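
Written out symbolically, a contrast of this kind for four groups is a weighted sum of the four sample means. The notation here is an assumed one, with $\bar{X}_j$ standing for the sample mean of group *j* and $w_j$ for its weight:

$$
\hat{C} = w_1\bar{X}_1 + w_2\bar{X}_2 + w_3\bar{X}_3 + w_4\bar{X}_4
$$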

We can rewrite that equation for the contrast in a more general
form, as shown below. In this equation, we sum across the number of groups *(k)*,
multiplying each of the means by the appropriate weight.
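
Using an assumed notation in which $\bar{X}_j$ is the sample mean of group *j* and $w_j$ is its weight, the general form for *k* groups can be sketched as:

$$
\hat{C} = \sum_{j=1}^{k} w_j \bar{X}_j
$$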

Now, how do we determine the set of appropriate weights for a given
contrast? The appropriate weights are those that define the comparison that you
want to make and at the same time sum to zero. In our example of a planned
comparison, we want to compare groups 3 and 4, but we must have a weight to
multiply against each of the four means. The way to *exclude* groups 1 and 2
from the analysis is to give both of them a weight of 0. Then if we give group 3
a weight of 1, the weight for group 4 would have to be a -1 in order for the sum
of the weights to equal zero, as shown here.
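
For the comparison of groups 3 and 4, the check that the weights sum to zero can be written out as follows, with $w_j$ as an assumed symbol for the weight given to group *j*:

$$
\sum_{j=1}^{4} w_j = 0 + 0 + 1 + (-1) = 0
$$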

Any comparison that we can specify in words can be specified with a set of weights that meet the criterion that the sum of the weights is zero. For example, if we wanted to compare the average of groups 1 and 3 against the average of groups 2 and 4, we could use the weights +1, -1, +1, and -1 for groups 1 through 4, respectively. By the way, any set of weights that defines the expected relationship and sums to zero will work for the contrast. So we could just as easily have used +2, -2, +2, and -2 or -.25, +.25, -.25, and +.25. The signs can be reversed and the numbers can be any multiple or fraction of a set of weights that work. It is convenient to use integers, but not necessary.

We can also define other contrasts. For example, we might hypothesize that group 4 will be different from the average of groups 1 and 2. Because group 3 is excluded from this hypothesis, its weight must be zero. Groups 1 and 2 are to be averaged, so they must have the same weight. If we arbitrarily give them weights of +1 and +1, then the weight of group 4 must be -2, because that is the only weight that will produce a sum of weights that is equal to zero.
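
The weight sets discussed above can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function name `is_valid_contrast` and the variable names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def is_valid_contrast(weights):
    """Return True when the weights sum to zero and are not all zero."""
    return sum(weights) == 0 and any(w != 0 for w in weights)


# Weight sets described in the text, for groups 1 through 4.
group3_vs_group4 = [0, 0, 1, -1]
avg13_vs_avg24 = [1, -1, 1, -1]
group4_vs_avg12 = [1, 1, 0, -2]

# Multiplying every weight by the same nonzero number (here -1/4)
# reverses the signs and rescales, but the sum stays at zero.
rescaled = [Fraction(-1, 4) * w for w in avg13_vs_avg24]

for name, weights in [
    ("group 3 vs. group 4", group3_vs_group4),
    ("groups 1 and 3 vs. groups 2 and 4", avg13_vs_avg24),
    ("group 4 vs. groups 1 and 2", group4_vs_avg12),
    ("rescaled by -1/4", rescaled),
]:
    print(name, [str(w) for w in weights], is_valid_contrast(weights))
```

Exact fractions are used for the rescaled weights so that the zero-sum check does not depend on floating-point rounding.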

So getting back to our initial example of a planned comparison (comparing groups 3 and 4), our contrast is computed using the equation below.
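
With the weights 0, 0, +1, and -1, and with $\bar{X}_j$ as an assumed symbol for the sample mean of group *j*, the contrast reduces to the difference between the means of groups 3 and 4:

$$
\hat{C} = (0)\bar{X}_1 + (0)\bar{X}_2 + (1)\bar{X}_3 + (-1)\bar{X}_4 = \bar{X}_3 - \bar{X}_4
$$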

To test this planned comparison, you will compute a *t*
using the following equation.
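
The standard form of the *t*-test for a contrast is sketched here. The symbols are assumptions about notation: $w_j$ is the weight for group *j*, $n_j$ is the sample size of group *j*, and $MS_{w}$ is the Mean Square within groups.

$$
t = \frac{\hat{C}}{\sqrt{MS_{w}\displaystyle\sum_{j=1}^{k}\frac{w_j^{2}}{n_j}}}
$$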

This equation looks complicated, but you are already familiar
with each of the terms. The numerator is the contrast that you just computed.
The denominator is a function of the Mean Square within groups (MS_{w}), which is
routinely computed for the ANOVA. The denominator is also a function of the weights used in the computation of the
contrast and of the sample sizes for each of the groups. The computed value of *t*
has to be compared with the critical value, and to determine the critical value,
you must know the degrees of freedom and decide on your alpha level. The degrees
of freedom are equal to *N-k*, which happens to be df_{w}.
So you can get that value from the ANOVA summary table.
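
To make the computation concrete, here is a minimal Python sketch of the *t* for a planned comparison, assuming the standard formula in which the contrast is divided by the square root of MS_{w} times the sum of each squared weight over its group's sample size. The function name `contrast_t` and all of the numbers are hypothetical, invented only for illustration.

```python
import math


def contrast_t(means, weights, ns, ms_within):
    """Return (t, df_within) for a planned comparison.

    means, weights, and ns each hold one value per group;
    ms_within is the Mean Square within groups from the ANOVA table.
    """
    k = len(means)
    if len(weights) != k or len(ns) != k:
        raise ValueError("means, weights, and ns must have the same length")
    contrast = sum(w * m for w, m in zip(weights, means))
    std_error = math.sqrt(ms_within * sum(w ** 2 / n for w, n in zip(weights, ns)))
    df_within = sum(ns) - k  # N - k
    return contrast / std_error, df_within


# Hypothetical data: four groups of 10 participants each.
means = [10.0, 12.0, 15.0, 11.0]
ns = [10, 10, 10, 10]
ms_within = 8.0
weights = [0, 0, 1, -1]  # compare group 3 with group 4

t_value, df = contrast_t(means, weights, ns, ms_within)
print(f"t({df}) = {t_value:.3f}")
```

The resulting *t* would then be compared with the critical value for the printed degrees of freedom at the chosen alpha level.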

If you want to see how you set up a planned comparison using *SPSS
for Windows*, click on the link below. Use the browser's back arrow key to
return to this page.

Compute a Planned Comparison using SPSS

You need not restrict yourself to a single planned comparison, but there is a requirement that all planned comparisons be independent of one another. Independence in this case is defined statistically. If two comparisons are independent, the product of the weights for the two contrasts, summed across the groups, is equal to zero, as shown in the equation below.
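
In symbols, with $w_{1j}$ and $w_{2j}$ as assumed names for the weights that the first and second contrasts give to group *j*, the independence condition can be written as shown here. This simple form is the one usually stated for groups of equal size.

$$
\sum_{j=1}^{k} w_{1j}\,w_{2j} = 0
$$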

This principle of independence is best illustrated with a couple of examples. Our example of a planned comparison used the following weights: 0, 0, +1, and -1. Suppose we wanted to test a second planned comparison that said that the average of groups 1 and 2 is different from the average of groups 3 and 4. Would this second planned comparison be independent of the first planned comparison? To test the second planned comparison, we might use weights of +1, +1, -1, and -1, although you learned that this set is one of many possible sets that would test this hypothesis. So are these two planned comparisons independent of one another? Plugging the values into the equation above, we get the following.
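
Multiplying the weights of the first comparison (0, 0, +1, -1) by the corresponding weights of the second (+1, +1, -1, -1) and summing gives:

$$
(0)(+1) + (0)(+1) + (+1)(-1) + (-1)(-1) = 0 + 0 - 1 + 1 = 0
$$

Because the sum is zero, the first and second planned comparisons are independent.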

Now let's suppose that we are considering testing a third planned comparison that the average of groups 1 through 3 is different from group 4. A set of weights that would test this hypothesis is +1, +1, +1, and -3. Is this planned comparison independent of both the first and second planned comparisons? The two equations below will check on independence.
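
Written out with the weights given above, the check of the first comparison (0, 0, +1, -1) against the third (+1, +1, +1, -3) is:

$$
(0)(+1) + (0)(+1) + (+1)(+1) + (-1)(-3) = 0 + 0 + 1 + 3 = 4
$$

and the check of the second comparison (+1, +1, -1, -1) against the third is:

$$
(+1)(+1) + (+1)(+1) + (-1)(+1) + (-1)(-3) = 1 + 1 - 1 + 3 = 4
$$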

The sum of the products of the weights for the two planned comparisons is not equal to zero in either case. So the third planned comparison is not independent of either the first or the second. Both the first and second planned comparisons in our example can be tested, because they are independent, but if we test either of those, we cannot test the third proposed planned comparison.
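
The same independence checks can be run for every pair of comparisons at once. This is a minimal Python sketch under the equal-group-size form of the rule described above; the function name `are_independent` and the dictionary labels are hypothetical.

```python
from itertools import combinations


def are_independent(weights_a, weights_b):
    """True when the products of paired weights sum to zero."""
    return sum(a * b for a, b in zip(weights_a, weights_b)) == 0


# The three planned comparisons from the text, for groups 1 through 4.
comparisons = {
    "first": [0, 0, 1, -1],
    "second": [1, 1, -1, -1],
    "third": [1, 1, 1, -3],
}

# Check every pair of comparisons.
for (name_a, w_a), (name_b, w_b) in combinations(comparisons.items(), 2):
    total = sum(a * b for a, b in zip(w_a, w_b))
    verdict = "independent" if are_independent(w_a, w_b) else "not independent"
    print(f"{name_a} vs. {name_b}: sum of products = {total} ({verdict})")
```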

Post hoc tests are typically used to evaluate pairs of groups to see if they
are significantly different from one another. Unlike planned comparisons,
these evaluations are not planned in advance, but rather represent a search
"after the fact" to see where the statistically significant differences
exist. The
number of such comparisons that can be done following a one-way ANOVA depends on
the number of groups. There is a simple formula for computing the number of
possible comparisons, which is shown below. The letter *k* in the formula is the
number of groups. So, if you have 4 groups, you have 6 possible comparisons
(4 × 3 / 2). If you have 8 groups, you have 28 possible comparisons (8 × 7 / 2).
Clearly, the number of possible 2-group comparisons increases rapidly as the
number of groups increases.
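
The formula for the number of possible two-group comparisons among *k* groups, consistent with the worked values above, is:

$$
\text{number of comparisons} = \frac{k(k-1)}{2}
$$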

One of the problems with conducting so many comparisons is that you increase
the probability of finding some of the differences statistically significant by
chance alone. Statisticians refer to this problem as **inflating the Type I
error level**. Remember that Type I errors are rejecting the null hypothesis
when the null hypothesis is true. In other words, a Type I error is concluding
that the evidence suggests that the populations differ when in fact they do not
differ. We set the level of Type I error when we set the alpha level. So if
alpha is set at .05, it means that we will make a Type I error 5% of the time
when the null hypothesis is true. But that means that if you do 20 comparisons
for which the null hypothesis is true, you can expect about one of them to be
significant by chance alone, and you will have no way of knowing which of the
significant differences you find represents that Type I error. Statisticians
address this problem in post hoc testing, when it is common to have multiple
comparisons, by making the criteria for significance more stringent. Post
hoc tests build this additional stringency into the procedures. That is why it
is absolutely *inappropriate* to use a standard *t*-test as a post hoc test.
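
A short Python sketch can show how quickly the problem grows. It rests on two simplifying assumptions that are not in the original text: every null hypothesis is true, and the comparisons are independent of one another (pairwise comparisons among the same groups are not strictly independent, so the figures are only approximate). The function names are hypothetical.

```python
def expected_false_positives(alpha, n_comparisons):
    """Average number of Type I errors when every null hypothesis is true."""
    return alpha * n_comparisons


def familywise_error(alpha, n_comparisons):
    """Chance of at least one Type I error across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


alpha = 0.05
for n in (1, 6, 20, 28):
    print(
        f"{n:>2} comparisons: expect {expected_false_positives(alpha, n):.2f} "
        f"false positives, P(at least one) = {familywise_error(alpha, n):.3f}"
    )
```

Under these assumptions, 20 comparisons at an alpha of .05 give a probability of roughly .64 of at least one Type I error.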

There are more than a dozen post hoc tests that have been
published. Going over all of them is beyond the scope of this text. Instead, we
will introduce you to the names of some of the more popular tests and
describe how they compare to one another. The post hoc tests fall into two broad
classes: tests that do and do not assume homogeneity of variance. **Homogeneity
of variance** means that the variances are statistically equal in the groups,
which means that the differences in variances found are small enough to be due
to sampling error alone.

**Tests that Assume Homogeneity of Variance**. The largest
class of post hoc tests assumes that we have homogeneity of variance. The most
frequently used tests in this category are the **Tukey test** (sometimes
called Tukey's Honestly Significant Difference test or HSD), the **Scheffe test**,
the **Least Significant Difference test** (LSD), and the **Bonferroni test**.
All of these tests are variations on a *t*-test, in which some specific
steps are taken to control for the problem of inflating the level of Type I
error by doing so many multiple comparisons. The Bonferroni test does this in
the most explicit manner, essentially using a standard *t*-test, but
computing the critical value of *t* that will produce what is called an
experiment-wise alpha of a given level. An **experiment-wise alpha** of .05
means that the probability of making a Type I error anywhere in the experiment
is set at .05. The only way you can do that if you have multiple comparisons is
to use a more stringent alpha level for each of those comparisons. Exactly how
stringent will depend on how many comparisons you are making. The more
comparisons you make, the more stringent your alpha must be for each of them in
order to control the experiment-wise alpha level.
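
The Bonferroni idea can be sketched in a few lines of Python. The sketch assumes the usual form of the correction, in which the experiment-wise alpha is divided by the number of comparisons, and it assumes that all possible pairs of groups are compared. The function names are hypothetical.

```python
def n_pairwise_comparisons(k):
    """Number of possible two-group comparisons among k groups."""
    return k * (k - 1) // 2


def bonferroni_alpha(experimentwise_alpha, n_comparisons):
    """Per-comparison alpha that keeps the experiment-wise alpha at its target."""
    return experimentwise_alpha / n_comparisons


for k in (3, 4, 8):
    m = n_pairwise_comparisons(k)
    per_test = bonferroni_alpha(0.05, m)
    print(f"{k} groups: {m} comparisons, per-comparison alpha = {per_test:.4f}")
```

With 4 groups there are 6 comparisons, so each one would be tested at about .0083 rather than .05.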

The other tests mentioned here, as well as half a dozen other published post hoc tests, each take a different approach to controlling the experiment-wise level of Type I error. The reason that so many post hoc tests exist is that there is no agreement on which test achieves this goal best. We can roughly rank order the tests on how conservative they are, meaning how reluctant they are to declare a given comparison significant. However, there is considerable debate about whether a given test is either too conservative or not conservative enough. That is why most computer analysis programs will give users a dozen or more choices for post hoc tests. Each researcher tends to have his or her preferred post hoc test.

**Tests that Do Not Assume Homogeneity of Variance**. If you
do not have homogeneity of variance, any statistical procedure that implicitly
assumes homogeneity will be distorted by the violation of this assumption. More
importantly, the distortion will be in the direction of suggesting a difference
exists when in fact it does not exist. In other words, the violation of this
assumption increases the level of Type I error. Remember that we set the level
of Type I error (called alpha) low, because we want to avoid these kinds of
errors. So, from a statistical perspective, we have a serious problem.

There are several post hoc tests, the best known of which is **Dunnett's
C test**, that correct for the problem of not having homogeneity of variance.
These tests all tend to be more conservative than the tests that assume
homogeneity, but they also tend to be more accurate if the variances in the
groups actually do differ.

If you want to see how you set up post hoc tests using *SPSS
for Windows*, click on the link below. Use the browser's back arrow key to
return to this page.

Compute Post Hoc Tests using SPSS