©Richard Lowry, 1999-
All rights reserved.


Subchapter 14a.
The Kruskal-Wallis Test
for 3 or More Independent Samples


As a reminder, the assumptions of the one-way ANOVA for independent samples are
  1. that the scale on which the dependent variable is measured has the properties of an equal interval scale;
  2. that the k samples are independently and randomly drawn from the source population(s);
  3. that the source population(s) can be reasonably supposed to have a normal distribution; and
  4. that the k samples have approximately equal variances.
We noted in the main body of Chapter 14 that we need not worry very much about the first, third, and fourth of these assumptions when the samples are all the same size. For in that case the analysis of variance is quite robust, by which we mean relatively unperturbed by the violation of its assumptions. But of course, the other side of the coin is that when the samples are not all the same size, we do need to worry. In this case, should one or more of assumptions 1, 3, and 4 fail to be met, an appropriate non-parametric alternative to the one-way independent-samples ANOVA can be found in the Kruskal-Wallis Test.

I will illustrate the Kruskal-Wallis test with an example based on rating-scale data, since this is by far the most common situation in which unequal sample sizes would call for the use of a non-parametric alternative. In this particular case the number of groups is k=3. I think it will be fairly obvious how the logic and procedure would be extended in cases where k is greater than 3.

To assess the effects of expectation on the perception of aesthetic quality, an investigator randomly sorts 24 amateur wine aficionados into three groups, A, B, and C, of 8 subjects each. Each subject is scheduled for an individual interview. Unfortunately, one of the subjects of group B and two of group C fail to show up for their interviews, so the investigator must make do with samples of unequal size: n_a=8, n_b=7, and n_c=6, for a total of N=21. The subjects who do show up for their interviews are each asked to rate the overall quality of each of three wines on a 10-point scale, with "1" standing at the bottom of the scale and "10" at the top.

As it happens, the three wines are the same for all subjects. The only difference is in the texture of the interview, which is designed to induce a relatively high expectation of quality in the members of group A; a relatively low expectation in the members of group C; and a merely neutral state, tending in neither the one direction nor the other, for the members of group B. At the end of the study, each subject's ratings are averaged across all three wines, and this average is then taken as the raw measure for that particular subject. The following table shows these measures for each subject in each of the three groups.

             Group
         A      B      C
        6.4    2.5    1.3
        6.8    3.7    4.1
        7.2    4.9    4.9
        8.3    5.4    5.2
        8.4    5.9    5.5
        9.1    8.1    8.2
        9.4    8.2
        9.7
  mean  8.2    5.5    4.9




¶Mechanics

The preliminaries of the Kruskal-Wallis test are much the same as those of the Mann-Whitney test described in Subchapter 11a. We begin by assembling the measures from all k samples into a single set of size N. These assembled measures are rank-ordered from lowest (rank#1) to highest (rank#N), with tied ranks included where appropriate; and the resulting ranks are then returned to the sample, A, B, or C, to which they belong and substituted for the raw measures that gave rise to them. Thus, the raw measures that appear in the following table on the left are replaced by their respective ranks, as shown in the table on the right.

         Raw Measures              Ranked Measures
       A      B      C           A       B       C
      6.4    2.5    1.3         11       2       1
      6.8    3.7    4.1         12       3       4
      7.2    4.9    4.9         13       5.5     5.5
      8.3    5.4    5.2         17       8       7
      8.4    5.9    5.5         18      10       9
      9.1    8.1    8.2         19      14      15.5
      9.4    8.2                20      15.5
      9.7                       21

                          A       B       C     A, B, C combined
     sum of ranks        131      58      42         231
     average of ranks    16.4     8.3     7.0         11
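If you would like to verify this ranking by machine, here is a minimal Python sketch, assuming NumPy and SciPy are available. The function scipy.stats.rankdata assigns tied measures the average of the ranks they jointly occupy, which is exactly the convention used above.

```python
import numpy as np
from scipy.stats import rankdata

# Raw measures from the table above
A = [6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7]
B = [2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2]
C = [1.3, 4.1, 4.9, 5.2, 5.5, 8.2]

# Pool all N=21 measures, rank them from lowest (1) to highest (N),
# then return the ranks to the groups they came from.
ranks = rankdata(np.concatenate([A, B, C]))   # ties get the average rank
rA, rB, rC = np.split(ranks, [len(A), len(A) + len(B)])

for label, r in zip("ABC", (rA, rB, rC)):
    print(label, "sum of ranks =", r.sum(), "average =", round(float(r.mean()), 1))
# A sum of ranks = 131.0 average = 16.4
# B sum of ranks = 58.0 average = 8.3
# C sum of ranks = 42.0 average = 7.0
```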

With the Kruskal-Wallis test, however, we take account not only of the sums of the ranks within each group, but also of the averages. Thus the following items of symbolic notation:

T_A = the sum of the n_a ranks in group A
M_A = the mean of the n_a ranks in group A

T_B = the sum of the n_b ranks in group B
M_B = the mean of the n_b ranks in group B

T_C = the sum of the n_c ranks in group C
M_C = the mean of the n_c ranks in group C

T_all = the sum of the N ranks in all groups combined
M_all = the mean of the N ranks in all groups combined



¶Logic and Procedure

·The Measure of Aggregate Group Differences

You will sometimes find the Kruskal-Wallis test described as an "analysis of variance by ranks." Although it is not really an analysis of variance at all, it does bear a certain resemblance to ANOVA up to a point. In both procedures, the first part of the task is to find a measure of the aggregate degree to which the group means differ. With ANOVA that measure is found in the quantity known as SSbg, which is the between-groups sum of squared deviates. The same is true of the Kruskal-Wallis test, except that here the group means are based on ranks rather than on the raw measures. As a reminder that we are now dealing with ranks, we will symbolize this new version of the between-groups sum of squared deviates as SSbg(R). The following table summarizes the mean ranks for the present example. Also included are the sums and the counts (n_a, n_b, n_c, and N) on which these means are based.

               A       B       C      All
    counts     8       7       6      21
    sums      131      58      42     231
    means     16.4     8.3     7.0    11.0

In Chapters 13 and 14 you saw that the squared deviate for any particular group mean is equal to the squared difference between that group mean and the mean of the overall array of data, multiplied by the number of observations on which the group mean is based. Thus, for each of our current three groups

A: 8(16.4 − 11.0)² = 233.3
B: 7(8.3 − 11.0)² = 51.0
C: 6(7.0 − 11.0)² = 96.0

SSbg(R) = 233.3 + 51.0 + 96.0 = 380.3

On analogy with the formulaic structures for SSbg developed in Chapters 13 and 14, we can write the conceptual formula for SSbg(R) as

SSbg(R) = Σ[n_g(M_g − M_all)²]

(Here as well, the subscript "g" means "any particular group.")

and the computational formula as

SSbg(R) = Σ[(T_g)²/n_g] − (T_all)²/N

With k=3 samples, this latter structure would be equivalent to

SSbg(R) = (T_A)²/n_a + (T_B)²/n_b + (T_C)²/n_c − (T_all)²/N

For k=4 it would be

SSbg(R) = (T_A)²/n_a + (T_B)²/n_b + (T_C)²/n_c + (T_D)²/n_d − (T_all)²/N

And so forth for other values of k.

Here, in any event, is how it would work out for the present example. The discrepancy between what we get now and what we got a moment ago (380.3) is due to rounding error in the earlier calculation. As usual, it is the computational formula that is the less susceptible to rounding error, hence the more reliable.

SSbg(R) = (131)²/8 + (58)²/7 + (42)²/6 − (231)²/21 = 378.7
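The same arithmetic is easy to check in Python. A minimal sketch, assuming only NumPy: it computes SSbg(R) from both the conceptual and the computational formulas and shows that, carried at full precision, the two agree exactly; the 380.3 obtained earlier reflects only the one-decimal rounding of the mean ranks.

```python
import numpy as np

n = np.array([8, 7, 6])            # group sizes n_a, n_b, n_c
T = np.array([131.0, 58.0, 42.0])  # rank sums T_A, T_B, T_C
N = n.sum()                        # 21
M = T / n                          # mean ranks, at full precision
M_all = T.sum() / N                # 11.0

# Conceptual formula: sum over groups of n_g * (M_g - M_all)^2
ss_conceptual = float(np.sum(n * (M - M_all) ** 2))

# Computational formula: sum of (T_g)^2 / n_g, minus (T_all)^2 / N
ss_computational = float(np.sum(T**2 / n) - T.sum() ** 2 / N)

print(round(ss_conceptual, 1), round(ss_computational, 1))  # 378.7 378.7
```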

·The Null-Hypothesis Value of SSbg(R)

The null hypothesis in this or any comparable situation involving several independent samples of ranked data is that the mean ranks of the k groups will not substantially differ. On this account, you might suppose that the null-hypothesis value of SSbg(R), the aggregate measure of group differences, would be simply zero. A moment's reflection, however, will show why this cannot be so.

Consider the very simple case where there are 3 groups, each containing 2 observations:

 A    B    C
 x    x    x
 x    x    x

By way of analogy, imagine you had six small cards representing the ranks "1," "2," "3," "4," "5," and "6." If you were to sort these cards into every possible combination of two ranks per group, you would find the total number of possible combinations to be

N!/(n_a! n_b! n_c!) = 6!/(2! 2! 2!) = 90

And the values of SSbg(R) produced by these 90 combinations would constitute the sampling distribution of SSbg(R) for this particular case. Of these 90 possible combinations, a few (6) would yield values of SSbg(R) equal to exactly zero. All the rest would produce values greater than zero. (It is mathematically impossible to have a sum of squared deviates less than zero.) Accordingly, the mean of this sampling distribution—the value that observed instances of SSbg(R) will tend to approximate if the null hypothesis is true—is not zero, but something greater than zero.

In any particular case of this sort, the mean of the sampling distribution of SSbg(R) is given by the formula

(k − 1) × [N(N+1)/12]

which for the simple case just examined works out as

(3 − 1) × [6(6+1)/12] = 7.0
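The claim is small enough to verify by brute force. The Python sketch below (standard library only) deals the ranks 1 through 6 into every possible arrangement of three groups of two, computes SSbg(R) for each, and confirms the count of 90, the mean of 7.0, and the 6 arrangements that yield exactly zero.

```python
from itertools import combinations

ranks = {1, 2, 3, 4, 5, 6}
M_all = 3.5                     # mean of the ranks 1..6

values = []
for a in combinations(ranks, 2):           # the two ranks dealt to group A
    rest = ranks - set(a)
    for b in combinations(rest, 2):        # the two dealt to group B
        c = rest - set(b)                  # group C gets what is left
        # SSbg(R) = sum over groups of n_g * (M_g - M_all)^2, with n_g = 2
        ss = sum(2 * (sum(g) / 2 - M_all) ** 2 for g in (a, b, c))
        values.append(ss)

print(len(values))                  # 90 possible combinations
print(sum(values) / len(values))    # 7.0, the mean of the sampling distribution
print(sum(v == 0 for v in values))  # 6 combinations give SSbg(R) of exactly zero
```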

For our main example, we therefore know that the observed value of SSbg(R)=378.7 belongs to a sampling distribution whose mean is equal to
(3 − 1) × [21(21+1)/12] = 77.0

All that now remains is to figure out how to turn this fact into a rigorous assessment of probability.

·The Kruskal-Wallis Statistic: H

In case you have been girding yourself for some heavy slogging of the sort encountered with the Mann-Whitney test, you can now relax, for the rest of the journey is quite an easy one. The Kruskal-Wallis procedure concludes by defining a ratio symbolized by the letter H, whose numerator is the observed value of SSbg(R) and whose denominator includes a portion of the above formula for the mean of the sampling distribution of SSbg(R). Note that most textbooks give a very different-looking formula for the calculation of H—a rather impenetrable structure to which we will return in a moment. This first version affords a much clearer sense of the underlying concepts.

H = SSbg(R) / [N(N+1)/12]
And now for the denouement. When each of the k samples includes at least 5 observations (that is, when n_a, n_b, n_c, etc., are all equal to or greater than 5), the sampling distribution of H is a very close approximation of the chi-square distribution for df = k − 1. It is actually a fairly close approximation even when one or more of the samples includes as few as 3 observations.

For our present example, we can therefore calculate the value of H as

H = SSbg(R) / [N(N+1)/12] = 378.7 / [21(21+1)/12] = 9.84

And then, treating this result as though it were a value of chi-square, we can refer it to the sampling distribution of chi-square with df = 3 − 1 = 2. The following graph, borrowed from Chapter 8, will remind you of the outlines of this particular chi-square distribution. In brief: by the Kruskal-Wallis test, the observed aggregate difference among the three samples is significant a bit beyond the .01 level.

[Figure: theoretical sampling distribution of chi-square for df=2, reproduced from Chapter 8.]
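Here is the whole test run end to end in Python, as a sketch assuming SciPy is installed. Note that scipy.stats.kruskal applies a small correction for tied ranks, so its H runs a shade larger than the uncorrected 9.84 computed above; either way the probability falls a bit beyond the .01 level.

```python
from scipy.stats import chi2, kruskal

A = [6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7]
B = [2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2]
C = [1.3, 4.1, 4.9, 5.2, 5.5, 8.2]

# By hand: H = SSbg(R) / [N(N+1)/12], referred to chi-square with df = k-1
H = 378.7 / (21 * 22 / 12)
print(round(H, 2), round(float(chi2.sf(H, df=2)), 4))  # 9.84 0.0073

# SciPy's built-in version (it corrects for ties, hence a slightly larger H)
print(kruskal(A, B, C))  # statistic approx. 9.85, pvalue approx. 0.0073
```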

·An Alternative Formula for the Calculation of H

I noted a moment ago that textbook accounts of the Kruskal-Wallis test usually give a different version of the formula for H. If you are a beginning student calculating H by hand, I would recommend using the version given above, as it gives you a clearer idea of just what H is measuring. Once you get the hang of things, however, you might find this alternative computational formula a bit more convenient.

H = {12 / [N(N+1)]} × Σ[(T_g)²/n_g] − 3(N+1)

In any event, as you can see below, this version yields exactly the same result as the other.
H = {12 / [21(21+1)]} × [(131)²/8 + (58)²/7 + (42)²/6] − 3(21+1) = 9.84
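For convenience, this shortcut wraps naturally into a small Python function. This is only a sketch (the name kruskal_h is mine, not standard); it accepts the ranks of any number k of groups, though like the formula itself it applies no correction for ties.

```python
def kruskal_h(rank_groups):
    """Kruskal-Wallis H from per-group lists of ranks (no tie correction)."""
    N = sum(len(g) for g in rank_groups)
    sum_term = sum(sum(g) ** 2 / len(g) for g in rank_groups)
    return 12 / (N * (N + 1)) * sum_term - 3 * (N + 1)

ranks_A = [11, 12, 13, 17, 18, 19, 20, 21]
ranks_B = [2, 3, 5.5, 8, 10, 14, 15.5]
ranks_C = [1, 4, 5.5, 7, 9, 15.5]
print(round(kruskal_h([ranks_A, ranks_B, ranks_C]), 2))  # 9.84
```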


The VassarStats web site has a page that will perform all steps of the Kruskal-Wallis test, including the rank-ordering of the raw measures.
End of Subchapter 14a.