Lambda

Lambda: The Goodman-Kruskal Index of Predictive Association

To illustrate the meaning of lambda, suppose you had a total of n=137 instances of X sorted into three categories of A, with the following counts and proportions:

	Count	Proportion
A₁	53	.3869
A₂	39	.2847
A₃	45	.3285

If you were now to predict which of these categories some new instance of X might fall into, your best guess would be A₁, since that is the category with the largest number of items in the observed sample. In this case, assuming the proportions within the sample to be an unbiased reflection of the general population of X, the estimated probability of a correct prediction would be .3869 and the estimated probability of an error would be 1-.3869=.6131.

Now go back to the beginning and suppose that these same n=137 items had been concurrently sorted into three categories of B, with the following results cross- tabulated:

	B₁	B₂	B₃	Total
A₁	19	26	8	53
A₂	21	13	5	39
A₃	6	12	27	45
Total	46	51	40	137

In this circumstance your predictions could be more refined. Knowing a new instance of X to belong to category B₁, your best bet for its membership in A would A₂; knowing it to belong to B₂, your best bet would be A₁; and knowing it to belong to B₃, the best bet would be A₃. Now the estimated probability of a correct prediction would be (21+26+27)/137=.5401, and the estimated probability of an error would be 1-.5401=.4599. Without knowledge of the relationship between A and B, the probability of prediction error is .6131. With that knowledge, it reduces to .4599. The Goodman- Kruskal index is simply a measure of the proportion by which prediction error is reduced in such situations. In the present example, for the case of predicting A on the basis or B, it is

lambda_{[A from B]} = (.6131-.4599)/.6131 = .25

As indicated by the [A from B] subscript in the above formulation, lambda is asymmetric. If you were turn things around so as to make categorical predictions for B, your best bet in the absence of information about A would be B₂, since that is the B category with the largest number of instances. The initial estimated probability of error in this case would be (137-51)/137=.6277. Once you factor in the relationship between A and B, you could refine your guesses by predicting B₂ when X is A₁; B₁ when X is A₂; and B₃ when X is A₃. The estimated probability of correct prediction would now be (26+21+27)/137=.5401, the estimated probability of error would be 1-.5401=.4599, and the proportionate reduction in prediction error would accordingly be

lambda_{[B from A]} = (.6277-.4599)/.6277 = .2673