Central Limit Theorem
The tendency of empirical events or theoretical entities to form a normal distribution is somewhat analogous to the tendency of water to run down a hill—it is simply the easiest and most natural way of going. In order to have water run down a hill, all we need is water and a hill. In order to have numerical values form a normal distribution, all we need is the summation—the combined additive result—of a multiplicity of random coincidences. This simple but very important principle is embodied on the formal side of probability theory by the central limit theorem, which demonstrates mathematically that the sums of a sufficiently large number of random variates will tend to produce a normal distribution.
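This principle is easy to see numerically. The following sketch (illustrative only, not part of the original demonstration) sums 12 uniform random variates per trial; by the central limit theorem the sums cluster around their expected mean of 6 with a standard deviation near 1, since the variance of a single uniform draw on [0,1) is 1/12.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Each trial sums 12 independent uniform draws on [0, 1).
# Expected mean of a sum: 12 * 0.5 = 6; expected variance: 12 * (1/12) = 1.
sums = [sum(random.random() for _ in range(12)) for _ in range(10_000)]

print(round(statistics.mean(sums), 2))   # close to 6
print(round(statistics.stdev(sums), 2))  # close to 1
```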

To illustrate, suppose you had a uniformly distributed population containing equal proportions (hence equally probable instances) of zeros, 1's, 2's, 3's, and 4's. If you were to draw a very large number of random samples from this population, each of size n=2, the possible combinations of drawn values and the sums they would yield for any particular sample would be as indicated in the following table, and the distribution of sample sums would accordingly approximate the triangular one shown in the adjacent graph.

Sums   Combinations
 0     0,0
 1     0,1  1,0
 2     1,1  2,0  0,2
 3     1,2  2,1  3,0  0,3
 4     1,3  3,1  2,2  4,0  0,4
 5     1,4  4,1  3,2  2,3
 6     3,3  4,2  2,4
 7     3,4  4,3
 8     4,4

[Graph: triangular distribution of sample sums, 0 through 8, for n=2]
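The enumeration above can be checked with a few lines of Python (a sketch for verification, not part of the original page):

```python
from itertools import product
from collections import Counter

# Enumerate all 25 ordered pairs drawn from the population
# {0, 1, 2, 3, 4} and tally their sums, reproducing the table.
counts = Counter(a + b for a, b in product(range(5), repeat=2))
for s in range(9):
    print(f"sum {s}: {counts[s]} of 25 combinations")
```

The tally rises from 1 way (sum of 0) to 5 ways (sum of 4) and back down to 1 way (sum of 8), which is exactly the triangular shape of the graph.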


[Graph: distribution of sample sums, 0 through 12, for n=3]
Increase the size of the samples to n=3, however, and the resulting distribution of sample sums begins to look a bit more like the familiar outlines of the normal distribution. Increase it to n=4, and the resemblance is closer. With n=5, it is closer still. In general, the larger the size of n, the more closely will the sums of random variates approximate the form of the normal distribution.
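This progression can be computed exactly rather than sampled: the distribution of the sum of n draws is the n-fold convolution of the single-draw distribution with itself. A minimal sketch, assuming NumPy is available (not part of the original page):

```python
import numpy as np

# Probability distribution of one draw from the uniform
# population {0, 1, 2, 3, 4}.
base = np.full(5, 1 / 5)

# Convolving repeatedly gives the exact distribution of the
# sum of n draws, for n = 2, 3, 4, 5.
dist = base.copy()
for n in range(2, 6):
    dist = np.convolve(dist, base)
    print(f"n={n}: possible sums 0..{len(dist) - 1}, "
          f"peak probability {dist.max():.4f}")
```

With each additional draw, the range of possible sums widens and the exact probabilities hump up more smoothly around the center, approaching the bell shape.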

The programming for the present demonstration begins by creating inside your computer a uniformly distributed population of the sort described above, containing equal proportions of zeros, 1's, 2's, 3's, and 4's. Each time you click the button below labeled "100 Samples," your computer will draw 100 random samples from this population; each time you click the button labeled "1000 Samples," it will draw 1000 random samples. For each individual sample it will calculate the sum of the 5 random variates, and after each batch of samples (100 or 1000) it will graphically display the cumulative relative frequencies of sample sums. The results of the first thousand samples—perhaps even of the first several thousand—might not look much like a normal distribution. But keep clicking, and sooner or later the outlines of the graph in the lower frame will begin to assume the familiar bell-like shape.
A blank column indicates that the cumulative proportion of samples ending up with a particular sum (0, 1, 2, etc.) is either zero or too near to zero to represent graphically.
[Graph: cumulative relative frequency of sample sums, 0 through 20, for n=5]
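The sampling loop described above can be sketched in Python (the page's own demonstration runs in the browser; the names here are illustrative, not the page's actual code):

```python
import random
from collections import Counter

random.seed(0)     # fixed seed so the run is repeatable
tally = Counter()  # running tally of sample sums across "clicks"

def draw_samples(n_samples):
    """Mimic one button click: draw n_samples random samples of
    size 5 from {0, 1, 2, 3, 4} and tally each sample's sum."""
    for _ in range(n_samples):
        tally[sum(random.randrange(5) for _ in range(5))] += 1

for _ in range(10):  # ten clicks of the "100 Samples" button
    draw_samples(100)

total = sum(tally.values())
for s in sorted(tally):
    print(f"sum {s:2d}: {tally[s] / total:.3f}")
```

As the tally accumulates over repeated clicks, the relative frequencies settle toward the exact probabilities, and the bar graph of sums from 0 to 20 takes on the bell-like outline.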





©Richard Lowry 2001-
All rights reserved.