Central Limit Theorem

March 29, 2018


Pierre-Simon Laplace, 1749-1827

The central limit theorem states that the sum (or average) of a set of independent random variables approaches a normal distribution as the number of variables, N, increases. The French mathematician Laplace proved this for a number of general cases in 1810 [2]. Laplace also derived an expression for the standard deviation of the average of a set of random numbers, confirming what Gauss had assumed in his derivation of the normal distribution (see the lesson on Confidence Intervals).

Examples of the Central Limit Theorem

The simplest and most remarkable example of the central limit theorem is the coin toss. If a fair (“true”) coin is flipped N times, the probability of exactly q heads is given by Equation 11, the binomial distribution for two equally likely outcomes.

\begin{equation*} P(q)=\frac{N!}{2^N(N-q)!q!} \end{equation*}

Equation 11

Figure 3.10 compares a histogram of the binomial distribution with the normal distribution for 6 coin tosses. The two distributions agree well even with only 6 tosses, although the tails of the normal distribution extend beyond the possible values of q. The French mathematician De Moivre noticed this agreement in 1733 and used $(2/\pi N)^{1/2}\, e^{-(2/N)(q-N/2)^2}$ as an approximation to avoid the cumbersome calculation of Equation 11 for large N. However, he did not generalize this result to other cases.

Figure 3.10. Probability of q heads in 6 coin tosses.
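The comparison in Figure 3.10 can be reproduced numerically. The following is a minimal sketch, not the code used to produce the figure: the function names `binomial_prob` and `de_moivre_approx` are illustrative, and the approximation is evaluated only at the integer values of q that the binomial distribution allows.

```python
from math import comb, exp, pi, sqrt


def binomial_prob(q: int, n: int) -> float:
    """Equation 11: probability of exactly q heads in n fair coin tosses."""
    return comb(n, q) / 2 ** n


def de_moivre_approx(q: float, n: int) -> float:
    """De Moivre's approximation: a normal curve with mean n/2 and variance n/4."""
    return sqrt(2 / (pi * n)) * exp(-(2 / n) * (q - n / 2) ** 2)


N = 6  # number of tosses, matching Figure 3.10
for q in range(N + 1):
    exact = binomial_prob(q, N)
    approx = de_moivre_approx(q, N)
    print(f"q={q}  binomial={exact:.4f}  normal={approx:.4f}")
```

Raising `N` in this sketch shows why the approximation was attractive: the factorials in Equation 11 grow very quickly, while the exponential form stays cheap to evaluate.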

Another example is the averaging of a random variable, x, uniformly distributed between -0.5 and 0.5 (which might be the range of uncertainty in a measurement). Figure 3.11 shows the gradual convergence to the normal distribution as 2, 3, and 4 of these random variables are averaged.

Figure 3.11. PDF of the averages of uniformly distributed random variables.
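The convergence can also be checked by simulation. The sketch below is an illustration under stated assumptions rather than the method behind Figure 3.11: it draws pseudo-random samples with Python's standard library, uses an arbitrary trial count and seed, and relies on the known variance of 1/12 for a uniform variable of unit width, so the average of n of them should have a standard deviation of sqrt(1/(12n)). The helper names are hypothetical.

```python
import random
import statistics
from math import sqrt


def simulate_averages(n, trials=50_000, seed=0):
    """Return `trials` averages, each over n independent draws from U(-0.5, 0.5)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.uniform(-0.5, 0.5) for _ in range(n)) / n
            for _ in range(trials)]


def fraction_within_one_sigma(samples):
    """Fraction of samples within one standard deviation of the sample mean."""
    mean = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return sum(abs(x - mean) <= sigma for x in samples) / len(samples)


for n in (1, 2, 3, 4):
    samples = simulate_averages(n)
    observed = statistics.pstdev(samples)
    predicted = sqrt(1 / (12 * n))
    inside = fraction_within_one_sigma(samples)
    print(f"n={n}  std observed={observed:.4f}  predicted={predicted:.4f}  "
          f"within 1 sigma={inside:.3f}")
```

For a normal distribution, about 68.3% of values fall within one standard deviation of the mean, so the last column gives a rough single-number indication of how close each average is to normal.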

When the magnitude of the probability density function (PDF) is plotted on a linear scale, it is hard to see what happens in the tails of the distribution. Plotting the magnitude on a logarithmic scale reveals the large percentage deviation, in the tails, between the average of four uniformly distributed random variables and the normal distribution. This matters when the extreme values of a signal are critical to understanding the behavior of a product under test, as in fatigue analysis.
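The size of the tail deviation can be estimated without a plot. The sketch below swaps in a closed-form expression that the lesson does not mention, the Irwin-Hall distribution for the sum of n independent uniform variables on [0, 1], and rescales it to the average of n variables on [-0.5, 0.5]. The normal curve is assumed to share the same mean (0) and variance (1/(12n)); the function names and the sample points are illustrative.

```python
from math import comb, exp, factorial, floor, pi, sqrt


def irwin_hall_pdf(s: float, n: int) -> float:
    """PDF of the sum of n independent U(0, 1) variables (Irwin-Hall)."""
    if s < 0 or s > n:
        return 0.0
    total = sum((-1) ** k * comb(n, k) * (s - k) ** (n - 1)
                for k in range(floor(s) + 1))
    return total / factorial(n - 1)


def mean_pdf(m: float, n: int) -> float:
    """PDF of the average of n independent U(-0.5, 0.5) variables."""
    # If m is the average, the matching sum of U(0, 1) variables is n * (m + 0.5).
    return n * irwin_hall_pdf(n * (m + 0.5), n)


def normal_pdf(m: float, n: int) -> float:
    """Normal PDF with mean 0 and the same variance as the average, 1 / (12 n)."""
    var = 1 / (12 * n)
    return exp(-m * m / (2 * var)) / sqrt(2 * pi * var)


n = 4
for m in (0.0, 0.2, 0.3, 0.4, 0.45):
    exact = mean_pdf(m, n)
    normal = normal_pdf(m, n)
    deviation = 100 * (exact - normal) / normal
    print(f"m={m:.2f}  exact={exact:.5f}  normal={normal:.5f}  "
          f"deviation={deviation:+.1f}%")
```

Beyond |m| = 0.5 the exact PDF is zero while the normal curve is still positive, which is the extreme case of the tail mismatch that a logarithmic axis makes visible.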