Normal (Gaussian) Distribution

March 29, 2018

The most common and widely used probability distribution in statistical analysis is the Normal (or Gaussian) distribution function. It is sometimes called the “Bell Curve.”  The PDF, p(x), is defined by Equation 9 and is shown in standard form in Figure 3.6, where μ is the mean and σ is the standard deviation of the variable, x. As a symmetrical distribution, the skewness equals zero, while the kurtosis equals 3 (excess kurtosis = 0).

\begin{equation*} p(x)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \end{equation*}

Equation 9
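As a minimal sketch, Equation 9 can be evaluated directly with the Python standard library. The function name `normal_pdf` and its default arguments (μ = 0, σ = 1, the standard form) are illustrative choices, not part of the original lesson.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the Normal PDF of Equation 9 at x (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# The peak of the standard form sits at the mean and equals 1/sqrt(2*pi).
peak = normal_pdf(0.0)

# Symmetry about the mean: p(mu + d) equals p(mu - d).
left = normal_pdf(-1.5)
right = normal_pdf(1.5)
```

Here `peak` is about 0.399, and `left` and `right` are equal, which reflects the zero skewness noted above.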


Figure 3.6. The Normal, or Gaussian, PDF; the “bell curve.”


Carl Friedrich Gauss, 1777 – 1855

The distribution is named after the German mathematician Carl Friedrich Gauss, who derived its basic form in his 1809 monograph on the motion of planets and comets. He found that by using the function e^(−kδ²) to represent the distribution of random errors δ in measurements (where k is a scale factor), he could prove that the Least Squares method of fitting data (see Section 2.5) provided the best (most probable) result. He also used it to determine the statistics of the mean value of a series of measurements having this random error (see the lesson on Confidence Intervals).

Another form for representing a probability distribution is the cumulative distribution function (CDF) which plots the probability that x is less than a particular value X, which is often written as P(x < X). This is shown in Figure 3.7 for the Normal distribution. Mathematically it is given by Equation 10, where erf(…) is the error function.

\begin{equation*} P(x<X)=\int_{-\infty}^{X}p(x)\,dx=\frac{1}{2}\left[1+\text{erf}\left(\frac{X-\mu}{\sqrt{2}\sigma}\right)\right] \end{equation*}

Equation 10
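Equation 10 can be sketched in a few lines using `math.erf` from the Python standard library. The names `normal_cdf` and `prob_within` are hypothetical helpers introduced only for this illustration.

```python
import math

def normal_cdf(X, mu=0.0, sigma=1.0):
    """P(x < X) for a Normal distribution, following Equation 10."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((X - mu) / (math.sqrt(2.0) * sigma)))

def prob_within(k):
    """Probability that x lies within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

half = normal_cdf(0.0)          # half of the area lies below the mean
inside_2sigma = prob_within(2)  # area between -2 sigma and +2 sigma
tails_2sigma = 1.0 - inside_2sigma
```

For the standard form, `inside_2sigma` is roughly 0.954, so only about 4.6% of the total area lies in the tails beyond ± 2σ.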

Graphically this is shown by the area P here:


The total area under the PDF curve is equal to 1 by definition.


Figure 3.7. The Normal cumulative distribution function.

The Normal distribution is completely defined by the mean value and the standard deviation, as seen in Eq. 9. Therefore, in many cases, these values are computed from a data set using Eqs. 1 & 4, and the Normal distribution is assumed. This can potentially lead to serious errors if the tails of the distribution (levels outside of  ± 2σ  from the mean) are used in calculations without first checking if at least the skewness and kurtosis of the data match a Normal distribution. An example of this is the turbulent pressure shown in Figure 3.3.
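The check described above can be sketched as follows. This is only an illustration: it assumes the moments are normalized by N (the definitions in Eqs. 1 & 4 may use a different normalization), the name `sample_moments` is hypothetical, and the data are synthetic values drawn from a seeded random generator rather than any measurement from this lesson.

```python
import math
import random

def sample_moments(data):
    """Return (mean, std, skewness, kurtosis) using 1/N central moments.

    Assumption: population-style (1/N) normalization throughout.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    std = math.sqrt(m2)
    skewness = m3 / std ** 3
    kurtosis = m4 / m2 ** 2  # equals 3 for a Normal distribution
    return mean, std, skewness, kurtosis

# Synthetic Normal data (mean 10, standard deviation 2), for illustration only.
rng = random.Random(0)
samples = [rng.gauss(10.0, 2.0) for _ in range(50000)]
mean, std, skewness, kurtosis = sample_moments(samples)
```

For Normally distributed data the skewness should come out close to 0 and the kurtosis close to 3. Values far from these suggest that the tails of an assumed Normal distribution should not be trusted.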

Another way to plot the CDF is to rescale the vertical axis so that a Normal distribution plots as a straight line. This is called a Normal Probability Plot, and it makes it easier to judge visually whether the data are Normally distributed. In the “old days” when things were plotted by hand on graph paper, there was a standard form for this, called “probability graph paper” (see Figure 3.8). Since most computer graphing programs do not have this capability, a similar graph can be constructed by mapping the CDF values back to the Normal Z values and plotting these versus the actual Z values.


Figure 3.8. Probability graph paper.
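One way to build the point pairs for such a plot is sketched below. It uses `statistics.NormalDist.inv_cdf` (Python 3.8 or later) to map CDF values back to Normal Z values. The empirical CDF estimate (i − 0.5)/N is an assumed plotting-position convention, one of several in common use, and `normal_probability_points` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, pstdev

def normal_probability_points(data):
    """Return (z_normal, z_actual) pairs for a Normal Probability Plot.

    Assumptions: plotting positions p_i = (i - 0.5) / N, and the data
    have a nonzero standard deviation.
    """
    xs = sorted(data)
    n = len(xs)
    mu = mean(xs)
    sigma = pstdev(xs)  # 1/N standard deviation
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n                   # empirical CDF value of the i-th point
        z_normal = std_normal.inv_cdf(p)    # Z a Normal distribution would give
        z_actual = (x - mu) / sigma         # Z the data actually have
        points.append((z_normal, z_actual))
    return points

points = normal_probability_points([1.0, 2.0, 3.0, 4.0, 5.0])
```

Plotting `z_actual` against `z_normal` with any graphing program gives the equivalent of probability graph paper: Normally distributed data fall close to the diagonal line where the two values are equal.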

Examples of Normal Probability Plots are shown in Figure 3.9 using the car vibration and turbulent pressure data from Figures 3.1 and 3.3. The straight diagonal line represents the Normal distribution. The car vibration data closely follow this line, while the turbulent pressure data deviate significantly at the two ends of the distribution.


Figure 3.9. Normal Probability Plots of measured data in Figures 3.1 and 3.3.