Normal (Gaussian) Distribution

March 29, 2018

The most common probability distribution in statistical analysis is the normal (or “Gaussian”) distribution function. Another name for it is the bell curve. Equation 9 defines the PDF p(x).

(1)   \begin{equation*} p(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \end{equation*}

Equation 9

Figure 3.6 displays the PDF in standard form, where μ is the mean and σ is the standard deviation of the variable x. The distribution is symmetrical, so the skewness equals zero, and the kurtosis equals 3. Excess kurtosis equals 0.


Figure 3.6. A normal (Gaussian) PDF; the “bell curve.”
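As a minimal sketch, Equation 9 can be evaluated directly in Python. The function name `normal_pdf` and its default arguments (the standard normal, μ = 0 and σ = 1) are illustrative choices, not part of the original lesson.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal PDF p(x) of Equation 9.

    mu is the mean and sigma is the standard deviation (must be positive).
    The defaults are an assumption chosen for illustration.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# The curve peaks at the mean and is symmetric about it.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.0), normal_pdf(1.0)
```

For the standard normal, the peak value is 1/√(2π), about 0.399, and the values at ±1σ are equal, which reflects the zero skewness noted above.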

The mean value and the standard deviation define the normal distribution. In many cases, these values are computed from a data set using Equations 1 & 4, and a normal distribution is then assumed.

This assumption can potentially lead to error if the distribution’s tails (the levels outside ±2σ from the mean) are a part of the calculations, and the data’s skewness and kurtosis values do not match the normal distribution. An example of this is the turbulent pressure in Figure 3.3.
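One way to test the assumption is to compute the skewness and kurtosis of the data and compare them with the normal values of 0 and 3. The sketch below uses the common moment-based definitions with a divisor of n; these may differ slightly from the lesson's own equations (for example, if the standard deviation there uses n − 1), and the function name `sample_moments` is hypothetical.

```python
import math

def sample_moments(data):
    """Return (mean, std, skewness, kurtosis) of a sequence of numbers.

    Assumption: central moments are divided by n (population form).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n  # variance
    m3 = sum((v - mean) ** 3 for v in data) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in data) / n  # fourth central moment
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    return mean, std, skewness, kurtosis

# Illustrative data only: symmetric, so the skewness is zero,
# but the kurtosis is well below 3, so it is not normally distributed.
mean, std, skewness, kurtosis = sample_moments([-2.0, -1.0, 0.0, 1.0, 2.0])
excess_kurtosis = kurtosis - 3.0
```

A skewness far from 0 or an excess kurtosis far from 0 suggests that tail probabilities taken from the normal distribution may be unreliable for that data set.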


Carl Friedrich Gauss, 1777 – 1855

Historical Background

The Gaussian distribution is named after the German mathematician Carl Friedrich Gauss. He derived the distribution’s basic form in his 1809 monograph on the motion of planets and comets.

Using the function e^(-kδ²) to represent the distribution of random errors δ in a measurement (where k is a scale factor), Gauss could prove that the least-squares method of fitting data generated the most probable result. He also used the distribution to statistically determine if the mean value of a series of measurements has this random error (see the lesson on Confidence Intervals).

Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) is another function that represents a probability distribution. The CDF plots the probability that x is less than a set value X and is often written as P(x < X). Figure 3.7 displays the CDF for normal distribution.


Figure 3.7. The normal cumulative distribution function.

Mathematically, it is given by Equation 10, where erf(…) is the error function.

(2)   \begin{equation*} P(x<X)=\int_{-\infty}^{X}p(x)dx=\frac{1}{2}\left[1+\text{erf}\left(\frac{X-\mu}{\sqrt{2}\sigma}\right)\right] \end{equation*}

Equation 10
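Equation 10 can be sketched in Python with the error function from the standard `math` module, with the error-function argument evaluated at the upper limit X. The function name `normal_cdf` and its defaults are illustrative assumptions.

```python
import math

def normal_cdf(X, mu=0.0, sigma=1.0):
    """Probability P(x < X) for a normal distribution (Equation 10)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((X - mu) / (math.sqrt(2.0) * sigma)))

# Half of the probability lies below the mean.
below_mean = normal_cdf(0.0)

# Probability of falling within two standard deviations of the mean.
within_2_sigma = normal_cdf(2.0) - normal_cdf(-2.0)
```

Here `within_2_sigma` comes out at roughly 0.954, so about 4.6% of a normal distribution lies in the tails outside ±2σ.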

Graphically, the equation is represented by the area P in the following image. The total area under the PDF curve is equal to 1 by definition.


Probability Plots

Another way to plot the CDF is to rescale the vertical axis so that the CDF of a normal distribution plots as a straight line. The result is called a normal probability plot. With this plot, it is easier to visually determine if data are normally distributed.

In the “old days” when things were plotted by hand on graph paper, there was a standard form called probability graph paper (Figure 3.8). As most computer graphing programs do not have this capability, a similar graph can be constructed by mapping the CDF values back to the normal Z values, then plotting these values versus the actual ones.

Figure 3.8. Probability graph paper.
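The mapping described above, from CDF values back to normal Z values, can be sketched as follows. The plotting position (i − 0.5)/n used for the empirical CDF is one common convention and is an assumption here, as is the function name `normal_probability_points`; `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Return (z, value) pairs for a normal probability plot.

    Each sorted data value is paired with the standard normal Z value
    whose CDF equals the value's empirical cumulative probability.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n            # assumed plotting position, 0 < p < 1
        z = std_normal.inv_cdf(p)    # map the CDF value back to Z
        points.append((z, value))
    return points

# Illustrative data only.
points = normal_probability_points([3.0, 1.0, 2.0])
```

Plotting the measured values against Z gives points that fall close to a straight line when the data are normally distributed; the line's slope then approximates σ and its intercept approximates μ.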

Figure 3.9 is an example of normal probability plots using car vibration and turbulent pressure data from Figures 3.1 and 3.3. The straight diagonal line represents the normal distribution.


Figure 3.9. Normal probability plots of measured data from Figures 3.1 and 3.3.

The car vibration data closely follow the normal distribution, while the turbulent pressure data deviate significantly at the two ends of the distribution.