Normal (Gaussian) Distribution
March 29, 2018
The most widely used probability distribution in statistical analysis is the Normal (or Gaussian) distribution, sometimes called the “Bell Curve.” The PDF, p(x), is defined by Eq. 9 and is shown in standard form in Fig. 6, where μ is the mean and σ is the standard deviation of the variable x. Because the distribution is symmetrical, its skewness is 0; its kurtosis is 3 (excess kurtosis = 0).
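The equation itself is not reproduced in this text. For reference, the standard form of the Normal PDF, which Eq. 9 presumably states, is written below with the same symbols μ and σ:

```latex
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]
```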
It is named after the German mathematician Carl Friedrich Gauss, who derived its basic form in his 1809 monograph on the motion of planets and comets. He found that by using the function $e^{-k\delta^2}$ to represent the distribution of random errors δ in measurements (where k is a scale factor), he could prove that the Least Squares method of fitting data (see Section 2.5) provided the best (most probable) result. He also used it to determine the statistics of the mean value of a series of measurements having this random error (see lesson on Confidence Intervals).
Another way to represent a probability distribution is the cumulative distribution function (CDF), which gives the probability that x is less than a particular value X, often written as P(x < X). This is shown in Fig. 7 for the Normal distribution. Mathematically it is given by

$$P(x < X) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{X - \mu}{\sigma\sqrt{2}}\right)\right]$$
where erf(…) is the error function.
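The CDF is straightforward to evaluate numerically. The following is a minimal Python sketch, assuming the standard erf-based form P(x < X) = ½[1 + erf((X − μ)/(σ√2))]; the function name `normal_cdf` and its default arguments are illustrative choices, not something from the original lesson.

```python
import math

def normal_cdf(X, mu=0.0, sigma=1.0):
    """P(x < X) for a Normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((X - mu) / (sigma * math.sqrt(2.0))))

# Probability that x lies within +/- 2 sigma of the mean:
# the difference of two CDF values, i.e. the area under the PDF between them.
p_within_2sigma = normal_cdf(2.0) - normal_cdf(-2.0)
print(round(p_within_2sigma, 4))  # about 0.9545
```

About 4.5% of a truly Normal population therefore lies in the two tails beyond ±2σ.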
Graphically, this is the area P under the PDF curve to the left of X. (The total area under the PDF curve is equal to 1 by definition.)
The Normal distribution is completely defined by the mean value and the standard deviation, as seen in Eq. 9. Therefore, in many cases these values are computed from a data set using Eqs. 1 & 4, and the Normal distribution is simply assumed. This can lead to serious errors if the tails of the distribution (levels beyond ±2σ from the mean) are used in calculations without first checking that at least the skewness and kurtosis of the data match those of a Normal distribution (0 and 3, respectively). An example of this is the turbulent pressure shown in Fig. 3.
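A quick check of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the moments are computed in population form (dividing by n), which may differ slightly from the document's Eqs. 1 & 4, and the two data sets are synthetic stand-ins generated with the standard library, not the car vibration or turbulent pressure data.

```python
import math
import random

def sample_moments(data):
    """Return (mean, std, skewness, kurtosis) using population (divide-by-n) moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in data) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in data) / n  # fourth central moment
    std = math.sqrt(m2)
    return mean, std, m3 / std ** 3, m4 / std ** 4

random.seed(0)  # fixed seed so the illustration is repeatable
gaussian = [random.gauss(0.0, 1.0) for _ in range(20000)]   # synthetic Normal data
skewed = [random.expovariate(1.0) for _ in range(20000)]    # synthetic non-Normal data

for name, data in (("gaussian", gaussian), ("skewed", skewed)):
    mean, std, skew, kurt = sample_moments(data)
    print(f"{name}: mean={mean:.3f} std={std:.3f} skewness={skew:.3f} kurtosis={kurt:.3f}")
```

The Gaussian sample should give a skewness near 0 and a kurtosis near 3. The exponential sample has a perfectly well-defined mean and standard deviation, yet its skewness and kurtosis are far from the Normal values, so tail probabilities computed from an assumed Normal distribution would be badly off.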
Another way to plot the CDF rescales the vertical axis so that a Normal distribution produces a straight line. This is called a Normal Probability Plot, and it makes it easier to see at a glance whether the data is Normally distributed. In the “old days” when things were plotted by hand on graph paper, there was a standard form for this, called “probability graph paper” (see Fig. 8). Since most computer graphing programs do not have this capability, a similar graph can be constructed by mapping the CDF values of the data back to the corresponding Normal Z values (through the inverse of the Normal CDF) and plotting these versus the actual Z values of the data, (x − μ)/σ.
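The construction can be sketched in Python with the standard library alone. Several details here are assumptions rather than the lesson's own method: the empirical CDF uses the plotting position (i − 0.5)/n, the standard deviation is the population form, the helper name `normal_probability_points` is hypothetical, and the sample is synthetic.

```python
import random
from statistics import NormalDist, mean, pstdev

def normal_probability_points(data):
    """Return (actual_z, normal_z) pairs for a Normal Probability Plot."""
    n = len(data)
    mu = mean(data)
    sigma = pstdev(data)
    std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1
    points = []
    for i, value in enumerate(sorted(data), start=1):
        cdf_value = (i - 0.5) / n                  # empirical CDF (assumed plotting position)
        normal_z = std_normal.inv_cdf(cdf_value)   # CDF value mapped back to a Normal Z
        actual_z = (value - mu) / sigma            # actual Z value of the data point
        points.append((actual_z, normal_z))
    return points

random.seed(1)  # fixed seed so the illustration is repeatable
sample = [random.gauss(10.0, 2.0) for _ in range(2000)]  # synthetic Normal data
points = normal_probability_points(sample)

# For Normally distributed data the points lie close to the diagonal actual_z == normal_z.
print(points[0], points[len(points) // 2], points[-1])
```

Plotting `normal_z` against `actual_z` with any graphing tool, together with the diagonal line, gives the same visual test as probability graph paper: departures from the line at the ends indicate tails that are heavier or lighter than Normal.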
Examples of Normal Probability Plots are shown in Fig. 9 using the car vibration and turbulent pressure data from Figs. 1 & 3. The straight diagonal line represents the Normal distribution. The car vibration data closely follows this, while the turbulent pressure data deviates significantly at the two ends of the distribution.