Normal (Gaussian) Distribution
March 29, 2018
The most common and widely used probability distribution in statistical analysis is the normal (or Gaussian) distribution. It is sometimes called the “bell curve.” The probability density function (PDF), p(x), is defined by Equation 9 and is shown in standard form in Figure 3.6, where μ is the mean and σ is the standard deviation of the variable x. The distribution is symmetrical about the mean, so the skewness equals zero; the kurtosis equals 3 (excess kurtosis = 0).
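As a rough illustration, the PDF can be evaluated directly. This minimal sketch assumes Equation 9 is the usual form p(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)); the function name `normal_pdf` and the sample points are chosen here for illustration only.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal PDF p(x) for mean mu and standard deviation sigma (sigma > 0).

    Assumes the usual Gaussian form; the defaults give the standard form
    (mu = 0, sigma = 1).
    """
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Height of the bell curve at the mean, for the standard form.
peak = normal_pdf(0.0)

# Symmetry about the mean: points equally far either side have equal density.
left = normal_pdf(-1.5)
right = normal_pdf(1.5)

print(f"p(0) = {peak:.6f}, p(-1.5) = {left:.6f}, p(1.5) = {right:.6f}")
```

The equal values at ±1.5 reflect the symmetry that makes the skewness zero.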
The Gaussian distribution is named after the German mathematician Carl Friedrich Gauss. He derived the distribution’s basic form in his 1809 monograph on the motion of planets and comets. By using the function e^(−kδ²) to represent the distribution of random errors δ in measurements (where k is a scale factor), he could prove that the least-squares method of fitting data generated the most probable result. He also used the distribution to determine the statistics of the mean value of a series of measurements having this random error (see the lesson on Confidence Intervals).
Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF) is another form for representing a probability distribution. The CDF plots the probability that x is less than a particular value X, which is often written as P(x < X). This is shown in Figure 3.7 for the normal distribution. Mathematically, it is given by Equation 10, where erf(…) is the error function.
Graphically, this probability is the area P under the PDF curve to the left of X.
The total area under the PDF curve is equal to 1 by definition.
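The relationship between the CDF, the error function, and the area under the PDF can be checked numerically. The sketch below assumes Equation 10 is the usual form P(x < X) = ½[1 + erf((X − μ) / (σ√2))]; the function names, the trapezoidal integration, and the lower limit of −8 (standing in for −∞) are illustrative choices rather than anything taken from the lesson.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal PDF, assuming the usual Gaussian form."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(X, mu=0.0, sigma=1.0):
    """P(x < X) written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((X - mu) / (sigma * math.sqrt(2.0))))

def area_under_pdf(lower, upper, mu=0.0, sigma=1.0, steps=20_000):
    """Trapezoidal estimate of the area under the PDF between two limits."""
    h = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h

X = 1.0
p_from_erf = normal_cdf(X)             # closed form via erf
p_from_area = area_under_pdf(-8.0, X)  # area to the left of X
total_area = area_under_pdf(-8.0, 8.0) # should be very close to 1

print(p_from_erf, p_from_area, total_area)
```

The two estimates of P should agree closely, and the total area should be very close to 1, because the tails beyond ±8σ contribute a negligible amount.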
The normal distribution is defined by the mean value and the standard deviation (Equation 9). In many cases, these values are computed from a data set using Equations 1 and 4, and the normal distribution is assumed. This assumption can lead to errors if the tails of the distribution (levels outside of ±2σ from the mean) are used in calculations without first checking whether the skewness and kurtosis of the data match a normal distribution. An example of this is the turbulent pressure shown in Figure 3.3.
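The check described above can be sketched as follows. This is an illustration on synthetic data, not on the lesson's car vibration or turbulent pressure measurements. The function `shape_statistics` is a hypothetical helper, and it divides by n throughout (population moments), which may differ from the convention used in Equations 1 and 4.

```python
import math
import random

def shape_statistics(data):
    """Return (mean, standard deviation, skewness, kurtosis) of a sample.

    Uses central moments divided by n (population convention, an assumption).
    A normal distribution has skewness 0 and kurtosis 3.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2

# Synthetic samples with a fixed seed: one Gaussian, one strongly skewed.
rng = random.Random(0)
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
skewed = [rng.expovariate(1.0) for _ in range(100_000)]

_, _, g_skew, g_kurt = shape_statistics(gaussian_like)
_, _, s_skew, s_kurt = shape_statistics(skewed)

print(f"Gaussian sample:    skewness {g_skew:.3f}, kurtosis {g_kurt:.3f}")
print(f"Exponential sample: skewness {s_skew:.3f}, kurtosis {s_kurt:.3f}")
```

A sample whose skewness is far from 0 or whose kurtosis is far from 3 is a warning that tail probabilities taken from a fitted normal distribution may be unreliable.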
The CDF can also be plotted by converting the vertical axis scale so that a normal distribution produces a straight line. This is called a normal probability plot. This plot makes it easier to visually determine if the data are normally distributed. In the “old days” when things were plotted by hand on graph paper, there was a standard form for this called “probability graph paper” (see Figure 3.8). Since most computer graphing programs do not have this capability, a similar graph can be constructed by mapping the CDF values of the data back to the corresponding standard normal Z values and then plotting these against the actual Z values of the data, (x − μ)/σ.
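One way to build the coordinates for such a plot is sketched below. The plotting position (i − 0.5)/n for the empirical CDF is a common convention assumed here, not something specified in the lesson, and `normal_probability_points` is a hypothetical helper name. The data are synthetic.

```python
import random
from statistics import NormalDist, fmean, pstdev

def normal_probability_points(data):
    """Return (normal Z, actual Z) pairs for a normal probability plot.

    For the i-th smallest value, the empirical CDF is taken as (i - 0.5) / n
    (an assumed plotting position), then mapped back to a standard normal Z
    with the inverse CDF. The actual Z is (x - mean) / standard deviation.
    """
    xs = sorted(data)
    n = len(xs)
    mu = fmean(xs)
    sigma = pstdev(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n                  # empirical CDF value
        z_normal = std_normal.inv_cdf(p)   # Z expected for normal data
        z_actual = (x - mu) / sigma        # Z observed in the data
        points.append((z_normal, z_actual))
    return points

# Synthetic, normally distributed sample with a fixed seed.
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(2000)]
pairs = normal_probability_points(sample)

# Lowest, middle, and highest points of the plot.
for z_normal, z_actual in (pairs[0], pairs[len(pairs) // 2], pairs[-1]):
    print(f"normal Z = {z_normal:+.3f}, actual Z = {z_actual:+.3f}")
```

Plotting the pairs with any graphing program gives the normal probability plot: points from normally distributed data lie close to the diagonal line where the two Z values are equal, and departures at the ends indicate non-normal tails.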
Examples of normal probability plots are shown in Figure 3.9 using car vibration and turbulent pressure data from Figures 3.1 and 3.3. The straight diagonal line represents the normal distribution. The car vibration data closely follow the normal distribution, while the turbulent pressure data deviate significantly at the two ends of the distribution.