Confidence Intervals

March 29, 2018

When calculating a point estimate of a random variable (such as its mean value), it is also possible to determine the probable error in that estimate. This is done by evaluating a confidence interval for the estimate, associated with a confidence level. The confidence interval and the confidence level always go together: a higher confidence level requires a wider confidence interval, and a lower confidence level allows a narrower one.

When summing independent random variables, the mean values and variances add. For example, when summing independent random variables A and B:

\begin{equation*} \overline{A+B}=\frac{1}{N}\sum(A+B)=\bar{A}+\bar{B} \end{equation*}

Equation 12

\begin{align*} \sigma_{A+B}^2&=\frac{1}{N}\sum(A-\mu_{A}+B-\mu_{B})^2\\ &=\frac{1}{N}\sum[(A-\mu_{A})^2+2(A-\mu_{A})(B-\mu_{B})+(B-\mu_{B})^2]\\ &=\sigma_{A}^2+0+\sigma_{B}^2 \end{align*}

Equation 13
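As a quick numerical check of Eqs. 12 and 13, the following Python sketch draws two independent random variables, adds them value by value, and compares the mean and variance of the sum with the sums of the individual means and variances. The distributions (A uniform on (-0.5, 0.5), B Normal with mean 2 and standard deviation 1.5) and the helper names are illustrative assumptions; the 1/N normalization matches Eq. 13.

```python
import random

def mean(values):
    """Arithmetic mean, (1/N) * sum(values), as in Eq. 12."""
    return sum(values) / len(values)

def variance(values):
    """Variance with the 1/N normalization used in Eq. 13."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cross_term(a, b):
    """Average of 2(A - mean_A)(B - mean_B), the middle term of Eq. 13."""
    ma, mb = mean(a), mean(b)
    return 2 * sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(12345)                        # fixed seed for a reproducible run
N = 100_000
a = [rng.uniform(-0.5, 0.5) for _ in range(N)]    # A: assumed uniform, variance 1/12
b = [rng.gauss(2.0, 1.5) for _ in range(N)]       # B: assumed Normal, variance 2.25
s = [x + y for x, y in zip(a, b)]                 # A + B, value by value

print(f"mean(A+B) = {mean(s):.4f}   mean(A) + mean(B) = {mean(a) + mean(b):.4f}")
print(f"var(A+B)  = {variance(s):.4f}   var(A) + var(B)   = {variance(a) + variance(b):.4f}")
print(f"cross term = {cross_term(a, b):.5f}")
```

Eq. 12 holds exactly for any data set, while Eq. 13 holds only up to the sample cross term, which averages toward zero for independent variables as N grows.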

The result in Eq. 13 can be used to determine the standard deviation of the mean value, \(\bar{x}\), computed from N independent random variables, x, each with a known standard deviation \(\sigma_{x}\). By Eq. 13, the sum of the N values of x has a variance of \(N\sigma_{x}^2\) and, therefore, a standard deviation of \(\sqrt{N}\sigma_{x}\). Dividing by N to obtain the standard deviation of the mean value, \(\bar{x}\), gives Equation 14.

\begin{equation*} \sigma_{\bar{x}}=\frac{\sigma_{x}}{\sqrt{N}} \end{equation*}

Equation 14
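Eq. 14 can also be checked by simulation. This sketch assumes, for illustration, that each x is uniform on (-0.5, 0.5), so σ_x = 1/√12. It averages N values many times and compares the observed spread of those averages with σ_x/√N.

```python
import math
import random
import statistics

rng = random.Random(2018)          # fixed seed for a reproducible run
sigma_x = 1 / math.sqrt(12)        # standard deviation of one uniform(-0.5, 0.5) value
trials = 20_000                    # number of independent averages per N

results = {}
for n in (1, 4, 16, 64):
    # Each trial averages n independent values; keep every average.
    means = [sum(rng.uniform(-0.5, 0.5) for _ in range(n)) / n for _ in range(trials)]
    simulated = statistics.pstdev(means)     # observed spread of the averages
    predicted = sigma_x / math.sqrt(n)       # Eq. 14
    results[n] = (simulated, predicted)
    print(f"N={n:2d}  simulated={simulated:.4f}  sigma_x/sqrt(N)={predicted:.4f}")
```

Each fourfold increase in N should roughly halve the spread of the averages.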

This result is used to determine the confidence interval of an estimated mean value, assuming \(\bar{x}\) is Normally distributed. The width of the confidence interval depends on the probability, or confidence level, assigned to that interval. That probability is the chance of \(\bar{x}\) lying within a certain number of standard deviations, \(\sigma_{\bar{x}}\), of the true mean μ, as shown by the area P in this picture:

[Figure: probability density p(x), with the area P under the curve shaded.]

The confidence level, P, is plotted in Figure 3.12 as a function of z, the number of standard deviations \(\sigma_{\bar{x}}\) from the true mean μ. For example, a confidence interval of \(\pm1\sigma_{\bar{x}}\) has a confidence level of 68%. (The 95% confidence interval is approximately \(\pm2\sigma_{\bar{x}}\), more precisely \(\pm1.96\sigma_{\bar{x}}\); the 99.7% confidence interval is \(\pm3\sigma_{\bar{x}}\); and so on.)

[Note from Eq. 14 that \(\sigma_{\bar{x}}\rightarrow 0\) as \(N\rightarrow\infty\).]


Figure 3.12. Confidence levels vs. confidence intervals for the estimated mean value.
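The curve in Figure 3.12 is the Normal probability of lying within ±z standard deviations of the mean, P = erf(z/√2). A minimal Python sketch using only the standard library (`math.erf` and `statistics.NormalDist`) evaluates it and also inverts it to find z for a chosen confidence level; the function names are illustrative.

```python
import math
from statistics import NormalDist

def confidence_level(z):
    """Probability that a Normal value lies within ±z standard deviations of its mean."""
    return math.erf(z / math.sqrt(2))

def z_for_level(level):
    """Number of standard deviations z whose ±z interval holds probability `level`."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for z in (1, 2, 3):
    print(f"±{z} sigma -> confidence level {confidence_level(z):.4f}")
for level in (0.68, 0.95, 0.997):
    print(f"{level:.1%} confidence -> ±{z_for_level(level):.3f} sigma")
```

This shows why "95%" and "±2σ" are used interchangeably: ±2σ actually covers about 95.4%, and the exact 95% half-width is about 1.96σ.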

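Because of the Central Limit Theorem, the Normal distribution is almost always assumed for the estimation of the mean value. Consider the example from the previous section, in which the mean value is obtained by averaging four values from the uniform distribution (-0.5, 0.5). Each value has a standard deviation of \(1/\sqrt{12}\approx0.289\), so by Eq. 14 the standard deviation of the mean is \(0.289/\sqrt{4}\approx0.144\). Assuming a Normal distribution, the 95% confidence interval would be about \(\pm2\times0.144=\pm0.288\). In reality, the exact 95% confidence interval for the average of four uniform values is about ±0.28, slightly narrower, because that distribution has shorter tails than the Normal distribution.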
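The following sketch computes both intervals for this example, assuming, as the text does, that the four values come from the uniform distribution (-0.5, 0.5). The Normal approximation uses Eq. 14 with σ_x = 1/√12. The exact interval uses the Irwin–Hall distribution (the distribution of a sum of independent uniform(0, 1) values), inverted by bisection. The function names are illustrative.

```python
import math

def irwin_hall_cdf(x, n):
    """CDF of the sum of n independent uniform(0, 1) values (Irwin-Hall distribution)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** n for k in range(math.floor(x) + 1))
    return total / math.factorial(n)

def exact_half_width(n, level=0.95):
    """Half-width h with P(|mean of n uniform(-0.5, 0.5) values| <= h) = level."""
    target = 0.5 + level / 2           # upper quantile, since the distribution is symmetric
    lo, hi = n / 2, float(n)           # the sum lies in [0, n] and is centred on n/2
    for _ in range(100):               # bisection on the monotonic CDF
        mid = (lo + hi) / 2
        if irwin_hall_cdf(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo - n / 2) / n            # deviation of the sum, divided by n -> deviation of the mean

n = 4
sigma_x = 1 / math.sqrt(12)            # standard deviation of one uniform(-0.5, 0.5) value
sigma_mean = sigma_x / math.sqrt(n)    # Eq. 14
print(f"standard deviation of the mean: {sigma_mean:.4f}")
print(f"Normal approximation (±2 sigma): ±{2 * sigma_mean:.3f}")
print(f"exact 95% interval:              ±{exact_half_width(n):.3f}")
```

Setting n = 1 gives ±0.475, the exact 95% interval for a single uniform value, which is a simple sanity check on the bisection.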

When the values of the mean and standard deviation are obtained from a small data sample, the determination of the confidence interval is more complicated. In this case, it is common to use the t-distribution (with N − 1 degrees of freedom), which gives wider confidence intervals for the same confidence level because the standard deviation is itself estimated from the sample and is therefore uncertain. For N > 3 samples, the effective variance of the sample mean is larger by a ratio of (N − 1)/(N − 3) than the value \(\sigma_{x}^2/N\) implied by Eq. 14. Figure 3.13 shows the modified number, \(t_{value}\), of standard deviations \(\sigma_{\bar{x}}\) used for the confidence interval at different confidence levels, as a function of the number of samples. The estimate for the true mean is then:

\begin{equation*} \mu=\bar{x}\pm t_{value}\sigma_{\bar{x}}=\bar{x}\pm t_{value}\frac{\sigma_{x}}{\sqrt{N}},\quad\text{at the confidence level corresponding to }t_{value} \end{equation*}

Equation 15
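The values of \(t_{value}\) in Eq. 15 can be computed numerically. This sketch assumes that σ_x in Eq. 15 is the sample standard deviation (computed with N − 1 in the denominator) and that \(t_{value}\) is the two-sided critical value of the t-distribution with N − 1 degrees of freedom. It integrates the t density with Simpson's rule and bisects for the critical value, using only the Python standard library. The sample data and function names are hypothetical.

```python
import math
import statistics

def t_pdf(t, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    log_c = math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2) - 0.5 * math.log(dof * math.pi)
    return math.exp(log_c) * (1 + t * t / dof) ** (-(dof + 1) / 2)

def t_cdf(t, dof, steps=1000):
    """CDF for t >= 0: 0.5 plus the Simpson's-rule integral of the density over [0, t]."""
    h = t / steps                               # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + total * h / 3

def t_value(level, n):
    """Two-sided critical value for `level` confidence with n >= 2 samples (n - 1 dof).

    Assumes the answer is below 100, which holds for 95% and 99% levels.
    """
    target = 0.5 + level / 2                    # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 100.0
    for _ in range(60):                         # bisection on the monotonic CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, n - 1) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical measurements, used only to illustrate Eq. 15.
sample = [9.8, 10.4, 10.1, 9.6, 10.3]
n = len(sample)
x_bar = statistics.fmean(sample)
s_x = statistics.stdev(sample)                  # sample standard deviation (N - 1 denominator)
t95 = t_value(0.95, n)
half_width = t95 * s_x / math.sqrt(n)           # Eq. 15
print(f"t_value = {t95:.3f}, mu = {x_bar:.3f} ± {half_width:.3f} at 95% confidence")

for n_samples in (5, 10, 30, 51):
    print(f"N={n_samples:2d}  t_value(95%) = {t_value(0.95, n_samples):.3f}")
```

The printed values fall toward the Normal value of 1.96 as N grows (about 2.01 for N = 51). If SciPy is available, `scipy.stats.t.ppf(0.975, N - 1)` returns the same 95% critical value directly; the hand-rolled version only avoids that dependency.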

For N > 50, the t-distribution is very close to the Normal distribution, and Figure 3.12 can be used instead.


Figure 3.13. Confidence intervals using the t-distribution for small sample sizes.
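The variance ratio (N − 1)/(N − 3) quoted above, and its approach to 1 for large N, can be checked by simulation. This sketch assumes Normally distributed data with true mean 0 and unit standard deviation (an illustrative choice). It standardizes each sample mean by its own sample standard deviation. If the standard deviation were known, this quantity would have variance 1, so its observed variance is directly the inflation ratio.

```python
import math
import random

def standardized_mean_second_moment(n, trials, rng):
    """Average of T**2, where T = (x_bar - mu) / (s / sqrt(n)) and s is the sample std. dev.

    The data are assumed Normal with true mean mu = 0, so E[T] = 0 and E[T**2] is Var(T).
    """
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((v - x_bar) ** 2 for v in sample) / (n - 1))  # estimated from the sample
        t = x_bar / (s / math.sqrt(n))
        total += t * t
    return total / trials

rng = random.Random(7)                     # fixed seed for a reproducible run
results = {}
for n in (8, 20, 51):
    simulated = standardized_mean_second_moment(n, 20_000, rng)
    predicted = (n - 1) / (n - 3)
    results[n] = (simulated, predicted)
    print(f"N={n:2d}  simulated={simulated:.3f}  (N-1)/(N-3)={predicted:.3f}")
```

For N = 51 the predicted ratio is 50/48 ≈ 1.04, which is why the Normal curve is an adequate substitute for large samples.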

It should be noted that the confidence level is often omitted from the statement of a confidence interval. In many cases, a confidence level of 95% is assumed and a confidence interval of approximately \(\pm2\sigma\) is used. This is the case in many opinion polls, where a margin of error of ± some percentage is appended to the reported results. In a survey with N respondents, the standard deviation is taken to be \(\sigma=0.5/\sqrt{N}\), the largest possible value for a sampled proportion (reached when the true proportion is 50%). A margin of error of ±4% (at a 95% confidence level) therefore simply means that approximately \(N=1/(0.04)^2=625\) people were surveyed, since \(4\%=0.04=2\sigma=1/\sqrt{N}\).
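The poll arithmetic can be written as two small functions. This is a sketch assuming the worst-case standard deviation σ = 0.5/√N and z = 2 standard deviations for a 95% level, as in the text; the function names are hypothetical.

```python
import math

def margin_of_error(n, z=2.0):
    """Margin of error (as a fraction) for a poll of n respondents, using sigma = 0.5 / sqrt(n)."""
    return z * 0.5 / math.sqrt(n)

def respondents_for_margin(margin, z=2.0):
    """Smallest whole number of respondents whose margin of error is at most `margin`."""
    return math.ceil((z * 0.5 / margin) ** 2)

print(respondents_for_margin(0.04))           # ±4% with z = 2: 625 respondents
print(f"±{margin_of_error(625):.1%}")          # back from 625 respondents to ±4.0%
print(respondents_for_margin(0.04, z=1.96))   # same margin with the exact 95% value of z
```

Rounding up with `math.ceil` gives a whole number of respondents. Using the exact z = 1.96 instead of 2 lowers the requirement slightly, to 601.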