Statistical Analysis

June 2, 2021

Statistical analysis takes a random variable and describes its likely characteristics.

Specifically, it characterizes random samples drawn from a probability space (Ω, F, P), where Ω is the sample space, F is the set of events, and P is the probability measure, in terms of non-random statistics. Often, these statistics are defined in terms of the expectation operator E.

The expectation operator maps a random variable to a specific number, like 2.7 or 5. The statistics can be calculated analytically if the probability measure P in (Ω, F, P) is known. However, in the real world, P usually isn’t known. In such cases, statistics may be estimated with various techniques.

Here are several takeaways about statistical analysis:

  • Statistical analysis is performed on a random variable or random process.
  • A random variable turns an experimental outcome into a number, as the sketch below illustrates.
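To make the second point concrete, here is a minimal Python sketch of a random variable as a function from experimental outcomes to numbers. The outcome labels and the function name X are illustrative assumptions, not part of any standard library:

```python
import random

# Sample space Ω: the six faces of a die, described as raw experimental outcomes.
omega = ["one pip", "two pips", "three pips",
         "four pips", "five pips", "six pips"]

def X(outcome):
    """A random variable: maps an experimental outcome to a number."""
    return omega.index(outcome) + 1

# Run the experiment once: draw an outcome, then apply X to get a number.
outcome = random.choice(omega)
print(outcome, "->", X(outcome))  # e.g., "four pips -> 4"
```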

Random & Non-random Numbers

Because random variables take the form of numbers, engineers can perform mathematical operations on them. However, the result of a mathematical operation on a random number is still a random number.

Before a decision can be made, engineers require a non-random number (a statistic) that tells them about the underlying physical system that generated the random number. Typically, the operation that maps a random variable to a non-random number is the expectation operator E.

Expectation Operator E

The expectation operator E is at the foundation of many statistics and statistical analyses. The expectation EX of a random variable X is the sum of each possible value of X weighted by the probability of the value occurring.

The expected value EX of a random variable X is:

(1)   \begin{equation*} \text{EX}=\sum\nolimits_{n}\text{x}_{n}\text{P}(\text{x}_{n}) \end{equation*}

where x₁, x₂, x₃, … are the possible values of X and P(xₙ) is the probability that the value xₙ will occur.

The expected value EX tells an engineer what to expect from X.

Example

If X is a random variable that represents the sum of a pair of dice, then the expected value EX of X is:

EX = 2(1/36) + 3(2/36) + 4(3/36) + … + 12(1/36) = 7

Therefore, we might expect a pair of dice to come up with a sum of 7.
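Equation (1) can be checked directly. Here is a short Python sketch that builds the probability function P for the sum of two fair dice and evaluates EX; the dictionary P is an illustrative construction:

```python
from fractions import Fraction
from collections import Counter

# P(x_n): probability of each possible sum of two fair dice.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
P = {s: Fraction(c, 36) for s, c in counts.items()}

# Equation (1): EX = sum over n of x_n * P(x_n).
EX = sum(x * p for x, p in P.items())
print(EX)  # 7
```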

Moments of Single Random Variables

Using the expectation operator, we can define some common statistics, or moments, of a single random variable X.

  • The nth moment of X is E[Xⁿ].
  • The 1st moment, EX, is the mean μ.
  • The nth absolute moment of X is E[|X|ⁿ]. E[|X|²], the mean-square value, is closely related to the Welch estimate of the power spectral density (PSD).

The nth central moment of X is E[(X-μ)ⁿ].

    • E[(X-μ)²] is the variance of X, σ².
    • E[(X-μ)³] is the third central moment; divided by σ³, it gives the skewness of X.
    • E[(X-μ)⁴] is the fourth central moment; divided by σ⁴, it gives the kurtosis of X.

The nth absolute central moment of X is E[|X-μ|ⁿ].
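These definitions translate directly into code. A minimal Python sketch, assuming the PMF is available as a dictionary mapping values to probabilities (the function names are illustrative):

```python
def moment(P, n):
    """nth moment E[X^n] of a discrete random variable with PMF P."""
    return sum(x**n * p for x, p in P.items())

def central_moment(P, n):
    """nth central moment E[(X - mu)^n]."""
    mu = moment(P, 1)
    return sum((x - mu)**n * p for x, p in P.items())

def abs_central_moment(P, n):
    """nth absolute central moment E[|X - mu|^n]."""
    mu = moment(P, 1)
    return sum(abs(x - mu)**n * p for x, p in P.items())
```

With the two-dice PMF P from the earlier sketch, moment(P, 1) returns 7 and central_moment(P, 2) returns 35/6 ≈ 5.83, matching the worked example that follows.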

Example: Moment Statistics

Using the dice pair example above:

(2)   \begin{equation*} \mu=\text{E}[X]=7 \end{equation*}

(3)   \begin{equation*} \sigma^2=\text{E}[(X-\mu)^2] \end{equation*}

(4)   \begin{equation*} \begin{split} =(2-\mu)^2(1/36)+(3-\mu)^2(2/36)+(4-\mu)^2(3/36)+(5-\mu)^2(4/36) \\ +(6-\mu)^2(5/36)+(7-\mu)^2(6/36)+(8-\mu)^2(5/36)+(9-\mu)^2(4/36) \\ +(10-\mu)^2(3/36)+(11-\mu)^2(2/36)+(12-\mu)^2(1/36) \end{split} \end{equation*}

(5)   \begin{equation*} =\frac{(25\times1)+(16\times2)+(9\times3)+(4\times4)+(1\times5)+(0\times6)+(1\times5)+(4\times4)+(9\times3)+(16\times2)+(25\times1)}{36} \end{equation*}

(6)   \begin{equation*} =210/36\approx5.83 \end{equation*}

In this example, μ and σ² were calculated from the known probability measure P. However, in most real-world applications, P is not known. In that case, the common solution is to take many samples and estimate statistics from them.
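Here is a sketch of that sample-based approach, simulating dice rolls as stand-ins for real measurements; numpy and the seed are incidental choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples of the dice sum; in practice these would be measurements
# from a device whose probability measure P is unknown.
N = 100_000
samples = rng.integers(1, 7, size=N) + rng.integers(1, 7, size=N)

mu_hat = samples.mean()   # estimate of mu; close to 7
var_hat = samples.var()   # estimate of sigma^2; close to 35/6 ~ 5.83
print(mu_hat, var_hat)
```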

Arithmetic Mean

The arithmetic mean can be used to estimate the mean μ = E[X]. The arithmetic mean of the N samples x₁, x₂, x₃, … is defined as:

(7)   \begin{equation*} (1/\text{N})\sum\nolimits_{\text{n}=1...\text{N}}x_{n} \end{equation*}

Is the arithmetic mean a good estimate of the true mean E[X]? The answer depends on how large N is and whether the device that generated the samples is stable over time (that is, whether the process is wide-sense stationary, or WSS).
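A quick numerical illustration of the role of N, again simulating the dice process (a sketch, not a proof; the specific N values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arithmetic mean of N simulated dice sums; the estimates tighten around 7 as N grows.
for N in (10, 100, 10_000, 1_000_000):
    samples = rng.integers(1, 7, size=N) + rng.integers(1, 7, size=N)
    print(N, samples.mean())
```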

If N is large and the process is WSS, then the arithmetic mean is, informally, a good estimate of E[X]. In particular, whenever the process is WSS, the estimate is unbiased, as the derivation below shows.

To see why, note that (1/N) times the sum of N random variables is itself a random variable:

(8)   \begin{equation*} \text{Y}=(1/\text{N})\sum\nolimits_{\text{n}=1...\text{N}}\text{X}_{n} \end{equation*}

The expected value of Y follows from the linearity of the expectation operator:

(9)   \begin{equation*} \begin{split} \text{EY}=\text{E}\left[(1/\text{N})\sum\nolimits_{\text{n}=1...\text{N}}\text{X}_{n}\right] \\ =(1/\text{N})\sum\nolimits_{\text{n}=1...\text{N}}\text{EX}_{n} \end{split} \end{equation*}

If the process is wide-sense stationary, then the EXₙ are all equal to the same value EX, and:

(10)   \begin{equation*} \begin{split} \text{EY}=\text{E}\left[(1/\text{N})\sum\nolimits_{\text{n}=1...\text{N}}\text{X}_{n}\right] \\ =(1/\text{N})\sum\nolimits_{\text{n}=1...\text{N}}\text{EX}_{n} \\ =(1/\text{N})\,\text{N}\,\text{EX}=\text{EX}=\mu \end{split} \end{equation*}

The expected value of the arithmetic mean is the true mean. This means that the arithmetic mean is an unbiased estimate of the true mean.
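The unbiasedness in equation (10) can also be seen numerically: averaging many independent sample means recovers μ even when each mean uses a small N. A sketch, with arbitrary trial counts (the algebra above, not the simulation, is the proof):

```python
import numpy as np

rng = np.random.default_rng(2)

# Y = (1/N) * sum of N samples, repeated over many independent trials.
N, trials = 25, 200_000
rolls = rng.integers(1, 7, size=(trials, N)) + rng.integers(1, 7, size=(trials, N))
Y = rolls.mean(axis=1)          # one arithmetic mean per trial

# The average of the sample means is close to mu = 7, even though each N is small.
print(Y.mean())
```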