Normalization in Statistics
In statistics, normalization refers to transforming a variable using the formula $$ z_i = \frac{x_i - \mu}{ \sigma } $$ in order to make it comparable to other variables.
Here, x_i is a value from the distribution X that needs to be normalized, μ is the mean of X, and σ is the standard deviation of X.
Normalization is based on the process of standardizing a random variable X or a distribution of values.
The x-values from the distribution are transformed into z-values, commonly known as z-scores.
This process results in a new distribution Z, called the standardized distribution (it is the standard normal distribution only when X itself is normally distributed), which has the following properties:
- The arithmetic mean of Z is zero
- The variance of Z is one
Apart from these changes, the Z distribution maintains the same overall shape as the original X distribution.
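The two properties listed above follow directly from the definition of z_i. A short derivation, assuming the population formulas (division by n) for both the mean and the variance:

$$ \mu_z = \frac{1}{n} \sum_{i=1}^{n} z_i = \frac{1}{n} \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma} = \frac{1}{\sigma} \left( \frac{1}{n} \sum_{i=1}^{n} x_i - \mu \right) = \frac{\mu - \mu}{\sigma} = 0 $$

$$ \sigma_z^2 = \frac{1}{n} \sum_{i=1}^{n} (z_i - 0)^2 = \frac{1}{\sigma^2} \cdot \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{\sigma^2}{\sigma^2} = 1 $$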
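Why is normalization useful? Normalizing variables puts them on a common, unit-free scale, so you can compare two distributions that have different means or units. Because subtracting the mean removes any constant offset, it can also help reduce the effect of systematic measurement errors in experiments.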
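As a rough illustration of comparing values drawn from two different distributions, the sketch below applies the z-score formula to made-up numbers. The two exams, their means, and their standard deviations are hypothetical and do not come from the text; the helper name z_score is also just an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Return the z-score of x for a distribution with mean mu and standard deviation sigma."""
    return (x - mu) / sigma

# Hypothetical figures, chosen only for illustration.
z_a = z_score(82, mu=70, sigma=8)   # exam A: raw score 82
z_b = z_score(75, mu=60, sigma=5)   # exam B: raw score 75

print(z_a, z_b)  # 1.5 3.0
```

Although the raw score on exam A is higher, the score on exam B lies further above its own mean when measured in standard deviations.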
A Practical Example
Let’s consider the following set of values, consisting of n=5 elements:
$$ X = \{ 18 \ , \ 22 \ , \ 24 \ , \ 26 \ , \ 30 \} $$
The arithmetic mean of these values is 24:
$$ \mu = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{18+22+24+26+30}{5} = 24 $$
The population variance of these values (dividing by n rather than n-1) is calculated as follows:
$$ \sigma^2 = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - \mu)^2 $$
$$ \sigma^2 = \frac{1}{5} \cdot [(18-24)^2 + (22-24)^2 + (24-24)^2 + (26-24)^2 + (30-24)^2 ] $$
$$ \sigma^2 = \frac{1}{5} \cdot [(-6)^2 + (-2)^2 + 0^2 + 2^2 + 6^2 ] $$
$$ \sigma^2 = \frac{1}{5} \cdot [36 + 4 + 0 + 4 + 36] $$
$$ \sigma^2 = \frac{1}{5} \cdot 80 $$
$$ \sigma^2 = 16 $$
Thus, the standard deviation is:
$$ \sigma = \sqrt{16} = 4 $$
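The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch that mirrors the formulas above; the variable names are illustrative.

```python
from math import sqrt

x = [18, 22, 24, 26, 30]
n = len(x)

# Arithmetic mean: sum of the values divided by n.
mu = sum(x) / n

# Population variance: mean of the squared deviations from mu.
variance = sum((xi - mu) ** 2 for xi in x) / n

# Standard deviation: square root of the variance.
sigma = sqrt(variance)

print(mu, variance, sigma)  # 24.0 16.0 4.0
```

Python's standard statistics module offers pvariance and pstdev for this population version; its variance and stdev functions divide by n-1 instead and would give a different result here.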
To normalize the values, we use the formula:
$$ z_i = \frac{x_i - \mu}{ \sigma } $$
Substituting the mean μ=24 and the standard deviation σ=4:
$$ z_i = \frac{x_i - 24}{ 4 } $$
Let’s now calculate the z-scores for the distribution X={18, 22, 24, 26, 30}:
$$ z_1 = \frac{x_1 - 24}{ 4 } = \frac{18 - 24}{ 4 } = \frac{-6}{4} = - 1.5 $$
$$ z_2 = \frac{x_2 - 24}{ 4 } = \frac{22 - 24}{ 4 } = \frac{-2}{4} = -0.5 $$
$$ z_3 = \frac{x_3 - 24}{ 4 } = \frac{24 - 24}{ 4 } = \frac{0}{4} = 0 $$
$$ z_4 = \frac{x_4 - 24}{ 4 } = \frac{26 - 24}{ 4 } = \frac{2}{4} = 0.5 $$
$$ z_5 = \frac{x_5 - 24}{ 4 } = \frac{30 - 24}{ 4 } = \frac{6}{4} = 1.5 $$
Thus, we obtain the standardized distribution Z:
$$ Z = \{ - 1.5 \ , \ - 0.5 \ , \ 0 \ , \ 0.5 \ , \ 1.5 \} $$
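The five z-scores can also be reproduced in one step. A small sketch, taking μ=24 and σ=4 from the worked example as given:

```python
x = [18, 22, 24, 26, 30]
mu, sigma = 24, 4  # mean and standard deviation from the worked example

# Apply z_i = (x_i - mu) / sigma to every element.
z = [(xi - mu) / sigma for xi in x]

print(z)  # [-1.5, -0.5, 0.0, 0.5, 1.5]
```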
The Z distribution retains the same shape as the original X distribution (the same ordering, symmetry, and relative spacing of the values):
$$ X = \{ 18 \ , \ 22 \ , \ 24 \ , \ 26 \ , \ 30 \} $$
However, Z is now centered around a mean of zero and has a variance of one.
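Both claims can be checked numerically. The sketch below recomputes the mean and population variance of Z, and uses the ratio between consecutive gaps in X and in Z as one simple, illustrative way to show that the relative spacing is unchanged.

```python
x = [18, 22, 24, 26, 30]
z = [-1.5, -0.5, 0.0, 0.5, 1.5]
n = len(z)

# Mean and population variance of the standardized values.
mean_z = sum(z) / n
var_z = sum((zi - mean_z) ** 2 for zi in z) / n
print(mean_z, var_z)  # 0.0 1.0

# Every gap in X is the same multiple (sigma = 4) of the matching gap in Z,
# so the values keep their relative spacing.
ratios = [(x[i + 1] - x[i]) / (z[i + 1] - z[i]) for i in range(n - 1)]
print(ratios)  # [4.0, 4.0, 4.0, 4.0]
```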