Normalization in Statistics

In statistics, normalization refers to transforming a variable using the formula $$ z_i = \frac{x_i - \mu}{ \sigma } $$ in order to make it comparable to other variables.

Here, x_i is a value from the distribution X that needs to be normalized, μ is the mean of X, and σ is the standard deviation of X.

This transformation is also known as standardization. It can be applied to a random variable X or to a set of observed values.

The x-values from the distribution are transformed into z-values, commonly known as z-scores.

This process results in a new distribution Z, the standardized distribution, which has the following properties:

  • The arithmetic mean of Z is zero
  • The variance of Z is one (and so is its standard deviation)

Apart from these changes, the Z distribution maintains the same overall shape as the original X distribution. In particular, Z is the standard normal distribution only when X is itself normally distributed.
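
The two properties of Z follow from the linearity of the mean and the scaling rule for the variance. A short derivation, writing E for the mean and Var for the variance (this notation is an assumption of this sketch and is not used elsewhere in these notes):

$$ E[Z] = E\left[ \frac{X - \mu}{\sigma} \right] = \frac{E[X] - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = 0 $$

$$ Var(Z) = Var\left( \frac{X - \mu}{\sigma} \right) = \frac{Var(X)}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1 $$

This assumes σ > 0, that is, the values of X are not all identical.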

Why is normalization useful? Normalizing variables puts them on a common, unit-free scale, so that values from two different distributions can be compared directly. It can also help reduce systematic differences between measurements taken in different experiments.
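
To illustrate the comparison, here is a minimal Python sketch. The two exams, their means and standard deviations, and the function name `z_score` are all hypothetical, chosen only to show the idea.

```python
def z_score(x, mu, sigma):
    """Return the z-score of x given a mean mu and a standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical data: the same student sits two exams graded on different scales.
z_exam_a = z_score(80, mu=70, sigma=10)  # raw score 80 on exam A
z_exam_b = z_score(65, mu=50, sigma=5)   # raw score 65 on exam B

print(z_exam_a, z_exam_b)
```

Although the raw score on exam A is higher, the z-score on exam B is larger, so under these assumed figures the exam B result lies further above its group average.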

    A Practical Example

    Let’s consider the following set of values, consisting of n=5 elements:

    $$ X = \{ 18 \ , \ 22 \ , \ 24 \ , \ 26 \ , \ 30 \} $$

    The arithmetic mean of these values is 24:

    $$ \mu = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{18+22+24+26+30}{5} = \frac{120}{5} = 24 $$

    The variance of these values, computed as a population variance (dividing by n), is:

    $$ \sigma^2 = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - \mu)^2 $$

    $$ \sigma^2 = \frac{1}{5} \cdot [(18-24)^2 + (22-24)^2 + (24-24)^2 + (26-24)^2 + (30-24)^2 ] $$

    $$ \sigma^2 = \frac{1}{5} \cdot [(-6)^2 + (-2)^2 + 0^2 + 2^2 + 6^2 ] $$

    $$ \sigma^2 = \frac{1}{5} \cdot [36 + 4 + 0 + 4 + 36] $$

    $$ \sigma^2 = \frac{1}{5} \cdot 80 $$

    $$ \sigma^2 = 16 $$

    Thus, the standard deviation is:

    $$ \sigma = \sqrt{16} = 4 $$
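
The same calculation can be checked with a short Python sketch. It mirrors the formulas above and uses the population variance (dividing by n); the variable names are chosen here for illustration.

```python
import math

x = [18, 22, 24, 26, 30]
n = len(x)

# Arithmetic mean: sum of the values divided by their count.
mu = sum(x) / n

# Population variance: mean of the squared deviations from mu.
variance = sum((xi - mu) ** 2 for xi in x) / n

# Standard deviation: square root of the variance.
sigma = math.sqrt(variance)

print(mu, variance, sigma)
```

Python's standard `statistics` module offers `pvariance` and `pstdev` for the same population formulas. Note that `statistics.variance` divides by n - 1 instead and would give a different result.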

    To normalize the values, we use the formula:

    $$ z_i = \frac{x_i - \mu}{ \sigma } $$

    Substituting the mean μ=24 and the standard deviation σ=4:

    $$ z_i = \frac{x_i - 24}{ 4 } $$

    Let’s now calculate the z-scores for the distribution X={18, 22, 24, 26, 30}:

    $$ z_1 = \frac{x_1 - 24}{ 4 } = \frac{18 - 24}{ 4 } = \frac{-6}{4} = - 1.5 $$

    $$ z_2 = \frac{x_2 - 24}{ 4 } = \frac{22 - 24}{ 4 } = \frac{-2}{4} = -0.5 $$

    $$ z_3 = \frac{x_3 - 24}{ 4 } = \frac{24 - 24}{ 4 } = \frac{0}{4} = 0 $$

    $$ z_4 = \frac{x_4 - 24}{ 4 } = \frac{26 - 24}{ 4 } = \frac{2}{4} = 0.5 $$

    $$ z_5 = \frac{x_5 - 24}{ 4 } = \frac{30 - 24}{ 4 } = \frac{6}{4} = 1.5 $$

    Thus, we obtain the standardized distribution Z:

    $$ Z = \{ - 1.5 \ , \ - 0.5 \ , \ 0 \ , \ 0.5 \ , \ 1.5 \} $$
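
As a quick check, the z-scores can be reproduced in a few lines of Python. This is a minimal sketch that takes μ=24 and σ=4 from the worked example as given.

```python
x = [18, 22, 24, 26, 30]
mu, sigma = 24, 4  # mean and standard deviation from the worked example

# Apply z_i = (x_i - mu) / sigma to every value.
z = [(xi - mu) / sigma for xi in x]

print(z)
```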

    The Z distribution retains the same shape as the original X distribution, since the order and relative spacing of the values are unchanged:

    $$ X = \{ 18 \ , \ 22 \ , \ 24 \ , \ 26 \ , \ 30 \} $$

    However, Z is now centered around a mean of zero and has a variance of one.
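
These claims can be verified numerically. The sketch below recomputes the mean and population variance of the z-scores, then checks that the gaps between consecutive values of X are the gaps of Z scaled by σ=4. The variable names are illustrative.

```python
import math

x = [18, 22, 24, 26, 30]
z = [-1.5, -0.5, 0.0, 0.5, 1.5]
sigma = 4
n = len(z)

# Mean and population variance of the standardized values.
mean_z = sum(z) / n
var_z = sum((zi - mean_z) ** 2 for zi in z) / n

# Gaps between consecutive values in each distribution.
gaps_x = [b - a for a, b in zip(x, x[1:])]
gaps_z = [b - a for a, b in zip(z, z[1:])]

# The shape is preserved if every gap in X equals sigma times the gap in Z.
same_shape = all(math.isclose(gz * sigma, gx) for gx, gz in zip(gaps_x, gaps_z))

print(mean_z, var_z, same_shape)
```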


     
     
