Estimating a Percentage with a Confidence Interval

To estimate an unknown percentage $ p $ of a characteristic in a population, you can use the relative frequency $ f $ of that characteristic observed in a sample.

If data are collected from $ k $ samples of the same size, the percentage estimate is the arithmetic mean of the observed relative frequencies.

$$ F = \frac{f_1+f_2+\dots+f_k}{k} $$
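As a minimal sketch of this averaging step, the snippet below computes the mean of a few relative frequencies. The function name `mean_relative_frequency` and the four sample values are hypothetical, chosen only for illustration.

```python
def mean_relative_frequency(frequencies):
    """Arithmetic mean of relative frequencies observed in equal-sized samples."""
    if not frequencies:
        raise ValueError("need at least one sample frequency")
    return sum(frequencies) / len(frequencies)


# Hypothetical relative frequencies from four samples of the same size.
observed = [0.28, 0.31, 0.33, 0.30]
F = mean_relative_frequency(observed)
print(round(F, 4))
```

A plain average is only appropriate when every sample has the same size, as the text assumes; samples of different sizes would need a weighted mean instead.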

The standard error of this estimate is calculated using the formula:

$$ \sigma_F =\sqrt{ \frac{f \cdot (1-f) }{n} } $$

Where $ f $ is the relative frequency observed in the sample, and $ n $ is the sample size.

This error quantifies the uncertainty of the estimate. Because $ n $ appears in the denominator under the square root, the standard error is proportional to $ 1/\sqrt{n} $: quadrupling the sample size halves the error.
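To see how the standard error reacts to the sample size, here is a small sketch that evaluates the formula for a fixed frequency and growing $ n $. The helper name `standard_error` and the chosen values of $ f $ and $ n $ are assumptions made for illustration.

```python
import math


def standard_error(f, n):
    """Standard error sqrt(f * (1 - f) / n) of a sample relative frequency."""
    if not 0 <= f <= 1:
        raise ValueError("f must be a relative frequency between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(f * (1 - f) / n)


# Same observed frequency, increasingly large (hypothetical) samples.
for n in (100, 400, 1600):
    print(n, round(standard_error(0.30, n), 4))
```

Each fourfold increase in $ n $ cuts the printed standard error in half, which is the $ 1/\sqrt{n} $ behavior of the formula.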

Instead of providing just a single point estimate, it's better to offer a confidence interval, which gives a range within which the true population percentage $ p $ is likely to fall, with a certain level of confidence (e.g., 95% or 99%).

The confidence interval is derived from the standard error and depends on the chosen confidence level.

Let’s assume a 95% confidence level.

We then look up the critical value $ z $ from the normal distribution. For a 95% confidence level, the critical value is $ z = 1.96 $.
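Instead of reading the critical value from a table, it can be computed from the inverse cumulative distribution function of the standard normal. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `critical_value` is a hypothetical helper.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value z for a confidence level such as 0.95."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    # Half of alpha lies in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(critical_value(0.95), 2))
print(round(critical_value(0.99), 2))
```

For a 95% confidence level this reproduces the familiar $ z = 1.96 $ after rounding to two decimals.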

Finally, we construct the confidence interval around the estimate.

$$ ( F - z \cdot \sigma_F \ \ , \ \ F + z \cdot \sigma_F ) $$

This interval gives a range of plausible values for $ p $ at the chosen confidence level.

Note. If the lower bound $ F - z \cdot \sigma_F $ is negative, it should be considered zero, since a percentage cannot be negative.
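The steps above can be put together in a short sketch. The function name `proportion_confidence_interval` is hypothetical, and capping the upper bound at 1 is an assumption added by analogy with the note on the lower bound; the input values are made up to show the clamping.

```python
import math


def proportion_confidence_interval(f, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    f: observed relative frequency, n: sample size,
    z: critical value (1.96 corresponds to a 95% confidence level).
    """
    standard_error = math.sqrt(f * (1 - f) / n)
    margin = z * standard_error
    lower = max(0.0, f - margin)  # a negative lower bound is treated as zero
    upper = min(1.0, f + margin)  # assumption: cap the upper bound at 1 as well
    return lower, upper


# Hypothetical small sample with a rare characteristic: the raw lower bound
# would be negative, so it is clamped to zero.
print(proportion_confidence_interval(0.01, 50))
```

This is the simple interval described in the text, not a refined method; with very small samples or frequencies close to 0 or 1, the normal approximation behind it becomes less trustworthy.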

    A Practical Example

    Suppose you want to estimate the percentage of people in a city who own an electric car, but you don't know the exact percentage in the population.

    You survey a sample of 200 people and find that 60 of them own an electric car.

    The sample frequency \( f \) would be:

    $$ f = \frac{60}{200} = 0.30 \text{ (or 30\%)} $$

    Now, you want to estimate the confidence interval for the percentage of people who own an electric car in the population, using a 95% confidence level.

    First, calculate the standard error with the following formula:

    $$ \sigma_F = \sqrt{\frac{f \cdot (1 - f)}{n}} $$

    In this case, \( f = 0.30 \) is the observed frequency, and \( n = 200 \) is the sample size.

    $$ \sigma_F = \sqrt{\frac{0.30 \cdot (1 - 0.30)}{200}} $$

    $$ \sigma_F = \sqrt{\frac{0.30 \cdot 0.70}{200}} $$

    $$ \sigma_F = \sqrt{\frac{0.21}{200}} $$

    $$ \sigma_F = \sqrt{0.00105} $$

    $$ \sigma_F \approx 0.0324 $$

    To calculate the confidence interval, we use a 95% confidence level, which corresponds to a critical value \( z = 1.96 \).

    The confidence interval is given by the formula:

    $$ \left( f - z \cdot \sigma_F, f + z \cdot \sigma_F \right) $$

    Substitute the values:

    $$ \left( 0.30 - 1.96 \cdot 0.0324 \ , \ 0.30 + 1.96 \cdot 0.0324 \right) $$

    $$ \left( 0.30 - 0.0635 \ , \ 0.30 + 0.0635 \right) $$

    $$ \left( 0.2365 \ , \ 0.3635 \right) $$

    So, based on this sample and a 95% confidence level, you can say that the percentage of people in the city who own an electric car is between 23.65% and 36.35%.

    The half-width of the interval, $ z \cdot \sigma_F $, is called the margin of error (here about 6.35 percentage points). It shrinks as the sample size $ n $ increases.

