Estimating a Percentage with a Confidence Interval

To estimate an unknown percentage $ p $ of a characteristic in a population, you can use the relative frequency $ f $ of that characteristic observed in a sample.

If data are collected from $ k $ samples of the same size, the percentage estimate is the arithmetic mean of the observed relative frequencies.

$$ F = \frac{f_1+f_2+\dots+f_k}{k} $$
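As a minimal sketch of this averaging step, the snippet below computes the mean of several relative frequencies. The function name `pooled_estimate` and the four frequencies are hypothetical, chosen only for illustration.

```python
def pooled_estimate(frequencies):
    """Arithmetic mean of relative frequencies from equally sized samples."""
    if not frequencies:
        raise ValueError("at least one sample frequency is required")
    return sum(frequencies) / len(frequencies)


# Hypothetical relative frequencies observed in four samples of the same size.
sample_frequencies = [0.28, 0.31, 0.30, 0.33]
F = pooled_estimate(sample_frequencies)
print(f"F = {F:.3f}")  # F = 0.305
```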

The standard error of this estimate is calculated using the formula:

$$ \sigma_F =\sqrt{ \frac{f \cdot (1-f) }{n} } $$

Where $ f $ is the relative frequency observed in the sample, and $ n $ is the sample size.

This error quantifies the uncertainty of the estimate. Because $ n $ appears in the denominator under the square root, the error is proportional to $ 1/\sqrt{n} $: quadrupling the sample size halves it.
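The following sketch evaluates the standard error formula for a fixed frequency and increasing sample sizes. The function name `standard_error` and the values 0.30, 100, 400 and 1600 are assumptions made for illustration.

```python
import math


def standard_error(f, n):
    """Standard error sqrt(f * (1 - f) / n) of a sample relative frequency."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(f * (1.0 - f) / n)


# Each fourfold increase in n halves the standard error.
for n in (100, 400, 1600):
    print(n, round(standard_error(0.30, n), 4))
# 100 0.0458
# 400 0.0229
# 1600 0.0115
```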

Instead of providing just a single point estimate, it's better to offer a confidence interval, which gives a range within which the true population percentage $ p $ is likely to fall, with a certain level of confidence (e.g., 95% or 99%).

The confidence interval is derived from the standard error and depends on the chosen confidence level.

Let’s assume a 95% confidence level.

We then look up the critical value $ z $ from the normal distribution. For a 95% confidence level, the critical value is $ z = 1.96 $.
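Instead of reading the critical value from a table, it can be computed from the inverse of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `critical_value` is an assumption.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value z of the standard normal distribution."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    tail = (1.0 - confidence) / 2.0  # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)


print(round(critical_value(0.95), 2))   # 1.96
print(round(critical_value(0.99), 3))   # 2.576
```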

Finally, we construct the confidence interval around the estimate.

$$ ( F - z \cdot \sigma_F \ \ , \ \ F + z \cdot \sigma_F ) $$

At the 95% level, intervals constructed this way contain the true percentage $ p $ for about 95% of the samples that could be drawn.

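Note. If the lower bound $ F - z \cdot \sigma_F $ is negative, it should be considered zero, since a percentage cannot be negative. Likewise, an upper bound above 1 should be considered 1.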
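The interval construction and the clipping rule in the note can be sketched as follows. The function name `confidence_interval` and the input values are hypothetical. Capping the upper bound at 1 is an added assumption that mirrors the rule for the lower bound.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Interval (estimate - z*se, estimate + z*se), clipped to [0, 1]."""
    margin = z * std_error
    lower = max(0.0, estimate - margin)  # a negative bound is treated as zero
    upper = min(1.0, estimate + margin)  # assumption: a bound above 1 is treated as 1
    return lower, upper


# A small estimate with a comparatively large standard error:
# the unclipped lower bound would be 0.02 - 1.96 * 0.0198 < 0.
lower, upper = confidence_interval(0.02, 0.0198)
print(lower, round(upper, 4))  # 0.0 0.0588
```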

A Practical Example

Suppose you want to estimate the percentage of people in a city who own an electric car, but you don't know the exact percentage in the population.

You survey a sample of 200 people and find that 60 of them own an electric car.

The sample frequency $ f $ would be:

$$ f = \frac{60}{200} = 0.30 \text{ (or 30\%)} $$

Now, you want to estimate the confidence interval for the percentage of people who own an electric car in the population, using a 95% confidence level.

First, calculate the standard error with the following formula:

$$ \sigma_F = \sqrt{\frac{f \cdot (1 - f)}{n}} $$

In this case, $ f = 0.30 $ is the observed frequency, and $ n = 200 $ is the sample size.

$$ \sigma_F = \sqrt{\frac{0.30 \cdot (1 - 0.30)}{200}} $$

$$ \sigma_F = \sqrt{\frac{0.30 \cdot 0.70}{200}} $$

$$ \sigma_F = \sqrt{\frac{0.21}{200}} $$

$$ \sigma_F = \sqrt{0.00105} $$

$$ \sigma_F \approx 0.0324 $$

To calculate the confidence interval, we use a 95% confidence level, which corresponds to a critical value $ z = 1.96 $.

The confidence interval is given by the formula:

$$ \left( f - z \cdot \sigma_F, f + z \cdot \sigma_F \right) $$

Substitute the values:

$$ \left( 0.30 - 1.96 \cdot 0.0324,\ 0.30 + 1.96 \cdot 0.0324 \right) $$

$$ \left( 0.30 - 0.0635,\ 0.30 + 0.0635 \right) $$

$$ \left( 0.2365,\ 0.3635 \right) $$

So, based on this sample and a 95% confidence level, you can say that the percentage of people in the city who own an electric car is between 23.65% and 36.35%.
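The whole calculation for this example can be checked with a short script. This is a sketch under the same normal approximation used above. The function name `proportion_confidence_interval` is an assumption, and the critical value is computed with `statistics.NormalDist` rather than rounded to 1.96, which does not change the result at four decimal places.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (f, standard error, lower bound, upper bound) for a sample proportion."""
    f = successes / n
    std_error = math.sqrt(f * (1.0 - f) / n)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * std_error
    return f, std_error, max(0.0, f - margin), min(1.0, f + margin)


# 60 electric car owners in a sample of 200 people.
f, se, low, high = proportion_confidence_interval(60, 200)
print(f"f = {f:.2f}, standard error = {se:.4f}")   # f = 0.30, standard error = 0.0324
print(f"95% interval: ({low:.4f}, {high:.4f})")    # 95% interval: (0.2365, 0.3635)
```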

The half-width of the interval, $ z \cdot \sigma_F $, is called the margin of error (about 6.35 percentage points in this example). It shrinks as the sample size $ n $ increases.
