Estimating a Percentage with a Confidence Interval
To estimate an unknown percentage $ p $ of a characteristic in a population, you can use the relative frequency $ f $ of that characteristic observed in a sample.
If data are collected from $ k $ samples of the same size, the percentage estimate is the arithmetic mean of the observed relative frequencies.
$$ F = \frac{f_1+f_2+\dots+f_k}{k} $$
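The averaging step can be sketched in a few lines of Python. The function name `mean_frequency` and the three sample frequencies are hypothetical, chosen only to illustrate the formula.

```python
def mean_frequency(frequencies):
    """Arithmetic mean of relative frequencies from equally sized samples."""
    if not frequencies:
        raise ValueError("at least one sample frequency is required")
    return sum(frequencies) / len(frequencies)


# Hypothetical relative frequencies observed in three samples of equal size.
observed = [0.28, 0.31, 0.31]
F = mean_frequency(observed)
print(f"F = {F:.2f}")
```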
The standard error of this estimate is calculated using the formula:
$$ \sigma_F =\sqrt{ \frac{f \cdot (1-f) }{n} } $$
Here $ f $ is the relative frequency observed in the sample (with a single sample, $ F = f $), and $ n $ is the sample size.
This error quantifies the uncertainty of the estimate. It is proportional to $ 1/\sqrt{n} $, so it decreases as the sample size $ n $ increases: quadrupling $ n $ halves the standard error.
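As a minimal sketch, the standard error formula translates directly into Python. The function name `standard_error` and the sample sizes in the loop are assumptions made for illustration; the loop shows how the error changes when the frequency stays fixed and only the sample size grows.

```python
import math


def standard_error(f, n):
    """Standard error of a relative frequency f observed in a sample of size n."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(f * (1.0 - f) / n)


# Same frequency, increasing (hypothetical) sample sizes.
for n in (100, 400, 1600):
    print(n, round(standard_error(0.30, n), 4))
```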
Instead of providing just a single point estimate, it's better to offer a confidence interval, which gives a range within which the true population percentage $ p $ is likely to fall, with a certain level of confidence (e.g., 95% or 99%).
The confidence interval is derived from the standard error and depends on the chosen confidence level.
Let’s assume a 95% confidence level.
We then look up the critical value $ z $ from the normal distribution. For a 95% confidence level, the critical value is $ z = 1.96 $.
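Rather than reading the critical value from a table, it can be computed with Python's standard library. This is a sketch that assumes a two-sided interval; the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value z of the standard normal distribution."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Split the remaining probability equally between the two tails.
    upper_tail = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - upper_tail)


print(round(critical_value(0.95), 2))  # 1.96
print(round(critical_value(0.99), 3))  # 2.576
```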
Finally, we construct the confidence interval around the estimate.
$$ ( F - z \cdot \sigma_F \ \ , \ \ F + z \cdot \sigma_F ) $$
This interval is the range of values for $ p $ that is consistent with the sample at the chosen confidence level.
Note. If the lower bound $ F - z \cdot \sigma_F $ is negative, it should be considered zero, since a percentage cannot be negative.
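The construction of the interval, including the rule for a negative lower bound, might be sketched as follows. The function name `confidence_interval` and the small-sample figures (1 case out of 50) are hypothetical, picked so that the lower bound would otherwise fall below zero.

```python
import math


def confidence_interval(estimate, n, z=1.96):
    """Interval (estimate - z*se, estimate + z*se), with the lower bound floored at 0."""
    se = math.sqrt(estimate * (1.0 - estimate) / n)
    lower = max(0.0, estimate - z * se)  # a negative lower bound is treated as zero
    upper = estimate + z * se
    return lower, upper


# Hypothetical rare characteristic: 1 case observed in a sample of 50.
low, high = confidence_interval(1 / 50, 50)
print(f"({low:.4f}, {high:.4f})")
```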
A Practical Example
Suppose you want to estimate the percentage of people in a city who own an electric car, but you don't know the exact percentage in the population.
You survey a sample of 200 people and find that 60 of them own an electric car.
The sample frequency $ f $ would be:
$$ f = \frac{60}{200} = 0.30 \text{ (or 30\%)} $$
Now, you want to estimate the confidence interval for the percentage of people who own an electric car in the population, using a 95% confidence level.
First, calculate the standard error with the following formula:
$$ \sigma_F = \sqrt{\frac{f \cdot (1 - f)}{n}} $$
In this case, $ f = 0.30 $ is the observed frequency, and $ n = 200 $ is the sample size.
$$ \sigma_F = \sqrt{\frac{0.30 \cdot (1 - 0.30)}{200}} $$
$$ \sigma_F = \sqrt{\frac{0.30 \cdot 0.70}{200}} $$
$$ \sigma_F = \sqrt{\frac{0.21}{200}} $$
$$ \sigma_F = \sqrt{0.00105} $$
$$ \sigma_F \approx 0.0324 $$
To calculate the confidence interval, we use a 95% confidence level, which corresponds to a critical value $ z = 1.96 $.
The confidence interval is given by the formula:
$$ \left( f - z \cdot \sigma_F, f + z \cdot \sigma_F \right) $$
Substitute the values:
$$ \left( 0.30 - 1.96 \cdot 0.0324 \ , \ 0.30 + 1.96 \cdot 0.0324 \right) $$
$$ \left( 0.30 - 0.0635 \ , \ 0.30 + 0.0635 \right) $$
$$ \left( 0.2365 \ , \ 0.3635 \right) $$
So, based on this sample, you can say with 95% confidence that the percentage of people in the city who own an electric car is between 23.65% and 36.35%.
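The same calculation can be checked with a short Python script. This is only a sketch of the worked example: the variable names are arbitrary, and the critical value 1.96 is entered by hand as in the text.

```python
import math

owners = 60        # people in the sample who own an electric car
sample_size = 200  # people surveyed
z = 1.96           # critical value for a 95% confidence level

f = owners / sample_size                        # observed relative frequency
sigma_f = math.sqrt(f * (1 - f) / sample_size)  # standard error
half_width = z * sigma_f                        # distance from f to each bound
lower, upper = f - half_width, f + half_width

print(f"f = {f:.2f}")
print(f"standard error = {sigma_f:.4f}")
print(f"interval = ({lower:.4f}, {upper:.4f})")
```

The script keeps full precision between steps and rounds only when printing, so the last digit of intermediate values can differ from a hand calculation that rounds at every step.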
The half-width of the interval, $ z \cdot \sigma_F $, is often called the margin of error (here about 6.35 percentage points). It shrinks as the sample size $ n $ increases.