Confidence Intervals and Sample Size for Population Proportions

Confidence intervals for proportions
When you collect sample data, you rarely know the true population proportion $p$. A confidence interval gives you a range of plausible values for $p$ based on what you observed in your sample.
The formula for a confidence interval for a population proportion:
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
- $\hat{p}$ is the sample proportion (number of successes divided by $n$)
- $z^*$ is the critical value from the standard normal distribution, determined by your confidence level
- $n$ is the sample size
Common critical values you should know:
| Confidence Level | $z^*$ |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
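The tabled critical values can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `critical_value` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """Return z* so that the central `confidence` area of N(0, 1) lies within +/- z*."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # The two tails split the leftover area equally, so z* is the
    # (1 + confidence) / 2 quantile of the standard normal distribution.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

This is handy when you need a confidence level that is not in the table, such as 98%.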
Conditions that must be met before using this formula:
- Random sample: The data must come from a random sampling method.
- Independence (10% condition): The population must be at least 10 times larger than the sample size ($N \geq 10n$). This ensures that sampling without replacement doesn't meaningfully affect the results.
- Success-failure condition (Large Counts): $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$. This is the condition that actually justifies using the normal approximation. A generic "$n \geq 30$" rule does not apply here; what matters is having enough successes and enough failures in your sample.
The Central Limit Theorem is what makes this work: when these conditions are satisfied, the sampling distribution of $\hat{p}$ is approximately normal, centered at $p$ with standard deviation $\sqrt{\frac{p(1-p)}{n}}$.
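As a rough illustration of the interval and its conditions, here is a sketch of a one-sample z-interval in Python. The function name `proportion_ci` and the data (520 successes out of 1000) are assumptions made up for this example. Only the success-failure condition can be checked from the counts; random sampling and the 10% condition depend on how the data were collected.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Return (lower, upper) for a one-sample z-interval for a proportion."""
    # Success-failure check: at least 10 successes and 10 failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("success-failure condition not met")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 520 successes in a random sample of 1000.
low, high = proportion_ci(520, 1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these made-up counts the interval runs from roughly 0.49 to 0.55, so values on both sides of 50% remain plausible.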

Interpretation of margin of error
The margin of error (ME) is the "$\pm$" part of your confidence interval:
$$ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
It tells you how far your sample proportion could reasonably be from the true population proportion. A smaller margin of error means a more precise estimate. For example, a poll reporting "52% ± 2%" is much more useful than one reporting "52% ± 8%."
The standard error of the sample proportion is the piece inside the margin of error without the $z^*$:
$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
This measures the typical amount that $\hat{p}$ varies from sample to sample. The margin of error is just the standard error scaled up by $z^*$.
Three factors control the width of your interval:
- Confidence level: Higher confidence means a larger $z^*$, which widens the interval. You're casting a wider net to be more sure you've captured $p$.
- Sample size: Larger $n$ shrinks the standard error, narrowing the interval. Notice that $n$ is under a square root, so you need to quadruple the sample size to cut the margin of error in half.
- Sample proportion: The product $\hat{p}(1-\hat{p})$ is largest when $\hat{p} = 0.5$, so proportions near 50% produce the widest intervals.
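The effect of sample size and sample proportion on the width can be checked numerically. This is a small sketch; the helper `margin_of_error` and the sample sizes are illustrative choices, and 1.96 is the 95% critical value.

```python
from math import sqrt

def margin_of_error(p_hat: float, n: int, z_star: float = 1.96) -> float:
    """Margin of error: critical value times the standard error."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

# Each step quadruples n, so each margin should be half the previous one.
for n in (250, 1000, 4000):
    print(n, round(margin_of_error(0.5, n), 4))

# For a fixed n, a proportion near 0.5 gives a wider margin than one near 0.2.
print(margin_of_error(0.5, 1000) > margin_of_error(0.2, 1000))
```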
Interpreting a confidence level correctly: A 95% confidence level means that if you repeated the sampling process many times and built a confidence interval each time, about 95% of those intervals would contain the true population proportion. It does not mean there's a 95% probability that $p$ falls in your particular interval.
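The "repeated sampling" interpretation can be illustrated with a simulation. The sketch below assumes a true proportion of 0.3 and samples of size 500 (both arbitrary choices), builds a 95% interval from each simulated sample, and reports the fraction of intervals that contain the true value.

```python
import random
from math import sqrt

def coverage_rate(p=0.3, n=500, trials=2000, z_star=1.96, seed=0):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials

print(coverage_rate())
```

The printed fraction should land close to 0.95. Any single interval either contains the true proportion or it doesn't; the 95% describes the long-run success rate of the method.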

Sample size for proportion estimates
Before collecting data, you often need to determine how large your sample should be to achieve a desired margin of error $E$. Setting the margin of error formula equal to $E$ and solving for $n$:
$$n = \frac{(z^*)^2 \, \hat{p}(1-\hat{p})}{E^2}$$
Here's how to use this in practice:
- Choose your desired confidence level and find the corresponding $z^*$.
- Choose your desired margin of error $E$ (e.g., 0.03 for ± 3%).
- Plug in an estimate for $\hat{p}$. If you have a prior study or pilot data, use that value. If you have no prior estimate, use $\hat{p} = 0.5$. This is the conservative choice because $0.25$ is the maximum value of $\hat{p}(1-\hat{p})$, so it guarantees your sample will be large enough.
- Always round up to the next whole number. If you get $n = 1067.11$, you need 1068 people.
Example: You want a 95% confidence interval with a margin of error of 4%, and you have no prior estimate of $p$.
$$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.04)^2} = \frac{0.9604}{0.0016} = 600.25$$
You'd need a sample of at least 601 people.
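The same calculation can be sketched in Python. The function name `required_sample_size` and its parameter names are hypothetical; the default `z_star=1.96` matches the 95% critical value, and `p_guess=0.5` is the conservative choice when no prior estimate exists.

```python
from math import ceil

def required_sample_size(margin: float, z_star: float = 1.96, p_guess: float = 0.5) -> int:
    """Smallest whole n whose margin of error is at most `margin`."""
    if not 0 < margin < 1:
        raise ValueError("margin must be a proportion between 0 and 1")
    n = (z_star ** 2) * p_guess * (1 - p_guess) / margin ** 2
    # Always round up: rounding down would leave the margin slightly too large.
    return ceil(n)

print(required_sample_size(0.04))  # 95% confidence, margin of 4%
print(required_sample_size(0.03))  # 95% confidence, margin of 3%
```

Passing a prior estimate such as `p_guess=0.2` gives a smaller requirement than the conservative 0.5, since $0.2 \times 0.8 = 0.16$ is below the maximum of $0.25$.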
Finite population correction: If your population size $N$ is known and your calculated sample size $n_0$ is more than 5% of $N$, apply this adjustment:
$$n = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$
This reduces the required sample size because sampling a large fraction of a finite population gives you more information than the standard formula accounts for. For instance, if $N = 2000$ and your initial calculation gives $n_0 = 601$:
$$n = \frac{601}{1 + \frac{600}{2000}} = \frac{601}{1.3} \approx 462.3$$
So you'd only need 463 people.
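A sketch of the finite population correction in Python, using the common form that divides the initial size by one plus (initial size minus 1) over the population size. The function name `fpc_sample_size` is hypothetical, and the inputs (an initial size of 601 and a population of 2000) are assumed values chosen because they reproduce the 463 quoted above.

```python
from math import ceil

def fpc_sample_size(n0: int, population_size: int) -> int:
    """Adjust an initial sample size n0 for a finite population."""
    if n0 <= 0 or population_size <= 0:
        raise ValueError("n0 and population_size must be positive")
    adjusted = n0 / (1 + (n0 - 1) / population_size)
    # Round up, as with the uncorrected formula.
    return ceil(adjusted)

print(fpc_sample_size(601, 2000))
```

For a very large population the correction barely changes the result, which is why it is usually skipped when the sample is under 5% of the population.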

Statistical inference and hypothesis testing
Confidence intervals are one form of statistical inference, which means using sample data to draw conclusions about a population. The other major form is hypothesis testing, where you test a specific claim about a population parameter.
In hypothesis testing for proportions, you formulate a null hypothesis (e.g., $H_0: p = p_0$) and an alternative hypothesis, then use sample data to calculate a test statistic and p-value. The p-value tells you how likely your observed result (or something more extreme) would be if the null hypothesis were true.
Confidence intervals and hypothesis tests are closely related: if a hypothesized value of $p$ falls outside your 95% confidence interval, a two-sided test would reject that value at the $\alpha = 0.05$ significance level.
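To make the link concrete, here is a sketch of a two-sided one-proportion z-test in Python. The function name `one_proportion_z_test` and the data (560 successes out of 1000, tested against a hypothesized proportion of 0.5) are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float):
    """Two-sided z-test of H0: p = p0. Returns (z, p_value)."""
    p_hat = successes / n
    # Under H0 the standard error is computed from p0, not from p_hat.
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Two-sided p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 560 successes in 1000 trials, H0: p = 0.5.
z, p_value = one_proportion_z_test(560, 1000, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
```

With these made-up counts the p-value is far below 0.05, and the 95% confidence interval (about 0.53 to 0.59) excludes 0.5, so the two methods agree. Because the test uses the hypothesized proportion in its standard error while the interval uses the sample proportion, the two can occasionally disagree in borderline cases.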