A confidence interval for a population proportion gives you a range of plausible values for the true proportion based on sample data. This is how statisticians move from "here's what we found in our sample" to "here's what we think is true about the whole population." This section focuses on building and interpreting these intervals, especially in contexts like estimating the proportion of residents born outside a country.

Confidence Intervals for Population Proportions

The sample proportion ( $\hat{p}$ ) is your point estimate for the true population proportion. On its own, though, a single number doesn't tell you how precise your estimate is. A confidence interval adds that context by giving a range of values that likely contains the true proportion, at a specified confidence level (typically 90%, 95%, or 99%).

The formula for a confidence interval for a proportion:

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Here's what each piece means:

$\hat{p}$ is the sample proportion (number of "successes" divided by sample size)
$n$ is the sample size
$z^*$ is the critical value from the standard normal distribution, determined by your confidence level (1.645 for 90%, 1.96 for 95%, 2.576 for 99%)
$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the standard error of the sample proportion, which measures how much $\hat{p}$ would vary from sample to sample

The $z^* \times \text{standard error}$ portion is called the margin of error, and it's half the total width of the interval.
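As a rough sketch of where the critical values and the margin of error come from, the snippet below uses Python's standard library. The function names `critical_value` and `margin_of_error` are made up for illustration; they aren't part of any particular statistics package.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.95."""
    # Leave (1 - confidence) / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """z* times the standard error of the sample proportion."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return critical_value(confidence) * standard_error


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

The loop should reproduce the three critical values listed above (1.645, 1.960, 2.576), which is a quick way to check that the function splits the leftover probability evenly between the two tails.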

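Interpretation matters. A 95% confidence interval means: if you repeated this sampling process many times, about 95% of the resulting intervals would contain the true population proportion. It does not mean there's a 95% probability the true proportion is in this specific interval. The true proportion is a fixed number, so any one computed interval either contains it or doesn't; the 95% describes the long-run success rate of the method.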
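One way to make the "repeated sampling" idea concrete is a small simulation. The sketch below assumes a true proportion of 0.23 (something you'd never know in practice; it's fixed here only so the intervals can be checked) and draws many samples of 400. The constants and the seed are illustrative choices.

```python
import random
from math import sqrt

TRUE_P = 0.23   # assumed "true" proportion, known only because this is a simulation
N = 400         # sample size of each simulated survey
Z_STAR = 1.96   # critical value for 95% confidence
TRIALS = 2000   # number of simulated surveys

rng = random.Random(0)  # fixed seed so the run is repeatable
covered = 0
for _ in range(TRIALS):
    # Each simulated resident is a "success" with probability TRUE_P.
    successes = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    margin = Z_STAR * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        covered += 1

coverage = covered / TRIALS
print(f"Share of intervals containing the true proportion: {coverage:.3f}")
```

The printed share should land close to 0.95. Each individual interval either captured 0.23 or missed it; the 95% only shows up across the whole collection.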

Effects on Confidence Interval Width

Two main factors control how wide your interval is:

Sample size ( $n$ ):

Larger $n$ produces a narrower interval because the standard error shrinks ( $n$ is in the denominator under the square root)
Smaller $n$ produces a wider interval, meaning less precision

Confidence level:

Higher confidence (like 99%) uses a larger $z^*$ , which makes the interval wider. You're casting a wider net to be more sure you've captured the true value.
Lower confidence (like 90%) uses a smaller $z^*$ , giving a narrower interval but less certainty.

These two factors create a trade-off. If you want to keep a narrow interval and increase your confidence level, you need to increase your sample size. There's no free lunch: more precision at higher confidence requires more data.
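To see the trade-off in numbers, here is a minimal sketch that tabulates the margin of error for a few sample sizes and confidence levels. The sample proportion 0.23 and the sample sizes are illustrative, and `margin` is just a helper name chosen for this example.

```python
from math import sqrt

Z_STAR = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}  # critical values quoted in the text
P_HAT = 0.23  # illustrative sample proportion


def margin(p_hat: float, n: int, z_star: float) -> float:
    """Margin of error: critical value times the standard error."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


for n in (100, 400, 1600):
    row = ", ".join(
        f"{level:.0%}: +/-{margin(P_HAT, n, z):.3f}" for level, z in Z_STAR.items()
    )
    print(f"n = {n:>4} -> {row}")
```

Reading down a column, quadrupling $n$ cuts the margin in half, because $n$ sits under a square root. Reading across a row, the margin grows as the confidence level rises.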

Inferences from Confidence Intervals

Confidence intervals let you draw conclusions about populations using sample data. A common application is estimating the proportion of a population with a particular characteristic, such as place of birth.

Example: Suppose you want to estimate the proportion of a city's residents who were born outside the country.

Collect a random sample of residents (say $n = 400$ ).
Count how many were born outside the country. If 92 out of 400 were, then $\hat{p} = 92/400 = 0.23$ .
Choose a confidence level (say 95%, so $z^* = 1.96$ ).
Calculate the interval: $0.23 \pm 1.96\sqrt{\frac{0.23(0.77)}{400}} = 0.23 \pm 0.041$ , giving roughly $(0.189,\ 0.271)$ .
Interpret: You're 95% confident that the true proportion of the city's residents born outside the country is between 18.9% and 27.1%.
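The steps above can be checked with a few lines of Python. This is only a sketch of the arithmetic; `proportion_ci` is a hypothetical helper name, and the critical value is passed in directly rather than looked up.

```python
from math import sqrt


def proportion_ci(successes: int, n: int, z_star: float) -> tuple[float, float]:
    """Interval p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 92 of 400 sampled residents born outside the country, 95% confidence.
lower, upper = proportion_ci(92, 400, 1.96)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # prints: 95% CI: (0.189, 0.271)
```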

Comparing two groups: You can also compare proportions between populations by calculating separate confidence intervals. If the intervals for two cities don't overlap, that suggests a real difference in their proportions of foreign-born residents. Keep in mind, though, that overlapping intervals don't settle the question either way: they don't show the proportions are equal, and they don't rule out a difference. A formal two-proportion test, or a confidence interval for the difference itself, is more reliable for that comparison.
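For a more direct comparison than eyeballing two separate intervals, one common approach is a confidence interval for the difference $p_A - p_B$, which adds the two squared standard errors together. The sketch below uses the city from the example as City A; the counts for City B (60 of 400) are invented for illustration, and `difference_ci` is a hypothetical helper name.

```python
from math import sqrt


def difference_ci(x1: int, n1: int, x2: int, n2: int, z_star: float = 1.96):
    """Interval for p1 - p2 built from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Variances of independent estimates add, so combine them under one root.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# City A is the example from the text; City B's counts are made up.
lower, upper = difference_ci(92, 400, 60, 400)
print(f"95% CI for p_A - p_B: ({lower:.3f}, {upper:.3f})")
```

If the resulting interval excludes 0, the data point to a difference between the two cities at that confidence level. With these made-up counts the whole interval sits above 0.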

Statistical Foundations

These confidence intervals rely on the sampling distribution of $\hat{p}$ being approximately normal. That approximation works well when the sample is large enough. The standard rule of thumb is that both $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ should hold. When these conditions aren't met (small samples or proportions very close to 0 or 1), the normal approximation breaks down and other methods may be needed.
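The rule of thumb is easy to check before computing anything. The sketch below also includes the Wilson score interval as one example of an "other method"; the text doesn't name a specific alternative, so treat that choice as an assumption. The function names and the small sample (3 of 40) are illustrative.

```python
from math import sqrt


def normal_approx_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Rule of thumb: n * p_hat and n * (1 - p_hat) are both at least 10."""
    # n * p_hat is just the success count; n * (1 - p_hat) is the failure count.
    failures = n - successes
    return successes >= threshold and failures >= threshold


def wilson_interval(successes: int, n: int, z_star: float = 1.96):
    """Wilson score interval, one alternative when the counts are small."""
    p_hat = successes / n
    denom = 1 + z_star**2 / n
    center = (p_hat + z_star**2 / (2 * n)) / denom
    half_width = (
        z_star * sqrt(p_hat * (1 - p_hat) / n + z_star**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


print(normal_approx_ok(92, 400))  # 92 successes and 308 failures
print(normal_approx_ok(3, 40))    # only 3 successes
print(wilson_interval(3, 40))
```

With 3 successes out of 40, the standard formula would give a lower bound slightly below zero, which can't be a proportion. The Wilson interval stays inside 0 to 1, which is one reason it is often suggested for small counts.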

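Confidence intervals and hypothesis testing are closely connected. If a hypothesized value falls outside your confidence interval, you'd reject it in a two-sided test at the corresponding significance level (for example, $\alpha = 0.05$ for a 95% interval). For proportions the match is approximate rather than exact, because the test computes its standard error from the hypothesized value instead of from $\hat{p}$. Even so, they're two sides of the same coin.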
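Here is a small sketch of that connection, using the 92-out-of-400 sample from the example and a one-proportion z-test. The hypothesized values 0.20 and 0.30 are made up for illustration, and `two_sided_p_value` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

SUCCESSES, N = 92, 400      # sample from the example in the text
Z_STAR, ALPHA = 1.96, 0.05  # a 95% interval pairs with a 5% two-sided test

p_hat = SUCCESSES / N
margin = Z_STAR * sqrt(p_hat * (1 - p_hat) / N)
lower, upper = p_hat - margin, p_hat + margin


def two_sided_p_value(p0: float) -> float:
    """One-proportion z-test; the standard error uses the hypothesized p0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / N)
    return 2 * (1 - NormalDist().cdf(abs(z)))


for p0 in (0.20, 0.30):  # hypothetical values to test
    inside = lower <= p0 <= upper
    reject = two_sided_p_value(p0) < ALPHA
    print(f"p0 = {p0:.2f}: inside interval = {inside}, reject = {reject}")
```

For these two values the interval and the test agree: 0.20 lies inside the interval and is not rejected, while 0.30 lies outside and is rejected. Values sitting right at the edge of the interval can occasionally disagree, since the test and the interval use slightly different standard errors.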