Comparing Two Independent Population Proportions
This technique tests whether two groups differ in the proportion of "successes" they produce. You'll use it whenever you're comparing rates or percentages across two independent samples, such as whether a new drug has a higher cure rate than a placebo, or whether two factories produce different rates of defective parts.
The workflow follows the same hypothesis testing framework you already know: set up hypotheses, check conditions, compute a test statistic, find a p-value, and make a decision. The new piece here is the pooled proportion, which combines both samples under the assumption that the null hypothesis is true.

Hypothesis Tests for Population Proportions
Setting up the hypotheses:
The null hypothesis always assumes no difference between the two population proportions:
H₀: p₁ = p₂
Your alternative hypothesis depends on the research question:
- Hₐ: p₁ ≠ p₂ (two-tailed) — you're testing for any difference, higher or lower
- Hₐ: p₁ < p₂ (left-tailed) — you suspect proportion 1 is lower
- Hₐ: p₁ > p₂ (right-tailed) — you suspect proportion 1 is higher
Choosing a significance level:
Set α before you collect data or run the test. Common choices are 0.01, 0.05, and 0.10. Remember, α is the probability of committing a Type I error (rejecting H₀ when it's actually true). A smaller α means you need stronger evidence to reject.
Checking conditions:
Before running the test, verify these assumptions so the normal approximation holds:
- The two samples are independent — drawn from separate populations with no overlap or influence on each other.
- The sample sizes are large enough. Check all four of these:
- n₁p_c ≥ 5 and n₁(1 − p_c) ≥ 5
- n₂p_c ≥ 5 and n₂(1 − p_c) ≥ 5, where p_c is the pooled proportion defined in the next section. This ensures at least 5 expected successes and 5 expected failures in each sample.
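A quick sketch of the four condition checks (variable names are illustrative; p_c is the pooled proportion defined in the next section):

```python
x1, n1 = 30, 100  # sample 1: successes, sample size (illustrative values)
x2, n2 = 40, 120  # sample 2

p_c = (x1 + x2) / (n1 + n2)  # pooled proportion under H0

# Expected successes and failures in each sample must all be at least 5.
checks = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
print(all(count >= 5 for count in checks))  # True: normal approximation OK
```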
Making the decision:
- If p-value < α (or the test statistic falls beyond the critical value), reject H₀.
- If p-value ≥ α (or the test statistic does not reach the critical value), fail to reject H₀.

Pooled Proportion and Test Statistic
Here's the step-by-step calculation process:
Step 1: Compute the pooled proportion.
The pooled proportion estimates the common proportion under H₀ by combining both samples:
p_c = (x₁ + x₂) / (n₁ + n₂)
where x₁ and x₂ are the numbers of successes, and n₁ and n₂ are the sample sizes.
Example: Sample 1 has 30 successes out of 100 observations. Sample 2 has 40 successes out of 120. Then p_c = (30 + 40) / (100 + 120) = 70/220 ≈ 0.318.
You use the pooled proportion (rather than the individual sample proportions) in the standard error because you're assuming H₀ is true. Under that assumption, both samples come from populations with the same proportion, so pooling gives the best single estimate.
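In code, Step 1 is a one-liner (illustrative values from the example):

```python
x1, n1 = 30, 100  # sample 1: successes, sample size
x2, n2 = 40, 120  # sample 2

p_c = (x1 + x2) / (n1 + n2)  # pooled proportion: 70/220
print(round(p_c, 3))  # 0.318
```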
Step 2: Compute the standard error.
The standard error measures how much variability you'd expect in p̂₁ − p̂₂ from sample to sample:
SE = √[p_c(1 − p_c)(1/n₁ + 1/n₂)]
Larger sample sizes shrink the SE, giving you more precision and more power to detect real differences.
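Continuing the example, the standard error works out as follows (a sketch; numbers carried over from Step 1):

```python
import math

n1, n2 = 100, 120
p_c = 70 / 220  # pooled proportion from Step 1

se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
print(round(se, 3))  # about 0.063
```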
Step 3: Compute the test statistic.
The z-score tells you how many standard errors the observed difference sits from zero (the value H₀ predicts):
z = (p̂₁ − p̂₂) / √[p_c(1 − p_c)(1/n₁ + 1/n₂)]
Example: If p̂₁ = 0.30, p̂₂ ≈ 0.333, and SE ≈ 0.063, then z ≈ (0.30 − 0.333) / 0.063 ≈ −0.53.
Under H₀, this test statistic follows a standard normal distribution, N(0, 1).
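Putting Steps 1-3 together (a sketch, using the example's numbers):

```python
import math

x1, n1 = 30, 100  # sample 1: successes, sample size
x2, n2 = 40, 120  # sample 2

p1_hat, p2_hat = x1 / n1, x2 / n2
p_c = (x1 + x2) / (n1 + n2)                          # pooled proportion
se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))  # pooled standard error

z = (p1_hat - p2_hat) / se
print(round(z, 2))  # about -0.53
```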
Step 4: Find the p-value.
Use the standard normal distribution:
- Two-tailed: p-value = 2·P(Z ≥ |z|)
- Left-tailed: p-value = P(Z ≤ z)
- Right-tailed: p-value = P(Z ≥ z)
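All three p-value calculations can be sketched with Python's standard-library normal distribution (z carried over from the running example):

```python
from statistics import NormalDist

z = -0.53          # test statistic from the running example
Z = NormalDist()   # standard normal, mean 0 and standard deviation 1

p_two = 2 * (1 - Z.cdf(abs(z)))   # two-tailed
p_left = Z.cdf(z)                 # left-tailed
p_right = 1 - Z.cdf(z)            # right-tailed
print(round(p_two, 3), round(p_left, 3), round(p_right, 3))
```

A two-tailed p-value near 0.6 is far above any common α, so this example would fail to reject H₀.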

Interpreting Two-Proportion z-Tests
When you reject :
If the p-value is less than α, you conclude there is sufficient evidence of a significant difference between p₁ and p₂, in the direction your Hₐ specifies.
Example: Testing Hₐ: p₁ > p₂ at α = 0.05. You get a p-value of 0.02. Since 0.02 < 0.05, reject H₀ and conclude there is sufficient evidence that p₁ is greater than p₂.
When you fail to reject :
If the p-value is greater than or equal to α, you conclude there is not enough evidence to support a significant difference.
Example: Testing Hₐ: p₁ ≠ p₂ at α = 0.05. You get a p-value of 0.07. Since 0.07 ≥ 0.05, fail to reject H₀. There is not sufficient evidence of a difference between p₁ and p₂.
A critical distinction: failing to reject H₀ does not prove that p₁ = p₂. It only means your data weren't convincing enough to rule out equality.
Contextual interpretation matters. Always tie your conclusion back to the real-world scenario. For instance, if you're comparing defective product rates at two plants and find a significant difference, that signals a need to investigate quality control at the worse-performing plant.
Also consider:
- Practical significance vs. statistical significance. A statistically significant difference might be very small in absolute terms. Evaluate the effect size (the actual difference p̂₁ − p̂₂) to judge whether the difference matters in practice.
- Study limitations. Non-random sampling, differences in data collection between groups, or confounding variables can all undermine the reliability of your results.
Additional Considerations
- You can construct a confidence interval for p₁ − p₂ to estimate the range of plausible values for the true difference. Note: the CI formula uses the individual sample proportions in the standard error, not the pooled proportion (since you're no longer assuming H₀ is true).
- A power analysis before collecting data helps you determine the sample size needed to detect a meaningful difference at your chosen α.
- The chi-square test for independence is an equivalent alternative when comparing two proportions with a two-sided alternative. For a 2×2 contingency table, the chi-square statistic equals z² from the two-proportion z-test, and the results will match.
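The first and last of these points can be sketched in code, using the running example's numbers (variable names are illustrative): the confidence interval uses the unpooled standard error, and the Pearson chi-square statistic for the 2×2 table reproduces z².

```python
import math
from statistics import NormalDist

x1, n1 = 30, 100  # sample 1: successes, sample size
x2, n2 = 40, 120  # sample 2

p1, p2 = x1 / n1, x2 / n2

# 95% CI for p1 - p2: note the UNpooled standard error.
z_crit = NormalDist().inv_cdf(0.975)
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo = (p1 - p2) - z_crit * se_unpooled
hi = (p1 - p2) + z_crit * se_unpooled

# Two-proportion z statistic (pooled SE) vs. Pearson chi-square.
p_c = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled

table = [[x1, n1 - x1], [x2, n2 - x2]]
total = n1 + n2
rows = [sum(r) for r in table]
cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
chi2 = sum((table[i][j] - rows[i] * cols[j] / total) ** 2
           / (rows[i] * cols[j] / total)
           for i in range(2) for j in range(2))

print((round(lo, 3), round(hi, 3)))         # CI straddles 0, consistent with the test
print(round(z ** 2, 6) == round(chi2, 6))   # chi-square equals z squared
```

Because the interval contains 0, it agrees with the z-test's failure to reject H₀ at α = 0.05.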