Comparing Two Independent Population Proportions
When you want to know whether two groups genuinely differ in some proportion (like the percentage of customers who click an ad on two different websites), you need a formal way to test that. A two-proportion z-test lets you determine whether the difference you observe in your samples reflects a real difference in the populations, or whether it could just be random variation.
Hypothesis Tests for Population Proportions
Setting Up Hypotheses
The null hypothesis always assumes the two population proportions are equal:
$H_0: p_1 = p_2$
Your alternative hypothesis depends on the research question:
- Two-tailed: $H_a: p_1 \ne p_2$ (you're just checking if they differ in either direction)
- Left-tailed: $H_a: p_1 < p_2$ (you suspect group 1's proportion is lower)
- Right-tailed: $H_a: p_1 > p_2$ (you suspect group 1's proportion is higher)
Checking Conditions for Inference
Before running the test, you need to verify three conditions. If these aren't met, the test results can't be trusted.
- Random sampling: Both samples must be selected randomly and independently of each other. This prevents bias from creeping in.
- Independence (10% condition): Each sample size should be no more than 10% of its population. Formally: $n_1 \le 0.10N_1$ and $n_2 \le 0.10N_2$. This ensures that sampling without replacement doesn't distort your results.
- Success-failure condition (Large Counts): You need at least 10 expected successes and 10 expected failures in each sample so the normal approximation works. When checking this condition for a hypothesis test, use the pooled proportion $\hat{p}$ (described below) rather than the individual sample proportions, since you're assuming $H_0$ is true:
  - $n_1\hat{p} \ge 10$ and $n_1(1 - \hat{p}) \ge 10$
  - $n_2\hat{p} \ge 10$ and $n_2(1 - \hat{p}) \ge 10$
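As a quick sanity check, the pooled success-failure condition can be verified in a few lines of Python. This is a sketch: the function name `check_large_counts` is my own, and the counts 120/400 and 90/350 come from the worked example later in this section.

```python
def check_large_counts(x1, n1, x2, n2, threshold=10):
    """Check the pooled success-failure (Large Counts) condition."""
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion, assuming H0 is true
    counts = [n1 * p_pool, n1 * (1 - p_pool),
              n2 * p_pool, n2 * (1 - p_pool)]
    return all(c >= threshold for c in counts), counts

ok, counts = check_large_counts(120, 400, 90, 350)
print(ok)                              # True: all four expected counts reach 10
print([round(c, 1) for c in counts])   # [112.0, 288.0, 98.0, 252.0]
```

All four expected counts comfortably exceed 10 here, so the normal approximation is reasonable.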

Choosing a Significance Level
Pick your significance level before collecting data. Common choices:
- $\alpha = 0.05$ (most common in intro courses)
- $\alpha = 0.01$ (more conservative, harder to reject $H_0$)
- $\alpha = 0.10$ (more liberal, easier to reject $H_0$)
The significance level $\alpha$ is the probability of committing a Type I error, which means rejecting $H_0$ when it's actually true.
Pooled Proportion and Test Statistic
Here's the step-by-step calculation process:
Step 1: Calculate each sample proportion.
$\hat{p}_1 = \dfrac{x_1}{n_1}$ and $\hat{p}_2 = \dfrac{x_2}{n_2}$
where $x_1$ and $x_2$ are the numbers of successes, and $n_1$ and $n_2$ are the sample sizes.
Step 2: Calculate the pooled proportion.
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
You pool because under $H_0$ you're assuming both groups share the same true proportion. Combining the samples gives you the best single estimate of that shared proportion.
Step 3: Calculate the standard error.
$SE = \sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
This measures how much variability you'd expect in the difference $\hat{p}_1 - \hat{p}_2$ from sample to sample.
Step 4: Calculate the z-test statistic.
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{SE}$
The numerator subtracts 0 because $H_0$ claims $p_1 - p_2 = 0$. This test statistic tells you how many standard errors the observed difference is from zero.
This is a z-test, not a t-test. There are no degrees of freedom here. You compare your z-statistic to the standard normal distribution. The degrees of freedom formula applies to two-sample t-tests for means, not proportions.
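The four steps above can be collected into one small function. This is a sketch in plain Python; the name `two_prop_z` is my own, and the sample counts are those of the worked example below.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test statistic using the pooled standard error."""
    p1_hat = x1 / n1                    # Step 1: sample proportions
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)      # Step 2: pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # Step 3
    return (p1_hat - p2_hat) / se       # Step 4: z-statistic

z = two_prop_z(120, 400, 90, 350)
print(round(z, 2))  # 1.3
```

Keeping all intermediate values at full precision, as the function does, avoids the small rounding drift you get when computing each step by hand.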
Worked Example
Suppose 120 out of 400 users clicked an ad on Website A, and 90 out of 350 clicked on Website B. Test whether the click rates differ at $\alpha = 0.05$.
- $\hat{p}_1 = 120/400 = 0.30$, $\hat{p}_2 = 90/350 \approx 0.257$
- $\hat{p} = \dfrac{120 + 90}{400 + 350} = \dfrac{210}{750} = 0.28$
- $SE = \sqrt{0.28 \times 0.72 \left(\dfrac{1}{400} + \dfrac{1}{350}\right)} \approx 0.0329$
- $z = \dfrac{0.30 - 0.257}{0.0329} \approx 1.30$
With a two-tailed test at $\alpha = 0.05$, the critical values are $\pm 1.96$. Since 1.30 falls between them, you fail to reject $H_0$. There isn't sufficient evidence that the click rates differ.

Interpreting Two-Proportion Test Results
Making the Decision
You have two equivalent approaches:
- Critical value method: Reject $H_0$ if the z-statistic falls in the rejection region (beyond the critical value). Otherwise, fail to reject.
- P-value method: Reject $H_0$ if the p-value is less than $\alpha$. Otherwise, fail to reject.
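For the p-value method, the standard normal CDF can be built from `math.erf` in the standard library, so no external package is needed. A sketch, applied to the z-statistic from the worked example:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_tailed_p(z):
    """Two-tailed p-value: probability of a result at least this extreme."""
    return 2 * (1 - normal_cdf(abs(z)))

p = two_tailed_p(1.30)
print(round(p, 3))  # 0.194, which exceeds 0.05, so fail to reject H0
```

The two methods always agree: the p-value exceeds $\alpha$ exactly when the z-statistic lands inside the critical values.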
Writing Your Conclusion
Always state your conclusion in context. Don't just say "reject" or "fail to reject."
- If you reject $H_0$: "There is sufficient evidence at the $\alpha$ level to conclude that the proportion of users who click the ad differs between Website A and Website B."
- If you fail to reject $H_0$: "There is not sufficient evidence at the $\alpha$ level to conclude that the proportion of users who click the ad differs between Website A and Website B."
Notice: failing to reject $H_0$ does not mean the proportions are equal. It means you don't have enough evidence to say they're different.
Statistical vs. Practical Significance
A result can be statistically significant but not practically meaningful. For example, if a drug reduces infection rates from 5.0% to 4.8%, that difference might be statistically significant with a large enough sample, but a 0.2 percentage point improvement may not justify the cost of switching treatments.
Always consider the magnitude of the difference alongside the p-value. Statistical significance tells you the difference is probably real; practical significance tells you whether it matters.
Additional Considerations
- Confidence intervals complement hypothesis tests by giving a range of plausible values for the true difference $p_1 - p_2$. If the interval doesn't contain 0, that's consistent with rejecting $H_0$.
- A Type II error occurs when you fail to reject $H_0$ even though the proportions truly are different. Its probability is denoted $\beta$.
- Power ($1 - \beta$) is the probability of correctly rejecting a false $H_0$. Larger sample sizes and larger true differences both increase power.
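The confidence-interval complement can be sketched as well. One detail worth noting: a confidence interval for the difference uses the unpooled standard error, because no equal-proportions assumption is being made. The function name below is my own, and 1.96 is the critical value for 95% confidence.

```python
import math

def two_prop_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled SE: each group contributes its own variance estimate
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

lo, hi = two_prop_ci(120, 400, 90, 350)
print(round(lo, 3), round(hi, 3))  # interval contains 0, consistent with failing to reject H0
```

For the worked example the interval is roughly $(-0.021, 0.107)$, which straddles 0, matching the hypothesis-test conclusion.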