Fiveable

📊Honors Statistics Unit 10 Review


10.5 Hypothesis Testing for Two Means and Two Proportions


Written by the Fiveable Content Team • Last updated August 2025

Comparing Two Population Means and Proportions


Two-sample tests for means vs. proportions

Two-sample hypothesis tests let you compare two independent groups and decide whether the difference you observe is statistically significant or just due to random sampling variability.

Two-sample t-test (for means) compares the means of two independent populations (e.g., average heights of men vs. women).

Assumptions:

  • Both populations are approximately normally distributed (or sample sizes are large enough for the CLT to kick in)
  • The two samples are independent of each other

The test statistic is:

t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

where:

  • \bar{x}_1, \bar{x}_2 = sample means
  • \mu_1, \mu_2 = hypothesized population means (under H_0, the difference \mu_1 - \mu_2 is usually 0)
  • s_1^2, s_2^2 = sample variances
  • n_1, n_2 = sample sizes

Degrees of freedom can be calculated with the Welch-Satterthwaite formula (your calculator handles this), or you can use the conservative approach: df = \min(n_1, n_2) - 1.
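The two-sample t statistic and the Welch-Satterthwaite degrees of freedom can both be computed directly from summary statistics. Here is a minimal sketch in Python using only the standard library; the height numbers are hypothetical, chosen just to illustrate the arithmetic:

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2, hyp_diff=0.0):
    """Welch two-sample t statistic and df from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2            # each group's variance of the mean
    se = math.sqrt(v1 + v2)                     # standard error of the difference
    t = (mean1 - mean2 - hyp_diff) / se
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical example: heights (cm) of two samples of 30 people each
t, df = two_sample_t(175.0, 7.0, 30, 168.0, 6.0, 30)
# t is about 4.16; Welch df is about 56.7, versus the
# conservative df = min(30, 30) - 1 = 29
```

Notice that the Welch df (≈ 56.7 here) is larger than the conservative min(n₁, n₂) − 1 = 29, which is why the conservative rule gives a wider, safer interval when you work by hand.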

Two-sample z-test (for proportions) compares proportions from two independent populations (e.g., defect rates at two factories).

Conditions: both samples need to be large enough that n\hat{p} \geq 5 and n(1-\hat{p}) \geq 5 for each group.

The test statistic is:

z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

where:

  • \hat{p}_1, \hat{p}_2 = sample proportions
  • p_1, p_2 = hypothesized population proportions (under H_0, typically p_1 - p_2 = 0)
  • \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = pooled sample proportion (the overall success rate across both samples combined)
  • x_1, x_2 = number of successes in each sample

You use the pooled proportion because the null hypothesis assumes the two populations have the same proportion, so your best estimate of that shared proportion comes from combining both samples.

Both tests can be one-tailed or two-tailed depending on the research question.
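The pooled z-test above can be sketched in a few lines of standard-library Python. The defect counts here are hypothetical, and the two-tailed p-value comes from the standard normal CDF written in terms of math.erf:

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion z-test, pooling under H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)              # overall success rate, combined
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical example: 40/400 defects at factory A vs 24/400 at factory B
z, p = two_prop_ztest(40, 400, 24, 400)
# z is about 2.09, two-tailed p is about 0.037 -> significant at alpha = 0.05
```

Because the p-value (≈ 0.037) falls below 0.05, this sample would lead you to reject H₀ and conclude the factories' defect rates differ.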


Confidence intervals for population differences

Confidence intervals give you a range of plausible values for the true difference between two populations. They complement hypothesis tests by showing the size of the difference, not just whether one exists.

CI for the difference between two means:

(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

  • t_{\alpha/2} is the critical value from the t-distribution at your chosen confidence level
  • Degrees of freedom follow the same rule as the t-test above
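A quick sketch of this interval, reusing the hypothetical height data from earlier. Since the standard library has no t-distribution, the critical value t_{0.025, 29} ≈ 2.045 (conservative df = 29, 95% confidence) is supplied from a t table:

```python
import math

def mean_diff_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2; t_crit comes from a t table for your chosen df."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)    # unpooled standard error
    diff = mean1 - mean2
    margin = t_crit * se
    return diff - margin, diff + margin

# Hypothetical heights again: t_{0.025, 29} is about 2.045 from a t table
lo, hi = mean_diff_ci(175.0, 7.0, 30, 168.0, 6.0, 30, t_crit=2.045)
# roughly (3.56 cm, 10.44 cm) -> the interval excludes 0
```

Because the whole interval sits above 0, it tells the same story as the earlier t-test: a significant difference, and one that is plausibly between about 3.6 and 10.4 cm.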

CI for the difference between two proportions:

(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

  • z_{\alpha/2} is the critical value from the standard normal distribution (e.g., 1.96 for 95% confidence)
  • Notice that the CI formula does not use the pooled proportion. That's because you're no longer assuming the two proportions are equal (that was a null hypothesis assumption for the test).
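Here is the same idea as a sketch, using the hypothetical factory defect counts from the z-test example. Note the unpooled standard error, in contrast to the test statistic:

```python
import math

def prop_diff_ci(x1, n1, x2, n2, z_crit=1.96):
    """95% CI (by default) for p1 - p2, using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Each proportion contributes its own variance term -- no pooling here,
    # because the CI does not assume p1 = p2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical example: 40/400 defects vs 24/400 defects
lo, hi = prop_diff_ci(40, 400, 24, 400)
# roughly (0.003, 0.077) -> excludes 0, matching the significant z-test
```

The interval excludes 0, consistent with the significant two-proportion z-test on the same numbers.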

Additional considerations

  • Paired samples: When data points are naturally linked (e.g., before/after measurements on the same subjects), use a paired t-test instead. You compute the differences within each pair and run a one-sample t-test on those differences.
  • Effect size: Measures the magnitude of the difference between groups. A result can be statistically significant but have a tiny effect size, meaning it may not matter in practice.
  • Power analysis: Determines the sample size needed to detect a specific effect size at a given significance level. Larger samples give you more power to detect real differences.
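The paired procedure from the first bullet reduces to a one-sample t-test on the within-pair differences. A minimal sketch with hypothetical before/after scores for five subjects:

```python
import math
import statistics

def paired_t(before, after):
    """Paired t-test: one-sample t statistic on within-pair differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    sd = statistics.stdev(diffs)               # sample std dev of the differences
    t = d_bar / (sd / math.sqrt(n))
    return t, n - 1                            # t statistic and df = n - 1

# Hypothetical before/after test scores for 5 subjects
before = [70, 68, 75, 66, 72]
after  = [72, 71, 76, 70, 74]
t, df = paired_t(before, after)
# t is about 4.71 with df = 4
```

Treating the same data as two independent samples would waste the pairing: differencing removes subject-to-subject variability, which usually gives the paired test more power.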

Interpreting Results and Drawing Conclusions

Interpretation of statistical results

P-value interpretation tells you how strong the evidence is against the null hypothesis.

  • A small p-value (typically < 0.05) means the observed difference would be unlikely if H_0 were true. You reject H_0 and conclude there's a significant difference. Example: p = 0.003 for a height comparison suggests men and women really do differ in average height.
  • A large p-value (≥ 0.05) means you don't have enough evidence to reject H_0. This does not prove the populations are the same; it just means you can't rule out that the difference is due to chance.

Confidence interval interpretation offers another way to assess significance.

  • If the CI for the difference does not contain 0, the difference is significant at that confidence level. Example: a 95% CI of (2.1 cm, 5.8 cm) for the difference in mean heights doesn't include 0, so the difference is significant.
  • If the CI does contain 0, you cannot conclude there's a significant difference.

The p-value approach and the confidence interval approach agree at matching significance/confidence levels: if your 95% CI excludes 0, your two-tailed test at \alpha = 0.05 will reject H_0, and vice versa. (For two proportions the match can technically break down in borderline cases, because the test uses the pooled standard error while the CI uses the unpooled one, but in practice the two conclusions almost always line up.)
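This agreement can be checked directly. The sketch below reuses the hypothetical factory defect counts, computing both the pooled-SE test statistic and the unpooled-SE 95% CI, and compares the two decisions:

```python
import math

# Hypothetical data used throughout: 40/400 defects vs 24/400 defects
x1, n1, x2, n2 = 40, 400, 24, 400
p1, p2 = x1 / n1, x2 / n2

# Test statistic with the pooled proportion (null assumes p1 = p2)
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# 95% CI with the unpooled standard error
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = (p1 - p2) - 1.96 * se, (p1 - p2) + 1.96 * se

rejects_by_test = abs(z) > 1.96        # two-tailed test at alpha = 0.05
rejects_by_ci = lo > 0 or hi < 0       # CI excludes 0
# For these numbers both are True: the two approaches agree
```

Here |z| ≈ 2.09 exceeds 1.96 and the CI (≈ 0.003 to 0.077) excludes 0, so both routes reject H₀.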

Statistical significance vs. practical significance is a distinction you should always make when drawing conclusions.

  • Statistical significance tells you the difference is unlikely to be from chance alone.
  • Practical significance asks whether the difference actually matters in context. A study with thousands of participants might find a statistically significant difference of 0.3 cm in average height. That's real, but probably irrelevant for any practical purpose.

When writing your conclusion, tie it back to the original research question and note any limitations: Were the samples truly random? Could confounding variables explain the result? Is the sample representative of the population you care about?