When you want to know whether two groups truly differ on some measurement, you need more than just eyeballing the sample averages. The two-sample z-test gives you a formal way to decide whether an observed difference between two group means is statistically significant or likely just sampling variability. This test applies specifically when the population standard deviations are already known, which is uncommon in practice but forms the foundation for understanding more general two-sample procedures.

Test Statistic for Two Population Means

The z-test statistic measures how far the observed difference between sample means falls from the hypothesized difference, scaled by the standard error. Here's the formula:

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Each component:

$\bar{x}_1$ and $\bar{x}_2$ : the sample means from group 1 and group 2
$\mu_1 - \mu_2$ : the difference between the population means hypothesized under $H_0$ (often 0, meaning "no difference")
$\sigma_1$ and $\sigma_2$ : the known population standard deviations for each group
$n_1$ and $n_2$ : the sample sizes drawn from each population

The numerator captures how far the observed difference is from what you'd expect under $H_0$. The denominator is the standard error of the difference, which accounts for variability in both groups and both sample sizes. A larger $|z|$ means the observed difference is harder to explain by chance alone.
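
The formula translates almost directly into code. The following is a minimal sketch using only Python's standard library; the function name `two_sample_z`, its parameter names, and the input numbers are illustrative assumptions, not part of any particular package or data set.

```python
import math

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0 when both population SDs are known."""
    # Denominator: standard error of the difference in sample means
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Numerator: observed difference minus the hypothesized difference
    return ((xbar1 - xbar2) - d0) / standard_error

# Made-up numbers chosen so the arithmetic is easy to check by hand:
# SE = sqrt(100/50 + 100/50) = 2, so z = (102 - 100) / 2 = 1.0
z = two_sample_z(xbar1=102.0, xbar2=100.0, sigma1=10.0, sigma2=10.0, n1=50, n2=50)
print(z)
```

Keeping `d0` as a parameter with a default of 0 covers the common "no difference" null hypothesis while still allowing a nonzero hypothesized difference.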

Sampling Distribution of the Difference in Means

The quantity $\bar{x}_1 - \bar{x}_2$ has its own sampling distribution. If both populations are normal, this distribution is exactly normal, whatever the sample sizes. If the populations aren't normal, the Central Limit Theorem still makes the distribution approximately normal as long as both sample sizes are large (a common rule of thumb is $n_1 \geq 30$ and $n_2 \geq 30$).

Two key properties of this distribution:

Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Standard error: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

These hold as long as the two samples are independent, meaning the selection of individuals in one sample doesn't influence the other.
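
One way to make these two properties concrete is a small simulation. The sketch below assumes two normal populations with made-up parameters, repeatedly draws independent samples, and compares the mean and spread of the simulated differences with the theoretical values. The helper `sample_mean`, the seed, and the number of repetitions are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population parameters, chosen only for illustration
mu1, sigma1, n1 = 520.0, 40.0, 50
mu2, sigma2, n2 = 505.0, 35.0, 45

def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n from a normal population."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

# Each repetition draws two independent samples and records xbar1 - xbar2
diffs = [
    sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2)
    for _ in range(5000)
]

theoretical_se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
simulated_mean = statistics.fmean(diffs)
simulated_se = statistics.stdev(diffs)

print(f"mean of differences: {simulated_mean:.2f} (theory: {mu1 - mu2:.2f})")
print(f"SD of differences:   {simulated_se:.2f} (theory: {theoretical_se:.2f})")
```

The simulated values should land close to, but not exactly on, the theoretical ones, since the simulation itself is subject to sampling variability.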

Conducting the Hypothesis Test

Step 1: State the hypotheses.

Null hypothesis: $H_0: \mu_1 - \mu_2 = d_0$ (where $d_0$ is the hypothesized difference, usually 0)
Alternative hypothesis (pick one based on the research question):
- Two-tailed: $H_a: \mu_1 - \mu_2 \neq d_0$
- Left-tailed: $H_a: \mu_1 - \mu_2 < d_0$
- Right-tailed: $H_a: \mu_1 - \mu_2 > d_0$

Step 2: Choose a significance level $\alpha$ (commonly 0.05 or 0.01).

Step 3: Calculate the test statistic using the z-formula above.

Step 4: Find the critical value(s) or p-value.

Test type	Critical value(s)	P-value calculation
Two-tailed	$\pm z_{\alpha/2}$	$2P(Z > |z|)$
Left-tailed	$-z_{\alpha}$	$P(Z < z)$
Right-tailed	$z_{\alpha}$	$P(Z > z)$
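
The table can be sketched in code with `statistics.NormalDist` from Python's standard library, which provides the standard normal `cdf` and `inv_cdf`. The function names `critical_values` and `p_value` and the `tail` labels (`"two"`, `"left"`, `"right"`) are assumptions made for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def critical_values(alpha, tail):
    """Critical value(s) for the given significance level and test type."""
    if tail == "two":
        z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-z_crit, z_crit)
    if tail == "left":
        return (STD_NORMAL.inv_cdf(alpha),)      # equals -z_alpha
    if tail == "right":
        return (STD_NORMAL.inv_cdf(1 - alpha),)  # equals z_alpha
    raise ValueError("tail must be 'two', 'left', or 'right'")

def p_value(z, tail):
    """P-value for an observed z statistic and test type."""
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'left', or 'right'")

print(critical_values(0.05, "two"))
print(p_value(1.95, "two"))
```

For $\alpha = 0.05$ the two-tailed critical values come out near $\pm 1.96$, matching the usual table lookup.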

Step 5: Make your decision and interpret.

Critical value approach: Reject $H_0$ if the test statistic falls in the rejection region (beyond the critical value). Otherwise, fail to reject $H_0$.
P-value approach: Reject $H_0$ if the p-value is less than $\alpha$. Otherwise, fail to reject $H_0$.

Always state your conclusion in context. For example: "At the 0.05 significance level, there is sufficient evidence to conclude that the mean test score for School A differs from that of School B."

Worked Example

Suppose you're comparing average SAT math scores between two school districts. You know $\sigma_1 = 40$ and $\sigma_2 = 35$. You collect samples of $n_1 = 50$ and $n_2 = 45$ students, finding $\bar{x}_1 = 520$ and $\bar{x}_2 = 505$. Test whether the means differ at $\alpha = 0.05$.

Step 1: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 \neq 0$

Step 2: $\alpha = 0.05$

Step 3: Calculate the test statistic:

$z = \frac{(520 - 505) - 0}{\sqrt{\frac{40^2}{50} + \frac{35^2}{45}}} = \frac{15}{\sqrt{\frac{1600}{50} + \frac{1225}{45}}} = \frac{15}{\sqrt{32 + 27.22}} = \frac{15}{\sqrt{59.22}} = \frac{15}{7.696} \approx 1.95$

Step 4: For a two-tailed test at $\alpha = 0.05$, the critical values are $\pm 1.96$. The p-value is $2P(Z > 1.95) \approx 2(0.0256) = 0.0512$.

Step 5: Since $|z| = 1.95 < 1.96$ (or equivalently, $p\text{-value} = 0.0512 > 0.05$), you fail to reject $H_0$. There is not sufficient evidence at the 0.05 level to conclude the mean SAT math scores differ between the two districts.

Notice how close this was to the boundary. A slightly larger sample or a slightly bigger difference would have tipped the result the other way.
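
The same example can be checked numerically. This is a sketch using only the standard library, with variable names chosen for this illustration. Because the code carries the unrounded $z$ through to the p-value, its result may differ from the hand calculation in the fourth decimal place.

```python
import math
from statistics import NormalDist

# Values from the SAT example
xbar1, xbar2 = 520, 505
sigma1, sigma2 = 40, 35
n1, n2 = 50, 45
alpha = 0.05
d0 = 0  # hypothesized difference under H0

std_normal = NormalDist()

# Test statistic
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((xbar1 - xbar2) - d0) / se

# Two-tailed critical value and p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)
p = 2 * (1 - std_normal.cdf(abs(z)))

# Both decision rules; they should always agree
reject_by_critical_value = abs(z) > z_crit
reject_by_p_value = p < alpha

print(f"SE = {se:.3f}, z = {z:.3f}, critical value = {z_crit:.3f}, p = {p:.4f}")
print("Reject H0:", reject_by_critical_value, reject_by_p_value)
```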

Additional Considerations

Confidence intervals offer a complementary perspective. A $(1 - \alpha) \times 100\%$ confidence interval for $\mu_1 - \mu_2$ is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

If this interval contains 0 (when testing $d_0 = 0$), a two-tailed test at the same $\alpha$ fails to reject $H_0$; if the interval excludes 0, the test rejects $H_0$. The interval and the two-tailed test always agree.
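
As a sketch of the interval calculation, the snippet below applies the formula to the SAT example's numbers. The function name `z_confidence_interval` is an assumption for illustration; `NormalDist.inv_cdf` supplies $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05):
    """(1 - alpha) confidence interval for mu1 - mu2 with known population SDs."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# SAT example: 95% interval for the difference in district means
low, high = z_confidence_interval(520, 505, 40, 35, 50, 45, alpha=0.05)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print("Contains 0:", low <= 0 <= high)
```

The lower endpoint falls just below 0, which mirrors how narrowly the two-tailed test failed to reject $H_0$ in the worked example.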

Why z and not t? This procedure uses the z-distribution because the population standard deviations are known. When $\sigma_1$ and $\sigma_2$ are unknown and estimated from the samples, you switch to a t-test, which accounts for the extra uncertainty in estimating those parameters. Degrees of freedom become relevant in that t-test setting, not here.