Comparing Two Population Means with Known Standard Deviations
When you want to test whether two groups truly differ on some measured outcome, and you happen to know the population standard deviations for both groups, you can use a two-sample $z$-test. This situation is uncommon in practice (you rarely know $\sigma_1$ and $\sigma_2$), but it builds the foundation for the more common $t$-test procedures you'll see next.
The core question: Is the difference we observe between two sample means large enough to conclude the populations themselves differ, or could it just be sampling variability?
Test Statistic for Two Population Means
The test statistic measures how far the observed difference between sample means falls from the hypothesized difference, scaled by the standard error:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Here's what each piece represents:
- $\bar{x}_1$ and $\bar{x}_2$: the sample means from group 1 and group 2
- $(\mu_1 - \mu_2)_0$: the hypothesized difference between population means under the null hypothesis (usually 0, meaning "no difference")
- $\sigma_1$ and $\sigma_2$: the known population standard deviations
- $n_1$ and $n_2$: the sample sizes for each group
Because the population standard deviations are known, this test statistic follows a standard normal distribution (the $z$-distribution) when the null hypothesis is true.
How to use the result: Compare your calculated $z$-value to the critical value set by your significance level. For a two-tailed test at $\alpha = 0.05$, the critical values are $\pm 1.96$. If $|z| > 1.96$, you reject the null hypothesis. You can also compare the $p$-value to $\alpha$ directly.
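A minimal sketch of this decision rule in Python; the helper name `two_sample_z` and the group summaries passed to it are assumptions for illustration, and the $p$-value comes from the standard normal CDF provided by `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, diff0=0.0):
    """z statistic and two-tailed p-value when both population sigmas are known."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of the difference
    z = ((xbar1 - xbar2) - diff0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p-value
    return z, p

# Made-up group summaries, just to exercise the decision rule
z, p = two_sample_z(105, 100, 15, 15, 40, 40)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at 0.05: {abs(z) > 1.96}")
```

Comparing `abs(z)` to 1.96 and comparing `p` to 0.05 always give the same decision for a two-tailed test; the $p$-value just carries more information about how borderline the result is.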
Quick example: Suppose you're comparing average test scores between two schools. School A has $\bar{x}_1 = 78$, $\sigma_1 = 10$, $n_1 = 50$. School B has $\bar{x}_2 = 75$, $\sigma_2 = 12$, $n_2 = 50$. Under $H_0$:

- $\mu_1 - \mu_2 = 0$

Calculate the numerator:

- $(\bar{x}_1 - \bar{x}_2) - 0 = (78 - 75) - 0 = 3$

Calculate the denominator:

- $\sqrt{\dfrac{10^2}{50} + \dfrac{12^2}{50}} = \sqrt{2 + 2.88} = \sqrt{4.88} \approx 2.209$

Compute $z$:

- $z = \dfrac{3}{2.209} \approx 1.358$

Since $|1.358| < 1.96$, you fail to reject $H_0$ at $\alpha = 0.05$: the data don't provide convincing evidence that the schools' mean scores differ.
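The arithmetic can be double-checked in a few lines, using illustrative school data (means 78 and 75, standard deviations 10 and 12, fifty students in each sample):

```python
from math import sqrt

# Illustrative data: School A (mean 78, sigma 10, n 50), School B (mean 75, sigma 12, n 50)
numerator = (78 - 75) - 0                      # observed minus hypothesized difference
denominator = sqrt(10**2 / 50 + 12**2 / 50)    # standard error of the difference
z = numerator / denominator
print(round(denominator, 3), round(z, 3))      # → 2.209 1.358
```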

Sampling Distribution of Mean Differences
The sampling distribution of $\bar{x}_1 - \bar{x}_2$ describes all possible differences between sample means you could get if you repeated the sampling process over and over. Three key properties:
- Center: The mean of this distribution equals the true difference between population means: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
- Spread: The standard deviation of this distribution (the standard error) is: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
- Shape: The distribution is approximately normal when both sample sizes are large (typically $n_1 \ge 30$ and $n_2 \ge 30$), or when both populations are themselves normally distributed
This all assumes the two samples are independent, meaning the observations in one sample don't influence or relate to the observations in the other.
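These three properties can be seen directly by simulation. A small sketch, with population parameters assumed purely for illustration: draw two independent samples many times, record the difference in sample means each time, and compare the empirical center and spread to the theory.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)

# Hypothetical populations (assumed for illustration)
mu1, sigma1, n1 = 100, 10, 40
mu2, sigma2, n2 = 94, 8, 30

# Repeat the two-sample experiment many times and record each mean difference
diffs = []
for _ in range(5000):
    xbar1 = mean(random.gauss(mu1, sigma1) for _ in range(n1))
    xbar2 = mean(random.gauss(mu2, sigma2) for _ in range(n2))
    diffs.append(xbar1 - xbar2)

print(f"center ~ {mean(diffs):.2f} (theory: {mu1 - mu2})")
print(f"spread ~ {stdev(diffs):.3f} (theory: {sqrt(sigma1**2/n1 + sigma2**2/n2):.3f})")
```

The simulated center should land very close to the true difference of 6, and the simulated spread very close to the theoretical standard error.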

Standard Error of the Difference
The standard error tells you how much the difference $\bar{x}_1 - \bar{x}_2$ tends to vary from sample to sample:

$$SE = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$$
This is the denominator of the $z$-test statistic, and it plays a critical role. A smaller standard error means your estimate of the difference is more precise, which makes it easier to detect a real difference if one exists. Two things shrink the standard error:
- Larger sample sizes ($n_1$ and $n_2$ in the denominator)
- Smaller population standard deviations ($\sigma_1$ and $\sigma_2$ in the numerator)
This is why researchers try to collect large samples: it tightens the standard error and gives the test more ability to pick up on real effects.
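A quick sketch of how sample size drives the standard error, using assumed standard deviations of 10 and 12 and equal group sizes; note that quadrupling both samples halves the standard error, because $n$ sits under a square root:

```python
from math import sqrt

def se_diff(sigma1, sigma2, n1, n2):
    """Standard error of the difference between two sample means."""
    return sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Each quadrupling of both sample sizes cuts the SE in half
for n in (25, 100, 400):
    print(n, round(se_diff(10, 12, n, n), 3))  # 3.124, then 1.562, then 0.781
```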
Statistical Considerations
- Pooled standard deviation: When you have reason to believe both populations share the same variance ($\sigma_1^2 = \sigma_2^2 = \sigma^2$), you can pool them into a single estimate. In the known-$\sigma$ case this simplifies the standard error to $\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, but it requires that equal-variance assumption to hold.
- Statistical power: This is the probability of correctly rejecting a false null hypothesis (detecting a real difference when one exists). Power increases with larger sample sizes, larger true effect sizes, and higher significance levels. Low power means you might miss a real difference, so planning adequate sample sizes before collecting data matters.
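The power claim can be made concrete with a short sketch. `power_two_sided` is a hypothetical helper, and the true difference, sigmas, and sample sizes are assumed for illustration; it computes the probability that the test statistic falls beyond either critical value when the true mean difference is $d$:

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF

def power_two_sided(d, sigma1, sigma2, n1, n2, z_crit=1.96):
    """Power of the two-sided z-test when the true mean difference is d."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Probability of landing in either rejection region, given the true shift d/se
    return phi(-z_crit + d / se) + phi(-z_crit - d / se)

# Assumed scenario: true difference of 3 points, sigmas 10 and 12, equal group sizes
for n in (25, 50, 100, 200):
    print(n, round(power_two_sided(3, 10, 12, n, n), 3))
```

Running this shows power climbing steadily with sample size, which is exactly why sample-size planning happens before data collection: with small groups, a real 3-point difference would usually go undetected.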