The sampling distribution for the difference in two sample means, $\bar{x}_1 - \bar{x}_2$, is centered at $\mu_1 - \mu_2$ with a standard deviation of $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. It is normal when both population distributions are normal and approximately normal when both sample sizes are at least 30, which lets you find probabilities about how two group means compare.

Why This Matters for the AP Statistics Exam

This topic sets up the logic behind comparing two groups, which shows up later when you build confidence intervals and run hypothesis tests for the difference of two means. On the exam you may need to find the mean and standard deviation of x̄1 - x̄2, check whether a normal model is reasonable, calculate a probability about the difference, and interpret what that probability means in context. Getting comfortable here makes the two-sample inference work in Unit 4 much clearer.

You work with these ideas through:

Choosing the right formula for the spread of a difference.
Checking the normality conditions before using a normal model.
Calculating and interpreting probabilities with correct units and context.

Key Takeaways

The mean of the sampling distribution is μ(x̄1-x̄2) = μ1 - μ2.
The standard deviation is σ(x̄1-x̄2) = √(σ1²/n1 + σ2²/n2). You add the variances, then take the square root.
Variances add for independent samples even when you are subtracting the means. This is the "Pythagorean Theorem of Statistics" idea.
The model is normal if both populations are normal, or approximately normal if both sample sizes are at least 30 (Central Limit Theorem).
Sampling without replacement makes the true standard deviation slightly smaller, but the difference is negligible when each sample is less than 10% of its population.
Always interpret probabilities and parameters with units and in the context of the two specific populations.

Formulas

To find the standard deviation of the difference in sample means, divide each population variance by its sample size, add those values, then take the square root. Just like with proportions, the "Pythagorean Theorem of Statistics" applies here too: variances add for independent samples, even though you are subtracting the means.

For two independent samples from populations with means μ1 and μ2 and standard deviations σ1 and σ2:

Mean: μ(x̄1-x̄2) = μ1 - μ2
Standard deviation: σ(x̄1-x̄2) = √(σ1²/n1 + σ2²/n2)
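
As a quick illustration, the two formulas above can be sketched in Python. The function name and the numbers plugged in are hypothetical, chosen only to show the arithmetic; they do not come from the formula sheet.

```python
import math

def diff_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Return (center, spread) of the sampling distribution of xbar1 - xbar2.

    Assumes the two samples are independent.
    """
    center = mu1 - mu2
    # Divide each variance by its sample size, add, then take the square root.
    spread = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return center, spread

# Hypothetical populations: mu1 = 70, sigma1 = 8, n1 = 40; mu2 = 65, sigma2 = 6, n2 = 36
center, spread = diff_means_distribution(70, 8, 40, 65, 6, 36)
print(center, round(spread, 3))  # 5 1.612
```

Here 8²/40 = 1.6 and 6²/36 = 1, so the spread is √2.6.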

If you sample without replacement, the true standard deviation is a bit smaller than this formula gives. As long as each sample is less than 10% of its population, that difference is negligible.
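
To see why the 10% guideline is reasonable, here is a minimal sketch using the standard finite population correction factor, √((N - n)/(N - 1)), for the standard deviation of one sample mean. This factor is not something you would normally be asked to compute on the exam; the function names and the population sizes below are assumptions made for illustration.

```python
import math

def sd_mean_with_replacement(sigma, n):
    # The usual formula: sigma / sqrt(n)
    return sigma / math.sqrt(n)

def sd_mean_without_replacement(sigma, n, N):
    # Same formula, shrunk by the finite population correction factor.
    fpc = math.sqrt((N - n) / (N - 1))
    return sigma / math.sqrt(n) * fpc

sigma, n = 10, 50  # hypothetical population SD and sample size
ratios = {}
for N in (5000, 500, 100):  # sample is 1%, 10%, and 50% of the population
    ratios[N] = sd_mean_without_replacement(sigma, n, N) / sd_mean_with_replacement(sigma, n)
    print(N, round(ratios[N], 3))
```

When the sample is 1% of the population, the corrected standard deviation is within about half a percent of the simple formula. At 50% of the population, the simple formula overstates the spread by a wide margin.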

Source: AP Statistics Formula Sheet

Normal Condition: Central Limit Theorem

When you are working with differences between sample means, you can use the sampling distribution of the difference to make inferences about the difference between the population means.

If the two population distributions can be modeled with a normal distribution, then the sampling distribution of the difference in sample means x̄1 - x̄2 can also be modeled with a normal distribution. This lets you use normal-based techniques, such as confidence intervals and hypothesis tests, to compare the two population means based on sample data.

If the two population distributions cannot be modeled with a normal distribution, the sampling distribution of x̄1 - x̄2 can still be approximately normal when both samples are large enough. This is the Central Limit Theorem: the sampling distribution of a sample mean becomes approximately normal as the sample size increases, regardless of the population's shape. So if both samples have sizes of at least 30, you can still use normal-based techniques to compare the two population means.
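
The Central Limit Theorem claim can be checked with a small simulation. This is a sketch under stated assumptions: the two populations are taken to be exponential (strongly right-skewed) with hypothetical means 10 and 6, both sample sizes are 40, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable

def sample_mean_exponential(mean, n):
    # random.expovariate takes the rate, which is 1 / mean.
    return statistics.fmean([random.expovariate(1 / mean) for _ in range(n)])

mu1, mu2 = 10.0, 6.0  # hypothetical population means
n1, n2 = 40, 40       # both sample sizes are at least 30

# Repeat the two-sample study many times and record xbar1 - xbar2 each time.
diffs = [sample_mean_exponential(mu1, n1) - sample_mean_exponential(mu2, n2)
         for _ in range(5000)]

sim_center = statistics.fmean(diffs)
sim_spread = statistics.stdev(diffs)

# For an exponential population the standard deviation equals the mean.
theory_spread = (mu1**2 / n1 + mu2**2 / n2) ** 0.5

# Share of simulated differences within one theoretical SD of mu1 - mu2.
within_one_sd = sum(abs(d - (mu1 - mu2)) <= theory_spread for d in diffs) / len(diffs)

print(round(sim_center, 2), round(sim_spread, 2), round(theory_spread, 2))
print(round(within_one_sd, 2))
```

If the normal model is reasonable, the simulated center should land near 4, the simulated spread near the theoretical value, and roughly 68% of the differences should fall within one standard deviation of the center, even though neither population is normal.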

How to Use This on the AP Statistics Exam

Problem Solving

Identify the two populations and label their means and standard deviations (μ1, σ1, μ2, σ2).
Find the center: μ1 - μ2.
Find the spread: √(σ1²/n1 + σ2²/n2). Square each standard deviation, divide by the matching sample size, add, then take the square root.
Check normality: are both populations normal, or are both sample sizes at least 30?
If a normal model fits, convert to a z-score and find the probability.
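
The five steps above can be sketched as one short calculation. The scenario is hypothetical: two battery brands with population means of 120 and 112 hours, standard deviations of 15 and 20 hours, and independent random samples of 36 and 50 batteries. The question is how likely it is that the brand 1 sample mean comes out below the brand 2 sample mean.

```python
import math
from statistics import NormalDist

# Step 1: label the (hypothetical) parameters and sample sizes.
mu1, sigma1, n1 = 120, 15, 36
mu2, sigma2, n2 = 112, 20, 50

# Step 2: center of the sampling distribution.
center = mu1 - mu2

# Step 3: spread. Variances over sample sizes, added, then square-rooted.
spread = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Step 4: normality check based on sample sizes alone.
normal_ok = n1 >= 30 and n2 >= 30

# Step 5: z-score and probability for P(xbar1 - xbar2 < 0).
z = (0 - center) / spread
prob = NormalDist().cdf(z)

print(center, round(spread, 2), normal_ok)  # 8 3.77 True
print(round(z, 2), round(prob, 3))          # -2.12 0.017
```

In context: if these were the true parameters, there would be only about a 1.7% chance that a random sample of 36 brand 1 batteries has a lower mean lifetime than a random sample of 50 brand 2 batteries.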

Free Response

When a question asks you to describe the sampling distribution, hit all three parts: shape (normal and why), center (μ1 - μ2), and spread (the standard deviation formula with numbers plugged in). Showing the formula and your substitutions is important for clear exam work.

Common Trap

Do not add or subtract the standard deviations directly. You combine the variances (σ1²/n1 + σ2²/n2) and only then take the square root.
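
To see how far off the trap answers can be, here is a small comparison with hypothetical values (σ1 = 12, σ2 = 9, and both sample sizes equal to 36), chosen so the arithmetic comes out cleanly.

```python
import math

sigma1, n1 = 12, 36  # hypothetical values
sigma2, n2 = 9, 36

# Correct: add the variances of the two sample means, then take the square root.
correct = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Trap 1: adding the two standard deviations of the sample means.
wrong_add = sigma1 / math.sqrt(n1) + sigma2 / math.sqrt(n2)

# Trap 2: subtracting them because the means are subtracted.
wrong_subtract = abs(sigma1 / math.sqrt(n1) - sigma2 / math.sqrt(n2))

print(correct, wrong_add, wrong_subtract)  # 2.5 3.5 0.5
```

Adding the standard deviations overstates the spread, and subtracting them understates it badly. Only the variance-based calculation gives 2.5.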

Practice Problem

Suppose that you are a publisher trying to compare the sales of two different genres of books: romance novels and science fiction novels. You decide to use random samples of 50 romance novels and 50 science fiction novels from your inventory, and you collect data on the number of copies sold for each book. After analyzing the data, you find that the sample mean number of copies sold for romance novels is 500 copies with a standard deviation of 100 copies, and the sample mean number of copies sold for science fiction novels is 400 copies with a standard deviation of 150 copies.

a) Explain what the sampling distribution for the difference in sample means represents and why it is useful in this situation.

b) Suppose that the true population mean number of copies sold for romance novels is actually 550 copies and the true population mean number of copies sold for science fiction novels is actually 450 copies. Describe the shape, center, and spread of the sampling distribution for the difference in sample means in this case.

c) Explain why the Central Limit Theorem applies to the sampling distribution for the difference in sample means in this situation.

d) Discuss one potential source of bias that could affect the results of this study, and explain how it could influence the estimate of the difference in population means.

Answer

a) The sampling distribution for the difference in sample means represents the distribution of possible values for the difference between the sample means if the study were repeated many times. It is useful here because it lets us make inferences about the difference between the population means for the two genres based on the sample data.

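b) With true population means of 550 copies (romance) and 450 copies (science fiction), the sampling distribution for the difference in sample means would be approximately normal (both sample sizes are at least 30) and centered at 550 - 450 = 100 copies. Its standard deviation is √(σ1²/50 + σ2²/50) copies. The problem gives only sample standard deviations, so the exact spread cannot be computed; it depends on the two population standard deviations and shrinks as the sample sizes grow.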

c) The Central Limit Theorem applies here because both sample sizes (n1 = 50 and n2 = 50, each greater than 30) are large enough for the distribution of the difference to be approximately normal, even if the populations are not normally distributed.

d) One potential source of bias is selection bias, which happens when certain groups are more or less likely to be represented in a sample. For example, if romance novel readers are more likely to buy books from certain retailers or to belong to certain book clubs, the romance sample could be biased toward higher sales and overestimate the population mean.

On the other hand, if science fiction readers are more likely to buy books online or to belong to certain online communities, that sample could be biased toward lower sales and underestimate the population mean. Either way, this could distort the estimate of the difference in population means between the two genres.
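
For part b, the problem does not state the population standard deviations. As a rough sketch only, you could plug in the sample standard deviations (100 and 150 copies) as stand-ins for σ1 and σ2. That substitution is an assumption and gives an estimate of the spread, not the exact value.

```python
import math

# Sample standard deviations used as stand-ins for the unknown sigma1 and sigma2.
s1, n1 = 100, 50  # romance
s2, n2 = 150, 50  # science fiction

center = 550 - 450  # difference of the stated population means
estimated_spread = math.sqrt(s1**2 / n1 + s2**2 / n2)

print(center, round(estimated_spread, 1))  # 100 25.5
```

Under that assumption, the difference in sample means would typically fall about 25.5 copies away from the true difference of 100 copies.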

Common Misconceptions

Adding standard deviations instead of variances. You cannot add σ values directly. Add the variances (σ1²/n1 + σ2²/n2) first, then take the square root.
Subtracting the spreads because you subtract the means. Even though the center is μ1 - μ2, the variability still grows. Variances add for independent samples regardless of subtraction.
Thinking one sample of 30 is enough. For a difference, both sample sizes need to be at least 30 to lean on the Central Limit Theorem when populations are not normal.
Forgetting the independence requirement. This formula assumes the two samples are independent. Paired data uses a different approach, so check the design before applying it.
Skipping context and units. A probability or parameter only means something when you state it with units and tie it to the specific populations being compared.

Vocabulary

The following words are mentioned explicitly in the AP® course framework for this topic.

Term	Definition
difference in sample means	The result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂.
independent populations	Two populations from which samples are drawn such that the selection from one population does not affect the selection from the other.
normal distribution	A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
parameter	A numerical summary that describes a characteristic of an entire population.
population distribution	The distribution of all values of a variable across the entire population.
population mean	The average of all values in an entire population, denoted as μ.
population means	The average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population standard deviation	A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution.
probability	The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
sample mean	The average of all values in a sample, denoted as x̄, used as an estimate of the population mean.
sample size	The number of observations or data points collected in a sample, denoted as n.
sampling distribution	The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling with replacement	A sampling method in which an item selected from a population can be selected again in subsequent draws.
sampling without replacement	A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
standard error	The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples.

Frequently Asked Questions

What is the sampling distribution of the difference between two sample means?

It is the distribution of values of x̄1 - x̄2 from repeated independent samples. AP Statistics uses it to compare two population means using the center, spread, shape, and context.

What is the mean of the sampling distribution for differences in means?

The mean of the sampling distribution is μ1 - μ2. In words, the expected difference between sample means equals the difference between the two population means.

What is the standard deviation formula for x̄1 - x̄2?

For independent samples, the standard deviation is √(σ1²/n1 + σ2²/n2). You add the variances, then take the square root.

Why do variances add when means are subtracted?

For independent samples, variability from both sample means contributes to the spread of the difference. That is why you add variances even though the center is a subtraction.
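
A short derivation makes this concrete. It assumes the two sample means are independent and uses the standard rule that multiplying a random variable by a constant multiplies its variance by the square of that constant, so the minus sign contributes a factor of (-1)² = 1.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  &= \operatorname{Var}(\bar{x}_1) + (-1)^2 \operatorname{Var}(\bar{x}_2) \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \\[4pt]
\sigma_{\bar{x}_1 - \bar{x}_2}
  &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\end{aligned}
```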

When is the sampling distribution approximately normal?

It is normal if both population distributions can be modeled as normal. If not, it is approximately normal when both sample sizes are at least 30.

How should I describe this distribution on the AP Stats exam?

Give shape, center, spread, and context. Say whether the normal model is justified, state the mean as μ1 - μ2, use the standard deviation formula, and interpret the result with units for the two populations.