The sampling distribution for the difference in sample proportions (p̂₁ − p̂₂) is the distribution of all possible differences between proportions from two independent random samples. Its mean is p₁ − p₂, its standard deviation is √(p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂), and it's approximately normal when all four np counts are at least 10.
Imagine taking a random sample from each of two separate populations, computing the proportion of "successes" in each, and subtracting one from the other. Now imagine repeating that over and over. The pattern those differences make is the sampling distribution of p̂₁ − p̂₂. It tells you which differences are typical and which would be surprising, given the true population proportions p₁ and p₂.
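The "repeat it over and over" idea can be sketched as a small simulation. This is only an illustration: the function name simulate_differences, the values p₁ = 0.45, p₂ = 0.40, n₁ = 400, n₂ = 300, the number of repetitions, and the seed are all assumptions chosen for the example, not anything specified by the CED.

```python
import random
import statistics

def simulate_differences(p1, n1, p2, n2, reps=2000, seed=1):
    """Return `reps` simulated values of p-hat1 - p-hat2 (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Count "successes" in one random sample from each population.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Assumed population proportions and sample sizes, for illustration only.
diffs = simulate_differences(p1=0.45, n1=400, p2=0.40, n2=300)

# The mean should land near p1 - p2 = 0.05, and the SD near the formula value.
print("mean of simulated differences:", round(statistics.mean(diffs), 4))
print("SD of simulated differences:", round(statistics.stdev(diffs), 4))
```

A histogram of diffs would look roughly bell-shaped and centered near 0.05, which is the pattern the sampling distribution describes.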
The CED gives you three things to nail down (LO 5.6.A). The center of this distribution is exactly p₁ − p₂, meaning the difference in sample proportions is an unbiased estimator of the difference in population proportions. The spread is σ(p̂₁−p̂₂) = √(p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂). Notice you ADD the two variance pieces under the radical even though you're subtracting proportions, because combining two independent random quantities adds uncertainty whether you add or subtract them. Finally, the shape is approximately normal as long as all four counts (n₁p₁, n₁(1−p₁), n₂p₂, and n₂(1−p₂)) are at least 10 (LO 5.6.B). One more wrinkle from the CED: that standard deviation formula assumes sampling with replacement. If you sample without replacement, it slightly overstates the spread, but the difference is negligible when each sample is less than 10% of its population.
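The center, spread, and shape checks described above can be bundled into one small helper. This is a minimal sketch: the function name describe_difference and the input values are hypothetical, and the large-counts test uses plain floating-point comparison, so a count that sits exactly on 10 could be affected by rounding.

```python
from math import sqrt

def describe_difference(p1, n1, p2, n2):
    """Mean, SD, and large-counts check for p-hat1 - p-hat2 (hypothetical helper)."""
    mean = p1 - p2
    # Variances ADD under the radical, even though the statistic is a difference.
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # All four expected counts must be at least 10 for the normal approximation.
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    approx_normal = all(c >= 10 for c in counts)
    return mean, sd, counts, approx_normal

# Assumed values for illustration: p1 = 0.45, n1 = 400, p2 = 0.40, n2 = 300.
mean, sd, counts, ok = describe_difference(0.45, 400, 0.40, 300)
print("mean:", round(mean, 4))
print("standard deviation:", round(sd, 4))
print("expected counts:", [round(c, 1) for c in counts])
print("approximately normal:", ok)

# A setup that fails the check: 300 * 0.02 = 6 expected successes.
_, _, _, ok_small = describe_difference(0.45, 400, 0.02, 300)
print("approximately normal with p2 = 0.02:", ok_small)
```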
This is Topic 5.6, the capstone of Unit 5 (Sampling Distributions) for categorical data. It directly supports three learning objectives. AP Stats 5.6.A asks you to compute the mean and standard deviation of the distribution, AP Stats 5.6.B asks you to check whether it's approximately normal, and AP Stats 5.6.C asks you to interpret those parameters in context with units and the specific populations named. Beyond Unit 5, this distribution is the engine behind every two-proportion confidence interval and significance test in Unit 6. When you eventually compute a z-statistic comparing two proportions, you are standing on this sampling distribution. If you can't describe its center, spread, and shape, the inference procedures later won't make sense; you'll just be plugging into formulas.
Sample Proportion (Unit 5)
This whole topic is the one-sample p̂ distribution from Topic 5.5, doubled. Each p̂ has its own mean and variance, and subtracting them gives a new distribution whose variance is the SUM of the two individual variances. If you understand one p̂, you understand p̂₁ − p̂₂.
Central Limit Theorem (Unit 5)
The CLT is why the normal approximation works here at all. Each sample proportion is really a sample mean of 0s and 1s, so with large enough samples each p̂ is approximately normal, and the difference of two independent normals is also normal.
Standard Error (Units 5-6)
The formula in Topic 5.6 uses the true p₁ and p₂, which you almost never know in real life. In Unit 6 you swap in p̂₁ and p̂₂ and the result gets renamed the standard error. Same formula, estimated inputs, new name.
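As a rough sketch of that swap, the same formula can be evaluated with sample proportions in place of p₁ and p₂. The function name standard_error and the sample counts (180 successes out of 400, 120 out of 300) are made up for illustration.

```python
from math import sqrt

def standard_error(x1, n1, x2, n2):
    """Estimated SD of p-hat1 - p-hat2 from sample counts (hypothetical helper)."""
    p_hat1 = x1 / n1  # sample proportion, group 1
    p_hat2 = x2 / n2  # sample proportion, group 2
    # Same structure as the Topic 5.6 formula, with p-hats as estimated inputs.
    return sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)

# Assumed sample results, for illustration only.
se = standard_error(x1=180, n1=400, x2=120, n2=300)
print("standard error:", round(se, 4))
```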
Categorical Variable (Unit 1)
Proportions only exist for categorical data, like "exercises regularly: yes or no." If the variable is quantitative, you compare means instead, which sends you to Topic 5.8 and a totally different set of conditions.
On the AP exam this shows up two main ways. Multiple-choice questions hand you a comparison scenario, like the proportion of adults who exercise regularly in City A versus City B with sample sizes such as 400 and 300, then ask you to find the mean or standard deviation of p̂₁ − p̂₂, compute a probability using the normal model, or pick out the scenario where the distribution is NOT approximately normal (that one is a pure large-counts check: find the setup where some np or n(1−p) falls below 10). On free-response, this content usually appears inside two-proportion inference problems, where checking the normality conditions and interpreting the standard deviation in context are graded steps. No released FRQ asks about this sampling distribution in isolation, but the skills it builds (state the parameters, verify the conditions, interpret in context) are exactly what two-proportion FRQs reward.
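A probability question of that kind might be worked as in the sketch below. The sample sizes 400 and 300 come from the scenario above, but the population proportions (0.45 for City A, 0.40 for City B) and the cutoff of 0.10 are assumptions added for the example. It uses NormalDist from Python's standard statistics module.

```python
from math import sqrt
from statistics import NormalDist

# Assumed population proportions; sample sizes match the scenario above.
p_a, n_a = 0.45, 400
p_b, n_b = 0.40, 300

# Large-counts check: all four expected counts must be at least 10.
counts = [n_a * p_a, n_a * (1 - p_a), n_b * p_b, n_b * (1 - p_b)]
assert all(c >= 10 for c in counts), "normal approximation not justified"

# Parameters of the sampling distribution of p-hat_A - p-hat_B.
mean_diff = p_a - p_b
sd_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# Probability that the sample difference exceeds 0.10 under the normal model.
model = NormalDist(mu=mean_diff, sigma=sd_diff)
prob = 1 - model.cdf(0.10)
print(f"P(p-hat_A - p-hat_B > 0.10) is about {prob:.3f}")
```

On paper the same calculation is a z-score, (0.10 − mean) / SD, followed by a normal table or calculator lookup.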
The one-sample version (Topic 5.5) describes how p̂ varies around p with standard deviation √(p(1−p)/n). The difference version stacks two of those together. The classic mistake is subtracting the variability because you're subtracting proportions. Wrong. Variances add: σ² for the difference is p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂. Two sources of sampling variability mean more total uncertainty, never less. Also remember you now need four large-counts checks instead of two.
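The reason variances add can be written out in one line, assuming the two samples are independent. The minus sign on p̂₂ gets squared when it passes through the variance, so it turns into a plus:

```latex
\begin{aligned}
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  &= \operatorname{Var}(\hat{p}_1) + (-1)^2 \operatorname{Var}(\hat{p}_2) \\
  &= \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}, \\
\sigma_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}.
\end{aligned}
```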
The mean of the sampling distribution of p̂₁ − p̂₂ is exactly p₁ − p₂, so the difference in sample proportions is an unbiased estimator of the difference in population proportions.
The standard deviation is √(p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂), and you add the two variance terms under the radical even though the statistic is a difference.
The distribution is approximately normal only when all four counts are at least 10: n₁p₁, n₁(1−p₁), n₂p₂, and n₂(1−p₂).
If you sample without replacement, the formula slightly overstates the standard deviation, but the difference is negligible when each sample is less than 10% of its population.
Interpretations earn credit only with context, so always name the populations and the variable, and use "difference in proportions" language rather than bare numbers.
This distribution is the foundation for two-proportion z-intervals and z-tests in Unit 6, where the same formula reappears as the standard error.
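The 10% guideline from the takeaways above can be checked numerically. The sketch below compares the with-replacement formula for a single p̂ against the version that includes the standard finite population correction, √((N − n)/(N − 1)). That correction is not part of the Topic 5.6 formula; it is brought in here only to show how small the gap is. The function names and population sizes are hypothetical.

```python
from math import sqrt

def sd_with_replacement(p, n):
    """SD of one p-hat using the Topic 5.5 formula."""
    return sqrt(p * (1 - p) / n)

def sd_without_replacement(p, n, N):
    """Same SD with the finite population correction applied (hypothetical helper)."""
    fpc = sqrt((N - n) / (N - 1))
    return sd_with_replacement(p, n) * fpc

p, n = 0.45, 400  # assumed proportion and sample size
ratios = {}
for N in (100_000, 4_000, 800):  # assumed population sizes
    ratio = sd_without_replacement(p, n, N) / sd_with_replacement(p, n)
    ratios[N] = ratio
    print(f"N = {N}: sample is {n / N:.1%} of population, SD ratio = {ratio:.3f}")
```

When the sample is 10% of the population or less, the ratio stays at roughly 0.95 or higher, so the uncorrected formula overstates the spread by only a few percent. At 50% of the population the gap is much larger.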
The sampling distribution of p̂₁ − p̂₂ is the distribution of all possible values of p̂₁ − p̂₂ from repeated independent random samples from two populations. It's centered at p₁ − p₂, has standard deviation √(p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂), and is approximately normal when all four expected counts (n₁p₁, n₁(1−p₁), n₂p₂, n₂(1−p₂)) are at least 10. This is Topic 5.6 in AP Stats.
You do not subtract the variances, even though the statistic is a difference. Subtracting them is the most common error. You add the variances, then take the square root. Combining two independent random samples increases total uncertainty whether you add or subtract the statistics, so the spread of the difference is always larger than either sample's spread alone.
The one-sample distribution (Topic 5.5) tracks how a single p̂ varies around p with standard deviation √(p(1−p)/n). The two-sample version combines two of those, so its variance is the sum of both and it requires four large-counts checks instead of two (n₁p₁, n₁(1−p₁), n₂p₂, n₂(1−p₂), all at least 10).
The distribution is approximately normal when all four expected counts are at least 10. For example, with n₁ = 400 and p₁ = 0.45, you check 400(0.45) = 180 and 400(0.55) = 220, then repeat for the second sample. If any one count falls below 10, the normal approximation is not justified. MCQs love hiding one failing condition in a list of scenarios.
The standard deviation and the standard error are not quite the same thing. The Topic 5.6 formula uses the true population proportions p₁ and p₂, which gives the actual standard deviation. In Unit 6 inference you don't know those, so you plug in p̂₁ and p̂₂, and the estimated version is called the standard error.