Fiveable

📊 AP Statistics Unit 5 Review


5.6 Sampling Distributions for Differences in Sample Proportions

Written by the Fiveable Content Team • Last updated June 2026
Verified for the 2027 exam

When you compare two groups by subtracting their sample proportions, the result $\hat{p}_1 - \hat{p}_2$ has its own sampling distribution. Its center is the true difference $p_1 - p_2$, its standard deviation is $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$, and it is approximately normal when all four large-counts checks pass.

Why This Matters for the AP Statistics Exam

This topic is the bridge between single-proportion sampling distributions and the two-sample proportion inference you will do later in Unit 6. Before you can build a confidence interval or run a test for the difference between two population proportions, you need to know the center, spread, and shape of the distribution of p̂₁ - p̂₂.

On the exam you may be asked to find these parameters, check whether the normal model applies, calculate a probability for an observed difference, or interpret what the distribution means in context. Show the formula, your large-counts checks, and an interpretation in context so your reasoning is easy to follow.

Key Takeaways

  • The mean of the distribution of p̂₁ - p̂₂ is the difference in population proportions: μ(p̂₁-p̂₂) = p₁ - p₂.
  • The standard deviation is σ(p̂₁-p̂₂) = √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂). Variances add, then take the square root.
  • The model is approximately normal only when all four counts are large: n₁p₁ ≥ 10, n₁(1-p₁) ≥ 10, n₂p₂ ≥ 10, n₂(1-p₂) ≥ 10.
  • For proportions you check the large-counts (success-failure) condition, not the Central Limit Theorem. CLT applies to means.
  • The two samples must come from two independent populations.
  • When sampling without replacement, the true standard deviation is slightly smaller, but the difference is negligible if each sample is less than 10% of its population.
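To get a feel for how much smaller the standard deviation is when sampling without replacement, the Python sketch below applies the standard finite population correction (FPC) factor √((N - n)/(N - 1)) to one group's standard deviation. The FPC is not on the AP formula sheet and is included only to show the size of the effect; the helper name `fpc_factor` and the population sizes N are hypothetical.

```python
import math


def fpc_factor(n: int, N: int) -> float:
    """Finite population correction: multiply the with-replacement SD of a
    sample proportion by this factor when sampling n of N without replacement."""
    return math.sqrt((N - n) / (N - 1))


n = 1000  # sample size for one group
for N in (5_000, 10_000, 100_000):  # hypothetical population sizes
    print(f"sample is {n / N:.0%} of population -> SD multiplied by {fpc_factor(n, N):.3f}")
```

Under these assumptions, a sample that is exactly 10% of its population has its SD shrunk by roughly 5%, and a 1% sample by well under 1%, which is why the 10% guideline lets you use the unadjusted formula.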

How the Distribution Works

The phrase "variances add" is the key to all difference distributions. Even though you subtract the two sample proportions, you add their variances (this works because the two samples are independent) before taking the square root to get the standard deviation.

Center (mean):

ฮผ(pฬ‚โ‚-pฬ‚โ‚‚) = pโ‚ - pโ‚‚

Spread (standard deviation):

ฯƒ(pฬ‚โ‚-pฬ‚โ‚‚) = โˆš(pโ‚(1-pโ‚)/nโ‚ + pโ‚‚(1-pโ‚‚)/nโ‚‚)

Each piece p(1-p)/n is the variance of one group's sample proportion. You add those two variances, then square root the total. Some people call this the "Pythagorean Theorem of statistics" because you combine two squared pieces under a single square root.
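The center and spread formulas translate directly into a few lines of Python. This is a minimal sketch; the function name `diff_in_props` is hypothetical, and the example values (p = 0.6 and 0.7, n = 1000 each) are the ones used in the practice problem in this guide.

```python
import math


def diff_in_props(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Mean and standard deviation of the sampling distribution of p1_hat - p2_hat."""
    mean = p1 - p2                 # center: difference of population proportions
    var1 = p1 * (1 - p1) / n1      # variance of p1_hat
    var2 = p2 * (1 - p2) / n2      # variance of p2_hat
    sd = math.sqrt(var1 + var2)    # add the variances, then take one square root
    return mean, sd


mean, sd = diff_in_props(0.6, 1000, 0.7, 1000)
print(f"mean = {mean:.3f}, sd = {sd:.4f}")
```

This should print a mean of -0.100 and an SD of 0.0212, matching part (c) of the practice problem.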

Shape: The distribution of p̂₁ - p̂₂ is approximately normal when all four of these conditions are met:

  • n₁p₁ ≥ 10
  • n₁(1 - p₁) ≥ 10
  • n₂p₂ ≥ 10
  • n₂(1 - p₂) ≥ 10

If any expected count falls below 10, the distribution can be skewed and the normal model may not be safe.
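A small helper can run all four large-counts checks at once and report which ones fail. This is a hypothetical sketch (`large_counts_ok` is not a standard function); the second example uses made-up values chosen so that one check fails.

```python
def large_counts_ok(p1: float, n1: int, p2: float, n2: int, threshold: float = 10):
    """Return (passes, failed_checks) for the four large-counts conditions."""
    counts = {
        "n1*p1": n1 * p1,            # expected successes, group 1
        "n1*(1-p1)": n1 * (1 - p1),  # expected failures, group 1
        "n2*p2": n2 * p2,            # expected successes, group 2
        "n2*(1-p2)": n2 * (1 - p2),  # expected failures, group 2
    }
    failed = [name for name, value in counts.items() if value < threshold]
    return not failed, failed


print(large_counts_ok(0.6, 1000, 0.7, 1000))  # (True, [])
print(large_counts_ok(0.95, 100, 0.5, 100))   # (False, ['n1*(1-p1)'])
```

In the second case only 5 failures are expected in group 1, so the normal model would not be justified even though the other three counts are large.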

Difference in Sample Proportions Formulas

[Figure: "Notation and Formulas for Probability Distributions" table. Source: AP Statistics Formula Sheet]

How to Use This on the AP Statistics Exam

Problem Solving

  1. Identify the two population proportions and sample sizes.

  2. Find the center: p₁ - p₂.

  3. Find the spread by adding the two variances p(1-p)/n, then square rooting.

  4. Check all four large-counts conditions to confirm approximate normality.

  5. If a probability is asked, standardize the observed difference with a z-score and use the normal model.

  6. Interpret your answer in context, with units and a clear reference to both populations.
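Steps 2 through 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the normal CDF is built from the standard library's `math.erf`, and the question asked (the chance that p̂_A - p̂_B is -0.13 or less when p_A = 0.6, p_B = 0.7, and n = 1000 in each city) is made up for practice.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_diff_at_most(x: float, p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return (z, P(p1_hat - p2_hat <= x)) under the normal model."""
    # Step 4: every expected count must be at least 10
    if min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) < 10:
        raise ValueError("large-counts condition fails; normal model not justified")
    mean = p1 - p2                                            # step 2: center
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # step 3: spread
    z = (x - mean) / sd                                       # step 5: standardize
    return z, normal_cdf(z)


# Hypothetical question: how likely is p_A_hat - p_B_hat <= -0.13?
z, prob = prob_diff_at_most(-0.13, 0.6, 1000, 0.7, 1000)
print(f"z = {z:.2f}, probability = {prob:.3f}")
```

Here z is about -1.41 and the probability is about 0.079. Step 6 still has to be done in words: about 7.9% of random samples would show City A's support running at least 13 percentage points below City B's.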

Common Trap

A frequent mistake is taking the square root of each group separately and then adding the standard deviations. That is wrong. You must add the variances first, then take one square root of the total.
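A quick numeric check shows how far off the shortcut is. This sketch uses the practice-problem values (p = 0.6 and 0.7, n = 1000 each) purely for illustration.

```python
import math

p1, n1, p2, n2 = 0.6, 1000, 0.7, 1000

sd1 = math.sqrt(p1 * (1 - p1) / n1)  # SD of p1_hat
sd2 = math.sqrt(p2 * (1 - p2) / n2)  # SD of p2_hat

wrong = sd1 + sd2                    # trap: adding standard deviations
right = math.sqrt(sd1**2 + sd2**2)   # correct: add variances, then one square root

print(f"wrong (add SDs):       {wrong:.4f}")
print(f"right (add variances): {right:.4f}")
```

Adding the SDs gives about 0.0300 instead of 0.0212. The shortcut always overstates the spread, because √(a² + b²) is never larger than a + b.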

Practice Problem

Suppose you are comparing the proportion of people in two cities who support a new public transportation system. You use simple random samples of 1000 people from each city. You find that 600 of the 1000 respondents from City A support the system, and 700 of the 1000 respondents from City B support the system.

a) Calculate the sample proportions of respondents who support the new system in each city.

b) Explain what the sampling distribution for the difference in sample proportions represents and why it is useful here.

c) Suppose the true population proportion in City A is 0.6 and in City B is 0.7. Describe the shape, center, and spread of the sampling distribution for the difference in sample proportions.

d) Explain why the difference in sample proportions can be modeled as approximately normal in this situation.

e) Discuss one potential source of bias that could affect the results, and explain how it could influence the estimate. (Hint: think about how this differs when working with two samples instead of one.)

Answer

a) City A: 600/1000 = 0.6. City B: 700/1000 = 0.7.

b) The sampling distribution for the difference in sample proportions represents the distribution of possible values of p̂₁ - p̂₂ if the study were repeated many times with new random samples from each city. It is useful because it lets you make inferences about the difference between the two population proportions based on the sample data.
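One way to make "repeated many times" concrete is a simulation. In this hypothetical sketch, each respondent independently supports the system with the true proportions from part (c) (0.6 in City A, 0.7 in City B), which treats each city's population as effectively infinite; the function name, seed, and number of repetitions are arbitrary choices.

```python
import random
import statistics


def simulate_diffs(p1: float, n1: int, p2: float, n2: int, reps: int, seed: int = 1) -> list[float]:
    """Simulate `reps` repetitions of the study and return p1_hat - p2_hat for each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Count supporters: each respondent says yes with probability p
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


diffs = simulate_diffs(0.6, 1000, 0.7, 1000, reps=1000)
print(f"simulated mean = {statistics.mean(diffs):.3f}")
print(f"simulated SD   = {statistics.stdev(diffs):.4f}")
```

The simulated mean and SD should land close to the theoretical -0.1 and 0.0212; the exact digits depend on the seed and the number of repetitions.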

c) If you define the difference as City A minus City B, the center is 0.6 - 0.7 = -0.1. If you define it as City B minus City A, the center is 0.7 - 0.6 = 0.1. The spread is the same either way: √(0.6(0.4)/1000 + 0.7(0.3)/1000) = √(0.00024 + 0.00021) = √0.00045 ≈ 0.0212. The shape is approximately normal because all four large-counts conditions are met.
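The effect of the labeling choice can be checked directly. In this sketch (the function name `center_and_spread` is hypothetical), swapping which city is listed first flips the sign of the center and leaves the spread unchanged.

```python
import math


def center_and_spread(p_first: float, n_first: int, p_second: float, n_second: int) -> tuple[float, float]:
    """Mean and SD of (first group's p_hat) - (second group's p_hat)."""
    center = p_first - p_second
    spread = math.sqrt(p_first * (1 - p_first) / n_first + p_second * (1 - p_second) / n_second)
    return center, spread


a_minus_b = center_and_spread(0.6, 1000, 0.7, 1000)  # City A - City B
b_minus_a = center_and_spread(0.7, 1000, 0.6, 1000)  # City B - City A

for label, (c, s) in [("A - B", a_minus_b), ("B - A", b_minus_a)]:
    print(f"{label}: center = {c:+.1f}, SD = {s:.4f}")
```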

d) All four expected counts are large: 1000(0.6) = 600, 1000(0.4) = 400, 1000(0.7) = 700, and 1000(0.3) = 300, all well above 10. With large counts satisfied for both groups, the distribution of p̂₁ - p̂₂ is approximately normal.

e) Nonresponse bias is one possibility. If supporters in City A are more likely to respond, that sample could overestimate support there. If people in City B who oppose the system are more likely to respond, that sample could underestimate support there. With two samples, bias in either group can distort the estimated difference, so you have to watch the response patterns in both cities, not just one.

Common Misconceptions

  • Adding standard deviations instead of variances. Always add the two variances p(1-p)/n first, then square root once. Standard deviations do not add directly.
  • Using the Central Limit Theorem for proportions. For proportions, you confirm normality with the large-counts (success-failure) checks. CLT is the justification you use for sample means.
  • Checking only one group's counts. All four conditions must pass: both successes and failures, for both groups.
  • Forgetting the independence requirement. The two samples must come from two independent populations for these formulas to apply.
  • Ignoring the without-replacement adjustment. If a sample is 10% or more of its population, the true standard deviation is smaller than the formula gives. Below 10%, you can ignore the difference.
  • Treating the sign of the difference as fixed. p̂₁ - p̂₂ and p̂₂ - p̂₁ have the same spread but opposite-signed centers. Be consistent about which group you label first.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

Term

Definition

approximately normal

A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.

categorical variable

A variable that takes on values that are category names or group labels rather than numerical values.

difference in proportions

The difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations.

difference in sample proportions

The difference between two sample proportions (p̂₁ - p̂₂) used to compare proportions from two different samples.

independent populations

Two populations from which samples are drawn such that the selection from one population does not affect the selection from the other.

mean of the sampling distribution

The expected value of a sample statistic; for sample proportions, μ(p̂) = p.

normality conditions

The requirements that must be met for a sampling distribution to be approximately normal, such as n₁p₁ ≥ 10, n₁(1-p₁) ≥ 10, n₂p₂ ≥ 10, and n₂(1-p₂) ≥ 10.

parameter

A numerical summary that describes a characteristic of an entire population.

population proportion

The true proportion or percentage of a characteristic in an entire population, typically denoted as p.

probability

The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.

sample proportion

The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).

sample size

The number of observations or data points collected in a sample, denoted as n.

sampling distribution

The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.

sampling with replacement

A sampling method in which an item selected from a population can be selected again in subsequent draws.

sampling without replacement

A sampling method in which an item selected from a population cannot be selected again in subsequent draws.

standard deviation of the sampling distribution

The measure of variability in a sampling distribution; for sample proportions, σ(p̂) = √(p(1-p)/n).

Frequently Asked Questions

What is the sampling distribution for a difference in sample proportions?

It is the distribution of possible values of p-hat 1 minus p-hat 2 from repeated samples from two independent populations. It shows how the difference between two sample proportions varies from sample to sample.

What is the mean of p-hat 1 minus p-hat 2?

The mean of the sampling distribution is p1 - p2, the difference between the two population proportions. The sign depends on which group you label first.

What is the standard deviation formula for a difference in sample proportions?

The standard deviation is the square root of p1(1 - p1)/n1 plus p2(1 - p2)/n2. You add the variances first, then take one square root.

When is p-hat 1 minus p-hat 2 approximately normal?

The distribution is approximately normal when all four large-counts checks pass: n1p1, n1(1 - p1), n2p2, and n2(1 - p2) are each at least 10.

Why do variances add when subtracting sample proportions?

When you add or subtract independent random quantities, their variances add: Var(X - Y) = Var(X) + Var(Y). So even though the statistic is a difference, the spread uses the sum of the two variances before taking the square root.
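For small samples you can verify the rule exactly by listing every possible outcome. This sketch (function names and the small values n₁ = 4, p₁ = 0.5, n₂ = 5, p₂ = 0.3 are hypothetical) computes the exact variance of p̂₁ - p̂₂ from two independent binomial counts and compares it with p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_var_of_diff(p1: float, n1: int, p2: float, n2: int) -> float:
    """Exact Var(p1_hat - p2_hat) by listing every (x1, x2) outcome,
    assuming the two samples are independent."""
    outcomes = []
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            weight = binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)  # independence: multiply
            outcomes.append((x1 / n1 - x2 / n2, weight))
    mean = sum(d * w for d, w in outcomes)
    return sum(w * (d - mean) ** 2 for d, w in outcomes)


p1, n1, p2, n2 = 0.5, 4, 0.3, 5  # small hypothetical values
exact = exact_var_of_diff(p1, n1, p2, n2)
formula = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2  # variances add
print(f"exact = {exact:.6f}, formula = {formula:.6f}")
```

The two numbers agree even though these tiny samples fail the large-counts check: the standard deviation formula does not depend on the shape, only the normal approximation does.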

How is AP Stats 5.6 tested?

AP Stats 5.6 can ask you to find the center and spread, check the large-counts condition, use a normal model to calculate probability, or interpret p-hat 1 minus p-hat 2 in context.
