---
title: "AP Stats 5.6: Difference in Sample Proportions"
description: "Review AP Stats 5.6 sampling distributions for differences in sample proportions, including the mean, standard deviation, large-counts condition, normal model, and p-hat notation."
canonical: "https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW"
type: "study-guide"
subject: "AP Statistics"
unit: "Unit 5 – Sampling Distributions"
lastUpdated: "2026-06-09"
---

# AP Stats 5.6: Difference in Sample Proportions

## Summary

Review AP Stats 5.6 sampling distributions for differences in sample proportions, including the mean, standard deviation, large-counts condition, normal model, and p-hat notation.

## Guide

When you compare two groups by subtracting their [sample proportions](/ap-stats/key-terms/sample-proportion "fv-autolink"), the result $\hat{p}_1 - \hat{p}_2$ has its own sampling distribution. Its [center](/ap-stats/key-terms/center "fv-autolink") is the true difference $p_1 - p_2$, its standard deviation is $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$, and it is approximately normal when all four large-counts checks pass.

## Why This Matters for the AP Statistics Exam

This topic is the bridge between single-proportion [sampling distributions](/ap-stats/unit-5 "fv-autolink") and the two-sample proportion inference you will do later in Unit 6. Before you can build a [confidence interval](/ap-stats/key-terms/confidence-interval "fv-autolink") or run a test for the difference between two population proportions, you need to know the center, spread, and shape of the distribution of p̂₁ - p̂₂.

On the exam you may be asked to find these [parameters](/ap-stats/key-terms/parameter "fv-autolink"), check whether the normal model applies, calculate a [probability](/ap-stats/unit-4/intro-probability/study-guide/gfnBWfyMANOxF3vWLrbA "fv-autolink") for an observed difference, or interpret what the distribution means in context. Clear exam work shows the formula you used, all four large-counts checks, and an interpretation in context.

## Key Takeaways

- The mean of the distribution of p̂₁ - p̂₂ is the [difference in population proportions](/ap-stats/key-terms/difference-in-population-proportions "fv-autolink"): μ(p̂₁-p̂₂) = p₁ - p₂.
- The standard deviation is σ(p̂₁-p̂₂) = √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂). Add the two variances, then take the square root.
- The model is approximately normal only when all four counts are large: n₁p₁ ≥ 10, n₁(1-p₁) ≥ 10, n₂p₂ ≥ 10, n₂(1-p₂) ≥ 10.
- For proportions you check the large-counts (success-failure) condition, not the Central Limit Theorem. CLT applies to means.
- The two samples must come from two independent populations.
- When [sampling without replacement](/ap-stats/key-terms/sampling-without-replacement "fv-autolink"), the true standard deviation is slightly smaller, but the difference is negligible if each [sample](/ap-stats/unit-3/intro-planning-study/study-guide/YR5NI5ejwMAQ2dglm67s "fv-autolink") is less than 10% of its population.
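To get a feel for how small that adjustment is, here is a minimal Python sketch. It assumes the usual finite population correction, which multiplies a with-replacement standard deviation by √((N - n)/(N - 1)) for a sample of n from a population of N. That factor is not on the AP formula sheet, and `correction_factor` is a hypothetical helper name. For p̂₁ - p̂₂, each group's variance would be scaled by its own factor.

```python
from math import sqrt

def correction_factor(n: int, N: int) -> float:
    """Hypothetical helper: multiply the with-replacement SD by this
    when sampling n individuals from a population of N without replacement."""
    return sqrt((N - n) / (N - 1))

# Assumed population of 10,000; sampling fractions from 1% to 50%
N = 10_000
for n in (100, 500, 1_000, 5_000):
    print(f"n/N = {n / N:.0%}: SD shrinks by a factor of {correction_factor(n, N):.3f}")
```

At a 10% sampling fraction the factor is roughly 0.95, so the usual formula overstates the spread by only about 5%. That is why the 10% guideline lets you ignore the adjustment.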

## How the Distribution Works

The phrase "variances add" is the key to the distribution of any difference of independent random variables. Even though you subtract the two sample proportions, you add their variances and then take a single square root to get the standard deviation.

**Center (mean):**

μ(p̂₁-p̂₂) = p₁ - p₂

**Spread (standard deviation):**

σ(p̂₁-p̂₂) = √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂)

Each piece p(1-p)/n is the [variance](/ap-stats/key-terms/variance "fv-autolink") of one group's sample proportion. You add those two variances, then square root the total. Some people call this the "Pythagorean Theorem of [statistics](/ap-stats/key-terms/statistic "fv-autolink")" because you combine two squared pieces under a single square root.
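The center and spread formulas translate directly into code. This is a minimal Python sketch using only the standard library. The function names are hypothetical, and the inputs are the practice-problem values used later on this page (p₁ = 0.6, p₂ = 0.7, 1000 people per sample).

```python
from math import sqrt

def diff_prop_mean(p1: float, p2: float) -> float:
    """Center of the sampling distribution of p̂1 - p̂2."""
    return p1 - p2

def diff_prop_sd(p1: float, n1: int, p2: float, n2: int) -> float:
    """SD of p̂1 - p̂2: add the two variances p(1-p)/n, then take ONE square root."""
    var1 = p1 * (1 - p1) / n1  # variance of p̂1
    var2 = p2 * (1 - p2) / n2  # variance of p̂2
    return sqrt(var1 + var2)

# Rounded to hide floating-point noise (0.6 - 0.7 is not exactly -0.1 in binary)
print(round(diff_prop_mean(0.6, 0.7), 4))             # -0.1
print(round(diff_prop_sd(0.6, 1000, 0.7, 1000), 4))   # 0.0212
```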

**Shape:** The distribution of p̂₁ - p̂₂ is approximately normal when all four of these are met:

- n₁p₁ ≥ 10
- n₁(1 - p₁) ≥ 10
- n₂p₂ ≥ 10
- n₂(1 - p₂) ≥ 10

If any [expected count](/ap-stats/key-terms/expected-count "fv-autolink") falls below 10, the distribution can be [skewed](/ap-stats/unit-9/confidence-intervals-for-slope-regression-model/study-guide/YsvXWrndemJrI2kBF3Wn "fv-autolink") and the normal model may not be safe.
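A quick way to run all four checks at once is sketched below in Python. The helper name and the `threshold` parameter are illustrative assumptions, and the second call uses made-up values chosen so that one count falls short.

```python
def large_counts_failures(p1, n1, p2, n2, threshold=10):
    """Hypothetical helper: return labels of any expected counts below the
    threshold. An empty list means all four large-counts checks pass."""
    counts = {
        "n1*p1": n1 * p1,              # expected successes, group 1
        "n1*(1-p1)": n1 * (1 - p1),    # expected failures, group 1
        "n2*p2": n2 * p2,              # expected successes, group 2
        "n2*(1-p2)": n2 * (1 - p2),    # expected failures, group 2
    }
    return [label for label, value in counts.items() if value < threshold]

print(large_counts_failures(0.6, 1000, 0.7, 1000))  # []
print(large_counts_failures(0.04, 150, 0.10, 150))  # ['n1*p1']  (150 * 0.04 = 6)
```

Returning the failing labels, rather than just True or False, mirrors what an exam answer needs: you name which count fails, not only that the condition fails.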

![Difference in Sample Proportions Formulas](https://storage.googleapis.com/static.prod.fiveable.me/ap-images/ap%20stats/Unit%205/Difference%20in%20Sample%20Proportions%20Formulas.png)

Source: [AP Statistics Formula Sheet](https://apcentral.collegeboard.org/pdf/statistics-formula-sheet-and-tables-2020.pdf)

![Notation and Formulas for Probability Distributions](https://storage.googleapis.com/static.prod.fiveable.me/ap-images/ap%20stats/Unit%205/Notation%20and%20Formulas%20for%20Probability%20Distributions.png)

This notation table is worth saving for quick reference.

## How to Use This on the AP Statistics Exam

### Problem Solving

1. Identify the two population proportions and [sample sizes](/ap-stats/key-terms/sample-size "fv-autolink").
2. Find the center: p₁ - p₂.
3. Find the spread by adding the two variances p(1-p)/n, then square rooting.
4. Check all four large-counts conditions to confirm approximate [normality](/ap-stats/key-terms/normality "fv-autolink").
5. If a probability is asked, [standardize](/ap-stats/key-terms/standardize "fv-autolink") the observed difference with a [z-score](/ap-stats/key-terms/z-score "fv-autolink") and use the normal model.
6. Interpret your answer in context, with units and a clear reference to both populations.
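The steps above can be strung together into one small Python function. This is a sketch under stated assumptions: `prob_diff_at_least` is a hypothetical name, it uses the standard library's `statistics.NormalDist` (Python 3.8+) for the normal model, and the observed difference of -0.05 is an invented value used only to show the mechanics with the practice-problem parameters.

```python
from math import sqrt
from statistics import NormalDist

def prob_diff_at_least(observed, p1, n1, p2, n2):
    """Hypothetical helper: P(p̂1 - p̂2 >= observed) under the normal model."""
    # Step 4: refuse to use the normal model if any expected count is below 10
    counts = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    if min(counts) < 10:
        raise ValueError("Large-counts condition fails; the normal model may not apply.")
    mean = p1 - p2                                        # step 2: center
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # step 3: add variances, one square root
    z = (observed - mean) / sd                            # step 5: standardize
    return 1 - NormalDist().cdf(z)                        # upper-tail area

# City A minus City B, true p1 = 0.6 and p2 = 0.7, n = 1000 each (assumed example)
p = prob_diff_at_least(-0.05, 0.6, 1000, 0.7, 1000)
print(f"P(p̂A - p̂B >= -0.05) ≈ {p:.4f}")
```

Here z is about 2.36, so a difference of -0.05 or larger would happen less than 1% of the time if the true proportions really were 0.6 and 0.7. Step 6 still has to be done in words.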

### Common Trap

A frequent mistake is taking the square root of each group separately and then adding the standard deviations. That is wrong. You must add the variances first, then take one square root of the total.
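To see how big the error is, this short Python comparison computes both versions, using the practice-problem values as an assumed example.

```python
from math import sqrt

p1, n1, p2, n2 = 0.6, 1000, 0.7, 1000
sd1 = sqrt(p1 * (1 - p1) / n1)   # SD of p̂1
sd2 = sqrt(p2 * (1 - p2) / n2)   # SD of p̂2

wrong = sd1 + sd2                # trap: adds standard deviations
right = sqrt(sd1**2 + sd2**2)    # correct: adds variances, then one square root

print(f"wrong: {wrong:.4f}, right: {right:.4f}")
```

Adding the standard deviations gives about 0.030, roughly 40% larger than the correct 0.0212, which would make z-scores too small and probabilities too large.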

## Practice Problem

Suppose you are comparing the proportion of people in two cities who support a new public transportation system. You use simple [random samples](/ap-stats/key-terms/random-sample "fv-autolink") of 1000 people from each city. You find that 600 of the 1000 respondents from City A support the system, and 700 of the 1000 respondents from City B support the system.

**a)** Calculate the sample proportions of respondents who support the new system in each city.

**b)** Explain what the [sampling distribution for the difference in sample proportions](/ap-stats/key-terms/sampling-distribution-for-the-difference-in-sample-proportions "fv-autolink") represents and why it is useful here.

**c)** Suppose the true population proportion in City A is 0.6 and in City B is 0.7. Describe the shape, center, and spread of the sampling distribution for the difference in sample proportions.

**d)** Explain why the difference in sample proportions can be modeled as approximately normal in this situation.

**e)** Discuss one potential source of bias that could affect the results, and explain how it could influence the estimate. (Hint: think about how this differs when working with two samples instead of one.)

### Answer

**a)** City A: 600/1000 = 0.6. City B: 700/1000 = 0.7.

**b)** The sampling distribution for the difference in sample proportions represents the distribution of all possible values of p̂₁ - p̂₂ if you repeatedly took independent random samples of 1000 people from each city. It is useful because it shows how much the difference would vary by chance alone, which lets you make inferences about the difference between the two population proportions from the sample data.

**c)** If you define the difference as City A minus City B, the center is 0.6 - 0.7 = -0.1. If you define it as City B minus City A, the center is 0.7 - 0.6 = 0.1. The spread is the same either way: √(0.6(0.4)/1000 + 0.7(0.3)/1000) ≈ √(0.00024 + 0.00021) ≈ 0.0212. The shape is approximately normal because all four large-counts conditions are met.

**d)** All four expected counts are large: 1000(0.6) = 600, 1000(0.4) = 400, 1000(0.7) = 700, and 1000(0.3) = 300, all well above 10. With large counts satisfied for both groups, the distribution of p̂₁ - p̂₂ is approximately normal.

**e)** Nonresponse bias is one possibility. If supporters in City A are more likely to respond, that sample could overestimate support there. If people in City B who oppose the system are more likely to respond, that sample could underestimate support there. With two samples, bias in either group can distort the estimated difference, so you have to watch the response [patterns](/ap-stats/unit-2/introducing-statistics-are-variables-related/study-guide/Mh7Se81sjpqSYhL2ihl1 "fv-autolink") in both cities, not just one.
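You can also check the answers to (b) and (c) by simulation, since repeated sampling is exactly what the sampling distribution describes. The Python sketch below is illustrative, not part of the original solution: `simulate_diffs` is a hypothetical helper, each respondent is treated as an independent draw with the stated true proportion (an assumption that is reasonable if 1000 is less than 10% of each city's population), and the seed and number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, stdev

def simulate_diffs(p1, n1, p2, n2, reps=2_000, seed=1):
    """Hypothetical helper: simulate many pairs of samples and record p̂1 - p̂2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Each respondent supports the system with probability p (independent draws)
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

diffs = simulate_diffs(0.6, 1000, 0.7, 1000)   # City A minus City B
print(f"simulated mean ≈ {mean(diffs):.4f}   (formula: -0.1)")
print(f"simulated SD   ≈ {stdev(diffs):.4f}   (formula: 0.0212)")
```

With 2000 repetitions the simulated mean and SD should land close to -0.1 and 0.0212; the exact digits depend on the seed.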

## Common Misconceptions

- **Adding standard deviations instead of variances.** Always add the two variances p(1-p)/n first, then square root once. Standard deviations do not add directly.
- **Using the Central Limit Theorem for proportions.** For proportions, you confirm normality with the large-counts (success-failure) checks. CLT is the justification you use for sample means.
- **Checking only one group's counts.** All four conditions must pass: both successes and failures, for both groups.
- **Forgetting the independence requirement.** The two samples must come from two independent populations for these formulas to apply.
- **Ignoring the without-replacement adjustment.** If a sample is 10% or more of its population, the true standard deviation is smaller than the formula gives. Below 10%, you can ignore the difference.
- **Treating the sign of the difference as fixed.** p̂₁ - p̂₂ and p̂₂ - p̂₁ have the same spread but opposite-signed centers. Be consistent about which group you label first.

## Related AP Statistics Guides

- [Unit 5 Overview: Sampling Distributions](/ap-stats/unit-5/review/study-guide/DTw89sv8RD3Eq3WC58AB)
- [5.1 Introducing Statistics: Why Is My Sample Not Like Yours?](/ap-stats/unit-5/why-is-my-sample-not-like-yours/study-guide/Mrybsi6gfieJDqF2LNju)
- [5.5 Sampling Distributions for Sample Proportions](/ap-stats/unit-5/sampling-distributions-for-sample-proportions/study-guide/Ezxev8MPpv3mFKjV4Gq3)
- [5.2 The Normal Distribution, Revisited](/ap-stats/unit-5/normal-distribution-revisited/study-guide/dx4vMcx3WjSw68f1Ov66)
- [5.3 The Central Limit Theorem](/ap-stats/unit-5/central-limit-theorem/study-guide/DPmpebCrsJBYfpSgOKn3)
- [5.4 Biased and Unbiased Point Estimates](/ap-stats/unit-5/biased-unbiased-point-estimates/study-guide/eZ5sR9XOkLB1o9KKpMHF)

## Vocabulary

- **approximately normal**: A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
- **categorical variable**: A variable that takes on values that are category names or group labels rather than numerical values.
- **difference in proportions**: The difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations.
- **difference in sample proportions**: The difference between two sample proportions (p̂₁ - p̂₂) used to compare proportions from two different samples.
- **independent populations**: Two populations from which samples are drawn such that the selection from one population does not affect the selection from the other.
- **mean of the sampling distribution**: The expected value of a sample statistic; for sample proportions, μ(p̂) = p.
- **normality conditions**: The requirements that must be met for a sampling distribution to be approximately normal, such as n₁p₁ ≥ 10, n₁(1-p₁) ≥ 10, n₂p₂ ≥ 10, and n₂(1-p₂) ≥ 10.
- **parameter**: A numerical summary that describes a characteristic of an entire population.
- **population proportion**: The true proportion or percentage of a characteristic in an entire population, typically denoted as p.
- **probability**: The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
- **sample proportion**: The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).
- **sample size**: The number of observations or data points collected in a sample, denoted as n.
- **sampling distribution**: The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
- **sampling with replacement**: A sampling method in which an item selected from a population can be selected again in subsequent draws.
- **sampling without replacement**: A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
- **standard deviation of the sampling distribution**: The measure of variability in a sampling distribution; for sample proportions, σ(p̂) = √(p(1-p)/n).

## FAQs

### What is the sampling distribution for a difference in sample proportions?

It is the distribution of possible values of p-hat 1 minus p-hat 2 from repeated samples from two independent populations. It shows how the difference between two sample proportions varies from sample to sample.

### What is the mean of p-hat 1 minus p-hat 2?

The mean of the sampling distribution is p1 - p2, the difference between the two population proportions. The sign depends on which group you label first.

### What is the standard deviation formula for a difference in sample proportions?

The standard deviation is the square root of p1(1 - p1)/n1 plus p2(1 - p2)/n2. You add the variances first, then take one square root.

### When is p-hat 1 minus p-hat 2 approximately normal?

The distribution is approximately normal when all four large-counts checks pass: n1p1, n1(1 - p1), n2p2, and n2(1 - p2) are each at least 10.

### Why do variances add when subtracting sample proportions?

Independent random quantities combine by adding variances. So even though the statistic is a difference, the spread uses the sum of the two variances before taking the square root.

### How is AP Stats 5.6 tested?

AP Stats 5.6 can ask you to find the center and spread, check the large-counts condition, use a normal model to calculate probability, or interpret p-hat 1 minus p-hat 2 in context.

## Structured Data

```json
{"@context":"https://schema.org","@type":"FAQPage","inLanguage":"en","mainEntity":[{"@type":"Question","@id":"https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW#what-is-the-sampling-distribution-for-a-difference-in-sample-proportions","name":"What is the sampling distribution for a difference in sample proportions?","acceptedAnswer":{"@type":"Answer","text":"It is the distribution of possible values of p-hat 1 minus p-hat 2 from repeated samples from two independent populations. It shows how the difference between two sample proportions varies from sample to sample."}},{"@type":"Question","@id":"https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW#what-is-the-mean-of-p-hat-1-minus-p-hat-2","name":"What is the mean of p-hat 1 minus p-hat 2?","acceptedAnswer":{"@type":"Answer","text":"The mean of the sampling distribution is p1 - p2, the difference between the two population proportions. The sign depends on which group you label first."}},{"@type":"Question","@id":"https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW#what-is-the-standard-deviation-formula-for-a-difference-in-sample-proportions","name":"What is the standard deviation formula for a difference in sample proportions?","acceptedAnswer":{"@type":"Answer","text":"The standard deviation is the square root of p1(1 - p1)/n1 plus p2(1 - p2)/n2. You add the variances first, then take one square root."}},{"@type":"Question","@id":"https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW#when-is-p-hat-1-minus-p-hat-2-approximately-normal","name":"When is p-hat 1 minus p-hat 2 approximately normal?","acceptedAnswer":{"@type":"Answer","text":"The distribution is approximately normal when all four large-counts checks pass: n1p1, n1(1 - p1), n2p2, and n2(1 - p2) are each at least 10."}},{"@type":"Question","@id":"https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW#why-do-variances-add-when-subtracting-sample-proportions","name":"Why do variances add when subtracting sample proportions?","acceptedAnswer":{"@type":"Answer","text":"Independent random quantities combine by adding variances. So even though the statistic is a difference, the spread uses the sum of the two variances before taking the square root."}},{"@type":"Question","@id":"https://fiveable.me/ap-stats/unit-5/sampling-distributions-for-differences-sample-proportions/study-guide/VOvA8du6YHMjhEwB7lEW#how-is-ap-stats-56-tested","name":"How is AP Stats 5.6 tested?","acceptedAnswer":{"@type":"Answer","text":"AP Stats 5.6 can ask you to find the center and spread, check the large-counts condition, use a normal model to calculate probability, or interpret p-hat 1 minus p-hat 2 in context."}}]}
```
