Two-sample t-test

A two-sample t-test is the AP Statistics significance test for comparing the means of two independent groups on a quantitative variable. It tests H₀: μ₁ = μ₂ using the statistic t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂), with degrees of freedom found by technology.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is Two-sample t-test?

A two-sample t-test answers one question. Are the means of two populations actually different, or could the gap between two sample means just be random chance? You use it when you have one quantitative variable measured on two independent groups, either from two separate random samples or from the two arms of a randomized experiment (CED 7.8.A).

The null hypothesis says there's no difference, written H₀: μ₁ − μ₂ = 0 or H₀: μ₁ = μ₂. The alternative is one-sided (μ₁ > μ₂ or μ₁ < μ₂) or two-sided (μ₁ ≠ μ₂), depending on the research question (CED 7.8.B).

Before computing anything, verify the conditions (CED 7.8.C). The data must come from random samples or random assignment. If sampling without replacement, each sample must be at most 10% of its population. The sampling distribution of x̄₁ − x̄₂ must be approximately normal, which means that for each sample, either its size is over 30 or its data show no strong skew or outliers.

If H₀ is true, the test statistic t = ((x̄₁ − x̄₂) − 0)/√(s₁²/n₁ + s₂²/n₂) approximately follows a t-distribution. Let your calculator find the degrees of freedom. The exact df formula is messy, but its value always lands between the smaller of n₁ − 1 and n₂ − 1 and the total n₁ + n₂ − 2 (CED 7.9.A).
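The statistic and the df that technology reports can be sketched in a few lines of Python. This is a minimal sketch that assumes summary statistics as input; `two_sample_t` is a hypothetical helper name, and the df line uses the Welch-Satterthwaite approximation, which is what calculators typically report for an unpooled two-sample test (treat that as an assumption about your particular device).

```python
import math


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, null_diff=0.0):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df from summary stats."""
    v1 = s1 ** 2 / n1  # estimated variance of xbar1
    v2 = s2 ** 2 / n2  # estimated variance of xbar2
    se = math.sqrt(v1 + v2)  # standard error of xbar1 - xbar2
    t = ((xbar1 - xbar2) - null_diff) / se
    # Welch-Satterthwaite approximation: the "messy" df formula
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary data from the multiple-choice example on this page
t, df = two_sample_t(72, 8, 45, 76, 10, 40)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = -2.020, df = 74.6
# df lands between the smaller of n1 - 1 and n2 - 1 and n1 + n2 - 2
print(min(45, 40) - 1 <= df <= 45 + 40 - 2)  # True
```

The df here is not a whole number, which is normal for this approximation and is why a hand calculation usually falls back on a bound instead.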

Why Two-sample t-test matters in AP Statistics

This test lives in Unit 7 (Inference for Quantitative Data: Means), specifically Topics 7.8 and 7.9, covering learning objectives 7.8.A, 7.8.B, 7.8.C, 7.9.A, 7.9.B, and 7.9.C. It's the capstone procedure of the unit because comparing two groups is what real studies actually do. Almost nobody researches one mean in isolation. It also stars in the skills-focus topics (7.10 and 8.7), where the exam hands you a scenario and asks you to pick the right inference procedure. The decision tree there hinges on two questions you should burn into memory. Is the variable quantitative or categorical? Is there one group, two independent groups, or paired data? Quantitative plus two independent groups means two-sample t-test, full stop.
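The two-question decision tree above is small enough to write down as a lookup. This is an illustrative sketch only: `choose_procedure` and its string labels are hypothetical, and the categorical branch just points toward the chi-square family discussed under Topic 8.7 rather than listing every categorical procedure.

```python
def choose_procedure(response_type: str, structure: str) -> str:
    """Sketch of the decision tree: check the variable type first, then the group structure."""
    if response_type == "categorical":
        # Categorical response: not a t-procedure (e.g., chi-square test for homogeneity)
        return "categorical procedure (e.g., chi-square test for homogeneity)"
    if response_type != "quantitative":
        raise ValueError("response_type must be 'quantitative' or 'categorical'")
    procedures = {
        "one group": "one-sample t-test",
        "two independent groups": "two-sample t-test",
        "paired": "matched pairs t-test (one-sample t-test on the differences)",
    }
    if structure not in procedures:
        raise ValueError(f"unknown structure: {structure!r}")
    return procedures[structure]


print(choose_procedure("quantitative", "two independent groups"))  # two-sample t-test
```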

How Two-sample t-test connects across the course

Matched Pairs t-Test (Unit 7)

This is the closest relative and the biggest trap. If the two sets of measurements are linked (same person before and after, twins, the same plot of land), the groups aren't independent, so you take differences and run a one-sample t-test instead. Random assignment to two separate groups means two-sample; paired-up data means matched pairs.

Confidence Interval for a Difference of Means (Unit 7)

The test and the interval are two views of the same inference. A two-sided test at α = 0.05 rejects H₀ exactly when the 95% confidence interval for μ₁ − μ₂ misses zero. The exam loves swapping between them, like asking what a two-sided p-value of 0.047 says about a 99% confidence interval. It would contain zero, because 0.047 > 0.01 means the test does not reject H₀ at α = 0.01.
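The equivalence follows from a short chain of steps. This assumes the interval and the test use the same standard error and the same df (as an unpooled interval and test from the same technology typically do), with t* the critical value for confidence level C = 1 − α, so that P(|T| > t*) = α:

```latex
\[
\begin{aligned}
0 \notin (\bar{x}_1 - \bar{x}_2) \pm t^{*}\,SE
  &\iff |\bar{x}_1 - \bar{x}_2| > t^{*}\,SE \\
  &\iff |t| = \frac{|\bar{x}_1 - \bar{x}_2|}{SE} > t^{*} \\
  &\iff \text{two-sided } p\text{-value} < \alpha,
\end{aligned}
\qquad SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\]
```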

Chi-Square Tests (Unit 8)

Topic 8.7 forces you to choose between procedures, and the dividing line is the variable type. Comparing mean scores between two groups is a two-sample t-test. Comparing the distribution of a categorical variable across groups is a chi-square test for homogeneity. Read the response variable first, then pick.

Randomized Experiments and Control Groups (Unit 3)

Unit 3 design choices decide whether a two-sample t-test is even legal. Random assignment to two treatments satisfies the independence condition and, as a bonus, lets you conclude cause and effect, something a survey-based two-sample test can never do.

Is Two-sample t-test on the AP Statistics exam?

Multiple choice hits this term three ways. You'll identify the correct hypotheses and test statistic from summary data (like n₁ = 45, x̄₁ = 72, s₁ = 8 versus n₂ = 40, x̄₂ = 76, s₂ = 10), interpret a p-value correctly, and catch bad conclusions. A classic stem gives a p-value of 0.078 and a researcher claiming 'strong evidence' of a difference, and you have to recognize that at α = 0.05, 0.078 > 0.05 means you fail to reject H₀.
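For the summary data in that stem, treating the n₁ = 45 sample as group 1, the statistic works out by hand as follows:

```latex
\[
t = \frac{(72 - 76) - 0}{\sqrt{\dfrac{8^2}{45} + \dfrac{10^2}{40}}}
  = \frac{-4}{\sqrt{1.422 + 2.500}}
  = \frac{-4}{1.980}
  \approx -2.02
\]
```

The negative sign only reflects x̄₁ < x̄₂. Technology would then turn this value into a p-value using a df somewhere between 39 and 83.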

On the free response, this is a four-step inference question. The 2018 FRQ Q4 compared mean ACL surgery recovery times between two groups, and the 2025 FRQ Q6 used a randomized experiment on reading comprehension by time of day. To earn full credit you state hypotheses in context with defined parameters, name the procedure and check all conditions, report the t-statistic, df, and p-value, and write a conclusion that compares p to α and answers the research question in context (CED 7.9.C). Interpreting the p-value means starting with 'assuming the true population means are equal...' (CED 7.9.B). Skipping conditions or giving a context-free conclusion is the most common way points evaporate.

Two-sample t-test vs Matched pairs t-test

Both compare two sets of quantitative measurements, but the structure of the data decides which one you run. A two-sample t-test needs two independent groups, like 30 employees randomly assigned to Program A and 32 different employees assigned to Program B. A matched pairs design links each value in one set to a value in the other (before/after on the same person, twins split between treatments), so you subtract to get one list of differences and run a one-sample t-test on those differences. Here's a quick check that works on almost every problem: if the two sample sizes could be different, it's two-sample; if every observation has a natural partner, it's paired.
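To see why the structure matters, here is a small sketch that runs both calculations on the same numbers. The before/after scores are made-up, hypothetical data for six people, chosen only for illustration, and only the t statistics are computed (no p-values).

```python
import math
from statistics import mean, stdev

# Hypothetical before/after scores for the same 6 people (illustrative only)
before = [70, 65, 80, 75, 60, 72]
after = [74, 68, 83, 79, 62, 77]

# Matched pairs (correct here): one list of differences, then a one-sample t statistic
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Two-sample (wrong here): treats the two lists as independent groups
se_unpooled = math.sqrt(stdev(after) ** 2 / len(after) + stdev(before) ** 2 / len(before))
t_two_sample = (mean(after) - mean(before)) / se_unpooled

print(f"paired t = {t_paired:.2f}, two-sample t = {t_two_sample:.2f}")
```

Both statistics share the same numerator (the mean difference equals the difference in means), but on these made-up numbers the paired t comes out above 8 while the two-sample t is below 1. Pairing removes the person-to-person variability that the two-sample standard error has to absorb.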

Key things to remember about Two-sample t-test

  • Use a two-sample t-test when you compare the means of one quantitative variable across two independent groups from random samples or a randomized experiment.

  • The null hypothesis is always H₀: μ₁ = μ₂ (no difference), and the alternative can be one-sided or two-sided depending on the research question.

  • Before testing, check three conditions: randomness, the 10% condition for each sample if sampling without replacement, and approximate normality (for each sample, n > 30 or no strong skew or outliers in that sample's data).

  • The test statistic is t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂), and you should let technology find the degrees of freedom, which fall between the smaller of n₁ − 1 and n₂ − 1 and n₁ + n₂ − 2.

  • Interpret the p-value as the probability of getting a difference in sample means at least as extreme as the one observed, assuming the population means are actually equal.

  • If the p-value ≤ α, reject H₀ and say there is convincing evidence of a difference; if the p-value > α, fail to reject H₀, but never say you 'accept' it or that the means are 'proven equal.'
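The p-value interpretation and decision rule in the list above can be sketched end to end with only Python's standard library. Because `math` has no t-distribution CDF, this sketch assumes a Simpson's-rule integral of the t density is accurate enough for illustration; in practice you'd use a calculator or a statistics library. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution; df may be non-integer."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def two_sided_p(t, df):
    """P(|T| >= |t|) assuming H0 is true, i.e., the population means are equal."""
    return 2 * (1 - t_cdf(abs(t), df))


def decide(p, alpha=0.05):
    """Compare p to alpha; never 'accept' H0."""
    if p <= alpha:
        return "reject H0: convincing evidence the population means differ"
    return "fail to reject H0: not convincing evidence of a difference (not proof of equality)"


# t and df from the summary-data example (n1 = 45 vs n2 = 40)
p = two_sided_p(-2.0197, 74.6)
print(round(p, 4), decide(p))  # p is just under 0.05 here
```

Note that `decide(0.078)` lands in the "fail to reject" branch, matching the multiple-choice trap described earlier.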

Frequently asked questions about Two-sample t-test

What is a two-sample t-test in AP Stats?

It's the significance test for whether two population means differ, used when you have a quantitative variable measured on two independent groups. It tests H₀: μ₁ = μ₂ with the statistic t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂), covered in Topics 7.8 and 7.9.

Does a p-value above 0.05 prove the two means are equal?

No. At α = 0.05, a p-value like 0.078 means you fail to reject H₀, which is a statement about lacking convincing evidence, not proof of equality. Saying the test 'shows the means are the same' is a classic point-losing error on the AP exam.

How is a two-sample t-test different from a matched pairs t-test?

Two-sample requires two independent groups, like separate people randomly assigned to two treatments. Matched pairs handles linked data, like before-and-after scores for the same person, where you test the mean of the differences with a one-sample t-test.

How do I find the degrees of freedom for a two-sample t-test?

Use technology (your calculator's 2-SampTTest reports it automatically). The CED says the df falls between the smaller of n₁ − 1 and n₂ − 1 and the total n₁ + n₂ − 2, so a hand answer using the smaller of n₁ − 1 and n₂ − 1 is a safe, conservative choice. Fewer df means heavier tails and a slightly larger p-value, so you won't reject H₀ in a case where technology would not.

Has a two-sample t-test appeared on AP Stats FRQs?

Yes. The 2018 FRQ Q4 compared mean ACL surgery recovery times between two groups, and the 2025 FRQ Q6 involved a randomized experiment on time of day and reading comprehension. Both reward the full four-step process: hypotheses, conditions, calculations, and a conclusion in context.