Sample Data in AP Statistics

Sample data is the set of observations collected from a subset of a population, used to calculate statistics (like p̂ or b) that estimate unknown population parameters (like p or β). On the AP Stats exam, all inference, from confidence intervals to t-tests for slope, is built on random sample data.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is sample data?

Sample data is what you actually have in front of you. You almost never get to measure an entire population, so you collect data from a smaller group (the sample) and use it to make claims about the bigger group. The numbers you compute from sample data are called statistics (a sample proportion p̂, a sample mean x̄, a sample slope b), and they serve as your best guesses for the unknown parameters of the population (p, μ, β).

Here's the part that drives all of AP Stats inference: sample data varies. Take a different random sample and you'd get slightly different numbers. That's why the CED says a confidence interval "either contains the population proportion or it does not, because each interval is based on random sample data, which varies from sample to sample" (Topic 6.3). The whole machinery of margins of error, confidence levels, and p-values exists to quantify how much your sample data could reasonably bounce around. And none of it works unless the sample data was collected properly, meaning a random sample or randomized experiment, because randomness is what lets you generalize from the sample to the population.
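To see that variability concretely, here is a minimal Python sketch. It assumes a hypothetical population whose true proportion is p = 0.30 (a made-up value; in real problems p is unknown) and draws several random samples of size 100. The function name `sample_proportions` and all the numbers are illustrative assumptions, not anything taken from the CED.

```python
import random

def sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a population with
    true proportion p, and return the sample proportion (p-hat) of each."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    results = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        results.append(successes / n)
    return results

# Hypothetical setup: true p = 0.30, five samples of 100 each.
p_hats = sample_proportions(p=0.30, n=100, num_samples=5)
print(p_hats)
```

Each entry in the printed list comes from a different random sample, so the p̂ values differ from one another even though p never changes.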

Why sample data matters in AP Statistics

Sample data shows up explicitly in two CED learning objectives. In Topic 6.3 (Unit 6), AP Stats 6.3.A requires you to interpret a confidence interval with a reference to the sample taken, and AP Stats 6.3.C connects sample size to interval width (width is proportional to 1/√n, so bigger samples give narrower intervals). In Topic 9.4 (Unit 9), AP Stats 9.4.C requires you to verify that the data came from a random sample or randomized experiment, and that n ≤ 10% of N when sampling without replacement, before running a t-test for a slope. In other words, sample data isn't just one topic. It's the raw material that every inference procedure in Units 6 through 9 runs on, and checking how it was collected is a graded step on FRQs.
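The 1/√n relationship follows from the margin of error formula for a one-sample proportion interval. As a sketch, with z* denoting the critical value for the chosen confidence level:

```latex
\text{margin of error} = z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\qquad\Longrightarrow\qquad
\text{width} = 2z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\propto\; \frac{1}{\sqrt{n}}
```

Holding the confidence level and p̂ fixed, quadrupling the sample size cuts the width in half: going from n = 100 to n = 400 multiplies the width by √(100/400) = 1/2.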

How sample data connects across the course

Population (Units 1 & 3)

Sample data and population are two ends of the same arrow. The sample is what you measure; the population is what you want to know about. Every inference question on the exam asks you to travel from the first to the second.

Confidence Interval (Unit 6)

A confidence interval is sample data plus an honest admission of uncertainty. You take a statistic like p̂ and add a margin of error to account for sample-to-sample variability, which is exactly why intervals from different samples land in different places.
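As a rough illustration of "statistic plus margin of error," the sketch below computes a one-sample z-interval for a proportion. The function name `proportion_interval` and the counts (24 defective products in a random sample of 300) are hypothetical, chosen only to make the arithmetic concrete.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """One-sample z-interval for a proportion:
    p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    # Critical value z* for the requested confidence level (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 24 defective products in a random sample of 300.
low, high = proportion_interval(successes=24, n=300)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these made-up counts the interval runs from roughly 0.049 to 0.111. A different random sample of 300 would give a different p̂ and therefore a different interval.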

10% Condition (Units 6-9)

When you sample without replacement, the observations aren't perfectly independent. The 10% condition (n ≤ 10% of N) says that if your sample data is a small enough slice of the population, you can treat it as independent anyway. You check this for proportions in Unit 6 and again for slopes in Topic 9.4.
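One way to see why 10% is a reasonable cutoff goes a step beyond what the CED asks for. When sampling without replacement, the true standard deviation of a statistic equals the usual formula times a finite population correction, √((N − n)/(N − 1)). The sketch below shows how close that factor stays to 1 when n is at most 10% of N. The helper name `finite_population_correction` and the population size of 1,000 are illustrative assumptions.

```python
from math import sqrt

def finite_population_correction(n, N):
    """Factor that multiplies the usual SD formula when
    sampling n individuals from N without replacement."""
    return sqrt((N - n) / (N - 1))

N = 1000  # hypothetical population size
for n in (10, 50, 100, 500):
    factor = finite_population_correction(n, N)
    print(f"n = {n:>3} ({n / N:.0%} of N): correction factor = {factor:.3f}")
```

At n = 10% of N the factor is about 0.95, so the usual formula is off by only about 5%. At n = 50% of N it drops to about 0.71, and treating the observations as independent would noticeably overstate the variability.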

Bias and Sampling Method (Unit 3)

Sample data is only as good as the method that produced it. A biased sampling method (like a voluntary response survey) makes the statistics systematically wrong, and no amount of fancy inference in Units 6-9 can fix that.
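A small simulation can make the point. The sketch below builds a hypothetical population in which 30% would answer "yes," then compares a simple random sample of 100 with a much larger voluntary response sample. The response rates (60% for "yes" people, 20% for "no" people) are invented purely for illustration.

```python
import random

rng = random.Random(42)  # seeded so the run is reproducible

# Hypothetical population of 20,000 people; about 30% would answer "yes" (True).
population = [rng.random() < 0.30 for _ in range(20_000)]
true_p = sum(population) / len(population)

# Simple random sample of 100 people, drawn without replacement.
srs = rng.sample(population, 100)
srs_p_hat = sum(srs) / len(srs)

# Voluntary response: assume "yes" people respond 60% of the time, "no" people 20%.
responders = [x for x in population if rng.random() < (0.60 if x else 0.20)]
voluntary_p_hat = sum(responders) / len(responders)

print(f"true p = {true_p:.3f}")
print(f"SRS (n = {len(srs)}): p-hat = {srs_p_hat:.3f}")
print(f"voluntary response (n = {len(responders)}): p-hat = {voluntary_p_hat:.3f}")
```

Under these assumptions the voluntary response sample has thousands of respondents, yet its p̂ lands well above the true proportion, while the much smaller random sample misses only by chance variation.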

Is sample data on the AP Statistics exam?

The phrase "a random sample of..." opens a huge share of inference problems, and it's never decoration. The 2017 FRQ described "a random sample of 207 men and women," the 2022 FRQ used "a random sample of 920 teenagers," and the 2024 FRQ asked Julio to design a sampling plan to estimate a mean price. Multiple choice questions follow the same pattern, like a quality inspector taking a random sample of 200 or 300 products and building a confidence interval for the defective proportion. You need to do three things with sample data: (1) cite it when checking conditions (random sample? n ≤ 10% of N? large counts?), (2) reference it when interpreting an interval ("based on this sample of 300 products..."), and (3) reason about how changing the sample size changes the margin of error. Skipping the random-sample check or interpreting an interval without mentioning the sample are classic ways to lose points.
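The condition checks in step (1) can be sketched as a small helper for a one-sample proportion interval. The function name `check_proportion_conditions`, its parameters, and the example numbers (300 products sampled from a shipment of 5,000, with 24 defective) are all hypothetical.

```python
def check_proportion_conditions(n, successes, random_sample, population_size=None):
    """Return the three condition checks for a one-sample z-interval for a proportion."""
    failures = n - successes
    return {
        # Was the data collected by a random sample or randomized experiment?
        "random": random_sample,
        # 10% condition; cannot be verified here if the population size is unknown.
        "ten_percent": None if population_size is None else n <= 0.10 * population_size,
        # Large counts: at least 10 observed successes and 10 observed failures.
        "large_counts": successes >= 10 and failures >= 10,
    }

# Hypothetical numbers: 300 products sampled at random from a shipment of 5,000; 24 defective.
print(check_proportion_conditions(n=300, successes=24, random_sample=True, population_size=5000))
```

On an FRQ the checks still have to be written out in words and numbers (for example, "300 ≤ 10% of 5,000"). The code only mirrors the logic.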

Sample data vs. population data

Sample data comes from the subset you measured; population data would mean measuring everyone, which almost never happens. The giveaway is in the symbols. Statistics from sample data use p̂, x̄, and b; parameters from the population use p, μ, and β. If you have data on the entire population, there's nothing to infer, so confidence intervals and significance tests don't apply.

Key things to remember about sample data

  • Sample data is the subset of the population you actually measured, and the statistics you compute from it (p̂, x̄, b) estimate unknown population parameters (p, μ, β).

  • Sample data varies from sample to sample, which is the entire reason confidence intervals and significance tests exist.

  • Inference is only valid if the sample data came from a random sample or randomized experiment; this is a graded condition check on FRQs.

  • Larger samples produce narrower confidence intervals at the same confidence level because the width is proportional to 1/√n.

  • When sampling without replacement, sample data can be treated as independent only if n is at most 10% of the population size.

  • Interpreting a confidence interval should always reference the actual sample taken, not just spit out the numbers.

Frequently asked questions about sample data

What is sample data in AP Statistics?

Sample data is the set of observations collected from a subset of a population. You use it to compute statistics like a sample proportion p̂ or a sample slope b, which estimate population parameters you can't measure directly.

What's the difference between sample data and population data?

Sample data comes from the part of the population you actually measured; population data would require measuring every individual. Statistics (p̂, x̄, b) describe samples, while parameters (p, μ, β) describe populations, and inference is the bridge between them.

Does a bigger sample mean a more accurate confidence interval?

It means a more precise one, with a catch. A larger sample size shrinks the margin of error (width is proportional to 1/√n), so the interval gets narrower. But a narrow interval built from a big biased sample is still wrong; the data has to come from a random sample for the interval to mean anything.

Can you do a significance test if the data isn't from a random sample?

Not validly. To satisfy the independence condition, the CED requires that the data come from a random sample or a randomized experiment, including for the t-test for a slope in Topic 9.4. Without random selection, you can't generalize your results to the population; without random assignment, you can't conclude cause and effect.

Why do you need the 10% condition for sample data?

When you sample without replacement, each pick slightly changes the remaining population, so observations aren't perfectly independent. If n ≤ 10% of N, that effect is small enough to ignore, which is why you check this condition in both Unit 6 and Unit 9 inference.