Sample data is the set of observations collected from a subset of a population, used to calculate statistics (like p̂ or b) that estimate unknown population parameters (like p or β). On the AP Stats exam, all inference, from confidence intervals to t-tests for slope, is built on random sample data.
Sample data is what you actually have in front of you. You almost never get to measure an entire population, so you collect data from a smaller group (the sample) and use it to make claims about the bigger group. The numbers you compute from sample data are called statistics (a sample proportion p̂, a sample mean x̄, a sample slope b), and they serve as your best guesses for the unknown parameters of the population (p, μ, β).
Here's the part that drives all of AP Stats inference: sample data varies. Take a different random sample and you'd get slightly different numbers. That's why the CED says a confidence interval "either contains the population proportion or it does not, because each interval is based on random sample data, which varies from sample to sample" (Topic 6.3). The whole machinery of margins of error, confidence levels, and p-values exists to quantify how much your sample data could reasonably bounce around. And none of it works unless the sample data was collected properly, meaning a random sample or randomized experiment, because randomness is what lets you generalize from the sample to the population.
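To see that sample-to-sample variation for yourself, here is a minimal simulation sketch. It assumes a made-up population proportion of 0.30, samples of size 200, and the usual 95% critical value of 1.96; the function names are hypothetical and not from the CED. It draws many random samples, builds a one-proportion interval from each, and counts how often the interval captures the true p.

```python
import math
import random

def sample_proportion(p, n, rng):
    """Draw one random sample of size n and return the sample proportion p-hat."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

def one_prop_interval(p_hat, n, z_star=1.96):
    """Interval of the form p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(p, n, reps, seed=0):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        low, high = one_prop_interval(sample_proportion(p, n, rng), n)
        if low <= p <= high:
            hits += 1
    return hits / reps

# Assumed values for illustration only: true p = 0.30, n = 200, 2000 samples.
rate = coverage(p=0.30, n=200, reps=2000)
print(f"Share of intervals that captured p: {rate:.3f}")
```

Each individual interval either contains p or it doesn't. The confidence level describes the long-run share of intervals that do, which should land near 95% in a run like this.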
Sample data shows up explicitly in two CED learning objectives. In Topic 6.3 (Unit 6), AP Stats 6.3.A requires you to interpret a confidence interval with a reference to the sample taken, and AP Stats 6.3.C connects sample size to interval width (width is proportional to 1/√n, so bigger samples give narrower intervals). In Topic 9.4 (Unit 9), AP Stats 9.4.C requires you to verify that the data came from a random sample or randomized experiment, and that n ≤ 10% of N when sampling without replacement, before running a t-test for a slope. In other words, sample data isn't just one topic. It's the raw material that every inference procedure in Units 6 through 9 runs on, and checking how it was collected is a graded step on FRQs.
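The 1/√n relationship is easy to check with arithmetic. The sketch below assumes a fixed sample proportion of 0.5 and a 95% critical value of 1.96 (both chosen only for illustration) and computes the margin of error for three sample sizes.

```python
import math

def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error for a one-proportion interval: z* * sqrt(p-hat(1 - p-hat)/n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Each sample size is four times the previous one.
for n in (100, 400, 1600):
    print(f"n = {n:>4}: margin of error = {margin_of_error(0.5, n):.4f}")
```

Quadrupling the sample size cuts the margin of error in half, so the interval is half as wide. Doubling n is not enough to halve the width.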
Population (Units 1 & 3)
Sample data and population are two ends of the same arrow. The sample is what you measure; the population is what you want to know about. Every inference question on the exam asks you to travel from the first to the second.
Confidence Interval (Unit 6)
A confidence interval is sample data plus an honest admission of uncertainty. You take a statistic like p̂ and add and subtract a margin of error to account for sample-to-sample variability, which is exactly why intervals from different samples land in different places.
10% Condition (Units 6-9)
When you sample without replacement, the observations aren't perfectly independent. The 10% condition (n ≤ 10% of N) says that if your sample data is a small enough slice of the population, you can treat it as independent anyway. You check this for proportions in Unit 6 and again for slopes in Topic 9.4.
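Here is a small sketch of the check itself, using hypothetical numbers (a sample of 300 from a population of 5,000). The second function goes slightly beyond what the AP exam asks for: it computes the finite population correction factor, a standard result for sampling without replacement, to show how little the standard deviation changes when the sample is a small slice of the population.

```python
import math

def ten_percent_condition(n, N):
    """True when the sample size n is at most 10% of the population size N."""
    return 10 * n <= N

def finite_population_factor(n, N):
    """Factor that multiplies the usual standard deviation of a sample
    statistic when sampling without replacement (not required on the AP exam)."""
    return math.sqrt((N - n) / (N - 1))

# Hypothetical sample and population sizes.
n, N = 300, 5000
print("10% condition met:", ten_percent_condition(n, N))
print(f"Correction factor: {finite_population_factor(n, N):.3f}")
```

When n is at most 10% of N, the factor stays close to 1 (roughly 0.95 or higher), so treating the observations as independent changes the standard deviation by only a few percent.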
Bias and Sampling Method (Unit 3)
Sample data is only as good as the method that produced it. A biased sampling method (like a voluntary response survey) makes the statistics systematically wrong, and no amount of fancy inference in Units 6-9 can fix that.
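A quick simulation makes the point concrete. This is a sketch under invented assumptions: a population of 100,000 in which 30% would answer "yes", and a voluntary response setup in which "yes" people are three times as likely to respond. It compares a small simple random sample with a much larger biased one.

```python
import random

rng = random.Random(1)

# Hypothetical population: 1 = "yes", 0 = "no", with a true proportion of 0.30.
population = [1] * 30_000 + [0] * 70_000
true_p = sum(population) / len(population)

# Simple random sample of 200, drawn without replacement.
srs = rng.sample(population, 200)
p_hat_srs = sum(srs) / len(srs)

# Biased sample of 5000: "yes" individuals are weighted 3x as likely to respond.
# random.choices draws with replacement, which is fine for this illustration.
weights = [3 if x == 1 else 1 for x in population]
biased = rng.choices(population, weights=weights, k=5000)
p_hat_biased = sum(biased) / len(biased)

print(f"True proportion:         {true_p:.3f}")
print(f"Random sample, n = 200:  {p_hat_srs:.3f}")
print(f"Biased sample, n = 5000: {p_hat_biased:.3f}")
```

Under these assumptions the biased sample centers on 0.9/1.6 ≈ 0.56 instead of 0.30, and taking more responses only makes it more precisely wrong.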
The phrase "a random sample of..." opens a huge share of inference problems, and it's never decoration. The 2017 FRQ described "a random sample of 207 men and women," the 2022 FRQ used "a random sample of 920 teenagers," and the 2024 FRQ asked Julio to design a sampling plan to estimate a mean price. Multiple-choice questions follow the same pattern, like a quality inspector taking a random sample of 200 or 300 products and building a confidence interval for the defective proportion. You need to do three things with sample data: (1) cite it when checking conditions (random sample? n ≤ 10% of N? large counts?), (2) reference it when interpreting an interval ("based on this sample of 300 products..."), and (3) reason about how changing the sample size changes the margin of error. Skipping the random-sample check or interpreting an interval without mentioning the sample are classic ways to lose points.
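The first two tasks can be sketched in a few lines. The numbers here are hypothetical: 27 defective products in a random sample of 300, taken from a population of 10,000, with a 95% critical value of 1.96. The function name is made up for illustration. Note that code can't verify the random-sample condition; you read that from the problem statement.

```python
import math

def one_prop_z_interval(successes, n, population_size, z_star=1.96):
    """Check the numeric conditions and build a one-proportion z interval.
    The random-sample condition must be confirmed from the study design."""
    p_hat = successes / n
    failures = n - successes
    conditions = {
        "10% condition (n <= 10% of N)": 10 * n <= population_size,
        "large counts (successes and failures >= 10)": successes >= 10 and failures >= 10,
    }
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return conditions, (p_hat - margin, p_hat + margin)

# Hypothetical data: 27 defectives in a random sample of 300 from 10,000 products.
conditions, (low, high) = one_prop_z_interval(27, 300, 10_000)
for name, ok in conditions.items():
    print(f"{name}: {'met' if ok else 'NOT met'}")
print(
    "Based on this random sample of 300 products, we are 95% confident that "
    f"the true proportion of defective products is between {low:.3f} and {high:.3f}."
)
```

The printed sentence follows the pattern the rubric rewards: it names the sample, states the confidence level, and refers to the population parameter instead of the sample statistic.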
Sample data comes from the subset you measured; population data would mean measuring everyone, which almost never happens. The giveaway is in the symbols. Statistics from sample data use p̂, x̄, and b; parameters from the population use p, μ, and β. If you have data on the entire population, there's nothing to infer, so confidence intervals and significance tests don't apply.
Sample data is the subset of the population you actually measured, and the statistics you compute from it (p̂, x̄, b) estimate unknown population parameters (p, μ, β).
Sample data varies from sample to sample, which is the entire reason confidence intervals and significance tests exist.
Inference is only valid if the sample data came from a random sample or randomized experiment; this is a graded condition check on FRQs.
All else equal, larger samples produce narrower confidence intervals because the width is proportional to 1/√n.
When sampling without replacement, sample data can be treated as independent only if n is at most 10% of the population size.
Interpreting a confidence interval should always reference the actual sample taken, not just spit out the numbers.
Sample data is the set of observations collected from a subset of a population. You use it to compute statistics like a sample proportion p̂ or a sample slope b, which estimate population parameters you can't measure directly.
Sample data comes from the part of the population you actually measured; population data would require measuring every individual. Statistics (p̂, x̄, b) describe samples, while parameters (p, μ, β) describe populations, and inference is the bridge between them.
A larger sample mostly means a more precise confidence interval, with a catch. A larger sample size shrinks the margin of error (width is proportional to 1/√n), so the interval gets narrower. But a big biased sample is still wrong; the data has to come from a random sample for the interval to mean anything.
You can't validly run inference on data that wasn't collected randomly. The CED requires you to verify that the data came from a random sample or randomized experiment as part of checking independence, including for the t-test for a slope in Topic 9.4. Without random selection, you can't generalize your results to the population.
When you sample without replacement, each pick slightly changes the remaining population, so observations aren't perfectly independent. If n ≤ 10% of N, that effect is small enough to ignore, which is why you check this condition in both Unit 6 and Unit 9 inference.