Data can only be trusted if the method used to collect them relies on chance. When samples are chosen or treatments are assigned without randomness, bias creeps in, and your conclusions may not apply to the population you care about.

Why This Matters for the AP Statistics Exam

This topic is the foundation for the rest of AP Statistics Unit 3 and a big part of why later inference works at all. Once you reach confidence intervals and hypothesis tests in Units 6 through 9, every conclusion depends on whether the data were collected in a trustworthy way. On the exam, you are often asked to decide whether results can be generalized to a population or whether a cause-and-effect claim is valid, and that decision traces straight back to how the data were gathered.

You will use this thinking on both multiple-choice questions and free-response questions. A common task is reading a study description, identifying how the data were collected, and explaining whether the conclusion holds up. Getting comfortable with this idea early makes the rest of the unit much easier.

Key Takeaways

Data collection methods that do not rely on chance lead to untrustworthy conclusions.
Random selection helps a sample represent the population, which is what lets you generalize results.
Bias means certain responses are systematically favored over others, which skews conclusions.
A non-random method (like convenience or voluntary response) builds in bias before you even analyze the data.
The main question to ask about any study: were the data collected in a way that supports the conclusion being made?
Misleading graphs and selectively chosen displays can also distort what the data show, so check how data are presented, not just how they were collected.

How Data Can Tell the Truth or Mislead

The central idea is simple to state: methods for data collection that do not rely on chance result in untrustworthy conclusions. If a study does not use chance to choose who is in the sample, the sample is likely to be biased and not representative of the population. When that happens, the results may not be generalizable, so any conclusion about the larger population is shaky.

Chance matters in two different places, and the exam wants you to keep them straight:

Random selection is about choosing who is in your sample. It is what lets you generalize results from a sample back to the population.
Random assignment is about how treatments are handed out in an experiment. It is what lets you argue a treatment caused a difference.

You will go deeper on both later in this unit. For now, the takeaway is that without chance somewhere in the process, you cannot trust what the data seem to say.
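For readers who like to see the mechanics, here is a minimal Python sketch of the two uses of chance described above, using the standard library's random module. It is an illustration, not an AP-required procedure; the population of student IDs, the sample size of 20, and the function names random_selection and random_assignment are all hypothetical.

```python
import random

def random_selection(population, n, rng):
    """Random selection: every member has the same chance of being chosen.
    This is what supports generalizing from the sample to the population."""
    return rng.sample(population, n)

def random_assignment(subjects, rng):
    """Random assignment: shuffle the subjects, then split them in half.
    Chance, not the researcher, decides who gets the treatment."""
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical population of 1,000 student IDs (illustration only).
population = [f"student_{i}" for i in range(1000)]
rng = random.Random(2024)  # fixed seed so the run is reproducible

sample = random_selection(population, 20, rng)
groups = random_assignment(sample, rng)
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

The fixed seed only makes the example repeatable. In a real study the point is that a chance process, not anyone's judgment, decides who is chosen and who gets the treatment.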

How Displays Can Distort Data

Even accurate data can mislead if the display is manipulated. Stretching or compressing an axis scale, starting the vertical axis well above zero so small differences look huge, or reporting raw counts when percentages would give a fairer comparison can all push an audience toward one conclusion. Being able to spot these tricks helps you catch weak arguments and judge whether a claim is actually supported.

Source: Maarten Grootendorst
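The truncated-axis and counts-versus-percentages tricks come down to simple arithmetic. The short Python sketch below uses made-up numbers (50% versus 55% approval, and two hypothetical schools) to show how much a display choice can change the impression; the function names are illustrative only.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """How many times taller bar b looks than bar a when the
    vertical axis starts at axis_start instead of 0."""
    return (b - axis_start) / (a - axis_start)

def percent(count, total):
    """Convert a raw count into a percentage of its group."""
    return 100 * count / total

# Hypothetical approval ratings: 50% vs 55%.
print(apparent_ratio(50, 55))      # 1.1 -> honest axis: bar B is 10% taller
print(apparent_ratio(50, 55, 48))  # 3.5 -> axis starts at 48: bar B looks 3.5 times as tall

# Hypothetical counts: 120 of 400 students at School A, 90 of 150 at School B.
# Raw counts make School A look higher; percentages reverse the picture.
print(percent(120, 400), percent(90, 150))  # 30.0 60.0
```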

How to Use This on the AP Statistics Exam

MCQ

Expect short study descriptions where you decide whether a conclusion is justified. Ask yourself: was chance used to select the sample? If not, the sample may be biased, and you should be cautious about generalizing. Watch for words that signal non-random methods, like "volunteers," "people who chose to respond," or "shoppers who happened to be there."

Free Response

When a prompt asks whether it is appropriate to generalize results, give a clear yes or no and then back it up with specific evidence. Do not just say "the sample is biased." Explain why, and when possible, say whether the sample result is likely too high or too low compared to the true population value. Tie your reasoning to the exact context of the question instead of using generic statements.

Common Trap

Mixing up random selection and random assignment is one of the most common precision errors. Use random selection language when you talk about generalizing to a population. Use random assignment language when you talk about cause and effect in experiments. Saying the wrong one can cost you even when your overall idea is right.

Comparing Trustworthy and Untrustworthy Collection

These examples are illustrations of the concept, not required AP terms you must memorize for this topic. The point is to notice the pattern: chance-based methods support trustworthy conclusions, while self-selected or convenience-based methods do not.

Examples that tend to produce untrustworthy results:

A poll that only surveys subscribers of one news outlet, so the sample is not representative of all voters.
A product survey that only includes people who already bought the product, so it misses everyone else's opinion.
A study that only includes participants who volunteered to try a new treatment, so willing participants may differ from the general population.

Examples that tend to produce more trustworthy results:

A national poll that uses random selection to choose a representative sample of voters.
A medical study that uses random assignment to split participants into treatment and control groups, so differences in outcomes can be attributed to the treatment rather than to pre-existing differences between the groups.
A survey that uses a chance-based method to choose a sample that reflects the population's makeup.

Notice the difference: when chance is built into how data are collected, conclusions are more reliable. When it is not, bias is likely and conclusions may not generalize.
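To see the news-outlet example play out, here is a small Python simulation. Every number in it is an assumption made up for illustration: 10,000 voters, 2,000 of whom subscribe to one outlet and back Candidate X at 80%, while the other 8,000 back Candidate X at 40%, for true support of 48%. Polling only subscribers shuts everyone else out of the sample, while a random sample gives every voter the same chance.

```python
import random

# Hypothetical population of 10,000 voters (illustration only).
# True = supports Candidate X.
subscribers = [True] * 1600 + [False] * 400        # 80% support
non_subscribers = [True] * 3200 + [False] * 4800   # 40% support
population = subscribers + non_subscribers         # 48% support overall

def proportion(sample):
    """Share of a sample that supports Candidate X."""
    return sum(sample) / len(sample)

rng = random.Random(42)  # fixed seed for a reproducible run

# Untrustworthy: respondents are drawn only from the subscriber list,
# so non-subscribers have zero chance of being included.
subscriber_poll = rng.sample(subscribers, 500)
# More trustworthy: every voter has the same chance of being chosen.
random_sample = rng.sample(population, 500)

print(f"true support:    {proportion(population):.2f}")
print(f"subscriber poll: {proportion(subscriber_poll):.2f}")
print(f"random sample:   {proportion(random_sample):.2f}")
```

In this setup the subscriber poll overstates support for Candidate X, which is exactly the kind of "too high or too low" statement a strong free-response answer makes about a biased sample.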

Common Misconceptions

"A big sample fixes bias." Size does not fix a flawed method. A huge convenience sample is still biased because non-random selection bakes in the problem no matter how many people respond.
"Bias means someone was unfair on purpose." In statistics, bias just means certain responses are systematically favored. It can happen without anyone intending it.
"Random selection and random assignment are the same thing." They are not. Random selection supports generalizing to a population; random assignment supports cause-and-effect claims.
"If the data are real, the conclusion must be valid." Real, accurate numbers can still mislead through poor collection methods or distorted graphs.
"Voluntary response samples are fine if a lot of people respond." A large group of self-selected responders is still not representative of the population.
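A quick simulation makes the first and last misconceptions concrete. The Python sketch below models a voluntary response survey in which unhappy customers are far more likely to answer than happy ones; every rate in it (30% truly unhappy, a 50% versus 5% chance of responding) and the function name are hypothetical assumptions. As the number of customers grows, the number of responses grows too, but the share of unhappy responders stays far above the true 30%.

```python
import random

def voluntary_response_poll(num_customers, rng, true_unhappy=0.30,
                            p_respond_unhappy=0.50, p_respond_happy=0.05):
    """Simulate a voluntary response survey (all rates are hypothetical).

    Each customer is unhappy with probability true_unhappy, then decides
    for themselves whether to respond. Returns (number of responses,
    share of responders who are unhappy).
    """
    responses = 0
    unhappy_responses = 0
    for _ in range(num_customers):
        unhappy = rng.random() < true_unhappy
        p_respond = p_respond_unhappy if unhappy else p_respond_happy
        # Models each customer's own decision to respond (self-selection).
        if rng.random() < p_respond:
            responses += 1
            if unhappy:
                unhappy_responses += 1
    return responses, unhappy_responses / responses

rng = random.Random(7)  # fixed seed for a reproducible run
for num_customers in (1_000, 10_000, 100_000):
    n, share = voluntary_response_poll(num_customers, rng)
    print(f"{n:>6} responses: {share:.2f} unhappy (true value 0.30)")
```

More responses shrink the random scatter in the result, but they do nothing about the self-selection built into the method, so the estimate settles on the wrong value.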

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

Term	Definition
chance	Randomness, such as probability-based selection or random assignment, used in data collection to reduce bias; random selection makes a representative sample likely but does not guarantee one.
data collection methods	The procedures and techniques used to gather information or data from a population or sample.

Frequently Asked Questions

Why does chance matter in data collection?

Chance-based data collection helps reduce bias. Methods that do not rely on chance produce untrustworthy conclusions because the sample may not represent the population.

What is bias in AP Statistics?

Bias means a data collection method systematically favors some outcomes or responses over others. It can happen even if nobody is trying to be misleading.

Why are convenience samples untrustworthy?

Convenience samples use people or items that are easy to reach rather than randomly selected. That method can miss important parts of the population and create bias.

What is the difference between random selection and random assignment?

Random selection chooses who is in a sample and supports generalizing to a population. Random assignment assigns treatments and supports cause-and-effect conclusions.

Can a large sample still be biased?

Yes. A large sample can still be biased if the collection method is flawed. Sample size does not fix a non-random or unrepresentative method.

How are data collection methods tested on the AP Statistics exam?

AP Statistics often asks whether a conclusion is justified, whether results can be generalized, or whether a study supports cause and effect. The answer depends on how data were collected.