Planning a study starts with two big choices: what type of study you are running and who you are studying. An observational study watches and records without imposing treatments, while an experiment assigns treatments to units. Those choices determine whether you can generalize to a population, make a cause-and-effect claim, or only describe an association.

Why This Matters for the AP Statistics Exam

This topic sets up everything in Unit 3, which makes up about 12 to 15 percent of the AP Statistics exam. You will need to tell observational studies apart from experiments, decide when it is fair to generalize sample results to a population, and explain why an observational study cannot establish cause and effect. Multiple-choice questions often describe a study and ask you to label it, and free-response questions ask you to justify a clear yes or no decision about generalizing or about causation. Using precise language here pays off later when you reach inference in Units 6 through 9.

Key Takeaways

A population is the entire group you want to learn about; a sample is a subset of that population.
In an observational study, treatments are not imposed; in an experiment, treatments are assigned to experimental units.
Observational studies can be retrospective (looking back at existing data) or prospective (following individuals into the future).
A sample survey is a type of observational study that collects data from a sample to learn about a population.
You can only generalize to the population your sample was selected from, and only when that sample is random or otherwise representative.
Observational studies cannot establish causal relationships, often because of confounding variables.

Population of Generalization

The population of generalization is the group you are allowed to make a conclusion about. In AP Statistics, that population depends on how the sample was selected. If the sample was randomly selected from all students at one school, you may generalize to that school, not to all students everywhere. If the sample was neither randomly selected nor otherwise representative, it is not appropriate to generalize beyond the individuals in the sample.
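
To see why the population of generalization stops at the group you sampled from, here is a minimal simulation sketch. Everything in it is made up for illustration: the two schools, the sleep-hours variable, and the means of 7.5 and 6.0 are assumptions, not real data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical populations: hours of sleep per night for every student.
school_a = [random.gauss(7.5, 1.0) for _ in range(2000)]  # the school we sample from
school_b = [random.gauss(6.0, 1.0) for _ in range(2000)]  # a school we never sample
all_students = school_a + school_b

# Simple random sample of 200 students, drawn only from School A.
sample = random.sample(school_a, 200)

sample_mean = statistics.mean(sample)
mean_a = statistics.mean(school_a)
mean_all = statistics.mean(all_students)

print(f"sample mean:       {sample_mean:.2f}")
print(f"School A mean:     {mean_a:.2f}")
print(f"all-students mean: {mean_all:.2f}")
```

Under these assumptions the sample mean should land close to the School A mean and noticeably far from the all-students mean, which is the reason the conclusion applies to School A only.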

Key Vocabulary

Population: all items or subjects of interest, the entire group you want information about.
Sample: a subset of the population that you actually collect data from.
Census: a study that collects data from every individual in a population. This is often expensive and time consuming, which is why most studies use samples instead. Governments use censuses to gather information like age, employment, and education across an entire population.
Sample survey: a type of observational study that collects data from a sample chosen to represent a specific population.
Observational study: a study where treatments are not imposed. Investigators either examine existing data (retrospective) or follow individuals forward in time (prospective).
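Experiment: a study where different conditions, called treatments, are assigned to experimental units. When the experimental units are people, they are often called participants or subjects.
Confounding variable: a variable that is related to the explanatory variable and also influences the response, so its effect on the response cannot be separated from the effect of the explanatory variable. It can make two variables look directly related when the real source of the relationship is unclear.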

Observational Study or Experiment?

The fastest way to identify a study type is to ask one question: did the researchers impose treatments?

If treatments were assigned to units, it is an experiment.
If researchers only watched and recorded without assigning treatments, it is an observational study.

Inside observational studies, the timing matters:

Retrospective: investigators look back at data that already exists.
Prospective: investigators follow a sample into the future and collect data as it happens.

A sample survey is one common type of observational study. You pick a sample, ask questions or measure something, and try to learn about the larger population.
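
The decision rule in this section can be written out as a short function. This is only a sketch: `classify_study` and its `timing` argument are hypothetical names chosen for illustration, not part of any official rubric.

```python
from typing import Optional


def classify_study(treatments_imposed: bool, timing: Optional[str] = None) -> str:
    """Label a study description (hypothetical helper for illustration).

    timing: "past" if investigators examine data that already exists,
            "future" if they follow individuals forward in time,
            None if the description does not say.
    """
    # The first question is always whether treatments were imposed.
    if treatments_imposed:
        return "experiment"
    # Timing only matters once you know the study is observational.
    if timing == "past":
        return "retrospective observational study"
    if timing == "future":
        return "prospective observational study"
    return "observational study"


print(classify_study(True))
print(classify_study(False, "past"))
print(classify_study(False, "future"))
print(classify_study(False))
```

Notice that the treatment question comes first. Timing is only checked after the study has been identified as observational.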

Generalizing and the Limits of Observational Studies

Two rules control what you are allowed to claim:

Generalizing: It is only appropriate to generalize to a population when the sample is randomly selected or otherwise representative. A sample is only generalizable to the population it was actually selected from. If you sample only one school, your conclusions apply to that school, not to all students everywhere.
Causation: You cannot determine a causal relationship from an observational study. You can describe a relationship or association between two variables, but you cannot say one caused the other.

The main reason observational studies fall short on causation is confounding. Confounding happens when the explanatory variable and some other variable are tangled together so their separate effects on the response cannot be told apart. Because nobody assigned treatments, another variable may be driving the relationship you see, and the study gives you no way to rule that out.
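
A small simulation can make confounding concrete. The sketch below is built on an assumption you should keep in mind: the explanatory variable is constructed to have no effect at all on the response, and a made-up confounder drives both. The variable names and the `pearson` helper are illustrative only.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


n = 5000
# Hypothetical confounder that influences both other variables.
confounder = [random.gauss(0, 1) for _ in range(n)]
# The explanatory variable depends on the confounder plus noise.
explanatory = [z + random.gauss(0, 1) for z in confounder]
# The response depends on the confounder plus noise, NOT on the explanatory variable.
response = [z + random.gauss(0, 1) for z in confounder]

r_overall = pearson(explanatory, response)

# Look only at individuals whose confounder value is nearly the same.
keep = [i for i, z in enumerate(confounder) if abs(z) < 0.1]
r_within = pearson([explanatory[i] for i in keep], [response[i] for i in keep])

print(f"correlation, everyone:               {r_overall:.2f}")
print(f"correlation, confounder held steady: {r_within:.2f}")
```

Under these assumptions the overall correlation is clearly positive even though the explanatory variable does nothing, and it should shrink toward zero once the confounder is held roughly constant. In a real observational study you usually cannot do that second step, because the confounder may be unmeasured or unknown.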

How to Use This on the AP Statistics Exam

Multiple Choice

Read the study description and immediately ask whether treatments were imposed. That single question usually tells you experiment versus observational study.
Watch for prospective versus retrospective wording (following people forward versus looking back at past data).
If a question asks what you can conclude from an observational study, the answer involves association, not causation.

Free Response

When asked if you can generalize, give a clear yes or no, then back it up. Point to whether the sample was random or representative and to which population it came from.
When asked about causation, state plainly that an observational study cannot establish cause and effect, then name a plausible confounding variable in the context of the problem.
Tie every explanation to the specific situation in the prompt. Generic reasoning that ignores the context rarely earns full credit.

Common Trap

Do not say an observational study "proves" or "shows" that one variable causes another. The strongest claim you can make is association.
Do not generalize to a wider group than the sample represents.

Practice Questions

(1) You are a statistician working for a public health research organization. You are considering conducting a study to investigate the relationship between air pollution and respiratory illness.

You are considering two different study designs: an observational study and an experiment. In the observational study, you will collect data on the levels of air pollution and the incidence of respiratory illness for a sample of individuals living in different neighborhoods. In the experiment, you will randomly assign individuals to different neighborhoods and measure the levels of air pollution and the incidence of respiratory illness for each group.

Which of these two study designs is more likely to produce reliable and valid results? Explain your answer.

(2) Using the same setup as question (1), which of the following variables could potentially confound the relationship between air pollution and respiratory illness? Explain your answer.

A. Age

B. Gender

C. Smoking status

D. Exercise habits

E. Income

Answers

(1) The experimental study design is more likely to produce reliable and valid results because it involves the random assignment of individuals to different groups. Random assignment tends to balance potential confounding variables across the groups, which reduces the chance that a variable related to both the levels of air pollution and the incidence of respiratory illness explains the difference between groups.

In contrast, the observational study design does not involve any manipulation of the variables being studied. This means that any observed relationship between air pollution and respiratory illness may be confounded by other factors that are related to both variables. For example, individuals living in neighborhoods with higher levels of air pollution may also be more likely to smoke or have other risk factors for respiratory illness, making it difficult to separate the effects of air pollution from these other factors.

Overall, the experimental study design is preferred because it allows researchers to more confidently attribute any observed relationships to the variables of interest, rather than to confounding variables.
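
The balancing effect of random assignment can be sketched with a quick simulation. All of the numbers here are assumptions made up for illustration: a 25 percent smoking rate, and smokers being more likely (70 percent versus 40 percent) to already live in a high-pollution neighborhood in the observational setting.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

n = 4000
# Hypothetical smoking status for each individual (assumed 25% smokers).
smokers = [random.random() < 0.25 for _ in range(n)]

# Observational setting (assumed): smokers are more likely to already
# live in a high-pollution neighborhood.
observed_high = [random.random() < (0.7 if s else 0.4) for s in smokers]

# Experiment: a coin flip decides the neighborhood, ignoring smoking status.
assigned_high = [random.random() < 0.5 for _ in range(n)]


def smoking_rate(in_group):
    """Proportion of smokers among individuals flagged True in in_group."""
    members = [s for flag, s in zip(in_group, smokers) if flag]
    return sum(members) / len(members)


def rate_gap(high_flags):
    """Smoking rate in the high-pollution group minus the low-pollution group."""
    low_flags = [not flag for flag in high_flags]
    return smoking_rate(high_flags) - smoking_rate(low_flags)


gap_observational = rate_gap(observed_high)
gap_experiment = rate_gap(assigned_high)

print(f"smoking-rate gap, observational groups: {gap_observational:.3f}")
print(f"smoking-rate gap, randomly assigned:    {gap_experiment:.3f}")
```

Under these assumptions the observational groups differ substantially in smoking rate, so smoking is tangled up with pollution exposure. The randomly assigned groups should have nearly equal smoking rates, which is why a difference in illness between them can be attributed to pollution with more confidence.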

(2) Trick question. All of the variables listed could potentially confound the relationship between air pollution and respiratory illness.

Age: Older individuals may be more susceptible to respiratory illness and may also be more likely to live in neighborhoods with higher levels of air pollution.
Gender: Men and women may differ in their susceptibility to respiratory illness, and may also differ in their exposure to air pollution.
Smoking status: Smokers may be more likely to develop respiratory illness, and may also be more likely to live in neighborhoods with higher levels of air pollution.
Exercise habits: Individuals who exercise regularly may have a lower risk of respiratory illness and may also be more likely to live in neighborhoods with lower levels of air pollution.
Income: Individuals with lower incomes may be more likely to live in neighborhoods with higher levels of air pollution, and may also have a higher risk of respiratory illness.

When answering AP Statistics questions about confounding variables, think carefully about other variables that are tied to both the explanatory and response variables but were not accounted for in the study.

Common Misconceptions

"An observational study can prove causation if the relationship is strong." No. Strength of association does not turn an observational study into proof of cause and effect. Only a well-designed experiment supports a causal claim.
"A census and a sample survey are the same thing." A census collects data from every individual in the population. A sample survey only collects data from a subset.
"If a sample is large, you can generalize to any population." Sample size does not fix the problem. You can only generalize to the population the sample was drawn from, and only if it is random or representative.
"Retrospective and prospective are types of experiments." They are types of observational studies, based on whether you look back at existing data or follow individuals forward in time.
"Confounding only matters in experiments." Confounding is exactly why observational studies cannot establish causation, so it matters a lot when you interpret them.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

Term	Definition
causal relationships	A relationship between variables where one variable directly causes changes in another variable.
experiment	A study in which different conditions or treatments are assigned to experimental units to investigate cause-and-effect relationships.
experimental unit	The participants or subjects to which treatments are assigned in an experiment.
generalizations	Conclusions or statements about a larger population based on data from a sample.
observational study	A study in which treatments are not imposed; investigators examine data from a sample to investigate a topic of interest about the population.
population	The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
prospective	An observational study approach where investigators follow a sample of individuals into the future, collecting data over time.
randomly selected	A sampling method where every member of the population has an equal chance of being chosen.
representative sample	A sample that accurately reflects the characteristics and composition of the population from which it was drawn.
retrospective	An observational study approach where investigators examine data from a sample of individuals based on past information.
sample	A subset of individuals or items selected from a population for the purpose of data collection and analysis.
sample survey	A type of observational study that collects data from a sample to learn about the population from which the sample was taken.
treatment	Different conditions assigned to experimental units in an experiment.
variable	A characteristic that changes from one individual to another in a set of data.

Frequently Asked Questions

What is the population of generalization in AP Statistics?

The population of generalization is the group you are allowed to make a conclusion about. It depends on where the sample came from and whether that sample was random or representative.

What is the difference between a population and a sample?

A population is the entire group of interest. A sample is the subset you actually collect data from.

What is an observational study?

An observational study records data without imposing treatments. It can describe associations, but it cannot establish cause and effect.

What is the difference between an observational study and an experiment?

In an experiment, researchers assign treatments to experimental units. In an observational study, researchers only observe or measure without assigning treatments.

When can you generalize results to a population?

You can generalize when the sample is randomly selected or otherwise representative of the population. You can only generalize to the population the sample was selected from.

Can an observational study prove causation?

No. Observational studies can show association, but they cannot prove causation because confounding variables may explain the relationship.
