3.7 Inference and Experiments

📊 AP Statistics · Unit 3 Review

Written by the Fiveable Content Team • Last updated September 2025
Verified for the 2026 exam

In a well-designed experiment, drawing conclusions from the data and analyzing possible sources of error lets us make a valid inference about the population from which the sample was chosen.

Statistical inference is a method of using data to draw conclusions about a larger population. When we make an inference, we attribute our conclusions, which are based on the data, to the distribution from which the data were collected. In other words, we assume that the sample we collected is representative of the larger population and that the conclusions we draw from the sample can be generalized to that population.

For example, if we collect data on the height of a sample of 100 people and calculate the mean height, we can use statistical inference to make conclusions about the mean height of the entire population of people. We do this by assuming that the sample of 100 people is representative of the larger population, so that the mean height we calculated for the sample is a reasonable estimate of the mean height of the population.
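To make this concrete, here's a small Python sketch (the population of heights around 170 cm with SD 10 cm is a made-up assumption, used only for illustration) that simulates a population, draws one random sample of 100 people, and compares the sample mean to the known population mean.

```python
# A minimal sketch (not from the guide): a random sample's mean serves as an
# estimate of the population mean. The "population" is simulated here, so the
# true mean is known and we can see how close the sample estimate gets.
import random

random.seed(1)

# Hypothetical population: 100,000 heights (cm), roughly normal around 170 cm.
population = [random.gauss(170, 10) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# Draw a simple random sample of 100 people and compute its mean.
sample = random.sample(population, 100)
sample_mean = sum(sample) / len(sample)

print(f"Population mean: {true_mean:.2f} cm")
print(f"Sample mean (n=100): {sample_mean:.2f} cm")
```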

Statistical inference allows us to draw conclusions about a population based on a sample, even when we do not have access to the entire population. This is an important tool in research because it allows us to study small samples of people or other entities and draw conclusions about the larger population.

Inferences for Studies/Samples

Sampling variability refers to the fact that different random samples of the same size from the same population can produce different estimates of a population parameter, such as the mean or standard deviation. This variability is a natural occurrence in statistical sampling and is due to the fact that each sample is a unique subset of the population.

Larger samples tend to produce estimates that are closer to the true population value than smaller random samples. This is because larger samples tend to be more representative of the population and have smaller sampling error. Sampling error is the difference between the estimate obtained from a sample and the true population value.

The larger the sample size, the smaller the sampling error is likely to be.
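A quick simulation makes this visible. The sketch below uses a hypothetical population and arbitrary sample sizes (all assumptions chosen only for illustration); it repeatedly draws random samples of different sizes and measures how much the sample means vary from sample to sample. The spread shrinks as the sample size grows.

```python
# A minimal sketch showing that larger random samples give estimates that
# cluster more tightly around the true population mean (smaller sampling error).
import random
import statistics

random.seed(2)

# Hypothetical population of heights (cm).
population = [random.gauss(170, 10) for _ in range(100_000)]

for n in (10, 100, 1000):
    # Repeatedly draw samples of size n and record each sample mean.
    sample_means = [statistics.mean(random.sample(population, n)) for _ in range(500)]
    spread = statistics.stdev(sample_means)
    print(f"n = {n:>4}: sample means vary with SD ≈ {spread:.2f} cm")
```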


Inferences for Experiments

Random assignment of treatments to experimental units is a key aspect of experimental design. It involves randomly assigning subjects or other experimental units to different treatment conditions in order to control for extraneous variables. Because randomization tends to balance those variables across the groups, the researcher can be confident that any observed differences between the groups are due to the treatment (or to chance) rather than to other factors.

Random assignment allows researchers to conclude that some observed changes are so large as to be unlikely to have occurred by chance. Such changes are said to be statistically significant, which means that they are likely to be real rather than due to random variation.
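One way to see what "unlikely to have occurred by chance" means is a randomization (permutation) test: reshuffle the treatment labels many times and check how often chance alone produces a difference as large as the observed one. The Python sketch below uses made-up response values for two groups of eight units; the data and the one-sided comparison are assumptions for illustration, not a required AP procedure.

```python
# A minimal sketch of a randomization test: re-randomize the group labels many
# times to see how large a difference in group means chance alone could produce
# if the treatment had no effect.
import random

random.seed(3)

# Hypothetical response values from a two-group experiment.
treatment = [14, 17, 15, 19, 18, 16, 20, 17]
control   = [12, 13, 15, 11, 14, 13, 12, 15]

observed_diff = sum(treatment) / len(treatment) - sum(control) / len(control)

combined = treatment + control
n_treat = len(treatment)
count_as_extreme = 0
reps = 10_000
for _ in range(reps):
    random.shuffle(combined)
    sham_treat = combined[:n_treat]
    sham_ctrl = combined[n_treat:]
    diff = sum(sham_treat) / len(sham_treat) - sum(sham_ctrl) / len(sham_ctrl)
    if diff >= observed_diff:   # one-sided: differences at least as large as observed
        count_as_extreme += 1

print(f"Observed difference: {observed_diff:.2f}")
print(f"Approximate p-value: {count_as_extreme / reps:.4f}")
```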

If the experimental units used in an experiment are representative of some larger group of units, the results of the experiment can be generalized to that larger group. Randomly selecting experimental units from the designated population gives a better chance that the units will be representative, which increases the external validity of the study.

We'll learn more about how to determine whether differences are large enough to be considered statistically significant in Unit 6 and Unit 7.

Notes:

  • Inference about a population can be made only if the individuals taking part in the study were randomly selected from that population.
  • A well-designed experiment that randomly assigns experimental units to treatments allows inferences about cause and effect.

Practice Problem

A researcher is interested in studying the effectiveness of a new study technique on college students' grades. The researcher plans to recruit 100 students from a large university and randomly assign them to either the control group or the experimental group. The control group will receive the traditional study technique, while the experimental group will receive the new study technique.

At the end of the study, the researcher collects data on the students' grades and finds that the mean grade of the experimental group is significantly higher than the mean grade of the control group. The researcher concludes that the new study technique is more effective than the traditional technique.

Based on the experimental design described above, can the researcher generalize the results of the study to the larger population of college students? Explain your answer.

Answer

Because treatments were randomly assigned, the researcher can conclude that the new study technique caused the higher grades for the students in the study. Whether that conclusion generalizes to the larger population of college students depends on how the 100 students were obtained: the results can be generalized only if the sample is representative of that population.

The key factor is the sampling method. If the researcher randomly selected the 100 students from the population of interest, the sample is more likely to be representative of college students in general, which would justify extending the conclusion about the effectiveness of the new study technique to that population.

However, other factors could limit generalizability. The students were all recruited from a single large university, so they may differ from the broader population of college students on characteristics such as age, major, or ability level. Additionally, if the study was conducted over a short period of time or in a limited setting, that could further limit how far the results extend.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

experimental unit: The participants or subjects to which treatments are assigned in an experiment.
generalize: The process of extending conclusions from an experiment conducted on a sample to a larger population.
random assignment: The process of randomly allocating experimental units to different treatment groups to ensure unbiased distribution and reduce bias.
random sampling: A method of selecting samples from a population where each member has an equal chance of being chosen, ensuring the sample is representative of the population.
representative: A characteristic of a sample that accurately reflects the key features and distribution of the larger population from which it was drawn.
statistical inference: The process of drawing conclusions about a population based on data collected from a sample.
statistically significant: A result indicating that an observed difference is large enough that it is unlikely to have occurred by chance alone.
treatment: Different conditions assigned to experimental units in an experiment.

Frequently Asked Questions

How do I know if my experiment results are statistically significant?

You check statistical significance by asking whether the observed effect is unlikely to have happened by chance. Concretely: set hypotheses, choose α (common choices: 0.05 or 0.01), compute a test statistic and p-value (or run a randomization/simulation). If p-value < α, the result is statistically significant—you’d reject H0 and (with random assignment) can infer a causal effect (CED VAR-3.E.2–3). But also check conditions: was treatment randomly assigned? Is sample size and independence OK? Are assumptions for the test met (normal/t or large-sample z for proportions)? Report effect size and confidence interval so readers see practical importance (statistical ≠ practical significance). Consider replication and internal/external validity. For an AP-aligned walkthrough and examples, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and practice problems (https://library.fiveable.me/practice/ap-statistics).
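As a concrete, hypothetical example of the "compare p-value to α" decision, the sketch below carries out a large-sample two-proportion z test by hand; the counts (62 of 100 vs. 45 of 100) and α = 0.05 are assumptions chosen purely for illustration.

```python
# A minimal sketch (assumed numbers) of the p-value vs. alpha decision, using a
# large-sample two-proportion z test computed directly from the formulas.
from math import sqrt, erfc

# Hypothetical results: 62 of 100 successes with the treatment, 45 of 100 without.
x1, n1 = 62, 100
x2, n2 = 45, 100
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2

z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value for a standard normal

alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Statistically significant" if p_value < alpha else "Not significant", f"at alpha = {alpha}")
```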

What's the difference between random assignment and random selection in experiments?

Random assignment and random selection are different tools with different goals.

  • Random assignment: You randomly give treatments to experimental units (subjects). This balances unknown confounding variables across groups, so if you see a statistically significant difference it’s evidence the treatment caused the effect—that’s internal validity and causal inference (CED: VAR-3.E.2–3). Use in experiments, along with control groups, blinding, replication, and blocking.
  • Random selection: You randomly pick experimental units from a larger population. That helps make your sample representative so you can generalize results to that population—that’s external validity (CED: VAR-3.E.4). Random selection matters for inference to the population (confidence intervals, p-values) but by itself doesn’t let you claim causation unless you also randomize treatments.

Quick rule: random assignment → cause (internal validity); random selection → generalize (external validity). For more on experiments and inference, check the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and the Unit 3 overview (https://library.fiveable.me/ap-statistics/unit-3). Practice more with problems at (https://library.fiveable.me/practice/ap-statistics).

When can I say that one treatment actually caused the effect I observed?

You can claim a treatment caused an effect when the study is a well-designed experiment that rules out other explanations. Key things to check (AP CED VAR-3.E):

  • Random assignment of treatments to experimental units—this breaks links to confounding variables and lets you make causal claims.
  • A control or placebo group (when appropriate) so you have a baseline.
  • Statistical significance—the observed difference is unlikely to be due to chance (p-value < α), so the effect is real from the sampled distribution (VAR-3.E.2–3).
  • Blinding/double-blind and replication—these reduce bias and increase internal validity.
  • If you want to generalize beyond your sample, the experimental units should be representative of the larger group (random selection helps)—that’s about external validity (VAR-3.E.4).

If any of these are missing (no random assignment, major confounders, tiny sample, or lack of control), you should not conclude causation. For a quick AP refresher, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) or the Unit 3 overview (https://library.fiveable.me/ap-statistics/unit-3). For practice, try problems at (https://library.fiveable.me/practice/ap-statistics).

I'm confused about statistical inference - what does it mean to attribute conclusions to the distribution?

That phrase means your conclusion isn’t just about the people or numbers you actually measured—it’s a claim about the whole distribution (the population) that your sample came from. In AP terms: you use data (a sample) plus randomization and probability (sampling or random assignment) to make a statement about a population parameter (mean, proportion, slope) or about differences between treatment groups. If random assignment/random selection and other conditions hold, a small p-value or a large effect tells you the observed result is unlikely under the null distribution, so you infer the population distribution is different (statistical significance → evidence). Remember causal claims require a well-designed experiment (random assignment) and generalization needs representative units (external validity). For a concise walk-through of these ideas, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). For more practice applying this, check the unit overview (https://library.fiveable.me/ap-statistics/unit-3) and practice problems (https://library.fiveable.me/practice/ap-statistics).

How do I interpret the results of a well-designed experiment step by step?

Step-by-step cheat sheet to interpret a well-designed experiment (AP-style):

  1. Check design basics: Was there random assignment to treatments, a control/placebo, blinding, and replication? Random assignment is what lets you make causal claims (VAR-3.E.2–3).
  2. Verify conditions for inference: independence, appropriate sample size, and that randomization was done correctly. If these fail, your inference is weaker.
  3. Look at the test result: compare the p-value to α (often 0.05). If p < α, the difference is statistically significant—unlikely due to chance (VAR-3.E.2).
  4. State the conclusion in context: e.g., “There is convincing evidence that Treatment A causes a change in mean recovery time.” Use the language of cause only if random assignment was used (VAR-3.E.3).
  5. Evaluate practical significance and effect size: a tiny but significant difference might not matter in real life.
  6. Consider validity and scope: internal validity (was study controlled?) vs external validity (can you generalize?). Random selection of units strengthens generalization (VAR-3.E.4).
  7. Check for confounding, interactions, or blocks that change interpretation.
  8. Recommend replication and look for consistency across studies.

For a quick review, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). For more practice problems, try Fiveable’s practice bank (https://library.fiveable.me/practice/ap-statistics).

What's the formula or process for determining if differences between treatment groups are statistically significant?

Use a formal hypothesis test comparing treatments and decide by the p-value. Steps (short):

  1. State hypotheses. For two treatments: H0: μ1 = μ2 (no effect) vs Ha: μ1 ≠ μ2 (two-sided) or directional as appropriate. For >2 groups use ANOVA: H0: all group means equal.
  2. Check conditions from the CED: random assignment (allows causal inference), independence/replication, and approximate normality or large n (or similar spreads for ANOVA).
  3. Compute the test statistic:
     • Two-sample t (unequal variances): t = (x̄1 − x̄2) / sqrt(s1^2/n1 + s2^2/n2).
     • Two-sample pooled t (if equal σ): use the pooled SE.
     • ANOVA: F = MSbetween / MSwithin.
  4. Find the p-value (calculator or tables). The AP exam supplies tables/formulas and you may use a graphing calculator.
  5. Decision: if p-value < α (common α = 0.05), reject H0 → the difference is statistically significant. With proper random assignment, significance is evidence the treatment caused the effect (VAR-3.E).

For practice and topic review see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and more practice problems (https://library.fiveable.me/practice/ap-statistics).
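Here's a rough sketch of steps 3 and 5 in code, using made-up scores for two treatment groups; the by-hand value uses the unequal-variances t formula above, and SciPy (assumed to be available) supplies the matching p-value.

```python
# A minimal sketch (hypothetical data) applying the two-sample t formula by hand,
# then letting SciPy compute the matching Welch (unequal variances) p-value.
import statistics
from math import sqrt
from scipy import stats   # assumes SciPy is installed

treatment = [85, 88, 90, 79, 92, 86, 91, 84]
control   = [80, 78, 85, 76, 83, 79, 81, 77]

x1, x2 = statistics.mean(treatment), statistics.mean(control)
s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
n1, n2 = len(treatment), len(control)

# t = (x̄1 − x̄2) / sqrt(s1^2/n1 + s2^2/n2)
t_by_hand = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

t_scipy, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t (by hand) = {t_by_hand:.3f}, t (scipy) = {t_scipy:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0", "at alpha = 0.05")
```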

Can someone explain why random assignment lets me conclude that changes didn't happen by chance?

Random assignment doesn’t make chance go away—it makes the groups comparable so differences are likely caused by the treatment, not preexisting stuff. By randomly assigning units to treatments, you’re spreading out (on average) known and unknown confounding variables equally across groups. That gives internal validity: if groups start the same, any big difference after the experiment is probably due to the treatment. “Probably” is checked with inference: we use a sampling distribution (or randomization/simulation) to ask how likely the observed difference would be if the treatment had no effect. If that probability (the p-value) is very small, the difference is statistically significant (VAR-3.E.2–3.E.3). Replication and control/placebo/blinding strengthen this conclusion. For AP review, see Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and practice questions (https://library.fiveable.me/practice/ap-statistics).
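If you want to see the "spreading out confounders" idea in action, the sketch below randomly splits 40 hypothetical units into two groups and compares their average ages; the ages are made up, and the point is only that a randomized split tends to balance background variables across the groups.

```python
# A minimal sketch (made-up ages) showing how random assignment tends to balance
# a background variable across groups, so it cannot systematically favor one treatment.
import random
import statistics

random.seed(7)

# Hypothetical experimental units with a potential confounder (age).
ages = [random.randint(18, 70) for _ in range(40)]

# Randomly assign half to treatment, half to control.
random.shuffle(ages)
treatment_ages, control_ages = ages[:20], ages[20:]

print(f"Mean age, treatment group: {statistics.mean(treatment_ages):.1f}")
print(f"Mean age, control group:   {statistics.mean(control_ages):.1f}")
# Over repeated randomizations the two group means are close on average, which is
# why observed differences in the response are attributed to the treatment (or chance).
```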

When do I use random assignment vs random selection and what's the difference?

Random assignment vs. random selection—short version: they do different jobs.

  • Random assignment: you randomly give treatments to experimental units (e.g., flip a coin to decide who gets Drug A or placebo). This controls confounding, gives internal validity, and lets you make causal claims when you find statistically significant differences (CED VAR-3.E.2–3). Use it in experiments.
  • Random selection: you randomly pick units from a population (e.g., a simple random sample of 500 adults). That helps your sample be representative so you can generalize results to the larger group (external validity). Use it for surveys and when you want population-level inference.

Key AP distinctions: experiments need random assignment for causal inference; random selection improves generalizability but doesn’t by itself prove causation. Good experiments often use both (randomly select units, then randomly assign treatments). For more on inference and experiments, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). For extra practice, check Unit 3 and practice problems (https://library.fiveable.me/ap-statistics/unit-3) and (https://library.fiveable.me/practice/ap-statistics).

I don't understand when I can generalize my experiment results to a larger population.

You can generalize your experiment’s results to a larger population only when the experimental units are representative of that population. Random assignment of treatments (randomization) gives you internal validity and lets you make causal claims about the units in your study (VAR-3.E.2–3), but it doesn’t automatically make those units representative of some broader group. To get external validity (generalizability) you need random selection or another design that ensures your sample mirrors the population you care about (VAR-3.E.4). So: if you randomly selected participants from the target population and then randomly assigned treatments, you can both claim causation and generalize. If you only randomized assignment (e.g., volunteers from one clinic), you can claim causation for those participants but be cautious about generalizing. For more AP-aligned notes and examples, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and practice problems (https://library.fiveable.me/practice/ap-statistics).

How do I know if my experimental units are representative of the larger group I want to study?

You check representativeness by comparing your experimental units to the larger target population on key characteristics and on how they were chosen. If units were randomly selected from the target population (random selection), you have a much better chance of representativeness and can generalize results (external validity). If you only randomly assigned treatments to units you already had (random assignment) you get strong causal claims (internal validity) but not automatic generalizability. Practical checks: compare demographics, baseline measures, and the sampling frame to the population; look for selection bias (who was excluded or self-selected?); ensure sample size and replication are adequate; and consider stratifying or blocking to capture known subgroups. If the sample isn’t representative, qualify conclusions—say the causal effect holds for your sample but may not generalize. For AP framing, this is VAR-3.E: random selection improves external validity; random assignment supports causal inference. See the Topic 3.7 study guide for examples (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). For extra practice, check unit review and practice sets on Fiveable (https://library.fiveable.me/ap-statistics/unit-3 and https://library.fiveable.me/practice/ap-statistics).

What makes a difference between treatment groups evidence that the treatment caused the effect?

You can only claim the treatment caused the effect in a well-designed experiment—and two things are key.

  1. Random assignment of treatments to experimental units. That breaks links between lurking variables and treatment groups, so differences are unlikely to be due to preexisting differences. This is the CED’s core: randomization gives internal validity and supports causal inference (VAR-3.E.2, VAR-3.E.3).
  2. The observed difference is statistically significant. That means the difference is large enough that, under the null hypothesis of no treatment effect, you'd rarely see it by chance (p-value < α, commonly 0.05).

If p < α and the study was randomized, you can say the treatment likely caused the effect. Also check: use controls/placebo, blind or double-blind procedures to reduce bias, replicate with enough sample size, and avoid confounding by proper design. For AP review, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and practice problems (https://library.fiveable.me/practice/ap-statistics).

Why does random selection give me a better chance that my units will be representative?

Random selection gives you a better chance of a representative sample because it removes systematic bias and gives every unit a known, nonzero chance of being chosen. That matters for generalizing results (external validity): VAR-3.E.4 in the CED says random selection makes it more likely your experimental units reflect the larger population so your conclusions can be applied beyond the sample. Mechanically, random selection spreads out who you pick so over many possible samples the sample statistic centers on the true population parameter (think sampling distribution and unbiasedness). Nonrandom choices can over- or under-represent groups (selection bias), which skews estimates and ruins generalization even if your experiment has perfect internal validity (random assignment gives causal claims). For more AP-aligned review, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). Want practice? Fiveable has lots of practice problems for Unit 3 (https://library.fiveable.me/practice/ap-statistics).
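Here's a rough simulation of why this works (the two subgroups and all the numbers are assumptions chosen for illustration): a random sample's mean lands near the true population mean, while a convenience-style sample that over-represents one subgroup is systematically off.

```python
# A minimal sketch contrasting a random sample with a biased, convenience-style
# sample drawn mostly from one subgroup of the population (selection bias).
import random
import statistics

random.seed(6)

# Hypothetical population: two subgroups with different typical values.
group_a = [random.gauss(60, 5) for _ in range(50_000)]   # one demographic
group_b = [random.gauss(75, 5) for _ in range(50_000)]   # another demographic
population = group_a + group_b
true_mean = statistics.mean(population)

# Random selection: every unit has the same chance of being chosen.
random_sample = random.sample(population, 200)

# Biased selection: 90% of the sample comes from group_a.
biased_sample = random.sample(group_a, 180) + random.sample(group_b, 20)

print(f"True population mean:    {true_mean:.1f}")
print(f"Random sample mean:      {statistics.mean(random_sample):.1f}")
print(f"Convenience sample mean: {statistics.mean(biased_sample):.1f}")
```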

I keep getting confused about causation vs correlation in experiments - how do I tell the difference?

Short answer: correlation = two variables move together; causation = one variable produces a change in another. How to tell the difference for AP Stats? Use the design.

  • If you have a well-designed experiment with random assignment of treatments to experimental units (and ideally a control/placebo, blinding, replication, and blocking to reduce confounding), then statistically significant differences between treatment groups are evidence the treatments caused the effect (VAR-3.E.2–3). Random assignment gives internal validity.
  • If the data come from an observational study (no random assignment), an association can be real but confounding variables may explain it—you cannot claim causation, only correlation.
  • Always check: was assignment random? Was there a control? Could a confound explain the link? Is the result statistically significant?

For more on experimental inference and wording for AP questions, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). For broader review and lots of practice problems, visit the unit page (https://library.fiveable.me/ap-statistics/unit-3) and practice bank (https://library.fiveable.me/practice/ap-statistics).

What are the steps to go from collecting experimental data to making conclusions about cause and effect?

Start with a clear design: define experimental units, treatments (including a control/placebo), and response. Use random assignment of treatments to units (and blocking if needed) so groups are comparable—that’s the key to internal validity and causal inference (VAR-3.E.2–3). Collect data with replication (enough units per treatment) and use blinding/double-blind to reduce bias. Check assumptions and analyze: compute appropriate test statistics, p-values, or confidence intervals and compare to your α to decide statistical significance (VAR-3.E.1–3). If differences are statistically significant and the experiment was well randomized and controlled, you can infer a causal effect. For external validity (generalizing), make sure experimental units are representative of the larger population (random selection helps; VAR-3.E.4). Finally, report effect size, uncertainty, and any possible confounding or interactions; replicate the study if possible. For a compact review of these steps tied to the AP CED, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v). For more unit review and practice, check Unit 3 (https://library.fiveable.me/ap-statistics/unit-3) and practice problems (https://library.fiveable.me/practice/ap-statistics).

How do I know if the changes I observed in my experiment are too large to have happened by chance?

You check whether changes are “too large to have happened by chance” with a hypothesis test and a p-value. Set up H0 (no treatment effect) and Ha (there is an effect), choose a significance level (α—commonly 0.05 or 0.01), and use your experiment’s test statistic to get a p-value. If p ≤ α, the result is statistically significant: the observed change is unlikely under H0 and—because you used random assignment—you can infer the treatment likely caused the effect (CED VAR-3.E.2–3). If p > α, the change could plausibly be due to chance; consider sample size/power before concluding “no effect.” Always report context, the α you used, and whether random assignment and other conditions were met. For AP-style practice and walkthroughs on inference in experiments, see the Topic 3.7 study guide (https://library.fiveable.me/ap-statistics/unit-3/inference-experiments/study-guide/ijQtfZ5uUJiFJtYjB74v) and try problems at (https://library.fiveable.me/practice/ap-statistics).