Statistical errors are at the heart of what makes hypothesis testing both powerful and risky. When you conduct a significance test, you're making a decision under uncertainty, and that means you can be wrong in predictable ways. The AP Statistics exam tests your ability to distinguish between Type I errors (false positives) and Type II errors (false negatives), understand what factors influence their probabilities, and explain the real-world consequences of each in context. You'll also need to recognize how sampling variability, bias, and confounding can undermine the validity of statistical conclusions.
Don't just memorize that "Type I = rejecting a true null." Understand why we set significance levels, how sample size affects power, and what trade-offs researchers face when designing studies. The exam frequently asks you to identify which error is more serious in a given context or to explain what would happen if you changed $\alpha$. Master the underlying logic, and you'll be ready for both multiple-choice questions and FRQs that ask you to interpret errors in real scenarios.
When you perform a hypothesis test, you're making a binary decision: reject $H_0$ or fail to reject $H_0$. Since you never know the true state of reality, either decision could be wrong.
Compare: Type I vs. Type II errors both involve incorrect conclusions, but Type I means seeing something that isn't there while Type II means missing something that is. On FRQs, always connect the error type to the specific context (e.g., "concluding the new drug lowers blood pressure when it actually doesn't" for Type I, or "failing to conclude the drug works when it really does" for Type II).
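To see both errors happen, here is a minimal simulation sketch in Python. It assumes a one-sided z-test with a known standard deviation. The sample size, the true effect of 0.5, and the helper names `rejects_null` and `rejection_rate` are illustrative choices, not anything the exam requires.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
N = 30        # sample size per simulated study (assumed)
SIGMA = 1.0   # population standard deviation, treated as known (assumed)
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)  # cutoff for an upper-tail test

def rejects_null(true_mean, rng):
    """Run one study testing H0: mean = 0 against Ha: mean > 0."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    z = fmean(sample) / (SIGMA / N ** 0.5)
    return z > Z_CRIT

def rejection_rate(true_mean, trials=2000, seed=0):
    """Fraction of simulated studies that reject H0."""
    rng = random.Random(seed)
    return sum(rejects_null(true_mean, rng) for _ in range(trials)) / trials

# H0 is true (the mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (the mean is really 0.5): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The simulated Type I error rate should land close to the chosen significance level, while the Type II error rate depends on how large the true effect is relative to the noise.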
The probabilities of Type I and Type II errors aren't fixed. They depend on choices you make and characteristics of your study.
Power $= 1 - \beta$, where $\beta$ is the probability of a Type II error. It represents the probability of correctly rejecting a false null hypothesis. In plain terms, it's your test's ability to detect a real effect when one exists.
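Four factors increase power: a larger sample size $n$, a larger effect size, a larger significance level $\alpha$, and smaller variability in the data.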
A target power of 0.80 is conventional, meaning you want at least an 80% chance of detecting a real effect.
Compare: Increasing $\alpha$ vs. increasing $n$ both increase power, but increasing $\alpha$ also increases Type I error risk while increasing $n$ reduces uncertainty without that trade-off. If an FRQ asks how to increase power without raising false positive risk, sample size is your answer.
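The trade-off can be checked numerically. The sketch below computes power for a one-sided z-test of a mean, assuming the population standard deviation is known. The function name `z_test_power` and all the numbers (an effect of 2, a standard deviation of 10, and so on) are made up for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha):
    """Power of an upper-tail z-test for a mean.

    effect: true mean minus the null-hypothesis mean (assumed >= 0)
    sigma:  population standard deviation, treated as known
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)          # rejection cutoff under H0
    shift = effect * n ** 0.5 / sigma      # how far the truth sits from H0, in standard errors
    return 1 - z.cdf(z_crit - shift)

baseline = z_test_power(effect=2, sigma=10, n=50, alpha=0.05)
bigger_n = z_test_power(effect=2, sigma=10, n=200, alpha=0.05)
bigger_alpha = z_test_power(effect=2, sigma=10, n=50, alpha=0.10)

print(f"baseline:     {baseline:.2f}")
print(f"larger n:     {bigger_n:.2f}")
print(f"larger alpha: {bigger_alpha:.2f}")
```

Under these assumed numbers, quadrupling the sample size raises power from about 0.41 to about 0.88 with no change in Type I error risk, while doubling the significance level raises power only to about 0.55 and doubles the false positive rate.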
These errors occur before you even run a hypothesis test. They threaten the validity of your entire study.
Sampling error is the random variation between a sample statistic and the true population parameter. Every sample you draw will give slightly different results, and that's expected.
Selection bias is a systematic tendency for the sample to differ from the population in a particular direction. Your results can't be generalized because your sample doesn't represent the population.
Measurement error is inaccuracy in how variables are recorded, affecting both the validity and reliability of your data.
Compare: Sampling error vs. selection bias is a distinction the AP exam frequently tests. Sampling error is random variation that decreases with larger $n$, while selection bias is systematic distortion that persists regardless of sample size. If a question asks what problem cannot be solved by collecting more data, think bias.
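A quick simulation makes the contrast concrete. This is a sketch under invented assumptions: a hypothetical population of 100,000 values, and a selection mechanism in which only units with values above 45 can ever end up in the biased sample.

```python
import random
from statistics import fmean

rng = random.Random(42)

# Hypothetical population centered near 50.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = fmean(population)

# Assumed selection mechanism: only units above 45 are reachable.
reachable = [x for x in population if x > 45]

def sample_mean_error(pool, n):
    """Sample mean of n units drawn from `pool`, minus the true population mean."""
    return fmean(rng.sample(pool, n)) - true_mean

# errors[n] = (error of a simple random sample, error of the biased sample)
errors = {}
for n in (100, 10_000):
    errors[n] = (sample_mean_error(population, n), sample_mean_error(reachable, n))
    print(f"n={n:>6}: random {errors[n][0]:+.2f}, biased {errors[n][1]:+.2f}")
```

The random sample's error shrinks toward zero as $n$ grows, while the biased sample stays several points too high no matter how many units are collected.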
Even with good data and correct calculations, you can draw wrong conclusions about what the results mean.
A confounding variable influences both the explanatory and response variables, creating a false appearance of causation. For example, ice cream sales and drowning rates both increase in summer. Temperature is the confounder; ice cream doesn't cause drowning.
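The ice cream example can be simulated. In the sketch below, temperature drives both variables and ice cream sales have no effect at all on drownings. The coefficients, noise levels, and the helper `pearson_r` are invented for illustration.

```python
import random
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)

# Temperature (the confounder) influences both variables independently.
temps = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [5 * t + rng.gauss(0, 10) for t in temps]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temps]

overall_r = pearson_r(ice_cream, drownings)

# Hold the confounder roughly constant: keep only days between 24 and 26 degrees.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 24 <= t < 26]
band_r = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days:           r = {overall_r:.2f}")
print(f"similar-temp days:  r = {band_r:.2f}")
```

Across all days the two variables are strongly correlated, but among days with nearly the same temperature the correlation mostly disappears, which is what you would expect if temperature alone explains the association.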
Simpson's Paradox occurs when a trend reverses or disappears when data is combined across groups. What's true for each subgroup isn't true for the whole dataset.
This happens because of unequal group sizes or a lurking variable related to both the grouping and the outcome. A classic example: a treatment might appear worse overall but actually perform better within every age group, because sicker (older) patients disproportionately received that treatment.
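Here is a small worked example with hypothetical counts chosen to produce the reversal. Each entry is (patients recovered, patients treated), and the treatment is given mostly to older patients, as in the scenario above.

```python
# Hypothetical (recovered, total) counts by group and age.
counts = {
    "treatment": {"younger": (18, 20), "older": (48, 80)},
    "control":   {"younger": (64, 80), "older": (10, 20)},
}

def rate(recovered, total):
    """Recovery rate for one cell of the table."""
    return recovered / total

def overall_rate(by_age):
    """Recovery rate after pooling all age groups together."""
    recovered = sum(r for r, _ in by_age.values())
    total = sum(t for _, t in by_age.values())
    return recovered / total

for age in ("younger", "older"):
    t = rate(*counts["treatment"][age])
    c = rate(*counts["control"][age])
    print(f"{age:>8}: treatment {t:.0%} vs control {c:.0%}")

t_all = overall_rate(counts["treatment"])
c_all = overall_rate(counts["control"])
print(f" overall: treatment {t_all:.0%} vs control {c_all:.0%}")
```

With these numbers the treatment wins in both age groups (90% vs. 80% and 60% vs. 50%) yet loses overall (66% vs. 74%), because 80 of its 100 patients come from the older group with lower recovery rates.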
Regression to the mean describes how extreme observations tend to be followed by less extreme ones, purely due to natural variability.
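A short simulation shows the effect with no intervention at all. It assumes each student's score is a stable ability plus random test-day noise. The means, standard deviations, cutoff of 60, and the helper `observed_score` are invented for illustration.

```python
import random
from statistics import fmean

rng = random.Random(7)

def observed_score(ability):
    """One test result: stable ability plus random test-day noise."""
    return ability + rng.gauss(0, 8)

# Hypothetical students; nothing changes between the two tests.
abilities = [rng.gauss(70, 8) for _ in range(10_000)]
first = [observed_score(a) for a in abilities]
second = [observed_score(a) for a in abilities]

# Select students only because their first score was low.
low = [(f, s) for f, s in zip(first, second) if f < 60]
low_first_mean = fmean([f for f, _ in low])
low_second_mean = fmean([s for _, s in low])

print(f"students selected:  {len(low)}")
print(f"mean first score:   {low_first_mean:.1f}")
print(f"mean second score:  {low_second_mean:.1f}")
```

The selected group scores noticeably higher the second time even though nothing was done to them, because part of what made their first scores low was bad luck that does not repeat.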
Compare: Confounding and Simpson's Paradox both involve hidden variables distorting conclusions, but confounding obscures a true relationship (or creates a false one), while Simpson's Paradox can actually reverse an apparent relationship when you look at subgroups.
Running many tests on the same data inflates your overall error rate in ways that single-test reasoning doesn't capture.
Each test at $\alpha = 0.05$ has a 5% false positive rate individually. But if you run 20 tests and none of the effects are real, you'd expect about $20 \times 0.05 = 1$ false positive by chance alone.
The family-wise error rate accumulates: with $k$ independent tests, the probability of at least one Type I error is $1 - (1 - \alpha)^k$. For 20 tests at $\alpha = 0.05$, that's $1 - (0.95)^{20} \approx 0.64$, or a 64% chance of at least one false positive.
Corrections like Bonferroni address this by using $\alpha / k$ for each individual test, keeping the overall false positive rate at or below your desired $\alpha$.
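The arithmetic for the 20-test case can be checked in a few lines. This sketch assumes the tests are independent and writes `k` for the number of tests. The function names are illustrative.

```python
def family_wise_error_rate(alpha, k):
    """Probability of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k

alpha, k = 0.05, 20

expected_false_positives = k * alpha                  # 1.0 when no effect is real
fwer = family_wise_error_rate(alpha, k)               # about 0.64
per_test = bonferroni_alpha(alpha, k)                 # 0.0025
fwer_corrected = family_wise_error_rate(per_test, k)  # just under 0.05

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"uncorrected family-wise error rate: {fwer:.3f}")
print(f"Bonferroni per-test alpha: {per_test:.4f}")
print(f"corrected family-wise error rate: {fwer_corrected:.3f}")
```

The corrected rate falls slightly below 0.05, which reflects that Bonferroni is a conservative adjustment: it controls false positives at the price of lower power for each individual test.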
Survivorship bias occurs when you analyze only "successful" cases while ignoring failures, leading to systematically optimistic conclusions.
Compare: Multiple comparison error inflates false positives from running too many tests, while survivorship bias creates false conclusions by examining an incomplete dataset. Both require thinking beyond the data right in front of you.
| Concept | Best Examples |
|---|---|
| Hypothesis testing decisions | Type I Error, Type II Error |
| Error probability factors | Significance level ($\alpha$), Power ($1 - \beta$), Sample size |
| Sampling and measurement | Sampling error, Selection bias, Measurement error |
| Relationship interpretation | Confounding, Simpson's Paradox, Regression to the mean |
| Multiple testing issues | Multiple comparison error, Survivorship bias |
| Increases power | Larger $n$, larger effect size, larger $\alpha$, smaller variability |
| Cannot be fixed by larger $n$ | Selection bias, Confounding, Measurement error |
| Requires randomization to address | Confounding (for causal claims) |
A researcher sets $\alpha = 0.01$ instead of $\alpha = 0.05$. How does this change affect the probability of Type I error? The probability of Type II error? Explain the trade-off.
Which two errors both involve systematic problems that cannot be reduced by increasing sample size? What distinguishes them from sampling error?
A study finds that a tutoring program improves test scores, but students were enrolled in the program after scoring below average. What statistical phenomenon might explain the improvement even if the program has no effect?
Compare and contrast confounding and Simpson's Paradox. In what way do both involve "hidden" variables, and how do their effects on conclusions differ?
FRQ-style: A pharmaceutical company tests 20 different drug compounds for effectiveness, using $\alpha = 0.05$ for each test. If none of the drugs actually work, approximately how many false positives would you expect? What adjustment could the company make to control the overall Type I error rate?