
📊 AP Statistics

Types of Statistical Errors


Why This Matters

Statistical errors are at the heart of what makes hypothesis testing both powerful and risky. When you conduct a significance test, you're making a decision under uncertainty—and that means you can be wrong in predictable ways. The AP Statistics exam tests your ability to distinguish between Type I errors (false positives) and Type II errors (false negatives), understand what factors influence their probabilities, and explain the real-world consequences of each in context. You'll also need to recognize how sampling variability, bias, and confounding can undermine the validity of statistical conclusions.

Don't just memorize that "Type I = rejecting a true null"—understand why we set significance levels, how sample size affects power, and what trade-offs researchers face when designing studies. The exam loves asking you to identify which error is more serious in a given context or to explain what would happen if you changed α. Master the underlying logic, and you'll be ready for both multiple-choice questions and FRQs that ask you to interpret errors in real scenarios.


Errors in Hypothesis Testing Decisions

When you perform a hypothesis test, you're making a binary decision: reject H₀ or fail to reject H₀. Since you never know the true state of reality, either decision could be wrong.

Type I Error (False Positive)

  • Rejecting H₀ when it's actually true—you conclude there's an effect or difference when none exists
  • Probability equals α (the significance level), which you choose before conducting the test
  • Real-world consequence: taking unnecessary action based on false evidence, such as approving an ineffective drug or implementing a costly policy change

Type II Error (False Negative)

  • Failing to reject H₀ when it's actually false—you miss a real effect that exists in the population
  • Probability denoted by β, and power = 1 - β represents your ability to detect true effects
  • Real-world consequence: missed opportunities or continued harm, such as failing to detect a disease outbreak or not recognizing an effective treatment

Compare: Type I vs. Type II errors—both involve incorrect conclusions, but Type I means seeing something that isn't there while Type II means missing something that is. On FRQs, always connect the error type to the specific context (e.g., "concluding the new drug works when it doesn't" for Type I).
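
If you want to see both error rates emerge from repeated sampling, here is a minimal simulation sketch (not from the study guide; the one-sided z-test setup, sample size, effect size, and σ are made-up illustration values):

```python
# Estimate Type I and Type II error rates for a one-sample z-test of
# H0: mu = 0 vs Ha: mu > 0. All numbers below are illustration values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, sigma = 0.05, 30, 1.0
z_crit = stats.norm.ppf(1 - alpha)   # reject H0 when z > z_crit
reps = 10_000

def reject_rate(true_mu):
    """Fraction of simulated samples whose z-statistic exceeds the cutoff."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(true_mu, sigma, n)
        z = (sample.mean() - 0) / (sigma / np.sqrt(n))
        rejections += z > z_crit
    return rejections / reps

print("Type I error rate (true mu = 0):", reject_rate(0.0))   # lands near alpha = 0.05
print("Power when true mu = 0.5:       ", reject_rate(0.5))
print("Type II error rate (beta):      ", 1 - reject_rate(0.5))
```

The first rate lands near the chosen α, and the last line is the β that power calculations try to shrink.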


Factors That Influence Error Probabilities

The probabilities of Type I and Type II errors aren't fixed—they depend on choices you make and characteristics of your study.

Significance Level (α)

  • Directly sets the Type I error rate—choosing α = 0.05 means you accept a 5% chance of rejecting a true null
  • Trade-off with Type II error: decreasing α makes it harder to reject H₀, which increases β
  • Context determines choice: use smaller α (like 0.01) when false positives are especially costly

Statistical Power

  • Power = 1 - β represents the probability of correctly rejecting a false null hypothesis
  • Increased by: larger sample size n, larger effect size, larger α, and smaller variability
  • Target power of 0.80 is conventional, meaning you want at least an 80% chance of detecting a real effect
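
As a rough way to check those bullets, here is a sketch of a power calculation for an assumed one-sided, one-sample z-test (the inputs are invented for illustration; an actual AP problem would give you the power or ask only how it changes):

```python
# Power for a one-sided one-sample z-test: P(reject H0 | true effect),
# computed directly from the normal distribution. Illustration values only.
import numpy as np
from scipy import stats

def power(n, effect, sigma, alpha):
    z_crit = stats.norm.ppf(1 - alpha)          # rejection cutoff under H0
    shift = effect * np.sqrt(n) / sigma         # how far the true mean sits from H0
    return 1 - stats.norm.cdf(z_crit - shift)   # P(z-statistic clears the cutoff)

# Each call below changes one input in the direction the list above describes.
print(power(n=30, effect=0.5, sigma=1.0, alpha=0.05))   # baseline
print(power(n=60, effect=0.5, sigma=1.0, alpha=0.05))   # larger n
print(power(n=30, effect=0.8, sigma=1.0, alpha=0.05))   # larger effect size
print(power(n=30, effect=0.5, sigma=1.0, alpha=0.10))   # larger alpha
print(power(n=30, effect=0.5, sigma=0.7, alpha=0.05))   # smaller variability
```

Each of the last four calls prints a higher power than the baseline, matching the bullet list.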

Sample Size Effect

  • Larger n decreases standard error, making it easier to detect true effects and reducing Type II error
  • Does not change α directly—that's set by the researcher before the test
  • Power calculations help determine the minimum sample size needed to detect a meaningful effect
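
Here is one way such a power calculation can be turned around to give a minimum sample size, again assuming a one-sided z-test and invented inputs:

```python
# Back-of-the-envelope sample-size calculation for a one-sided z-test.
# The effect size and sigma are illustration values, not from the text.
import math
from scipy import stats

def min_n(effect, sigma, alpha=0.05, target_power=0.80):
    z_alpha = stats.norm.ppf(1 - alpha)        # rejection cutoff under H0
    z_power = stats.norm.ppf(target_power)     # quantile for the desired power
    return math.ceil(((z_alpha + z_power) * sigma / effect) ** 2)

print(min_n(effect=0.5, sigma=1.0))   # smallest n giving about 80% power with these inputs
```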

Compare: Increasing α vs. increasing n—both increase power, but increasing α also increases Type I error risk while increasing n reduces uncertainty without that trade-off. If an FRQ asks how to increase power without raising false positive risk, sample size is your answer.


Errors from Data Collection Problems

These errors occur before you even run a hypothesis test—they threaten the validity of your entire study.

Sampling Error

  • Random variation between sample statistic and population parameter—this is expected and unavoidable
  • Reduced by larger sample sizes but never eliminated entirely
  • Not the same as bias: sampling error averages out over many samples, while bias does not

Selection Bias

  • Systematic tendency for the sample to differ from the population—results cannot be generalized
  • Caused by non-random sampling: convenience samples, voluntary response, or undercoverage of certain groups
  • Cannot be fixed by increasing sample size—only proper random sampling prevents this error

Measurement Error

  • Inaccuracy in how variables are recorded, affecting both validity and reliability of data
  • Sources include: faulty instruments, unclear survey questions, or inconsistent data collection procedures
  • Threatens internal validity even when sampling is done correctly

Compare: Sampling error vs. selection bias—sampling error is random variation that decreases with larger n, while selection bias is systematic distortion that persists regardless of sample size. The AP exam frequently tests whether students understand this distinction.
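
A quick simulation sketch can make the contrast concrete (the population values and the biased sampling rule below are invented for illustration):

```python
# Sampling error shrinks as n grows; a biased sampling scheme does not recenter.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=100, scale=15, size=100_000)   # true mean is about 100
biased_pool = population[population > 95]                   # undercoverage of low values

for n in (25, 100, 400, 1600):
    srs_means = [rng.choice(population, n, replace=False).mean() for _ in range(1_000)]
    bias_means = [rng.choice(biased_pool, n, replace=False).mean() for _ in range(1_000)]
    print(f"n={n:5d}  SRS mean {np.mean(srs_means):6.2f} (SE {np.std(srs_means):5.2f})   "
          f"biased mean {np.mean(bias_means):6.2f}")
```

The spread of the simple random sample means shrinks as n grows while staying centered near 100; the biased column stays stuck well above 100 no matter how large n gets.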


Errors in Interpreting Relationships

Even with good data and correct calculations, you can draw wrong conclusions about what the results mean.

Confounding Error

  • A lurking variable influences both the explanatory and response variables, creating a false appearance of causation
  • Only randomized experiments can establish causation because random assignment balances confounders
  • In observational studies: always consider alternative explanations for observed associations
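
For intuition, here is a tiny simulation sketch (all variables invented) in which a lurking variable creates an association between two variables that have no direct link:

```python
# A lurking variable z drives both x and y, so x and y look associated
# even though x has no direct effect on y.
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=5_000)                  # lurking (confounding) variable
x = z + rng.normal(scale=0.5, size=5_000)   # explanatory variable driven by z
y = z + rng.normal(scale=0.5, size=5_000)   # response driven by z, not by x

print("corr(x, y) =", round(np.corrcoef(x, y)[0, 1], 2))   # strong positive correlation
# Randomly assigning x would break its link to z, and the association would vanish.
```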

Simpson's Paradox

  • A trend reverses or disappears when data is aggregated across groups—what's true for parts isn't true for the whole
  • Caused by unequal group sizes or a confounding variable related to both the grouping and the outcome
  • Solution: examine data at appropriate levels of stratification before drawing conclusions
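
A small made-up two-hospital example shows the reversal (all counts are invented for illustration):

```python
# Success counts by case severity for two hypothetical hospitals.
counts = {
    # group: (successes, trials) for Hospital A and Hospital B
    "mild":   {"A": (90, 100),  "B": (400, 500)},
    "severe": {"A": (300, 500), "B": (50, 100)},
}

for group, row in counts.items():
    rates = {h: f"{s / t:.0%}" for h, (s, t) in row.items()}
    print(group, rates)                       # Hospital A wins within each group

overall = {h: (sum(counts[g][h][0] for g in counts),
               sum(counts[g][h][1] for g in counts)) for h in ("A", "B")}
print("overall", {h: f"{s / t:.0%}" for h, (s, t) in overall.items()})   # Hospital B wins overall
```

Hospital A has the higher success rate within both the mild and severe groups, yet Hospital B looks better once the groups are pooled, because B treats mostly mild cases while A treats mostly severe ones.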

Regression to the Mean

  • Extreme observations tend to be followed by less extreme ones—not because of any treatment effect
  • Mistaken for causation when an intervention is applied after extreme values are observed
  • Example: students who score very low on one test often improve on the next, even without tutoring
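
A short simulation sketch (invented scores) shows the effect with no intervention at all:

```python
# Two noisy test scores from the same students. Students picked for a very low
# first score improve on average on the retake, with no tutoring involved.
import numpy as np

rng = np.random.default_rng(7)
ability = rng.normal(75, 8, 10_000)            # each student's "true" level
test1 = ability + rng.normal(0, 10, 10_000)    # score = ability + luck
test2 = ability + rng.normal(0, 10, 10_000)    # independent luck on the retake

low_scorers = test1 < 60                       # selected because of an extreme score
print("Test 1 mean for low scorers:", round(test1[low_scorers].mean(), 1))
print("Test 2 mean for low scorers:", round(test2[low_scorers].mean(), 1))
# The second mean is noticeably higher: the bad luck from test 1 doesn't repeat.
```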

Compare: Confounding vs. Simpson's Paradox—both involve hidden variables distorting conclusions, but confounding distorts an apparent relationship because a lurking variable is doing the work, while Simpson's Paradox can actually reverse an apparent relationship when you look at subgroups.


Errors from Multiple Testing

Running many tests on the same data inflates your overall error rate in ways that single-test reasoning doesn't capture.

Multiple Comparison Error

  • Each test at α = 0.05 has a 5% false positive rate, but running 20 tests gives roughly one false positive by chance
  • Family-wise error rate accumulates: with k independent tests, overall Type I error ≈ 1 - (1 - α)^k
  • Corrections like Bonferroni (use α/k for each test) help control the overall false positive rate
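
The numbers in those bullets are easy to verify directly; this is just the arithmetic, assuming the 20 tests are independent:

```python
# Expected false positives, family-wise error rate, and the Bonferroni-adjusted
# per-test significance level for k independent tests with no real effects.
alpha, k = 0.05, 20

expected_false_positives = k * alpha            # about 1 of the 20 tests
family_wise_error = 1 - (1 - alpha) ** k        # chance of at least one false positive
bonferroni_alpha = alpha / k                    # per-test level controlling the family

print(expected_false_positives)        # 1.0
print(round(family_wise_error, 3))     # 0.642
print(bonferroni_alpha)                # 0.0025
```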

Survivorship Bias

  • Analyzing only "successful" cases while ignoring failures leads to systematically optimistic conclusions
  • Common in business and finance: studying only companies that survived ignores those that failed using similar strategies
  • Prevention: ensure your sample includes all relevant cases, not just visible successes
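
A minimal sketch (invented returns and a hypothetical "losing funds shut down" rule) of how survivorship inflates an average:

```python
# Average return looks great if you only study the funds that survived.
import numpy as np

rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.10, 5_000)     # true average return is 0%
survivors = returns[returns > -0.05]       # funds that lost badly shut down

print("All funds:      ", f"{returns.mean():+.1%}")
print("Survivors only: ", f"{survivors.mean():+.1%}")   # looks solidly positive
```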

Compare: Multiple comparison error vs. survivorship bias—multiple comparison error inflates false positives from running too many tests, while survivorship bias creates false conclusions by examining an incomplete dataset. Both require thinking beyond the data you can see.


Quick Reference Table

Concept | Best Examples
Hypothesis testing decisions | Type I Error, Type II Error
Error probability factors | Significance level (α), Power (1 - β), Sample size
Sampling and measurement | Sampling error, Selection bias, Measurement error
Relationship interpretation | Confounding, Simpson's Paradox, Regression to the mean
Multiple testing issues | Multiple comparison error, Survivorship bias
Increases power | Larger n, larger effect size, larger α, smaller variability
Cannot be fixed by larger n | Selection bias, Confounding, Measurement error
Requires randomization to address | Confounding (for causal claims)

Self-Check Questions

  1. A researcher sets α = 0.01 instead of α = 0.05. How does this change affect the probability of Type I error? The probability of Type II error? Explain the trade-off.

  2. Which two errors both involve systematic problems that cannot be reduced by increasing sample size? What distinguishes them from sampling error?

  3. A study finds that a tutoring program improves test scores, but students were enrolled in the program after scoring below average. What statistical phenomenon might explain the improvement even if the program has no effect?

  4. Compare and contrast confounding and Simpson's Paradox. In what way do both involve "hidden" variables, and how do their effects on conclusions differ?

  5. FRQ-style: A pharmaceutical company tests 20 different drug compounds for effectiveness, using α = 0.05 for each test. If none of the drugs actually work, approximately how many false positives would you expect? What adjustment could the company make to control the overall Type I error rate?