
Types of Statistical Errors


Why This Matters

Statistical errors are at the heart of what makes hypothesis testing both powerful and risky. When you conduct a significance test, you're making a decision under uncertainty, and that means you can be wrong in predictable ways. The AP Statistics exam tests your ability to distinguish between Type I errors (false positives) and Type II errors (false negatives), understand what factors influence their probabilities, and explain the real-world consequences of each in context. You'll also need to recognize how sampling variability, bias, and confounding can undermine the validity of statistical conclusions.

Don't just memorize that "Type I = rejecting a true null." Understand why we set significance levels, how sample size affects power, and what trade-offs researchers face when designing studies. The exam frequently asks you to identify which error is more serious in a given context or to explain what would happen if you changed α. Master the underlying logic, and you'll be ready for both multiple-choice questions and FRQs that ask you to interpret errors in real scenarios.


Errors in Hypothesis Testing Decisions

When you perform a hypothesis test, you're making a binary decision: reject H₀ or fail to reject H₀. Since you never know the true state of reality, either decision could be wrong.

Type I Error (False Positive)

  • Rejecting H₀ when it's actually true. You conclude there's an effect or difference when none exists.
  • Probability equals α (the significance level), which you choose before conducting the test.
  • Real-world consequence: taking unnecessary action based on false evidence, such as approving an ineffective drug or implementing a costly policy change that does nothing.

Type II Error (False Negative)

  • Failing to reject H₀ when it's actually false. You miss a real effect that exists in the population.
  • Probability denoted by β; power = 1 − β represents your ability to detect true effects.
  • Real-world consequence: missed opportunities or continued harm, such as failing to detect a disease outbreak or not recognizing an effective treatment.

Compare: Type I vs. Type II errors both involve incorrect conclusions, but Type I means seeing something that isn't there while Type II means missing something that is. On FRQs, always connect the error type to the specific context (e.g., "concluding the new drug lowers blood pressure when it actually doesn't" for Type I, or "failing to conclude the drug works when it really does" for Type II).
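The claim that the Type I error rate equals α can be checked with a quick simulation (a minimal Python sketch; the sample size, seed, and trial count are arbitrary choices, not from the text): when H₀ is true, a test at α = 0.05 should reject about 5% of the time.

```python
import random
import math

random.seed(42)

ALPHA = 0.05
Z_CRIT = 1.96          # two-sided critical value for alpha = 0.05
N, TRIALS = 30, 10_000

rejections = 0
for _ in range(TRIALS):
    # H0 is true here: the population mean really is 0 (sigma = 1 known)
    sample = [random.gauss(0, 1) for _ in range(N)]
    z = (sum(sample) / N) / (1 / math.sqrt(N))   # z = xbar / (sigma / sqrt(n))
    if abs(z) > Z_CRIT:
        rejections += 1

type1_rate = rejections / TRIALS
print(round(type1_rate, 3))   # close to 0.05, as the theory predicts
```

Every rejection in this simulation is, by construction, a false positive, since the null hypothesis is true in every trial.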


Factors That Influence Error Probabilities

The probabilities of Type I and Type II errors aren't fixed. They depend on choices you make and characteristics of your study.

Significance Level (α)

  • Directly sets the Type I error rate. Choosing α = 0.05 means you accept a 5% chance of rejecting a true null.
  • Trade-off with Type II error: decreasing α makes it harder to reject H₀, which increases β. Think of it this way: the stricter your standard for "convincing evidence," the more likely you are to miss a real effect.
  • Context determines your choice. Use a smaller α (like 0.01) when false positives are especially costly, such as convicting an innocent person. Use a larger α when missing a real effect is the bigger concern.

Statistical Power

Power = 1 − β represents the probability of correctly rejecting a false null hypothesis. In plain terms, it's your test's ability to detect a real effect when one exists.

Four factors increase power:

  • Larger sample size (n): reduces variability in your sampling distribution
  • Larger true effect size: a bigger real difference is easier to detect
  • Larger α: a less strict threshold makes rejection easier (but raises Type I error risk)
  • Smaller population variability (σ): less noise means the signal stands out more

A target power of 0.80 is conventional, meaning you want at least an 80% chance of detecting a real effect.
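The sample-size effect on power can be seen concretely in a hypothetical simulation (Python sketch; the true effect of 0.5, σ = 1, and trial count are assumed values): the same real effect is detected far more often with n = 40 than with n = 10.

```python
import random
import math

def simulated_power(n, mu=0.5, sigma=1.0, z_crit=1.96, trials=5_000):
    """Estimate power: the fraction of samples in which we reject H0: mu = 0,
    when the true mean is actually mu (so every rejection is correct)."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        z = xbar / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            hits += 1
    return hits / trials

random.seed(1)
p_small = simulated_power(10)
p_large = simulated_power(40)
print(round(p_small, 2), round(p_large, 2))   # larger n -> noticeably higher power
```

Quadrupling n here roughly doubles the z-statistic for the same effect, which is why the larger sample clears the 0.80 convention while the smaller one falls well short.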

Sample Size Effect

  • Larger n decreases standard error, making it easier to detect true effects and reducing Type II error.
  • Does not change α directly. The significance level is set by the researcher before the test.
  • Power calculations help determine the minimum sample size needed to detect a meaningful effect before you collect data.

Compare: Increasing α vs. increasing n: both increase power, but increasing α also increases Type I error risk, while increasing n reduces uncertainty without that trade-off. If an FRQ asks how to increase power without raising false positive risk, sample size is your answer.


Errors from Data Collection Problems

These errors occur before you even run a hypothesis test. They threaten the validity of your entire study.

Sampling Error

Sampling error is the random variation between a sample statistic and the true population parameter. Every sample you draw will give slightly different results, and that's expected.

  • Reduced by larger sample sizes but never eliminated entirely.
  • Not the same as bias: sampling error averages out over many samples, while bias does not.

Selection Bias

Selection bias is a systematic tendency for the sample to differ from the population in a particular direction. Your results can't be generalized because your sample doesn't represent the population.

  • Caused by non-random sampling: convenience samples (surveying only your friends), voluntary response (only motivated people reply), or undercoverage of certain groups.
  • Cannot be fixed by increasing sample size. A bigger biased sample is still biased. Only proper random sampling prevents this.
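A small simulation makes this contrast concrete (a hypothetical Python sketch; the population and sample sizes are arbitrary): random-sampling error shrinks as n grows, while a convenience sample drawn from only part of the population stays far from the truth no matter how big it is.

```python
import random
import statistics

random.seed(0)
population = list(range(1, 1001))           # true mean = 500.5
true_mean = statistics.mean(population)

def random_sample_error(n):
    """Average absolute error of a simple random sample's mean over many draws."""
    errs = [abs(statistics.mean(random.sample(population, n)) - true_mean)
            for _ in range(2_000)]
    return statistics.mean(errs)

def biased_sample_mean(n):
    """'Convenience' sample drawn only from the first half of the population."""
    return statistics.mean(random.sample(population[:500], n))

print(round(random_sample_error(25), 1))    # sampling error at n = 25...
print(round(random_sample_error(100), 1))   # ...shrinks at n = 100
print(round(biased_sample_mean(400), 1))    # biased even at n = 400: near 250, not 500.5
```

Note that the biased estimate uses a sample 4x larger than the random ones, yet it is off by roughly half the true mean, which is exactly the "bigger biased sample is still biased" point.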

Measurement Error

Measurement error is inaccuracy in how variables are recorded, affecting both the validity and reliability of your data.

  • Sources include: faulty instruments, unclear survey questions (e.g., leading or confusing wording), or inconsistent data collection procedures across sites.
  • Threatens internal validity even when sampling is done correctly.

Compare: Sampling error vs. selection bias is a distinction the AP exam frequently tests. Sampling error is random variation that decreases with larger nn, while selection bias is systematic distortion that persists regardless of sample size. If a question asks what problem cannot be solved by collecting more data, think bias.


Errors in Interpreting Relationships

Even with good data and correct calculations, you can draw wrong conclusions about what the results mean.

Confounding Error

A confounding variable influences both the explanatory and response variables, creating a false appearance of causation. For example, ice cream sales and drowning rates both increase in summer. Temperature is the confounder; ice cream doesn't cause drowning.

  • Only randomized experiments can establish causation because random assignment balances confounders across treatment groups.
  • In observational studies: always consider alternative explanations for observed associations.

Simpson's Paradox

Simpson's Paradox occurs when a trend reverses or disappears when data is combined across groups. What's true for each subgroup isn't true for the whole dataset.

This happens because of unequal group sizes or a lurking variable related to both the grouping and the outcome. A classic example: a treatment might appear worse overall but actually perform better within every age group, because sicker (older) patients disproportionately received that treatment.

  • Solution: examine data at appropriate levels of stratification before drawing conclusions.
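The reversal is easy to verify with hypothetical counts (these follow the classic kidney-stone pattern; the numbers are illustrative, not from the text): Treatment A has the higher success rate within each severity group, yet the lower rate overall, because A was given mostly to the severe cases.

```python
# group: (A successes, A total, B successes, B total) -- hypothetical counts
data = {
    "mild":   (81, 87, 234, 270),
    "severe": (192, 263, 55, 80),
}

for group, (a_s, a_n, b_s, b_n) in data.items():
    # A wins within each subgroup
    print(f"{group}: A={a_s/a_n:.0%}  B={b_s/b_n:.0%}")

# Pool the groups: the lurking variable (severity) flips the comparison
a_s = sum(v[0] for v in data.values()); a_n = sum(v[1] for v in data.values())
b_s = sum(v[2] for v in data.values()); b_n = sum(v[3] for v in data.values())
print(f"pooled: A={a_s/a_n:.0%}  B={b_s/b_n:.0%}")   # B wins pooled
```

Stratifying by severity, as the solution bullet recommends, recovers the subgroup comparison that the pooled totals hide.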

Regression to the Mean

Regression to the mean describes how extreme observations tend to be followed by less extreme ones, purely due to natural variability.

  • Mistaken for causation when an intervention is applied after extreme values are observed.
  • Example: students who score very low on one test often improve on the next, even without tutoring. Their first score was partly due to bad luck, and that bad luck is unlikely to repeat.
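This can be demonstrated by simulating scores as stable ability plus random luck (a Python sketch; the means, spreads, and bottom-10% cutoff are assumed values): the lowest scorers on test 1 improve on test 2 with no intervention at all.

```python
import random
import statistics

random.seed(7)

# Each "student" has a stable ability; each test adds independent luck.
ability = [random.gauss(70, 5) for _ in range(10_000)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select the students who scored lowest on test 1 (bottom 10%)
cutoff = sorted(test1)[len(test1) // 10]
low = [i for i, s in enumerate(test1) if s <= cutoff]

m1 = statistics.mean(test1[i] for i in low)
m2 = statistics.mean(test2[i] for i in low)
print(round(m1, 1), round(m2, 1))   # test-2 mean rises with no tutoring at all
```

The selected students' first scores were partly bad luck; their second scores draw fresh luck, so the group's mean drifts back toward their true average ability.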

Compare: Confounding and Simpson's Paradox both involve hidden variables distorting conclusions, but confounding obscures a true relationship (or creates a false one), while Simpson's Paradox can actually reverse an apparent relationship when you look at subgroups.


Errors from Multiple Testing

Running many tests on the same data inflates your overall error rate in ways that single-test reasoning doesn't capture.

Multiple Comparison Error

Each test at α = 0.05 has a 5% false positive rate individually. But if you run 20 tests and none of the effects are real, you'd expect about 1 false positive by chance alone.

The family-wise error rate accumulates: with k independent tests, the probability of at least one Type I error is approximately 1 − (1 − α)^k. For 20 tests at α = 0.05, that's 1 − (0.95)^20 ≈ 0.64, or a 64% chance of at least one false positive.

Corrections like Bonferroni address this by using α/k for each individual test, keeping the overall false positive rate near your desired α.
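Both the family-wise error rate formula and the Bonferroni fix are one-liners to compute (a Python sketch of the arithmetic above):

```python
# Family-wise error rate for k independent tests, and the Bonferroni fix
k, alpha = 20, 0.05

fwer = 1 - (1 - alpha) ** k
print(round(fwer, 2))            # 0.64: a 64% chance of at least one false positive

bonferroni_alpha = alpha / k     # run each individual test at 0.05 / 20 = 0.0025
fwer_corrected = 1 - (1 - bonferroni_alpha) ** k
print(round(fwer_corrected, 3))  # back just under the desired 0.05
```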

Survivorship Bias

Survivorship bias occurs when you analyze only "successful" cases while ignoring failures, leading to systematically optimistic conclusions.

  • Common in business and finance: studying only companies that survived ignores those that failed using similar strategies, making those strategies look better than they are.
  • Prevention: ensure your sample includes all relevant cases, not just visible successes.

Compare: Multiple comparison error inflates false positives from running too many tests, while survivorship bias creates false conclusions by examining an incomplete dataset. Both require thinking beyond the data right in front of you.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Hypothesis testing decisions | Type I Error, Type II Error |
| Error probability factors | Significance level (α), Power (1 − β), Sample size |
| Sampling and measurement | Sampling error, Selection bias, Measurement error |
| Relationship interpretation | Confounding, Simpson's Paradox, Regression to the mean |
| Multiple testing issues | Multiple comparison error, Survivorship bias |
| Increases power | Larger n, larger effect size, larger α, smaller variability |
| Cannot be fixed by larger n | Selection bias, Confounding, Measurement error |
| Requires randomization to address | Confounding (for causal claims) |

Self-Check Questions

  1. A researcher sets α = 0.01 instead of α = 0.05. How does this change affect the probability of Type I error? The probability of Type II error? Explain the trade-off.

  2. Which two errors both involve systematic problems that cannot be reduced by increasing sample size? What distinguishes them from sampling error?

  3. A study finds that a tutoring program improves test scores, but students were enrolled in the program after scoring below average. What statistical phenomenon might explain the improvement even if the program has no effect?

  4. Compare and contrast confounding and Simpson's Paradox. In what way do both involve "hidden" variables, and how do their effects on conclusions differ?

  5. FRQ-style: A pharmaceutical company tests 20 different drug compounds for effectiveness, using α = 0.05 for each test. If none of the drugs actually work, approximately how many false positives would you expect? What adjustment could the company make to control the overall Type I error rate?
