Statistical fallacies are the hidden traps that turn good data into bad conclusions—and you're being tested on your ability to spot them. These errors show up everywhere: in research studies, news headlines, business decisions, and yes, on your exams. Understanding fallacies isn't just about avoiding mistakes; it's about demonstrating mastery of core statistical principles like independence, sampling theory, conditional probability, and the distinction between correlation and causation.
Here's the key insight: each fallacy represents a violation of a specific statistical principle. When you encounter a fallacy question, you're really being asked to identify which principle was broken. Don't just memorize the names—know what concept each fallacy illustrates and why the reasoning fails. That's what separates a 5 from a 3.
Causation fallacies involve misunderstanding how variables relate to each other. The core principle: association between variables tells us nothing about the direction or mechanism of influence without proper experimental design.
Compare: Correlation ≠ Causation vs. Simpson's Paradox—both involve misreading relationships between variables, but correlation errors ignore confounders while Simpson's Paradox involves confounders that reverse apparent effects when data is stratified. If an FRQ presents aggregated vs. disaggregated data showing opposite trends, that's Simpson's Paradox.
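To make the reversal concrete, here is a minimal sketch with hypothetical admissions-style data (every count is invented for illustration): within each department women are admitted at a higher rate, yet men have the higher rate overall, because most men apply to the easier department.

```python
# Hypothetical admissions-style data illustrating Simpson's Paradox.
# All counts are invented: within each department women have the higher
# admission rate, yet men have the higher rate overall, because most
# men apply to the easier department (A).

groups = {
    # dept: (men_admitted, men_applied, women_admitted, women_applied)
    "A": (80, 100, 9, 10),   # men 80%, women 90%  (easy department)
    "B": (2, 10, 30, 100),   # men 20%, women 30%  (hard department)
}

for dept, (ma, mp, wa, wp) in groups.items():
    print(f"Dept {dept}: men {ma / mp:.0%}, women {wa / wp:.0%}")

men_adm = sum(g[0] for g in groups.values())
men_app = sum(g[1] for g in groups.values())
wom_adm = sum(g[2] for g in groups.values())
wom_app = sum(g[3] for g in groups.values())

# Aggregated, the trend reverses: men ~75%, women ~35%.
print(f"Overall: men {men_adm / men_app:.0%}, women {wom_adm / wom_app:.0%}")
```

The confounder here is department difficulty: stratifying by department exposes the within-group trend that the aggregate hides.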
Probability fallacies stem from misunderstanding how probability works, especially independence and conditional probability. The principle: past outcomes of independent events provide zero information about future outcomes.
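A quick simulation makes the independence principle tangible: in a long run of fair coin flips, the flip immediately after a streak of five heads is still heads about half the time. This is a sketch; the streak length and sample size are arbitrary choices.

```python
import random

# Sketch: simulate fair coin flips and check the outcome immediately
# following a streak of five heads. Streak length (5) and sample size
# (1,000,000) are arbitrary illustration choices.

random.seed(42)
flips = [random.random() < 0.5 for _ in range(1_000_000)]  # True = heads

after_streak = []
streak = 0
for flip in flips:
    if streak >= 5:                 # the previous five flips were all heads
        after_streak.append(flip)
    streak = streak + 1 if flip else 0

rate = sum(after_streak) / len(after_streak)
print(f"P(heads | previous 5 flips were heads) ~ {rate:.3f}")  # ~0.5
```

The streak carries no information: the conditional frequency stays near 0.5, which is exactly what independence predicts.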
Compare: Gambler's Fallacy vs. Base Rate Fallacy—both involve probability errors, but the gambler's fallacy misunderstands independence while the base rate fallacy misunderstands conditional probability. The gambler ignores that successive events are independent; someone committing the base rate fallacy fails to weight prior probabilities (base rates) correctly.
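A worked Bayes' theorem example shows why base rates matter. The numbers here are hypothetical: a disease with prevalence 1 in 10,000 and a test with 99% sensitivity and 99% specificity. Even after a positive result, the posterior probability of disease is only about 1%, because false positives from the huge healthy population swamp the true positives.

```python
# Worked Bayes' theorem example with hypothetical numbers: a disease
# with prevalence 1 in 10,000 and a test that is 99% accurate in both
# directions (sensitivity and specificity).

prevalence = 1 / 10_000
sensitivity = 0.99            # P(positive | disease)
specificity = 0.99            # P(negative | no disease)

# Total probability of a positive test: true positives + false positives.
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))

# Posterior probability of disease given a positive test.
p_disease_given_pos = sensitivity * prevalence / p_positive

print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # ~0.0098, about 1%
```

Ignoring the 1-in-10,000 prior and reading "99% accurate" as "99% chance of disease" is precisely the base rate fallacy.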
Sampling fallacies occur when the data we analyze don't represent the population we care about. The principle: conclusions are valid only for the population from which we properly sampled.
Compare: Survivorship Bias vs. Sampling Bias—both produce unrepresentative data, but survivorship bias specifically excludes failures/non-survivors while sampling bias can skew in any direction depending on the selection mechanism. Survivorship bias is a specific type of selection bias with a predictable direction (toward success).
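A small simulation illustrates that predictable direction. The setup is entirely hypothetical: 1,000 funds earn random mean-zero returns for 10 years, and any fund that drops below half its starting value is closed. Averaging only the survivors overstates typical performance.

```python
import random

# Survivorship bias sketch (all numbers hypothetical): 1,000 funds earn
# random mean-zero annual returns for 10 years; a fund that ever falls
# below 50% of its starting value is closed. Survivors-only averages
# are biased upward by construction.

random.seed(1)
all_final, surviving_final = [], []
for _ in range(1000):
    value, closed = 1.0, False
    for _ in range(10):
        value *= 1 + random.gauss(0.0, 0.15)  # mean-zero annual return
        if value < 0.5:
            closed = True
            break
    all_final.append(value)
    if not closed:
        surviving_final.append(value)

print(f"mean final value, all funds: {sum(all_final) / len(all_final):.2f}")
print(f"mean final value, survivors: {sum(surviving_final) / len(surviving_final):.2f}")
```

The selection mechanism (closure) removes exactly the low outcomes, so the survivor average must exceed the true average—the bias has a known direction.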
Data-handling fallacies involve how we process and interpret data after collection. The principle: honest analysis requires considering all relevant evidence and building models that generalize, not just fit.
Compare: Cherry-Picking vs. Overfitting—both lead to conclusions that won't replicate, but cherry-picking is a data selection problem while overfitting is a model complexity problem. Cherry-picking manipulates which data enters analysis; overfitting manipulates how flexibly the model conforms to that data.
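Overfitting in miniature, using invented data generated from a simple linear rule plus noise: a degree-15 polynomial fits the training points better than a straight line but predicts new data worse.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Overfitting sketch with invented data: y = 2x + noise. A flexible
# degree-15 polynomial chases the training noise; the simple line
# generalizes better to fresh data from the same process.

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = 2 * x_train + rng.normal(0, 0.2, 20)
x_test = np.linspace(0, 1, 200)
y_test = 2 * x_test + rng.normal(0, 0.2, 200)

results = {}
for degree in (1, 15):
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:>2}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The gap between training and test error is the signature of overfitting: low in-sample error came from memorizing noise, not from learning structure.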
| Concept | Best Examples |
|---|---|
| Causation errors | Correlation ≠ Causation, Simpson's Paradox |
| Probability misunderstanding | Gambler's Fallacy, Base Rate Fallacy |
| Selection/sampling problems | Survivorship Bias, Sampling Bias, Ecological Fallacy |
| Natural variation | Regression to the Mean |
| Data manipulation | Cherry-Picking |
| Model complexity | Overfitting |
| Aggregation problems | Simpson's Paradox, Ecological Fallacy |
| Independence violations | Gambler's Fallacy |
1. A company notices that employees who attend training sessions have higher performance reviews and concludes the training is effective. Which two fallacies might be at play, and how would you design a study to establish causation?
2. Compare and contrast survivorship bias and sampling bias. Both involve unrepresentative data—what distinguishes when each applies?
3. A basketball player makes 10 free throws in a row. Her coach benches her for the next game, and she then makes only 6 of 10. The coach claims the rest helped her "come back to earth." What fallacy explains the decline without invoking the coach's theory?
4. A rare disease affects 1 in 10,000 people, and a test for it is 99% accurate (both sensitivity and specificity). If someone tests positive, why do they still probably not have the disease? Which fallacy does ignoring this represent?
5. An analyst builds a model with 50 predictor variables that explains 98% of the variance in historical stock returns but performs terribly on new data. Identify the fallacy and explain which statistical principle was violated.