Hypothesis testing is the backbone of statistical inference—it's how you move from "I think there's a pattern here" to "I can confidently claim this effect is real." Every time you see a study claiming a new drug works, a policy made a difference, or two groups behave differently, hypothesis testing is doing the heavy lifting behind the scenes. You're being tested on your ability to choose the right test for the right situation, interpret results correctly, and understand what "statistically significant" actually means.
The methods in this guide aren't just formulas to memorize—they represent different tools for different jobs. Some compare means, others compare variances or distributions. Some require your data to be normally distributed; others don't care. The key concepts you need to master include parametric vs. non-parametric approaches, comparing means vs. variances vs. distributions, and when assumptions matter. Don't just memorize which test does what—know why you'd reach for one tool instead of another.
Most hypothesis tests you'll encounter ask a simple question: are these means different? The tests below handle this question under different conditions—known vs. unknown variance, one group vs. two, independent vs. paired observations.
Compare: Z-test vs. T-test—both compare means to a reference value, but Z-tests require known population variance while T-tests estimate it from sample data. On exams, if they give you the population standard deviation σ, think Z-test; if they only give you the sample standard deviation s, think T-test.
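A minimal sketch of the difference in practice, using SciPy with made-up sample data (the values, the null mean of 50, and the "known" σ = 1.5 are all assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 20 measurements; H0: mu = 50
sample = np.array([52.1, 49.3, 51.8, 50.7, 48.9, 53.2, 50.1, 49.8,
                   51.5, 52.4, 50.9, 49.2, 51.1, 50.3, 52.8, 49.6,
                   50.5, 51.9, 48.7, 52.0])
mu0 = 50.0

# Z-test: population standard deviation assumed KNOWN (here sigma = 1.5)
sigma = 1.5
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z))          # two-sided p-value from the normal

# T-test: sigma unknown, estimated from the sample itself
t, p_t = stats.ttest_1samp(sample, popmean=mu0)

print(f"Z-test: z = {z:.3f}, p = {p_z:.4f}")
print(f"T-test: t = {t:.3f}, p = {p_t:.4f}")
```

With a large sample the two give nearly identical answers; the T-test's heavier-tailed distribution only matters when n is small and σ is estimated.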
When you have three or more groups, running multiple T-tests inflates your Type I error rate. ANOVA solves this by testing all groups simultaneously using variance decomposition—comparing variation between groups to variation within groups.
Compare: T-test vs. ANOVA—T-tests handle two groups; ANOVA handles three or more. If an FRQ gives you multiple treatment conditions, ANOVA is your go-to. Remember: ANOVA tells you something differs but not what—you need post-hoc tests for that.
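Here is a short sketch of a one-way ANOVA on three hypothetical treatment groups (the scores are invented; `scipy.stats.f_oneway` does the variance decomposition for you):

```python
from scipy import stats

# Hypothetical scores under three treatment conditions
group_a = [85, 88, 90, 84, 87]
group_b = [78, 82, 80, 79, 81]
group_c = [91, 89, 93, 90, 92]

# One-way ANOVA tests H0: all group means are equal.
# A small p-value says *something* differs, not *which* groups differ --
# follow up with a post-hoc test (e.g., Tukey's HSD) to locate the difference.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```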
These methods go beyond "are groups different?" to ask "how are variables related?" and "which model explains the data better?" Regression quantifies relationships; likelihood ratio tests compare competing explanations.
Compare: Regression coefficients vs. Likelihood ratio tests—coefficient tests ask "does this one predictor matter?" while likelihood ratio tests ask "does this set of predictors improve the model?" Use likelihood ratios when comparing models with different numbers of parameters.
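A sketch of both ideas on simulated data, assuming statsmodels is available (the data-generating process and the extra predictor `x2` are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data: y depends on x1; x2 is pure noise
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)

# Coefficient tests: does each individual predictor matter?
X_full = sm.add_constant(np.column_stack([x1, x2]))
full = sm.OLS(y, X_full).fit()
print(full.pvalues)            # per-coefficient t-test p-values

# Likelihood ratio test: does adding x2 improve the model as a whole?
X_reduced = sm.add_constant(x1)
reduced = sm.OLS(y, X_reduced).fit()
lr_stat = 2 * (full.llf - reduced.llf)       # 2 x difference in log-likelihood
df = X_full.shape[1] - X_reduced.shape[1]    # number of extra parameters
p_lr = stats.chi2.sf(lr_stat, df)
print(f"LR statistic = {lr_stat:.3f}, p = {p_lr:.4f}")
```

The coefficient p-values judge one predictor at a time; the likelihood ratio statistic compares the full and reduced models as wholes, which is why it is the right tool when the models differ by several parameters at once.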
Not all data is continuous. When you're working with counts, categories, or frequencies, you need tests designed for discrete distributions. The chi-square test is your primary tool here.
Compare: Chi-square vs. T-test—Chi-square handles categorical data (counts in categories); T-tests handle continuous data (measured values). If your data is "how many people chose option A vs. B," think chi-square. If it's "what was the average score," think T-test.
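A quick sketch of a chi-square test of independence on a hypothetical 2x2 table of counts (the numbers are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical counts: how many people chose option A vs. B, split by group
#                      A    B
observed = np.array([[30, 20],    # group 1
                     [15, 35]])   # group 2

# Chi-square test of independence: are choice and group associated?
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("expected counts:\n", expected)   # check the expected-count rule of thumb (>= 5 per cell)
```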
Parametric tests (Z, T, ANOVA) assume your data is normally distributed. When that assumption is violated—small samples, skewed distributions, ordinal data—non-parametric tests save the day by working with ranks instead of raw values.
Compare: Wilcoxon vs. Kolmogorov-Smirnov—both are non-parametric, but Wilcoxon focuses on whether one group tends to have larger values (like a median comparison), while K-S tests whether the two distributions match across their entire shape. Wilcoxon is your T-test replacement; K-S is for distribution comparison.
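A small sketch contrasting the two on hypothetical skewed samples (SciPy's `mannwhitneyu` implements the Wilcoxon rank-sum idea for independent groups):

```python
from scipy import stats

# Two small, skewed samples with outliers (hypothetical)
x = [1.2, 1.5, 2.1, 2.3, 9.8, 1.9, 2.0]
y = [3.4, 3.1, 2.9, 4.0, 3.8, 12.5, 3.3]

# Wilcoxon rank-sum / Mann-Whitney U: do values in one group tend to be larger?
u_stat, p_u = stats.mannwhitneyu(x, y, alternative="two-sided")

# Kolmogorov-Smirnov: could the two samples come from the same distribution at all?
ks_stat, p_ks = stats.ks_2samp(x, y)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_u:.4f}")
print(f"K-S statistic  = {ks_stat:.3f}, p = {p_ks:.4f}")
```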
When you can't rely on theoretical distributions or your sample is unusual, bootstrap methods let you empirically estimate sampling distributions by repeatedly resampling your own data.
Compare: Bootstrap vs. Traditional tests—traditional tests use theoretical distributions (Z, T, F, chi-square); bootstrap builds the distribution empirically from your data. When exam questions mention "violated assumptions" or "unusual statistics," bootstrap is often the answer.
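A minimal bootstrap sketch, assuming only NumPy (the skewed sample is simulated; the statistic of interest here is the median):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical skewed sample where normal-theory intervals would be shaky
data = rng.exponential(scale=2.0, size=40)

# Bootstrap: resample WITH replacement many times and recompute the statistic
# each time, building an empirical sampling distribution from the data itself.
n_boot = 10_000
boot_medians = np.array([
    np.median(rng.choice(data, size=len(data), replace=True))
    for _ in range(n_boot)
])

# Percentile 95% confidence interval for the median
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"sample median = {np.median(data):.2f}")
print(f"bootstrap 95% CI for the median: ({lo:.2f}, {hi:.2f})")
```

No theoretical distribution is ever invoked: the spread of the resampled medians stands in for the sampling distribution you couldn't derive analytically.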
| Concept | Best Examples |
|---|---|
| Comparing one mean to a known value | Z-test (variance known), One-sample T-test (variance unknown) |
| Comparing two independent means | Two-sample T-test, Wilcoxon rank-sum |
| Comparing paired/dependent observations | Paired T-test |
| Comparing three or more means | One-way ANOVA, Two-way ANOVA |
| Comparing variances | F-test |
| Testing categorical associations | Chi-square test |
| Modeling relationships between variables | Simple regression, Multiple regression |
| Comparing model fit | Likelihood ratio test |
| Distribution comparison (non-parametric) | Kolmogorov-Smirnov test |
| Assumption-free inference | Bootstrap methods |
You have two independent groups and non-normal data with several outliers. Which two tests could you use, and why might you prefer the non-parametric option?
A researcher wants to test whether a new teaching method improves scores by measuring the same students before and after the intervention. Which test is appropriate, and why would a two-sample T-test be incorrect here?
Compare and contrast one-way ANOVA and the two-sample T-test. Under what conditions does ANOVA become necessary, and what additional information does two-way ANOVA provide?
An FRQ presents count data showing how many customers preferred each of four product designs across three age groups. Which test would you use, and what assumption must you verify before proceeding?
Your data violates normality assumptions, and you need to construct a 95% confidence interval for the median. Which method allows you to do this without relying on theoretical distributions, and how does it work?