Hypothesis testing is the backbone of statistical inference—it's how we move from "I think there's a pattern here" to "I can demonstrate this effect is real." Every time you're asked to determine whether results are statistically significant, compare group means, or assess whether data fits an expected distribution, you're applying these tests. The AP exam will test your ability to choose the right test for a given scenario, interpret results correctly, and understand the assumptions that make each test valid.
Don't just memorize test names and formulas. You're being tested on when to use each test, what assumptions must hold, and how to interpret the results. Know the difference between parametric and non-parametric approaches, understand why sample size matters, and recognize the relationship between tests that serve similar purposes under different conditions. Master these distinctions, and you'll handle any hypothesis testing question with confidence.
These tests answer the fundamental question: Is the difference between group means large enough to be meaningful, or just random noise? They assume your data follows a normal distribution and use that structure to calculate precise probabilities.
Compare: Z-test vs. T-test—both compare means to detect significant differences, but Z-tests require known population variance while T-tests estimate it from sample data. On the AP exam, if you're given σ (the population standard deviation), think Z-test; if you're given only s (the sample standard deviation), think T-test.
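The σ-vs-s distinction can be sketched in Python with SciPy. The sample values, hypothesized mean, and "known" σ below are all invented for illustration—the point is that the Z-test plugs in σ directly, while the T-test estimates the standard deviation from the sample:

```python
import numpy as np
from scipy import stats

sample = np.array([102.0, 99.5, 101.2, 98.8, 103.1, 100.4, 97.9, 101.8])
mu0 = 100.0  # hypothesized population mean (made up)

# Z-test: population sigma is KNOWN (assumed sigma = 2.0 here)
sigma = 2.0
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the normal distribution

# T-test: sigma unknown, estimated from the sample (uses n - 1 degrees of freedom)
t_stat, p_t = stats.ttest_1samp(sample, mu0)

print(f"Z = {z:.3f}, p = {p_z:.3f}")
print(f"t = {t_stat:.3f}, p = {p_t:.3f}")
```

With large samples the two give nearly identical answers, which is why the T-distribution converges to the normal as n grows.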
Compare: T-test vs. ANOVA—T-tests handle two-group comparisons, while ANOVA extends this to three or more groups. If an FRQ asks you to compare multiple treatment conditions, ANOVA is your answer; follow up with post-hoc tests to identify which specific groups differ.
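The two-group vs. three-plus-group split maps directly onto two SciPy calls. The treatment scores below are invented; a post-hoc procedure such as Tukey's HSD would follow a significant ANOVA to pinpoint which groups differ:

```python
from scipy import stats

# Made-up scores from three treatment conditions
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 79, 81]
group_c = [91, 94, 89, 95, 93]

# Two groups -> independent-samples T-test
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Three or more groups -> one-way ANOVA
f_stat, p_f = stats.f_oneway(group_a, group_b, group_c)

print(f"T-test (A vs. B): t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"ANOVA (A, B, C):  F = {f_stat:.2f}, p = {p_f:.4f}")
```

Note that ANOVA only tells you *some* group differs—not which one—hence the need for post-hoc tests.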
Sometimes the question isn't about means—it's about spread or association. These tests examine whether variability differs between groups or whether variables move together in predictable ways.
Compare: F-test vs. ANOVA—both use the F-distribution, but F-tests compare two variances directly while ANOVA uses F-statistics to compare variation between groups to variation within groups. Understanding this distinction helps you interpret ANOVA output correctly.
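A minimal sketch of the two-variance F-test (SciPy has no single-call variance F-test, so the ratio is computed directly; the data are invented). Contrast the ratio here—variance of one sample over variance of another—with ANOVA's F, which is between-group variation over within-group variation:

```python
import numpy as np
from scipy import stats

a = np.array([4.1, 5.3, 3.8, 6.2, 4.9, 5.5])  # more spread out
b = np.array([5.0, 5.1, 4.9, 5.2, 5.0, 4.8])  # tightly clustered

# F-test for equal variances: ratio of the two sample variances
F = np.var(a, ddof=1) / np.var(b, ddof=1)
dfn, dfd = len(a) - 1, len(b) - 1

# Two-sided p-value from the F-distribution
p = 2 * min(stats.f.sf(F, dfn, dfd), stats.f.cdf(F, dfn, dfd))

print(f"F = {F:.2f}, p = {p:.4f}")
```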
When your data consists of counts or categories rather than continuous measurements, you need tests designed for frequencies. These tests compare what you observed to what you'd expect under the null hypothesis.
Compare: Chi-square test vs. T-test—Chi-square handles categorical data (counts in categories), while T-tests handle continuous data (measurements). If your data involves frequencies or proportions in a contingency table, Chi-square is your tool.
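For a contingency table of counts, SciPy's chi-square test of independence compares observed frequencies against the frequencies expected under the null hypothesis of independence. The 2×2 table below is invented:

```python
from scipy import stats

# Observed counts (made up): rows = two groups, columns = two categories
observed = [[30, 20],
            [25, 25]]

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected counts under independence
chi2, p, dof, expected = stats.chi2_contingency(observed)

print(f"chi2 = {chi2:.3f}, p = {p:.3f}, df = {dof}")
```

For a 2×2 table, df = (rows − 1)(columns − 1) = 1.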
When your data violates normality assumptions or uses ordinal scales, non-parametric tests save the day. They work with ranks rather than raw values, making fewer assumptions about the underlying distribution.
Compare: Wilcoxon vs. Kruskal-Wallis—Wilcoxon handles two-group comparisons (like the T-test), while Kruskal-Wallis extends to three or more groups (like ANOVA). Both are rank-based and require no assumption of normality.
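Both tests replace raw values with ranks before computing their statistics. A sketch with invented ordinal-style scores—note that SciPy's `ranksums` is the two-independent-sample Wilcoxon rank-sum test, while `scipy.stats.wilcoxon` is the *paired* signed-rank version:

```python
from scipy import stats

# Made-up ordinal-style scores from three groups
g1 = [3, 5, 4, 6, 2, 5]
g2 = [7, 8, 6, 9, 8, 7]
g3 = [1, 2, 2, 3, 1, 2]

# Two groups -> Wilcoxon rank-sum test (non-parametric T-test analog)
stat_w, p_w = stats.ranksums(g1, g2)

# Three or more groups -> Kruskal-Wallis test (non-parametric ANOVA analog)
stat_k, p_k = stats.kruskal(g1, g2, g3)

print(f"Rank-sum:       stat = {stat_w:.2f}, p = {p_w:.4f}")
print(f"Kruskal-Wallis: H = {stat_k:.2f}, p = {p_k:.4f}")
```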
Before applying parametric tests, you need to verify their assumptions hold. These tests specifically assess whether your data comes from a normal distribution.
Compare: Shapiro-Wilk vs. Kolmogorov-Smirnov—both test normality, but Shapiro-Wilk is more powerful for this specific purpose while K-S is more flexible for testing against other distributions. Choose Shapiro-Wilk for normality checks unless you need to test a different distribution.
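A sketch of both checks on deliberately non-normal (exponential) simulated data. Shapiro-Wilk asks only "is this normal?", while K-S can compare the sample against any fully specified distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100)  # clearly non-normal sample

# Shapiro-Wilk: tests normality specifically (small p => reject normality)
w_stat, p_sw = stats.shapiro(data)

# Kolmogorov-Smirnov: here testing against the exponential the data came from
ks_stat, p_ks = stats.kstest(data, "expon", args=(0, 2.0))

print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_sw:.4g}")
print(f"K-S vs expon: D = {ks_stat:.3f}, p = {p_ks:.3f}")
```

Shapiro-Wilk should firmly reject normality here, while the K-S test against the correct exponential should not reject.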
| Concept | Best Examples |
|---|---|
| Comparing means (2 groups) | Z-test, T-test (two-sample), Paired T-test |
| Comparing means (3+ groups) | ANOVA (one-way, two-way) |
| Comparing variances | F-test |
| Modeling relationships | Regression analysis (simple, multiple) |
| Categorical data | Chi-square test |
| Non-parametric (2 groups) | Wilcoxon rank-sum test |
| Non-parametric (3+ groups) | Kruskal-Wallis test |
| Testing normality | Shapiro-Wilk test, Kolmogorov-Smirnov test |
1. You have three treatment groups and want to compare their means, but a Shapiro-Wilk test shows significant non-normality. Which test should you use instead of ANOVA, and why?
2. Compare and contrast the Z-test and T-test: What assumption distinguishes when you'd use each, and how does sample size factor into this decision?
3. A researcher wants to determine if gender and voting preference are independent. Which test is appropriate, and what would the null hypothesis state?
4. Which two tests serve as non-parametric alternatives to the T-test and ANOVA, respectively? What do they have in common in terms of methodology?
5. An FRQ presents a before-and-after study design measuring the same subjects twice. Which specific type of T-test applies, and why would a two-sample T-test be inappropriate here?