Hypothesis testing is how we move from "I think there's a pattern here" to "I can demonstrate this effect is real." Every time you determine whether results are statistically significant, compare group means, or assess whether data fits an expected distribution, you're applying these tests.
Don't just memorize test names and formulas. Focus on when to use each test, what assumptions must hold, and how to interpret the results. Know the difference between parametric and non-parametric approaches, understand why sample size matters, and recognize the relationship between tests that serve similar purposes under different conditions.
These tests answer a fundamental question: Is the difference between group means large enough to be meaningful, or just random noise? They assume your data follows a normal distribution and use that structure to calculate precise probabilities.
The T-test is the workhorse of hypothesis testing because in most real situations, you don't know the population standard deviation. You estimate it from your sample using the sample standard deviation, s.
Compare: Z-test vs. T-test: both compare means to detect significant differences, but Z-tests require known population variance while T-tests estimate it from sample data. If you're given the population standard deviation σ, think Z-test; if you're given only the sample standard deviation s, think T-test. As sample size grows large, the t-distribution approaches the standard normal, so the two tests converge.
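The contrast can be sketched in a few lines of SciPy. This is a minimal example with made-up measurements and an assumed "known" σ for the Z-test; the T-statistic is just the Z-statistic with s substituted for σ and a t-distribution used for the p-value:

```python
import math
from scipy import stats

# Hypothetical sample: does its mean differ from mu0 = 50?
sample = [52.1, 49.8, 53.4, 51.0, 48.7, 52.9, 50.5, 51.8]
mu0 = 50.0
n = len(sample)
xbar = sum(sample) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD

# T-test: population SD unknown, estimated by s; df = n - 1
t_stat = (xbar - mu0) / (s / math.sqrt(n))
t_p = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Z-test: requires a known population SD (sigma = 1.6 is assumed here)
sigma = 1.6
z_stat = (xbar - mu0) / (sigma / math.sqrt(n))
z_p = 2 * stats.norm.sf(abs(z_stat))
```

With a larger n, `stats.t.sf` and `stats.norm.sf` would return nearly identical p-values for the same statistic, which is the convergence described above.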
Running multiple T-tests to compare several groups inflates your Type I error rate (the chance of a false positive). ANOVA solves this by comparing all group means simultaneously in a single test.
Compare: T-test vs. ANOVA: T-tests handle two-group comparisons, while ANOVA extends this to three or more groups. If a problem asks you to compare multiple treatment conditions, ANOVA is your answer.
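A one-way ANOVA across several groups is a single call in SciPy. This sketch uses three hypothetical treatment groups; one test, one Type I error rate, regardless of how many groups you compare:

```python
from scipy import stats

# Hypothetical measurements from three treatment conditions
group_a = [4.1, 5.0, 4.8, 5.3, 4.6]
group_b = [5.9, 6.2, 5.5, 6.8, 6.1]
group_c = [4.9, 5.1, 5.4, 4.7, 5.2]

# One-way ANOVA: tests all three means simultaneously
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
# A small p-value says at least one mean differs; a post-hoc test
# (e.g. Tukey's HSD) is needed to find which pairs differ.
```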
Sometimes the question isn't about means. It's about spread or association. These tests examine whether variability differs between groups or whether variables move together in predictable ways.
Compare: F-test vs. ANOVA: both use the F-distribution, but F-tests compare two variances directly while ANOVA uses F-statistics to compare variation between groups to variation within groups. The ANOVA F-statistic is F = MS_between / MS_within, where MS stands for mean square.
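To see what the mean squares actually are, the F-statistic can be computed by hand and checked against SciPy. The data below is hypothetical; the point is that F is a ratio of two variance estimates:

```python
from scipy import stats

groups = [[2.0, 3.0, 2.5], [4.0, 4.5, 3.5], [2.8, 3.2, 3.0]]
k = len(groups)                               # number of groups
n = sum(len(g) for g in groups)               # total observations
grand_mean = sum(x for g in groups for x in g) / n

# Sum of squares between groups (weighted by group size) and within groups
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)             # mean square between, df = k - 1
ms_within = ss_within / (n - k)               # mean square within, df = n - k
f_stat = ms_between / ms_within               # F = MS_between / MS_within
```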
When your data consists of counts or categories rather than continuous measurements, you need tests designed for frequencies. These compare what you observed to what you'd expect under the null hypothesis.
There are two main versions, and knowing which one applies matters: the goodness-of-fit test asks whether one categorical variable's observed frequencies match an expected distribution, while the test of independence asks whether two categorical variables are associated.
Compare: Chi-square test vs. T-test: Chi-square handles categorical data (counts in categories), while T-tests handle continuous data (measurements on a numerical scale). If your data involves frequencies or proportions in a table, Chi-square is your tool.
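A test of independence on a contingency table is one call in SciPy. The counts below are made up for illustration (rows could be gender, columns a stated preference):

```python
from scipy import stats

# Hypothetical 2x2 contingency table of observed counts
observed = [[30, 10],
            [20, 40]]

# chi2_contingency computes expected counts under independence
# and compares them to the observed counts
chi2, p, dof, expected = stats.chi2_contingency(observed)
# dof = (rows - 1) * (cols - 1); a small p suggests association
```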
When your data violates normality assumptions or uses ordinal scales (ranked data like survey ratings), non-parametric tests step in. They work with ranks rather than raw values, making fewer assumptions about the underlying distribution.
Compare: Wilcoxon vs. Kruskal-Wallis: Wilcoxon handles two-group comparisons (parallels the T-test), while Kruskal-Wallis extends to three or more groups (parallels ANOVA). Both are rank-based and don't require normality.
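Both rank-based tests are available in SciPy under slightly different names, which is a common exam trip-up: the two-sample Wilcoxon rank-sum test is implemented as `mannwhitneyu` (the two are equivalent). A sketch with hypothetical ordinal survey ratings:

```python
from scipy import stats

# Hypothetical 1-5 survey ratings from three groups (ordinal data)
ratings_a = [3, 4, 2, 5, 4, 3, 4]
ratings_b = [1, 2, 2, 3, 1, 2, 3]
ratings_c = [4, 5, 5, 3, 4, 5, 4]

# Two groups: Wilcoxon rank-sum / Mann-Whitney U (T-test analogue)
u_stat, p_two = stats.mannwhitneyu(ratings_a, ratings_b)

# Three or more groups: Kruskal-Wallis (ANOVA analogue)
h_stat, p_many = stats.kruskal(ratings_a, ratings_b, ratings_c)
```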
Before applying parametric tests, you should verify that the normality assumption holds. These tests assess whether your data plausibly comes from a normal distribution. Note that for both tests, the null hypothesis is that the data is normal, so a small p-value means you reject normality.
Compare: Shapiro-Wilk vs. Kolmogorov-Smirnov: Shapiro-Wilk is more powerful for normality testing, while K-S is more flexible for testing against other distributions. Use Shapiro-Wilk for normality checks unless you need to test a different reference distribution.
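Both checks are one-liners in SciPy. This sketch draws a hypothetical sample from a normal distribution, so here a small p-value would be the surprising outcome; remember the null hypothesis is normality:

```python
import random
from scipy import stats

# Hypothetical sample, drawn from a standard normal for illustration
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)]

# Shapiro-Wilk: the stronger choice when the question is normality
w_stat, p_sw = stats.shapiro(data)

# Kolmogorov-Smirnov against a standard normal reference distribution;
# kstest can also test against other named distributions
d_stat, p_ks = stats.kstest(data, "norm")

# Large p-values: fail to reject normality (data plausibly normal)
```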
| Scenario | Best Test(s) |
|---|---|
| Comparing means (2 groups) | Z-test, T-test (two-sample), Paired T-test |
| Comparing means (3+ groups) | ANOVA (one-way, two-way) |
| Comparing variances | F-test |
| Modeling relationships | Regression analysis (simple, multiple) |
| Categorical data | Chi-square test (goodness-of-fit or independence) |
| Non-parametric (2 groups) | Wilcoxon rank-sum / Mann-Whitney U test |
| Non-parametric (3+ groups) | Kruskal-Wallis test |
| Testing normality | Shapiro-Wilk test, Kolmogorov-Smirnov test |
You have three treatment groups and want to compare their means, but a Shapiro-Wilk test shows significant non-normality. Which test should you use instead of ANOVA, and why?
Compare the Z-test and T-test: what assumption distinguishes when you'd use each, and how does sample size factor into this decision?
A researcher wants to determine if gender and voting preference are independent. Which test is appropriate, and what would the null hypothesis state?
Which two tests serve as non-parametric alternatives to the T-test and ANOVA, respectively? What do they have in common methodologically?
A before-and-after study measures the same subjects twice. Which specific type of T-test applies, and why would a two-sample T-test be inappropriate here?