When your data refuses to play by the rules—skewed distributions, ordinal measurements, small samples, or stubborn outliers—nonparametric tests become your best friends. These methods don't require the normality assumptions that parametric tests demand, making them essential tools when real-world data gets messy. You're being tested on understanding when to choose nonparametric over parametric approaches, how these tests use ranks instead of raw values, and why they sacrifice some statistical power for greater flexibility.
The core principle uniting these methods is elegantly simple: rank the data and work with those ranks. This transformation strips away the influence of extreme values and distributional quirks while preserving the essential ordering information. Whether you're comparing groups, measuring associations, or testing distributional assumptions, mastering nonparametric tests means knowing which tool fits which scenario—and being able to justify that choice on an FRQ. Don't just memorize test names; understand what each test's parametric equivalent is and what assumptions you're escaping by going nonparametric.
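Here's a quick sketch of that rank transformation (a minimal Python/SciPy example with made-up data, not something you'd need to code on the exam):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical sample with one extreme outlier
x = np.array([3.1, 2.8, 3.4, 2.9, 47.0])

# Rank transformation: ordering is preserved, the outlier's magnitude is not
ranks = rankdata(x)
print(ranks)  # [3. 1. 4. 2. 5.] -- 47.0 becomes "the largest", nothing more
```

Notice that the outlier contributes only "largest rank," which is exactly why rank-based tests shrug off extreme values.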
The most fundamental distinction in group comparisons is whether your observations are linked (paired/related) or completely separate (independent). Paired designs control for individual variability by using each subject as their own control, while independent designs compare entirely different subjects.
Compare: Wilcoxon Signed-Rank vs. Sign Test—both handle paired data, but Wilcoxon uses magnitude information (ranks of differences) while the Sign Test only counts directions. If your FRQ mentions outliers or asks about the most robust option, the Sign Test is your answer; if it asks about power, Wilcoxon wins.
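To make the contrast concrete, here's a hedged sketch with made-up pre/post scores, assuming Python with SciPy. SciPy has no dedicated sign-test function, so the sign test is built here from a binomial test on the signs of the differences:

```python
import numpy as np
from scipy.stats import wilcoxon, binomtest

pre  = np.array([12, 15, 11, 14, 13, 16, 10, 12])   # hypothetical paired scores
post = np.array([14, 19, 10, 19, 16, 23, 16, 20])

# Wilcoxon Signed-Rank: uses the ranks of the magnitudes of the differences
w_stat, w_p = wilcoxon(post, pre)

# Sign Test: only counts how many nonzero differences are positive
diffs = post - pre
n_pos = np.sum(diffs > 0)
n_nonzero = np.sum(diffs != 0)
sign_p = binomtest(int(n_pos), int(n_nonzero), p=0.5).pvalue

print(f"Wilcoxon: W={w_stat}, p={w_p:.3f}")
print(f"Sign test: {n_pos}/{n_nonzero} positive, p={sign_p:.3f}")
```

The Wilcoxon p-value reflects how large the improvements are (via ranks of the differences); the sign-test p-value uses only their direction, which is why it stays valid even when a couple of differences are wildly extreme.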
When you have three or more groups to compare, you need tests that generalize the two-group methods. The key distinction remains whether groups are independent or related (repeated measures/blocked designs).
Compare: Kruskal-Wallis vs. Friedman—both extend to 3+ groups, but Kruskal-Wallis assumes independence while Friedman requires related/blocked data. Think of Kruskal-Wallis as "stacked Mann-Whitney" and Friedman as "stacked Wilcoxon Signed-Rank."
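A minimal sketch with hypothetical ratings, assuming SciPy: kruskal takes independent groups as separate arrays, while friedmanchisquare takes the same subjects measured under each condition, aligned row by row:

```python
from scipy.stats import kruskal, friedmanchisquare

# Independent groups (different subjects in each) -> Kruskal-Wallis
group_a = [7, 5, 6, 8, 7]
group_b = [4, 5, 3, 6, 4]
group_c = [8, 9, 7, 8, 9]
h_stat, h_p = kruskal(group_a, group_b, group_c)

# Related groups (same 5 subjects under each condition, same order) -> Friedman
cond_1 = [7, 5, 6, 8, 7]
cond_2 = [6, 4, 6, 7, 6]
cond_3 = [8, 6, 7, 9, 8]
chi2, f_p = friedmanchisquare(cond_1, cond_2, cond_3)

print(f"Kruskal-Wallis: H={h_stat:.2f}, p={h_p:.3f}")
print(f"Friedman: chi2={chi2:.2f}, p={f_p:.3f}")
```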
When examining relationships between two variables, rank correlations provide robust alternatives to Pearson's r. These methods assess monotonic relationships—whether variables consistently increase or decrease together—rather than strictly linear ones.
Compare: Spearman's ρ vs. Kendall's τ—both measure monotonic association, but Spearman transforms to ranks then correlates, while Kendall directly counts agreement/disagreement between pairs. Kendall's τ is preferred for small samples; Spearman's ρ is more intuitive and commonly reported.
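A quick sketch of the difference in practice, assuming SciPy and a made-up monotonic-but-nonlinear relationship (Pearson's r is included for contrast):

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau, pearsonr

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([10, 30, 45, 55, 62, 66, 69, 71])   # increasing, but leveling off

rho, rho_p = spearmanr(hours, score)    # correlates the ranks
tau, tau_p = kendalltau(hours, score)   # counts concordant vs. discordant pairs
r,   r_p   = pearsonr(hours, score)     # strictly linear association, for contrast

print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}, Pearson r = {r:.2f}")
```

Because the relationship is perfectly monotonic, both rank measures come out at 1.0 while Pearson's r falls short of 1, illustrating what "monotonic, not linear" buys you.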
Sometimes you need to test whether your data follows a specific distribution or whether two samples come from the same population. These tests examine the entire shape of distributions rather than just central tendency.
Compare: Kolmogorov-Smirnov vs. Mann-Whitney U—both can compare two samples, but K-S tests whether distributions are identical in any way (shape, spread, location), while Mann-Whitney specifically tests for location shift. Use K-S when you care about the whole distribution; use Mann-Whitney when you're focused on central tendency.
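A minimal sketch, assuming SciPy and simulated data, showing the two-sample K-S test, the Mann-Whitney U test, and the one-sample K-S goodness-of-fit check:

```python
import numpy as np
from scipy.stats import ks_2samp, kstest, mannwhitneyu

rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=40)
sample_b = rng.normal(loc=0.5, scale=1.0, size=40)

# Two-sample K-S: sensitive to any difference in shape, spread, or location
ks_stat, ks_p = ks_2samp(sample_a, sample_b)

# Mann-Whitney U: targets a shift in location (central tendency)
u_stat, u_p = mannwhitneyu(sample_a, sample_b)

# One-sample K-S goodness of fit: does sample_a look standard normal?
gof_stat, gof_p = kstest(sample_a, 'norm')

print(f"K-S two-sample: D={ks_stat:.2f}, p={ks_p:.3f}")
print(f"Mann-Whitney:   U={u_stat:.0f}, p={u_p:.3f}")
print(f"K-S vs N(0,1):  D={gof_stat:.2f}, p={gof_p:.3f}")
```

The K-S statistic D is the largest vertical gap between the two cumulative distribution functions being compared, which is why it reacts to differences anywhere in the distribution, not just in the center.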
Modern computing enables powerful nonparametric approaches that build reference distributions directly from your data. These methods make minimal assumptions and provide exact or approximate inference through computational brute force.
Compare: Permutation Tests vs. Bootstrap—permutation tests shuffle labels to test hypotheses under the null, while bootstrap resamples to estimate the variability of statistics. Use permutation for hypothesis testing ("is there a difference?"); use bootstrap for estimation ("what's the confidence interval?").
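Here's a minimal NumPy sketch of both ideas on made-up group data; SciPy also provides permutation_test and bootstrap helpers, but the explicit loops below show the logic:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([5.1, 6.3, 5.8, 7.2, 6.6, 5.9])
b = np.array([4.2, 5.0, 4.8, 5.5, 4.9, 5.1])
observed = a.mean() - b.mean()

# Permutation test: shuffle the group labels to build the null distribution
pooled = np.concatenate([a, b])
perm_diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    perm_diffs.append(pooled[:len(a)].mean() - pooled[len(a):].mean())
p_value = np.mean(np.abs(perm_diffs) >= abs(observed))   # two-sided p-value

# Bootstrap: resample each group with replacement to estimate variability
boot_diffs = [rng.choice(a, size=len(a)).mean() - rng.choice(b, size=len(b)).mean()
              for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])  # 95% percentile CI

print(f"Observed difference: {observed:.2f}, permutation p = {p_value:.3f}")
print(f"Bootstrap 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Same data, two different questions: the permutation loop asks "could label-shuffling alone produce a gap this big?", while the bootstrap loop asks "how much would the estimated gap wobble across repeated samples?"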
| Concept | Best Examples |
|---|---|
| Paired two-group comparison | Wilcoxon Signed-Rank, Sign Test |
| Independent two-group comparison | Mann-Whitney U |
| Multiple independent groups | Kruskal-Wallis |
| Multiple related groups/blocked designs | Friedman Test |
| Rank-based correlation | Spearman's ρ, Kendall's τ |
| Distributional goodness of fit | Kolmogorov-Smirnov |
| Hypothesis testing via resampling | Permutation Tests |
| Confidence intervals via resampling | Bootstrap Methods |
You have pre-test and post-test scores from 15 participants, but the differences are heavily skewed with two extreme outliers. Which nonparametric test would be most robust, and which would have more power if the outliers weren't so extreme?
A researcher wants to compare customer satisfaction ratings (on a 1-5 scale) across four different store locations with different customers at each location. Which test should they use, and what's the parametric equivalent they're avoiding?
Compare and contrast Spearman's rank correlation and Kendall's tau: What do they both measure, how do their calculations differ, and when might you prefer one over the other?
An FRQ asks you to test whether a sample of reaction times comes from an exponential distribution. Which nonparametric test is designed for this type of question, and what does its test statistic represent?
Explain why permutation tests and bootstrap methods are both called "resampling methods" but serve fundamentally different purposes. Give a scenario where each would be the appropriate choice.