Why This Matters
Choosing the right statistical test is one of the most critical skills you'll develop in Honors Statistics—and it's exactly what exam questions target. You're not just being tested on whether you can calculate a test statistic; you're being tested on whether you understand when to use each test and why that test fits the data. The key principles at play include parametric vs. non-parametric assumptions, number of groups being compared, type of variable (categorical vs. continuous), and independence of samples.
Every test in this guide exists to answer a specific type of question: Is there a difference? Is there a relationship? Is there an association? Understanding the underlying logic—what assumptions must hold, what kind of data you need, what your null hypothesis actually claims—will help you navigate both multiple-choice questions and FRQs with confidence. Don't just memorize test names—know what question each test answers and what conditions must be met to use it.
Comparing Means: Parametric Tests
When you need to determine whether group means differ significantly, parametric tests are your go-to tools—provided your data meets assumptions of normality and (often) equal variances. These tests form the foundation of inferential statistics.
t-Test (One-Sample, Two-Sample, Paired)
- Compares means when population standard deviation is unknown—the one-sample version tests against a known value, two-sample compares independent groups, and paired compares the same subjects at different times
- Appropriate for small samples (n < 30) drawn from an approximately normal population; the pooled two-sample version also assumes equal variances unless Welch's correction is used
- Produces a t-statistic that measures how many standard errors the sample mean falls from the hypothesized value—larger absolute values indicate stronger evidence against H0
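A minimal sketch of the one-sample and paired versions, using SciPy with made-up scores (the numbers are purely illustrative):

```python
from scipy import stats

# One-sample t-test: do these 8 scores (sigma unknown, small n) differ
# from a hypothesized mean of 75?
scores = [72, 78, 81, 69, 74, 77, 80, 73]
t_stat, p_value = stats.ttest_1samp(scores, popmean=75)

# Paired t-test: the same six students before and after a review session
before = [70, 65, 80, 75, 68, 72]
after = [74, 70, 79, 81, 72, 75]
t_paired, p_paired = stats.ttest_rel(before, after)
```

The independent two-sample case is `stats.ttest_ind`; pass `equal_var=False` to apply Welch's correction when the equal-variance assumption is doubtful.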
Z-Test
- Requires known population standard deviation or large samples (n≥30)—this is the key distinction from the t-test
- Assumes normal distribution and can be one-sample (comparing to population mean) or two-sample (comparing two group means)
- Produces a z-score using the standard normal distribution; commonly tested in the context of proportion tests where σ can be calculated from p
Compare: t-test vs. Z-test—both compare means, but the t-test is for unknown σ and small samples while the Z-test requires known σ or n≥30. If an FRQ gives you a small sample and sample standard deviation, you need the t-test.
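SciPy has no dedicated z-test helper, so a sketch computes the statistic directly from the formula; the population sigma and sample values below are hypothetical:

```python
import math
from scipy.stats import norm

# Hypothetical setup: population sigma is known (10), sample of n = 36
mu0, sigma, n = 100, 10, 36
x_bar = 103.5

z = (x_bar - mu0) / (sigma / math.sqrt(n))  # standard error uses the KNOWN sigma
p_two_sided = 2 * norm.sf(abs(z))           # area in both tails of the standard normal
```

Note that the denominator uses the known population sigma; if you find yourself plugging in a sample standard deviation with small n, you are in t-test territory.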
ANOVA (One-Way, Two-Way)
- Extends mean comparison to three or more groups—avoids inflated Type I error from running multiple t-tests
- One-way ANOVA tests one factor's effect; two-way ANOVA examines two factors plus their interaction effect
- Produces an F-statistic (ratio of between-group variance to within-group variance); a significant result means at least one group differs, requiring post-hoc tests to identify which
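A one-way ANOVA sketch with invented scores for three teaching methods; one call replaces three pairwise t-tests and keeps the overall Type I error at alpha:

```python
from scipy import stats

# Hypothetical scores under three teaching methods (one factor, three levels)
method_a = [80, 85, 90, 78, 87]
method_b = [70, 72, 68, 75, 71]
method_c = [88, 92, 85, 90, 89]

# F = between-group variance / within-group variance
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
```

A significant F only says at least one mean differs; recent SciPy versions provide `scipy.stats.tukey_hsd` for the post-hoc comparisons that identify which.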
Comparing Means: Non-Parametric Alternatives
When your data violates normality assumptions or consists of ordinal measurements, non-parametric tests use ranks instead of raw values. They're more flexible but generally less powerful than their parametric counterparts.
Mann-Whitney U Test
- Non-parametric alternative to the two-sample t-test—compares two independent groups when normality cannot be assumed
- Works by ranking all observations from both groups combined, then comparing the sum of ranks between groups
- Produces a U statistic; ideal for ordinal data, skewed distributions, or small samples where t-test assumptions fail
Wilcoxon Signed-Rank Test
- Non-parametric alternative to the paired t-test—compares two related samples or repeated measurements
- Ranks the absolute differences between paired observations while preserving the direction (positive or negative) of each difference
- Produces a W statistic; use when paired differences are non-normal or data is ordinal
Compare: Mann-Whitney U vs. Wilcoxon signed-rank—both are non-parametric, but Mann-Whitney handles independent groups while Wilcoxon handles paired/related samples. Match the test to your study design.
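A Wilcoxon signed-rank sketch on hypothetical paired measurements (the same six subjects measured twice):

```python
from scipy import stats

# Hypothetical paired data
before = [85, 70, 60, 90, 75, 80]
after = [88, 78, 66, 95, 74, 87]

# Ranks |before - after| while keeping each difference's sign
w_stat, p_value = stats.wilcoxon(before, after)
```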
Kruskal-Wallis Test
- Non-parametric alternative to one-way ANOVA—compares three or more independent groups without assuming normality
- Ranks all data points across groups and compares rank sums; essentially an extension of Mann-Whitney to multiple groups
- Produces an H statistic; like ANOVA, a significant result indicates at least one group differs but doesn't specify which
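A Kruskal-Wallis sketch; the three hypothetical groups below are fully separated in rank, which makes the H statistic easy to verify by hand:

```python
from scipy import stats

# Hypothetical independent groups (could be ordinal or skewed data)
low = [1, 2, 3, 4, 5]
middle = [6, 7, 8, 9, 10]
high = [11, 12, 13, 14, 15]

# Ranks all 15 values together and compares rank sums across groups
h_stat, p_value = stats.kruskal(low, middle, high)
```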
Testing Relationships Between Variables
These tests examine how variables relate to each other—whether one predicts another or whether they move together. The distinction between correlation and regression is heavily tested.
Correlation Analysis
- Measures strength and direction of linear relationships between two continuous variables using the correlation coefficient r, which ranges from −1 to +1
- Does not imply causation—this is perhaps the most frequently tested concept; correlation only indicates association
- Assumes linearity and bivariate normality; always check a scatterplot first to verify the relationship is actually linear
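A correlation sketch with invented study-hours data; remember that a strong r here still says nothing about causation:

```python
from scipy import stats

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 57, 61, 68, 72, 79]

# r ranges from -1 to +1; p tests H0: no linear association
r, p_value = stats.pearsonr(hours, score)
```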
Regression Analysis (Simple Linear, Multiple)
- Models a predictive relationship where one or more independent variables (x) predict a dependent variable (y)
- Simple linear regression uses one predictor (ŷ = a + bx); multiple regression uses two or more predictors
- Requires four key assumptions, remembered by the acronym LINE: Linearity, Independence of residuals, Normality of residuals, and Equal variance (homoscedasticity)
Compare: Correlation vs. Regression—correlation measures association strength (symmetric between variables), while regression establishes a predictive equation (asymmetric: x predicts y). FRQs often ask you to interpret slope and r2, not just r.
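A simple linear regression sketch on the same kind of made-up hours-vs-score data, showing the slope, r², and a prediction, the three outputs FRQs most often ask you to interpret:

```python
from scipy import stats

# Hypothetical data: predict exam score from hours studied
hours = [1, 2, 3, 4, 5, 6]
score = [52, 57, 61, 68, 72, 79]

fit = stats.linregress(hours, score)
# fit.slope: predicted change in y per one-unit increase in x
# fit.intercept: predicted y when x = 0
# fit.rvalue ** 2: proportion of variation in y explained by the line
y_hat = fit.intercept + fit.slope * 4.5  # prediction at x = 4.5 hours
```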
Testing Categorical Associations
When both variables are categorical, you need tests designed for frequency data rather than means.
Chi-Square Test
- Tests association between categorical variables using a contingency table that compares observed frequencies to expected frequencies under independence
- Requires every expected cell frequency to be at least 5—this assumption is commonly tested, and violating it makes the chi-square approximation unreliable
- Two main applications: goodness-of-fit (one variable, testing against expected distribution) and test of independence (two variables, testing for association)
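A test-of-independence sketch on a hypothetical 2x2 table; note how the expected counts come back from the call, so you can check the assumption directly:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: treatment (rows) vs. outcome (columns)
observed = [[30, 10],
            [20, 40]]

chi2, p_value, dof, expected = chi2_contingency(observed)

# Verify the expected-frequency assumption before trusting the result
all_cells_ok = (expected >= 5).all()
```

The goodness-of-fit version (one variable against an expected distribution) is `scipy.stats.chisquare`.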
Testing Variance Equality
Before running certain parametric tests, you may need to verify that group variances are equal—this is where the F-test comes in.
F-Test
- Compares variances between groups to determine if they differ significantly—calculated as the ratio of two sample variances
- Foundational to ANOVA—the F-statistic in ANOVA is essentially testing whether between-group variance exceeds within-group variance
- Assumes normality and independence; used to validate the equal-variance assumption required by two-sample t-tests and ANOVA
Compare: F-test vs. ANOVA F-statistic—both produce F-values, but the standalone F-test compares variances directly, while ANOVA's F-statistic tests whether means differ by comparing variance components. Know which question each answers.
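A standalone F-test sketch on hypothetical samples with visibly different spreads; the statistic is just the ratio of the two sample variances, referred to an F distribution:

```python
import statistics
from scipy.stats import f

# Hypothetical samples: same mean, very different spread
sample1 = [22, 25, 27, 24, 26, 23]
sample2 = [18, 30, 35, 15, 28, 21]

var1 = statistics.variance(sample1)  # sample variance (n - 1 denominator)
var2 = statistics.variance(sample2)

# Convention: larger variance on top, so F >= 1
f_stat = max(var1, var2) / min(var1, var2)
df1 = df2 = len(sample1) - 1
p_value = 2 * f.sf(f_stat, df1, df2)  # two-sided: double the upper-tail area
```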
Quick Reference Table
| Scenario | Appropriate test(s) |
| --- | --- |
| Comparing two means (parametric) | t-test, Z-test |
| Comparing three+ means (parametric) | One-way ANOVA, Two-way ANOVA |
| Comparing two means (non-parametric) | Mann-Whitney U, Wilcoxon signed-rank |
| Comparing three+ means (non-parametric) | Kruskal-Wallis |
| Testing relationships (continuous) | Correlation, Regression |
| Testing categorical association | Chi-square test |
| Comparing variances | F-test |
| Paired/related samples | Paired t-test, Wilcoxon signed-rank |
Self-Check Questions
- You have two independent groups with small sample sizes and non-normal distributions. Which test should you use, and why can't you use a two-sample t-test?
- Compare and contrast correlation analysis and simple linear regression—what question does each answer, and how do their outputs differ?
- A researcher wants to compare mean test scores across four teaching methods. Which parametric test is appropriate, and what would happen to Type I error if they ran multiple t-tests instead?
- Which two tests serve as non-parametric alternatives to the paired t-test and two-sample t-test, respectively? What do they have in common?
- An FRQ presents a contingency table with some expected cell frequencies below 5. Why is this problematic for a chi-square test, and what might you conclude about the analysis?