Choosing the right statistical test is one of the most critical skills in Honors Statistics, and it's exactly what exam questions target. You're not just being tested on whether you can calculate a test statistic; you're being tested on whether you understand when to use each test and why that test fits the data. The key principles at play include parametric vs. non-parametric assumptions, number of groups being compared, type of variable (categorical vs. continuous), and independence of samples.
Every test in this guide exists to answer a specific type of question: Is there a difference? Is there a relationship? Is there an association? Understanding the underlying logic (what assumptions must hold, what kind of data you need, what your null hypothesis actually claims) will help you navigate both multiple-choice questions and free-response problems with confidence. Don't just memorize test names. Know what question each test answers and what conditions must be met to use it.
Comparing Means: Parametric Tests
When you need to determine whether group means differ significantly, parametric tests are your go-to tools, provided your data meets assumptions of normality and (often) equal variances. These tests form the foundation of inferential statistics.
t-Test (One-Sample, Two-Sample, Paired)
Compares means when population standard deviation is unknown. The one-sample version tests a sample mean against a known or hypothesized value. The two-sample version compares means from two independent groups. The paired version compares the same subjects measured at two different times or under two different conditions.
Designed for small samples (n<30) and assumes the underlying population is approximately normal. The two-sample version also requires equal variances unless you apply Welch's correction, which adjusts the degrees of freedom to account for unequal variances.
Produces a t-statistic that measures how many standard errors the sample mean falls from the hypothesized value. Larger absolute values indicate stronger evidence against H₀.
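The three variants map directly onto three functions in scipy.stats. A minimal sketch with hypothetical data:

```python
from scipy import stats

# Hypothetical small samples (n < 30)
before = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7]
after = [12.9, 12.4, 13.6, 13.1, 12.3, 13.2]

# One-sample: test the mean of `before` against a hypothesized value of 12.0
t1, p1 = stats.ttest_1samp(before, popmean=12.0)

# Two-sample (independent groups); equal_var=False applies Welch's correction
t2, p2 = stats.ttest_ind(before, after, equal_var=False)

# Paired: same subjects measured under two conditions
t3, p3 = stats.ttest_rel(before, after)
```

Each call returns the t-statistic and a two-sided p-value.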
Z-Test
Requires known population standard deviation or large samples (n ≥ 30). This is the key distinction from the t-test.
Assumes normal distribution and can be one-sample (comparing to a population mean) or two-sample (comparing two group means).
Produces a z-score referenced against the standard normal distribution. You'll most commonly see this in the context of proportion tests, where σ can be derived directly from p using σ = √(np(1 − p)).
Compare: t-test vs. Z-test: both compare means, but the t-test is for unknown σ and small samples while the Z-test requires known σ or n ≥ 30. If a problem gives you a small sample and a sample standard deviation s, you need the t-test.
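Because the reference distribution is the standard normal, a one-sample Z-test is easy to compute by hand. A sketch with hypothetical numbers (known σ assumed):

```python
import math

from scipy.stats import norm

mu0 = 100.0   # hypothesized population mean
sigma = 15.0  # KNOWN population standard deviation
n = 36        # large sample (n >= 30)
xbar = 104.5  # observed sample mean

# z measures how many standard errors xbar falls from mu0
z = (xbar - mu0) / (sigma / math.sqrt(n))
p = 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value
```

Here z = 1.8 and p ≈ 0.072, so at α = 0.05 you would fail to reject H₀.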
ANOVA (One-Way, Two-Way)
Extends mean comparison to three or more groups. Running multiple t-tests instead would inflate your Type I error rate (the probability of a false positive compounds with each additional test).
One-way ANOVA tests the effect of a single factor (e.g., comparing test scores across three study methods). Two-way ANOVA examines two factors simultaneously and can also detect an interaction effect, where the impact of one factor depends on the level of the other.
Produces an F-statistic, calculated as the ratio of between-group variance to within-group variance. A significant result tells you at least one group mean differs from the others, but it doesn't tell you which one. You then need post-hoc tests (like Tukey's HSD) to pinpoint the specific differences.
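A one-way ANOVA sketch with hypothetical scores from three study methods:

```python
from scipy import stats

# Hypothetical test scores under three study methods
method_a = [78, 82, 85, 80, 79]
method_b = [88, 91, 86, 90, 89]
method_c = [70, 74, 72, 69, 73]

# F = between-group variance / within-group variance
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
```

A significant p-value here says only that at least one mean differs; a post-hoc procedure such as Tukey's HSD is still needed to say which one.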
Comparing Means: Non-Parametric Alternatives
When your data violates normality assumptions or consists of ordinal measurements, non-parametric tests use ranks instead of raw values. They're more flexible but generally less statistically powerful than their parametric counterparts, meaning they need stronger effects or larger samples to detect real differences.
Mann-Whitney U Test
Non-parametric alternative to the two-sample t-test. Use it to compare two independent groups when normality cannot be assumed.
Works by ranking all observations from both groups combined, then comparing the sum of ranks between groups. If one group's ranks are systematically higher, that's evidence of a difference.
Produces a U statistic. This test is ideal for ordinal data, heavily skewed distributions, or small samples where t-test assumptions clearly fail.
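A sketch with hypothetical ordinal ratings, where the t-test's normality assumption would be doubtful:

```python
from scipy import stats

# Hypothetical ordinal satisfaction ratings (1-10) from two independent groups
group_1 = [3, 4, 2, 5, 3, 4]
group_2 = [7, 8, 6, 9, 7, 8]

# Ranks all 12 observations together, then compares rank sums between groups
u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
```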
Wilcoxon Signed-Rank Test
Non-parametric alternative to the paired t-test. Use it to compare two related samples or repeated measurements on the same subjects.
Ranks the absolute differences between paired observations while preserving the direction (positive or negative) of each difference. This means it considers both the magnitude and the sign of changes.
Produces a W statistic. Reach for this test when paired differences are non-normal or when your data is ordinal.
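A sketch with hypothetical paired data:

```python
from scipy import stats

# Hypothetical paired ratings: same subjects before and after an intervention
before = [10, 12, 9, 11, 14, 10, 13, 12]
after = [12, 14, 10, 13, 17, 11, 15, 15]

# Ranks |before - after| while keeping each difference's sign
w_stat, p_value = stats.wilcoxon(before, after)
```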
Compare: Mann-Whitney U vs. Wilcoxon signed-rank: both are non-parametric, but Mann-Whitney handles independent groups while Wilcoxon handles paired/related samples. Match the test to your study design, just as you would choose between a two-sample t-test and a paired t-test.
Kruskal-Wallis Test
Non-parametric alternative to one-way ANOVA. Use it to compare three or more independent groups without assuming normality.
Ranks all data points across groups and compares rank sums. Think of it as extending the Mann-Whitney U logic to multiple groups.
Produces an H statistic. Like ANOVA, a significant result indicates at least one group differs but doesn't specify which. You'd follow up with pairwise non-parametric comparisons (such as Dunn's test) to identify the specific differences.
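A sketch with hypothetical data from three independent groups:

```python
from scipy import stats

# Hypothetical skewed ratings from three independent groups
low = [2, 3, 1, 2, 3]
mid = [5, 6, 5, 7, 6]
high = [9, 8, 10, 9, 8]

# Ranks all observations together, then compares rank sums across the groups
h_stat, p_value = stats.kruskal(low, mid, high)
```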
Testing Relationships Between Variables
These tests examine how variables relate to each other: whether one predicts another or whether they move together. The distinction between correlation and regression is heavily tested.
Correlation Analysis
Measures the strength and direction of a linear relationship between two continuous variables using the Pearson correlation coefficient r, which ranges from −1 to +1. A value of 0 indicates no linear relationship; values near ±1 indicate a strong linear relationship.
Does not imply causation. This is perhaps the most frequently tested concept in all of statistics. Correlation only indicates association. Two variables can be strongly correlated because of a lurking (confounding) variable or pure coincidence.
Assumes linearity and bivariate normality. Always check a scatterplot first to verify the relationship is actually linear. If the scatterplot shows a curve, r will understate or misrepresent the true relationship.
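A sketch with hypothetical data (after confirming on a scatterplot that the relationship looks linear):

```python
from scipy import stats

# Hypothetical hours-studied vs. exam-score data
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]

# r in [-1, +1]; p_value tests H0: no linear relationship
r, p_value = stats.pearsonr(hours, scores)
```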
Regression Analysis (Simple Linear, Multiple)
Models a predictive relationship where one or more independent variables (x) predict a dependent variable (y).
Simple linear regression uses one predictor: ŷ = a + bx. Multiple regression uses two or more predictors. In either case, the coefficient b (the slope) tells you the predicted change in y for a one-unit increase in x.
Requires four key assumptions you should memorize as LINE: Linearity, Independence of residuals, Normality of residuals, Equal variance of residuals (homoscedasticity). Residual plots are the primary tool for checking these.
Compare: Correlation vs. Regression: correlation measures association strength and is symmetric (switching x and y gives the same r), while regression establishes a directional predictive equation (x predicts y, not the other way around). On free-response questions, you'll often be asked to interpret the slope (predicted change in y per unit change in x) and r² (the proportion of variability in y explained by x), not just r.
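A simple linear regression sketch with hypothetical data, using scipy.stats.linregress:

```python
from scipy import stats

# Hypothetical hours-studied vs. exam-score data
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]

res = stats.linregress(hours, scores)
slope = res.slope            # predicted change in y per one-unit increase in x
intercept = res.intercept    # predicted y when x = 0
r_squared = res.rvalue ** 2  # proportion of variability in y explained by x

y_hat = intercept + slope * 5  # prediction for x = 5
```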
Testing Categorical Associations
When both variables are categorical, you need tests designed for frequency data rather than means.
Chi-Square Test
Tests association between categorical variables using a contingency table that compares observed frequencies to expected frequencies calculated under the assumption of independence.
Requires a minimum expected cell frequency of 5. This assumption is commonly tested on exams. When expected counts fall below 5, the chi-square approximation becomes unreliable, and you may need to combine categories or use an alternative like Fisher's exact test.
Two main applications: the goodness-of-fit test (one categorical variable, testing whether its distribution matches a hypothesized distribution) and the test of independence (two categorical variables, testing whether they are associated). Both use the same core formula χ² = Σ (O − E)²/E, but they answer different questions.
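A test-of-independence sketch with a hypothetical contingency table:

```python
from scipy import stats

# Hypothetical 2x2 contingency table: treatment (rows) x outcome (columns)
observed = [[30, 10],
            [20, 20]]

# Expected counts are computed under the independence assumption
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
```

Always inspect `expected`: if any expected count falls below 5, the chi-square approximation is unreliable and you should combine categories or switch to Fisher's exact test.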
Testing Variance Equality
Before running certain parametric tests, you may need to verify that group variances are equal. That's where the F-test comes in.
F-Test
Compares variances between two groups to determine if they differ significantly. It's calculated as the ratio of the larger sample variance to the smaller sample variance: F = s₁²/s₂², where s₁² is the larger of the two.
Foundational to ANOVA. The F-statistic in ANOVA uses the same ratio logic, but it compares between-group variance to within-group variance to assess whether group means differ.
Assumes normality and independence. The F-test is sensitive to non-normality, so if your data is clearly skewed, consider Levene's test as a more robust alternative for checking equal variances.
Compare: F-test vs. ANOVA F-statistic: both produce F-values, but the standalone F-test directly compares variances between two groups, while ANOVA's F-statistic tests whether means differ across groups by comparing variance components. Know which question each answers.
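A standalone F-test sketch with hypothetical data (normality assumed, larger variance in the numerator):

```python
import statistics

from scipy.stats import f

# Hypothetical measurements from two groups
group_1 = [4.1, 4.5, 3.9, 4.2, 4.8, 4.0]  # low spread
group_2 = [3.0, 5.5, 2.8, 5.9, 3.2, 5.6]  # high spread

s1_sq = statistics.variance(group_1)
s2_sq = statistics.variance(group_2)

# Larger sample variance over the smaller, so F >= 1
f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
df1 = df2 = len(group_1) - 1
p_value = 2 * f.sf(f_stat, df1, df2)  # two-sided p-value
```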
Quick Reference Table
| Concept | Best Examples |
| --- | --- |
| Comparing two means (parametric) | t-test, Z-test |
| Comparing three+ means (parametric) | One-way ANOVA, Two-way ANOVA |
| Comparing two means (non-parametric) | Mann-Whitney U, Wilcoxon signed-rank |
| Comparing three+ means (non-parametric) | Kruskal-Wallis |
| Testing relationships (continuous) | Correlation, Regression |
| Testing categorical association | Chi-square test |
| Comparing variances | F-test |
| Paired/related samples | Paired t-test, Wilcoxon signed-rank |
Self-Check Questions
You have two independent groups with small sample sizes and non-normal distributions. Which test should you use, and why can't you use a two-sample t-test?
Compare and contrast correlation analysis and simple linear regression. What question does each answer, and how do their outputs differ?
A researcher wants to compare mean test scores across four teaching methods. Which parametric test is appropriate, and what would happen to the Type I error rate if they ran multiple t-tests instead?
Which two tests serve as non-parametric alternatives to the paired t-test and two-sample t-test, respectively? What do they have in common?
A free-response question presents a contingency table with some expected cell frequencies below 5. Why is this problematic for a chi-square test, and what might you recommend?