Statistical tests are the backbone of reproducible data science—they transform raw observations into defensible conclusions. In collaborative environments, your team needs shared language around when to use parametric vs. non-parametric tests, how to compare groups, and what assumptions must hold for results to be valid. Misapplying a test or violating its assumptions can invalidate entire analyses, waste computational resources, and erode trust in your findings.
You're being tested on more than memorizing formulas. Exams and real-world collaborations demand that you choose the right test for the data type and research question, verify assumptions before running analyses, and interpret outputs in context. Don't just memorize that a t-test compares means—know why you'd pick it over a Mann-Whitney U, and what breaks when normality fails.
Parametric tests assume your data follows a normal distribution and compare central tendencies across groups. The underlying principle is that if groups come from populations with identical means, observed differences should fall within predictable sampling variability.
Compare: T-test vs. ANOVA—both compare means assuming normality, but t-tests handle only two groups while ANOVA scales to three or more. If an FRQ asks you to compare multiple treatment conditions, ANOVA is your tool; reserve t-tests for pairwise follow-ups.
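A minimal sketch of both tests using SciPy; the three groups are invented placeholder data for hypothetical treatment conditions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated scores for three hypothetical treatment conditions
group_a = rng.normal(loc=70, scale=8, size=30)
group_b = rng.normal(loc=74, scale=8, size=30)
group_c = rng.normal(loc=77, scale=8, size=30)

# Two groups: independent-samples t-test
t_stat, t_p = stats.ttest_ind(group_a, group_b)
print(f"t-test: t = {t_stat:.2f}, p = {t_p:.4f}")

# Three or more groups: one-way ANOVA
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA:  F = {f_stat:.2f}, p = {f_p:.4f}")
```

Running pairwise t-tests across all three groups would inflate the Type I error rate, which is exactly why ANOVA comes first and t-tests serve as follow-ups.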
When normality assumptions fail or you're working with ordinal data, non-parametric rank-based tests provide robust alternatives. They convert raw values to ranks, making them resistant to outliers and skewed distributions.
Compare: Mann-Whitney U vs. Kruskal-Wallis—both are rank-based and assumption-light, but Mann-Whitney handles two groups while Kruskal-Wallis extends to three or more. Think of Kruskal-Wallis as "non-parametric ANOVA."
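Here's a comparable sketch, again with invented data. Each sample contains one extreme value that would distort a mean-based test but barely moves these rank-based statistics:

```python
from scipy import stats

# Invented right-skewed samples (e.g., response times in ms)
group_a = [120, 135, 150, 160, 410, 95, 130]
group_b = [140, 155, 170, 185, 520, 165, 150]
group_c = [180, 200, 210, 650, 190, 220, 205]

# Two groups: Mann-Whitney U
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.4f}")

# Three or more groups: Kruskal-Wallis, the "non-parametric ANOVA"
h_stat, h_p = stats.kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {h_p:.4f}")
```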
Regression methods quantify how predictor variables relate to outcomes, enabling prediction and inference. The core idea is fitting a mathematical function that minimizes the discrepancy between predicted and observed values.
Compare: Linear vs. Logistic Regression—linear regression predicts continuous outcomes; logistic regression predicts probabilities for categorical outcomes. If your dependent variable is binary (pass/fail, churned/retained), logistic is required—linear regression would predict impossible values outside 0-1.
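A minimal sketch using statsmodels, with simulated placeholder data for the predictor and both outcome types. The point to notice is that the logistic fit returns probabilities bounded in [0, 1], where a linear fit on the binary outcome would not:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Hypothetical predictor: hours studied
hours = rng.uniform(0, 10, size=200)
X = sm.add_constant(hours)  # adds the intercept column

# Continuous outcome (exam score): ordinary least squares
score = 50 + 4 * hours + rng.normal(0, 5, size=200)
ols_fit = sm.OLS(score, X).fit()
print(ols_fit.params)  # intercept and slope

# Binary outcome (pass/fail): logistic regression
# The logistic link keeps predicted probabilities in [0, 1]
p_pass = 1 / (1 + np.exp(-(hours - 5)))
passed = rng.binomial(1, p_pass)
logit_fit = sm.Logit(passed, X).fit(disp=0)
print(logit_fit.predict(X)[:5])  # predicted probabilities in [0, 1]
```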
Association tests assess whether and how strongly variables relate to each other, without necessarily implying that one causes the other. They help identify patterns worth investigating further.
Compare: Correlation vs. Chi-Square—correlation measures relationships between continuous variables; chi-square tests associations between categorical variables. Choosing between them depends entirely on your data types, not your research question.
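A short sketch with SciPy showing both tools side by side; the continuous data is simulated and the 2x2 contingency table holds invented counts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Continuous-continuous: Pearson correlation
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(scale=0.8, size=100)
r, r_p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.2f}, p = {r_p:.4f}")

# Categorical-categorical: chi-square test of independence
# Rows: treatment vs. control; columns: churned vs. retained
table = np.array([[30, 70],
                  [45, 55]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {chi_p:.4f}, dof = {dof}")
```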
| Concept | Best Examples |
|---|---|
| Comparing two group means (parametric) | T-test |
| Comparing 3+ group means (parametric) | ANOVA, F-test |
| Comparing groups (non-parametric) | Mann-Whitney U, Kruskal-Wallis |
| Predicting continuous outcomes | Linear regression, Multiple regression |
| Predicting binary outcomes | Logistic regression |
| Measuring continuous variable association | Correlation analysis |
| Testing categorical variable association | Chi-square test |
| Distribution-free (non-parametric) alternatives | Mann-Whitney U, Kruskal-Wallis |
You have three treatment groups and your data is heavily right-skewed. Which test should you use instead of ANOVA, and why?
Compare and contrast linear regression and logistic regression—when would using linear regression on a binary outcome produce misleading results?
A colleague runs a chi-square test on two continuous variables. What's wrong with this approach, and which test should they use instead?
Both the t-test and Mann-Whitney U compare two groups. What specific data conditions would make you choose Mann-Whitney U over a t-test?
If an FRQ asks you to "control for confounding variables" when examining the effect of study hours on exam scores, which regression approach allows this, and how does it isolate each predictor's effect?