Understanding correlation measures is fundamental to making valid inferences from data—and that's exactly what you're being tested on in this course. These aren't just formulas to memorize; they represent different tools for different situations. The AP exam will challenge you to recognize when to use each measure based on your data type, distribution assumptions, and research question. Choosing the wrong correlation measure can lead to misleading conclusions and flawed decisions.
The key concepts here revolve around linearity assumptions, data types (continuous vs. ordinal vs. categorical), robustness to violations, and controlling for confounding variables. Each correlation measure makes specific assumptions about your data. Violate those assumptions, and the coefficient can seriously misstate the relationship. Don't just memorize the formulas; know what type of data each measure requires, what assumptions it makes, and when it outperforms alternatives.
When both variables are continuous and you expect a straight-line relationship, these measures quantify how tightly your data points cluster around that line. The key assumption is linearity: if your scatter plot curves, these measures will understate the strength of the relationship, and can sit near zero even when the pattern is strong.
Compare: Pearson vs. Partial correlation—both measure linear relationships, but partial correlation removes the influence of confounding variables. If an FRQ asks about "controlling for" or "holding constant," partial correlation is your answer.
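To make "controlling for" concrete, here is a minimal sketch of a first-order partial correlation built from three pairwise Pearson values. The data are made up for illustration (`z` plays the role of a confounder that drives both `x` and `y`), and the helper names `pearson` and `partial_corr` are hypothetical, not from any particular library.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Illustrative data (an assumption): x and y each equal z plus unrelated noise.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 1, 4, 3, 4, 7, 6, 9]

r_xy = pearson(x, y)                # about 0.84: looks like a strong link
r_xy_given_z = partial_corr(x, y, z)  # essentially 0 once z is held constant
print(round(r_xy, 3), round(r_xy_given_z, 3))
```

The raw Pearson value is high only because both variables track `z`; the partial correlation shows there is nothing left between `x` and `y` once `z` is accounted for.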
When your data violates normality assumptions or involves ordinal rankings, these measures assess monotonic relationships—whether variables consistently increase or decrease together, even if not in a straight line.
Compare: Spearman vs. Kendall's tau—both are rank-based and non-parametric, but Kendall's tau handles tied ranks better and works well with small samples. Spearman is computationally simpler and more commonly reported in research.
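The sketch below shows what the two rank-based measures actually compute, using small made-up data. Spearman is Pearson applied to ranks; Kendall counts concordant and discordant pairs. The function names are hypothetical, and the Kendall version is the simple tau-a form with no tie correction, which is an assumption made for brevity. In practice you would likely use library routines such as `scipy.stats.spearmanr` and `scipy.stats.kendalltau`.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)

# Monotonic but curved data: ranks agree perfectly, the raw values do not.
x = [1, 2, 3, 4, 5]
y_cubic = [v ** 3 for v in x]
rho_cubic = spearman(x, y_cubic)   # 1.0
r_cubic = pearson(x, y_cubic)      # below 1 because the trend is not linear

# Two adjacent swaps in the ordering: both measures drop, tau more than rho.
y_swapped = [1, 3, 2, 5, 4]
rho_swapped = spearman(x, y_swapped)       # 0.8
tau_swapped = kendall_tau_a(x, y_swapped)  # 0.6
print(rho_cubic, round(r_cubic, 3), round(rho_swapped, 3), tau_swapped)
```

The two coefficients differ on the same data because they are on different scales: rho is a correlation of ranks, while tau is a difference in proportions of concordant and discordant pairs.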
When one or both variables are categorical (binary or nominal), the usual reading of correlation for continuous data doesn't carry over directly. These specialized measures handle the unique properties of categorical data.
Compare: Point-biserial vs. Phi coefficient. Point-biserial pairs one binary variable with one continuous variable, while phi works with two binary variables. Both are numerically equal to Pearson's r computed on the 0/1-coded data.
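As a quick check on the link to Pearson, this sketch computes each coefficient from its textbook formula and compares it with Pearson on 0/1-coded data. All values are invented for illustration (a drug/placebo indicator, blood pressure readings, and two yes/no variables), and the function names are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(group, values):
    """group holds 0/1 labels; values holds the continuous outcome."""
    g1 = [v for g, v in zip(group, values) if g == 1]
    g0 = [v for g, v in zip(group, values) if g == 0]
    n = len(values)
    spread = statistics.pstdev(values)  # population SD of all values
    diff = statistics.mean(g1) - statistics.mean(g0)
    return diff / spread * math.sqrt(len(g1) * len(g0) / n ** 2)

def phi(u, v):
    """Phi coefficient from the 2x2 table of two 0/1 variables."""
    n11 = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(u, v) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(u, v) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom

# Made-up example: 0 = placebo, 1 = drug; outcome is blood pressure.
treatment = [0, 0, 0, 0, 1, 1, 1, 1]
pressure = [130, 128, 135, 127, 120, 118, 125, 121]
r_pb = point_biserial(treatment, pressure)
r_pb_via_pearson = pearson(treatment, pressure)

# Made-up example: two binary variables.
u = [1, 1, 1, 0, 0, 0, 1, 0]
v = [1, 1, 0, 0, 0, 1, 1, 0]
phi_uv = phi(u, v)               # 0.5
phi_via_pearson = pearson(u, v)  # 0.5
print(round(r_pb, 4), round(r_pb_via_pearson, 4), phi_uv, phi_via_pearson)
```

The sign of the point-biserial value depends only on which group is coded 1, so report the coding along with the coefficient.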
Sometimes you're not measuring relationships between different constructs—you're assessing whether multiple measurements of the same thing agree with each other. This is fundamentally different from association.
Compare: Pearson vs. ICC—Pearson measures whether two variables move together; ICC measures whether multiple measurements of the same variable agree. Use Pearson for relationships between different constructs, ICC for measurement reliability.
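The difference between "moving together" and "agreeing" shows up clearly when one rater scores consistently higher than another. The sketch below uses invented ratings and assumes the one-way random-effects, single-measurement form of ICC, often written ICC(1,1); other ICC forms exist and can give different values. The function names are hypothetical.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def icc_oneway(ratings):
    """ICC(1,1) from a list of subjects, each a list of k ratings."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # Mean square between subjects and mean square within subjects.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(ratings, means) for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Made-up ratings: rater B is always exactly 4 points above rater A.
rater_a = [10, 12, 14, 16, 18]
rater_b = [14, 16, 18, 20, 22]

r_ab = pearson(rater_a, rater_b)  # 1.0: the raters move together perfectly
icc_ab = icc_oneway([list(pair) for pair in zip(rater_a, rater_b)])
print(r_ab, round(icc_ab, 3))     # ICC is well below 1: the scores disagree
```

Pearson ignores the constant 4-point gap, while this form of ICC counts it as disagreement between measurements of the same subject.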
When you're dealing with complex relationships involving multiple variables on both sides of the equation, or when linear measures fail to capture the full picture, these advanced techniques apply.
Compare: Pearson vs. Distance correlation. Pearson only detects linear relationships and can equal zero even when strong non-linear patterns exist. In theory, distance correlation is zero only when the variables are independent, so it picks up non-linear dependence too, which makes it more powerful but harder to interpret.
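A symmetric U-shape is the classic case where Pearson reports nothing and distance correlation does not. The sketch below is a minimal, unoptimized version of the sample distance correlation (double-centered pairwise distance matrices), written from the standard definition; the function names are hypothetical and the data are constructed for illustration.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def centered_distances(values):
    """Pairwise |vi - vj| matrix with row, column, and grand means removed."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]  # matrix is symmetric
    grand = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
        for i in range(n)
    ]

def distance_correlation(x, y):
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = mean_product(a, b)
    dvar_product = mean_product(a, a) * mean_product(b, b)
    if dvar_product <= 0:
        return 0.0  # at least one variable is constant
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_product))

# Perfect U-shape: y is fully determined by x, yet not linearly related.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_u = pearson(x, y)                  # 0.0 by symmetry
dcor_u = distance_correlation(x, y)  # clearly above zero
print(r_u, round(dcor_u, 3))
```

This version builds full n-by-n matrices, so it is only suitable for small samples; it is meant to show the idea, not to replace a tested library implementation.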
| Concept | Best Examples |
|---|---|
| Linear relationships (continuous data) | Pearson, Partial correlation, Multiple correlation |
| Non-parametric/rank-based | Spearman, Kendall's tau |
| Categorical variables | Point-biserial, Phi coefficient |
| Measurement reliability | Intraclass correlation |
| Non-linear detection | Distance correlation |
| Multivariate relationships | Canonical correlation, Multiple correlation |
| Controlling for confounders | Partial correlation |
| Robust to outliers | Spearman, Kendall's tau |
You have continuous data with several extreme outliers. Which two correlation measures would be more appropriate than Pearson, and why do they share this advantage?
A researcher wants to know if test anxiety predicts exam performance after controlling for hours studied. Which correlation measure should they use, and what does it reveal that Pearson alone cannot?
Compare and contrast Spearman and Kendall's tau: when would you choose one over the other, and what do their different values for the same data tell you?
You're analyzing the relationship between a binary treatment variable (drug vs. placebo) and a continuous outcome (blood pressure). Which correlation measure applies, and what assumption must your data meet?
FRQ-style: A scatter plot shows a clear U-shaped relationship between two variables, but the Pearson correlation is near zero. Explain why this occurs and identify which correlation measure would better capture this relationship.