Fiveable

🎲Intro to Probability Unit 11 Review

11.2 Correlation coefficient and its interpretation

Written by the Fiveable Content Team • Last updated September 2025

The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, with 0 meaning no linear relationship, and is a key tool for understanding how variables are connected.

This concept builds on covariance, providing a standardized measure of association. By calculating and interpreting correlation, we can make predictions, guide research, and inform decisions across various fields, from economics to psychology.

Correlation Coefficient

Definition and Formula

  • Correlation coefficient quantifies strength and direction of linear relationship between two continuous variables
  • Denoted as r (sample) or ρ (population)
  • Dimensionless quantity ranging from -1 to +1
  • Formula for Pearson correlation coefficient: $r = \frac{\sum[(x - \bar{x})(y - \bar{y})]}{\sqrt{\sum(x - \bar{x})^2 \sum(y - \bar{y})^2}}$
  • Population correlation coefficient uses population means (μx and μy) instead of sample means
  • Symmetric measure (correlation between X and Y equals correlation between Y and X)
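  • Invariant under linear transformations with a positive scale factor (x → ax + b with a > 0) of either variable; a negative scale factor flips the sign of r but leaves its magnitude unchanged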
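
The formula above translates almost directly into code. The following is a minimal sketch in plain Python, not a reference implementation: the function name `pearson_r` and the sample data are illustrative assumptions. It also checks the symmetry property and shows how r responds to linear transformations of x.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation computed directly from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]          # deviations of x from its mean
    dy = [y - y_bar for y in ys]          # deviations of y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator

# Hypothetical sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r = pearson_r(xs, ys)
r_swapped = pearson_r(ys, xs)                          # symmetry: same value
r_rescaled = pearson_r([3 * x + 10 for x in xs], ys)   # positive scale: same value
r_flipped = pearson_r([-2 * x + 1 for x in xs], ys)    # negative scale: sign flips
print(r, r_swapped, r_rescaled, r_flipped)
```

Swapping the arguments or rescaling x with a positive factor leaves r unchanged, while a negative scale factor keeps the magnitude and reverses the sign.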

Properties and Interpretations

  • Sign indicates direction of relationship (positive or negative)
  • Magnitude represents strength of linear relationship
  • Value of 0 suggests no linear relationship (non-linear relationships may still exist)
  • Common rule-of-thumb strength categories for |r|: 0.00-0.19 (very weak), 0.20-0.39 (weak), 0.40-0.59 (moderate), 0.60-0.79 (strong), 0.80-1.0 (very strong)
  • Coefficient of determination (r²) represents proportion of variance in one variable that is linearly predictable from the other
  • Correlation does not imply causation
  • Sensitive to outliers and influential points
  • Assumes linear relationship (may not accurately represent non-linear relationships)

Calculating Correlation

Data Organization and Preparation

  • Organize data into paired observations (x, y) for each subject or item
  • Calculate mean (average) of x and y variables separately
  • Compute deviations by subtracting mean of x from each x value and mean of y from each y value
    • Example: For data points (2, 3), (4, 5), (6, 7) with means x̄ = 4 and ȳ = 5, deviations are (-2, -2), (0, 0), (2, 2)

Computation Steps

  • Multiply x and y deviations for each pair and sum products (numerator of correlation formula)
  • Square x and y deviations separately, sum each set of squares, multiply sums, and take square root (denominator)
  • Divide numerator by denominator to obtain correlation coefficient
  • Verify calculated coefficient falls within -1 to +1 range
    • Example: Using previous data, r = 8 / (√8 * √8) = 1, indicating perfect positive correlation
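
The same steps can be traced in a short Python sketch. It uses the three data points from the example above; the variable names are only illustrative.

```python
from math import sqrt

# Paired observations from the worked example.
pairs = [(2, 3), (4, 5), (6, 7)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

# Step 1: mean of each variable.
x_bar = sum(xs) / len(xs)                          # 4.0
y_bar = sum(ys) / len(ys)                          # 5.0

# Step 2: deviations from the means.
dx = [x - x_bar for x in xs]                       # [-2.0, 0.0, 2.0]
dy = [y - y_bar for y in ys]                       # [-2.0, 0.0, 2.0]

# Step 3: numerator, the sum of products of paired deviations.
numerator = sum(a * b for a, b in zip(dx, dy))     # 4 + 0 + 4 = 8.0

# Step 4: denominator, the square root of the product of summed squares.
sum_sq_x = sum(a ** 2 for a in dx)                 # 8.0
sum_sq_y = sum(b ** 2 for b in dy)                 # 8.0
denominator = sqrt(sum_sq_x * sum_sq_y)            # sqrt(64) = 8.0

# Step 5: divide and sanity-check the range.
r = numerator / denominator
assert -1 <= r <= 1
print(r)  # 1.0
```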

Interpreting Correlation

Strength and Direction

  • Positive values indicate positive relationship (variables increase or decrease together)
    • Example: Height and weight in humans (taller individuals tend to weigh more)
  • Negative values indicate negative relationship (one variable increases as other decreases)
    • Example: Outdoor temperature and heating costs (warmer weather tends to go with lower heating expenses)
  • Magnitude closer to -1 or +1 indicates stronger relationship
  • Value of 0 suggests no linear relationship
    • Example: Shoe size and intelligence (likely no meaningful correlation)

Practical Implications

  • Correlation coefficient helps predict one variable's behavior based on another
  • Useful in various fields (economics, psychology, biology)
    • Example: Correlation between study time and test scores to assess effective study habits
  • Guides decision-making in research and policy development
    • Example: Correlation between air pollution and respiratory diseases informing environmental policies
  • Assists in identifying potential causal relationships for further investigation

Correlation Coefficient Range

Perfect Correlations

  • Correlation of +1 indicates perfect positive linear relationship
    • Example: Converting Celsius to Fahrenheit temperatures
  • Correlation of -1 indicates perfect negative linear relationship
    • Example: Price and quantity demanded along an idealized, exactly linear downward-sloping demand curve
  • Perfect correlations rare in real-world data due to natural variability and measurement error

Intermediate Values

  • Values between 0 and ±1 indicate varying degrees of linear relationship
  • Strength increases as absolute value approaches 1
    • Example: Correlation of 0.7 between exercise frequency and cardiovascular health (strong positive relationship)
    • Example: Correlation of -0.4 between hours of TV watched and academic performance (moderate negative relationship)
  • Interpretation depends on context and field of study
    • Example: In social sciences, correlations of 0.3 might be considered meaningful, while in physical sciences, higher correlations may be expected
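
The verbal labels used in these examples can be produced mechanically. The helper below is a hypothetical sketch (the name `describe_correlation` is not from any library) that applies the strength bands listed earlier in this guide. Because interpretation depends on the field, the cutoffs should be read as a convention, not a fixed rule.

```python
def describe_correlation(r):
    """Return (strength, direction) labels using this guide's strength bands."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude < 0.20:
        strength = "very weak"
    elif magnitude < 0.40:
        strength = "weak"
    elif magnitude < 0.60:
        strength = "moderate"
    elif magnitude < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_correlation(0.7))    # ('strong', 'positive')
print(describe_correlation(-0.4))   # ('moderate', 'negative')
```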

Limitations and Considerations

  • Correlation coefficient sensitive to outliers and influential points
    • Example: A few extreme data points in stock market analysis can skew overall correlation
  • Assumes linear relationship (may not accurately represent non-linear relationships)
    • Example: Relationship between age and height in humans (roughly linear during childhood, then levels off in adulthood, so a single r over all ages understates how closely the two are related)
  • Restricted range of either variable can affect correlation value
    • Example: Studying correlation between IQ and job performance only for high IQ individuals may underestimate true correlation