
🔢Lower Division Math Foundations

Common Statistical Measures


Why This Matters

Statistics isn't just about crunching numbers—it's about extracting meaning from data. Every statistical measure you learn serves a specific purpose: some tell you where the "center" of your data lives, others reveal how spread out or clustered your values are, and still others help you understand relationships between variables. On exams, you're being tested on your ability to choose the right measure for the situation and interpret what that measure actually tells you about the underlying data.

The key concepts here fall into three categories: measures of central tendency (where's the middle?), measures of spread (how variable is the data?), and measures of position and relationship (where does a value rank, and how do variables relate?). Don't just memorize formulas—know when each measure is appropriate, what makes it sensitive to outliers, and how to interpret results in context.


Measures of Central Tendency

These measures answer the fundamental question: what's a typical value in this dataset? Each one captures "center" differently, and choosing the right one depends on your data's shape and what you're trying to communicate.

Mean

  • Sum divided by count—the arithmetic average, calculated as $\bar{x} = \frac{\sum x_i}{n}$, gives equal weight to every data point
  • Highly sensitive to outliers—a single extreme value can drag the mean far from where most data clusters
  • Best for symmetric distributions—when data is roughly balanced, the mean accurately represents the center
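Here's a minimal sketch of that formula in Python; the values list is made up purely for illustration:

```python
import statistics

values = [4, 8, 6, 5, 3, 7]       # hypothetical data

mean = sum(values) / len(values)  # sum divided by count
print(mean)                       # 5.5

print(statistics.mean(values))    # same result from the standard library
```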

Median

  • The middle value when data is sorted—for odd $n$, it's the center value; for even $n$, average the two middle values
  • Resistant to outliers—extreme values don't affect the median since it only depends on position, not magnitude
  • Preferred for skewed data—income, home prices, and other right-skewed distributions are better represented by median
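A quick sketch of both cases, again with made-up numbers:

```python
import statistics

odd_n = [12, 3, 7, 9, 5]           # sorted: 3, 5, 7, 9, 12 -> middle value is 7
print(statistics.median(odd_n))    # 7

even_n = [12, 3, 7, 9, 5, 20]      # sorted: 3, 5, 7, 9, 12, 20 -> average of 7 and 9
print(statistics.median(even_n))   # 8.0
```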

Mode

  • Most frequently occurring value—the only central tendency measure that works for categorical data
  • Can be unimodal, bimodal, or multimodal—datasets may have one peak, two peaks, or several, revealing distribution shape
  • Essential for categorical analysis—when you need the most common category (favorite color, most popular product), mode is your tool
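Because the mode only counts occurrences, it works even when the data aren't numbers. A sketch with hypothetical categorical data:

```python
import statistics

colors = ["blue", "red", "blue", "green", "blue", "red"]
print(statistics.mode(colors))      # 'blue' (most frequent category)

sizes = ["S", "M", "M", "L", "L"]
print(statistics.multimode(sizes))  # ['M', 'L'] -> a bimodal dataset
```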

Compare: Mean vs. Median—both measure center, but mean uses all values while median uses only position. When an FRQ gives you a skewed distribution or mentions outliers, median is usually the better choice for representing "typical."
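To see that sensitivity difference concretely, here's a sketch with hypothetical home prices (in thousands of dollars) where one extreme value drags the mean but barely moves the median:

```python
import statistics

prices = [150, 160, 170, 180, 190]     # hypothetical, in thousands of dollars
with_mansion = prices + [15_000]       # add one $15M home

print(statistics.mean(prices), statistics.median(prices))              # 170 170
print(statistics.mean(with_mansion), statistics.median(with_mansion))  # ~2641.7 175.0
```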


Measures of Spread

Knowing the center isn't enough—you need to understand how much variation exists around that center. These measures quantify dispersion, from simple to sophisticated.

Range

  • Maximum minus minimum—the simplest spread measure, calculated as $\text{Range} = x_{max} - x_{min}$
  • Uses only two data points—ignores everything between the extremes, missing the full picture of variability
  • Highly sensitive to outliers—one unusual value completely changes the range, limiting its usefulness
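The computation is a one-liner, which is exactly why it ignores everything between the extremes:

```python
data = [15, 3, 22, 8, 11]           # hypothetical values
data_range = max(data) - min(data)
print(data_range)                   # 19
```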

Variance

  • Average squared deviation from the mean—calculated as $\sigma^2 = \frac{\sum(x_i - \bar{x})^2}{n}$ for populations (use $n-1$ for samples)
  • Squaring eliminates negative differences—ensures deviations don't cancel out, but produces units that are squared
  • Foundation for standard deviation—variance is mathematically useful but harder to interpret directly
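A sketch of both versions using the standard library; the data list is a made-up example whose mean is 5:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean is 5; squared deviations sum to 32

print(statistics.pvariance(data))  # 4      -> divide by n (population)
print(statistics.variance(data))   # ~4.571 -> divide by n - 1 (sample)
```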

Standard Deviation

  • Square root of variance—$\sigma = \sqrt{\sigma^2}$ returns the spread measure to original units
  • Measures typical distance from the mean—tells you how far a "normal" data point sits from center
  • Key to the empirical rule—in normal distributions, about 68% of data falls within $\pm 1\sigma$, 95% within $\pm 2\sigma$
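Here's a sketch that takes the square root of the population variance from above and then spot-checks the empirical rule on simulated normal data:

```python
import random
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))   # 2.0 -> square root of the population variance (4)

# Rough empirical-rule check with 100,000 simulated standard-normal values
draws = [random.gauss(0, 1) for _ in range(100_000)]
within_1 = sum(abs(x) <= 1 for x in draws) / len(draws)
within_2 = sum(abs(x) <= 2 for x in draws) / len(draws)
print(within_1, within_2)        # roughly 0.68 and 0.95
```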

Compare: Range vs. Standard Deviation—range is quick but crude (two points only), while standard deviation incorporates every data point. If an exam asks for a "robust" or "reliable" measure of spread, standard deviation wins.
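One way to see why the two measures differ: these two hypothetical datasets share the same range but spread out very differently:

```python
import statistics

a = [0, 5, 5, 5, 5, 10]      # most values bunched near the middle
b = [0, 0, 0, 10, 10, 10]    # values pushed to the extremes

print(max(a) - min(a), max(b) - min(b))            # 10 10 -> identical ranges
print(statistics.pstdev(a), statistics.pstdev(b))  # ~2.89 vs 5.0 -> different spreads
```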

Interquartile Range

  • Difference between $Q_3$ and $Q_1$—calculated as $\text{IQR} = Q_3 - Q_1$, capturing the middle 50% of data
  • Resistant to outliers—since it ignores the top and bottom 25%, extreme values don't affect it
  • Used to identify outliers—values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are flagged as outliers
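A sketch of the IQR and the 1.5 × IQR outlier fences, using statistics.quantiles for the quartiles (quartile conventions vary slightly between textbooks and libraries):

```python
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 40]        # hypothetical, with one suspicious value

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile estimates
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(iqr, outliers)                          # 4.5 [40] -> only 40 falls outside the fences
```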

Compare: Standard Deviation vs. IQR—both measure spread, but SD is sensitive to outliers while IQR is resistant. For skewed data or when outliers are present, IQR gives a more accurate picture of typical variability.


Measures of Position

These measures tell you where a specific value ranks within the larger dataset—essential for comparing individuals across different scales or distributions.

Percentiles

  • Divide data into 100 equal parts—the $n$th percentile means $n\%$ of values fall below that point
  • Useful for standardized comparisons—test scores, growth charts, and rankings all use percentiles
  • The 50th percentile equals the median—this connection helps you interpret percentiles intuitively
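A sketch of finding a percentile rank by hand, defined here as the fraction of values strictly below a score (exact conventions vary between textbooks):

```python
scores = [55, 60, 62, 68, 70, 75, 78, 80, 85, 92]   # hypothetical exam scores

def percentile_rank(value, data):
    """Percent of data points strictly below the given value (illustrative convention)."""
    below = sum(x < value for x in data)
    return 100 * below / len(data)

print(percentile_rank(80, scores))   # 70.0 -> a score of 80 sits at about the 70th percentile
```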

Quartiles

  • Divide data into four equal parts—$Q_1$ (25th percentile), $Q_2$ (50th/median), and $Q_3$ (75th percentile)
  • Foundation for box plots—the five-number summary (min, $Q_1$, median, $Q_3$, max) creates this visualization
  • Quick distribution snapshot—quartiles reveal skewness and help identify potential outliers at a glance
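A sketch of the five-number summary behind a box plot, again using statistics.quantiles for the quartiles:

```python
import statistics

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]   # hypothetical values

q1, q2, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, q2, q3, max(data))
print(five_number)                               # (7, 30.75, 40.5, 44.0, 49)
```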

Compare: Percentiles vs. Quartiles—quartiles are just specific percentiles (25th, 50th, 75th). Know that $Q_2$ is always the median, and IQR is always $Q_3 - Q_1$.


Measures of Relationship

When you have two variables, you need tools to understand how they move together—this is where correlation comes in.

Correlation Coefficient

  • Measures linear relationship strength and direction—denoted $r$, values range from $-1$ to $+1$
  • Sign indicates direction—positive $r$ means variables increase together; negative $r$ means one increases as the other decreases
  • Magnitude indicates strength—$|r|$ close to 1 means strong linear relationship; close to 0 means weak or no linear relationship
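A sketch of Pearson's $r$ computed directly from its definition; statistics.correlation (Python 3.10+) returns the same value:

```python
import math
import statistics

x = [1, 2, 3, 4, 5]                  # hypothetical paired data
y = [2, 4, 5, 4, 5]

mx, my = statistics.mean(x), statistics.mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(r)                             # ~0.775 -> fairly strong positive linear relationship

print(statistics.correlation(x, y))  # same value from the standard library
```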

Compare: Correlation vs. Causation—a high $|r|$ value does NOT prove one variable causes changes in another. This is a classic exam trap: correlation describes association, not causation.


Quick Reference Table

| Concept | Best Measure(s) |
| --- | --- |
| Central tendency (balanced data) | Mean |
| Central tendency (skewed/outliers) | Median |
| Central tendency (categorical) | Mode |
| Simple spread measure | Range |
| Spread in original units | Standard Deviation |
| Spread resistant to outliers | IQR |
| Position within distribution | Percentiles, Quartiles |
| Relationship between variables | Correlation Coefficient |

Self-Check Questions

  1. A dataset of home prices includes one mansion worth $15 million. Which measure of central tendency would best represent a "typical" home price, and why?

  2. Compare variance and standard deviation: why do we bother calculating standard deviation when variance already measures spread?

  3. A student scores in the 85th percentile on an exam. Another student's score equals $Q_3$. Which student performed better, and how do you know?

  4. You're analyzing two datasets with identical means. Dataset A has $\sigma = 2$ and Dataset B has $\sigma = 10$. What does this tell you about how the data points are distributed differently?

  5. Two variables have a correlation coefficient of $r = -0.92$. Describe the relationship and explain why this does NOT prove that changes in one variable cause changes in the other.