Statistics isn't just about crunching numbers—it's about extracting meaning from data. Every statistical measure you learn serves a specific purpose: some tell you where the "center" of your data lives, others reveal how spread out or clustered your values are, and still others help you understand relationships between variables. On exams, you're being tested on your ability to choose the right measure for the situation and interpret what that measure actually tells you about the underlying data.
The key concepts here fall into three categories: measures of central tendency (where's the middle?), measures of spread (how variable is the data?), and measures of position and relationship (where does a value rank, and how do variables relate?). Don't just memorize formulas—know when each measure is appropriate, what makes it sensitive to outliers, and how to interpret results in context.
Measures of Central Tendency
These measures answer the fundamental question: what's a typical value in this dataset? Each one captures "center" differently, and choosing the right one depends on your data's shape and what you're trying to communicate.
Mean
Sum divided by count—the arithmetic average, calculated as $\bar{x} = \frac{\sum x_i}{n}$, gives equal weight to every data point
Highly sensitive to outliers—a single extreme value can drag the mean far from where most data clusters
Best for symmetric distributions—when data is roughly balanced, the mean accurately represents the center
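As a minimal sketch of the formula and of the outlier sensitivity described above, the snippet below computes the mean by hand. The quiz scores are hypothetical values made up for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical quiz scores (illustrative data, not from any real dataset).
scores = [70, 72, 75, 78, 80]
scores_with_outlier = scores + [300]  # one extreme value added

typical_mean = mean(scores)              # 375 / 5
pulled_mean = mean(scores_with_outlier)  # 675 / 6

print(typical_mean)  # 75.0
print(pulled_mean)   # 112.5
```

A single extreme score moves the mean from 75 to 112.5, above every one of the original five values.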
Median
The middle value when data is sorted—for odd n, it's the center value; for even n, average the two middle values
Resistant to outliers—extreme values have little effect on the median because it depends on rank order, not magnitude
Preferred for skewed data—income, home prices, and other right-skewed distributions are better represented by median
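The odd and even cases can be sketched as follows. The function is a hand-rolled illustration (Python's built-in `statistics.median` does the same job), and the home prices, in thousands of dollars, are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 3, 7, 1, 5]))  # 5   (sorted: 1, 3, 5, 7, 9)
print(median([8, 2, 6, 4]))     # 5.0 (sorted: 2, 4, 6, 8 -> average of 4 and 6)

# Hypothetical home prices in thousands of dollars.
prices = [210, 225, 240, 260, 275]
prices_with_mansion = prices + [15000]

print(median(prices))               # 240
print(median(prices_with_mansion))  # 250.0
```

Adding the mansion shifts the median only from 240 to 250, which is why the median is the usual choice for right-skewed data like prices.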
Mode
Most frequently occurring value—the only central tendency measure that works for categorical data
Can be unimodal, bimodal, or multimodal—datasets may have one peak, two peaks, or several, revealing distribution shape
Essential for categorical analysis—when you need the most common category (favorite color, most popular product), mode is your tool
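A short sketch of finding the mode, including the bimodal case, is shown below. The helper name `modes` and both datasets are assumptions chosen for illustration; the function expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency (assumes non-empty input)."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Categorical data: mean and median are undefined here, but the mode works.
favorite_colors = ["blue", "red", "blue", "green", "blue", "red"]
print(modes(favorite_colors))  # ['blue']

# Two values tie for most frequent, so this dataset is bimodal.
bimodal_data = [1, 2, 2, 3, 4, 4, 5]
print(modes(bimodal_data))     # [2, 4]
```

Returning a list rather than a single value keeps ties visible, which matters when the number of modes is the point of the analysis.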
Compare: Mean vs. Median—both measure center, but mean uses all values while median uses only position. When a free-response question (FRQ) gives you a skewed distribution or mentions outliers, median is usually the better choice for representing "typical."
Measures of Spread
Knowing the center isn't enough—you need to understand how much variation exists around that center. These measures quantify dispersion, from simple to sophisticated.
Range
Maximum minus minimum—the simplest spread measure, calculated as $\text{Range} = x_{\max} - x_{\min}$
Uses only two data points—ignores everything between the extremes, missing the full picture of variability
Highly sensitive to outliers—one unusual value completely changes the range, limiting its usefulness
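To illustrate how one unusual value changes the range, here is a small sketch using hypothetical temperature readings.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical daily temperatures in degrees Celsius.
temps = [18, 20, 21, 22, 24]
temps_with_outlier = temps + [45]  # one faulty or extreme reading

print(value_range(temps))               # 6
print(value_range(temps_with_outlier))  # 27
```

The five original readings are unchanged, yet the range more than quadruples because it looks only at the two extremes.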
Variance
Average squared deviation from the mean—calculated as $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$ for populations (divide by $n - 1$ instead of $n$ for samples)
Squaring eliminates negative differences—ensures deviations don't cancel out, but produces units that are squared
Foundation for standard deviation—variance is mathematically useful but harder to interpret directly
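The population and sample versions differ only in the divisor. The sketch below computes both by hand and checks them against Python's `statistics.pvariance` and `statistics.variance`; the dataset is a made-up example.

```python
import math
import statistics


def variance(values, sample=False):
    """Average squared deviation from the mean; divides by n - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sum(squared_deviations) / divisor


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

population_variance = variance(data)           # 32 / 8
sample_variance = variance(data, sample=True)  # 32 / 7

print(population_variance)  # 4.0
print(sample_variance)      # about 4.571

# The standard library agrees with the hand calculation.
print(math.isclose(population_variance, statistics.pvariance(data)))  # True
print(math.isclose(sample_variance, statistics.variance(data)))       # True
```

If the data are measured in centimeters, both results are in square centimeters, which is the interpretability problem the standard deviation solves.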
Standard Deviation
Square root of variance—$\sigma = \sqrt{\sigma^2}$ returns the spread measure to original units
Measures typical distance from the mean—tells you how far a "normal" data point sits from center
Key to the empirical rule—in normal distributions, about 68% of data falls within ±1σ of the mean, 95% within ±2σ, and 99.7% within ±3σ
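As a rough check of both ideas, the sketch below takes the square root of a population variance and then uses the standard library's `NormalDist` (Python 3.8+) to compute the share of a normal distribution within one, two, and three standard deviations. The dataset and the helper name `share_within` are illustrative assumptions.

```python
import math
from statistics import NormalDist, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # population variance of this data is 4

sigma = pstdev(data)  # population standard deviation = sqrt(variance)
print(sigma)          # 2.0

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(share_within(1), 3))  # 0.683
print(round(share_within(2), 3))  # 0.954
print(round(share_within(3), 3))  # 0.997
```

These percentages hold for normal distributions only; a skewed dataset can put very different shares inside the same intervals.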
Compare: Range vs. Standard Deviation—range is quick but crude (two points only), while standard deviation incorporates every data point. If an exam asks for a measure of spread that reflects the whole dataset, standard deviation wins. Neither is "robust" in the statistical sense, though: both are sensitive to outliers.
Interquartile Range
Difference between Q3 and Q1—calculated as IQR=Q3−Q1, capturing the middle 50% of data
Resistant to outliers—since it ignores the top and bottom 25%, extreme values don't affect it
Used to identify outliers—values below Q1−1.5×IQR or above Q3+1.5×IQR are flagged as outliers
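The 1.5×IQR rule can be sketched as below. Quartile conventions vary between textbooks and software, so this example assumes the "inclusive" method of Python's `statistics.quantiles`; another convention may give slightly different quartiles. The dataset and the function name `iqr_outliers` are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return the IQR, the (lower, upper) fences, and the values outside the fences."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < lower_fence or x > upper_fence]
    return iqr, (lower_fence, upper_fence), flagged


data = [1, 3, 5, 7, 9, 11, 13, 15, 40]  # Q1 = 5, Q3 = 13 under this convention

iqr, fences, flagged = iqr_outliers(data)
print(iqr)      # 8.0
print(fences)   # (-7.0, 25.0)
print(flagged)  # [40]
```

Replacing 40 with 400 would leave the IQR and both fences unchanged, which is what resistance to outliers means in practice.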
Compare: Standard Deviation vs. IQR—both measure spread, but SD is sensitive to outliers while IQR is resistant. For skewed data or when outliers are present, IQR gives a more accurate picture of typical variability.
Measures of Position
These measures tell you where a specific value ranks within the larger dataset—essential for comparing individuals across different scales or distributions.
Percentiles
Divide data into 100 equal parts—the pth percentile is the value below which about p% of the data falls
Useful for standardized comparisons—test scores, growth charts, and rankings all use percentiles
The 50th percentile equals the median—this connection helps you interpret percentiles intuitively
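Here is a minimal sketch of a percentile rank and of the link between the 50th percentile and the median. It assumes the "percent of values strictly below" definition (some texts use "at or below"), and the exam scores are hypothetical.

```python
import math
from statistics import median, quantiles


def percentile_rank(values, score):
    """Percent of values strictly below the given score (one common definition)."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)


# Hypothetical exam scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_of_90 = percentile_rank(scores, 90)
print(rank_of_90)  # 70.0 (7 of the 10 scores are below 90)

# quantiles(..., n=100) returns 99 cut points; index 49 is the 50th percentile.
fiftieth = quantiles(scores, n=100, method="inclusive")[49]
print(fiftieth)        # 77.5
print(median(scores))  # 77.5
```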
Quartiles
Divide data into four equal parts—Q1 (25th percentile), Q2 (50th/median), and Q3 (75th percentile)
Foundation for box plots—the five-number summary (min, Q1, median, Q3, max) creates this visualization
Quick distribution snapshot—quartiles reveal skewness and help identify potential outliers at a glance
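A five-number summary can be sketched as follows, again assuming the "inclusive" quartile convention of `statistics.quantiles`. The dataset is invented to be right-skewed, and `five_number_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


data = [1, 2, 3, 4, 5, 8, 12, 20, 45]  # hypothetical right-skewed data

summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 12.0, 45)

minimum, q1, med, q3, maximum = summary
# The upper half of the box is longer than the lower half, a sign of right skew.
print(q3 - med)  # 7.0
print(med - q1)  # 2.0
```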
Compare: Percentiles vs. Quartiles—quartiles are just specific percentiles (25th, 50th, 75th). Know that Q2 is always the median, and IQR is always Q3−Q1.
Measures of Relationship
When you have two variables, you need tools to understand how they move together—this is where correlation comes in.
Correlation Coefficient
Measures linear relationship strength and direction—denoted r, values range from −1 to +1
Sign indicates direction—positive r means variables increase together; negative r means one increases as the other decreases
Magnitude indicates strength—|r| close to 1 means strong linear relationship; close to 0 means weak or no linear relationship
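The sketch below computes r with the standard Pearson formula (sum of paired deviations divided by the product of each variable's root sum of squared deviations). The guide itself does not state the formula, so treat it as supplied here for illustration; all three datasets are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # increases with hours: r is +1 (up to rounding)
falling = [10, 8, 6, 4, 2]  # decreases with hours: r is -1 (up to rounding)

print(round(pearson_r(hours, rising), 6))   # 1.0
print(round(pearson_r(hours, falling), 6))  # -1.0

# A perfect but curved relationship (y = x squared) has no linear trend.
xs = [-2, -1, 0, 1, 2]
curved = [4, 1, 0, 1, 4]
print(pearson_r(xs, curved))  # 0.0
```

The last case shows why r close to 0 means "no linear relationship" rather than "no relationship": y is completely determined by x, yet r is 0.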
Compare: Correlation vs. Causation—a high |r| value does NOT prove one variable causes changes in another. This is a classic exam trap: correlation describes association, not causation.
Quick Reference Table
| Concept | Best Measure(s) |
| --- | --- |
| Central tendency (balanced data) | Mean |
| Central tendency (skewed/outliers) | Median |
| Central tendency (categorical) | Mode |
| Simple spread measure | Range |
| Spread in original units | Standard Deviation |
| Spread resistant to outliers | IQR |
| Position within distribution | Percentiles, Quartiles |
| Relationship between variables | Correlation Coefficient |
Self-Check Questions
A dataset of home prices includes one mansion worth $15 million. Which measure of central tendency would best represent a "typical" home price, and why?
Compare variance and standard deviation: why do we bother calculating standard deviation when variance already measures spread?
A student scores in the 85th percentile on an exam. Another student's score equals Q3. Which student performed better, and how do you know?
You're analyzing two datasets with identical means. Dataset A has σ=2 and Dataset B has σ=10. What does this tell you about how the data points are distributed differently?
Two variables have a correlation coefficient of r=−0.92. Describe the relationship and explain why this does NOT prove that changes in one variable cause changes in the other.