Intro to Statistics

Measures of Dispersion


Why This Matters

When you describe a dataset, the mean or median only tells half the story. You also need to know how spread out the values are. Measures of dispersion answer the question: how much do individual data points vary from the center? This concept underpins everything from understanding sampling variability to interpreting confidence intervals and hypothesis tests.

Different measures of spread serve different purposes. Some are sensitive to outliers, others resist them. Some preserve original units, others standardize for comparison. Don't just memorize formulas. Understand when to use each measure and what it tells you about the variability of your distribution.


Simple Range-Based Measures

These measures use specific data points (like maximums, minimums, or quartiles) to capture spread. They're intuitive and easy to calculate, but they don't use every observation in the dataset.

Range

The range is the simplest measure of spread. You only need two values to calculate it.

  • Formula: $\text{Range} = \text{Maximum} - \text{Minimum}$
  • Highly sensitive to outliers since it depends entirely on the two most extreme observations
  • Useful for a quick snapshot, but rarely sufficient on its own because a single unusual value can inflate it dramatically

For example, if exam scores run from 52 to 98, the range is $98 - 52 = 46$. But if one student scored 12, the range jumps to $98 - 12 = 86$, even though the bulk of scores didn't change.
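
The exam-score example can be sketched in a few lines of Python. The individual scores below are hypothetical; only the endpoints 52 and 98 and the outlier 12 come from the text, and `value_range` is an illustrative helper name.

```python
def value_range(values):
    """Return maximum minus minimum for a non-empty sequence."""
    return max(values) - min(values)

# Hypothetical scores: only 52, 98, and the outlier 12 appear in the text.
scores = [52, 67, 74, 81, 88, 98]
scores_with_outlier = scores + [12]

print(value_range(scores))               # 46
print(value_range(scores_with_outlier))  # 86
```

One added value nearly doubles the range, while the other six scores are unchanged.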

Interquartile Range (IQR)

The IQR measures the spread of the middle 50% of your data, cutting off the top and bottom quarters.

  • Formula: $\text{IQR} = Q_3 - Q_1$
  • Resistant to outliers because it ignores the extreme values in the upper and lower 25% of the distribution
  • It's also the basis for identifying outliers: any value below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ is flagged as a potential outlier
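
Here is a minimal sketch of the IQR and the $1.5 \times \text{IQR}$ rule, using Python's standard `statistics` module. The scores are hypothetical and `iqr_fences` is an illustrative helper name. Quartile conventions differ between textbooks and software, so the `"inclusive"` method used here is an assumption and may not match the method your course uses.

```python
import statistics

def iqr_fences(values):
    """Return (q1, q3, iqr, lower_fence, upper_fence) for the 1.5 x IQR rule."""
    # Assumption: "inclusive" quartiles; other conventions can give other values.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical exam scores, including one unusually low value.
scores = [12, 52, 60, 68, 75, 80, 85, 90, 98]

q1, q3, iqr, lower, upper = iqr_fences(scores)
flagged = [x for x in scores if x < lower or x > upper]

print(q1, q3, iqr)   # 60.0 85.0 25.0
print(lower, upper)  # 22.5 122.5
print(flagged)       # [12]
```

The score of 12 falls below the lower fence of 22.5, so it is flagged as a potential outlier. It has no effect on the IQR itself.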

Compare: Range vs. IQR. Both measure spread using specific data points, but range uses extremes while IQR uses quartiles. If a question gives you a dataset with obvious outliers and asks which measure better represents typical spread, IQR is your answer.


Deviation-Based Measures

These measures calculate how far each data point falls from the mean, then summarize those deviations. They use every observation, making them more informative but also more sensitive to extreme values.

Variance

Variance measures the average squared deviation from the mean. Squaring serves two purposes: it eliminates negative deviations (which would otherwise cancel out), and it gives extra weight to points far from the mean.

  • Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
  • Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
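
To see the two formulas side by side, here is a short sketch that computes both by hand on a small hypothetical dataset. The function names are illustrative; Python's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
def sum_squared_deviations(values):
    """Sum of squared deviations from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    """Divide by N: the values are the entire population."""
    return sum_squared_deviations(values) / len(values)

def sample_variance(values):
    """Divide by n - 1: the values are a sample from a larger population."""
    return sum_squared_deviations(values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; the mean is 5

print(population_variance(data))        # 4.0
print(round(sample_variance(data), 3))  # 4.571
```

The same squared deviations (total 32) go into both results; only the divisor changes, 8 for the population formula and 7 for the sample formula.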

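The difference in denominators matters. Sample variance divides by $n - 1$ (called Bessel's correction) because deviations are measured from the sample mean $\bar{x}$, which sits closer to the sample's own values than the true mean $\mu$ does. Dividing by $n$ would therefore tend to underestimate the population variance. Using $n - 1$ corrects for this bias, giving you an unbiased estimate of the population variance.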
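
One way to check the bias claim is to enumerate every possible sample from a tiny population and average the two candidate estimates. This is a sketch under stated assumptions: a hypothetical population of three values, samples of size 2 drawn with replacement, and exact arithmetic via `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
mu = Fraction(sum(population), len(population))
pop_var = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

n = 2
# All 9 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(pop_var)            # 2/3
print(avg_div_n)          # 1/3
print(avg_div_n_minus_1)  # 2/3
```

Averaged over all samples, dividing by $n$ gives 1/3, half the true variance of 2/3, while dividing by $n - 1$ recovers 2/3 exactly.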

The downside of variance is that its units are squared. If your data is in centimeters, variance is in square centimeters, which is hard to interpret directly.

Standard Deviation

Standard deviation is simply the square root of variance, which brings you back to the original units of your data.

  • Population: $\sigma = \sqrt{\sigma^2}$
  • Sample: $s = \sqrt{s^2}$

The empirical rule (68-95-99.7 rule) gives standard deviation its most useful interpretation for normal distributions:

  • About 68% of data falls within $\pm 1$ standard deviation of the mean
  • About 95% falls within $\pm 2$ standard deviations
  • About 99.7% falls within $\pm 3$ standard deviations

So if exam scores are approximately normal with a mean of 75 and a standard deviation of 10, roughly 68% of students scored between 65 and 85.
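
The empirical-rule percentages can be checked against an exact normal model. The sketch below assumes the scores follow a normal distribution with mean 75 and standard deviation 10, which real exam scores only approximate. `share_within` is an illustrative helper built on the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def share_within(k, mean, sd):
    """Share of a normal distribution within k standard deviations of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    low, high = 75 - k * 10, 75 + k * 10
    print(f"within {k} SD ({low} to {high}): {share_within(k, 75, 10):.1%}")
# within 1 SD (65 to 85): 68.3%
# within 2 SD (55 to 95): 95.4%
# within 3 SD (45 to 105): 99.7%
```

The rule's 68, 95, and 99.7 are rounded versions of these values, and they apply only to the extent that the data are close to normal.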

Mean Absolute Deviation (MAD)

MAD averages the absolute deviations from the mean instead of squaring them.

  • Formula: $\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$
  • It represents the typical distance a data point sits from the mean, in original units
  • Because it doesn't square deviations, large outliers don't get amplified as much as they do with standard deviation

MAD is less commonly tested than standard deviation, but it's useful conceptually. It gives you a more intuitive sense of "average distance from the center."
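
A short sketch of the MAD formula on a small hypothetical dataset, with the population standard deviation of the same values for comparison. `mean_absolute_deviation` is an illustrative name, not a standard-library function.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; the mean is 5

print(mean_absolute_deviation(data))  # 1.5
print(statistics.pstdev(data))        # 2.0
```

On these values the standard deviation (2.0) is larger than the MAD (1.5) because squaring gives the values 2 and 9 more weight than the values near the mean.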

Compare: Variance vs. Standard Deviation. Variance squares the units (making interpretation awkward), while standard deviation restores original units. Always report standard deviation when describing spread to an audience; use variance primarily in calculations and statistical formulas.


Standardized and Relative Measures

These measures let you compare variability across datasets with different scales or units. They answer the question: which dataset is relatively more spread out?

Coefficient of Variation (CV)

The CV expresses the standard deviation as a percentage of the mean.

  • Formula: $CV = \frac{s}{\bar{x}} \times 100\%$
  • This makes it unit-free, so you can compare variability in heights (cm) versus weights (kg)
  • Only meaningful for ratio data with a true zero point. Don't use CV when the mean can be zero or negative, since dividing by zero is undefined and negative means make the percentage uninterpretable.

For example, if test scores have a mean of 75 and SD of 10, the CV is $\frac{10}{75} \times 100\% \approx 13.3\%$. If reaction times have a mean of 250 ms and SD of 40 ms, the CV is $\frac{40}{250} \times 100\% = 16\%$. The raw SDs (10 points vs. 40 ms) are in different units and can't be compared directly, but the CVs can: reaction times show greater relative variability.
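
The two CV calculations can be sketched as follows. `coefficient_of_variation` is an illustrative helper; raising an error for a non-positive mean is a design choice made here to reflect the restriction to ratio data, not something built into the CV formula.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean <= 0:
        # CV is only meaningful for ratio data with a positive mean.
        raise ValueError("CV requires a positive mean")
    return 100 * sd / mean

cv_scores = coefficient_of_variation(sd=10, mean=75)
cv_reaction = coefficient_of_variation(sd=40, mean=250)

print(round(cv_scores, 1))      # 13.3
print(round(cv_reaction, 1))    # 16.0
print(cv_reaction > cv_scores)  # True
```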

Compare: Standard Deviation vs. Coefficient of Variation. Standard deviation measures absolute spread in original units, while CV measures relative spread as a percentage of the mean. Use CV when comparing variability between datasets with different units or vastly different means.


Position-Based Measures

These measures describe where data points fall within the distribution, helping you understand both spread and relative standing.

Percentiles and Quartiles

Percentiles divide data into 100 equal parts. The $k$th percentile is the value at or below which $k\%$ of observations fall. If you scored at the 90th percentile on a test, 90% of test-takers scored at or below your score.

Quartiles are three specific percentiles that split the data into four equal groups:

  • $Q_1$ = 25th percentile (lower quartile)
  • $Q_2$ = 50th percentile (the median)
  • $Q_3$ = 75th percentile (upper quartile)

The five-number summary pulls these together with the extremes: minimum, $Q_1$, median, $Q_3$, maximum. This summary is the foundation of a box plot and gives you a quick picture of both center and spread.
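
A five-number summary can be sketched with the standard `statistics` module. The scores are hypothetical and `five_number_summary` is an illustrative helper name. The `"inclusive"` quartile method is an assumption, and other conventions can give slightly different quartiles for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # Assumption: "inclusive" quartiles; textbooks and software vary.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

scores = [52, 60, 68, 72, 75, 80, 85, 90, 98]  # hypothetical exam scores

print(five_number_summary(scores))  # (52, 68.0, 75.0, 85.0, 98)
```

These five values are exactly what a box plot draws: the box spans 68 to 85, the line inside it sits at 75, and the whiskers reach 52 and 98.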

Compare: Percentiles vs. Z-scores. Both describe position, but percentiles tell you what percentage of data falls below a value, while z-scores tell you how many standard deviations a value is from the mean. Percentiles work for any distribution shape; z-scores are most useful when you can connect them to a known distribution (like the normal curve).
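
As a sketch of how the two descriptions connect, the code below converts a score to a z-score and then to a percentile. It assumes the scores are normally distributed; the score of 85, the mean of 75, and the SD of 10 are illustrative values, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(85, mean=75, sd=10)
# Assumption: scores are normal, so the standard normal CDF gives the percentile.
percentile = NormalDist().cdf(z)

print(z)                    # 1.0
print(f"{percentile:.1%}")  # 84.1%
```

Without the normality assumption, the z-score of 1.0 still says the score is one standard deviation above the mean, but it no longer maps to a specific percentile.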


Quick Reference Table

| Concept | Best Examples |
|---|---|
| Simple spread (uses extremes) | Range |
| Resistant to outliers | IQR; MAD to a lesser degree (it still depends on the mean) |
| Uses all data points | Variance, Standard Deviation, MAD |
| Same units as original data | Standard Deviation, MAD, Range, IQR |
| Squared units | Variance |
| Comparing across different scales | Coefficient of Variation |
| Describes position in distribution | Percentiles, Quartiles |
| Used in the $1.5 \times \text{IQR}$ outlier rule | IQR, $Q_1$, $Q_3$ |

Self-Check Questions

  1. A dataset contains one extreme outlier. Which two measures of spread would be most affected, and which two would be most resistant?

  2. You're comparing the variability of test scores (mean = 75, SD = 10) with the variability of reaction times in milliseconds (mean = 250, SD = 40). Which dataset shows greater relative variability, and what measure would you use to determine this?

  3. Explain why sample variance divides by $n - 1$ instead of $n$. What problem does this correction solve?

  4. Compare and contrast standard deviation and IQR as measures of spread. Under what conditions would you choose one over the other?

  5. If a value falls at the 90th percentile, what does this tell you? How would you describe this same value's position using quartiles?