
🎲 Intro to Statistics

Measures of Dispersion

Why This Matters

When you describe a dataset, the mean or median tells only half the story; you also need to know how spread out the values are. Measures of dispersion answer a critical question: how much do individual data points vary from the center? This concept underpins everything from understanding sampling variability to interpreting confidence intervals and hypothesis tests. On exams, you'll need to calculate these measures, choose the right one for a given situation, and explain what they reveal about your data.

The key insight here is that different measures of spread serve different purposes. Some are sensitive to outliers; others resist them. Some preserve the original units; others standardize for comparison. Don't just memorize formulas: understand when to use each measure and what it tells you about the shape and variability of your distribution. That's what separates a correct answer from a complete one.


Simple Range-Based Measures

These measures use specific data points (like maximums, minimums, or quartiles) to capture spread. They're intuitive and easy to calculate, but they don't use every observation in the dataset.

Range

  • Calculated as maximum minus minimum: the simplest possible measure of spread, requiring only two values
  • Highly sensitive to outliers, since it depends entirely on the most extreme observations in your dataset
  • Formula: $\text{Range} = \text{Maximum} - \text{Minimum}$; useful for quick assessments but rarely sufficient on its own
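A minimal Python sketch of the range formula and its sensitivity to a single outlier. The function name `data_range` and the scores are made-up illustrations, not taken from any real dataset.

```python
def data_range(values):
    """Range = maximum - minimum; only the two extreme values matter."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

scores = [62, 68, 71, 74, 75, 79, 83]  # hypothetical test scores
with_outlier = scores + [140]          # the same data plus one extreme value

print(data_range(scores))        # 21
print(data_range(with_outlier))  # 78
```

One added value nearly quadruples the range, even though the other seven observations are unchanged.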

Interquartile Range (IQR)

  • Measures the spread of the middle 50% of the data: the difference between the third and first quartiles
  • Resistant to outliers because it ignores the extreme values in the upper and lower 25% of the distribution
  • Formula: $\text{IQR} = Q_3 - Q_1$; it is also the basis of the 1.5 × IQR rule, which flags values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ as potential outliers
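The sketch below computes $Q_1$, $Q_3$, the IQR, and the 1.5 × IQR fences in Python. It assumes the median-of-halves convention common in introductory courses (split the sorted data at the median, leaving the median out when $n$ is odd); calculators and libraries such as Python's `statistics.quantiles` may use other conventions and give slightly different quartiles. Function names and data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) via median-of-halves: split the sorted data at the
    median (excluding the median itself when n is odd) and take each half's median."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return median(lower), median(upper)

def iqr_outlier_fences(values):
    """Lower and upper fences of the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [62, 68, 71, 74, 75, 79, 83, 140]  # hypothetical scores with one extreme value
q1, q3 = quartiles(data)
low, high = iqr_outlier_fences(data)
outliers = [x for x in data if x < low or x > high]

print(q1, q3, q3 - q1)  # 69.5 81.0 11.5
print(low, high)        # 52.25 98.25
print(outliers)         # [140]
```

The value 140 lands above the upper fence and is flagged, yet the quartiles would be unchanged if 140 were replaced by any value of at least 83, which is exactly why the IQR resists outliers.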

Compare: Range vs. IQR. Both measure spread using specific data points, but range uses the extremes while IQR uses the quartiles. If an FRQ gives you a dataset with obvious outliers and asks which measure better represents typical spread, IQR is your answer.


Deviation-Based Measures

These measures calculate how far each data point falls from the mean, then summarize those deviations. They use every observation, making them more informative but also more sensitive to extreme values.

Variance

  • Measures the average squared deviation from the mean; squaring eliminates negative values and emphasizes larger deviations
  • Population variance divides by $N$; sample variance divides by $n-1$ (Bessel's correction) so that $s^2$ is an unbiased estimate of $\sigma^2$
  • Formulas: $\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}$ for populations; $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$ for samples
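A minimal Python sketch of both variance formulas, cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`. The helper names and the data are illustrative.

```python
from math import isclose
from statistics import pvariance, variance

def population_variance(values):
    """sigma^2: squared deviations from the mean mu, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """s^2: squared deviations from x-bar, divided by n - 1 (Bessel's correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5; squared deviations sum to 32
print(population_variance(data))  # 32 / 8 = 4.0
print(sample_variance(data))      # 32 / 7, about 4.571

# Cross-check against Python's standard-library implementations.
print(isclose(population_variance(data), pvariance(data)))  # True
print(isclose(sample_variance(data), variance(data)))       # True
```

The same sum of squared deviations (32) gives a larger result when divided by $n-1$ than by $n$.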

Standard Deviation

  • The square root of the variance; it returns the measure of spread to the original units of your data
  • Interpretation: in a normal distribution, approximately 68% of the data falls within one standard deviation of the mean (the empirical rule)
  • Formulas: $\sigma = \sqrt{\sigma^2}$ for populations; $s = \sqrt{s^2}$ for samples. Know which symbol to use.
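The following Python sketch shows that each standard deviation is the square root of the matching variance, then checks the empirical rule on simulated normal data. The seed, the mean of 100, and the SD of 15 are arbitrary choices for illustration, and the helper `fraction_within_k_sd` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation is the square root of the corresponding variance.
sigma = sqrt(pvariance(data))  # population SD: sqrt(4) = 2.0
s = sqrt(variance(data))       # sample SD: sqrt(32/7), about 2.14
print(sigma, round(s, 2))

def fraction_within_k_sd(values, k=1.0):
    """Share of observations within k sample SDs of the sample mean."""
    m, sd = mean(values), stdev(values)
    return sum(1 for x in values if abs(x - m) <= k * sd) / len(values)

rng = random.Random(42)  # fixed seed so the run is reproducible
normal_sample = [rng.gauss(100, 15) for _ in range(20_000)]
print(round(fraction_within_k_sd(normal_sample), 3))  # should be close to 0.68
```

With a large simulated normal sample, the share within one standard deviation settles near the 68% the empirical rule predicts; for strongly skewed data it generally will not.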

Mean Absolute Deviation (MAD)

  • Averages the absolute deviations from the mean, using $|x_i - \bar{x}|$ instead of squared differences: $\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$
  • More intuitive to interpret than variance, since it represents the typical distance from the mean in the original units
  • Less commonly used on AP exams but important conceptually; it is less sensitive to outliers than the standard deviation, although it is not fully resistant because it still depends on the mean
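To illustrate that MAD is less sensitive to outliers than the standard deviation, this Python sketch compares the two before and after adding one extreme value. The data are made up, and it uses the population-style SD (`statistics.pstdev`) so that both measures divide by $n$.

```python
from statistics import mean, pstdev

def mean_absolute_deviation(values):
    """Average of |x_i - x-bar|, dividing by n as in the formula above."""
    xbar = mean(values)
    return sum(abs(x - xbar) for x in values) / len(values)

base = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = base + [50]  # one hypothetical extreme value

for label, values in [("base", base), ("with outlier", with_outlier)]:
    print(label, round(mean_absolute_deviation(values), 2), round(pstdev(values), 2))
# base 1.5 2.0
# with outlier 8.89 14.27
```

Both measures grow, but the SD grows by a larger factor (roughly 7× versus 6×), because squaring magnifies the outlier's deviation. MAD still moves a lot, which is why it is not as resistant as the IQR.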

Compare: Variance vs. Standard Deviation. Variance squares the units (making interpretation awkward), while standard deviation restores the original units. Always report standard deviation when describing spread; use variance primarily in calculations and statistical formulas.


Standardized and Relative Measures

These measures let you compare variability across datasets with different scales or units, which is essential when asking "which dataset is relatively more spread out?"

Coefficient of Variation (CV)

  • Calculated as the standard deviation divided by the mean, expressed as a percentage: $CV = \frac{s}{\bar{x}} \times 100\%$
  • Enables comparison across different scales: you can compare variability in heights (cm) with variability in weights (kg)
  • Only meaningful for ratio data with a true zero point; don't use CV when the mean can be zero or negative
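A short Python sketch of the CV formula, assuming the sample standard deviation $s$ as in the text. The two datasets are hypothetical, and the positive-mean check is only a rough guard: a positive mean does not by itself guarantee ratio-scale data (temperatures in °C, for example).

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = s / x-bar * 100%, using the sample standard deviation."""
    xbar = mean(values)
    if xbar <= 0:
        # Guard from the text: CV is not meaningful when the mean is zero or negative.
        raise ValueError("CV requires a positive mean (ratio-scale data)")
    return stdev(values) / xbar * 100

small_scale = [10, 12, 14]     # hypothetical: mean 12, s = 2
large_scale = [100, 110, 120]  # hypothetical: mean 110, s = 10

print(round(coefficient_of_variation(small_scale), 1))  # 16.7
print(round(coefficient_of_variation(large_scale), 1))  # 9.1
```

`large_scale` has five times the standard deviation of `small_scale`, yet its CV is smaller, so relative to its mean it is the less variable dataset.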

Compare: Standard Deviation vs. Coefficient of Variation. Standard deviation measures absolute spread in the original units, while CV measures relative spread as a percentage of the mean. Use CV when comparing variability between datasets with different units or vastly different means.


Position-Based Measures

These measures describe where data points fall within the distribution, helping you understand both spread and relative standing.

Percentiles and Quartiles

  • Percentiles divide the data into 100 equal parts: the $k$th percentile is the value below which $k\%$ of the observations fall
  • Quartiles are special percentiles: $Q_1$ (25th percentile), $Q_2$ (the median, 50th percentile), and $Q_3$ (75th percentile)
  • The five-number summary (minimum, $Q_1$, median, $Q_3$, maximum) describes a distribution's shape and spread
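The five-number summary and a percentile rank can be sketched in Python as follows. This assumes the median-of-halves quartile convention used in many intro courses and defines percentile rank as the percentage of observations strictly below a value, matching the definition above. Names and data are illustrative.

```python
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles by median-of-halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])  # skip the middle value when n is odd
    return data[0], q1, median(data), q3, data[-1]

def percentile_rank(values, x):
    """Percent of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical, n = 9 (odd)
print(five_number_summary(data))          # (3, 6.0, 12, 16.0, 21)
print(round(percentile_rank(data, 18), 1))  # 77.8 (7 of 9 values lie below 18)
```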

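Compare: Percentiles vs. Z-scores. Both describe position, but a percentile tells you what percentage of the data falls below a value, while a z-score, $z = \frac{x - \bar{x}}{s}$, tells you how many standard deviations a value lies from the mean. Percentiles work for any distribution; a z-score can be computed for any distribution too, but translating it into a percentile (for example, with the empirical rule) is only valid when the distribution is approximately normal.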
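This Python sketch contrasts the two on a hypothetical right-skewed dataset: it computes a z-score, converts it to a percentile by assuming normality (via `statistics.NormalDist`, available since Python 3.8), and compares that with the actual percentile rank. The helper names are illustrative.

```python
from statistics import NormalDist, mean, stdev

def z_score(x, values):
    """Number of sample standard deviations x lies from the sample mean."""
    return (x - mean(values)) / stdev(values)

def percentile_rank(values, x):
    """Percent of observations strictly below x (valid for any distribution)."""
    return 100 * sum(1 for v in values if v < x) / len(values)

skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 26]  # hypothetical right-skewed data
x = 4
z = z_score(x, skewed)
normal_based = 100 * NormalDist().cdf(z)  # percentile implied IF the data were normal

print(round(z, 2))             # about -0.13
print(round(normal_based, 1))  # roughly 45
print(percentile_rank(skewed, x))  # 70.0
```

The normal-based estimate places $x = 4$ below the middle of the data, while 70% of the observations actually lie below it. For skewed data, read relative standing from percentiles rather than from a normal table.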


Quick Reference Table

| Concept | Best Examples |
|---|---|
| Simple spread (uses extremes) | Range |
| Resistant to outliers | IQR (MAD is less sensitive than SD but not fully resistant) |
| Uses all data points | Variance, Standard Deviation, MAD |
| Same units as original data | Standard Deviation, MAD, Range, IQR |
| Squared units | Variance |
| Comparing across different scales | Coefficient of Variation |
| Describes position in distribution | Percentiles, Quartiles |
| Used in the 1.5 × IQR outlier rule | IQR, $Q_1$, $Q_3$ |

Self-Check Questions

  1. A dataset contains one extreme outlier. Which two measures of spread would be most affected, and which two would be most resistant?

  2. You're comparing the variability of test scores (mean = 75, SD = 10) with the variability of reaction times in milliseconds (mean = 250, SD = 40). Which dataset shows greater relative variability, and what measure would you use to determine this?

  3. Explain why sample variance divides by $n-1$ instead of $n$. What problem does this correction solve?

  4. Compare and contrast standard deviation and IQR as measures of spread. Under what conditions would you choose one over the other?

  5. If a value falls at the 90th percentile, what does this tell you? How would you describe this same value's position using quartiles?