Descriptive statistics aren't just formulas to memorize. They're tools for answering specific questions about data. Every exam question on this topic is really asking one of three things: What's typical? How spread out is the data? Where does this value fall relative to others? Understanding which formula answers which question is the difference between guessing and knowing exactly what the problem wants.
You're being tested on your ability to choose the right measure for a given situation and interpret what it tells you. A dataset with outliers calls for different tools than a symmetric distribution. Comparing variability across datasets with different units? There's a specific formula for that. Don't just memorize the calculations. Know what each statistic reveals about the data and when to use it.
Measures of Center: Finding What's "Typical"
These formulas answer the question: What single value best represents this dataset? "Typical" means different things depending on your data's shape and type, so the right choice of center measure matters.
Mean (Arithmetic Average)
Formula: $\bar{x} = \frac{\sum x_i}{n}$ (add up all values and divide by the count)
Sensitive to outliers: a single extreme value can drag the mean away from where most data sits. For example, if five friends earn $40k, $42k, $45k, $43k, and $500k, the mean is $134k, which doesn't represent anyone in the group well.
Best for symmetric distributions without extreme values; it acts as the "balance point" of the data
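To make the formula concrete, here is a minimal Python sketch that applies it to the salary example above (figures in thousands of dollars). The helper name `mean` is our own choice for illustration; Python's standard library offers the same calculation as `statistics.mean`.

```python
# Salaries from the example above, in thousands of dollars
salaries = [40, 42, 45, 43, 500]

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

print(mean(salaries))      # 134.0
print(mean(salaries[:4]))  # 42.5 (the same group without the 500k earner)
```

Dropping the single extreme salary moves the mean from 134 to 42.5, which shows how strongly one value can pull it.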
Median
The middle value when data is ordered from smallest to largest. For an even number of values, average the two middle ones.
Resistant to outliers: this makes it the preferred center measure for skewed distributions
Splits data into halves: about 50% of observations fall below it and about 50% above (with an odd count, the median is itself the middle observation)
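A short Python sketch of the median rule, using the same salary figures (in thousands of dollars). The function `median` is a hypothetical helper written for illustration; `statistics.median` in the standard library does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

salaries = [40, 42, 45, 43, 500]  # thousands of dollars

print(median(salaries))          # 43
print(median([40, 42, 45, 43]))  # 42.5 (even count: average of 42 and 43)
```

The 500k salary shifts the mean of these five values all the way to 134, but the median stays at 43, close to what most of the group earns.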
Mode
Most frequently occurring value: the only center measure you can use with categorical (nominal) data
Can be multiple or none: datasets may be unimodal (one mode), bimodal (two modes), multimodal, or have no mode at all
Useful for identifying clusters: reveals where data concentrates, especially in discrete distributions
Compare: Mean vs. Median. Both measure center, but the median resists outliers while the mean incorporates every value. If a problem describes income data or home prices (classic right-skewed scenarios), the median is almost always the better choice.
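The sketch below finds every mode of a list using `collections.Counter`, so it works for categorical labels as well as numbers. The helper name `modes` is hypothetical, and treating "every value appears equally often" as "no mode" is an assumed convention; some textbooks instead call every value a mode in that case (the standard library's `statistics.multimode` returns all tied values rather than an empty list).

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; an empty list means no mode.

    Assumed convention: if several distinct values all appear equally
    often, no value stands out, so there is no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == top]

print(modes(["red", "blue", "red", "green"]))  # ['red'] (categorical data)
print(modes([1, 1, 2, 2, 3]))                  # [1, 2] (bimodal)
print(modes([1, 2, 3]))                        # [] (no mode)
```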
Measures of Spread: Quantifying Variability
These formulas answer: How spread out is the data? Two datasets can have identical means but wildly different spreads. These measures capture that difference.
Range
Formula: $\text{Range} = \text{Max} - \text{Min}$
Uses only two values: it ignores everything between the extremes, making it highly sensitive to outliers
Quick but crude: useful for a first glance, but tells you nothing about how values are distributed in between
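A quick illustration of why the range is crude: the two made-up datasets below share exactly the same range even though their values are arranged very differently. The function is named `data_range` only to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = Max - Min; only the two extreme values matter."""
    return max(values) - min(values)

a = [10, 50, 50, 50, 90]  # most values clustered in the middle
b = [10, 12, 14, 88, 90]  # values bunched near both ends

print(data_range(a), data_range(b))  # 80 80
```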
Variance
Formula: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
This is essentially the average of the squared deviations from the mean. Dividing by $n - 1$ instead of $n$ is called Bessel's correction; it's used for sample variance because it makes $s^2$ an unbiased estimate of the true population variance.
Squared units problem: if your data is in dollars, variance is in "dollars squared," which isn't directly interpretable
Foundation for other statistics: variance underlies standard deviation, hypothesis tests, and regression analysis
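Here is a minimal sketch of the sample variance formula with Bessel's correction, checked against Python's `statistics.variance` (which also divides by $n - 1$) and `statistics.pvariance` (which divides by $n$, giving the population variance). The dataset and the helper name `sample_variance` are illustrative.

```python
import statistics

def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1), i.e. with Bessel's correction."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5; squared deviations sum to 32

print(sample_variance(data))       # 32 / 7, approximately 4.5714
print(statistics.variance(data))   # same value from the standard library
print(statistics.pvariance(data))  # 32 / 8 = 4 (divides by n, not n - 1)
```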
Standard Deviation
Formula: $s = \sqrt{s^2}$, the square root of variance, which brings you back to the original units
Measures typical distance from the mean: roughly, how far a "typical" data point sits from $\bar{x}$
Pairs with mean: when you report the mean as your center, report standard deviation as your spread measure. They work together for symmetric data.
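A small sketch using an illustrative dataset: the standard deviation is the square root of the sample variance, and Python's `statistics.stdev` computes it in one step. The last line reports it alongside the mean, the pairing described above.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

s = math.sqrt(statistics.variance(data))  # square root of the sample variance
print(round(s, 4))                        # 2.1381
print(round(statistics.stdev(data), 4))   # 2.1381 (same result, one call)

# Report center and spread together, in the data's original units
print(f"mean = {statistics.mean(data)}, s = {s:.2f}")
```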
Interquartile Range (IQR)
Formula: $\text{IQR} = Q_3 - Q_1$, which spans the middle 50% of the data
Resistant to outliers: it ignores the most extreme 25% of values on each end
Pairs with median: when the median is your center measure, IQR is your spread measure. This pairing shows up constantly in boxplots.
Compare: Standard Deviation vs. IQR. Both measure spread, but SD uses every data point (sensitive to outliers) while IQR focuses on the middle 50% (resistant). Choose SD for symmetric data, IQR for skewed data or when outliers are present.
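The sketch below computes $Q_1$, $Q_3$, and the IQR using one common textbook convention, which is an assumption here: take the medians of the lower and upper halves, leaving out the overall median when $n$ is odd. Software packages use several slightly different quartile rules, so their results can differ a little on small datasets. The helper names and data are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    When n is odd, the middle value is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return median(lower), median(upper)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 6, 7, 8, 10, 12, 50]
print(quartiles(data))                          # (3.5, 11.0)
print(iqr(data))                                # 7.5
print(iqr([1, 3, 4, 6, 7, 8, 10, 12, 500]))     # 7.5
```

Swapping the largest value 50 for 500 leaves the IQR unchanged, which is exactly the resistance to outliers described above.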
Position Measures: Locating Individual Values
These formulas answer: Where does this specific value fall within the distribution? They transform raw values into relative standing.
Percentiles
The kth percentile is the value below which k% of observations fall
Quartiles are special percentiles: $Q_1$ = 25th percentile, $Q_2$ (median) = 50th percentile, $Q_3$ = 75th percentile
Context matters for interpretation: scoring at the 90th percentile on an exam is great, but being at the 90th percentile for blood pressure is a health concern
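Percentile ranks are easy to compute directly. This sketch uses the "strictly below" convention that matches the definition above; the function name `percentile_rank` and the scores are hypothetical, and other conventions (for example, counting ties as half) give slightly different numbers.

```python
def percentile_rank(values, x):
    """Percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

scores = [55, 60, 62, 68, 70, 74, 78, 81, 85, 93]  # made-up exam scores

print(percentile_rank(scores, 85))  # 80.0 (8 of the 10 scores are below 85)
print(percentile_rank(scores, 55))  # 0.0 (nothing is below the minimum)
```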
Z-Score (Standard Score)
Formula: $z = \frac{x - \bar{x}}{s}$, which measures how far a value is from the mean in units of standard deviations
Here's how to interpret it: $z = 2$ means the value is 2 standard deviations above the mean, while $z = -1.5$ means it is 1.5 standard deviations below the mean.
Enables comparisons across different scales: z-scores let you compare performance on tests with completely different means and standard deviations. You convert both scores to the same scale.
Compare: Percentiles vs. Z-scores. Both describe position, but percentiles tell you what percentage of data falls below a value, while z-scores tell you how many standard deviations from the mean a value sits. Z-scores can be negative (below the mean); percentiles cannot.
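A minimal z-score helper, applied to two hypothetical exams on different scales. The exam numbers are invented and chosen to reproduce the $z = 2$ and $z = -1.5$ cases described above; `z_score` is an illustrative name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exams with different means and standard deviations
print(z_score(85, mean=75, sd=5))  # 2.0: two SDs above the mean
print(z_score(60, mean=72, sd=8))  # -1.5: one and a half SDs below the mean
```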
Shape Measures: Describing the Distribution
These measures answer: What does the distribution look like? Shape determines which center and spread measures are appropriate, so you need to assess it before choosing your tools.
Skewness
Measures asymmetry: how lopsided the distribution is compared to a symmetric bell curve
Named for the direction of the tail: positive skew (right-skewed) has a long right tail; negative skew (left-skewed) has a long left tail. In other words, the tail points toward the skew's name.
Affects the mean-median relationship: in right-skewed data, the mean is typically greater than the median because it gets pulled toward the long tail. In left-skewed data, the mean is typically less than the median.
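Skewness has more than one formula in use. The sketch below assumes the simple moment-based version $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the average of the $k$-th powers of the deviations from the mean; many statistics packages apply an extra small-sample adjustment, so their values will differ somewhat. The datasets and the function name are illustrative.

```python
import statistics

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - x_bar) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 20]  # long tail toward large values

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True: positive (right) skew
# The mean (about 5.33) is pulled toward the right tail, above the median (3)
print(statistics.mean(right_skewed), statistics.median(right_skewed))
```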
Kurtosis
Measures tail heaviness: it indicates how likely extreme values (outliers) are compared to a normal distribution
High kurtosis (leptokurtic): heavy tails, meaning more outliers than a normal distribution would produce; often, though not always, accompanied by a sharper peak
Low kurtosis (platykurtic): light tails, meaning fewer extreme values; often accompanied by a flatter center
Compare: Skewness vs. Kurtosis. Skewness describes left-right asymmetry (which direction the tail stretches), while kurtosis describes tail weight (how likely extreme values are). Both help you understand why the mean might be misleading.
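Here is a moment-based sketch of excess kurtosis, $g_2 = m_4 / m_2^2 - 3$, which is about 0 for a normal distribution. This is one common definition, assumed here; some software reports kurtosis without subtracting 3 (so a normal distribution scores 3), and others add small-sample adjustments. The example datasets and the function name are made up for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3.

    About 0 for normal data; > 0 means heavier tails, < 0 lighter tails.
    """
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m4 = sum((x - x_bar) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

heavy_tailed = [50] * 8 + [40, 60]  # tight cluster plus two far-out values
flat = list(range(1, 11))           # evenly spread values, no real tails

print(excess_kurtosis(heavy_tailed))    # 2.0 (leptokurtic)
print(round(excess_kurtosis(flat), 3))  # -1.224 (platykurtic)
```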
Standardized Comparison: Coefficient of Variation
This formula answers: Which dataset has more relative variability? It's essential when comparing datasets measured in different units or with very different means.
Coefficient of Variation (CV)
Formula: $CV = \frac{s}{\bar{x}} \times 100\%$, which expresses the standard deviation as a percentage of the mean
Unit-free comparison: this lets you compare the variability of heights (in cm) to weights (in kg), which raw standard deviations can't do
Higher CV = more relative spread: a CV of 25% means the standard deviation is one quarter of the mean
Compare: Standard Deviation vs. Coefficient of Variation. SD measures absolute spread in original units; CV measures relative spread as a percentage. Use CV when comparing variability across datasets with different scales or units.
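A sketch of the CV calculation using the sample standard deviation from Python's `statistics` module. The height and weight values are invented for illustration, and the helper name is hypothetical. CV only makes sense for data with a meaningful zero and a positive mean, so the sketch refuses anything else.

```python
import statistics

def coefficient_of_variation(values):
    """CV = s / x_bar * 100, using the sample standard deviation s."""
    x_bar = statistics.mean(values)
    if x_bar <= 0:
        raise ValueError("CV assumes a positive mean")
    return statistics.stdev(values) / x_bar * 100

heights_cm = [160, 165, 170, 175, 180]  # made-up sample
weights_kg = [55, 62, 70, 78, 85]       # made-up sample

print(round(coefficient_of_variation(heights_cm), 1))  # 4.7
print(round(coefficient_of_variation(weights_kg), 1))  # 17.2
```

Heights in this made-up sample vary by under 5% of their mean and weights by about 17%, so the weights are relatively more variable, a comparison the raw SDs (in cm and kg) could not make.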
Quick Reference Table
| Concept | Best Formulas/Measures |
|---|---|
| Center (symmetric data) | Mean |
| Center (skewed data) | Median, Mode |
| Spread (symmetric data) | Standard Deviation, Variance |
| Spread (skewed data) | IQR, Range |
| Individual position | Z-score, Percentiles |
| Distribution shape | Skewness, Kurtosis |
| Comparing variability across scales | Coefficient of Variation |
| Outlier detection | IQR (1.5 × IQR rule), Z-score (beyond ±2 or ±3) |
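The two outlier rules in the last row can be sketched as follows. Quartiles here come from `statistics.quantiles(..., method="inclusive")`, which may not match the median-of-halves rule some courses teach, so the fences can shift slightly between methods. The data and function names are illustrative.

```python
import statistics

def iqr_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in values if x < low or x > high]

def z_outliers(values, threshold=2.0):
    """Flag values whose |z| exceeds the threshold (2 and 3 are common cut-offs)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / sd) > threshold]

data = [1, 3, 4, 6, 7, 8, 10, 12, 50]

print(iqr_outliers(data))             # [50]
print(z_outliers(data, threshold=2))  # [50]
print(z_outliers(data, threshold=3))  # [] (the z-score of 50 is only about 2.6)
```

The outlier inflates both the mean and the standard deviation, which shrinks its own z-score; that is one reason the IQR rule is often favored when the data may be skewed or contain extreme values.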
Self-Check Questions
A dataset of household incomes in a city is strongly right-skewed. Which measure of center should you report, and which measure of spread pairs best with it?
Two students took different standardized tests. Student A scored 720 on a test with mean 500 and SD 100. Student B scored 28 on a test with mean 21 and SD 5. Who performed better relative to their test? Which formula helps you answer this?
Compare variance and standard deviation. Why do we bother calculating standard deviation when variance already measures spread?
You're comparing the consistency of two manufacturing processes: one produces bolts with mean length 10mm, the other produces beams with mean length 5000mm. Why would standard deviation alone be misleading, and what measure should you use instead?
A distribution has positive skewness and high kurtosis. Describe what this distribution looks like, and explain whether the mean or median would be larger.