Study smarter with Fiveable
Get study guides, practice questions, and cheatsheets for all your subjects. Join 500,000+ students with a 96% pass rate.
Descriptive statistics aren't just formulas to memorize. They're tools for answering specific questions about data. Every exam question on this topic is really asking one of three things: What's typical? How spread out is the data? Where does this value fall relative to others? Understanding which formula answers which question is the difference between guessing and knowing exactly what the problem wants.
You're being tested on your ability to choose the right measure for a given situation and interpret what it tells you. A dataset with outliers calls for different tools than a symmetric distribution. Comparing variability across datasets with different units? There's a specific formula for that. Don't just memorize the calculations. Know what each statistic reveals about the data and when to use it.
These formulas answer the question: What single value best represents this dataset? "Typical" means different things depending on your data's shape and type, so the right choice of center measure matters.
Compare: Mean vs. Median โ both measure center, but median resists outliers while mean incorporates every value. If a problem describes income data or home prices (classic right-skewed scenarios), median is almost always the better choice.
These formulas answer: How spread out is the data? Two datasets can have identical means but wildly different spreads. These measures capture that difference.
This is the average of the squared deviations from the mean. The in the denominator (instead of ) is called Bessel's correction, and it's used for sample variance to give a better estimate of the true population variance.
Compare: Standard Deviation vs. IQR โ both measure spread, but SD uses every data point (sensitive to outliers) while IQR focuses on the middle 50% (resistant). Choose SD for symmetric data, IQR for skewed data or when outliers are present.
These formulas answer: Where does this specific value fall within the distribution? They transform raw values into relative standing.
Here's how to interpret it: a means the value is 2 standard deviations above the mean, while means 1.5 standard deviations below the mean.
Compare: Percentiles vs. Z-scores โ both describe position, but percentiles tell you what percentage of data falls below a value, while z-scores tell you how many standard deviations from the mean a value sits. Z-scores can be negative (below the mean); percentiles cannot.
These measures answer: What does the distribution look like? Shape determines which center and spread measures are appropriate, so you need to assess it before choosing your tools.
Compare: Skewness vs. Kurtosis โ skewness describes left-right asymmetry (which direction the tail stretches), while kurtosis describes tail weight (how likely extreme values are). Both help you understand why the mean might be misleading.
This formula answers: Which dataset has more relative variability? It's essential when comparing datasets measured in different units or with very different means.
Compare: Standard Deviation vs. Coefficient of Variation โ SD measures absolute spread in original units; CV measures relative spread as a percentage. Use CV when comparing variability across datasets with different scales or units.
| Concept | Best Formulas/Measures |
|---|---|
| Center (symmetric data) | Mean |
| Center (skewed data) | Median, Mode |
| Spread (symmetric data) | Standard Deviation, Variance |
| Spread (skewed data) | IQR, Range |
| Individual position | Z-score, Percentiles |
| Distribution shape | Skewness, Kurtosis |
| Comparing variability across scales | Coefficient of Variation |
| Outlier detection | IQR (1.5รIQR rule), Z-score (beyond ยฑ2 or ยฑ3) |
A dataset of household incomes in a city is strongly right-skewed. Which measure of center should you report, and which measure of spread pairs best with it?
Two students took different standardized tests. Student A scored 720 on a test with mean 500 and SD 100. Student B scored 28 on a test with mean 21 and SD 5. Who performed better relative to their test? Which formula helps you answer this?
Compare variance and standard deviation. Why do we bother calculating standard deviation when variance already measures spread?
You're comparing the consistency of two manufacturing processes: one produces bolts with mean length 10mm, the other produces beams with mean length 5000mm. Why would standard deviation alone be misleading, and what measure should you use instead?
A distribution has positive skewness and high kurtosis. Describe what this distribution looks like, and explain whether the mean or median would be larger.