🎲Intro to Statistics

Descriptive Statistics Formulas

Why This Matters

Descriptive statistics aren't just formulas to memorize—they're tools for answering specific questions about data. Every exam question about descriptive statistics is really asking: What's typical? How spread out is the data? Where does this value fall relative to others? Understanding which formula answers which question is the difference between guessing and knowing exactly what the problem wants.

You're being tested on your ability to choose the right measure for a given situation and interpret what it tells you. A dataset with outliers? You need different tools than a symmetric distribution. Comparing variability across datasets with different units? There's a specific formula for that. Don't just memorize the calculations—know what each statistic reveals about the data and when to use it.

Measures of Center: Finding What's "Typical"

These formulas answer the question: What single value best represents this dataset? The key insight is that "typical" means different things depending on your data's shape and type.

Mean (Arithmetic Average)

Formula: $\bar{x} = \frac{\sum x_i}{n}$ —sum all values and divide by the count
Sensitive to outliers—a single extreme value can drag the mean away from where most data sits
Best for symmetric distributions without extreme values; serves as the "balance point" of the data

Median

The middle value when data is ordered; for even $n$, average the two middle values
Resistant to outliers—makes it the preferred center measure for skewed distributions
Splits data into halves: about 50% of observations fall at or below the median and about 50% fall at or above it

Mode

Most frequently occurring value—the only center measure usable with categorical (nominal) data
Can be multiple or none—datasets may be unimodal, bimodal, multimodal, or have no mode
Useful for identifying clusters—reveals where data concentrates, especially in discrete distributions

Compare: Mean vs. Median. Both measure center, but the median resists outliers while the mean incorporates every value. If a free-response question (FRQ) describes income data or home prices (classic right-skewed scenarios), the median is almost always the better choice.
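To see the mean-versus-median contrast with numbers, here is a minimal sketch using Python's standard `statistics` module. The `incomes` list is made-up illustration data (not from any real survey), with one deliberately extreme value at the end.

```python
import statistics

# Hypothetical incomes in thousands of dollars; the last value is an outlier.
incomes = [32, 35, 38, 41, 41, 45, 400]

mean_income = statistics.mean(incomes)      # sum of all values divided by the count
median_income = statistics.median(incomes)  # middle value of the sorted data
modes = statistics.multimode(incomes)       # list of every most-frequent value

print(f"mean   = {mean_income:.1f}")   # pulled upward by the 400
print(f"median = {median_income}")     # stays with the bulk of the data
print(f"modes  = {modes}")
```

With this sample the mean lands near 90 while the median stays at 41, which is why the median is the safer summary when one value sits far from the rest.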

Measures of Spread: Quantifying Variability

These formulas answer: How spread out is the data? Two datasets can have identical means but wildly different spreads—these measures capture that difference.

Range

Formula: $\text{Range} = \text{Max} - \text{Min}$ —the simplest spread measure
Uses only two values—ignores everything between the extremes, making it highly sensitive to outliers
Quick but crude—useful for a first glance, but tells you nothing about how values distribute

Variance

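Formula: $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$, the sum of squared deviations from the mean divided by $n-1$ (this is the sample variance; dividing by $n-1$ rather than $n$ makes it slightly larger than a plain average of the squared deviations)
Squared units problem: if data is in dollars, variance is in "dollars squared," which is hard to interpret directly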
Foundation for other statistics—variance underlies standard deviation, hypothesis tests, and regression analysis

Standard Deviation

Formula: $s = \sqrt{s^2}$ —the square root of variance, returning to original units
Measures typical distance from mean—roughly, how far a "typical" data point sits from $\bar{x}$
Pairs with mean—when you report mean, report standard deviation; they work together for symmetric data

Interquartile Range (IQR)

Formula: $\text{IQR} = Q_3 - Q_1$ —spans the middle 50% of data
Resistant to outliers—ignores the extreme 25% on each end
Pairs with median—when median is your center measure, IQR is your spread measure; used in boxplots

Compare: Standard Deviation vs. IQR—both measure spread, but SD uses every data point (sensitive to outliers) while IQR focuses on the middle 50% (resistant). Choose SD for symmetric data, IQR for skewed data or when outliers are present.
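The four spread measures above can be computed side by side. This is a sketch on a small made-up dataset, assuming the sample ($n-1$) formulas used in this guide. Note that quartile conventions differ between textbooks and software; `statistics.quantiles` uses its default "exclusive" method here, so hand-calculated quartiles from your course may differ slightly.

```python
import statistics

# Hypothetical dataset chosen so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)      # Max - Min
variance = statistics.variance(data)    # sample variance, divides by n - 1
std_dev = statistics.stdev(data)        # square root of the sample variance

# Cut points that split the sorted data into four groups: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                           # spread of the middle 50%

print(f"range    = {data_range}")
print(f"variance = {variance:.3f}")
print(f"std dev  = {std_dev:.3f}")
print(f"IQR      = {iqr}")
```

Here the mean is 5 and the squared deviations sum to 32, so the sample variance is $32/7 \approx 4.57$ and the standard deviation is its square root.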

Position Measures: Locating Individual Values

These formulas answer: Where does this specific value fall within the distribution? They transform raw values into relative standing.

Percentiles

The $k$th percentile is the value below which $k\%$ of observations fall
Quartiles are special percentiles— $Q_1$ = 25th percentile, $Q_2$ (median) = 50th, $Q_3$ = 75th
Context-dependent interpretation—90th percentile on a test is great; 90th percentile for blood pressure is concerning

Z-Score (Standard Score)

Formula: $z = \frac{x - \bar{x}}{s}$ —measures distance from mean in standard deviation units
Interpretation: $z = 2$ means the value is 2 standard deviations above the mean
Enables comparisons across scales—a z-score lets you compare performance on tests with different means and SDs

Compare: Percentiles vs. Z-scores—both describe position, but percentiles tell you what percentage of data falls below a value, while z-scores tell you how many standard deviations from the mean. Z-scores can be negative (below mean); percentiles cannot.
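Both position measures can be computed for the same value to see how they differ. The sketch below uses a made-up list of test scores. The helper names `z_score` and `percentile_rank` are illustrative, and the percentile rank uses the "strictly below" definition given above (some courses count ties differently).

```python
import statistics

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

def percentile_rank(value, data):
    """Percentage of observations strictly below value."""
    below = sum(1 for v in data if v < value)
    return 100 * below / len(data)

# Hypothetical test scores for a class of ten students.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
x = 90

z = z_score(x, statistics.mean(scores), statistics.stdev(scores))
pr = percentile_rank(x, scores)

print(f"z-score         = {z:.2f}")   # positive, so above the mean
print(f"percentile rank = {pr}")      # share of scores below 90
```

A score of 90 here sits a bit under one standard deviation above the mean, with 70% of the class below it.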

Shape Measures: Describing the Distribution

These formulas answer: What does the distribution look like? Shape affects which center and spread measures are appropriate.

Skewness

Measures asymmetry: how lopsided the distribution is compared with a symmetric one
Direction of the tail—positive skew (right-skewed) has a long right tail; negative skew (left-skewed) has a long left tail
Affects mean-median relationship: in right-skewed data the mean is usually greater than the median; in left-skewed data the mean is usually less than the median

Kurtosis

Measures tail heaviness—indicates how likely extreme values (outliers) are
High kurtosis (leptokurtic): heavy tails, so more outliers than a normal distribution; often drawn with a sharp central peak, although the statistic is driven mainly by the tails
Low kurtosis (platykurtic): light tails, so fewer extreme values; often drawn with a flatter center

Compare: Skewness vs. Kurtosis—skewness describes left-right asymmetry (direction of tail), while kurtosis describes tail weight (likelihood of extreme values). Both help you understand why the mean might be misleading.
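This guide does not give formulas for skewness or kurtosis, so the sketch below assumes one common convention: the moment-based definitions, with skewness $= m_3 / m_2^{3/2}$ and excess kurtosis $= m_4 / m_2^2 - 3$, where $m_k$ is the average of $(x_i - \bar{x})^k$. Statistical software often applies small-sample corrections, so its output may differ from these values. The function names and datasets are illustrative.

```python
def central_moment(data, k):
    """Average of (x - mean)**k over the data (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: positive means a longer right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical datasets: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

print(f"symmetric:    skew = {skewness(symmetric):.2f}, "
      f"excess kurtosis = {excess_kurtosis(symmetric):.2f}")
print(f"right-skewed: skew = {skewness(right_skewed):.2f}, "
      f"excess kurtosis = {excess_kurtosis(right_skewed):.2f}")
```

The symmetric list has skewness 0 and negative excess kurtosis (light tails), while the list ending in 20 has positive skewness and a mean (4.75) above its median (3).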

Standardized Comparison: Coefficient of Variation

This formula answers: Which dataset has more relative variability? Essential when comparing datasets with different units or vastly different means.

Coefficient of Variation (CV)

Formula: $CV = \frac{s}{\bar{x}} \times 100\%$ —standard deviation as a percentage of the mean
Unit-free comparison—lets you compare variability of heights (in cm) to weights (in kg)
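Higher CV = more relative spread: a CV of 25% means the SD is a quarter of the mean. CV is only meaningful for positive data with a true zero, and it becomes unstable when the mean is close to zero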

Compare: Standard Deviation vs. Coefficient of Variation—SD measures absolute spread in original units; CV measures relative spread as a percentage. Use CV when comparing variability across datasets with different scales or units.
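Here is a minimal sketch of the heights-versus-weights comparison. Both lists are made-up values, and `coefficient_of_variation` is an illustrative helper, not a standard library function.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 85]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights = {cv_heights:.1f}%")
print(f"CV of weights = {cv_weights:.1f}%")
```

The two standard deviations (in cm and in kg) cannot be compared directly, but the CVs can: in this sample the weights vary more relative to their mean than the heights do.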

Quick Reference Table

Concept	Best Formulas/Measures
Center (symmetric data)	Mean
Center (skewed data)	Median, Mode
Spread (symmetric data)	Standard Deviation, Variance
Spread (skewed data)	IQR
Individual position	Z-score, Percentiles
Distribution shape	Skewness, Kurtosis
Comparing variability across scales	Coefficient of Variation
Outlier detection	IQR (1.5×IQR rule), Z-score (beyond ±2 or ±3)
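The two outlier rules in the last row of the table can be sketched as follows. The data is made up, the function names are illustrative, and the quartiles come from `statistics.quantiles` with its default method, so the fences may differ slightly from hand-calculated ones.

```python
import statistics

def iqr_outliers(data):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

def z_outliers(data, threshold=2.0):
    """Values whose z-score is beyond +/- threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / sd) > threshold]

# Hypothetical dataset with one suspicious value.
data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 60]

print("1.5 x IQR rule:", iqr_outliers(data))
print("|z| > 2 rule:  ", z_outliers(data, threshold=2.0))
print("|z| > 3 rule:  ", z_outliers(data, threshold=3.0))
```

In this sample both the IQR rule and the $\pm 2$ rule flag 60, but the $\pm 3$ rule does not, because the outlier itself inflates the mean and standard deviation used to compute its z-score. That is one reason the IQR rule is the more resistant of the two.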

Self-Check Questions

A dataset of household incomes in a city is strongly right-skewed. Which measure of center should you report, and which measure of spread pairs best with it?
Two students took different standardized tests. Student A scored 720 on a test with mean 500 and SD 100. Student B scored 28 on a test with mean 21 and SD 5. Who performed better relative to their test? Which formula helps you answer this?
Compare and contrast variance and standard deviation. Why do we bother calculating standard deviation when variance already measures spread?
You're comparing the consistency of two manufacturing processes: one produces bolts with mean length 10 mm, the other produces beams with mean length 5000 mm. Why would standard deviation alone be misleading, and what measure should you use instead?
A distribution has positive skewness and high kurtosis. Describe what this distribution looks like, and explain whether the mean or median would be larger.
