Fiveable

🎲Intro to Statistics Unit 2 Review


2.3 Measures of the Location of the Data


Written by the Fiveable Content Team • Last updated August 2025

Measures of Central Tendency and Spread

Quartiles, percentiles, and the median help you figure out where individual data points sit relative to the rest of the dataset. The interquartile range (IQR) tells you how spread out the middle chunk of data is and gives you a method for flagging outliers. Together, these tools let you describe a dataset's center, spread, and shape without relying on a single number.

Quartiles and Percentiles Calculation

Quartiles divide an ordered dataset into four equal parts. Each quartile corresponds to a specific percentile:

  • First quartile (Q1) is the 25th percentile: 25% of data falls below this value.
  • Second quartile (Q2) is the 50th percentile, also called the median: 50% of data falls below this value.
  • Third quartile (Q3) is the 75th percentile: 75% of data falls below this value.

More generally, a percentile tells you what percentage of data falls below a certain value. If you're at the 90th percentile on a test, you scored higher than 90% of test-takers.

To calculate the $k$th percentile:

  1. Arrange the data in ascending order (smallest to largest).
  2. Calculate the locator using the formula $L = \frac{k}{100}(n + 1)$, where $n$ is the total number of data points.
  3. If $L$ is a whole number, the $k$th percentile is the data value at position $L$.
  4. If $L$ is not a whole number, interpolate between the two nearest data values. For example, if $L = 7.5$, average the 7th and 8th data values.

Note: different textbooks use slightly different percentile formulas. The one above is common in intro courses, but your instructor may use a variation. Check which formula your class expects.
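The locator steps above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `percentile`, the sample `scores`, and the choice to clamp to the minimum or maximum when $L$ falls outside positions 1 through $n$ are all assumptions made for the example. Non-whole locators are handled with linear interpolation, which reduces to a plain average when $L$ ends in .5.

```python
def percentile(data, k):
    """Return the k-th percentile using the locator L = k/100 * (n + 1)."""
    values = sorted(data)              # step 1: ascending order
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")

    locator = k / 100 * (n + 1)        # step 2: 1-based position

    # Assumption: if the locator falls off either end, use the extreme value.
    if locator <= 1:
        return values[0]
    if locator >= n:
        return values[-1]

    lower = int(locator)               # position just below the locator
    fraction = locator - lower         # 0 when L is a whole number (step 3)
    # Step 4: move `fraction` of the way from position `lower` to `lower + 1`.
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


scores = [55, 60, 62, 70, 71, 75, 80, 84, 88, 90, 95]  # hypothetical data, n = 11
q1 = percentile(scores, 25)        # L = 3, so the 3rd value
median = percentile(scores, 50)    # L = 6, so the 6th value
p90 = percentile(scores, 90)       # L = 10.8, between the 10th and 11th values
```

Here `q1` is 62 and `median` is 75, while `p90` lands four fifths of the way from 90 to 95. Software packages often default to other percentile conventions, so results from a calculator or library may differ slightly from this formula.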

The five-number summary pulls these ideas together: minimum, Q1, median, Q3, and maximum. It gives you a quick snapshot of how the data is distributed.
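As a rough sketch, the five-number summary can be computed with Python's standard `statistics` module. The `method="exclusive"` option of `statistics.quantiles` places quartiles using an $(n + 1)$ locator like the one described above; the function name `five_number_summary` and the sample data are assumptions for illustration.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, med, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, med, q3, ordered[-1]


scores = [55, 60, 62, 70, 71, 75, 80, 84, 88, 90, 95]  # hypothetical data
summary = five_number_summary(scores)
print(summary)  # (55, 62.0, 75.0, 88.0, 95)
```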

Median as a Measure of Central Tendency

The median is the middle value when you line up all data points from smallest to largest.

  • For an odd number of values, the median is the exact middle value. In {1, 2, 3, 4, 5}, the median is 3.
  • For an even number of values, the median is the average of the two middle values. In {1, 2, 3, 4}, the median is $\frac{2 + 3}{2} = 2.5$.
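The odd and even cases can be written out directly. This is a small sketch with a hypothetical function name, `median_of`; Python's built-in `statistics.median` gives the same results.

```python
def median_of(values):
    """Return the median, averaging the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: exact middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of middle pair


odd_case = median_of([1, 2, 3, 4, 5])   # 3
even_case = median_of([1, 2, 3, 4])     # 2.5
```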

Why use the median instead of the mean? The median is resistant to outliers. A few extreme values won't drag it up or down the way they would with the mean. This makes the median a better choice for skewed distributions. Income data is a classic example: a handful of very high earners can pull the mean well above what most people actually earn, while the median stays closer to the "typical" value.
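A quick numerical check makes the resistance to outliers concrete. The income figures below are invented for illustration (in thousands of dollars); they are not real data.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars.
incomes = [30, 35, 40, 45, 50]
with_high_earner = incomes + [1000]   # add one extreme value

mean_before = mean(incomes)              # 40
median_before = median(incomes)          # 40
mean_after = mean(with_high_earner)      # 200
median_after = median(with_high_earner)  # 42.5
```

One extreme value multiplies the mean by five, while the median moves only from 40 to 42.5.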

For reference, the other common measures of central tendency are the mean (the arithmetic average of all values) and the mode (the most frequently occurring value).


Interquartile Range for Outlier Identification

The interquartile range (IQR) measures the spread of the middle 50% of data:

$IQR = Q3 - Q1$

Because it only looks at the middle half, the IQR isn't affected by extreme values at either end. That makes it a more robust measure of spread than the range (which uses only the minimum and maximum).

The IQR also gives you a standard method for identifying potential outliers using the 1.5 × IQR rule:

  • Lower outliers: values less than $Q1 - 1.5 \times IQR$
  • Upper outliers: values greater than $Q3 + 1.5 \times IQR$

For example, suppose $Q1 = 10$ and $Q3 = 20$, so $IQR = 10$. Any value below $10 - 1.5 \times 10 = -5$ or above $20 + 1.5 \times 10 = 35$ would be flagged as a potential outlier.

These boundaries are sometimes called fences. Values beyond them aren't automatically "bad" data; they just deserve a closer look.
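The fence calculation is simple enough to sketch in a few lines. The function name `iqr_fences` and the list `values` are assumptions for illustration; the quartiles 10 and 20 come from the example above.

```python
def iqr_fences(q1, q3, multiplier=1.5):
    """Return the (lower, upper) fences for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


lower, upper = iqr_fences(10, 20)   # (-5.0, 35.0)

values = [-8, 3, 12, 18, 25, 40]    # hypothetical observations
# Values strictly beyond a fence are flagged for a closer look.
flagged = [v for v in values if v < lower or v > upper]   # [-8, 40]
```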

Additional Measures of Spread

  • Standard deviation measures the typical distance of data points from the mean (technically, the square root of the average squared distance). A larger standard deviation means data points are more spread out from the center.
  • Variance is the square of the standard deviation ($\text{variance} = \sigma^2$). It captures the same idea but in squared units, which is why standard deviation is usually easier to interpret.
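The relationship between the two can be checked with Python's `statistics` module. This sketch uses the population versions (`pvariance` and `pstdev`), matching the $\sigma$ notation above; the sample versions (`variance` and `stdev`) divide by $n - 1$ instead. The data set is made up for illustration.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5

variance = pvariance(data)    # average squared distance from the mean: 4
std_dev = pstdev(data)        # square root of the variance: 2.0
```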

Using Measures of Location and Spread

Interpreting Quartiles and Percentiles

Quartiles and percentiles let you place a single data point in context. Consider a set of exam scores where $Q1 = 50$, the median is 75, and $Q3 = 90$:

  • 25% of students scored below 50.
  • Half the students scored below 75.
  • 75% of students scored below 90.

If a student scored in the 90th percentile, that student performed better than 90% of peers. Standardized tests like the SAT report percentiles for exactly this reason: they tell you where you stand relative to everyone else, not just your raw score.
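Going the other direction, from a score to its standing, is a matter of counting. The sketch below uses the "percent of values strictly below" definition from this guide; the function name `percent_below` and the scores are hypothetical, and testing agencies may use slightly different definitions.

```python
def percent_below(data, value):
    """Return the percentage of observations strictly below `value`."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


scores = [48, 52, 55, 61, 67, 70, 74, 79, 85, 93]   # hypothetical class of 10
standing = percent_below(scores, 93)   # 9 of 10 scores are lower: 90.0
```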

Median and IQR Describe Dataset Characteristics

The median and IQR work as a pair to describe a dataset's center and spread.

The median tells you the typical value. A median household income of $100,000 suggests a relatively wealthy area, while a median age of 25 suggests a young population. Context matters for interpreting whether a median is "high" or "low."

The IQR tells you how tightly data clusters around that center. An IQR of 2 points on a test means most students scored very close to each other. An IQR of 20 years in age data means there's a wide range of ages in the middle half of the group.

Together, the median and IQR can reveal skewness. If the distance from the median to Q3 is much larger than the distance from Q1 to the median, the data is likely right-skewed (a long tail stretching toward higher values). The reverse pattern suggests left skew.
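The comparison can be sketched as a small helper. The text says "much larger" without giving a cutoff, so the `ratio=1.5` threshold here is purely an assumption for illustration, as is the function name `skew_hint`. Treat the result as a hint to investigate, not a formal test of skewness.

```python
def skew_hint(q1, median, q3, ratio=1.5):
    """Compare the two halves of the IQR and suggest a likely skew direction."""
    lower_half = median - q1    # distance from Q1 to the median
    upper_half = q3 - median    # distance from the median to Q3
    if upper_half > ratio * lower_half:
        return "right-skewed"
    if lower_half > ratio * upper_half:
        return "left-skewed"
    return "roughly symmetric"


# Exam scores from earlier in this guide: Q1 = 50, median = 75, Q3 = 90.
exam_hint = skew_hint(50, 75, 90)      # lower half 25 vs upper half 15
income_hint = skew_hint(10, 15, 40)    # hypothetical quartiles
even_hint = skew_hint(10, 20, 30)      # hypothetical quartiles
```

With these inputs, `exam_hint` is "left-skewed", `income_hint` is "right-skewed", and `even_hint` is "roughly symmetric".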

A box plot is the visual representation of the five-number summary. The box spans from Q1 to Q3 (showing the IQR), a line inside the box marks the median, and "whiskers" extend to the minimum and maximum (or to the fences, with outliers plotted as individual points). Box plots make it easy to compare distributions across groups at a glance.
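The numbers behind a box plot can be computed without any plotting library. This sketch assumes the variant where whiskers stop at the most extreme data values inside the fences and anything beyond is listed as an outlier. The function name `box_plot_stats` and the data are hypothetical, and quartiles come from `statistics.quantiles` with `method="exclusive"`.

```python
from statistics import quantiles


def box_plot_stats(data):
    """Return the box, whisker ends, and outliers for a box plot."""
    ordered = sorted(data)
    q1, med, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


scores = [5, 55, 60, 62, 70, 71, 75, 80, 84, 88, 90, 95]   # hypothetical data
stats = box_plot_stats(scores)
```

For this data the box runs from 60.5 to 87.0 with the median at 73.0, the whiskers reach 55 and 95, and the score of 5 is plotted as a separate outlier point.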