2.4 Box Plots

Written by the Fiveable Content Team • Last updated August 2025

Box Plots

Box plots give you a visual summary of a dataset using just five key numbers. They're especially useful for quickly spotting the center, spread, and skewness of a distribution, and they make comparing multiple datasets straightforward when placed side by side.

Construction of Box Plots

Every box plot is built from the five-number summary:

  • Minimum: the smallest value in the dataset
  • Q1 (first quartile): the median of the lower half of the sorted data (25th percentile)
  • Median (Q2): the middle value of the sorted dataset, or the average of the two middle values when the number of values is even (50th percentile)
  • Q3 (third quartile): the median of the upper half of the sorted data (75th percentile)
  • Maximum: the largest value in the dataset
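
A minimal Python sketch of the five-number summary follows. It assumes the "median of each half" convention from the list, with the overall median left out of both halves when the number of values is odd; some software libraries compute quartiles by interpolation instead, so their Q1 and Q3 can differ slightly. The function names and the `scores` data are made up for illustration.

```python
def median(sorted_values):
    """Median of a non-empty, already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


# Hypothetical exam scores (not real data).
scores = [62, 70, 71, 75, 78, 80, 82, 85, 88, 91, 97]
print(five_number_summary(scores))
```

For these sample scores the summary is a minimum of 62, Q1 = 71, median 80, Q3 = 88, and maximum 97.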

To construct a box plot from these five numbers:

  1. Draw a number line that covers the range of your data.

  2. Draw a rectangular box from Q1 to Q3. This box represents the interquartile range (IQR), where $\text{IQR} = Q3 - Q1$.

  3. Draw a vertical line inside the box at the median (Q2).

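  4. Calculate the outlier boundaries (often called fences): the lower fence $Q1 - 1.5 \times \text{IQR}$ and the upper fence $Q3 + 1.5 \times \text{IQR}$. Any data point below the lower fence or above the upper fence is flagged as an outlier under this 1.5 × IQR rule and is plotted as an individual dot.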

  5. Draw whiskers from each edge of the box to the most extreme data point that is not an outlier. The whiskers do not necessarily extend to the minimum and maximum if outliers are present.

That last point trips people up. The whiskers reach to the farthest non-outlier values, not automatically to the min and max. If your dataset has outliers, the whiskers stop short and the outliers appear as separate points.
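
Steps 2, 4, and 5 and the whisker rule can be sketched in Python as follows. This is an illustrative sketch, not any plotting library's algorithm: it assumes quartiles computed as the median of each half and the 1.5 × IQR rule, and the names `box_plot_stats` and `k` (the fence multiplier) are hypothetical.

```python
def median(sorted_values):
    """Median of a non-empty, already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def box_plot_stats(data, k=1.5):
    """Compute the values needed to draw a box plot (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = median(xs[: n // 2])        # median of the lower half
    q3 = median(xs[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # Points outside the fences are outliers and get plotted as dots.
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    # Whiskers stop at the most extreme values that are NOT outliers.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "Q1": q1,
        "median": median(xs),
        "Q3": q3,
        "IQR": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Made-up data with one unusually low and one unusually high value.
data = [3, 41, 44, 47, 49, 50, 52, 53, 55, 58, 96]
stats = box_plot_stats(data)
print(stats)
```

For this made-up data, Q1 = 44 and Q3 = 55, so IQR = 11 and the fences are 27.5 and 71.5. The values 3 and 96 fall outside them, so the whiskers stop at 41 and 58 rather than at the minimum and maximum.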

Figure: Construction of box plots (source: Box Plots – Building Skills for Data Science)

Interpretation of Box Plot Features

The box captures the middle 50% of the data. A wider box means more variability in that central portion; a narrow box means the middle half of values are tightly clustered.

The median line shows the center of the distribution. Its position within the box tells you about skewness:

  • Median closer to Q1 → suggests the data is right-skewed (the right part of the box, from the median to Q3, is longer than the left part)
  • Median closer to Q3 → suggests the data is left-skewed (the left part of the box, from Q1 to the median, is longer than the right part)
  • Median roughly centered in the box → the distribution is approximately symmetric, at least in its middle half

The whiskers show how far the data extends beyond the middle 50%, excluding outliers. A longer whisker on one side reinforces the direction of skew. For example, if the right whisker is much longer than the left, that's another sign of right skew.
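
To make the median-position and whisker-length checks concrete, here is a rough Python sketch. The function `skew_hint` and its `tolerance` cutoff are hypothetical, not a standard statistical test: it compares the two parts of the box and the two whiskers as described above, and reports "mixed signals" when they point in opposite directions.

```python
def skew_hint(lo, q1, med, q3, hi, tolerance=0.1):
    """Read a rough skew direction off a box plot (hypothetical heuristic).

    lo and hi are the whisker ends; q1, med, q3 are the box lines.
    Differences smaller than tolerance * IQR count as balanced; the 0.1
    default is an arbitrary choice, not a standard.
    """
    cutoff = tolerance * (q3 - q1)
    box_diff = (q3 - med) - (med - q1)     # > 0: median sits closer to Q1
    whisker_diff = (hi - q3) - (q1 - lo)   # > 0: right whisker is longer
    signals = []
    for diff in (box_diff, whisker_diff):
        if diff > cutoff:
            signals.append("right")
        elif diff < -cutoff:
            signals.append("left")
        else:
            signals.append("balanced")
    if signals == ["balanced", "balanced"]:
        return "approximately symmetric"
    if "left" not in signals:
        return "right-skewed"
    if "right" not in signals:
        return "left-skewed"
    return "mixed signals"


# Median near Q1 and a long right whisker: both signals point right.
print(skew_hint(10, 20, 23, 35, 60))  # right-skewed
```

Treating the box and the whiskers as two separate signals mirrors the advice that a longer whisker reinforces, rather than replaces, the reading from the median's position.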

Outliers are individual points plotted beyond the whiskers. These represent unusually extreme values. A single outlier might just be an unusual observation, but multiple outliers on one side can signal a strongly skewed distribution or data entry errors worth investigating.


Data Distribution and Variability

Box plots summarize four key aspects of a distribution at a glance: center (median), spread (IQR and range), shape (skewness from median position and whisker lengths), and unusual values (outliers). The quartiles divide the data into four groups, each containing roughly 25% of the observations.

One limitation to keep in mind: box plots don't show the exact shape of the distribution the way a histogram does. Two datasets can produce identical box plots but have very different frequency patterns within each quartile. Box plots trade that detail for the ability to compare groups efficiently.
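
The limitation can be checked numerically. In this Python sketch the two made-up datasets share the same five-number summary, so they would produce identical box plots (neither has outliers under the 1.5 × IQR rule), yet simple histogram counts reveal that one is spread across the range while the other forms two clumps at the ends. The median-of-halves quartile convention, the bin width of 5, and the helper names are assumptions chosen for illustration.

```python
def median(sorted_values):
    """Median of a non-empty, already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """(min, Q1, median, Q3, max) with median-of-halves quartiles."""
    xs = sorted(data)
    n = len(xs)
    return (xs[0], median(xs[: n // 2]), median(xs),
            median(xs[(n + 1) // 2 :]), xs[-1])


def bin_counts(data, low=0, width=5, bins=4):
    """Histogram counts; values at the top edge go in the last bin."""
    counts = [0] * bins
    for x in data:
        i = min(int((x - low) // width), bins - 1)
        counts[i] += 1
    return counts


# Made-up datasets chosen so the quartile positions line up.
spread_out = [0, 2, 2, 8, 9, 10, 11, 12, 18, 18, 20]
two_clumps = [0, 1, 2, 3, 4, 10, 16, 17, 18, 19, 20]

for name, data in [("spread_out", spread_out), ("two_clumps", two_clumps)]:
    print(name, five_number_summary(data), bin_counts(data))
```

Both datasets summarize to (0, 2, 10, 18, 20), but the bin counts are [3, 2, 3, 3] for `spread_out` and [5, 0, 1, 5] for `two_clumps`, a difference the box plots cannot show.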

Comparison Using Side-by-Side Box Plots

Placing box plots on the same scale lets you compare multiple groups directly. For example, you might compare exam scores across three class sections. Here's what to look at:

  1. Compare medians: which group has a higher or lower center? If one box plot's median line sits clearly above another's, that group tends to have higher values.
  2. Compare IQRs (box lengths): a longer box means more variability in the middle 50%. If Class A's box is twice as long as Class B's, the middle half of Class A's scores is more spread out.
  3. Compare whisker lengths: longer whiskers indicate a wider overall range (excluding outliers). Unequal whisker lengths between groups suggest different amounts of variability in the tails.
  4. Compare outliers: note which groups have outliers and on which side. A group with several low outliers might have a few students who struggled significantly compared to their peers.
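
The checklist can be applied in code to compute the numbers you would quote in a comparison. This Python sketch uses hypothetical scores for three sections (not real data), median-of-halves quartiles, and the 1.5 × IQR rule; `summarize` and `sections` are illustrative names.

```python
def median(sorted_values):
    """Median of a non-empty, already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def summarize(data, k=1.5):
    """Center, spread, whisker ends, and outliers for one group."""
    xs = sorted(data)
    n = len(xs)
    q1, q3 = median(xs[: n // 2]), median(xs[(n + 1) // 2 :])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if lo <= x <= hi]  # non-outliers
    return {
        "median": median(xs),
        "IQR": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < lo or x > hi],
    }


# Hypothetical exam scores for three class sections.
sections = {
    "A": [70, 78, 80, 81, 82, 84, 85, 86, 95],
    "B": [60, 65, 68, 72, 74, 77, 80, 84, 90],
    "C": [45, 72, 74, 76, 78, 79, 81, 83, 88],
}

stats = {name: summarize(scores) for name, scores in sections.items()}
for name, s in stats.items():
    print(f"Section {name}: median={s['median']}, IQR={s['IQR']}, "
          f"whiskers={s['whiskers']}, outliers={s['outliers']}")

by_center = sorted(stats, key=lambda name: stats[name]["median"], reverse=True)
print("Sections from highest to lowest median:", by_center)
```

With these made-up scores, Section A has the highest median (82) and the narrowest box (IQR 6.5), Section B has the lowest median (74) and the widest box (IQR 15.5), and Section C's score of 45 falls below its lower fence of 59.5, so it is drawn as an outlier while C's whiskers run from 72 to 88.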

When writing comparison statements, be specific. Instead of "Class A did better," say something like "Class A's median score (82) was higher than Class B's (74), and Class A's IQR was narrower, suggesting more consistent performance."

Side-by-side box plots are one of the most efficient ways to compare distributions across groups, which is why they show up frequently on assessments. Practice reading them quickly by always checking center, spread, shape, and outliers in that order.