Fiveable

📊AP Statistics Unit 1 Review


1.9 Comparing Distributions of a Quantitative Variable


Written by the Fiveable Content Team • Last updated September 2025
Verified for the 2026 exam

We talked a lot about distributions and how to describe, summarize, and represent them in alternative formats; now it's time to put that into practice by comparing multiple sets of data.

Comparing Groups with Stem-and-Leaf Plots: Warm Up

Before we dive deeper into AP-style questions, which are more descriptive and comprehensive in nature, let's do a warm up question using a familiar graphical method used in statistics: stem plots!

Question: The weights of two groups of eight animals, Group M and Group N, are recorded, and the data are shown in the stem plots below (each stem is the tens digit and each leaf is the ones digit of a weight in kg). Use the stem plots to compare the weights of the animals in the two groups.

Group M:

1 | 4

2 | 3 4 8

3 | 2 6 8

4 |

5 | 0

Group N:

1 | 0 

2 | 3 6

3 | 5

4 | 1

5 | 4 7

6 | 2

To compare the two groups, we can start with the spread of the data. From the stem plots, Group M has weights ranging from 14 to 50 kg (a range of 36 kg), while Group N has weights ranging from 10 to 62 kg (a range of 52 kg). Group N has the wider range, and three of its animals (54, 57, and 62 kg) are heavier than the heaviest animal in Group M.

We can also look at the distribution of the data within each group to see if there are any patterns or trends. For example, we can see that Group M has a cluster of values in the 20s and 30s, while Group N has a more even distribution of values throughout the range. This suggests that Group M has a higher proportion of animals that are relatively similar in weight, while Group N has a more diverse range of weights.

Overall, the stem plots show that Group N has a wider range of weights compared to Group M, with a more diverse distribution of weights within the group!
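
As a rough check on the comparison above, the stem plots can be read back into lists and summarized numerically. This is a minimal Python sketch and not part of the original question: the names `group_m`, `group_n`, and `describe` are made up for illustration, and the values assume each stem is the tens digit and each leaf is the ones digit.

```python
from statistics import median

# Weights in kg, read from the stem plots (stem = tens digit, leaf = ones digit).
group_m = [14, 23, 24, 28, 32, 36, 38, 50]
group_n = [10, 23, 26, 35, 41, 54, 57, 62]

def describe(weights):
    """Return (minimum, median, maximum, range) for a list of weights."""
    low, high = min(weights), max(weights)
    return low, median(weights), high, high - low

for name, weights in [("Group M", group_m), ("Group N", group_n)]:
    low, mid, high, spread = describe(weights)
    print(f"{name}: min={low}, median={mid}, max={high}, range={spread}")
```

Under this reading, Group N has both the larger range (52 kg vs. 36 kg) and the larger median (38 kg vs. 30 kg), so the comparison can mention center as well as spread.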


Comparing Groups with Histograms: Practice AP-Style Problem

Records are kept by each state in the United States on the number of pupils enrolled in public schools and the number of teachers employed by public schools for each school year. From these records, the ratio of the number of pupils to the number of teachers (P-T ratio) can be calculated for each state. The histograms below show the P-T ratio for every state during the 2001–2002 school year. The histogram on the left displays the ratios for the 24 states that are west of the Mississippi River, and the histogram on the right displays the ratios for the 26 states that are east of the Mississippi River.

Source: The College Board (via AP Classroom)

Part (a) asks us to estimate the median, not to compute it exactly. For states west of the Mississippi (n = 24, so n/2 = 12), the median falls between the 12th and 13th values in the ordered list, and both of these values fall in the interval 15–16. For states east of the Mississippi (n = 26), the median falls between the 13th and 14th values in the ordered list, and both of these values also fall in the interval 15–16. So both groups have a median of at least 15 and at most 16 students per teacher.
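
The counting argument above (find the middle positions, then add up bar heights until you reach them) can be sketched in Python. The histograms are not reproduced here, so the bin counts below are HYPOTHETICAL and only illustrate the method; `median_bin` is an illustrative helper name, not something from the original problem.

```python
def median_bin(bins, counts):
    """Return the middle position(s) and the bin(s) that hold them.

    bins   -- list of (lower, upper) interval endpoints, in increasing order
    counts -- number of observations in each bin
    """
    n = sum(counts)
    # For even n the median lies between positions n/2 and n/2 + 1;
    # for odd n it is the single position (n + 1)/2.
    positions = [n // 2, n // 2 + 1] if n % 2 == 0 else [(n + 1) // 2]
    found = []
    for position in positions:
        cumulative = 0
        for interval, count in zip(bins, counts):
            cumulative += count
            if position <= cumulative:
                found.append(interval)
                break
    return positions, found

# HYPOTHETICAL counts for 24 states; these are not the real histogram heights.
bins = [(12, 13), (13, 14), (14, 15), (15, 16), (16, 17),
        (17, 18), (18, 19), (19, 20), (20, 21), (21, 22)]
counts = [2, 4, 5, 5, 3, 2, 1, 1, 0, 1]

positions, found = median_bin(bins, counts)
print(positions, found)
```

With these made-up counts, the 12th and 13th values both land in the 15–16 bin, which mirrors the reasoning used for the West group.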

b. Write a few sentences comparing the distributions of P-T ratios for states in the two groups (west and east) during the 2001–2002 school year.

Here, describe the three features of a distribution one at a time: shape, center, and spread. Always start with shape. The shapes of the two histograms look different: the histogram for the West is unimodal and skewed to the right, whereas the histogram for the East is unimodal and nearly symmetric.

For the center, we already found in part (a) that the medians of the two distributions are about the same: between 15 and 16 for both.

Aaaaand finally, report the spread! Look at how tightly the values are concentrated around the center of each distribution. The histograms show that the West values vary more than the East values. Although the data are grouped, we can still approximate the range: the range for the West is at most 22 – 12 = 10, and the range for the East is at most 19 – 12 = 7. The East has less variability than the West.

c. Using your answers in parts (a) and (b), explain how you think the mean P-T ratio during the 2001–2002 school year will compare for the two groups (west and east).

The two histograms have different shapes. Since the West distribution is skewed to the right, the large values in its right tail pull the mean up, so its mean will be greater than its median. Since the East distribution is fairly symmetric, its mean will be close to its median. Because the two medians are about the same, we can conclude that the mean for the West group will probably be greater than the mean for the East group.
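
To see why a right tail pulls the mean above the median, here is a small Python sketch. The list `ratios` is HYPOTHETICAL data invented for illustration (most values low, a few large ones); it is not the state data from the problem.

```python
from statistics import mean, median

# HYPOTHETICAL right-skewed ratios: most values are low, a few sit in the right tail.
ratios = [12.5, 13.2, 13.8, 14.1, 14.6, 15.0, 15.4, 16.2, 17.5, 19.8, 21.6]

center_median = median(ratios)
center_mean = mean(ratios)

# The large values in the right tail pull the mean above the median.
print(f"median = {center_median}, mean = {center_mean:.2f}")
print("mean exceeds median:", center_mean > center_median)
```

The median only depends on the middle position, so the three largest values could be even larger without changing it, while the mean would keep rising.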

Comparing Groups with Box Plots: Practice AP-Style Problem

A team of psychologists studied the concept of visualization in basketball, where players visualize making a basket before shooting the ball. They conducted an experiment in which 20 basketball players with similar abilities were randomly assigned to two groups. The 10 players in group 1 received visualization training, and the 10 players in group 2 did not.

Each player stood 22 feet from the basket at the same location on the basketball court. Each player was then instructed to attempt to make the basket until two consecutive baskets were made. The players who received visualization training were instructed to use visualization techniques before attempting to make the basket. The total number of attempts, including the last two attempts, were recorded for each player.

The total number of attempts for each of the 20 players are summarized in the following box plots.

Source: The College Board

We have two groups, with 10 basketball players randomly assigned to each group. 

We learn from the question that group 1 received visualization training but group 2 did not. There are a few things we can compare to answer the question. Both groups have the same minimum number of attempts, but all the other summary measures differ.

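The first quartile is 3 attempts for group 1 and 4 attempts for group 2, so about a quarter of the players in group 1 needed 3 or fewer attempts, compared with 4 or fewer in group 2.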

Now, look at the median. The median is much lower for group 1 than for group 2. Group 1 has an outlier, but it is still less than the maximum of group 2. The box plots suggest that the training had an impact on group 1, since every summary measure other than the minimum is lower than for group 2; however, we are not asked to generalize this finding yet.

Finally, to answer the question, it is enough to compare the medians. Because the median number of attempts for players who received visualization training (4) is less than the median number of attempts for players who did not receive training (7), those who received visualization training tend to need fewer attempts to make two consecutive baskets. Talk about talent and persistence!

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

Term | Definition
center | A measure indicating the middle or typical value of a distribution.
cluster | Concentrations of data usually separated by gaps in a distribution.
gap | Regions of a distribution between two data values where there are no observed data.
graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format.
histogram | A graph where the height of each bar represents the number or proportion of observations within an interval, with the ability to alter interval widths to change the appearance.
independent samples | Two or more separate groups of data where the values in one group do not influence or depend on the values in another group.
mean | The average value of a dataset, represented by μ in the context of a population.
outlier | Data points that are unusually small or large relative to the rest of the data.
relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
side-by-side boxplots | A graphical representation that displays multiple boxplots arranged next to each other to compare the distributions of different groups or samples.
standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population.
summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape.
variability | The spread or dispersion of data values in a distribution.

Frequently Asked Questions

How do I compare two histograms for my AP Stats exam?

Look at the two histograms and describe the same five features for each, then compare them directly: shape, center, spread (variability), outliers/gaps, and any clusters or modes.
- Shape: note symmetry vs. skew (e.g., “both right-skewed; B is more strongly skewed”).
- Center: give a measure (median or mean) and say which histogram has the larger center (e.g., “median around 120 ppm for B vs. ~75 ppm for A”).
- Spread: compare range or IQR and mention which is more variable (e.g., “A has a wider range; B is tighter”).
- Outliers/gaps: point out any isolated bars or empty intervals.
- Modes/clusters: note unimodal/bimodal or clusters.
Finish with a one-sentence conclusion in context (what that difference means for the variable). Use side-by-side boxplots or report means/SDs if asked for numerical comparison on the exam (Skill 2.D). For a quick Topic 1.9 review, see the Fiveable study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and more practice problems (https://library.fiveable.me/practice/ap-statistics).

What's the difference between side-by-side boxplots and regular boxplots?

A regular (single) boxplot shows one quantitative distribution with its five-number summary: min, Q1, median, Q3, max (and any outliers). A side-by-side boxplot just draws two or more boxplots on the same scale so you can compare independent samples directly. Use side-by-side boxplots when you want to compare center (medians), spread (IQRs), skewness, gaps/clusters, and outliers across groups—exactly what UNC-1.N in the CED expects (compare two or more independent samples). Tip: always line them up on the same axis so medians and IQRs are easy to read; look for differences in medians (center), IQR/whisker length (variability), and isolated points (outliers). For extra practice on comparing distributions, check the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and the practice problems collection (https://library.fiveable.me/practice/ap-statistics).

When do I use mean vs median to compare two data sets?

Use the mean when the distribution is roughly symmetric with no strong outliers—the mean (with SD) summarizes center and spread well and is what inference about means uses on the exam. Use the median when the distribution is skewed or has outliers—the median (with IQR or range) gives a resistant measure of center. Quick rule of thumb: symmetric → mean ± SD; skewed/outliers → median and IQR. When comparing two samples on the AP exam, describe shape, center, and spread (and mention outliers/clusters/gaps)—e.g., “Sample A is right-skewed with a larger median; Sample B is roughly symmetric with a higher mean.” If you’re unsure or distributions differ, report both mean and median and justify which you prefer. For practice comparing graphs and summaries (histograms, side-by-side boxplots, mean/median, SD/IQR) see the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and the Unit 1 overview (https://library.fiveable.me/ap-statistics/unit-1). For more problems, try the practice set (https://library.fiveable.me/practice/ap-statistics).

I'm confused about how to describe the shape, center, and spread when comparing distributions - can someone help?

Think of comparing distributions as describing three things for each group, then stating how they differ.
1) Shape: note symmetry vs. skewness (right/left), modality (uni/bi), clusters or gaps, and outliers. e.g., “Group A is right-skewed with one cluster; Group B is roughly symmetric.”
2) Center: pick median for skewed data (resistant) or mean for roughly symmetric. Give a number or range: “Median A ≈ 40 ppm; median B ≈ 120 ppm.”
3) Spread: use IQR (and boxplots) for skewed data or SD/range for symmetric. Say which has more variability: “A has larger IQR and wider range than B.”
When you compare, always mention context and at least one of shape/center/spread (CED UNC-1.N.1, UNC-1.O.1). Good phrasing: “Compared to B, A is more skewed right, has a lower center (median ≈ 50 vs. 120), and greater spread (IQR larger; more extreme values).” For practice, use the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz), the Unit 1 overview (https://library.fiveable.me/ap-statistics/unit-1), and try problems at (https://library.fiveable.me/practice/ap-statistics).

How do I know which graph to use when comparing multiple groups of data?

Pick the graph that highlights the features you need to compare (center, spread, shape, outliers, clusters/gaps) and that fits your sample sizes.
- Side-by-side boxplots: best for quick comparisons of median, IQR, range, and outliers across multiple groups (use when you have moderate-to-large samples). (CED: UNC-1.N.1)
- Histograms: use when you want to see shape and modality (skewness, bimodality) for each group; compare bins and use the same bin widths/scale.
- Dotplots or stemplots: use for small samples (show individual values and clusters/gaps).
- Overlaid kernel density plots or smoothed histograms: good for shape comparison if sample sizes are big and you can keep the same scale.
Always use the same axis scales, label axes and units, and pair graphs with numerical summaries (mean/median, s or IQR, range, % outliers). The AP exam expects you to compare distributions (Skill 2.D in Unit 1), so be ready to describe shape, center, variability, and context. For a quick review, see the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz). For more practice problems, visit (https://library.fiveable.me/practice/ap-statistics).

What's the step-by-step process for comparing two distributions using summary statistics?

Step-by-step:
1. Pick appropriate displays first (side-by-side boxplots, histograms, dotplots) to spot shape, clusters, gaps, and outliers (CED UNC-1.N.1).
2. State shape for each (symmetric, right/left skewed, bimodal); shape affects whether to use mean/sd or median/IQR.
3. Compare center: report medians (and/or means if roughly symmetric). Give actual numbers (e.g., medianA = 45, medianB = 52).
4. Compare spread: report IQRs for skewed data and standard deviations for roughly symmetric data; also say ranges.
5. Note outliers and gaps explicitly and say which group they’re in.
6. Summarize differences clearly in context: which group tends to be larger, which is more variable, and any practical significance.
7. If needed for AP tasks, connect to inference: mention sample sizes and that comparisons using means/sds require roughly symmetric shape or large n (AP exam checks conditions).
Use CED keywords (mean, median, sd, IQR, outliers, skewness). For a quick refresher, see the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and practice problems (https://library.fiveable.me/practice/ap-statistics).

Can someone explain how to identify outliers when comparing boxplots side by side?

Look at each boxplot’s five-number summary (Q1, median, Q3, min, max) and use the IQR method. Calculate IQR = Q3 − Q1, then fences: lower = Q1 − 1.5·IQR and upper = Q3 + 1.5·IQR. Any points (or plotted dots) outside those fences are outliers; points beyond Q1 − 3·IQR or Q3 + 3·IQR are often called extreme outliers. On side-by-side boxplots, mark which groups have outliers, how many, and how far they lie from the box—that helps you compare variability and influence on center. Say, “Group A has two mild outliers above the upper fence; Group B has one extreme low outlier,” then comment on how those outliers affect medians and IQRs. This aligns with Topic 1.9 (compare center, variability, outliers) in the CED (UNC-1.N). For a quick refresher, check the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and try practice problems (https://library.fiveable.me/practice/ap-statistics).
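
The fence calculation described above can be sketched in Python. The two lists are HYPOTHETICAL attempt counts invented for illustration, and `quartiles` and `find_outliers` are illustrative helper names. Quartiles are taken here as the medians of the lower and upper halves of the ordered data; statistical software may use a slightly different quartile convention.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return (lower fence, upper fence, list of outliers) using the 1.5 * IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers

# HYPOTHETICAL attempt counts for two groups of 10 players (not data from any exam question).
group_a = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12]
group_b = [2, 4, 4, 5, 7, 7, 8, 9, 10, 13]

for name, data in [("Group A", group_a), ("Group B", group_b)]:
    low_fence, high_fence, outliers = find_outliers(data)
    print(f"{name}: fences = ({low_fence}, {high_fence}), outliers = {outliers}")
```

In this made-up data, Group A's largest value falls above its upper fence and would be plotted as a separate point, while Group B's largest value is bigger but still inside Group B's wider fences. An outlier is judged relative to its own group's spread.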

I don't understand how to write about variability differences between two data sets - what should I look for?

Focus on the spread you see and pick the right measure to compare it. Look at graphs (side-by-side boxplots, histograms, dotplots): note which distribution is more “spread out” (longer whiskers, wider histogram). Then quantify with numbers: range and SD show overall spread (SD is best for roughly symmetric data), IQR and box lengths show spread of the middle 50% and are better when distributions are skewed or have outliers. Point out outliers and gaps—a few extreme values can make SD large while IQR stays small. Always mention context and which measure you chose and why (e.g., “Region A has larger SD (12 vs. 5) but similar IQRs, likely because of a few large outliers”). AP graders expect both graphical and numerical comparison (use side-by-side boxplots and report IQR/SD as appropriate). For more examples and practice, check the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and browse lots of practice questions (https://library.fiveable.me/practice/ap-statistics).

How do I calculate and compare standard deviations for multiple samples?

Calculate each sample’s standard deviation with the AP formula for a sample: s = sqrt[ Σ(xi − x̄)² / (n − 1) ]. Steps: find each sample’s mean x̄, subtract the mean from each observation, square and sum those deviations, divide by n − 1, then take the square root. Do this separately for each sample to get s1, s2, …

When you compare sample standard deviations:
- Compare the numeric s values directly; larger s means more spread in the original units. Give the context (e.g., “Sample A s = 12 minutes, Sample B s = 5 minutes, so A is more variable”).
- Check shape and outliers first (skewness or extreme values inflate s). If distributions are skewed or have outliers, also report IQR or use side-by-side boxplots as CED Topic 1.9 recommends.
- Note sample size: small n makes s less reliable.
On the exam, you’ll often pair these numeric comparisons with graphs (histograms/boxplots) and mention center, variability, and outliers (CED UNC-1.N / UNC-1.O). For a quick refresh, see the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and practice lots of problems (https://library.fiveable.me/practice/ap-statistics).
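
The steps in the formula can be sketched in Python and applied to the Group M and Group N weights from the stem plot warm-up earlier in this guide. `sample_sd` is an illustrative helper name; the standard library's `statistics.stdev` uses the same n − 1 divisor and is included only as a cross-check.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))

# Weights in kg from the Group M and Group N stem plots earlier in this guide.
group_m = [14, 23, 24, 28, 32, 36, 38, 50]
group_n = [10, 23, 26, 35, 41, 54, 57, 62]

for name, data in [("Group M", group_m), ("Group N", group_n)]:
    # statistics.stdev also divides by n - 1, so the two values should agree.
    print(f"{name}: s = {sample_sd(data):.2f} (library: {stdev(data):.2f})")
```

Group N's standard deviation comes out larger (about 18 kg versus about 11 kg for Group M), which agrees with the wider range seen in the stem plots.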

When comparing distributions, do I always need to mention center, spread, and shape?

Short answer: usually, yes—but only include the pieces that matter for the question. What AP expects: the CED says when you compare quantitative distributions you should look at center, variability (spread), and shape—and also mention clusters, gaps, and outliers when they’re relevant (UNC-1.N.1, UNC-1.O.1). That means in many prompts you should explicitly compare center (mean/median), spread (SD/IQR/range), and shape (skewness, modality) plus any obvious outliers or gaps. That said: focus on what affects the comparison. If two distributions have the same shape and no outliers, a short comparison of centers and variability is enough. If one is skewed or has an outlier, call that out because it changes which center/measure you emphasize (median/IQR vs mean/SD). Always give context (what the variable is) and use numbers when available. Want more practice and examples that match the CED? Check the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and try problems at Fiveable’s practice page (https://library.fiveable.me/practice/ap-statistics).

What does it mean when one distribution has a higher IQR than another?

IQR = Q3 − Q1, so it measures the spread of the middle 50% of the data (the central box in a boxplot). If one distribution has a higher IQR than another, its middle 50% of values are more spread out—the typical/central observations are more variable and less tightly clustered. That tells you the distribution has greater variability (in the center) even if the medians are the same. Remember: IQR is robust to outliers, so a larger IQR usually reflects genuine increased spread of typical values, not just a few extreme points. On the AP exam you’ll often compare distributions using IQR (and boxplots) to describe variability per UNC-1.N (see the Topic 1.9 study guide) (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz). For more practice comparing spread, check the Unit 1 review (https://library.fiveable.me/ap-statistics/unit-1) and practice problems (https://library.fiveable.me/practice/ap-statistics).
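
As a concrete illustration, here is a minimal Python sketch that computes the IQR for the Group M and Group N weights from the stem plot warm-up earlier in this guide. The helper name `iqr` is made up for illustration, and it assumes quartiles are the medians of the lower and upper halves of the ordered data; calculators and software can use slightly different quartile rules and give slightly different values.

```python
from statistics import median

def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return q3 - q1

# Weights in kg from the Group M and Group N stem plots earlier in this guide.
group_m = [14, 23, 24, 28, 32, 36, 38, 50]
group_n = [10, 23, 26, 35, 41, 54, 57, 62]

print("IQR, Group M:", iqr(group_m))
print("IQR, Group N:", iqr(group_n))
```

Under this quartile rule, Group N's IQR (31 kg) is more than twice Group M's (13.5 kg), so the middle half of Group N's weights is much more spread out.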

How do I describe clusters and gaps when I have multiple histograms to compare?

When you compare clusters and gaps across histograms, be specific and tie descriptions to context (CED: UNC-1.N). Say where clusters occur (values or bins) and how many, and point out any gaps or empty ranges. Short checklist and sentence templates:
- Identify clusters: “Histogram A has a strong cluster around 40–55 (most bars high there), while Histogram B’s main cluster is 70–85.”
- Note multiple clusters (bimodal): “Histogram C is bimodal with peaks near 20 and 60.”
- Describe gaps: “Histogram A shows a clear gap from 56–69 (no observations), whereas B has no such gap.”
- Compare frequency/strength: “A’s cluster around 40–55 is taller (higher relative frequency) than B’s cluster near 70.”
- Mention context and overall features: “In terms of lead concentration, Region A clusters low while Region B clusters higher, and A has a wide gap in the midrange.”
On the AP exam you’ll be expected to compare features (clusters, gaps, center, spread, outliers) concisely and in context; use exact ranges/bins and relative language (“larger”, “smaller”, “more pronounced”). For more examples and practice wording, check the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and try practice problems (https://library.fiveable.me/practice/ap-statistics).

I keep mixing up which measures go with which type of distribution - how do I remember when to use mean/std dev vs median/IQR?

Use mean and standard deviation when the distribution is roughly symmetric with no strong outliers; those stats summarize center and spread well because the mean uses every value and s measures typical distance from that mean. Use median and IQR when the distribution is skewed or has outliers; the median/IQR resist extreme values and better describe a “typical” center and the middle 50% spread. Quick checklist for AP comparisons (Topic 1.9 / UNC-1.O & UNC-1.N):
- Look at the graph first (histogram, boxplot, dotplot). If roughly symmetric → report mean ± SD.
- If skewed or outliers → report median and IQR (or use side-by-side boxplots to compare IQRs).
- Always describe shape, center, variability, and mention outliers or gaps (CED keywords).
Want practice? Review the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and try problems at (https://library.fiveable.me/practice/ap-statistics).

What's the formula for comparing relative frequencies between different sample sizes?

Turn raw counts into comparable proportions: relative frequency (or sample proportion) = x/n (or % = 100·x/n). If you want to compare two samples formally, use the sampling distribution for the difference of proportions. For two independent samples with sample proportions p̂1 = x1/n1 and p̂2 = x2/n2:
- Standard error (general) for p̂1 − p̂2: SE = sqrt[ p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ].
- Test statistic for H0: p1 = p2 (use pooled proportion p̂c = (x1 + x2)/(n1 + n2)): SE_pooled = sqrt[ p̂c(1 − p̂c)(1/n1 + 1/n2) ], z = (p̂1 − p̂2) / SE_pooled.
These formulas (and the one-sample percent/relative-frequency idea) are what AP expects when comparing relative frequencies; see the sampling distributions & standard error entries on the AP formula sheet. For a quick topic refresher check the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and try practice problems (https://library.fiveable.me/practice/ap-statistics).
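
The formulas above can be sketched in Python. The counts are HYPOTHETICAL and chosen only so that every success and failure count is at least 10; `compare_proportions` is an illustrative helper name, not an AP or library function.

```python
from math import sqrt

def compare_proportions(x1, n1, x2, n2):
    """Return (p1_hat, p2_hat, unpooled SE, pooled z) for two independent samples."""
    p1 = x1 / n1
    p2 = x2 / n2
    # General standard error for p1_hat - p2_hat.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Pooled proportion and standard error, used when testing H0: p1 = p2.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    return p1, p2, se_unpooled, z

# HYPOTHETICAL counts: 45 of 60 in one sample and 40 of 80 in the other.
p1, p2, se, z = compare_proportions(45, 60, 40, 80)
print(f"p1_hat = {p1:.3f}, p2_hat = {p2:.3f}, SE = {se:.4f}, z = {z:.2f}")
```

The raw counts (45 vs. 40) look close, but the relative frequencies (0.75 vs. 0.50) are not, which is why converting to proportions comes first when sample sizes differ.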

How do I write a good comparison paragraph for an FRQ about two data sets?

Start with one clear comparative sentence (overall which sample is larger or more spread out). Then hit the five CED pieces: shape, center, variability, outliers/gaps, and context. For example:
- Overall: “Sample B tends to have higher lead concentrations than Sample A.”
- Shape: “Both distributions are right-skewed (long right tail).”
- Center: “The median for B is higher than A (say median B > median A), so typical values in B are larger.”
- Variability/outliers: “A shows a larger range and more extreme high values, so A is more variable and has outliers.”
- Conclude in context: “So, region B has higher typical lead levels, but region A shows more extreme contamination.”
Always reference the graph or summary stats you used (median, mean, IQR, SD, range) and compare points numerically when given. AP graders look for shape, center, spread, at least one comparison, and context. Practice writing these with the Topic 1.9 study guide (https://library.fiveable.me/ap-statistics/unit-1/comparing-distributions-quantitative-variable/study-guide/2j5wKJg84ZKKN1T5CEmz) and try extra FRQs at (https://library.fiveable.me/practice/ap-statistics).