In AP Statistics, an outlier is a data point that is unusually small or large relative to the rest of the data. You identify one with the 1.5×IQR rule (a value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR) or by checking whether it lies 2 or more standard deviations from the mean. In regression, an outlier is a point with a large residual that does not follow the trend.
An outlier is a data point that sits unusually far from the rest of the dataset. The CED gives you two official ways to flag one (EK 1.7.C). First, the 1.5×IQR rule says a value is an outlier if it falls more than 1.5×IQR above Q3 or more than 1.5×IQR below Q1. Second, the standard deviation rule says a value 2 or more standard deviations above or below the mean counts as an outlier. On the exam, the 1.5×IQR rule is the one you'll compute by hand most often, especially with boxplots.
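As a rough sketch of both rules, the Python below builds the 1.5×IQR fences and applies the 2-standard-deviation check. The data set `scores` and the function names are made up for illustration. The quartiles are taken as the medians of the lower and upper halves (leaving out the overall median when n is odd), which is an assumption about the convention in use; calculators and software sometimes use other conventions and can report slightly different quartiles.

```python
from statistics import mean, median, stdev

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When n is odd, the overall median is left out of both halves.
    Assumes at least two values.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])

def iqr_outliers(data):
    """Values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

def sd_outliers(data):
    """Values 2 or more sample standard deviations from the mean."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) >= 2 * s]

# Hypothetical data set, made up for illustration.
scores = [12, 15, 17, 18, 20, 21, 23, 24, 26, 31, 89]

print(quartiles(scores))      # (17, 26), so IQR = 9 and the fences are 3.5 and 39.5
print(iqr_outliers(scores))   # [89]
print(sd_outliers(scores))    # [89]
```

Here both rules flag the same value, but they do not always agree, so state which rule you used when you write up a conclusion.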
Outliers matter because of what they do to your summary statistics. The mean, standard deviation, and range are nonresistant (non-robust), meaning one extreme value can drag them around. The median and IQR are resistant, meaning outliers barely touch them. That's why a right-skewed salary dataset gets reported with a median, not a mean. The term also shows up in two-variable data with a twist. An outlier in regression is a point with a large residual that doesn't follow the trend of the rest of the data, which is a different idea from a high-leverage point (extreme x-value) or an influential point (changes the line substantially if removed).
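To see resistance in numbers, here is a small Python sketch that summarizes a data set before and after one extreme value is added. The salary figures and the `summarize` helper are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean, median

def summarize(data):
    """Return the mean, median, and range of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
    }

# Hypothetical salaries in thousands of dollars, made up for illustration.
salaries = [42, 45, 47, 50, 52, 55, 58]
with_outlier = salaries + [400]

print(summarize(salaries))      # median 50, range 16, mean just under 50
print(summarize(with_outlier))  # median 51, range 358, mean 93.625
```

One added value nearly doubles the mean and multiplies the range by more than 20, while the median moves by 1.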
Outliers live primarily in Unit 1 (Exploring One-Variable Data), where LO 1.6.A requires you to mention unusual features like outliers, gaps, and clusters whenever you describe a distribution, and LO 1.7.C asks you to justify choosing median/IQR over mean/SD because of resistance. They reappear in Unit 2 through LO 2.9.A, where you distinguish outliers from high-leverage and influential points in regression, and where a single point can shift the correlation r (LO 2.5.A/B). Then they go quiet until inference. In Units 7 and 9, checking conditions for a two-sample t-test (LO 7.8.C) or a t-test for slope (LO 9.4.C) means examining your data for skew and outliers, because strong outliers in small samples wreck the approximate-normality assumption. So one Unit 1 concept ends up gatekeeping whether your Unit 7 and 9 procedures are even valid.
Influential Points (Unit 2)
An outlier in regression has a large residual; an influential point changes the LSRL substantially when you remove it. Outliers and high-leverage points are often influential, but the three labels are not interchangeable, and Topic 2.9 MCQs love testing whether you know the difference.
1.5×IQR Rule (Unit 1)
This is the fence-building formula behind boxplots. Anything beyond Q1 − 1.5×IQR or Q3 + 1.5×IQR gets plotted as its own dot, and the whiskers stop at the most extreme non-outlier value, not at the min and max.
Resistant Statistics: Median and IQR (Unit 1)
Outliers are the whole reason 'resistant' matters. One billionaire moving into a neighborhood yanks the mean income up but barely budges the median, which is why skewed-with-outliers data gets summarized with median and IQR.
Conditions for t-Tests (Units 7 & 9)
Before running a two-sample t-test or a t-test for slope with a small sample, you check graphs of the data for strong skew or outliers. An outlier in a sample of n = 12 can invalidate the normality condition, so the Unit 1 skill of spotting outliers becomes a required step in inference.
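To make the boxplot construction described under the 1.5×IQR Rule concrete, this Python sketch finds where the whiskers end and which values get plotted as separate dots. The data and the `boxplot_parts` name are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def boxplot_parts(data):
    """Return the whisker ends and the outliers for a modified boxplot."""
    xs = sorted(data)
    half = len(xs) // 2
    q1, q3 = median(xs[:half]), median(xs[-half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

# Hypothetical data set, made up for illustration.
values = [2, 20, 22, 23, 25, 26, 28, 30, 31, 60]

print(boxplot_parts(values))
# {'whiskers': (20, 31), 'outliers': [2, 60]}
```

With Q1 = 22 and Q3 = 30, the fences sit at 10 and 42, so the whiskers end at 20 and 31 rather than at the minimum 2 and maximum 60.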
Outliers show up in three reliable ways. First, they appear in describe-the-distribution questions, where you must address shape, center, variability, AND unusual features. The 2019 FRQ Q1 (room sizes histogram) and 2021 FRQ Q1 (hospital length of stay, which explicitly flags 'unusually short or long' values) both reward naming and quantifying potential outliers. Second, they appear in resistance questions, like the practice item asking which statistic a max of 89 affects most when the rest of the data sits between 12 and 31 (the answer is a nonresistant statistic such as the mean, range, or SD, never the median or IQR). Third, they appear in condition-checking for inference, as in 2023 FRQ Q4, where a small-sample comparison means you must verify the data show no strong skew or outliers before trusting a t-procedure. Expect to actually compute the 1.5×IQR fences, so know Q1 − 1.5×IQR and Q3 + 1.5×IQR cold, and always state your conclusion in context.
In regression, these are three distinct labels. An outlier has a large residual (it's far from the LSRL vertically). A high-leverage point has an extreme x-value (far horizontally from the other points). An influential point is any point whose removal substantially changes the slope, intercept, or r. Here's the trap: a high-leverage point that falls right on the trend line can be influential while having a tiny residual, so it's NOT an outlier. The CED keeps these definitions separate in EK 2.9.A, and so do MCQs.
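The difference between the labels is easier to see with numbers. The Python sketch below fits a least-squares line by hand and reports, for one added point, its residual and the slope with and without it. All points, and the helper names `lsrl` and `effect_of`, are hypothetical and chosen so the arithmetic is easy to follow.

```python
from statistics import mean

def lsrl(points):
    """Return (intercept, slope) of the least-squares regression line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def effect_of(point, others):
    """Residual of `point` from the line fit to all points, plus both slopes."""
    a, b = lsrl(others + [point])
    x, y = point
    return {
        "residual": y - (a + b * x),
        "slope_with": b,
        "slope_without": lsrl(others)[1],
    }

# Hypothetical points lying exactly on y = 2x.
base = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

# Outlier: ordinary x-value, far above the trend.
print(effect_of((3, 15), base))   # residual 7.5, slope stays 2.0

# High-leverage point: extreme x-value, off the trend.
print(effect_of((20, 10), base))  # residual about -1.0, slope drops from 2.0 to about 0.31
```

In this made-up data, the first point has a large residual but leaves the slope alone (only the intercept shifts), while the second pulls the line toward itself so strongly that its own residual ends up small. That is why a residual alone cannot tell you whether a point is influential.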
The 1.5×IQR rule flags any value above Q3 + 1.5×IQR or below Q1 − 1.5×IQR as an outlier, and a second accepted method flags values 2 or more standard deviations from the mean.
The mean, standard deviation, and range are nonresistant and get pulled toward outliers, while the median and IQR are resistant and barely change.
When a distribution is skewed or has outliers, report the median and IQR as your measures of center and spread, and be ready to justify that choice in writing.
In regression, an outlier is a point with a large residual, which is different from a high-leverage point (extreme x-value) and an influential point (removing it substantially changes the line).
Outliers in a boxplot are plotted as individual symbols beyond the whiskers, and the whiskers extend only to the most extreme values that are not outliers.
For t-tests in Units 7 and 9 with small samples, you must check the data for outliers and strong skew, because they can violate the approximate-normality condition.
An outlier is a data point that is unusually small or large relative to the rest of the data. AP Stats accepts two identification methods: the 1.5×IQR rule (beyond Q1 − 1.5×IQR or Q3 + 1.5×IQR) and the rule that a value 2 or more standard deviations from the mean is an outlier.
Do not remove an outlier just because it is an outlier. The AP exam never asks you to silently delete outliers, and doing so without justification loses credit. Your job is to identify them, comment on their effect (like pulling the mean toward the tail), and sometimes compare analyses with and without the point.
An outlier in regression has a large residual, meaning it sits far from the LSRL. An influential point is one whose removal substantially changes the slope, intercept, or correlation. They often overlap, but a high-leverage point sitting right on the trend line can be influential without being an outlier at all.
Outliers affect the mean far more than the median. The mean, standard deviation, and range are nonresistant, so a single extreme value drags them toward it. The median and IQR are resistant, which is why skewed data like salaries (a classic exam example) gets summarized with the median.
A single outlier can substantially increase or decrease r, since correlation is built from nonresistant means and standard deviations. That's part of why a value of r close to 1 or −1 doesn't by itself prove a linear model is appropriate; you still need to look at the scatterplot and residuals.
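A quick way to check this claim is to compute r with and without one extra point. The Python sketch below does that by hand for two hypothetical data sets, one where the added point weakens r and one where it inflates r. The data and the `correlation` helper are made up for illustration.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Case 1: a strong linear pattern, then one point far below the trend.
xs1 = [1, 2, 3, 4, 5, 6]
ys1 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_strong = correlation(xs1, ys1)
r_weakened = correlation(xs1 + [7], ys1 + [1])

# Case 2: almost no pattern, then one point far from the cloud.
xs2 = [1, 2, 3, 4, 5]
ys2 = [3, 1, 4, 2, 3]
r_weak = correlation(xs2, ys2)
r_inflated = correlation(xs2 + [20], ys2 + [20])

print(round(r_strong, 3), round(r_weakened, 3))
print(round(r_weak, 3), round(r_inflated, 3))
```

In the first case a single point cuts a nearly perfect correlation to a weak one. In the second, a single distant point turns a patternless cloud into an r close to 1, which is exactly the situation where the scatterplot tells you more than r does.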