This topic is about reading scatterplots and asking the right question: when points spread out around a line, is that scatter just random variation, or is something non-random going on? Recognizing this difference is the starting point for the slope inference you do in Unit 9.
Why This Matters for the AP Statistics Exam
This topic sets up the whole idea behind inference for slopes. Before you build a confidence interval or run a test for a slope, you need to understand why the slope of a sample regression line can vary at all. Different random samples from the same population produce slightly different lines, and that variability is what makes inference necessary.
On the exam, you will see scatterplots and computer output and need to judge whether a pattern in the points is meaningful or just noise. Getting comfortable with the random vs. non-random question now makes the calculation and interpretation topics later in the unit much easier. This skill shows up in both multiple-choice questions and free-response work where you interpret relationships between two quantitative variables.

Key Takeaways
- Variation in where points fall relative to a line can be random or non-random, and telling them apart is the core skill of this topic.
- Random scatter around a line is expected; a clear pattern in how points miss the line suggests something systematic.
- Different random samples give slightly different slopes, so the sample slope is an estimate of the true population slope, not a fixed fact.
- Correlation does not prove causation, and an apparent relationship can come from random chance or a confounding variable.
- A linear model is only appropriate when the points actually follow a roughly straight-line pattern.
Reading Scatterplots: Random vs. Non-Random Variation
Unit 9 focuses on linear regression models, where you fit a line to paired quantitative data and study the relationship between two variables. Real data almost never falls exactly on a line, so points scatter around it. The key question in this topic is what that scatter is telling you.
Variation in points' positions relative to a theoretical line may be random or non-random.
- Random variation is the ordinary, unpredictable spread you expect even when a linear model fits well. The points bounce above and below the line with no clear pattern.
- Non-random variation is a systematic pattern in how the points miss the line. For example, the points might curve, fan out as x increases, or form separate clusters. That kind of pattern signals that a simple line may not capture the relationship well.
Spotting non-random patterns matters because the slope inference tools later in the unit assume a roughly linear relationship with consistent scatter. If you can see curvature or uneven spread, those assumptions may not hold.
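To make the random vs. non-random distinction concrete, here is a minimal Python sketch. It is an illustration under assumed numbers, not an AP procedure: the data are simulated, and the helper names (`fit_line`, `residuals`, `sign_changes`) are hypothetical. It fits a least-squares line to one truly linear dataset and one curved dataset, then counts how often the residuals switch sign as x increases. Residuals that bounce randomly switch sign often; residuals that follow a curve stay on one side of the line for long stretches.

```python
import random

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(xs, ys):
    """Observed y minus predicted y for each point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sign_changes(values):
    """Count how many times consecutive values switch sign."""
    signs = [v > 0 for v in values]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

rng = random.Random(1)  # fixed seed so the illustration is repeatable
xs = list(range(30))

# Assumed models: same noise level, different underlying shape.
linear_ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]
curved_ys = [2 + 0.5 * x ** 2 + rng.gauss(0, 2) for x in xs]

linear_changes = sign_changes(residuals(xs, linear_ys))
curved_changes = sign_changes(residuals(xs, curved_ys))

print("Sign changes, linear data:", linear_changes)
print("Sign changes, curved data:", curved_changes)
```

Counting sign changes is only an informal stand-in for what you do by eye on a residual plot: looking for long runs of points above or below the line.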
Why Slopes Vary at All
It can feel strange that a slope "varies" since any single dataset gives exactly one line of best fit. The reason is sampling. Imagine every student in a physics class measures spring length for 10 different hanging masses and fits a least-squares regression line. Each student's slope would come out a little different. Taken together, those slopes form a sampling distribution that is centered at the true population slope and, when the conditions for regression inference are met, is approximately normal.
That spread in possible slopes is what the rest of Unit 9 measures and accounts for. The sample slope you calculate is a point estimate of the true slope, so it carries uncertainty just like a sample mean or sample proportion does.
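The spring scenario can be sketched as a small simulation. Everything numeric here is an assumption chosen for illustration (a true slope of 0.04 cm per gram, a true intercept of 10 cm, and measurement noise with standard deviation 0.5 cm), and `fit_slope` and `one_student_slope` are hypothetical helper names. Each simulated student measures the same 10 masses, gets slightly different lengths, and fits a separate line.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Return the slope of the least-squares regression line."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

# Assumed "true" population model (hypothetical values).
TRUE_INTERCEPT = 10.0   # cm
TRUE_SLOPE = 0.04       # cm per gram
NOISE_SD = 0.5          # cm of random measurement scatter

masses = [50 * k for k in range(1, 11)]  # 10 hanging masses, 50 g to 500 g
rng = random.Random(42)                  # fixed seed for repeatability

def one_student_slope():
    """Simulate one student's 10 measurements and return the fitted slope."""
    lengths = [TRUE_INTERCEPT + TRUE_SLOPE * m + rng.gauss(0, NOISE_SD)
               for m in masses]
    return fit_slope(masses, lengths)

slopes = [one_student_slope() for _ in range(1000)]
mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)

print("Mean of sample slopes:", round(mean_slope, 4))
print("Standard deviation of sample slopes:", round(sd_slope, 4))
```

No two simulated students get exactly the same slope, but the slopes cluster around the assumed true value. The standard deviation of those slopes is the sample-to-sample variability that slope inference has to account for.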
Correlation and Causation
When you fit a line, you are describing the correlation between two quantitative variables. Two cautions matter here.
First, an apparent correlation can come from random chance. If you plotted the day of the month against daily rainfall, you might see a pattern that looks like a relationship, even though there is no real connection. That is the same idea as random variation around a line: a pattern you see in one sample may not reflect anything real.
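A quick simulation shows how often chance alone can produce a pattern. This is a sketch under assumed conditions: 10 days per sample, rainfall drawn independently of the day number from a made-up uniform range, and a cutoff of 0.5 picked only for illustration. Because rainfall is generated with no connection to the day, any correlation that appears is pure chance.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)        # fixed seed for repeatability
days = list(range(1, 11))     # day of the month, 1 through 10
N_SAMPLES = 2000
CUTOFF = 0.5                  # assumed threshold for "looks like a relationship"

strong_looking = 0
for _ in range(N_SAMPLES):
    # Rainfall (mm) is generated independently of the day number.
    rainfall = [rng.uniform(0, 20) for _ in days]
    if abs(correlation(days, rainfall)) > CUTOFF:
        strong_looking += 1

fraction = strong_looking / N_SAMPLES
print("Fraction of samples with |r| above the cutoff:", fraction)
```

With samples this small, a noticeable share of them show a moderate-looking correlation even though no real relationship exists.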
Second, correlation does not mean causation. Two variables can move together without one causing the other. A hot, sunny day raises ice cream sales and also raises the number of sunburns, but ice cream does not cause sunburn. The hot, sunny weather is a confounding variable influencing both. Whenever you describe a relationship, think about other variables that could be driving the pattern.
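The ice cream and sunburn example can be sketched in Python. All of the numbers are invented for illustration: temperature drives both variables, and neither variable affects the other. The code compares the correlation across all days with the correlation among days that have nearly the same temperature.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(11)  # fixed seed for repeatability

# Hypothetical model: temperature (the confounder) influences both outcomes.
temps = [rng.uniform(15, 35) for _ in range(2000)]          # degrees C
ice_cream = [20 + 5 * t + rng.gauss(0, 10) for t in temps]  # daily sales
sunburns = [1 + 0.4 * t + rng.gauss(0, 2) for t in temps]   # daily cases

overall_r = correlation(ice_cream, sunburns)

# Keep only days with nearly the same temperature (24 to 26 degrees).
band = [(i, s) for t, i, s in zip(temps, ice_cream, sunburns) if 24 <= t <= 26]
band_r = correlation([i for i, _ in band], [s for _, s in band])

print("r across all days:", round(overall_r, 2))
print("r among days near 25 degrees:", round(band_r, 2))
```

Across all days the two variables are clearly correlated, but once temperature is held roughly constant the association mostly disappears. That is the signature of a confounding variable in this simulated setup.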
Random Error vs. Systematic Error
The scatter around a line can also reflect different kinds of measurement error.
- Random error is unpredictable spread from factors outside your control. It pushes points above and below the line without a consistent direction.
- Systematic error is a predictable bias from a fixable cause, like a miscalibrated instrument. It tends to shift points in a consistent way and shows up as non-random variation.
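The difference between the two kinds of error shows up clearly when you average many readings. The sketch below uses assumed values: a true length of 100.0 units, random scatter with standard deviation 0.5, and a miscalibrated instrument that reads 2.0 units too high.

```python
import random
import statistics

TRUE_LENGTH = 100.0  # assumed true value (hypothetical)
RANDOM_SD = 0.5      # spread caused by random error
BIAS = 2.0           # constant offset from a miscalibrated instrument

rng = random.Random(7)  # fixed seed for repeatability

# Random error only: readings scatter above and below the true value.
random_only = [TRUE_LENGTH + rng.gauss(0, RANDOM_SD) for _ in range(500)]

# Systematic plus random error: every reading is also shifted by the bias.
with_bias = [TRUE_LENGTH + BIAS + rng.gauss(0, RANDOM_SD) for _ in range(500)]

mean_random = statistics.mean(random_only)
mean_biased = statistics.mean(with_bias)

print("Mean with random error only:", round(mean_random, 2))
print("Mean with systematic error:", round(mean_biased, 2))
```

Averaging many readings cancels most of the random error, so the first mean lands close to the true value. The second mean stays off by roughly the bias no matter how many readings you take, which is why systematic error has to be fixed at its source.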
Examples
Random error:
- Fluctuations in the power supply while using an electronic balance to weigh an object
- Temperature changes in the environment while conducting a chemical reaction
- Wind gusts affecting the flight of a thrown object
Systematic error:
- Using a ruler with inaccurately printed markings to measure the length of an object
- Using a thermometer that has not been calibrated to measure the temperature of a solution
- Using a pipette that is not properly calibrated to dispense a precise volume of a liquid
How to Use This on the AP Statistics Exam
MCQ
Expect to look at a scatterplot or a residual plot and decide whether the scatter is random or shows a pattern. If the points curve or fan out, a straight-line model is questionable. If they bounce around the line with no clear shape, random variation is a reasonable description.
Free Response
When you interpret relationships between two quantitative variables, describe direction, strength, and form, and stay away from causal language unless the data came from a randomized experiment. Frame associations as predictions, not guarantees. For example, say a "predicted increase" rather than claiming one variable causes a change in the other.
Common Trap
Do not jump straight to a test or interval. This topic is about asking the right question first. Read carefully to see whether the prompt wants you to describe a pattern, judge whether a line fits, or actually run a procedure.
Common Misconceptions
- "The slope of a sample can't vary because there's only one line of best fit." For a single dataset there is one slope, but different random samples give different slopes. That sample-to-sample variability is the whole reason inference for slopes exists.
- "All scatter around a line means the model is bad." Some scatter is normal and expected. What signals a problem is a clear pattern in how points miss the line, like curvature or uneven spread, not random bouncing.
- "A correlation proves one variable causes the other." Correlation can come from a confounding variable or even random chance. Causation generally requires a well-designed randomized experiment.
- "Random error and systematic error are the same thing." Random error is unpredictable and has no consistent direction. Systematic error is a predictable bias from a fixable source and shows up as a non-random pattern.
Related AP Statistics Guides
- Unit 9 Overview: Slopes
- 9.2 Confidence Intervals for the Slope of a Regression Model
- 9.3 Justifying a Claim About the Slope of a Regression Model Based on a Confidence Interval
- 9.4 Setting Up a Test for the Slope of a Regression Model
- 9.5 Carrying Out a Test for the Slope of a Regression Model
- 9.6 Skills Focus: Selecting an Appropriate Inference Procedure
Vocabulary
The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

| Term | Definition |
|---|---|
| non-random variation | Variation in data points that follows a systematic or predictable pattern rather than occurring by chance. |
| scatter plot | A graph that displays the relationship between two quantitative variables, with each point representing an observation. |
| random variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
Frequently Asked Questions
What is a scatter plot?
A scatter plot displays paired quantitative data with one variable on the x-axis and the other on the y-axis. It helps you see direction, form, strength, and unusual points.
What does random scatter around a line mean?
Random scatter means the points vary around a line without a clear pattern. Some scatter is expected even when a linear model is appropriate.
What does non-random variation in a scatterplot look like?
Non-random variation can look like curvature, fanning, clusters, or a consistent pattern in how points miss the line. Those patterns suggest a simple linear model may not fit well.
Why do sample slopes vary?
Different random samples from the same population can produce different regression lines, so the sample slope varies from sample to sample.
Does correlation prove causation?
No. Correlation describes an association between two quantitative variables, but establishing causation generally requires a well-designed randomized experiment.
How does Topic 9.1 show up on AP Statistics?
You may interpret scatterplots, decide whether variation around a line appears random, or explain why slope inference is needed before doing formal calculations.