A scatterplot is a graph that displays bivariate quantitative data, plotting one point per individual with the explanatory variable on the x-axis and the response variable on the y-axis. On the AP exam, you describe it using form, direction, strength, and unusual features (DUFS).
A scatterplot is the standard graph for bivariate quantitative data, meaning two numeric measurements taken on the same individuals. Each dot represents one individual. Its x-coordinate is the value of the explanatory variable (the one used to predict) and its y-coordinate is the value of the response variable (the one being predicted). That setup comes straight from Topic 2.4 of the CED.
When the AP exam asks you to "describe the relationship" in a scatterplot, it wants four specific things: direction (positive or negative association), form (linear or nonlinear), strength (how tightly the points cluster around a pattern), and unusual features (outliers, clusters, gaps). A positive association means y tends to increase as x increases; negative means y tends to decrease. Everything else in Unit 2, from the correlation coefficient to the least-squares regression line to residual plots, is built on top of what you see in this graph. Read the scatterplot first, model second.
Scatterplots live in Unit 2: Exploring Two-Variable Data, anchoring learning objectives 2.4.A (represent bivariate quantitative data using scatterplots) and 2.4.B (describe form, direction, strength, and unusual features). But they don't stay there. Topic 2.6 uses the scatterplot's pattern to justify a linear regression model, Topic 2.7 turns the scatterplot's leftover vertical distances into residual plots, and Topic 2.9 uses scatterplots to spot outliers, high-leverage points, and influential points. Then Unit 9 brings the scatterplot back for inference. Learning objective 9.1.A asks you to identify questions suggested by variation in scatterplots, specifically whether the scatter around a line looks random or non-random. That question is the entire motivation for testing and building confidence intervals for the slope (9.3.A and 9.3.B). If you can read a scatterplot well, you have a head start on roughly a quarter of the course.
Keep studying AP Statistics Unit 2
Residuals and Residual Plots (Unit 2)
A residual plot is what's left of a scatterplot after you subtract out the regression line. Each residual is y minus ŷ, and plotting those leftovers against x tells you whether a line was the right model. Random scatter in the residual plot means the linear form you saw in the scatterplot is legit (LO 2.7.B).
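As a rough illustration of "y minus ŷ," here is a minimal Python sketch that fits a least-squares line and lists the signs of the residuals. The data are made up (y = x² for x from 1 to 10) and the helper names `fit_lsrl` and `residuals` are hypothetical, chosen only for this example.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


def residuals(xs, ys):
    """Return actual minus predicted (y - y-hat) for each point."""
    a, b = fit_lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up curved data (an assumption for illustration): y = x^2.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

res = residuals(xs, ys)
signs = "".join("+" if r > 0 else "-" for r in res)
print(signs)  # ++------++
```

Positive residuals at both ends and negative residuals in the middle form a U shape rather than random scatter, which is the signal that a line misses the form of this data.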
Linear Regression and the LSRL (Unit 2)
The least-squares regression line ŷ = a + bx is fit directly to the scatterplot's points. The scatterplot also defines the safe zone for prediction. Plugging in an x-value beyond the range of the plotted data is extrapolation, and the CED warns those predictions get less reliable the further out you go (2.6.A).
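To make the "safe zone" idea concrete, the sketch below fits ŷ = a + bx with the usual least-squares formulas and flags any prediction whose x-value falls outside the observed range. The study-hours data and the names `fit_lsrl` and `predict` are hypothetical, invented for this example.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(x_new, xs, ys):
    """Return (y-hat, is_extrapolation) for a new x-value."""
    a, b = fit_lsrl(xs, ys)
    # Extrapolation: x_new lies outside the range of the plotted x-values.
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return a + b * x_new, is_extrapolation


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

for x_new in (3.5, 10):
    y_hat, risky = predict(x_new, hours, scores)
    label = "extrapolation" if risky else "within data range"
    print(f"x = {x_new}: predicted y = {y_hat:.1f} ({label})")
```

The line itself happily produces a number for x = 10, so the flag is the only reminder that no plotted data supports a prediction that far out.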
Outliers and Influential Points (Unit 2)
You spot regression outliers (big residuals) and high-leverage points (extreme x-values) by looking at the scatterplot. An influential point is one whose removal visibly changes the slope, y-intercept, or correlation, which is exactly the kind of before-and-after comparison exam questions love (LO 2.9.A).
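The before-and-after comparison can be sketched in a few lines of Python. The data below are made up, with the last point given an extreme x-value on purpose, and `slope_and_r` is a hypothetical helper written for this example.

```python
import math


def slope_and_r(xs, ys):
    """Return (slope, correlation) for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Made-up data: the last point (15, 14) has an extreme x-value (high leverage).
xs = [1, 2, 3, 4, 5, 15]
ys = [3, 2, 4, 3, 5, 14]

b_all, r_all = slope_and_r(xs, ys)             # all six points
b_trim, r_trim = slope_and_r(xs[:-1], ys[:-1])  # last point removed

print(f"with point:    slope = {b_all:.2f}, r = {r_all:.2f}")
print(f"without point: slope = {b_trim:.2f}, r = {r_trim:.2f}")
```

In this made-up data, removing the one point drops the slope from about 0.85 to 0.50 and r from about 0.98 to about 0.69, so the point counts as influential.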
Inference for Slopes (Unit 9)
Unit 9 asks the question a single scatterplot can't answer alone. Is the pattern you see real, or just random sampling noise? Variation in points around a line might be random or non-random (9.1.A), and a confidence interval for the slope lets you justify a claim about the true population relationship (9.3.B).
Correlation Coefficient (Unit 2)
The correlation r puts a number on the direction and strength you see in a linear scatterplot. But r only makes sense if the form is actually linear, so always look at the plot before trusting the number. A curved pattern can hide behind a high r.
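A quick way to see a curve hiding behind a high r is to compute r for data that is curved by construction. The sketch below uses made-up cubic data (y = x³ for x from 1 to 6) alongside perfectly linear data; the function name `correlation` is just a label for this example.

```python
import math


def correlation(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
linear_ys = [2 * x + 1 for x in xs]   # perfectly linear (made-up)
curved_ys = [x ** 3 for x in xs]      # clearly curved (made-up)

r_linear = correlation(xs, linear_ys)
r_curved = correlation(xs, curved_ys)

print(f"linear data: r = {r_linear:.2f}")
print(f"curved data: r = {r_curved:.2f}")
```

The cubic data still gives r of about 0.94, which would read as "strong" even though the form is not linear, so the number alone cannot confirm a linear pattern.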
Scatterplots show up almost every year on the FRQ section, usually as Question 1. The 2017 exam gave a scatterplot of wolf length versus weight, 2018 used checkout times versus number of customers in line, and 2022 (bullfrog length versus mass) made you compare a scatterplot of original data to one of transformed data. Typical tasks include describing the association (use direction, form, strength, and unusual features, in context, every time), interpreting slope or predicted values from the plotted line, and judging whether a linear model is appropriate. Multiple-choice questions push the same skills with a twist. You might see a plot where one point's removal drops r from 0.85 to 0.45 (that's an influential point), or a fitted line where points fall below it at low and high x-values but above it in the middle (that's a curved pattern, so the linear model misses the form). The single biggest scoring habit is writing your description in context with variable names, not just "strong positive linear."
A scatterplot shows the raw data, y against x, and is where you first describe the association. A residual plot shows the leftovers after fitting a model, residual (y − ŷ) against x or against ŷ. They answer different questions. The scatterplot asks "what does the relationship look like?" while the residual plot asks "was my model the right choice?" A curved pattern can be subtle in a scatterplot but scream at you in the residual plot, which is exactly why the CED makes residual plots the official tool for checking linearity (2.7.B).
A scatterplot displays bivariate quantitative data with one point per individual, the explanatory variable on the x-axis, and the response variable on the y-axis.
Describe every scatterplot using four things: direction (positive or negative), form (linear or nonlinear), strength, and unusual features, always in context.
The scatterplot comes before the math. Correlation and the LSRL only make sense after you've confirmed the form looks roughly linear.
Predicting with an x-value outside the range of the scatterplot's data is extrapolation, and those predictions get less reliable the further you go.
Points that don't follow the trend (outliers), points with extreme x-values (high leverage), and points that change the line when removed (influential) are all spotted on the scatterplot.
In Unit 9, scatter around a line raises the inference question of whether the pattern is random or real, which confidence intervals for the slope help answer.
A scatterplot is a graph of bivariate quantitative data where each point represents one individual, with the explanatory variable on the x-axis and the response variable on the y-axis. It's the foundation of Unit 2 and reappears in Unit 9 for slope inference.
No, a pattern in a sample scatterplot does not by itself prove a relationship in the population. A clear pattern could still be random variation, which is the whole point of Topic 9.1. You need inference, like a confidence interval for the slope, to justify a claim about the population relationship.
A scatterplot graphs the raw data (y versus x), while a residual plot graphs the differences between actual and predicted values (y − ŷ) against x or ŷ. You use the scatterplot to describe the association and the residual plot to check whether a linear model fits.
Hit four components: direction, form, strength, and unusual features, and name the actual variables in context. For example, "there is a strong, positive, linear association between wolf length and wolf weight, with no obvious outliers."
No, a scatterplot cannot display categorical data. Scatterplots require two quantitative variables. Categorical data gets displayed with bar graphs, mosaic plots, or two-way tables instead, which is a distinction multiple-choice questions like to test.