Two-variable data questions on the Digital SAT ask you to work with scatterplots: reading them, interpreting the relationships they show, fitting models to the data, and using those models to make predictions. This topic spans a wide range of skills, from simply describing what a scatterplot shows to choosing between linear, quadratic, and exponential models and comparing how different types of growth behave. You can expect roughly 2–4 questions on these concepts across the math section, and they appear at every difficulty level.

Reading and Interpreting Scatterplots

Scatterplots display two-variable data by plotting points on an x-y coordinate plane. Each point represents one observation, with one variable on each axis. Your first job on any scatterplot question is to understand what the axes represent and what each point means in context.

Direction of the relationship:

A positive correlation means that as x increases, y tends to increase.
A negative correlation means that as x increases, y tends to decrease.
No correlation means the points show no clear trend.

Strength of the relationship:

Strong correlation: points cluster tightly around a line or curve.
Weak correlation: points show a general trend but with lots of scatter.

Form of the relationship:

Linear: points follow a straight-line pattern.
Curved: points bend, suggesting a quadratic or exponential model might fit better.

Example: A scatterplot shows the number of employees (x-axis) and annual revenue in millions of dollars (y-axis) for 30 companies. The points rise from left to right and cluster closely around a straight line. You'd describe this as a strong, positive, linear association between number of employees and revenue.

Some questions ask you to analyze the data without making predictions. These might ask what the scatterplot shows about the relationship, whether a particular point is consistent with the trend, or what a specific data point represents. Always read the axis labels and think about context before answering.

Clusters and Outliers

Look for clusters (groups of points that bunch together, possibly indicating subgroups in the data) and outliers (points that fall far from the overall pattern). The SAT might ask how removing an outlier would affect the line of best fit. An outlier that's far from the trend can pull the line toward it, so removing it would shift the slope or intercept.
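To see how much a single outlier can move the line of best fit, here is a minimal Python sketch. The data set is hypothetical (five points lying exactly on $y = 2x + 1$ plus one outlier), and `fit_line` is an illustrative helper that applies the standard least-squares formulas; the SAT will not ask you to compute this by hand.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: five points exactly on y = 2x + 1, plus one outlier.
on_trend = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
outlier = (6, 2)

slope_with, intercept_with = fit_line(on_trend + [outlier])
slope_without, intercept_without = fit_line(on_trend)

print("with outlier:   ", slope_with, intercept_with)
print("without outlier:", slope_without, intercept_without)
```

In this made-up data set the outlier sits well below the trend at the right edge of the plot, so it drags the slope down; removing it restores the slope of 2 and the intercept of 1.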

Fitting a Linear Model

When the data in a scatterplot follows a roughly straight-line pattern, you fit a linear regression model. The line of best fit has the equation:

$y = mx + b$

where $m$ is the slope and $b$ is the y-intercept.

Interpreting slope in context: The slope $m$ tells you the predicted change in $y$ for each one-unit increase in $x$. This is one of the most common question types.

Interpreting the y-intercept: The value $b$ is the predicted value of $y$ when $x = 0$. Sometimes this makes sense in context; sometimes it doesn't (e.g., predicting revenue when a company has zero employees).

Worked Example 1

A researcher collects data on the age (in years) and value (in thousands of dollars) of 20 used cars. The line of best fit is:

$y = -1.8x + 28$

Question: What does the slope of $-1.8$ mean in this context?

Solution: For each additional year of age, the predicted value of a used car decreases by $1.8$ thousand dollars (or $\$1{,}800$). The negative slope reflects the negative correlation between age and value.

Question: According to the model, what is the predicted value of a car that is 5 years old?

$y = -1.8(5) + 28 = -9 + 28 = 19$

The predicted value is $19$ thousand dollars, or $\$19{,}000$.

Worked Example 2

A scatterplot shows the relationship between hours of sunlight per day ($x$) and the number of visitors to a park ($y$). The line of best fit is $y = 45x - 120$. A data point at $x = 10$ shows an actual value of $y = 360$.

Question: What is the residual for this data point?

First, find the predicted value:

$y = 45(10) - 120 = 450 - 120 = 330$

Then calculate the residual:

$\text{residual} = \text{actual} - \text{predicted} = 360 - 330 = 30$

The residual is $30$, meaning the actual number of visitors was 30 more than the model predicted. A positive residual means the point sits above the line of best fit.
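The same two steps (predict, then subtract) can be sketched in Python. The function names `predict` and `residual` are illustrative; the slope and intercept come from the line of best fit in this example.

```python
def predict(x, slope=45, intercept=-120):
    """Predicted visitors from the line of best fit y = 45x - 120."""
    return slope * x + intercept


def residual(x, actual):
    """Residual = actual value minus predicted value."""
    return actual - predict(x)


print(predict(10))        # 330
print(residual(10, 360))  # 30
```

A negative result from `residual` would mean the point sits below the line.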

Fitting Quadratic and Exponential Models

Not all scatterplots follow a straight line. When the data curves, you need a nonlinear model.

Quadratic models have the form $y = ax^2 + bx + c$. The data will show a U-shape (opening up if $a > 0$) or an inverted U-shape (opening down if $a < 0$). A classic example: a ball's height over time follows a downward-opening parabola.

Exponential models have the form $y = a \cdot b^x$. The data will show rapid increase (if $b > 1$) or rapid decrease (if $0 < b < 1$). A classic example: bacterial population doubling every hour.

How to Tell Which Model Fits

Look at the shape of the point cloud:

Straight line → linear
Curve that bends once (like a hill or valley) → quadratic
Curve that increases faster and faster (or decreases and levels off) → exponential

The SAT may show you a scatterplot and ask which equation best models the data, or it may give you the equation and ask you to interpret it.

Worked Example 3

A scatterplot shows the number of bacteria in a sample ($y$, in thousands) measured every hour ($x$). The data points are approximately: $(0, 2)$, $(1, 6)$, $(2, 18)$, $(3, 54)$.

Question: Which type of model best fits this data?

Check the pattern. From $x = 0$ to $x = 1$, $y$ triples ($2 \to 6$). From $x = 1$ to $x = 2$, $y$ triples again ($6 \to 18$). From $x = 2$ to $x = 3$, $y$ triples again ($18 \to 54$). A constant multiplicative factor means this is exponential growth.

The model would be $y = 2 \cdot 3^x$, where $2$ is the initial population (in thousands) and $3$ is the growth factor per hour.
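The check used here (constant differences suggest linear, constant ratios suggest exponential) can be sketched in Python. The helper `classify` is hypothetical and assumes the $x$-values are equally spaced and the $y$-values are nonzero; real scatterplot data is noisy, so on the test you would look for approximately constant differences or ratios.

```python
def classify(ys, tol=1e-9):
    """Classify equally spaced y-values as linear, exponential, or neither."""
    pairs = list(zip(ys, ys[1:]))
    diffs = [b - a for a, b in pairs]    # successive differences
    ratios = [b / a for a, b in pairs]   # successive ratios (assumes a != 0)
    if max(diffs) - min(diffs) <= tol:
        return "linear"
    if max(ratios) - min(ratios) <= tol:
        return "exponential"
    return "neither"


bacteria = [2, 6, 18, 54]          # y-values from this example
print(classify(bacteria))          # exponential

# The fitted model y = 2 * 3^x reproduces every data point.
print([2 * 3 ** x for x in range(4)] == bacteria)  # True
```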

Worked Example 4

A scatterplot shows the height of a projectile ($y$, in meters) at various times ($x$, in seconds). The points rise, reach a peak around $x = 3$, then fall. Which model fits?

This is a parabolic shape, so a quadratic model fits best. If the equation is $y = -5x^2 + 30x + 2$, the negative coefficient on $x^2$ confirms the downward-opening shape, and the vertex at $x = -\frac{30}{2(-5)} = 3$ matches the peak seen in the data.
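As a quick check on where this parabola peaks, the vertex formula $x = -\frac{b}{2a}$ can be evaluated in a short Python sketch. The function name `vertex` is illustrative; the coefficients are the ones given in the example.

```python
def vertex(a, b, c):
    """Return the (x, y) vertex of y = a*x^2 + b*x + c."""
    x = -b / (2 * a)
    y = a * x ** 2 + b * x + c
    return x, y


peak_time, peak_height = vertex(-5, 30, 2)
print(peak_time, peak_height)  # 3.0 47.0
```

Under this model the projectile reaches its maximum height of 47 meters at 3 seconds.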

Making Predictions from Models

Once you have a model (linear, quadratic, or exponential), you can use it to make predictions by plugging in values.

Interpolation means predicting within the range of the observed data. This is generally reliable.

Extrapolation means predicting outside the observed range. This is less reliable because you're assuming the trend continues unchanged.

Worked Example 5

Using the linear model $y = -1.8x + 28$ for used car values, predict the value of a 20-year-old car:

$y = -1.8(20) + 28 = -36 + 28 = -8$

The model predicts $-8$ thousand dollars, which is negative and makes no sense. This is the danger of extrapolation: the original data probably only included cars up to about 12–15 years old, and the linear trend doesn't hold beyond that range.
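One way to see where this linear model stops making sense is to find the age at which the predicted value reaches zero. The Python sketch below does that; the function names are illustrative, and the break-even age is a property of the equation, not a claim about the researcher's actual data range.

```python
def predicted_value(age, slope=-1.8, intercept=28):
    """Predicted car value in thousands of dollars, from y = -1.8x + 28."""
    return slope * age + intercept


def zero_crossing(slope=-1.8, intercept=28):
    """Age at which the model's predicted value hits zero (solve 0 = mx + b)."""
    return -intercept / slope


print(predicted_value(5))    # close to 19 (thousand dollars)
print(predicted_value(20))   # close to -8, an impossible price
print(zero_crossing())       # roughly 15.6 years
```

Any prediction for a car older than about 15.6 years comes out negative, so the model cannot be trusted that far out.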

The SAT tests this concept by asking you to evaluate whether a prediction is reasonable or by having you recognize when extrapolation produces unreliable results.

Interpreting Graphs of Relationships

Some questions give you a graph showing the relationship between two quantities and ask you to read and interpret it. This goes beyond scatterplots to include smooth curves representing known relationships.

For these questions, focus on:

What the axes represent (units matter)
The behavior of the graph (increasing, decreasing, constant, changing rate)
Specific values (reading coordinates from the graph)

If a graph shows temperature over time and the curve is steep between hours 2 and 4, that means temperature changed rapidly during that period. If the curve flattens out after hour 6, the temperature stabilized.

Comparing Linear and Exponential Growth

The SAT sometimes asks you to compare how linear and exponential models behave, especially over time.

Linear growth adds a constant amount each period: $y = mx + b$. The graph is a straight line.
Exponential growth multiplies by a constant factor each period: $y = a \cdot b^x$. The graph curves upward (when $b > 1$).

The critical difference: exponential growth with $b > 1$ always eventually overtakes linear growth, no matter how large the linear rate is. Early on, a linear model might produce larger values, but given enough time the exponential model will surpass it.

Worked Example 6

Account A starts with $\$500$ and gains $\$50$ per year: $y = 50x + 500$.

Account B starts with $\$500$ and grows by $8\%$ per year: $y = 500(1.08)^x$.

After 5 years:

Account A: $y = 50(5) + 500 = 750$
Account B: $y = 500(1.08)^5 \approx 500(1.469) \approx 735$

Account A is ahead. After 20 years:

Account A: $y = 50(20) + 500 = 1{,}500$
Account B: $y = 500(1.08)^{20} \approx 500(4.661) \approx 2{,}330$

Account B has pulled far ahead. This pattern is what the SAT is testing: exponential growth starts slow but accelerates, while linear growth adds the same amount every period.
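To find when Account B first passes Account A, you can compare the two models year by year. This is a minimal Python sketch using the two equations from this example; the function names and the 100-year search limit are illustrative choices.

```python
def account_a(years):
    """Linear growth: $500 plus $50 per year."""
    return 50 * years + 500


def account_b(years):
    """Exponential growth: $500 growing 8% per year."""
    return 500 * 1.08 ** years


def first_year_b_ahead(max_years=100):
    """Return the first whole year in which Account B exceeds Account A."""
    for year in range(1, max_years + 1):
        if account_b(year) > account_a(year):
            return year
    return None  # no crossover within the search limit


print(first_year_b_ahead())  # 7
```

Account A leads through year 6, and Account B is ahead from year 7 onward.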

What to Watch For on Test Day

Always read axis labels and units. Many wrong answers come from misunderstanding what the variables represent. A slope of $3.5$ means very different things depending on whether y is in dollars, thousands of dollars, or millions.
Match the shape to the model. Straight pattern → linear. U-shape or arch → quadratic. Rapid acceleration or decay → exponential. Don't force a linear model onto curved data.
Slope interpretation is the most common question type. Practice stating slope in context: "For each additional [x-unit], the predicted [y-variable] increases/decreases by [slope] [y-units]."
Watch for extrapolation traps. If a question asks for a prediction far outside the data range, the answer might be mathematically correct but contextually unreasonable. The SAT sometimes tests whether you recognize this.
Correlation does not mean causation. A scatterplot showing a strong correlation between two variables does not prove one causes the other. Only a well-designed experiment with random assignment can support a causal claim. If a question asks what you can "conclude," stick with association unless the study design supports a causal claim.
