Fiveable

📊AP Statistics Unit 2 Review

2.4 Representing the Relationship Between Two Quantitative Variables

Written by the Fiveable Content Team • Last updated September 2025
Verified for the 2026 exam

In a bivariate quantitative data set, two quantitative variables are recorded for each individual, and we want to know whether and how they are related. One of the variables, referred to as the "independent" or "explanatory" (x) variable, is thought to have an effect on the other variable, which is referred to as the "dependent" or "response" (y) variable. The explanatory variable is often used to explain or predict the value of the response variable.

For example, in a study examining the relationship between age and blood pressure, age might be the explanatory variable and blood pressure the response variable. In this case, the value of the explanatory variable (age) might be used to predict the value of the response variable (blood pressure). 

What is a Scatterplot?

We can display this data in a scatterplot, a graph in which each individual appears as one point. The explanatory (independent) variable goes on the horizontal axis (the x-axis), and the response (dependent) variable goes on the vertical axis (the y-axis). Here are two examples:

Graph 1 Image Courtesy of: Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.

Describing Scatterplots

When given a scatterplot, we are often asked to describe it. In AP Statistics, graders look for four things when you are asked to describe a scatterplot or the association it shows: form, direction, strength, and unusual features.

Form

The form of a scatterplot refers to the general shape of the plotted points on the graph. A scatterplot may have a linear form, in which the points form a straight line, or a curved form, in which the points follow a curved pattern. The form of a scatterplot can be useful for understanding the relationship between the two variables and for identifying patterns or trends in the data.

For example, a scatterplot with a linear form suggests that the response variable changes at a roughly constant rate as the explanatory variable increases, so a straight line is a reasonable model. Form by itself does not tell you the direction or strength of the relationship. A scatterplot with a curved form indicates a nonlinear relationship between the two variables, such as a quadratic relationship, which a straight line would not describe well.

In the scatterplots above, Graph 1 is best described as curved, while Graph 2 is linear.

Direction

The direction of the scatterplot is the general trend that you see when going left to right. Graph 1 is decreasing, since the values of the response variable tend to go down from left to right, while Graph 2 is increasing, since the values of the response variable tend to go up from left to right.

In a linear model, the direction of the relationship between two variables is often described in terms of positive or negative correlation. Positive correlation means that as one variable increases, the other variable also tends to increase. Negative correlation means that as one variable increases, the other variable tends to decrease.

The slope of the line that fits the data can be used to determine the direction of the correlation. If the slope is positive, the correlation is positive, and if the slope is negative, the correlation is negative. 

For example, consider a linear model that shows the relationship between age and height. If the slope of the line is positive, it indicates that as age increases, height tends to increase as well. This would indicate a positive correlation between age and height. On the other hand, if the slope of the line is negative, it would indicate a negative correlation between age and height, where an increase in age is associated with a decrease in height.
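
The link between the sign of the slope and the direction of the association can be checked with a short Python sketch. The ages and heights below are made-up illustrative values, and `least_squares_slope` and `direction` are hypothetical helper names, not part of any AP-required tool.

```python
from statistics import mean

def least_squares_slope(xs, ys):
    """Slope of the least-squares line: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def direction(xs, ys):
    """Label the direction of a linear association by the sign of the slope."""
    slope = least_squares_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "no linear direction"

# Hypothetical data: ages (years) and heights (cm) of five children.
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

print(least_squares_slope(ages, heights))  # positive slope: height rises with age
print(direction(ages, heights))
```

Reversing the order of `heights` would pair the tallest child with the youngest age, and the same function would then report a negative direction.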

Strength

The strength of a scatterplot describes how closely the points follow its form, and it is described as strong, moderate, or weak. How we measure this numerically is covered in the next section, on correlation and the correlation coefficient. In our case, Graph 1 shows a moderate association while Graph 2 shows a strong association. 🥋

Unusual Features

Lastly, we have to discuss unusual features on a scatterplot. The two types you should know are clusters and outliers, which are similar to their single-variable counterparts.

Clusters are groups of points that are close together on the scatterplot. They may indicate that there are subgroups or patterns within the data that are different from the overall trend.

Outliers are points that are far from the other points on the scatterplot and may indicate unusual or unexpected values in the data. Outliers can be caused by errors in data collection or measurement, or they may indicate a genuine difference in the population being studied.

It's important to consider unusual features on a scatterplot when analyzing the data, as they can influence the interpretation of the relationship between the two variables and the results of statistical analyses. 

Example

Describe the scatterplot in the context of the problem.

Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.

A sample answer may look like this: "In the scatterplot above, we see that it appears to follow a linear pattern. It also shows a negative correlation since the Gesell score seems to decrease as the age at first word increases. The correlation appears to be moderate, since there are some points that follow the pattern exactly, while others seem to break apart from the pattern. The data appears to have one cluster with an outlier at Child 19, because the predicted Gesell Score for Child 19 (value at line) has a large discrepancy from the actual Gesell score (value at point). Also, the data has an influential point that is a high leverage point with Child 18 because it heavily influences the negative correlation of the data set."

Notice that this response is IN CONTEXT of the problem. This is a great way to maximize your credit on the AP Statistics exam.

Side Note: Outliers, Influential Points, and (High) Leverage Points

Source: Cambridge University Press

After going through the example problem above, the biggest question you might have in mind is: What's the difference between outliers, influential points, and high leverage points, given that they can all greatly affect scatterplot trends and correlations (discussed in depth in the next section)?

  • An outlier is a data point that is significantly different from the rest of the data in a dataset. Outliers can have a significant impact on the results of statistical analyses and can potentially distort the overall pattern of the data.
  • An influential point is a data point whose removal would noticeably change the fitted model, such as the slope or intercept of the regression line or the value of the correlation. An influential point may or may not also be an outlier.
  • A high leverage point is a data point whose value of the explanatory (x) variable is much larger or much smaller than the x-values of the other points. High leverage points can have a large influence on the fitted model, and they can be detected by examining the leverage values for each data point. High leverage points may or may not be outliers.

In summary, outliers are data points that are significantly different from the rest of the data, influential points are data points that have a significant impact on the fitted model, and high leverage points are data points with an extreme x-value that can have a large influence on the fitted model.

🎥 Watch: AP Stats - Scatterplots and Association (https://app.fiveable.me/ap-stats/unit-2/scatterplots-association/slides/KqjaqR3xVBs7M1J6oHJN/DQvNCelFSMVC)

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

| Term | Definition |
| --- | --- |
| bivariate quantitative data | A data set consisting of observations of two different quantitative variables measured on the same individuals in a sample or population. |
| cluster | Concentrations of data usually separated by gaps in a distribution. |
| direction | The type of association between two variables in a scatter plot, described as positive or negative. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| form | The pattern or shape of the relationship between two variables in a scatter plot, such as linear or non-linear. |
| linear | A form of association in a scatter plot where the points follow a straight-line pattern. |
| negative association | A relationship between two variables where as values of one variable increase, values of the other variable tend to decrease. |
| non-linear | A form of association in a scatter plot where the points do not follow a straight-line pattern. |
| outlier | Data points that are unusually small or large relative to the rest of the data. |
| positive association | A relationship between two variables where as values of one variable increase, values of the other variable tend to increase. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| scatter plot | A graph that displays the relationship between two quantitative variables using points plotted on a coordinate plane. |
| strength | A measure of how closely individual points in a scatter plot follow a specific pattern, described as strong, moderate, or weak. |

Frequently Asked Questions

How do I make a scatterplot from two sets of data?

Pick which variable you’ll put on x (explanatory) and which on y (response). Then follow these steps:

  1. Pair your data so each observation is (x, y).
  2. Choose scales for the x- and y-axes that cover the data range and use equal intervals. Label axes with variable name and units.
  3. For each observation, plot one point at its (x, y) coordinates. Don’t connect points.
  4. After plotting, describe the scatterplot using AP language: direction (positive/negative), form (linear or nonlinear), strength (strong/moderate/weak), and any unusual features (clusters, outliers, influential points, large residuals).
  5. If asked, you can fit a least-squares regression line and report slope/intercept, correlation r, and residuals (calculator allowed on the AP exam).

Want guided practice and examples? Check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and try practice problems at (https://library.fiveable.me/practice/ap-statistics).
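
The pairing, scaling, and plotting steps can be sketched in plain Python as a rough text plot. This is only an illustration under stated assumptions: `ascii_scatter` is a hypothetical helper, the hours and scores are invented, and in practice you would draw the plot by hand or with a calculator or plotting tool.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a rough text scatterplot: one '*' per (x, y) pair, points not connected."""
    if len(xs) != len(ys):
        raise ValueError("each observation needs both an x and a y value")
    x_min, y_min = min(xs), min(ys)
    # Scales cover the full data range; fall back to 1 if every value is identical.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):  # pair the data so each observation is (x, y)
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical data: hours studied (explanatory) and test score (response).
hours = [1, 2, 3, 4, 5, 6]
scores = [55, 62, 70, 71, 80, 88]

plot = ascii_scatter(hours, scores)
print(plot)
print("x: hours studied (hours), y: test score (points)")
```

The last line stands in for axis labels with units, which a hand-drawn or calculator plot should always include.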

What's the difference between explanatory and response variables?

The explanatory variable (x) is the one you think helps explain or predict changes in the other variable; the response variable (y) is the outcome you measure. On a scatterplot each point gives an x (explanatory) and a y (response) value for an individual (UNC-1.S.1, UNC-1.S.2, UNC-1.S.3). Example: if you study hours studied and test score, hours is explanatory and score is response because you use hours to predict score. On the AP exam you’ll often need to label which variable is explanatory vs. response, describe direction/form/strength (DAT-1.A), or interpret a regression slope as the change in the response per one-unit increase in the explanatory. For more examples and practice with scatterplots and labeling, check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and hundreds of practice problems (https://library.fiveable.me/practice/ap-statistics).

I'm confused about positive vs negative association - can someone explain?

Positive vs. negative association just tells you the direction of the pattern in a scatterplot (CED DAT-1.A). If points trend upward left-to-right, that’s a positive association: as one quantitative variable increases, the other tends to increase (e.g., study time ↑, test score ↑). If points trend downward, that’s a negative association: as one increases, the other tends to decrease (e.g., latitude ↑, average low temp ↓). Also say whether the pattern is roughly linear or nonlinear (UNC-1.S.2, DAT-1.A.4) and note strength—how tightly points follow the pattern (strong/moderate/weak, DAT-1.A.5). On the AP exam you should describe form, direction, strength, and any unusual features (clusters/outliers). Want more examples and practice? Check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and practice problems (https://library.fiveable.me/practice/ap-statistics).

When do I put which variable on the x-axis and which on the y-axis?

Put the explanatory (predictor) variable on the x-axis and the response (outcome) variable on the y-axis. The CED calls this out: a scatterplot shows one numeric value on x and one on y, and UNC-1.S.3 defines the explanatory variable as the one used to explain or predict the response. If you’re not sure which is which, ask: “Which variable do I want to predict or explain?” Put that one on y. If there’s no causal story, pick the variable you’ll treat as the predictor (common in modeling) and put it on x. Always label axes with units and choose scales that show the form, direction, strength, and any unusual features (DAT-1.A). For AP practice, review the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and try problems at the Unit 2 page (https://library.fiveable.me/ap-statistics/unit-2) or the practice question bank (https://library.fiveable.me/practice/ap-statistics).

How do I describe the form of a scatterplot step by step?

Start by remembering the CED checklist: when you describe a scatterplot, talk about form, direction, strength, and unusual features. Step-by-step:

  1. Identify the variables and which is explanatory (x) and response (y), and put it in context.
  2. Form: say whether the pattern is linear or nonlinear (for example, curved). If roughly straight, call it linear. If there is no visible pattern, say so.
  3. Direction: state positive (y increases with x) or negative (y decreases with x).
  4. Strength: judge how tightly points follow the form: strong, moderate, or weak (tight = strong; lots of scatter = weak).
  5. Unusual features: note outliers or influential points and any clusters or gaps. Mention residuals or influential point effects if relevant.
  6. If needed for the exam, quantify with r or a least-squares line and give interpretation of slope/intercept (AP expects this in Topic 2.5–2.6).

Practice by describing several plots; Fiveable’s Topic 2.4 study guide can help (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j). For extra practice across Unit 2 see (https://library.fiveable.me/ap-statistics/unit-2) or try problems (https://library.fiveable.me/practice/ap-statistics).

What does it mean when a scatterplot has strong vs weak association?

Strength refers to how closely the points in a scatterplot follow a clear form (often linear).

  • Strong association: the points lie close to a single pattern (e.g., nearly a straight line). Correlation r is near ±1, residuals are small, and predictions from a least-squares line are more accurate.
  • Weak association: the points are widely scattered with no clear pattern; r is near 0, residuals are large, and the line of best fit does a poor job predicting y from x.

Also describe direction (positive/negative) and form (linear/nonlinear) when you report strength, since AP scoring expects form, direction, strength, and unusual features (clusters/outliers) (CED DAT-1.A). For more examples and practice, check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and try practice questions (https://library.fiveable.me/practice/ap-statistics).

I don't understand what clusters and outliers are in scatterplots - help?

Think of a scatterplot as a map of all your bivariate observations (UNC-1.S).

  • Cluster: a group of points that lie near each other and form a visible sub-group. Clusters show that the relationship might differ for subpopulations (e.g., students vs. adults) or that data naturally fall into groups. Mention clusters when you describe form, direction, strength, and unusual features (DAT-1.A.1).
  • Outlier: a single point that sits far away from the overall pattern. Outliers have large residuals (big difference between observed y and predicted y) and can affect correlation, the least-squares line, or be influential points that change slope/intercept noticeably (keywords: residuals, influential point, outlier).

On the exam, always describe scatterplots using form (linear/nonlinear), direction, strength, and unusual features (clusters/outliers). For practice and quick review, see the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and Unit 2 overview (https://library.fiveable.me/ap-statistics/unit-2). For more practice problems go to (https://library.fiveable.me/practice/ap-statistics).

How do I know if my scatterplot shows linear or non-linear form?

Look at the pattern the points make, because that tells you form. If the points cluster roughly along a straight line (same slope across the plot) the form is linear; if they bend, curve, level off, or follow a U-shape the form is nonlinear. Also check direction (positive/negative), strength (how tightly points hug a line), and unusual features (clusters or outliers) as the CED requires. A numerical check: a correlation r near ±1 means the points lie close to a line, while r near 0 suggests no linear relationship (but the relationship could still be strongly nonlinear). Because r alone cannot confirm that a line is the right model, fit a least-squares line and make a residual plot. If residuals scatter randomly with no pattern, linear is appropriate; a curved pattern in residuals means the relationship is nonlinear. For more examples and practice identifying form, see the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and Unit 2 overview (https://library.fiveable.me/ap-statistics/unit-2).

What's the formula for determining if association is positive or negative?

You don’t need a fancy trick: direction (positive or negative) is just the sign of the slope or the correlation. Two formulas to check:

  • Correlation r: r = (1/(n−1)) Σ [((x_i − x̄)/s_x) · ((y_i − ȳ)/s_y)]. If r > 0 the association is positive; if r < 0 it’s negative.
  • Least-squares slope b: b = r · (s_y/s_x). If b > 0 the relationship is increasing (positive); if b < 0 it’s decreasing (negative).

On the AP, you’ll often just look at a scatterplot: “positive” means y tends to increase as x increases; “negative” means y tends to decrease as x increases (CED DAT-1.A.2–A.3). Want practice computing r or b from data? Check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and try problems at (https://library.fiveable.me/practice/ap-statistics).
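
As a sketch of how the two formulas fit together, the Python below computes r from standardized values and then the slope from r. The function names and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1/(n-1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

def slope_from_r(xs, ys):
    """Least-squares slope b = r * (s_y / s_x)."""
    return correlation(xs, ys) * stdev(ys) / stdev(xs)

# Hypothetical data in which y tends to fall as x rises.
x_values = [1, 2, 3, 4, 5]
y_values = [10, 8, 7, 4, 1]

r = correlation(x_values, y_values)
b = slope_from_r(x_values, y_values)
print(r, b)  # both negative, so the association is negative
```

Since s_x and s_y are always positive, r and b necessarily share the same sign, which is why either one can be used to read off direction.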

Can someone explain bivariate quantitative data in simple terms?

Bivariate quantitative data = pairs of numbers collected from the same individuals. Example: for each student you record (hours studied, test score). Each pair is one observation. You display them with a scatterplot: one variable on the x-axis (explanatory, the one you use to predict) and the other on the y-axis (response, what you predict) (CED: UNC-1.S, UNC-1.S.2–3). When you look at a scatterplot, describe (CED: DAT-1.A.1–6):

  • direction (positive or negative),
  • form (linear or nonlinear),
  • strength (how close points follow a pattern: strong/moderate/weak),
  • unusual features (clusters, outliers, influential points).

AP tip: the exam expects you to make these descriptive statements and connect them to context (e.g., “positive, moderately strong, roughly linear, one high outlier”). For a short study guide, see Fiveable’s Topic 2.4 page (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j). Practice lots of scatterplots at (https://library.fiveable.me/practice/ap-statistics).

How do I identify unusual features when describing a scatterplot?

When you describe unusual features in a scatterplot, look for clusters, gaps, outliers, and influential points; those are the items AP asks you to mention along with form, direction, and strength (CED: DAT-1.A.1). Practically:

  • Clusters: groups of points separated from others. Note how many and where they sit.
  • Gaps: ranges of x with no points; mention if that might affect interpretation.
  • Outliers: points that don’t follow the overall pattern (big residuals from a fitted line). Say which variable is unusual and how far it is from the pattern.
  • Influential points: points with extreme x that noticeably change the slope/intercept of a least-squares line. Say whether the point would shift the regression a lot.

When you can, quantify: “one outlier at x≈12, y≈80 with a large residual” or “two clusters around x≈5 and x≈20.” To practice identifying these, check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and try problems at (https://library.fiveable.me/practice/ap-statistics).

What are the four things I need to describe about every scatterplot for the AP exam?

Every scatterplot on the AP exam should be described with these four things (CED DAT-1.A.1):

  • Direction: positive or negative (as x increases, does y tend to increase or decrease?).
  • Form: linear vs. nonlinear (is a straight-line model reasonable?).
  • Strength: how closely points follow the form (strong, moderate, weak). Use words like “tight” or “widely scattered.”
  • Unusual features: outliers, clusters, or influential points (mention any points that don’t fit the pattern and how they might affect a line of best fit).

Always state these in context (name the variables, e.g., “as study time increases, test score tends to increase: positive, roughly linear, moderate strength, one high outlier”). The AP rubric expects those four components (Topic 2.4). For a quick refresher, see the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j). More unit review and lots of practice problems are at (https://library.fiveable.me/ap-statistics/unit-2) and (https://library.fiveable.me/practice/ap-statistics).

I keep mixing up which variable goes where - is there a trick to remember?

Short trick: ask “which variable do I use to explain or predict the other?” Put the explanatory (independent) variable on the x-axis and the response (dependent) variable on the y-axis. The CED calls these explanatory and response (UNC-1.S.3); a scatterplot shows one numeric value on x and one on y (UNC-1.S.2). Quick heuristics that work on AP tasks:

  • If the problem asks “how does A affect B?” A → x (explanatory), B → y (response).
  • Time or dose usually goes on x; outcomes (growth, score, temp) go on y.
  • If nothing predicts the other (observational association only), you can still pick one. Be consistent and state which you chose when interpreting slope or residuals.

Why it matters: slope and predictions use “change in y for a one-unit increase in x,” so putting variables correctly keeps interpretations correct (Topic 2.4 → Topic 2.6). For more examples and practice, see the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and Unit 2 resources (https://library.fiveable.me/ap-statistics/unit-2). You can drill this with lots of practice problems (https://library.fiveable.me/practice/ap-statistics).

How do I tell the difference between moderate and strong association?

Think of strength as how tightly the points hug a clear form (usually a line). For a strong association the points lie very close to a straight line (little vertical scatter, little fuzz), for a moderate association they follow the line but with noticeable scatter, and for a weak association the points are widely spread with only a vague trend. Also watch for outliers or clusters that can change your judgment. You can use r (correlation) and R² as quick, approximate guides. AP doesn’t require exact cutoffs, and cutoffs vary from source to source, but many teachers use rough rules of thumb such as: |r| < 0.3 ≈ weak, 0.3–0.6 ≈ moderate, > 0.6 ≈ strong. Remember: context, form (linear vs nonlinear), and unusual features matter, and a high |r| with a clear non-linear pattern is misleading. On the exam describe direction, form, strength, and any unusual features (CED DAT-1.A). For more practice and examples, see the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and more AP practice problems (https://library.fiveable.me/practice/ap-statistics).
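
The rough rule of thumb quoted above can be written as a small Python function. This is only a sketch: `describe_strength` is a hypothetical name, and the 0.3 and 0.6 cutoffs are the informal ones mentioned in this answer, not official AP thresholds.

```python
def describe_strength(r):
    """Label strength from a correlation r using informal cutoffs (0.3 and 0.6)."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation must be between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size < 0.3:
        return "weak"
    if size <= 0.6:
        return "moderate"
    return "strong"

for value in (0.1, -0.45, 0.85):
    print(value, describe_strength(value))
```

A label from this function only makes sense for roughly linear patterns, so it should be paired with a look at the scatterplot itself.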

When I describe direction, do I say "as x increases, y increases" or something else?

Short answer: yes, but say it in context and use “positive/negative” language from the CED. For example: “As the explanatory variable x increases, the response y tends to increase” (positive association) or “As x increases, y tends to decrease” (negative association). Tips that AP graders like:

  • Name which variable is x (explanatory) and which is y (response) (DAT-1.A.3 and UNC-1.S.3), e.g., “As study time (x) increases, test score (y) tends to increase (positive association).”
  • Use “tends to” or “on average” rather than implying a perfect rule.
  • Always state direction (positive/negative), form (linear/nonlinear), strength (strong/moderate/weak), and any unusual features when describing scatterplots (DAT-1.A.1–A.5).

For a quick refresher, check the Topic 2.4 study guide (https://library.fiveable.me/ap-statistics/unit-2/representing-relationship-between-two-quantitative-variables/study-guide/3rWWsKXcnbYlqY64hQ1j) and practice problems (https://library.fiveable.me/practice/ap-statistics).