Unit 2 Overview: Exploring Two-Variable Data

5 min read • December 29, 2022

Josh Argo

Jed Quiaoit

Athena_Codes

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-OEoWFVWaQjK1.png?alt=media&token=d7c3e3d5-b697-4273-bbbe-de77dd501bd8

image courtesy of: spotio.com

You made it to Unit 2! This part of the AP Stats curriculum focuses on analyzing relationships between two variables. This unit should help you understand how to visualize and describe those relationships using scatterplots, correlation, and least-squares regression.

Here, you'll learn:

  • how to create scatterplots and use them to identify patterns and trends in two-variable data

  • the concept of correlation and how to calculate the correlation coefficient, which measures the strength and direction of the linear relationship between two variables

  • least-squares regression, a method for finding the line of best fit for two-variable data

  • how to interpret the slope and intercept of the regression line and how to use the regression equation to make predictions about one variable based on values of the other variable

  • evaluating the fit of a linear model

  • using residual plots to assess the appropriateness of using a linear model to describe the relationship between two variables

Sounds like a lot, eh? Don't worry! As usual, we'll break this unit into small chunks with examples sprinkled throughout. Are you ready? Here we go!

Exam Weighting

  • 5-7% of the test

  • Roughly 2 to 3 multiple choice questions

  • Possibly one FRQ or a portion of the investigative task

Bivariate Data

After covering single-variable statistics, it's time to increase the complexity a little bit with two-variable statistics! Just as with univariate data, there are two types of bivariate data that we may encounter: categorical and quantitative.

Categorical

With categorical data, we can use two-way tables to represent the relationship between two categorical variables. A common example of bivariate categorical data is recording each student's class level (freshman, sophomore, junior, or senior) along with their learning style preference for 2020-2021 (virtual or traditional). A statistician could take these counts and see whether there is an association between class level and learning choice.

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-giQuzTU2j3zt.svg?alt=media&token=7a6fc7af-7f23-4948-9d34-fd3032bb625f

Source: Math Leaks
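To make the two-way table idea concrete, here is a minimal sketch in Python. The counts are made up purely for illustration (they are not from the figure above), and the function names are hypothetical helpers, not part of any library. It shows how joint, marginal, and conditional relative frequencies come from the same table.

```python
# Hypothetical counts (made up for illustration): class level vs. learning preference.
counts = {
    "Freshman":  {"Virtual": 30, "Traditional": 70},
    "Sophomore": {"Virtual": 40, "Traditional": 60},
    "Junior":    {"Virtual": 45, "Traditional": 55},
    "Senior":    {"Virtual": 60, "Traditional": 40},
}

# Total number of students in the whole table.
grand_total = sum(sum(row.values()) for row in counts.values())


def joint_relative_frequency(level, style):
    """Proportion of ALL students who fall in one cell of the table."""
    return counts[level][style] / grand_total


def marginal_relative_frequency(level):
    """Proportion of ALL students at one class level (row total / grand total)."""
    return sum(counts[level].values()) / grand_total


def conditional_relative_frequency(style, given_level):
    """Proportion of students at `given_level` who chose `style`."""
    return counts[given_level][style] / sum(counts[given_level].values())


print(joint_relative_frequency("Senior", "Virtual"))        # 60 out of 400
print(marginal_relative_frequency("Senior"))                # 100 out of 400
print(conditional_relative_frequency("Virtual", "Senior"))  # 60 out of 100
```

If the conditional relative frequencies of "Virtual" differ noticeably from one class level to the next, as they do in these made-up counts, that suggests an association between the two variables.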

Quantitative

With quantitative variables, we can show the relationship between them using scatterplots. In both the categorical and the quantitative case, we will examine whether there is a relationship between the two variables. This will link to later units as well.

Since every individual has two quantities assigned to them, one of these quantities will be plotted on the x-axis as the independent variable, while the other is plotted on the y-axis as the dependent variable. After creating that scatterplot, we can describe the trend in our data points using various models, primarily linear regression models in AP Statistics. This means that we will fit a line to the points on the scatterplot so that we can make predictions for other x-values within the range of our data.

For example, we may look at someone's height in inches along the x-axis and their shoe size along the y-axis. In this particular example, one would expect to see a positive correlation, because as height increases, so would shoe size.
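The height and shoe size example can be sketched in a few lines of Python. The data values below are invented for illustration, and `least_squares_line` and `predict` are hypothetical helper names. The sketch computes the least-squares slope and intercept from their standard formulas and refuses to predict outside the observed x-range, since that would be extrapolation.

```python
from statistics import mean

# Hypothetical (made-up) data: height in inches (x) and shoe size (y).
heights = [62, 64, 66, 68, 70, 72]
shoe_sizes = [6.5, 7.5, 8.0, 9.5, 10.0, 11.5]


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


def predict(x_new, slope, intercept, x_observed):
    """Predict y for x_new, but only inside the observed range of x."""
    if not (min(x_observed) <= x_new <= max(x_observed)):
        raise ValueError("x_new is outside the observed range (extrapolation)")
    return intercept + slope * x_new


slope, intercept = least_squares_line(heights, shoe_sizes)
print(f"predicted shoe size = {intercept:.2f} + {slope:.3f} * height")
print(predict(67, slope, intercept, heights))
```

A positive slope here matches the positive association described above: in this made-up data set, taller people tend to have larger predicted shoe sizes.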

Computer Outputs

On the AP exam, it is highly unlikely that you will be asked to create a scatterplot, a two way table, or a linear regression model. Instead, the question will generally provide our models via computer outputs or printouts that students are required to be able to interpret. The most important part of this unit is being able to identify the important aspects of a model and interpret what they mean.

Just like in Unit 1, being able to interpret your data in the context of the problem is the biggest skill you will be tested on. This includes the aspects of categorical models (like two-way tables) along with the different aspects of a linear regression model, like the slope, y-intercept, correlation coefficient, and coefficient of determination.

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-BRBPRkGnxE9h.png?alt=media&token=d09338b8-77fa-41f0-8439-d9a07d8cca85

Source: Stats Medic
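Computer output for a regression usually reports the coefficients along with R^2 and s. As a rough sketch of where those numbers come from, the Python below fits a line to a small made-up data set (not the data in the output above) and then computes the residuals, the coefficient of determination, and the standard deviation of the residuals. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

# Hypothetical (made-up) data: height in inches (x) and shoe size (y).
x = [62, 64, 66, 68, 70, 72]
y = [6.5, 7.5, 8.0, 9.5, 10.0, 11.5]

# Least-squares slope and intercept.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# R^2: proportion of the variation in y explained by the linear model.
sse = sum(e ** 2 for e in residuals)         # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)     # total variation in y
r_squared = 1 - sse / sst

# s: typical size of a residual, using n - 2 degrees of freedom.
s = sqrt(sse / (len(x) - 2))

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"R^2 = {r_squared:.3f}, s = {s:.3f}")
```

On the exam you would read these values off the printout rather than compute them, but knowing what each one measures makes the interpretation in context much easier.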

Mathematical Practices

Three of the College Board's mathematical practices for AP Statistics are used in this unit. Each is outlined below.

1. Selecting Statistical Methods

This is useful when we decide whether to use two-variable statistics methods, and which type to use, or to use inference techniques learned later on. As with the rest of AP Statistics, it is vital that students know whether a problem calls for quantitative data methods or categorical data methods before proceeding with any statistical methods.

2. Data Analysis

Here, we'll learn how to calculate different statistics from two-variable data sets, use them to build models, and draw conclusions.

3. Statistical Argumentation

In this unit, we will learn to argue about the strength of the relationship between variables, and also the most important sentence of this unit: correlation does not imply causation! For instance, if I record the amount of rain every day of the week for a year and find that the rain total on Tuesdays is quite a bit higher than on Mondays, does this mean that the day of the week causes it to rain more? Obviously not! In this instance, the two variables (day of the week and rain totals) are associated, but one is not causing the other.
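When arguing about the strength of a linear relationship between two quantitative variables, the number to point to is the correlation coefficient r. A minimal sketch, assuming the usual z-score formula (multiply the paired z-scores, add them up, divide by n - 1); the function name `correlation` is a hypothetical helper:

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation coefficient r computed from paired z-scores."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    z_products = (
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)
    )
    return sum(z_products) / (n - 1)


# Made-up data for illustration: height (inches) and shoe size.
heights = [62, 64, 66, 68, 70, 72]
shoe_sizes = [6.5, 7.5, 8.0, 9.5, 10.0, 11.5]

r = correlation(heights, shoe_sizes)
print(round(r, 3))

# r has no units, so converting inches to centimeters leaves it unchanged.
heights_cm = [h * 2.54 for h in heights]
print(round(correlation(heights_cm, shoe_sizes), 3))
```

Even a value of r very close to 1 only describes how tightly the points follow a line. It says nothing about whether one variable causes the other.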

Main Ideas for this Unit

    • Two Way Tables

      • Joint Relative Frequencies

    • Correlation

      • Form

      • Direction

      • Unusual features (gaps, clusters, outliers)

    • Linear Regression (Least Squares Regression)

    • r, R^2, and s

    • Influential Points

    • Transforming Data Sets

🎥 Watch: AP Stats Unit 2

Key Terms to Review (26)

Bivariate Data

: Bivariate data refers to a set of data that involves two variables, where each observation has a pair of values. It allows for the analysis and comparison of how the two variables are related or correlated.

Categorical Data Methods

: Categorical data methods are techniques used to analyze qualitative or categorical data. These methods focus on organizing, summarizing, and interpreting non-numerical information.

Categorical Variables

: Categorical variables represent characteristics or qualities that fall into distinct categories or groups. They cannot be measured numerically but instead provide information about group membership or classification.

Conditional Relative Frequencies

: Conditional relative frequencies are the proportions or percentages of one category among only the observations that fall in a given category of another variable. They show how likely an outcome is under a specific condition.

Correlation Coefficient

: The correlation coefficient is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.

Correlation Does Not Imply Causation

: Correlation does not imply causation means that a correlation between two variables does not, by itself, show that one variable causes changes in the other.

Data Analysis

: Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.

Extrapolation

: Extrapolation refers to using a mathematical model, such as linear regression, to estimate or predict values outside of an observed range based on patterns within that range. It assumes that trends observed within known data will continue beyond those limits.

Intercept

: The intercept is the value on the y-axis where a line crosses when x equals zero. In a regression model, it is the predicted value of y when x is zero.

Linear Regression (Least Squares Regression)

: Linear regression is a statistical method used to model the relationship between two variables by fitting a linear equation to observed data points. It helps us understand how changes in one variable are associated with changes in another variable.

Linear Regression Models

: Linear regression models are statistical models that examine the relationship between a dependent variable and one or more independent variables. They use a straight line to represent this relationship and can be used to make predictions or understand the impact of changes in the independent variables on the dependent variable.

Marginal Relative Frequencies

: Marginal relative frequencies refer to the proportion or percentage of a specific category in relation to the total number of observations. It shows the distribution of one categorical variable independently.

Mosaic plots

: Mosaic plots are graphical displays used to visualize associations between two or more categorical variables in contingency tables.

Quantitative Data Methods

: Quantitative data methods are techniques used to analyze numerical data. These methods focus on understanding patterns, trends, and relationships within datasets that consist of numbers.

Quantitative Variables

: Quantitative variables are numerical measurements that represent quantities or amounts.

r, R^2, and s

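: r, R^2, and s are statistical measures used to describe a linear relationship between two variables and how well a regression line fits. r is the correlation coefficient, which gives the strength and direction of the linear relationship. R^2 is the proportion of the variation in the response variable that is explained by the linear model. s is the standard deviation of the residuals, which gives the typical size of a prediction error.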

Regression Equation

: A regression equation is a mathematical formula that represents the relationship between a dependent variable and one or more independent variables in a statistical model.

Residuals

: Residuals are the differences between observed values and predicted values in a regression analysis. They represent the vertical distances between data points and the least-squares regression line.

Scatterplots

: Scatterplots are graphs that display the relationship between two quantitative variables. Each point on the graph represents a pair of values, one for each variable.

Segmented bar graphs

: Segmented bar graphs are visual representations of categorical data where each category is divided into segments that represent the proportion or frequency of a subcategory within that category.

Side-by-side bar graphs

: Side-by-side bar graphs are used to compare distributions between different groups or categories. They display multiple bars next to each other, with each bar representing a different group/category.

Slope

: Slope represents how steep or flat a line is. In statistics, it specifically refers to the predicted change in the response variable for every one-unit increase in the explanatory variable.

Statistical Argumentation

: Statistical argumentation refers to using statistical evidence to support or refute claims or hypotheses in an objective manner.

Strength

: Strength refers to the degree of association or relationship between two variables. It measures how closely the data points in a scatterplot cluster around a straight line.

Two-Variable Statistics Methods

: Two-variable statistics methods are techniques used to analyze the relationship between two variables in a data set. These methods help us understand how changes in one variable are associated with changes in another.

Unusual features (gaps, clusters, outliers)

: Unusual features refer to patterns or characteristics in data that deviate significantly from what is expected based on typical behavior. These features can include gaps, clusters, or outliers.

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse, this website.

