ap-stats Unit 2 Vocabulary | Fiveable

✌️AP Statistics Unit 2 Vocabulary

76 essential vocabulary terms and definitions for Unit 2 – Exploring Two-Variable Data


✌️Unit 2 – Exploring Two-Variable Data


2.1 Introducing Statistics

Term	Definition
association	The relationship between two variables where knowing the value of one variable provides information about the other variable.
patterns	Recurring or observable regularities in data that may suggest a relationship between variables.
relationships	Connections or associations between two or more variables in a dataset.

2.2 Representing Two Categorical Variables

Term	Definition
association	The relationship between two variables where knowing the value of one variable provides information about the other variable.
distribution	The pattern of how data values are spread or arranged across a range.
joint relative frequency	A cell frequency in a two-way table divided by the total number of observations in the entire table, expressing the proportion of the total for a specific combination of categories.
mosaic plots	A graphical representation of two categorical variables where rectangles are sized proportionally to represent the frequency or relative frequency of each combination of categories.
segmented bar graphs	A graphical representation where bars are divided into segments, with each segment representing a category of a second categorical variable, showing the composition within each category of the first variable.
side-by-side bar graphs	A graphical representation that displays bars for one categorical variable grouped side-by-side for each category of another categorical variable, allowing for easy comparison between groups.
two-way table	A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
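
As a rough illustration of the joint relative frequency definition above, here is a minimal Python sketch. The table of counts, the category names, and the variable names are all made up for this example; only the rule "cell count divided by the table total" comes from the definition.

```python
# Hypothetical two-way table of counts (made-up numbers for illustration):
# rows = grade level, columns = preferred study method.
counts = {
    "junior": {"flashcards": 20, "practice tests": 30},
    "senior": {"flashcards": 10, "practice tests": 40},
}

# Grand total of all observations in the table.
table_total = sum(sum(row.values()) for row in counts.values())

# Joint relative frequency = cell count / total for the entire table.
joint = {
    row_label: {col: n / table_total for col, n in row.items()}
    for row_label, row in counts.items()
}

for row_label, row in joint.items():
    for col, freq in row.items():
        print(f"{row_label} & {col}: {freq:.2f}")
```

Because every cell is divided by the same table total, the joint relative frequencies across all cells add up to 1.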

2.3 Statistics for Two Categorical Variables

Term	Definition
association	The relationship between two variables where knowing the value of one variable provides information about the other variable.
categorical variable	A variable that takes on values that are category names or group labels rather than numerical values.
conditional relative frequency	A relative frequency for a specific part of a contingency table, such as cell frequencies in a row divided by the total for that row.
distribution	The pattern of how data values are spread or arranged across a range.
marginal relative frequencies	The row and column totals in a two-way table divided by the total for the entire table, representing the proportion of observations in each row or column.
summary statistics	Numerical measures that describe key features of a dataset, such as center, spread, and shape.
two-way table	A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
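
The marginal and conditional relative frequency definitions above differ only in what goes in the denominator. The sketch below shows both, using a made-up table of counts with hypothetical category names.

```python
# Hypothetical two-way table of counts (made-up numbers for illustration).
counts = {
    "junior": {"flashcards": 20, "practice tests": 30},
    "senior": {"flashcards": 10, "practice tests": 40},
}
table_total = sum(sum(row.values()) for row in counts.values())

# Marginal relative frequencies: a row (or column) total / table total.
row_marginal = {r: sum(row.values()) / table_total for r, row in counts.items()}
columns = list(next(iter(counts.values())))
col_marginal = {
    c: sum(row[c] for row in counts.values()) / table_total for c in columns
}

# Conditional relative frequencies within each row: cell count / row total.
row_conditional = {
    r: {c: n / sum(row.values()) for c, n in row.items()}
    for r, row in counts.items()
}

print("row marginals:", row_marginal)
print("column marginals:", col_marginal)
print("conditional on row:", row_conditional)
```

In this made-up table the share choosing flashcards is 0.4 among juniors and 0.2 among seniors. Conditional distributions that differ across groups like this are what suggest an association between the two variables.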

2.4 Representing the Relationship Between Two Quantitative Variables

Term	Definition
bivariate quantitative data	A data set consisting of observations of two different quantitative variables measured on the same individuals in a sample or population.
cluster	A concentration of data points that sits apart from the rest of the data, usually separated from it by gaps.
direction	The type of association between two variables in a scatter plot, described as positive or negative.
explanatory variable	A variable whose values are used to explain or predict corresponding values for the response variable.
form	The pattern or shape of the relationship between two variables in a scatter plot, such as linear or non-linear.
linear	A form of association in a scatter plot where the points follow a straight-line pattern.
negative association	A relationship between two variables where as values of one variable increase, values of the other variable tend to decrease.
non-linear	A form of association in a scatter plot where the points do not follow a straight-line pattern.
outlier	A data point that is unusually small or large relative to the rest of the data, or that falls well outside the overall pattern of a scatter plot.
positive association	A relationship between two variables where as values of one variable increase, values of the other variable tend to increase.
response variable	A variable whose values are being explained or predicted based on the explanatory variable.
scatter plot	A graph that displays the relationship between two quantitative variables using points plotted on a coordinate plane.
strength	A measure of how closely individual points in a scatter plot follow a specific pattern, described as strong, moderate, or weak.

2.5 Correlation

Term	Definition
causation	A relationship where changes in one variable directly cause changes in another variable.
correlation	A numerical measure (r) that describes the strength and direction of the linear relationship between two quantitative variables, ranging from -1 to 1.
linear model	A mathematical representation of the linear relationship between two variables.
linear relationship	A relationship between two variables that can be described by a straight line.
quantitative variable	A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
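
To make the correlation definition concrete, here is a minimal sketch that computes r as the sum of products of z-scores divided by n - 1, a standard formula for the sample correlation. The function name and the data (hours studied versus quiz score) are made up for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
print(f"r = {r:.3f}")
```

A positive r here reflects a positive linear association in the made-up data. It says nothing about whether studying causes higher scores, which is the distinction the causation entry draws.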

2.6 Linear Regression Models

Term	Definition
explanatory variable	A variable whose values are used to explain or predict corresponding values for the response variable.
extrapolation	Predicting a response value using a value for the explanatory variable that is beyond the range of x-values used to create the regression model, resulting in less reliable predictions.
least-squares regression line	A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
linear regression model	An equation that uses an explanatory variable to predict a response variable in a linear relationship.
predicted value	The estimated response value obtained from a regression model, denoted as ŷ.
response variable	A variable whose values are being explained or predicted based on the explanatory variable.
slope	The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
y-intercept	The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
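
The predicted value and extrapolation entries above can be sketched in a few lines. The coefficients a = 2.2 and b = 0.6, the x-values, and the helper names are assumptions chosen for illustration, not values from any real data set.

```python
def predict(a, b, x):
    """Return the predicted response y-hat = a + b*x."""
    return a + b * x


def is_extrapolation(x, xs_used):
    """True when x lies outside the range of x-values used to fit the model."""
    return x < min(xs_used) or x > max(xs_used)


# Hypothetical x-values the model was built from, and assumed coefficients.
hours = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6  # y-intercept and slope of the assumed model

for x in (4, 10):
    if is_extrapolation(x, hours):
        note = "extrapolation, less reliable"
    else:
        note = "within the data range"
    print(f"x = {x}: predicted y = {predict(a, b, x):.1f} ({note})")
```

The slope b is the change in the predicted response per one-unit increase in x, and the y-intercept a is the prediction at x = 0. In this example x = 0 is itself outside the range 1 to 5, so interpreting the intercept would also be an extrapolation.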

2.7 Residuals

Term	Definition
actual value	The observed or measured response value in a dataset, denoted as y.
bivariate data	Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them.
form of association	The pattern or type of relationship between two variables, such as linear, curved, or no relationship.
linear model	A mathematical representation of the linear relationship between two variables.
predicted value	The estimated response value obtained from a regression model, denoted as ŷ.
randomness in residuals	The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data.
residual	The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot	A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.
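
The residual formula above, residual = y - ŷ, can be applied point by point. In this sketch the data are made up, and a = 2.2, b = 0.6 are the least-squares coefficients for those made-up points; the variable names are illustrative only.

```python
# Hypothetical data and the least-squares line fitted to it (y-hat = 2.2 + 0.6x).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Predicted value for each x, then residual = actual - predicted.
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

# These (x, residual) pairs are exactly what a residual plot displays.
for x, res in zip(hours, residuals):
    print(f"x = {x}: residual = {res:+.1f}")
```

A positive residual means the model underpredicted that point and a negative residual means it overpredicted. For a least-squares line the residuals sum to zero, so what matters in a residual plot is whether they show a pattern, not their total.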

2.8 Least Squares Regression

Term	Definition
coefficient of determination	The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
coefficients	The numerical values in a regression equation that represent the slope and y-intercept of the least-squares regression line.
correlation	A numerical measure (r) that describes the strength and direction of the linear relationship between two quantitative variables, ranging from -1 to 1.
explanatory variable	A variable whose values are used to explain or predict corresponding values for the response variable.
least-squares regression line	A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
parameter	A numerical summary that describes a characteristic of an entire population.
predicted value	The estimated response value obtained from a regression model, denoted as ŷ.
residual	The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
response variable	A variable whose values are being explained or predicted based on the explanatory variable.
sample standard deviation	The standard deviation calculated for a sample, denoted by s, using the formula s = √( ∑(xᵢ − x̄)² / (n − 1) ).
simple linear regression	A regression model that describes the linear relationship between one explanatory variable and one response variable.
slope	The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
y-intercept	The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
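
The terms in this topic fit together through the standard formulas b = r(s_y / s_x) and a = ȳ - b·x̄, which is why the sample standard deviation appears in this list. The sketch below applies them to made-up data; the function name and data values are assumptions for illustration.

```python
from statistics import mean, stdev


def least_squares(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # y-intercept: the line passes through (x-bar, y-bar)
    return a, b, r


# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

a, b, r = least_squares(hours, scores)
r_squared = r ** 2

# Cross-check r^2 as the proportion of variation explained: 1 - SSE/SST.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))
sst = sum((y - mean(scores)) ** 2 for y in scores)

print(f"y-hat = {a:.1f} + {b:.1f}x, r^2 = {r_squared:.2f}")
print(f"1 - SSE/SST = {1 - sse / sst:.2f}")
```

Both calculations give the same r², which matches the coefficient of determination entry: it is the proportion of variation in the response explained by the explanatory variable.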

2.9 Analyzing Departures from Linearity

Term	Definition
coefficient of determination	The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
correlation	A numerical measure (r) that describes the strength and direction of the linear relationship between two quantitative variables, ranging from -1 to 1.
high-leverage point	A point in regression that has a substantially larger or smaller x-value than other observations in the dataset.
influential points	Points in a regression that, when removed, substantially change the relationship between variables, such as the slope, y-intercept, or correlation.
least-squares regression line	A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
natural logarithm	A mathematical transformation using the logarithm with base e, often applied to response or explanatory variables to linearize relationships.
outlier	In regression, a point that does not follow the general trend of the rest of the data and has a large residual from the least-squares regression line.
residual	The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot	A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.
slope	The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
transformed data set	A dataset created by applying mathematical transformations (such as logarithms or powers) to the original variables to achieve a more linear relationship.
y-intercept	The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
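
As an illustration of the natural logarithm and transformed data set entries, the sketch below fits a line to made-up data that doubles at each step, first on the original scale and then after taking ln(y). The data and the helper name `fit_line` are assumptions for this example.

```python
from math import exp, log
from statistics import mean, stdev


def fit_line(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b = r * s_y / s_x
    return y_bar - b * x_bar, b, r


# Hypothetical exponential-looking data: y doubles for each unit of x.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Fit on the original scale, where the pattern is curved.
_, _, r_raw = fit_line(xs, ys)

# Transformed data set: replace y with ln(y), then fit again.
log_ys = [log(y) for y in ys]
a, b, r_log = fit_line(xs, log_ys)

# A prediction on the log scale is converted back with exp().
y_hat_at_2 = exp(a + b * 2)

print(f"r on original data:    {r_raw:.3f}")
print(f"r on transformed data: {r_log:.3f}")
print(f"back-transformed prediction at x = 2: {y_hat_at_2:.1f}")
```

The correlation is closer to 1 after the transformation because ln(y) is exactly linear in x for this made-up data. With real data, a residual plot of the transformed fit is still needed to judge whether the linear model is appropriate.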