✌️AP Statistics Unit 2 Vocabulary

76 essential vocabulary terms and definitions for Unit 2 – Exploring Two-Variable Data

✌️Unit 2 – Exploring Two-Variable Data

2.1 Introducing Statistics

Term | Definition
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
patterns | Recurring or observable regularities in data that may suggest a relationship between variables.
relationships | Connections or associations between two or more variables in a dataset.

2.2 Representing Two Categorical Variables

Term | Definition
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
distribution | The pattern of how data values are spread or arranged across a range.
joint relative frequency | A cell frequency in a two-way table divided by the total number of observations in the entire table, expressing the proportion of the total for a specific combination of categories.
mosaic plots | A graphical representation of two categorical variables where rectangles are sized proportionally to represent the frequency or relative frequency of each combination of categories.
segmented bar graphs | A graphical representation where bars are divided into segments, with each segment representing a category of a second categorical variable, showing the composition within each category of the first variable.
side-by-side bar graphs | A graphical representation that displays bars for one categorical variable grouped side-by-side for each category of another categorical variable, allowing for easy comparison between groups.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
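
The joint relative frequency definition above can be illustrated with a minimal Python sketch. The two-way table, its category labels, and the function name `joint_relative_frequencies` are made up for illustration and are not part of the course material.

```python
# Hypothetical two-way table of counts (made-up data):
# rows = grade level, columns = preferred study method.
counts = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

def joint_relative_frequencies(table):
    """Divide every cell count by the grand total of the whole table."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {
        row_label: {col: n / grand_total for col, n in row.items()}
        for row_label, row in table.items()
    }

joint = joint_relative_frequencies(counts)
print(joint["junior"]["flashcards"])  # 30 / 100 = 0.3
```

Because every cell is divided by the same grand total, the joint relative frequencies across the whole table add up to 1.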

2.3 Statistics for Two Categorical Variables

Term | Definition
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
categorical variable | A variable that takes on values that are category names or group labels rather than numerical values.
conditional relative frequency | A relative frequency for a specific part of a contingency table, such as cell frequencies in a row divided by the total for that row.
distribution | The pattern of how data values are spread or arranged across a range.
marginal relative frequencies | The row and column totals in a two-way table divided by the total for the entire table, representing the proportion of observations in each row or column.
summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
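
As a rough sketch of how marginal and conditional relative frequencies differ, the Python below computes both from one small two-way table. The counts, labels, and function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical two-way table of counts (made-up data):
# rows = grade level, columns = preferred study method.
counts = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

def marginal_row_frequencies(table):
    """Each row total divided by the grand total of the table."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {label: sum(row.values()) / grand_total for label, row in table.items()}

def conditional_on_row(table, row_label):
    """Each cell in one row divided by that row's total."""
    row = table[row_label]
    row_total = sum(row.values())
    return {col: n / row_total for col, n in row.items()}

marginal = marginal_row_frequencies(counts)
juniors = conditional_on_row(counts, "junior")
seniors = conditional_on_row(counts, "senior")

print(marginal["junior"])     # 50 / 100 = 0.5
print(juniors["flashcards"])  # 30 / 50 = 0.6
print(seniors["flashcards"])  # 10 / 50 = 0.2
```

In this made-up table the conditional distributions differ by row (0.6 versus 0.2 for flashcards), which is the kind of difference that suggests an association between the two categorical variables.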

2.4 Representing the Relationship Between Two Quantitative Variables

Term | Definition
bivariate quantitative data | A data set consisting of observations of two different quantitative variables measured on the same individuals in a sample or population.
cluster | A concentration of data points, usually separated from other points by gaps in a distribution.
direction | The type of association between two variables in a scatter plot, described as positive or negative.
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
form | The pattern or shape of the relationship between two variables in a scatter plot, such as linear or non-linear.
linear | A form of association in a scatter plot where the points follow a straight-line pattern.
negative association | A relationship between two variables where as values of one variable increase, values of the other variable tend to decrease.
non-linear | A form of association in a scatter plot where the points do not follow a straight-line pattern.
outlier | A data point that is unusually small or large relative to the rest of the data.
positive association | A relationship between two variables where as values of one variable increase, values of the other variable tend to increase.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
scatter plot | A graph that displays the relationship between two quantitative variables using points plotted on a coordinate plane.
strength | A measure of how closely individual points in a scatter plot follow a specific pattern, described as strong, moderate, or weak.

2.5 Correlation

Term | Definition
causation | A relationship where changes in one variable directly cause changes in another variable.
correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
linear model | A mathematical representation of the linear relationship between two variables.
linear relationship | A relationship between two variables that can be described by a straight line.
quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
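
One common way to compute the correlation r is to average the products of the standardized x- and y-values, dividing by n - 1. The sketch below assumes that formula; the data set (hours studied and test scores) is made up and the function name `correlation` is only illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Made-up data: hours studied (explanatory) and test score (response).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
print(round(r, 3))  # positive and close to 1: a strong positive linear association
```

A value of r near 1 or -1 describes how tightly the points follow a line. It says nothing by itself about causation.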

2.6 Linear Regression Models

Term | Definition
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
extrapolation | Predicting a response value using a value for the explanatory variable that is beyond the range of x-values used to create the regression model, resulting in less reliable predictions.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
linear regression model | An equation that uses an explanatory variable to predict a response variable in a linear relationship.
predicted value | The estimated response value obtained from a regression model, denoted as ŷ.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
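
The regression equation ŷ = a + bx and the idea of extrapolation can be sketched together in a few lines of Python. The coefficients, the observed x-values, and the function names `predict` and `is_extrapolation` are assumptions made up for this example.

```python
def predict(a, b, x):
    """Predicted response from the linear model y-hat = a + b * x."""
    return a + b * x

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of x-values used to fit the model."""
    return x < min(observed_xs) or x > max(observed_xs)

# Hypothetical model: predicted score = 46 + 6 * (hours studied),
# fitted on made-up data with hours between 1 and 5.
a, b = 46.0, 6.0
observed_hours = [1, 2, 3, 4, 5]

print(predict(a, b, 3))                      # 46 + 6 * 3 = 64.0
print(is_extrapolation(3, observed_hours))   # False: 3 is inside 1..5
print(is_extrapolation(10, observed_hours))  # True: 10 is beyond the data
```

The model will still return a number for x = 10, but the flag is a reminder that the prediction rests on a pattern that was never observed at that value.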

2.7 Residuals

Term | Definition
actual value | The observed or measured response value in a dataset, denoted as y.
bivariate data | Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them.
form of association | The pattern or type of relationship between two variables, such as linear, curved, or no relationship.
linear model | A mathematical representation of the linear relationship between two variables.
predicted value | The estimated response value obtained from a regression model, denoted as ŷ.
randomness in residuals | The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data.
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.
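
The formula residual = y - ŷ can be applied point by point, as in the sketch below. The data and the line ŷ = 46 + 6x are made up for illustration, and the function name `residuals` is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: actual y minus predicted y-hat = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

# Hypothetical fitted line: y-hat = 46 + 6x.
res = residuals(hours, scores, 46.0, 6.0)
print(res)  # [0.0, 2.0, -3.0, 0.0, 1.0]
```

Plotting these values against `hours` would give the residual plot. A positive residual means the line underpredicted that point, and a negative residual means it overpredicted.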

2.8 Least Squares Regression

Term | Definition
coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
coefficients | The numerical values in a regression equation that represent the slope and y-intercept of the least-squares regression line.
correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
parameter | A numerical summary that describes a characteristic of an entire population.
predicted value | The estimated response value obtained from a regression model, denoted as ŷ.
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √(1/(n-1) ∑(xᵢ-x̄)²).
simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
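
The terms above connect through two standard formulas for the least-squares regression line: slope b = r · (s_y / s_x) and y-intercept a = ȳ - b · x̄. The following is a minimal sketch under those formulas, using made-up data and a hypothetical function name `least_squares_line`.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b, r) for y-hat = a + b * x using b = r * s_y / s_x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation from the sum of cross-deviations.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x       # slope
    a = y_bar - b * x_bar   # y-intercept: the line passes through (x-bar, y-bar)
    return a, b, r

# Made-up data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

a, b, r = least_squares_line(hours, scores)
r_squared = r ** 2  # coefficient of determination
print(round(a, 2), round(b, 2), round(r_squared, 3))
```

For this made-up data the line works out to roughly ŷ = 46 + 6x, and r² gives the proportion of the variation in scores that the linear model on hours accounts for.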

2.9 Analyzing Departures from Linearity

Term | Definition
coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
high-leverage point | A point in regression that has a substantially larger or smaller x-value than other observations in the dataset.
influential points | Points in a regression that, when removed, substantially change the relationship between variables, such as the slope, y-intercept, or correlation.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
natural logarithm | A mathematical transformation using the logarithm with base e, often applied to response or explanatory variables to linearize relationships.
outlier | A data point that is unusually small or large relative to the rest of the data; in regression, a point that does not follow the general trend of the data and has a large residual.
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
transformed data set | A dataset created by applying mathematical transformations (such as logarithms or powers) to the original variables to achieve a more linear relationship.
y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
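
To illustrate how a natural logarithm can produce a transformed data set with a more linear relationship, the sketch below fits a line to ln(y) versus x. The data are made up to follow y = 2 · 3^x exactly, and the helper name `fit_line` is hypothetical; real data would not linearize this cleanly.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept via sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Made-up exponential data: y = 2 * 3**x, clearly non-linear in x.
xs = [0, 1, 2, 3, 4]
ys = [2, 6, 18, 54, 162]

# Transformed data set: replace each y with ln(y).
log_ys = [math.log(y) for y in ys]
a, b = fit_line(xs, log_ys)

# Back-transform a prediction to the original scale: y-hat = e^(a + b * x).
predicted_at_2 = math.exp(a + b * 2)
print(round(b, 4), round(math.exp(a), 4), round(predicted_at_2, 4))
```

Here the slope of the transformed line is close to ln(3) and e raised to the intercept is close to 2, recovering the growth factor and starting value used to generate the data.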