| Term | Definition |
|---|---|
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| patterns | Recurring or observable regularities in data that may suggest a relationship between variables. |
| relationships | Connections or associations between two or more variables in a dataset. |

| Term | Definition |
|---|---|
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| joint relative frequency | A cell frequency in a two-way table divided by the total number of observations in the entire table, expressing the proportion of the total for a specific combination of categories. |
| mosaic plots | A graphical representation of two categorical variables where rectangles are sized proportionally to represent the frequency or relative frequency of each combination of categories. |
| segmented bar graphs | A graphical representation where bars are divided into segments, with each segment representing a category of a second categorical variable, showing the composition within each category of the first variable. |
| side-by-side bar graphs | A graphical representation that displays bars for one categorical variable grouped side-by-side for each category of another categorical variable, allowing for easy comparison between groups. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
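
As a rough illustration of the joint relative frequency definition above, the sketch below divides each cell of a small two-way table by the grand total. The counts, the category labels, and the function name `joint_relative_frequencies` are made up for this example and are not part of the glossary.

```python
# Hypothetical two-way table of counts (rows: class year, columns: study time).
counts = {
    "junior": {"morning": 10, "evening": 30},
    "senior": {"morning": 30, "evening": 30},
}


def joint_relative_frequencies(counts):
    """Divide each cell count by the grand total of the whole table."""
    grand_total = sum(sum(row.values()) for row in counts.values())
    return {
        row_label: {col: n / grand_total for col, n in row.items()}
        for row_label, row in counts.items()
    }


joint = joint_relative_frequencies(counts)
print(joint["junior"]["morning"])  # share of all observations that are junior + morning
```

Because every cell is divided by the same total, the joint relative frequencies across the whole table add up to 1.
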

| Term | Definition |
|---|---|
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| conditional relative frequency | A relative frequency for a specific part of a contingency table, such as cell frequencies in a row divided by the total for that row. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| marginal relative frequencies | The row and column totals in a two-way table divided by the total for the entire table, representing the proportion of observations in each row or column. |
| summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
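
The difference between marginal and conditional relative frequencies is mostly a question of which total goes in the denominator. The sketch below shows both on a small table; the counts, labels, and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical two-way table of counts (rows: class year, columns: study time).
counts = {
    "junior": {"morning": 10, "evening": 30},
    "senior": {"morning": 30, "evening": 30},
}


def marginal_row_frequencies(counts):
    """Each row total divided by the grand total of the table."""
    grand_total = sum(sum(row.values()) for row in counts.values())
    return {label: sum(row.values()) / grand_total for label, row in counts.items()}


def conditional_on_row(counts):
    """Each cell divided by the total of its own row."""
    return {
        label: {col: n / sum(row.values()) for col, n in row.items()}
        for label, row in counts.items()
    }


marginal = marginal_row_frequencies(counts)
conditional = conditional_on_row(counts)

# Compare the share of "morning" within each row.
for label, row in conditional.items():
    print(label, row["morning"])
```

In this made-up table the conditional relative frequency of "morning" differs between the two rows, which is the kind of pattern that suggests an association between the two categorical variables.
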

| Term | Definition |
|---|---|
| bivariate quantitative data | A data set consisting of observations of two different quantitative variables measured on the same individuals in a sample or population. |
| cluster | A concentration of data values, usually separated from the rest of the distribution by gaps. |
| direction | The type of association between two variables in a scatter plot, described as positive or negative. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| form | The pattern or shape of the relationship between two variables in a scatter plot, such as linear or non-linear. |
| linear | A form of association in a scatter plot where the points follow a straight-line pattern. |
| negative association | A relationship between two variables where as values of one variable increase, values of the other variable tend to decrease. |
| non-linear | A form of association in a scatter plot where the points do not follow a straight-line pattern. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| positive association | A relationship between two variables where as values of one variable increase, values of the other variable tend to increase. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| scatter plot | A graph that displays the relationship between two quantitative variables using points plotted on a coordinate plane. |
| strength | A measure of how closely individual points in a scatter plot follow a specific pattern, described as strong, moderate, or weak. |

| Term | Definition |
|---|---|
| causation | A relationship where changes in one variable directly cause changes in another variable. |
| correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1. |
| linear model | A mathematical representation of the linear relationship between two variables. |
| linear relationship | A relationship between two variables that can be described by a straight line. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
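
To make the correlation entry concrete, here is a minimal sketch that computes r from paired data using the standard sum-of-products formula. The data values and the names `hours` and `scores` are invented for the example.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
print(round(r, 3))
```

A value of r near 1 or -1 indicates a strong linear relationship, but by itself it says nothing about causation.
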

| Term | Definition |
|---|---|
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| extrapolation | Predicting a response value using a value for the explanatory variable that is beyond the range of x-values used to create the regression model, resulting in less reliable predictions. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| linear regression model | An equation that uses an explanatory variable to predict a response variable in a linear relationship. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
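
The entries above can be tied together with a short sketch: fit ŷ = a + bx by least squares, make a prediction, and flag extrapolation. The data and the helper names `least_squares_line` and `predict` are assumptions made for illustration only.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b


def predict(x, a, b, x_min, x_max):
    """Return (y_hat, is_extrapolation) for a new explanatory value x."""
    return a + b * x, not (x_min <= x <= x_max)


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
inside = predict(3, a, b, min(xs), max(xs))    # within the observed x-range
outside = predict(10, a, b, min(xs), max(xs))  # beyond it, so less reliable
print(a, b, inside, outside)
```

The second return value of `predict` is only a range check; it marks a prediction as an extrapolation without saying how unreliable it is.
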

| Term | Definition |
|---|---|
| actual value | The observed or measured response value in a dataset, denoted as y. |
| bivariate data | Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them. |
| form of association | The pattern or type of relationship between two variables, such as linear, curved, or no relationship. |
| linear model | A mathematical representation of the linear relationship between two variables. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| randomness in residuals | The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data. |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model. |
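
The residual definition (residual = y - ŷ) can be sketched in a few lines. The data are hypothetical, and the coefficients a = 2.2 and b = 0.6 are assumed to be the least-squares fit for exactly these points.

```python
def residuals(xs, ys, a, b):
    """Actual value minus predicted value, for each observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical paired observations and their assumed least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

res = residuals(xs, ys, a, b)

# Pairs (x, residual) are what a residual plot displays.
for x, e in zip(xs, res):
    print(x, round(e, 2))
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding. What matters for judging the linear model is whether the plotted residuals look randomly scattered or show a curved pattern.
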

| Term | Definition |
|---|---|
| coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model. |
| coefficients | The numerical values in a regression equation that represent the slope and y-intercept of the least-squares regression line. |
| correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √( ∑(xᵢ − x̄)² / (n − 1) ). |
| simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
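
Several of these terms are linked by standard identities for simple linear regression: the slope equals r · s_y / s_x, the line passes through (x̄, ȳ), and the coefficient of determination is r². The sketch below checks this on made-up data; the variable names are assumptions for the example.

```python
import statistics

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations

# Correlation as the average product of z-scores (dividing by n - 1).
r = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope
a = mean_y - b * mean_x  # y-intercept: the line passes through (mean_x, mean_y)
r_squared = r ** 2       # coefficient of determination

print(round(a, 3), round(b, 3), round(r_squared, 3))
```

Here `r_squared` is read as the proportion of variation in the response that is explained by the explanatory variable in the fitted line.
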

| Term | Definition |
|---|---|
| coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model. |
| correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1. |
| high-leverage point | A point in regression that has a substantially larger or smaller x-value than other observations in the dataset. |
| influential points | Points in a regression that, when removed, substantially change the relationship between variables, such as the slope, y-intercept, or correlation. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| natural logarithm | The logarithm with base e (written ln), often applied to the response or explanatory variable as a transformation to linearize a relationship. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| transformed data set | A dataset created by applying mathematical transformations (such as logarithms or powers) to the original variables to achieve a more linear relationship. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
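
Two ideas from this list lend themselves to a quick numerical sketch: how a single high-leverage point can be influential, and how a natural-log transformation can linearize a curved relationship. All data values below are invented, and `fit_line` is a hypothetical helper, not a library function.

```python
import math


def fit_line(xs, ys):
    """Least-squares (a, b) for y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Part 1: refit after adding one point with an unusually large x-value.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
_, slope_without = fit_line(xs, ys)
_, slope_with = fit_line(xs + [20], ys + [2])  # (20, 2) is the high-leverage point
print(slope_without, slope_with)

# Part 2: exponential data y = 3 * 2**x becomes linear after taking ln(y).
growth_x = [0, 1, 2, 3, 4]
growth_y = [3 * 2 ** x for x in growth_x]
log_y = [math.log(y) for y in growth_y]
ln_intercept, ln_slope = fit_line(growth_x, log_y)
print(ln_intercept, ln_slope)  # compare with ln(3) and ln(2)
```

In Part 1 the slope changes sign once the extra point is included, which is what makes that point influential. In Part 2 the fitted line on the transformed data set has slope ln 2 and intercept ln 3, since ln y = ln 3 + x · ln 2.
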