| Term | Definition |
|---|---|
| non-random variation | Variation in data points that follows a systematic or predictable pattern rather than occurring by chance. |
| scatter plot | A graph that displays the relationship between two quantitative variables, with each point representing one observation. |
| variation | Differences among data values. Random variation occurs by chance due to the random nature of sampling; non-random variation stems from systematic causes. |
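
To make the contrast between random and non-random variation concrete, here is a minimal sketch that fits a least-squares line to data that actually follow a curve and then inspects the signs of the residuals. The data and the helper name `least_squares` are hypothetical, chosen only for illustration. Residuals from purely random variation would show no pattern; here the signs run positive, negative, positive, which is the kind of systematic pattern that signals non-random variation.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Hypothetical data that follow a curve (y = x^2), not a straight line.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = least_squares(xs, ys)

# residual = y - y-hat (observed minus predicted)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = ["+" if r > 0 else "-" for r in residuals]

print(residuals)
print(signs)  # a curved pattern shows up as a run of same-sign residuals
```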

| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
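| critical value | A value from a reference distribution (for inference about a slope, the t-distribution with n - 2 degrees of freedom) that is multiplied by the standard error to give the margin of error for a given confidence level. |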
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| linearity | The condition that the true relationship between two variables follows a straight line. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| normality | The condition that data follows an approximately normal (bell-shaped) distribution. |
| population regression line | The true linear relationship μy = α + βx between the response and explanatory variables in the entire population. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| sample regression line | The line ŷ = a + bx calculated from sample data that estimates the population regression line. |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion or a sample slope) over all possible samples of the same size from a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| skewed | A distribution that is not symmetric, with one tail longer or more pronounced than the other. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| standard deviation of residuals | A measure of the spread of residuals around the regression line, estimated by s = √(Σ(yi - ŷi)²/(n-2)). |
| standard deviation of x values | A measure of the spread of the x-variable values in the sample, denoted as sx in the standard error formula. |
| standard error of the slope | A measure of the variability of the slope estimate across different samples, calculated as SE_b = s / (sx · √(n - 1)), where s is the standard deviation of the residuals and sx is the standard deviation of the x values. |
| t-interval | A confidence interval procedure that uses the t-distribution, appropriate for estimating the slope of a regression model. |
| t* | The critical value from the t-distribution used to construct a confidence interval for the slope of a regression model. |
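
The formulas in the table above combine into a t-interval for the slope: b ± t* · SE_b, with SE_b = s / (sx · √(n - 1)) and n - 2 degrees of freedom. The following is a minimal sketch of that arithmetic in plain Python. The data set and the function name `slope_t_interval` are hypothetical, and the critical value t* = 3.182 is read from a standard t-table for 3 degrees of freedom at 95% confidence rather than computed.

```python
import math

def slope_t_interval(xs, ys, t_star):
    """Return (b, se_b, lower, upper) for a t-interval for the slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx          # sample slope
    a = y_bar - b * x_bar  # sample intercept

    # Standard deviation of residuals: s = sqrt(sum((y - y-hat)^2) / (n - 2))
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # Standard deviation of the x values
    s_x = math.sqrt(sxx / (n - 1))

    # Standard error of the slope: s / (s_x * sqrt(n - 1))
    se_b = s / (s_x * math.sqrt(n - 1))

    margin = t_star * se_b  # margin of error = critical value * standard error
    return b, se_b, b - margin, b + margin

# Hypothetical sample of n = 5 points, so df = n - 2 = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# t* = 3.182 is the tabled 95% critical value for df = 3 (an input, not computed here).
b, se_b, lower, upper = slope_t_interval(xs, ys, t_star=3.182)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```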

| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| population regression model | The true regression model for an entire population, as opposed to a sample-based regression model. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| repeated random sampling | The process of taking multiple random samples from a population, each of the same size, to understand the variability of sample statistics. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| width of a confidence interval | The range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval. |
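
As a rough illustration of how sample size affects the width of a confidence interval for a slope, the sketch below computes width = 2 · t* · SE_b for three sample sizes while holding the residual standard deviation s and the x-value standard deviation sx fixed. The values of s and sx are hypothetical, the function name `interval_width` is an assumption, and the t* values are read from a standard t-table (95% confidence, df = n - 2). In real data s and sx would also change from sample to sample, so this only isolates the effect of n.

```python
import math

def interval_width(s, s_x, n, t_star):
    """Width of the t-interval for a slope: upper - lower = 2 * t* * SE_b."""
    se_b = s / (s_x * math.sqrt(n - 1))
    return 2 * t_star * se_b

# Hypothetical, fixed spread values used only to isolate the effect of n.
s, s_x = 2.0, 1.5

# Tabled 95% critical values for df = n - 2 (df = 8, 28, 100).
t_table = {10: 2.306, 30: 2.048, 102: 1.984}

widths = {n: interval_width(s, s_x, n, t_star) for n, t_star in t_table.items()}
for n, w in widths.items():
    print(f"n = {n:3d}: width = {w:.3f}")  # width shrinks as n grows
```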

| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| linear relationship | A relationship between two variables that can be described by a straight line. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| skewness | A measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| t-test for a slope | A hypothesis test used to determine whether the slope of a regression model is significantly different from zero, assessing whether there is a statistically significant linear relationship between variables. |

| Term | Definition |
|---|---|
| degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. For inference about the slope in simple linear regression, degrees of freedom = n - 2. |
| null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| population regression line | The true linear relationship μy = α + βx between the response and explanatory variables in the entire population. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion or a sample slope) over all possible samples of the same size from a population. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. This α is unrelated to the intercept α in the population regression line μy = α + βx. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
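
The terms above fit together in the t-test for a slope: compute the test statistic t = (b - β₀) / SE_b, compare it with the t-distribution with n - 2 degrees of freedom (the null distribution), and read off a p-value. The following is a minimal sketch under stated assumptions: the data are hypothetical, the function names are invented for illustration, the null value is β₀ = 0, and the two-sided p-value is approximated by integrating the t density numerically with Simpson's rule so that only the standard library is needed. A statistics package would normally supply the t-distribution directly.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)  # both tails, using symmetry about 0

def slope_t_test(xs, ys, beta0=0.0):
    """Return (t_stat, df, p_value) for H0: beta = beta0 vs. Ha: beta != beta0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))    # standard deviation of residuals
    s_x = math.sqrt(sxx / (n - 1))  # standard deviation of x values
    se_b = s / (s_x * math.sqrt(n - 1))
    t_stat = (b - beta0) / se_b
    df = n - 2
    return t_stat, df, two_sided_p_value(t_stat, df)

# Hypothetical sample of n = 5 points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

alpha = 0.05  # significance level
t_stat, df, p_value = slope_t_test(xs, ys)
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

With these made-up data the p-value is roughly 0.12, so at a significance level of 0.05 the null hypothesis of zero slope would not be rejected. The helper relies on `math.gamma`, which overflows for very large degrees of freedom, so this sketch is only suited to small samples.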