| Term | Definition |
|---|---|
| approximately normally distributed | A description of data sets that closely follow the pattern of a normal distribution with a mound-shaped, symmetric curve. |
| empirical rule | A rule stating that for a normal distribution, approximately 68% of observations fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| normal curve | The bell-shaped graph of a normal distribution that is symmetric and mound-shaped. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| normally distributed random variable | A random variable that follows a normal distribution, allowing for the calculation of probabilities for specific intervals. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| percentile | A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| population standard deviation | A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| relative position | The location of a data point within a data set, often expressed in comparison to other values or as a measure of how it ranks relative to the distribution. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| standard normal table | A reference table that provides the cumulative probabilities (areas under the curve) for the standard normal distribution. |
| z-score | A standardized score calculated as (xᵢ − μ)/σ that measures how many standard deviations a data value lies above or below the mean. |
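
The z-score and empirical rule entries above can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the values of `mu`, `sigma`, and `x` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical population parameters and data value (illustration only).
mu, sigma = 100.0, 15.0
x = 130.0

# z-score: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(0.0, 1.0)

# Proportion of observations at or below x; this is the cumulative area
# that a standard normal table reports for the z-score.
proportion_below = std_normal.cdf(z)

# Empirical rule: area within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(f"z = {z:.2f}, proportion below = {proportion_below:.4f}")
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The three areas come out close to 68%, 95%, and 99.7%, which is where the empirical rule's rounded figures come from.
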

| Term | Definition |
|---|---|
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| variable | A characteristic that changes from one individual to another in a set of data. |

| Term | Definition |
|---|---|
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| frequency table | A table that displays the number of cases or observations falling into each category. |
| percentage | A proportion expressed as a number out of 100, calculated by multiplying the relative frequency by 100. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| rate | A ratio that compares two quantities with different units, often used to express frequency or occurrence per unit. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |
| relative frequency table | A table that displays the proportion or percentage of cases falling into each category. |
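
As a rough illustration of how a frequency table and a relative frequency table relate, here is a short sketch using `collections.Counter`. The list `colors` is made-up data, not taken from any source.

```python
from collections import Counter

# Hypothetical categorical data (made up for illustration).
colors = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

# Frequency table: number of observations in each category.
frequency = Counter(colors)
total = sum(frequency.values())

# Relative frequency: proportion of observations in each category.
relative_frequency = {c: n / total for c, n in frequency.items()}

# Percentage: relative frequency multiplied by 100.
percentage = {c: 100 * p for c, p in relative_frequency.items()}

for category in sorted(frequency):
    print(f"{category:<6} {frequency[category]:>3} "
          f"{relative_frequency[category]:.3f} {percentage[category]:.1f}%")
```

The relative frequencies sum to 1 (and the percentages to 100), which is a quick sanity check on any such table.
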

| Term | Definition |
|---|---|
| bar chart | A graph that displays frequencies or relative frequencies for categorical data using rectangular bars, where the height or length represents the count or proportion in each category. |
| bar graph | A graphical representation using rectangular bars to display the frequency or count of categories in a categorical variable. |
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| frequencies | The count or number of observations falling within each category of categorical data. |
| frequency table | A table that displays the number of cases or observations falling into each category. |
| graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |

| Term | Definition |
|---|---|
| continuous variable | A variable that can take on infinitely many values that cannot be counted; between any two possible values there are infinitely many others. |
| cumulative graph | A graph that represents the number or proportion of a data set that is less than or equal to a given number. |
| discrete variable | A variable that can take on a countable number of values, which may be finite or countably infinite. |
| dotplot | A graph that represents each observation as a dot, with position on the horizontal axis corresponding to the data value, with nearly identical values stacked vertically. |
| histogram | A graph where the height of each bar represents the number or proportion of observations within an interval; changing the interval widths can change the appearance of the graph. |
| interval | A range of values between two boundaries; in a histogram each bar covers one interval, and in a normal distribution an interval represents a set of outcomes. |
| leaf | In a stem and leaf plot, usually the last digit of a data value. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| stem | In a stem and leaf plot, the first digit or digits of a data value. |
| stem and leaf plot | A graphical representation where each data value is split into a stem (first digit or digits) and a leaf (usually the last digit). |
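
The stem, leaf, and stem and leaf plot entries can be sketched in a few lines. This example assumes two-digit whole-number data, so the stem is the tens digit and the leaf is the ones digit; the `values` list is hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit data values (made up for illustration).
values = [12, 15, 21, 23, 23, 28, 34, 37, 41]

# Group leaves (last digit) under their stem (leading digit or digits).
stems = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    stems[stem].append(leaf)

# Print one row per stem, with leaves in increasing order.
for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

Repeated values such as 23 appear as repeated leaves, so the row lengths show the shape of the distribution much as histogram bars do.
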

| Term | Definition |
|---|---|
| bimodal | A distribution with two prominent peaks. |
| center | A measure indicating the middle or typical value of a distribution. |
| cluster | A concentration of data, usually separated from other data by gaps in a distribution. |
| descriptive statistics | Methods used to summarize and describe the characteristics of a data set without making inferences about a larger population. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| gap | A region of a distribution between two data values where no data were observed. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| quantitative data | Data that consists of numerical values that can be measured and analyzed mathematically. |
| shape | The overall form or pattern of a distribution, including characteristics like skewness and modality. |
| skewed left | A distribution with a longer tail extending to the left, where the mean is typically less than the median. |
| skewed right | A distribution with a longer tail extending to the right, where the mean is typically greater than the median. |
| symmetric | A distribution where the left half is the mirror image of the right half. |
| uniform | A distribution where each bar height is approximately the same with no prominent peaks. |
| unimodal | A distribution with one main peak. |
| variability | The spread or dispersion of data values in a distribution. |

| Term | Definition |
|---|---|
| first quartile | The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls. |
| interquartile range | A measure of variability calculated as the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of data. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| measures of center | Numerical summaries that describe the central tendency of a data set, including the mean and median. |
| measures of position | Numerical summaries that describe the location of data values within a distribution, including quartiles and percentiles. |
| measures of variability | Statistical measures that describe how spread out or dispersed data values are in a distribution. |
| median | The middle value when data are ordered; for an even number of data points, typically the average of the two middle values. |
| nonresistant | A characteristic of a statistic that is significantly affected or influenced by outliers; also called non-robust. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| percentile | A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution. |
| Q1 | The first quartile; the value below which 25% of the data falls. |
| Q3 | The third quartile; the value below which 75% of the data falls. |
| quartile | A value that divides an ordered data set into four equal parts; Q1 and Q3 form the boundaries for the middle 50% of values. |
| range | A measure of variability calculated as the difference between the maximum and minimum data values in a dataset. |
| resistant | A characteristic of a statistic that is not greatly affected by outliers; also called robust. |
| sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √( ∑(xᵢ − x̄)² / (n − 1) ). |
| sample variance | The square of the sample standard deviation, denoted by s², representing variability in squared units. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| statistic | A numerical summary calculated from sample data, such as a mean, median, or standard deviation. |
| third quartile | The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls. |
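
To make the resistant and nonresistant entries concrete, the sketch below compares summary statistics for a small sample before and after adding one unusually large value. Both data sets are hypothetical; `statistics.stdev` computes the sample standard deviation with the n − 1 divisor.

```python
from statistics import mean, median, stdev

# Hypothetical sample (made up for illustration).
sample = [4, 5, 5, 6, 7, 8, 9]
with_outlier = sample + [60]   # same sample plus one unusually large value

def describe(data):
    """Return (mean, median, sample standard deviation, range)."""
    return mean(data), median(data), stdev(data), max(data) - min(data)

for label, data in (("original", sample), ("with outlier", with_outlier)):
    m, med, s, rng = describe(data)
    print(f"{label:<13} mean={m:.2f} median={med:.2f} s={s:.2f} range={rng}")
```

The mean, standard deviation, and range change substantially when the extra value is added, while the median barely moves, which is what "resistant" means in practice.
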

| Term | Definition |
|---|---|
| boxplot | A graphical representation of the five-number summary showing the distribution of data through a box and whiskers. |
| first quartile | The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls. |
| five-number summary | A set of five values that describe a dataset: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. |
| maximum | The largest value in a dataset. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| median | The middle value when data are ordered; for an even number of data points, typically the average of the two middle values. |
| minimum | The smallest value in a dataset. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| quantitative data | Data that consists of numerical values that can be measured and analyzed mathematically. |
| quartile | A value that divides an ordered data set into four equal parts; Q1 and Q3 form the boundaries for the middle 50% of values. |
| skewed left | A distribution with a longer tail extending to the left, where the mean is typically less than the median. |
| skewed right | A distribution with a longer tail extending to the right, where the mean is typically greater than the median. |
| summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape. |
| symmetric distribution | A distribution where data is evenly distributed around the center, with the mean and median approximately equal. |
| third quartile | The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls. |
| whiskers | Lines extending from the ends of a boxplot that reach to the most extreme data points that are not outliers. |
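
The five-number summary behind a boxplot can be computed directly from the definitions above (Q1 and Q3 as medians of the lower and upper halves). This is a sketch on hypothetical data. The 1.5 × IQR fences used to flag outliers are a common convention and are an assumption here, since the glossary does not state an outlier rule.

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values below the median position
    upper = ordered[half + n % 2:]     # skip the middle value when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Hypothetical data set (made up for illustration).
data = [2, 4, 5, 7, 8, 9, 11, 12, 30]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# Assumed convention (not from the glossary): values more than 1.5 * IQR
# beyond the quartiles are flagged as outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(minimum, q1, med, q3, maximum, iqr, outliers)
```

Under this convention the whiskers would extend to the smallest and largest values that lie inside the fences. Software packages often compute quartiles with slightly different interpolation rules, so results can differ a little from the median-of-halves values.
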

| Term | Definition |
|---|---|
| center | A measure indicating the middle or typical value of a distribution. |
| cluster | A concentration of data, usually separated from other data by gaps in a distribution. |
| gap | A region of a distribution between two data values where no data were observed. |
| graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format. |
| histogram | A graph where the height of each bar represents the number or proportion of observations within an interval; changing the interval widths can change the appearance of the graph. |
| independent samples | Two or more separate groups of data where the values in one group do not influence or depend on the values in another group. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |
| side-by-side boxplots | A graphical representation that displays multiple boxplots arranged next to each other to compare the distributions of different groups or samples. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape. |
| variability | The spread or dispersion of data values in a distribution. |

| Term | Definition |
|---|---|
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| patterns | Recurring or observable regularities in data that may suggest a relationship between variables. |
| relationships | Connections or associations between two or more variables in a dataset. |

| Term | Definition |
|---|---|
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| joint relative frequency | A cell frequency in a two-way table divided by the total number of observations in the entire table, expressing the proportion of the total for a specific combination of categories. |
| mosaic plots | A graphical representation of two categorical variables where rectangles are sized proportionally to represent the frequency or relative frequency of each combination of categories. |
| segmented bar graphs | A graphical representation where bars are divided into segments, with each segment representing a category of a second categorical variable, showing the composition within each category of the first variable. |
| side-by-side bar graphs | A graphical representation that displays bars for one categorical variable grouped side-by-side for each category of another categorical variable, allowing for easy comparison between groups. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |

| Term | Definition |
|---|---|
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| conditional relative frequency | A relative frequency for a specific part of a two-way table, such as a cell frequency in a row divided by the total for that row. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| marginal relative frequencies | The row and column totals in a two-way table divided by the total for the entire table, representing the proportion of observations in each row or column. |
| summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
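
The joint, marginal, and conditional relative frequencies defined above differ only in what each count is divided by. The sketch below works through a hypothetical two-way table; the row and column labels and all counts are invented for illustration.

```python
# Hypothetical two-way table of counts (made up for illustration):
# rows = grade level, columns = preferred activity.
table = {
    "junior": {"sports": 20, "music": 30},
    "senior": {"sports": 25, "music": 25},
}

grand_total = sum(sum(cells.values()) for cells in table.values())
row_totals = {r: sum(cells.values()) for r, cells in table.items()}

# Joint relative frequency: cell count / grand total.
joint = {r: {c: n / grand_total for c, n in cells.items()}
         for r, cells in table.items()}

# Marginal relative frequency (rows): row total / grand total.
marginal_rows = {r: t / grand_total for r, t in row_totals.items()}

# Conditional relative frequency: cell count / total for its row.
conditional = {r: {c: n / row_totals[r] for c, n in cells.items()}
               for r, cells in table.items()}

print(joint["junior"]["sports"], marginal_rows["junior"],
      conditional["junior"]["sports"])
```

Comparing the conditional relative frequencies across rows (here 0.4 versus 0.5 for sports) is one way to look for an association between the two categorical variables.
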

| Term | Definition |
|---|---|
| bivariate quantitative data | A data set consisting of observations of two different quantitative variables measured on the same individuals in a sample or population. |
| cluster | A concentration of data points, usually separated from other points by gaps. |
| direction | The type of association between two variables in a scatter plot, described as positive or negative. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| form | The pattern or shape of the relationship between two variables in a scatter plot, such as linear or non-linear. |
| linear | A form of association in a scatter plot where the points follow a straight-line pattern. |
| negative association | A relationship between two variables where as values of one variable increase, values of the other variable tend to decrease. |
| non-linear | A form of association in a scatter plot where the points do not follow a straight-line pattern. |
| outlier | A data point that is unusually small or large relative to the rest of the data; in a scatter plot, a point that falls outside the overall pattern. |
| positive association | A relationship between two variables where as values of one variable increase, values of the other variable tend to increase. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| scatter plot | A graph that displays the relationship between two quantitative variables using points plotted on a coordinate plane. |
| strength | A measure of how closely individual points in a scatter plot follow a specific pattern, described as strong, moderate, or weak. |

| Term | Definition |
|---|---|
| causation | A relationship where changes in one variable directly cause changes in another variable. |
| correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1. |
| linear model | A mathematical representation of the linear relationship between two variables. |
| linear relationship | A relationship between two variables that can be described by a straight line. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
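
As an illustration of the correlation entry, the following sketch computes r from its usual product-moment formula. The paired data `xs` and `ys` are hypothetical, and `correlation` is a helper name introduced here for illustration.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"r = {r:.4f}")
```

A value of r near 1 or −1 indicates a strong linear relationship, but it does not by itself establish causation.
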

| Term | Definition |
|---|---|
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| extrapolation | Predicting a response value using a value for the explanatory variable that is beyond the range of x-values used to create the regression model, resulting in less reliable predictions. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| linear regression model | An equation that uses an explanatory variable to predict a response variable in a linear relationship. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
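
The slope and y-intercept of a least-squares regression line ŷ = a + bx can be computed from the data with the standard closed-form expressions. This is a minimal sketch on hypothetical data; `least_squares_line` and `predict` are names introduced here for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # y-intercept
    return a, b

# Hypothetical paired observations (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)

def predict(x):
    """Predicted response for explanatory value x."""
    return a + b * x

print(f"y-hat = {a:.2f} + {b:.2f}x; prediction at x = 3: {predict(3):.2f}")
```

Calling `predict` with an x-value far outside the range 1 to 5 would be extrapolation, and the result should be treated as less reliable.
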

| Term | Definition |
|---|---|
| actual value | The observed or measured response value in a dataset, denoted as y. |
| bivariate data | Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them. |
| form of association | The pattern or type of relationship between two variables, such as linear, curved, or no relationship. |
| linear model | A mathematical representation of the linear relationship between two variables. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| randomness in residuals | The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data. |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model. |
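
Residuals follow directly from the definition residual = y − ŷ. The sketch below uses hypothetical data together with the least-squares coefficients for that same data (a = 2.2, b = 0.6), and lists the points that would be drawn in a residual plot.

```python
# Hypothetical paired observations (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares coefficients for this particular data set.
a, b = 2.2, 0.6

# Predicted values and residuals (actual minus predicted).
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Points for a residual plot: x on the horizontal axis, residual on the vertical.
for x, res in zip(xs, residuals):
    print(f"x = {x}, residual = {res:+.2f}")
```

For a least-squares line the residuals sum to zero (up to rounding). What matters for judging a linear model is whether the plotted residuals show a pattern or appear random.
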

| Term | Definition |
|---|---|
| coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model. |
| coefficients | The numerical values in a regression equation that represent the slope and y-intercept of the least-squares regression line. |
| correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √( ∑(xᵢ − x̄)² / (n − 1) ). |
| simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
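
The coefficient of determination can be read as one minus the share of variation left unexplained by the line. The sketch below computes r² that way for hypothetical data, using the least-squares coefficients for that same data (a = 2.2, b = 0.6). The names `sse` and `sst` are introduced here for illustration.

```python
from statistics import mean

# Hypothetical paired observations (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares coefficients for this particular data set.
a, b = 2.2, 0.6

y_bar = mean(ys)

# Sum of squared residuals: variation the line does not explain.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Total variation of the response about its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.3f}")
```

In simple linear regression this value equals the square of the correlation r for the same data.
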

| Term | Definition |
|---|---|
| coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model. |
| correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1. |
| high-leverage point | A point in regression that has a substantially larger or smaller x-value than other observations in the dataset. |
| influential points | Points in a regression that, when removed, substantially change the relationship between variables, such as the slope, y-intercept, or correlation. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| natural logarithm | A mathematical transformation using the logarithm with base e, often applied to response or explanatory variables to linearize relationships. |
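| outlier | A data point that is unusually small or large relative to the rest of the data; in regression, a point that does not follow the general trend and has a residual of large magnitude. |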
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| transformed data set | A dataset created by applying mathematical transformations (such as logarithms or powers) to the original variables to achieve a more linear relationship. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
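
To illustrate the natural logarithm and transformed data set entries, the sketch below generates hypothetical data that follow an exact exponential curve, takes the natural log of the response, and fits a least-squares line to the transformed data. The curve y = 3·e^(0.5x) is an assumption chosen only so the result is easy to check.

```python
from math import exp, log
from statistics import mean

# Hypothetical data following y = 3 * e^(0.5 x) exactly (illustration only).
xs = [0, 1, 2, 3, 4]
ys = [3 * exp(0.5 * x) for x in xs]

# Transformed data set: natural log of the response variable.
log_ys = [log(y) for y in ys]

# Least-squares line for ln(y) versus x.
x_bar, ly_bar = mean(xs), mean(log_ys)
b = (sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = ly_bar - b * x_bar

# Back-transform a prediction to the original scale.
predicted_y_at_2 = exp(a + b * 2)

print(f"ln(y-hat) = {a:.3f} + {b:.3f}x")
```

Because ln(y) is linear in x for exponential data, the fitted slope recovers the growth rate and exp(a) recovers the starting value. Real data would only approximately follow such a curve.
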

| Term | Definition |
|---|---|
| chance | Randomness or probability-based selection used in data collection to reduce bias and ensure representativeness. |
| data collection methods | The procedures and techniques used to gather information or data from a population or sample. |

| Term | Definition |
|---|---|
| causal relationships | A relationship between variables where one variable directly causes changes in another variable. |
| experiment | A study in which different conditions or treatments are assigned to experimental units to investigate cause-and-effect relationships. |
| experimental unit | The individual or object to which a treatment is assigned in an experiment; human experimental units are called participants or subjects. |
| generalizations | Conclusions or statements about a larger population based on data from a sample. |
| observational study | A study in which treatments are not imposed; investigators examine data from a sample to investigate a topic of interest about the population. |
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| prospective | An observational study approach where investigators follow a sample of individuals into the future, collecting data over time. |
| randomly selected | A sampling method where every member of the population has an equal chance of being chosen. |
| representative sample | A sample that accurately reflects the characteristics and composition of the population from which it was drawn. |
| retrospective | An observational study approach where investigators examine data from a sample of individuals based on past information. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| sample survey | A type of observational study that collects data from a sample to learn about the population from which the sample was taken. |
| treatment | Different conditions assigned to experimental units in an experiment. |
| variable | A characteristic that changes from one individual to another in a set of data. |

| Term | Definition |
|---|---|
| census | A data collection method that selects all items or subjects in a population. |
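| cluster | In cluster sampling, one of the smaller groups into which a population is divided; in a distribution, a concentration of data usually separated from other data by gaps. |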
| cluster sample | A sampling method in which a population is divided into smaller groups called clusters, and a simple random sample of clusters is selected, with data collected from all observations in the selected clusters. |
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| random number generator | A tool or method used to randomly select items from a population for inclusion in a simple random sample. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| sampling method | A specific procedure or technique used to select a subset of individuals from a population for data collection and analysis. |
| sampling with replacement | A sampling method in which an item selected from a population can be selected again in subsequent draws. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| strata | Separate groups within a population created by dividing it based on shared attributes or characteristics for stratified sampling. |
| stratified random sample | A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum. |
| systematic random sample | A sampling method in which sample members are selected from a population according to a random starting point and a fixed, periodic interval. |
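
The sampling methods defined above can be sketched with Python's `random` module acting as the random number generator. The population of 100 ID numbers, the sample sizes, and the way strata and clusters are formed are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(42)             # fixed seed so the sketch is reproducible
population = list(range(1, 101))    # hypothetical population of 100 ID numbers

# Simple random sample, sampling without replacement.
srs = rng.sample(population, 10)

# Sampling with replacement: the same ID may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Stratified random sample: a simple random sample of 5 from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in rng.sample(group, 5)]

# Cluster sample: choose 2 of 10 clusters at random, keep every member of each.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [x for cluster in rng.sample(clusters, 2) for x in cluster]

# Systematic random sample: random starting point, then every 10th member.
start = rng.randrange(10)
systematic = population[start::10]

print(srs, stratified, cluster_sample, systematic, sep="\n")
```

A census would simply use the whole `population` list, with no random selection involved.
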

| Term | Definition |
|---|---|
| bias | A systematic tendency for certain responses to be favored over others in a sample, resulting in a sample that does not accurately represent the population. |
| convenience sampling | A non-random sampling method where individuals are selected based on their accessibility or ease of inclusion, introducing potential bias. |
| non-random sampling | Sampling methods that do not use chance to select individuals from the population, introducing potential for bias. |
| nonresponse bias | Bias that occurs when individuals chosen for the sample cannot provide data or refuse to respond, and these individuals differ from those who do respond. |
| question wording bias | A type of response bias caused by confusing or leading questions in a survey or data collection instrument. |
| response bias | Bias that results from problems in the data gathering instrument or process, such as confusing or leading questions. |
| self-reported responses | Data collected directly from individuals about their own characteristics, behaviors, or opinions, which may introduce response bias. |
| undercoverage bias | Bias that occurs when part of the population has a reduced chance of being included in the sample, resulting in an unrepresentative sample. |
| voluntary response bias | Bias that occurs when a sample is comprised entirely of volunteers or people who choose to participate, making the sample unrepresentative of the population. |

| Term | Definition |
|---|---|
| blocking | A technique that groups experimental units into blocks where units within each block are similar with respect to at least one blocking variable. |
| blocking variable | A variable used to group experimental units into blocks so that natural variability can be separated from differences due to that variable. |
| completely randomized design | An experimental design where treatments are assigned to experimental units completely at random to balance the effects of confounding variables. |
| confounding variable | A variable that is related to the explanatory variable and influences the response variable, potentially creating a false perception of association between them. |
| control group | A group in an experiment that receives no treatment or a standard/baseline treatment, used as a reference for comparison. |
| double-blind experiment | An experiment where neither the subjects nor the members of the research team who interact with them know which treatment a subject is receiving. |
| experimental unit | The individual or object to which a treatment is assigned in an experiment; human experimental units are called participants or subjects. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| factor | An explanatory variable in an experiment whose levels are manipulated intentionally. |
| matched pairs design | A special case of a randomized block design where subjects are arranged in pairs matched on relevant factors and the two treatments are randomly assigned within each pair, or where each subject receives both treatments in random order. |
| participant | Human subjects or individuals who are assigned treatments in an experiment. |
| placebo | An inactive substance given to a control group to determine if a treatment of interest has an effect. |
| placebo effect | A response to a placebo that occurs when experimental units react to receiving a treatment, even though the treatment is inactive. |
| random assignment | The process of using chance to allocate experimental units to treatment groups so that the groups are roughly equivalent before treatment, which reduces bias from confounding variables. |
| randomized complete block design | An experimental design where treatments are assigned completely at random within each block to control for a blocking variable. |
| replication | The use of multiple experimental units in each treatment group to increase reliability and reduce the effect of random variation. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| single-blind experiment | An experiment where subjects do not know which treatment they are receiving, but members of the research team do, or vice versa. |
| treatment | Different conditions assigned to experimental units in an experiment. |
| treatment groups | Distinct groups in an experiment that receive different treatments or conditions being compared. |
| Term | Definition |
|---|---|
| experimental design | A structured plan for conducting an experiment that specifies how treatments will be assigned to experimental units and how data will be collected. |
| experimental unit | The participants or subjects to which treatments are assigned in an experiment. |
| Term | Definition |
|---|---|
| experimental unit | The participants or subjects to which treatments are assigned in an experiment. |
| generalize | The process of extending conclusions from an experiment conducted on a sample to a larger population. |
| random assignment | The process of allocating experimental units to treatment groups by chance, so that the groups are roughly equivalent before treatment and systematic bias is reduced. |
| random sampling | A method of selecting samples from a population where each member has an equal chance of being chosen, which tends to produce a sample that is representative of the population. |
| representative | A characteristic of a sample that accurately reflects the key features and distribution of the larger population from which it was drawn. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
| statistically significant | A result indicating that an observed difference is large enough that it is unlikely to have occurred by chance alone. |
| treatment | Different conditions assigned to experimental units in an experiment. |
| Term | Definition |
|---|---|
| binomial distribution | A probability distribution that describes the number of successes in a fixed number of independent trials, each with the same probability of success. |
| binomial probability function | The formula P(X=x)=C(n,x)p^x(1-p)^(n-x) that calculates the probability of exactly x successes in n independent trials with probability of success p. |
| binomial random variable | A random variable that counts the number of successes in a fixed number of repeated independent trials, where each trial has two possible outcomes. |
| independent trials | Repeated experiments or observations where the outcome of one trial does not affect the outcome of any other trial. |
| number of failures | The count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition. |
| number of successes | The count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition. |
| probability distribution | A function that describes the likelihood of all possible values of a random variable. |
| probability of success | The constant probability p that an individual trial results in a success in a binomial experiment. |
| random number generator | A tool or method used to randomly select items from a population for inclusion in a simple random sample. |
| simulation | A method of modeling random events so that simulated outcomes closely match real-world outcomes, used to estimate probabilities. |
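
The binomial probability function defined above can be evaluated directly. The following is a minimal sketch, assuming a hypothetical setting of n = 10 independent trials with p = 0.3; the function name `binomial_pmf` is illustrative and not part of the glossary.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for a binomial random variable."""
    if not 0 <= x <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: 10 independent trials, each with success probability 0.3.
n, p = 10, 0.3
prob_exactly_3 = binomial_pmf(3, n, p)
prob_at_most_3 = sum(binomial_pmf(k, n, p) for k in range(4))  # cumulative P(X <= 3)

print(f"P(X = 3)  = {prob_exactly_3:.4f}")
print(f"P(X <= 3) = {prob_at_most_3:.4f}")
```

Summing the function over every possible value 0 through n gives 1, which is a quick check that it describes a valid probability distribution.
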
| Term | Definition |
|---|---|
| patterns in data | Observable regularities or trends that appear in a dataset, which may or may not indicate non-random behavior. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
| Term | Definition |
|---|---|
| binomial distribution | A probability distribution that describes the number of successes in a fixed number of independent trials, each with the same probability of success. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| random variable | A variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
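
The mean and standard deviation of a binomial random variable are commonly computed with the shortcut formulas μ = np and σ = √(np(1-p)). These formulas are not stated in the table above, so treat them as an added assumption. The sketch below compares them with a direct calculation from the full distribution, using hypothetical values n = 20 and p = 0.25.

```python
from math import comb, sqrt


def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation from the shortcut formulas n*p and sqrt(n*p*(1-p))."""
    return n * p, sqrt(n * p * (1 - p))


def mean_sd_from_distribution(n: int, p: float) -> tuple[float, float]:
    """The same parameters computed directly from every value and its probability."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, sqrt(variance)


# Hypothetical example: n = 20 trials with success probability p = 0.25.
print(binomial_mean_sd(20, 0.25))
print(mean_sd_from_distribution(20, 0.25))
```
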
| Term | Definition |
|---|---|
| geometric distribution | A probability distribution that models the number of trials needed to achieve the first success in a sequence of independent Bernoulli trials, each with the same probability of success. |
| geometric probability function | The formula P(X=x)=(1-p)^(x-1)p that calculates the probability that the first success occurs on trial x. |
| geometric random variable | A random variable that represents the number of the trial on which the first success occurs in a sequence of independent trials. |
| independent trials | Repeated experiments or observations where the outcome of one trial does not affect the outcome of any other trial. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| number of failures | The count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition. |
| number of successes | The count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| probability of success | The constant probability p that an individual trial results in a success in a sequence of independent trials, as in a binomial or geometric setting. |
| random variable | A variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
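
The geometric probability function above can be sketched in a few lines. The mean 1/p and standard deviation √(1-p)/p used here are the standard formulas for a geometric random variable; they are not given in the table, so they are labeled as an assumption. The value p = 0.2 is hypothetical.

```python
from math import sqrt


def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = (1 - p)^(x - 1) * p: the first success occurs on trial x."""
    if x < 1:
        return 0.0  # the first success cannot occur before trial 1
    return (1 - p) ** (x - 1) * p


def geometric_mean_sd(p: float) -> tuple[float, float]:
    """Assumed standard formulas: mean 1/p and standard deviation sqrt(1 - p)/p."""
    return 1 / p, sqrt(1 - p) / p


# Hypothetical example: each trial succeeds with probability 0.2.
p = 0.2
prob_first_success_on_3 = geometric_pmf(3, p)
mean_trials, sd_trials = geometric_mean_sd(p)
print(f"P(X = 3) = {prob_first_success_on_3:.3f}")
print(f"mean = {mean_trials:.2f}, sd = {sd_trials:.3f}")
```
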
| Term | Definition |
|---|---|
| event | A collection of one or more outcomes from a random process. |
| law of large numbers | The principle that simulated or empirical probabilities tend to get closer to the true probability as the number of trials increases. |
| outcome | The result of a single trial of a random process. |
| random process | A process that generates results determined by chance, where the outcome cannot be predicted with certainty in advance. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |
| simulation | A method of modeling random events so that simulated outcomes closely match real-world outcomes, used to estimate probabilities. |
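
The law of large numbers can be illustrated with a short simulation. This is a sketch only, assuming a fair coin (true probability 0.5) and a fixed seed so that the run is repeatable; the function name is hypothetical.

```python
import random


def relative_frequency_of_heads(num_flips: int, seed: int = 0) -> float:
    """Simulate fair coin flips and return the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


# The relative frequency tends to settle near 0.5 as the number of trials grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} flips: relative frequency = {relative_frequency_of_heads(n):.4f}")
```

Small runs can land far from 0.5; the law describes the tendency over many trials, not a guarantee for any single run.
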
| Term | Definition |
|---|---|
| complement of an event | The set of all outcomes in the sample space that are not in event E, denoted E' or E^C, representing 'not E'. |
| equally likely | A condition where all outcomes in a sample space have the same probability of occurring. |
| event | A collection of one or more outcomes from a random process. |
| long run | A large number of repetitions of a probability experiment where the relative frequency of an event approaches its true probability. |
| outcome | The result of a single trial of a random process. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| random process | A process that generates results determined by chance, where the outcome cannot be predicted with certainty in advance. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |
| sample space | The set of all possible non-overlapping outcomes of a random process. |
| Term | Definition |
|---|---|
| intersection | The set of outcomes that belong to both event A and event B, denoted A ∩ B. |
| joint probability | The probability that two events A and B both occur, denoted P(A ∩ B). |
| mutually exclusive | Two events that cannot occur at the same time; events with no outcomes in common. |
| Term | Definition |
|---|---|
| conditional probability | The probability that one event will occur given that another event has already occurred, denoted P(A \| B). |
| joint probability | The probability that two events A and B both occur, denoted P(A ∩ B). |
| multiplication rule | A probability rule stating that P(A ∩ B) = P(A) · P(B \| A), used to find the probability that two events both occur. |
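
The multiplication rule can be worked through with exact fractions. The sketch below assumes a hypothetical example: drawing two cards without replacement from a standard 52-card deck, where A is "first card is an ace" and B is "second card is an ace".

```python
from fractions import Fraction


def joint_probability(p_a: Fraction, p_b_given_a: Fraction) -> Fraction:
    """Multiplication rule: P(A and B) = P(A) * P(B given A)."""
    return p_a * p_b_given_a


def conditional_probability(p_a_and_b: Fraction, p_a: Fraction) -> Fraction:
    """Rearranged rule: P(B given A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a


p_first_ace = Fraction(4, 52)               # 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)  # 3 aces left among 51 cards

p_both_aces = joint_probability(p_first_ace, p_second_ace_given_first)
recovered = conditional_probability(p_both_aces, p_first_ace)

print("P(both aces) =", p_both_aces)
print("P(second ace given first ace) =", recovered)
```
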
| Term | Definition |
|---|---|
| addition rule | A probability rule stating that P(A ∪ B) = P(A) + P(B) - P(A ∩ B), used to find the probability of the union of two events. |
| conditional probability | The probability that one event will occur given that another event has already occurred, denoted P(A \| B). |
| independent events | Events A and B are independent if knowing whether event A has occurred does not change the probability that event B will occur. |
| intersection | The set of outcomes that belong to both event A and event B, denoted A ∩ B. |
| union of events | The event that either event A or event B or both will occur, denoted A ∪ B; its probability is written P(A ∪ B). |
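
The addition rule and the independence check can both be verified by listing a sample space. The sketch below assumes a hypothetical example: one card drawn from a standard 52-card deck, with A = "heart" and B = "face card (J, Q, K)".

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes


def prob(event) -> Fraction:
    """Probability of an event as (favorable outcomes) / (total outcomes)."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_heart(card) -> bool:
    return card[1] == "hearts"


def is_face(card) -> bool:
    return card[0] in {"J", "Q", "K"}


p_a = prob(is_heart)
p_b = prob(is_face)
p_a_and_b = prob(lambda c: is_heart(c) and is_face(c))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

# Independence check: A and B are independent when P(A and B) = P(A) * P(B)
independent = p_a_and_b == p_a * p_b

print("P(A or B) =", p_a_or_b)
print("independent:", independent)
```

Subtracting P(A ∩ B) avoids counting the three heart face cards twice.
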
| Term | Definition |
|---|---|
| center | A measure indicating the middle or typical value of a distribution. |
| cumulative probability distribution | A representation (as a table or function) showing the probability that a random variable is less than or equal to each of its possible values. |
| discrete random variable | A random variable that takes on a countable number of distinct numerical values, often representing counts. |
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| probability distribution | A function that describes the likelihood of all possible values of a random variable. |
| random process | A process that generates results determined by chance, where the outcome cannot be predicted with certainty in advance. |
| random variable | A variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities. |
| shape | The overall form or pattern of a distribution, including characteristics like skewness and modality. |
| spread | A measure of how dispersed or variable the outcomes of a probability distribution are, such as range, variance, or standard deviation. |
| Term | Definition |
|---|---|
| discrete random variable | A random variable that takes on a countable number of distinct numerical values, often representing counts. |
| expected value | The long-run average outcome of a random variable, equivalent to the mean of a discrete random variable. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
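
The expected value and standard deviation of a discrete random variable follow from weighting each value by its probability: μ = Σ x·P(x) and σ = √(Σ (x - μ)²·P(x)). A minimal sketch, assuming a small hypothetical distribution:

```python
from math import sqrt


def expected_value(distribution: dict[float, float]) -> float:
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(x * p for x, p in distribution.items())


def standard_deviation(distribution: dict[float, float]) -> float:
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = expected_value(distribution)
    return sqrt(sum((x - mu) ** 2 * p for x, p in distribution.items()))


# Hypothetical distribution: value -> probability (probabilities sum to 1).
dist = {0: 0.2, 1: 0.5, 2: 0.3}

mu = expected_value(dist)
sigma = standard_deviation(dist)
print(f"expected value = {mu:.2f}, standard deviation = {sigma:.2f}")
```
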
| Term | Definition |
|---|---|
| independent random variables | Random variables where knowing the value or probability distribution of one does not change the probability distribution of the other. |
| linear combinations | Expressions of the form aX + bY where X and Y are random variables and a and b are real number coefficients. |
| linear transformations | Changes to a random variable of the form Y = a + bX, where a and b are constants that shift and scale the distribution. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| probability distribution | A function that describes the likelihood of all possible values of a random variable. |
| random variable | A variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| variance | A measure of the spread or dispersion of a probability distribution, denoted as σ², equal to the average squared deviation of values from the mean; its square root is the standard deviation. |
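
The effect of linear transformations and linear combinations on the mean and standard deviation can be sketched as follows. The rules used are the standard ones and are an addition to the table: for Y = a + bX, μ_Y = a + bμ_X and σ_Y = |b|σ_X; for aX + bY with X and Y independent, the mean is aμ_X + bμ_Y and the variance is a²σ_X² + b²σ_Y². All numbers are hypothetical.

```python
from math import sqrt


def linear_transformation(mean_x: float, sd_x: float, a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of Y = a + bX."""
    return a + b * mean_x, abs(b) * sd_x


def linear_combination(mean_x: float, sd_x: float, mean_y: float, sd_y: float,
                       a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of aX + bY, assuming X and Y are independent."""
    mean = a * mean_x + b * mean_y
    variance = a**2 * sd_x**2 + b**2 * sd_y**2  # variances add, standard deviations do not
    return mean, sqrt(variance)


# Hypothetical example 1: Celsius to Fahrenheit, F = 32 + 1.8C.
print(linear_transformation(mean_x=20, sd_x=5, a=32, b=1.8))

# Hypothetical example 2: the difference X - Y of two independent variables.
print(linear_combination(mean_x=10, sd_x=3, mean_y=4, sd_y=4, a=1, b=-1))
```

Note that the standard deviation of X - Y is larger than either individual standard deviation, because the variances add even when the variables are subtracted.
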
| Term | Definition |
|---|---|
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| statistic | A numerical summary calculated from sample data, such as a sample mean, median, or standard deviation. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
| Term | Definition |
|---|---|
| area | The region under the normal distribution curve, representing the probability or proportion of values within a specified interval. |
| bell-shaped | The characteristic shape of a normal distribution, with a peak at the center and tails that extend symmetrically on both sides. |
| boundaries | The endpoints of an interval that define where a specified area or probability begins and ends in a normal distribution. |
| continuous random variable | A variable that can take on any value within a specified domain, with every interval having an associated probability. |
| inequalities | Mathematical expressions using symbols such as <, >, ≤, or ≥ to describe the relationship between a variable and the boundaries of an interval. |
| interval | A range of values between two boundaries, used to represent a set of outcomes in a normal distribution. |
| normal curve | The bell-shaped graph of a normal distribution that is symmetric and mound-shaped. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| probability approximation | Using a known distribution (such as the normal distribution) to estimate probabilities for an unknown or complex distribution. |
| standard normal table | A reference table that provides the cumulative probabilities (areas under the curve) for the standard normal distribution. |
| symmetrical | A property of a distribution where the left and right sides are mirror images of each other around the center. |
| z-score | A standardized score calculated as (xᵢ - μ)/σ that measures how many standard deviations a data value is from the mean. |
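
A z-score and the corresponding area under the normal curve can be computed without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a standard normal table, and assumes a hypothetical normal distribution with μ = 100 and σ = 15.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def area_between(lower: float, upper: float, mu: float, sigma: float) -> float:
    """Area under the normal curve for the interval lower < X < upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical normal distribution with mean 100 and standard deviation 15.
mu, sigma = 100, 15

z = z_score(130, mu, sigma)
area_below = NormalDist().cdf(z)  # cumulative area, as read from a standard normal table
within_one_sd = area_between(85, 115, mu, sigma)

print(f"z = {z:.2f}")
print(f"P(X < 130) = {area_below:.4f}")
print(f"P(85 < X < 115) = {within_one_sd:.4f}")
```

The last value is close to 0.68, which matches the first part of the empirical rule.
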
| Term | Definition |
|---|---|
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| central limit theorem | A theorem stating that when the sample size is sufficiently large, the sampling distribution of the mean of a random variable will be approximately normally distributed. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| simulation | A method of modeling random events so that simulated outcomes closely match real-world outcomes, used to estimate probabilities. |
| statistic | A numerical summary calculated from sample data, such as a sample mean, median, or standard deviation. |
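
The central limit theorem can be explored by simulating a sampling distribution. This is a sketch under stated assumptions: the population is a right-skewed exponential distribution with mean 1 and standard deviation 1, the sample size is 40, and the seed is fixed for repeatability. None of these choices come from the glossary.

```python
import random
import statistics


def simulate_sample_means(sample_size: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw repeated random samples from a skewed population and record each sample mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


sample_means = simulate_sample_means(sample_size=40, num_samples=5_000)

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# The population mean is 1, and the population SD divided by sqrt(40) is about 0.158.
print(f"mean of sample means = {center:.3f}")
print(f"SD of sample means   = {spread:.3f}")
```

Even though the population is strongly skewed, a histogram of `sample_means` would look roughly mound-shaped and symmetric at this sample size.
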
| Term | Definition |
|---|---|
| biased | A property of an estimator where the average value of the estimator does not equal the population parameter being estimated. |
| estimator | A statistic used to estimate or approximate the value of a population parameter based on sample data. |
| population parameter | A numerical characteristic of an entire population, such as the mean, proportion, or standard deviation. |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| unbiased | A property of an estimator where the average value of the estimator equals the population parameter being estimated. |
| variability | The spread or dispersion of data values in a distribution. |
| Term | Definition |
|---|---|
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| independent samples | Two or more separate groups of data where the values in one group do not influence or depend on the values in another group. |
| mean of the sampling distribution | The expected value of a sample statistic; for sample proportions, μ_p̂ = p. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| sample proportion | The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂). |
| sample size condition | The requirement that np ≥ 10 and n(1-p) ≥ 10 must be satisfied for a sampling distribution of a sample proportion to be approximately normal. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling with replacement | A sampling method in which an item selected from a population can be selected again in subsequent draws. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| standard deviation of the sampling distribution | The measure of variability in a sampling distribution; for sample proportions, σ_p̂ = √(p(1-p)/n). |
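
The formulas and the sample size condition above can be combined into one short calculation. A minimal sketch, assuming a hypothetical population proportion p = 0.4 and sample size n = 100; the normal model is applied only when the condition holds.

```python
from math import sqrt
from statistics import NormalDist


def sample_proportion_distribution(p: float, n: int) -> tuple[float, float, bool]:
    """Mean, standard deviation, and normality check for the sampling distribution of p-hat."""
    mean = p
    sd = sqrt(p * (1 - p) / n)
    normal_ok = n * p >= 10 and n * (1 - p) >= 10  # sample size condition
    return mean, sd, normal_ok


# Hypothetical example: p = 0.4 and n = 100.
mean, sd, normal_ok = sample_proportion_distribution(0.4, 100)
print(f"mean = {mean}, sd = {sd:.4f}, approximately normal: {normal_ok}")

if normal_ok:
    # Probability that a sample proportion exceeds 0.45 under the normal approximation.
    prob_above = 1 - NormalDist(mean, sd).cdf(0.45)
    print(f"P(p-hat > 0.45) = {prob_above:.4f}")
```
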
| Term | Definition |
|---|---|
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| difference in proportions | The difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations. |
| difference in sample proportions | The difference between two sample proportions (p̂₁ - p̂₂) used to compare proportions from two different samples. |
| independent populations | Two populations from which samples are drawn such that the selection from one population does not affect the selection from the other. |
| mean of the sampling distribution | The expected value of a sample statistic; for sample proportions, μ_p̂ = p. |
| normality conditions | The requirements that must be met for a sampling distribution to be approximately normal, such as n₁p₁ ≥ 10, n₁(1-p₁) ≥ 10, n₂p₂ ≥ 10, and n₂(1-p₂) ≥ 10. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| sample proportion | The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂). |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling with replacement | A sampling method in which an item selected from a population can be selected again in subsequent draws. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| standard deviation of the sampling distribution | The measure of variability in a sampling distribution; for sample proportions, σ_p̂ = √(p(1-p)/n). |
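
For the difference in sample proportions from two independent populations, the standard results are a mean of p₁ - p₂ and a standard deviation of √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂). The table above gives only the single-sample formulas, so the two-sample versions here are an added assumption, and the numbers are hypothetical.

```python
from math import sqrt


def difference_in_proportions_distribution(
    p1: float, n1: int, p2: float, n2: int
) -> tuple[float, float, bool]:
    """Mean, standard deviation, and normality check for p-hat1 minus p-hat2."""
    mean = p1 - p2
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Normality conditions: all four expected counts are at least 10.
    counts = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    normal_ok = all(count >= 10 for count in counts)
    return mean, sd, normal_ok


# Hypothetical example: p1 = 0.5 with n1 = 100, and p2 = 0.4 with n2 = 150.
diff_mean, diff_sd, diff_normal_ok = difference_in_proportions_distribution(0.5, 100, 0.4, 150)
print(f"mean = {diff_mean:.2f}, sd = {diff_sd:.4f}, approximately normal: {diff_normal_ok}")
```
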
| Term | Definition |
|---|---|
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| population distribution | The distribution of all values of a variable across the entire population. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| population size | The total number of individuals or items in an entire population. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| random sampling with replacement | A sampling method where each selected item is returned to the population before the next selection, allowing the same item to be selected multiple times. |
| random sampling without replacement | A sampling method where each selected item is not returned to the population, so each item can only be selected once. |
| sample mean | The average of all values in a sample, denoted as x̄, used as an estimate of the population mean. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
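
For the sampling distribution of a sample mean, the standard results are μ_x̄ = μ and σ_x̄ = σ/√n. These formulas are not written out in the table above, so they are an added assumption. The sketch below also assumes the population is normal or the sample is large enough for a normal model, and uses hypothetical values μ = 50, σ = 10, n = 25.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_distribution(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Mean and standard deviation of the sampling distribution of x-bar."""
    return mu, sigma / sqrt(n)


# Hypothetical population: mean 50, standard deviation 10; samples of size 25.
xbar_mean, xbar_sd = sample_mean_distribution(mu=50, sigma=10, n=25)

# Probability that a sample mean exceeds 53 under the normal model.
prob_xbar_above_53 = 1 - NormalDist(xbar_mean, xbar_sd).cdf(53)

print(f"mean = {xbar_mean}, sd = {xbar_sd}")
print(f"P(x-bar > 53) = {prob_xbar_above_53:.4f}")
```
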
| Term | Definition |
|---|---|
| difference in sample means | The result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂. |
| independent populations | Two populations from which samples are drawn such that the selection from one population does not affect the selection from the other. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| population distribution | The distribution of all values of a variable across the entire population. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| population standard deviation | A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution. |
| probability | The likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1. |
| sample mean | The average of all values in a sample, denoted as x̄, used as an estimate of the population mean. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling with replacement | A sampling method in which an item selected from a population can be selected again in subsequent draws. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
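
For the difference in sample means from two independent populations, the standard results are a mean of μ₁ - μ₂ and a standard deviation of √(σ₁²/n₁ + σ₂²/n₂). The table above does not state these formulas, so they are an added assumption, and the values below are hypothetical.

```python
from math import sqrt


def difference_in_means_distribution(
    mu1: float, sigma1: float, n1: int, mu2: float, sigma2: float, n2: int
) -> tuple[float, float]:
    """Mean and standard deviation of x-bar1 minus x-bar2 for independent samples."""
    mean = mu1 - mu2
    sd = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # variances of the two sample means add
    return mean, sd


# Hypothetical populations: (mean 70, SD 6, n = 36) and (mean 65, SD 8, n = 64).
means_diff, means_diff_sd = difference_in_means_distribution(70, 6, 36, 65, 8, 64)
print(f"mean = {means_diff}, sd = {means_diff_sd:.4f}")
```
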
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| difference of two population proportions | The comparison between two population proportions, expressed as p₁ - p₂, to determine if they differ significantly. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| one-sided alternative hypothesis | An alternative hypothesis that specifies the direction of the difference, either p₁ < p₂ or p₁ > p₂. |
| pooled proportion | A combined estimate of the population proportion calculated from both samples when assuming the null hypothesis is true: p̂_c = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂). |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
| two-sample z-test | A hypothesis test about the difference between two population proportions that uses the standard normal distribution. |
| two-sided alternative hypothesis | An alternative hypothesis that specifies the difference could be in either direction, stated as p₁ ≠ p₂. |
| Term | Definition |
|---|---|
| distribution | The pattern of how data values are spread or arranged across a range. |
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
| Term | Definition |
|---|---|
| difference in sample proportions | The difference between two sample proportions (p̂₁ - p̂₂) used to compare proportions from two different samples. |
| difference of two population proportions | The comparison between two population proportions, expressed as p₁ - p₂, to determine if they differ significantly. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| pooled proportion | A combined estimate of the population proportion calculated from both samples when assuming the null hypothesis is true: p̂_c = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂). |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
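
The terms above fit together in a two-sample z-test for a difference in proportions. The following is a minimal sketch, not a definitive implementation. It assumes a two-sided alternative, a significance level of 0.05, and the standard test statistic z = (p̂₁ - p̂₂)/√(p̂_c(1-p̂_c)(1/n₁ + 1/n₂)), which is not written out in the table. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(
    x1: int, n1: int, x2: int, n2: int, alpha: float = 0.05
) -> tuple[float, float, bool]:
    """Two-sided z-test of the null hypothesis p1 = p2 using the pooled proportion."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    pooled = (n1 * p_hat1 + n2 * p_hat2) / (n1 + n2)           # pooled proportion
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / standard_error                      # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))                # both tails
    reject_null = p_value <= alpha
    return z, p_value, reject_null


# Hypothetical data: 60 successes out of 100 versus 45 successes out of 100.
z, p_value, reject_null = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

For a one-sided alternative, only one tail of the standard normal distribution would be used for the p-value.
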
| Term | Definition |
|---|---|
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| confidence level | The long-run proportion of confidence intervals, built by the same method from repeated random samples, that contain the true population parameter, typically expressed as a percentage such as 90%, 95%, or 99%. |
| critical value | A value from the standard normal distribution used to determine the margin of error for a given confidence level. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| number of failures | The count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition. |
| number of successes | The count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition. |
| one-sample z-interval for a proportion | A confidence interval procedure used to estimate a population proportion based on a single sample, using the standard normal (z) distribution. |
| population parameter | A numerical characteristic of an entire population, such as the mean, proportion, or standard deviation. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample proportion | The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂). |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| standard normal distribution | A normal distribution with mean 0 and standard deviation 1, used to determine critical values for confidence intervals. |
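
The terms above fit together in a single calculation: the sample proportion gives the point estimate, the standard error measures its variability, and the critical value scales the standard error into a margin of error. The sketch below is one way this might look using only Python's standard library. The function name `one_sample_z_interval`, the `min_count=10` threshold, and the example counts are illustrative assumptions, not part of the glossary.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_interval(successes, n, confidence=0.95, min_count=10):
    """Return (lower, upper) for a one-sample z-interval for a proportion."""
    failures = n - successes
    # Normality check: observed successes (n * p_hat) and failures (n * (1 - p_hat)).
    # The threshold of 10 is an assumption; some courses use 5.
    if successes < min_count or failures < min_count:
        raise ValueError("success-failure condition not met")
    p_hat = successes / n
    # Critical value z* from the standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


# Hypothetical data: 120 successes in a random sample of 200.
lower, upper = one_sample_z_interval(successes=120, n=200)
print(f"95% interval for p: ({lower:.4f}, {upper:.4f})")
```

The function does not check independence (random sampling and the 10% condition); those depend on how the data were collected and have to be verified separately.
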
| Term | Definition |
|---|---|
| claim | A statement or assertion about a population parameter that can be evaluated using statistical evidence. |
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| confidence level | The long-run proportion of intervals, constructed by the same procedure from repeated random samples, that contain the true population parameter; typically expressed as a percentage such as 90%, 95%, or 99%. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| one-sample proportion | A confidence interval or hypothesis test that estimates or tests a single population proportion based on data from one sample. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| width of a confidence interval | The range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval. |
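
Because the width of an interval is twice the margin of error, it shrinks in proportion to the square root of the sample size and grows with the confidence level. The short sketch below illustrates this for a proportion; the helper name `interval_width` and the example values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(p_hat, n, confidence=0.95):
    """Width (upper bound minus lower bound) of a z-interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * sqrt(p_hat * (1 - p_hat) / n)


# Quadrupling the sample size should halve the width.
width_small = interval_width(p_hat=0.5, n=100)
width_large = interval_width(p_hat=0.5, n=400)
ratio = width_small / width_large
print(f"n=100: {width_small:.4f}, n=400: {width_large:.4f}, ratio: {ratio:.2f}")
```
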
| Term | Definition |
|---|---|
| 10% condition | The requirement that sample size n is at most 10% of the population size N to ensure independence when sampling without replacement. |
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| number of failures | The count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition. |
| number of successes | The count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition. |
| one-sample z-test for a population proportion | A hypothesis test used to determine whether a sample proportion provides evidence that a population proportion differs from a hypothesized value. |
| one-sided alternative hypothesis | An alternative hypothesis that specifies the direction of the difference from the hypothesized value p₀, either p < p₀ or p > p₀. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample proportion | The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂). |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
| two-sided alternative hypothesis | An alternative hypothesis that allows the difference from the hypothesized value p₀ to be in either direction, stated as p ≠ p₀. |

| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| one-sample proportion | A confidence interval or hypothesis test that estimates or tests a single population proportion based on data from one sample. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions. |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| theoretical distribution | A probability distribution based on a mathematical model, such as the normal distribution, used to approximate the distribution of a test statistic. |
| z-statistic | A standardized test statistic for a population proportion calculated as (sample statistic - null value) divided by the standard deviation of the statistic. |
| z-test | A hypothesis test that uses the standard normal distribution to determine whether a sample statistic differs significantly from a population parameter. |
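
The z-statistic and p-value definitions above can be traced through a small calculation. The sketch below assumes the null distribution is approximately standard normal and uses the null value p₀ in the standard deviation of the statistic; the function name, the `alternative` labels, and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test for a proportion."""
    p_hat = successes / n
    # Standard deviation of the statistic, computed assuming H0: p = p0 is true.
    sd_under_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_under_null
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        # Two-sided: area in both tails beyond |z|.
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_sample_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

The p-value is then compared with the chosen significance level α to decide whether the evidence against the null hypothesis is sufficient.
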
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| statistical evidence | Information from sample data that supports or fails to support a hypothesis about a population parameter. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |

| Term | Definition |
|---|---|
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| power of a test | The probability that a statistical test will correctly reject a false null hypothesis. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| Type I error | An error that occurs when a null hypothesis is rejected when it is actually true; the probability of committing this error is equal to the significance level (α). |
| Type II error | An error that occurs when a null hypothesis is not rejected when it is actually false. |
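
The relationship between the significance level, Type II error, and power can be made concrete with a normal-approximation calculation. The sketch below handles only the one-sided case H₀: p = p₀ versus Hₐ: p > p₀; the function name `power_one_sided` and the example values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power for H0: p = p0 versus Ha: p > p0 (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Standard error of p_hat when the true proportion is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)


power_100 = power_one_sided(p0=0.5, p_true=0.6, n=100)
power_400 = power_one_sided(p0=0.5, p_true=0.6, n=400)
type_ii_100 = 1 - power_100  # probability of a Type II error at n = 100
print(f"power (n=100): {power_100:.3f}, power (n=400): {power_400:.3f}")
```

When `p_true` equals `p0`, the function returns α, which matches the definition of the Type I error probability; increasing the sample size shrinks the standard error and raises the power.
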
| Term | Definition |
|---|---|
| 10% condition | The requirement that sample size n is at most 10% of the population size N to ensure independence when sampling without replacement. |
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| difference in proportions | The difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations. |
| difference of two population proportions | The comparison between two population proportions, expressed as p₁ - p₂, to determine if they differ significantly. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample proportion | The proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂). |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| success-failure condition | A requirement that the numbers of successes and failures in each sample (np̂ and n(1-p̂)) each meet a minimum threshold, typically 5 or 10, so that the sampling distribution is approximately normal. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| two-sample z-interval | A confidence interval procedure that uses the standard normal distribution to estimate the difference between two population proportions based on sample data. |
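
For two independent samples, the standard error of the difference in sample proportions combines the variability of both samples. The sketch below shows one way a two-sample z-interval might be computed; the function name and the counts in the example are hypothetical, and the independence and success-failure conditions are assumed to have been checked beforehand.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for a z-interval estimating p1 - p2."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Variances of the two sample proportions add because the samples are independent.
    standard_error = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    difference = p_hat1 - p_hat2
    margin_of_error = z_star * standard_error
    return difference - margin_of_error, difference + margin_of_error


# Hypothetical data: 60 of 100 in sample 1 and 45 of 100 in sample 2.
lower, upper = two_sample_z_interval(x1=60, n1=100, x2=45, n2=100)
print(f"95% interval for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

An interval that lies entirely above or below 0 suggests the two population proportions differ; an interval containing 0 does not rule out equal proportions.
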
| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| difference in proportions | The difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations. |
| population proportion | The true proportion or percentage of a characteristic in an entire population, typically denoted as p. |
| random sampling | A method of selecting samples from a population where each member has an equal chance of being chosen, ensuring the sample is representative of the population. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |

| Term | Definition |
|---|---|
| probabilities of errors | The likelihood or chance that errors will occur in statistical inference. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |

| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| confidence interval procedure | A statistical method used to construct an interval estimate for a population parameter based on sample data. |
| critical value | A value from the t-distribution with the appropriate degrees of freedom, used to determine the margin of error for a given confidence level. |
| degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. |
| density curve | A graphical representation of a probability distribution showing the relative likelihood of different values. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| matched pairs | Paired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values. |
| mean difference | The average of the differences between paired observations, denoted by μd, where the order of subtraction must be clearly defined. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| one-sample t-interval | A confidence interval for a population mean constructed using the t-distribution when the population standard deviation is unknown. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| population standard deviation | A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample mean | The average of all values in a sample, denoted as x̄, used as an estimate of the population mean. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √(1/(n-1) ∑(xᵢ-x̄)²). |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| skewness | A measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution. |
| tails | The extreme regions at both ends of a density curve; the t-distribution places more area in its tails than the normal distribution does. |
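
A one-sample t-interval follows the same pattern as a z-interval, with the sample standard deviation in the standard error and a critical value from the t-distribution with n − 1 degrees of freedom. Python's standard library has no t-distribution, so the sketch below takes the critical value `t_star` as an argument, as if read from a t-table. The function name and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_interval(data, t_star):
    """Return (lower, upper) for a one-sample t-interval for a mean.

    t_star is the t critical value for len(data) - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    standard_error = s / sqrt(n)
    margin_of_error = t_star * standard_error
    return x_bar - margin_of_error, x_bar + margin_of_error


# Hypothetical sample of 10 measurements; 2.262 is the 95% critical value for 9 df.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
lower, upper = one_sample_t_interval(sample, t_star=2.262)
print(f"95% interval for the mean: ({lower:.3f}, {upper:.3f})")
```

For matched pairs, the same function can be applied to the list of paired differences to estimate the mean difference.
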
| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| confidence level | The long-run proportion of intervals, constructed by the same procedure from repeated random samples, that contain the true population parameter; typically expressed as a percentage such as 90%, 95%, or 99%. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| matched pairs | Paired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values. |
| population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| width of a confidence interval | The range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval. |

| Term | Definition |
|---|---|
| 10% condition | The requirement that sample size n is at most 10% of the population size N to ensure independence when sampling without replacement. |
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| conditions for the test | The requirements that must be satisfied before conducting a hypothesis test for a population mean, including independence and normality of the sampling distribution. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| matched pairs | Paired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values. |
| mean difference | The average of the differences between paired observations, denoted by μd, where the order of subtraction must be clearly defined. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| one-sample t-test | A hypothesis test used to determine whether a population mean differs from a hypothesized value when the population standard deviation is unknown. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| skewness | A measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center. |

| Term | Definition |
|---|---|
| degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. |
| matched pairs | Paired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| sample mean | The average of all values in a sample, denoted as x̄, used as an estimate of the population mean. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
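
To see how the test statistic, degrees of freedom, and p-value connect, the sketch below computes a one-sample t-statistic and approximates its two-sided p-value by numerically integrating the t density with Simpson's rule. This is a standard-library-only illustration under stated assumptions; in practice a statistics package or a t-table would normally supply the p-value. All function names and the data are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| <= T <= |t|)
    return max(0.0, 1 - central_area)


def one_sample_t_test(data, mu0):
    """Return (t, df, two-sided p-value) for H0: mu = mu0."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)
    t = (mean(data) - mu0) / standard_error
    df = n - 1
    return t, df, two_sided_p_value(t, df)


# Hypothetical sample, testing H0: mu = 12.5 against a two-sided alternative.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
t, df, p_value = one_sample_t_test(sample, mu0=12.5)
print(f"t = {t:.3f}, df = {df}, p-value = {p_value:.4f}")
```

The null hypothesis is rejected when the p-value is less than or equal to the significance level α. Note that `gamma` overflows for very large degrees of freedom, so this sketch suits small and moderate samples only.
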
| Term | Definition |
|---|---|
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| confidence interval procedure | A statistical method used to construct an interval estimate for a population parameter based on sample data. |
| critical value | A value from the t-distribution with the appropriate degrees of freedom, used to determine the margin of error for a given confidence level. |
| degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. |
| difference of population means | The difference between the mean values of two distinct populations, calculated as μ₁ - μ₂. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| independent samples | Two or more separate groups of data where the values in one group do not influence or depend on the values in another group. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| population standard deviations | The measure of spread in each of two populations; when unknown, sample standard deviations are used as estimates. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample mean | The average of all values in a sample, denoted as x̄, used as an estimate of the population mean. |
| sample standard deviations | The measures of variability within each of the two samples, denoted as s₁ and s₂. |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| skewed distributions | Distributions that are not symmetric, with data concentrated on one side and a tail extending to the other side. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution. |
| two-sample t-interval | A confidence interval procedure used to estimate the difference between two population means using sample data from two independent samples. |
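
For a two-sample t-interval, the standard error combines both sample standard deviations, and the degrees of freedom must be approximated. The sketch below uses the Welch-Satterthwaite approximation for the degrees of freedom, which is an assumption on my part rather than something the glossary specifies (a conservative alternative is the smaller of n₁ − 1 and n₂ − 1). The critical value `t_star` is passed in as if read from a t-table, and the summary statistics are made up.

```python
from math import sqrt


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def two_sample_t_interval(x_bar1, s1, n1, x_bar2, s2, n2, t_star):
    """Return (lower, upper) for a t-interval estimating mu1 - mu2."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    difference = x_bar1 - x_bar2
    margin_of_error = t_star * standard_error
    return difference - margin_of_error, difference + margin_of_error


# Hypothetical summaries: sample 1 (mean 20, s = 3, n = 9), sample 2 (mean 17, s = 4, n = 16).
df = welch_df(s1=3, n1=9, s2=4, n2=16)
# 2.086 is the 95% critical value for 20 df (the Welch value rounded down).
lower, upper = two_sample_t_interval(20, 3, 9, 17, 4, 16, t_star=2.086)
print(f"df = {df:.2f}, 95% interval for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```
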
| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| difference in sample means | The result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂. |
| difference of population means | The difference between the mean values of two distinct populations, calculated as μ₁ - μ₂. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| random sampling | A method of selecting samples from a population where each member has an equal chance of being chosen, ensuring the sample is representative of the population. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| width of a confidence interval | The range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval. |

| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| approximately normal | A distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods. |
| difference of population means | The difference between the mean values of two distinct populations, calculated as μ₁ - μ₂. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| skewness | A measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center. |
| two-sample t-test | A statistical test used to determine whether there is a significant difference between the means of two populations, based on independent samples from each. |

| Term | Definition |
|---|---|
| degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. |
| difference in sample means | The result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂. |
| difference of population means | The difference between the mean values of two distinct populations, calculated as μ₁ - μ₂. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| statistical reasoning | The logical process of using sample data and significance test results to draw conclusions about populations and answer research questions. |
| t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| two-sample test | A significance test used to compare the means of two different populations based on sample data from each population. |
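
The terms above (difference in sample means, standard error, test statistic) fit together in a single calculation. The sketch below is illustrative only: it assumes the common unpooled form of the standard error, √(s₁²/n₁ + s₂²/n₂), and a null hypothesis of μ₁ - μ₂ = 0. The function name and the sample values are made up for the example.

```python
import math
import statistics

def two_sample_t_statistic(sample1, sample2):
    """Return (difference in sample means, standard error, t statistic).

    Assumes the null hypothesis mu1 - mu2 = 0 and the unpooled standard error.
    """
    diff = statistics.mean(sample1) - statistics.mean(sample2)  # x̄1 - x̄2
    # statistics.variance is the sample variance (divides by n - 1)
    se = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    return diff, se, diff / se

# Hypothetical data, chosen only to make the arithmetic easy to follow
diff, se, t_stat = two_sample_t_statistic([5, 7, 9], [2, 4, 6])
print(diff, round(se, 4), round(t_stat, 4))
```

The p-value would then come from a t-distribution with the appropriate degrees of freedom, which this sketch does not compute.
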
| Term | Definition |
|---|---|
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| chi-square distributions | Probability distributions used to test the goodness of fit between observed and expected categorical data, characterized by positive values and right skewness. |
| chi-square statistic | A test statistic that measures the distance between observed and expected counts relative to the expected counts. |
| chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
| degrees of freedom | A parameter that determines the shape of a chi-square distribution; for a goodness-of-fit test it equals the number of categories minus 1. |
| distribution of proportions | The way in which proportions are spread across the categories of a categorical variable. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| goodness of fit | A statistical test that determines how well observed data match the expected distribution specified by a hypothesis. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| null proportion | The hypothesized proportion for each category under the null hypothesis in a chi-square goodness of fit test. |
| observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
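
As a rough illustration of how the observed counts, null proportions, expected counts, and chi-square statistic defined above relate, the sketch below computes a goodness-of-fit statistic as the sum of (observed - expected)² / expected, with each expected count taken as the sample size times the null proportion. The function name and the counts are hypothetical.

```python
def chi_square_gof(observed, null_proportions):
    """Return (chi-square statistic, expected counts, degrees of freedom)."""
    n = sum(observed)  # sample size
    expected = [n * p for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return statistic, expected, df

# Hypothetical counts for three categories and the proportions claimed by H0
gof_stat, gof_expected, gof_df = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
print(gof_stat, gof_expected, gof_df)
```

A large statistic relative to the chi-square distribution with the stated degrees of freedom is evidence against the null proportions.
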
| Term | Definition |
|---|---|
| chi-square distribution | A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics. |
| chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
| degrees of freedom | A parameter that determines the shape of a chi-square distribution and therefore the p-value associated with a given test statistic. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| theoretical distribution | A probability distribution based on a mathematical model, such as the normal distribution, used to approximate the distribution of a test statistic. |
| Term | Definition |
|---|---|
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
| chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments. |
| chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| homogeneity | In a chi-square test, the condition where the distribution of a categorical variable is the same across different groups or populations. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| row and column variables | The two categorical variables displayed in a two-way table, with one variable defining the rows and the other defining the columns. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
| stratified random sample | A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
| Term | Definition |
|---|---|
| chi-square distribution | A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics. |
| chi-square statistic | A test statistic that measures the distance between observed and expected counts relative to the expected counts. |
| chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments. |
| chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent. |
| degrees of freedom | A parameter that determines the shape of a chi-square distribution; for a two-way table it equals (number of rows - 1) × (number of columns - 1). |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| research question | The specific question about a population or populations that a statistical test is designed to answer. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
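
For a two-way table, the expected count in each cell is conventionally computed as (row total × column total) / table total, and the chi-square statistic sums (observed - expected)² / expected over all cells. The sketch below shows that arithmetic under those assumptions; the function names and the 2×2 table are invented for illustration.

```python
def expected_counts(table):
    """Expected counts under independence or homogeneity for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Return (chi-square statistic, degrees of freedom)."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
    return statistic, df

# Hypothetical observed counts: rows are groups, columns are categories
two_way = [[20, 30], [30, 20]]
tw_expected = expected_counts(two_way)
tw_stat, tw_df = chi_square_two_way(two_way)
print(tw_expected, tw_stat, tw_df)
```

The same calculation serves both the test for independence and the test for homogeneity; the two differ in how the data were collected and how the hypotheses are worded.
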
| Term | Definition |
|---|---|
| non-random variation | Variation in data points that follows a systematic or predictable pattern rather than occurring by chance. |
| scatter plots | Graphs that display the relationship between two quantitative variables, with each point representing an observation. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| critical value | A value from a reference distribution (the t-distribution when estimating the slope of a regression model) used to determine the margin of error for a given confidence level. |
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| linearity | The condition that the true relationship between two variables follows a straight line. |
| margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error. |
| normality | The condition that data follows an approximately normal (bell-shaped) distribution. |
| population regression line | The true linear relationship μy = α + βx between the response and explanatory variables in the entire population. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| sample regression line | The line ŷ = a + bx calculated from sample data that estimates the population regression line. |
| sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| skewed | A distribution that is not symmetric, with one tail longer or more pronounced than the other. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| standard deviation of residuals | A measure of the spread of residuals around the regression line, estimated by s = √(Σ(yi - ŷi)²/(n-2)). |
| standard deviation of x values | A measure of the spread of the x-variable values in the sample, denoted as sx in the standard error formula. |
| standard error of the slope | A measure of the variability of the slope estimate across different samples, calculated as s/(sx·√(n-1)). |
| t-interval | A confidence interval procedure that uses the t-distribution, appropriate for estimating the slope of a regression model. |
| t* | The critical value from the t-distribution used to construct a confidence interval for the slope of a regression model. |
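
The formulas in the table above can be chained into a t-interval for the slope: fit the least-squares line, compute the residual standard deviation s = √(Σ(yi - ŷi)²/(n-2)), divide by sx·√(n-1) to get the standard error, and take b ± t* × SE. The following is a minimal sketch under those definitions. The function name and data are hypothetical, and t* is supplied by the caller (here 3.182, the tabled value for 95% confidence with n - 2 = 3 degrees of freedom).

```python
import math

def slope_interval(x, y, t_star):
    """Return (slope b, standard error of b, confidence interval for the slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # least-squares slope
    a = y_bar - b * x_bar      # least-squares intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # SD of residuals
    s_x = math.sqrt(sxx / (n - 1))                           # SD of x values
    se_b = s / (s_x * math.sqrt(n - 1))                      # SE of the slope
    margin = t_star * se_b                                   # margin of error
    return b, se_b, (b - margin, b + margin)

# Hypothetical data with n = 5, so the interval uses n - 2 = 3 degrees of freedom
b, se_b, interval = slope_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(round(b, 4), round(se_b, 4), interval)
```

If the resulting interval contains 0, the data are consistent with no linear relationship at that confidence level.
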
| Term | Definition |
|---|---|
| confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence. |
| population regression model | The true regression model for an entire population, as opposed to a sample-based regression model. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| repeated random sampling | The process of taking multiple random samples from a population, each of the same size, to understand the variability of sample statistics. |
| sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| width of a confidence interval | The range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval. |
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| linear relationship | A relationship between two variables that can be described by a straight line. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| skewness | A measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| t-test for a slope | A hypothesis test used to determine whether the slope of a regression model is significantly different from zero, assessing whether there is a statistically significant linear relationship between variables. |
| Term | Definition |
|---|---|
| degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. |
| null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| population regression line | The true linear relationship μy = α + βx between the response and explanatory variables in the entire population. |
| regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x). |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation. |
| standard error | The standard deviation of a sampling distribution, which measures the variability of a sample statistic across repeated samples. |
| t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |