📊AP Statistics
Key Terms

830 essential vocabulary terms and definitions to know for your AP Statistics exam

Key Terms by Unit

👆Unit 1 – Exploring One-Variable Data

1.10 The Normal Distribution

Term | Definition
approximately normally distributed | A description of data sets that closely follow the pattern of a normal distribution with a mound-shaped, symmetric curve.
empirical rule | A rule stating that for a normal distribution, approximately 68% of observations fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
mean | The average value of a dataset, represented by μ in the context of a population.
normal curve | The bell-shaped graph of a normal distribution that is symmetric and mound-shaped.
normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
normally distributed random variable | A random variable that follows a normal distribution, allowing for the calculation of probabilities for specific intervals.
parameter | A numerical summary that describes a characteristic of an entire population.
percentile | A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution.
population mean | The average of all values in an entire population, denoted as μ.
population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population standard deviation | A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution.
proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage.
relative position | The location of a data point within a data set, often expressed in comparison to other values or as a measure of how it ranks relative to the distribution.
standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population.
standard normal table | A reference table that provides the cumulative probabilities (areas under the curve) for the standard normal distribution.
z-score | A standardized score, z = (xᵢ - μ)/σ, that measures how many standard deviations a data value is from the mean.
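
As a quick numerical check of the z-score, empirical rule, and percentile entries above, here is a minimal Python sketch (it assumes Python 3.8+ for `statistics.NormalDist`). The population mean of 100 and standard deviation of 15 are made-up values for illustration, and `z_score` is a hypothetical helper name. The exact normal areas it prints (about 0.6827, 0.9545, 0.9973) are what the empirical rule rounds to 68%, 95%, and 99.7%.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean: (x - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical population parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
z = z_score(130.0, mu, sigma)  # 130 is 2 standard deviations above the mean

# Area within k standard deviations of the mean under the standard normal curve.
std_normal = NormalDist(0, 1)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

# Percentile of x = 130: the percent of the distribution at or below it.
percentile = NormalDist(mu, sigma).cdf(130.0) * 100

print(f"z = {z:.2f}")
for k, area in within.items():
    print(f"within {k} SD of the mean: {area:.4f}")
print(f"percentile of 130: {percentile:.1f}")
```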

1.2 The Language of Variation

Term | Definition
categorical variable | A variable that takes on values that are category names or group labels rather than numerical values.
quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
variable | A characteristic that changes from one individual to another in a set of data.

1.3 Representing a Categorical Variable with Tables

Term | Definition
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
frequency table | A table that displays the number of cases or observations falling into each category.
percentage | A proportion expressed as a number out of 100, calculated by multiplying the relative frequency by 100.
proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage.
rate | A ratio that compares two quantities with different units, often used to express frequency or occurrence per unit.
relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
relative frequency table | A table that displays the proportion or percentage of cases falling into each category.
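
A minimal Python sketch of turning raw categorical data into a frequency table, a relative frequency table, and percentages. The fruit responses are hypothetical; `collections.Counter` does the counting.

```python
from collections import Counter

# Hypothetical categorical data: favorite fruit of 10 students.
responses = ["apple", "banana", "apple", "cherry", "banana",
             "apple", "banana", "apple", "cherry", "apple"]

frequency = Counter(responses)                                    # count per category
n = len(responses)
relative = {cat: count / n for cat, count in frequency.items()}   # proportion of the total
percent = {cat: 100 * p for cat, p in relative.items()}           # relative frequency x 100

print(f"{'category':<10}{'count':>6}{'rel. freq':>11}{'percent':>9}")
for cat in sorted(frequency):
    print(f"{cat:<10}{frequency[cat]:>6}{relative[cat]:>11.2f}{percent[cat]:>8.1f}%")
```

The relative frequencies always sum to 1 (up to rounding), which is a handy check on a hand-built table.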

1.4 Representing a Categorical Variable with Graphs

Term | Definition
bar chart | A graph that displays frequencies or relative frequencies for categorical data using rectangular bars, where the height or length represents the count or proportion in each category.
bar graph | A graphical representation using rectangular bars to display the frequency or count of categories in a categorical variable.
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
frequencies | The count or number of observations falling within each category of categorical data.
frequency table | A table that displays the number of cases or observations falling into each category.
graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format.
relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.

1.5 Representing a Quantitative Variable with Graphs

Term | Definition
continuous variable | A variable that can take on uncountably many values, with infinitely many possible values between any two given values.
cumulative graph | A graph that represents the number or proportion of a data set that is less than or equal to a given number.
discrete variable | A variable that can take on a countable number of values, which may be finite or countably infinite.
dotplot | A graph that represents each observation as a dot placed along a horizontal axis according to its data value; identical or nearly identical values are stacked vertically.
histogram | A graph in which the height of each bar represents the number or proportion of observations within an interval; changing the interval widths can change the graph's appearance.
interval | A range of values between two boundaries, used, for example, to group observations into the bars of a histogram.
leaf | In a stem-and-leaf plot, usually the last digit of a data value.
quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
stem | In a stem-and-leaf plot, the first digit or digits of a data value.
stem-and-leaf plot | A graphical representation where each data value is split into a stem (first digit or digits) and a leaf (usually the last digit).
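
The stem-and-leaf split described above can be sketched in a few lines of Python (3.9+ for the built-in generic type hints). This assumes nonnegative whole numbers with the leaf as the last digit; `stem_and_leaf` is a hypothetical helper and the scores are invented.

```python
from collections import defaultdict


def stem_and_leaf(data: list[int]) -> dict[int, list[int]]:
    """Split each nonnegative integer into a stem (all but the last digit) and a leaf (last digit)."""
    plot: dict[int, list[int]] = defaultdict(list)
    for value in sorted(data):           # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)   # e.g. 72 -> stem 7, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test scores.
scores = [72, 85, 91, 68, 77, 85, 90, 73, 88, 64]
plot = stem_and_leaf(scores)

# Print every stem in range so that gaps in the data show up as empty rows.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```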

1.6 Describing the Distribution of a Quantitative Variable

Term | Definition
bimodal | A distribution with two prominent peaks.
center | A measure indicating the middle or typical value of a distribution.
cluster | A concentration of data values, usually separated from other values by gaps in a distribution.
descriptive statistics | Methods used to summarize and describe the characteristics of a data set without making inferences about a larger population.
distribution | The pattern of how data values are spread or arranged across a range.
gap | A region of a distribution between two data values where there are no observed data.
outlier | A data point that is unusually small or large relative to the rest of the data.
quantitative data | Data that consists of numerical values that can be measured and analyzed mathematically.
shape | The overall form or pattern of a distribution, including characteristics like skewness and modality.
skewed left | A distribution with a longer tail extending to the left, where the mean is typically less than the median.
skewed right | A distribution with a longer tail extending to the right, where the mean is typically greater than the median.
symmetric | A distribution where the left half is the mirror image of the right half.
uniform | A distribution in which all bars have approximately the same height, with no prominent peaks.
unimodal | A distribution with one main peak.
variability | The spread or dispersion of data values in a distribution.
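
To illustrate the skewness entries, here is a small Python sketch with hypothetical right-skewed data: the long right tail pulls the mean above the median. Mean versus median is only a tendency, not a test of shape, so a graph of the data should still be examined.

```python
from statistics import mean, median

# Hypothetical right-skewed data (annual incomes in thousands of dollars):
# most values are clustered low, with a long tail of a few large values.
incomes = [28, 31, 33, 35, 36, 38, 40, 45, 90, 150]

center_mean = mean(incomes)      # pulled toward the long right tail
center_median = median(incomes)  # middle of the ordered data

print(f"mean = {center_mean:.1f}, median = {center_median:.1f}")
if center_mean > center_median:
    print("mean > median: consistent with a right-skewed distribution")
```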

1.7 Summary Statistics for a Quantitative Variable

Term | Definition
first quartile | The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls.
interquartile range | A measure of variability calculated as the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of data.
mean | The average value of a dataset, represented by μ in the context of a population.
measures of center | Numerical summaries that describe the central tendency of a data set, including the mean and median.
measures of position | Numerical summaries that describe the location of data values within a distribution, including quartiles and percentiles.
measures of variability | Statistical measures that describe how spread out or dispersed data values are in a distribution.
median | The middle value when data are ordered; for an even number of data points, typically the average of the two middle values.
nonresistant | A characteristic of a statistic that is significantly affected or influenced by outliers; also called non-robust.
outlier | A data point that is unusually small or large relative to the rest of the data.
percentile | A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution.
Q1 | The first quartile; the value below which 25% of the data falls.
Q3 | The third quartile; the value below which 75% of the data falls.
quartile | One of three values (Q1, the median, and Q3) that divide an ordered data set into four parts with roughly equal numbers of values; Q1 and Q3 form the boundaries for the middle 50% of values.
range | A measure of variability calculated as the difference between the maximum and minimum data values in a dataset.
resistant | A characteristic of a statistic that is not greatly affected by outliers; also called robust.
sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √(1/(n-1) ∑(xᵢ-x̄)²).
sample variance | The square of the sample standard deviation, denoted by s², representing variability in squared units.
standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population.
statistic | A numerical summary calculated from sample data, such as a mean, median, or standard deviation.
third quartile | The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls.
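
The quartile, IQR, range, and sample standard deviation definitions above can be sketched directly in Python. This is a minimal version under one assumption worth stating: when n is odd, the overall median is left out of both halves before taking Q1 and Q3. That is one of several quartile conventions, so software may report slightly different values. The helper names and data are hypothetical; `statistics.stdev` is used only as a cross-check on the hand-coded formula.

```python
from math import sqrt
from statistics import median, stdev


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the ordered data.

    When n is odd, the overall median is excluded from both halves (one common convention).
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)


def sample_sd(data):
    """Sample standard deviation s = sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    n = len(data)
    x_bar = sum(data) / n
    return sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical values, listed in order
q1, q3 = quartiles(data)
iqr = q3 - q1                          # spread of the middle 50%
data_range = max(data) - min(data)     # maximum minus minimum
s = sample_sd(data)
variance = s ** 2                      # sample variance, in squared units

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {data_range}")
print(f"s = {s:.3f} (statistics.stdev gives {stdev(data):.3f}), s^2 = {variance:.3f}")
```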

1.8 Graphical Representations of Summary Statistics

Term | Definition
boxplot | A graphical representation of the five-number summary showing the distribution of data through a box and whiskers.
first quartile | The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls.
five-number summary | A set of five values that describe a dataset: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
maximum | The largest value in a dataset.
mean | The average value of a dataset, represented by μ in the context of a population.
median | The middle value when data are ordered; for an even number of data points, typically the average of the two middle values.
minimum | The smallest value in a dataset.
outlier | A data point that is unusually small or large relative to the rest of the data.
quantitative data | Data that consists of numerical values that can be measured and analyzed mathematically.
quartile | One of three values (Q1, the median, and Q3) that divide an ordered data set into four parts with roughly equal numbers of values; Q1 and Q3 form the boundaries for the middle 50% of values.
skewed left | A distribution with a longer tail extending to the left, where the mean is typically less than the median.
skewed right | A distribution with a longer tail extending to the right, where the mean is typically greater than the median.
summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape.
symmetric distribution | A distribution where data is evenly distributed around the center, with the mean and median approximately equal.
third quartile | The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls.
whiskers | Lines extending from the ends of a boxplot that reach to the most extreme data points that are not outliers.
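
A Python sketch of the numbers behind a boxplot: the five-number summary, the whisker endpoints, and the points plotted separately as outliers. The glossary does not say how outliers are identified, so this sketch assumes the common 1.5 × IQR rule (a value is flagged if it lies more than 1.5 × IQR below Q1 or above Q3). Quartiles use the median-of-halves convention; the data and helper names are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max), with Q1/Q3 as medians of the lower/upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def boxplot_parts(data):
    """Return the summary, whisker endpoints, and outliers under the assumed 1.5 x IQR rule."""
    summary = five_number_summary(data)
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    whiskers = (min(inside), max(inside))  # most extreme values that are not outliers
    return summary, whiskers, outliers


data = [3, 5, 6, 7, 8, 9, 10, 11, 30]  # hypothetical data with one large value
summary, whiskers, outliers = boxplot_parts(data)
print("five-number summary:", summary)
print("whiskers reach:", whiskers)
print("outliers:", outliers)
```

Note that the maximum in the five-number summary (30) and the end of the upper whisker (11) differ here, which is exactly why a boxplot draws outliers as separate points.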

1.9 Comparing Distributions of a Quantitative Variable

Term | Definition
center | A measure indicating the middle or typical value of a distribution.
cluster | A concentration of data values, usually separated from other values by gaps in a distribution.
gap | A region of a distribution between two data values where there are no observed data.
graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format.
histogram | A graph in which the height of each bar represents the number or proportion of observations within an interval; changing the interval widths can change the graph's appearance.
independent samples | Two or more separate groups of data where the values in one group do not influence or depend on the values in another group.
mean | The average value of a dataset, represented by μ in the context of a population.
outlier | A data point that is unusually small or large relative to the rest of the data.
relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
side-by-side boxplots | A graphical representation that displays multiple boxplots arranged next to each other to compare the distributions of different groups or samples.
standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population.
summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape.
variability | The spread or dispersion of data values in a distribution.

✌️Unit 2 – Exploring Two-Variable Data

2.1 Introducing Statistics

Term | Definition
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
patterns | Recurring or observable regularities in data that may suggest a relationship between variables.
relationships | Connections or associations between two or more variables in a dataset.

2.2 Representing Two Categorical Variables

Term | Definition
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
distribution | The pattern of how data values are spread or arranged across a range.
joint relative frequency | A cell frequency in a two-way table divided by the total number of observations in the entire table, expressing the proportion of the total for a specific combination of categories.
mosaic plots | A graphical representation of two categorical variables where rectangles are sized proportionally to represent the frequency or relative frequency of each combination of categories.
segmented bar graphs | A graphical representation where bars are divided into segments, with each segment representing a category of a second categorical variable, showing the composition within each category of the first variable.
side-by-side bar graphs | A graphical representation that displays bars for one categorical variable grouped side-by-side for each category of another categorical variable, allowing for easy comparison between groups.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.

2.3 Statistics for Two Categorical Variables

Term | Definition
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
categorical variable | A variable that takes on values that are category names or group labels rather than numerical values.
conditional relative frequency | A relative frequency for a specific part of a two-way (contingency) table, such as a cell frequency divided by the total for that cell's row.
distribution | The pattern of how data values are spread or arranged across a range.
marginal relative frequencies | The row and column totals in a two-way table divided by the total for the entire table, representing the proportion of observations in each row or column.
summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
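
The joint, marginal, and conditional relative frequencies from Units 2.2 and 2.3 can be computed from a two-way table as in this minimal Python sketch. The table of counts (grade level versus preferred study method) is hypothetical and is stored as a nested dictionary for simplicity.

```python
# Hypothetical two-way table of counts: rows = grade level, columns = preferred study method.
table = {
    "junior": {"alone": 30, "group": 20},
    "senior": {"alone": 15, "group": 35},
}

grand_total = sum(sum(row.values()) for row in table.values())
row_totals = {r: sum(row.values()) for r, row in table.items()}
col_totals = {}
for row in table.values():
    for c, count in row.items():
        col_totals[c] = col_totals.get(c, 0) + count

# Joint relative frequency: cell count / grand total.
joint = {(r, c): count / grand_total for r, row in table.items() for c, count in row.items()}

# Marginal relative frequencies: row or column total / grand total.
marginal_rows = {r: t / grand_total for r, t in row_totals.items()}
marginal_cols = {c: t / grand_total for c, t in col_totals.items()}

# Conditional relative frequencies: cell count / its row total
# (the distribution of study method within each grade level).
conditional = {r: {c: count / row_totals[r] for c, count in row.items()} for r, row in table.items()}

for r, dist in conditional.items():
    print(r, {c: round(p, 2) for c, p in dist.items()})
```

In these made-up data, 60% of juniors but only 30% of seniors prefer studying alone. Because the conditional distributions differ, the data suggest an association between grade level and study method.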

2.4 Representing the Relationship Between Two Quantitative Variables

Term | Definition
bivariate quantitative data | A data set consisting of observations of two different quantitative variables measured on the same individuals in a sample or population.
cluster | A concentration of data values, usually separated from other values by gaps in a distribution.
direction | The type of association between two variables in a scatter plot, described as positive or negative.
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
form | The pattern or shape of the relationship between two variables in a scatter plot, such as linear or non-linear.
linear | A form of association in a scatter plot where the points follow a straight-line pattern.
negative association | A relationship between two variables where as values of one variable increase, values of the other variable tend to decrease.
non-linear | A form of association in a scatter plot where the points do not follow a straight-line pattern.
outlier | A data point that is unusually small or large relative to the rest of the data.
positive association | A relationship between two variables where as values of one variable increase, values of the other variable tend to increase.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
scatter plot | A graph that displays the relationship between two quantitative variables using points plotted on a coordinate plane.
strength | A measure of how closely individual points in a scatter plot follow a specific pattern, described as strong, moderate, or weak.

2.5 Correlation

Term | Definition
causation | A relationship where changes in one variable directly cause changes in another variable.
correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
linear model | A mathematical representation of the linear relationship between two variables.
linear relationship | A relationship between two variables that can be described by a straight line.
quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
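
One standard way to compute the correlation r is as the average product of z-scores, r = (1/(n - 1)) ∑ z_x z_y, using sample means and sample standard deviations. The Python sketch below implements that formula; the helper name `correlation` and the hours/score data are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample means and sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]
r = correlation(hours, scores)
print(f"r = {r:.3f}")  # strong, positive, linear association
```

A value of r near 1 says only that the linear association is strong; as the causation entry notes, it does not show that studying causes higher scores.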

2.6 Linear Regression Models

Term | Definition
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
extrapolation | Predicting a response value using a value for the explanatory variable that is beyond the range of x-values used to create the regression model, resulting in less reliable predictions.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
linear regression model | An equation that uses an explanatory variable to predict a response variable in a linear relationship.
predicted value | The estimated response value obtained from a regression model, denoted as ŷ.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
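
A minimal Python sketch of fitting ŷ = a + bx by least squares, using the standard formulas b = ∑(x - x̄)(y - ȳ) / ∑(x - x̄)² (equivalently b = r·s_y/s_x) and a = ȳ - b·x̄. The `predict` helper is hypothetical; it prints a warning when x falls outside the observed x-range, which is the extrapolation case described above. The data are invented.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x; b = Sxy / Sxx and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


def predict(a, b, x, xs):
    """Predicted value y-hat = a + b*x; warns when x lies outside the observed x-range."""
    if not min(xs) <= x <= max(xs):
        print(f"warning: x = {x} is outside [{min(xs)}, {max(xs)}]; this is extrapolation")
    return a + b * x


# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]
a, b = least_squares_line(hours, scores)
print(f"y-hat = {a} + {b}x")
print(predict(a, b, 3.5, hours))  # within the data range
print(predict(a, b, 10, hours))   # extrapolation: prints a warning first
```

Here the slope says the predicted score rises by 6.5 points per additional hour; the intercept (52.5) is the predicted score at 0 hours, a value just outside the observed range.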

2.7 Residuals

Term | Definition
actual value | The observed or measured response value in a dataset, denoted as y.
bivariate data | Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them.
form of association | The pattern or type of relationship between two variables, such as linear, curved, or no relationship.
linear model | A mathematical representation of the linear relationship between two variables.
predicted value | The estimated response value obtained from a regression model, denoted as ŷ.
randomness in residuals | The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data.
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.
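
The residual definition can be applied directly, as in this short Python sketch. It assumes the hypothetical hours/score data used in the earlier examples, whose least-squares line (worked out by hand) is ŷ = 52.5 + 6.5x. A text listing of residual versus x stands in for a residual plot.

```python
# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]

# Least-squares line for these data, computed by hand: y-hat = 52.5 + 6.5x.
a, b = 52.5, 6.5

predicted = [a + b * x for x in hours]                          # y-hat for each x
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]  # residual = y - y-hat

# Text version of a residual plot: residual versus x. Look for the absence of a pattern.
for x, e in zip(hours, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
print("sum of residuals:", sum(residuals))
```

For a least-squares line the residuals always sum to zero, so a nonzero sum is a sign of an arithmetic slip.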

2.8 Least Squares Regression

Term | Definition
coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
coefficients | The numerical values in a regression equation that represent the slope and y-intercept of the least-squares regression line.
correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
parameter | A numerical summary that describes a characteristic of an entire population.
predicted value | The estimated response value obtained from a regression model, denoted as ŷ.
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √(1/(n-1) ∑(xᵢ-x̄)²).
simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
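
The coefficient of determination can be computed as r² = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation ∑(y - ȳ)² in the response. The Python sketch below does this for the hypothetical hours/score data; `r_squared` is a hypothetical helper name. For simple linear regression the result equals the square of the correlation r.

```python
def r_squared(xs, ys):
    """Coefficient of determination: 1 - SSE/SST for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left in the residuals
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation in y
    return 1 - sse / sst


# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]
print(f"r^2 = {r_squared(hours, scores):.4f}")
```

For these made-up data, about 98% of the variation in scores is explained by the linear relationship with hours studied.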

2.9 Analyzing Departures from Linearity

Term | Definition
coefficient of determination | The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
correlation | A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
high-leverage point | A point in regression that has a substantially larger or smaller x-value than other observations in the dataset.
influential points | Points in a regression that, when removed, substantially change the relationship between variables, such as the slope, y-intercept, or correlation.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
natural logarithm | A mathematical transformation using the logarithm with base e, often applied to response or explanatory variables to linearize relationships.
outlier | A data point that is unusually small or large relative to the rest of the data.
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot | A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
transformed data set | A dataset created by applying mathematical transformations (such as logarithms or powers) to the original variables to achieve a more linear relationship.
y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.
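
A minimal Python sketch of linearizing exponential growth with a natural-log transformation. It assumes hypothetical data where y roughly doubles each time x increases by 1. Fitting a least-squares line to (x, ln y) gives ln(ŷ) = a + bx, and a prediction is back-transformed to the original units with exp. If the transformation worked, the residuals on the log scale should be small and show no pattern.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b


# Hypothetical exponential-growth data: y roughly doubles each time x increases by 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [3.0, 6.1, 11.8, 24.5, 47.9, 97.0]

log_ys = [log(y) for y in ys]  # transformed response: ln(y)
a, b = fit_line(xs, log_ys)    # model: ln(y-hat) = a + b*x
residuals = [ly - (a + b * x) for x, ly in zip(xs, log_ys)]

# Back-transform a prediction to the original units.
y_hat_at_2_5 = exp(a + b * 2.5)
print(f"ln(y-hat) = {a:.3f} + {b:.3f}x; predicted y at x = 2.5: {y_hat_at_2_5:.1f}")
print("residuals on the log scale:", [round(e, 3) for e in residuals])
```

Because y doubles per unit of x, the fitted slope on the log scale comes out close to ln 2 (about 0.693).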

🔎Unit 3 – Collecting Data

3.1 Introducing Statistics

Term | Definition
chance | Randomness or probability-based selection used in data collection to reduce bias and ensure representativeness.
data collection methods | The procedures and techniques used to gather information or data from a population or sample.

3.2 Introduction to Planning a Study

Term | Definition
causal relationships | A relationship between variables where one variable directly causes changes in another variable.
experiment | A study in which different conditions or treatments are assigned to experimental units to investigate cause-and-effect relationships.
experimental unit | The individual (person, animal, plant, or object) to which a treatment is assigned in an experiment.
generalizations | Conclusions or statements about a larger population based on data from a sample.
observational study | A study in which treatments are not imposed; investigators examine data from a sample to investigate a topic of interest about the population.
population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
prospective | An observational study approach where investigators follow a sample of individuals into the future, collecting data over time.
randomly selected | A sampling method where every member of the population has an equal chance of being chosen.
representative sample | A sample that accurately reflects the characteristics and composition of the population from which it was drawn.
retrospective | An observational study approach where investigators examine data from a sample of individuals based on past information.
sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis.
sample survey | A type of observational study that collects data from a sample to learn about the population from which the sample was taken.
treatment | A condition assigned to experimental units in an experiment.
variable | A characteristic that changes from one individual to another in a set of data.

3.3 Random Sampling and Data Collection

Term | Definition
census | A data collection method that selects all items or subjects in a population.
cluster | In cluster sampling, one of the smaller groups (often formed by location) into which a population is divided.
cluster sample | A sampling method in which a population is divided into smaller groups called clusters, and a simple random sample of clusters is selected, with data collected from all observations in the selected clusters.
population | The entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
random number generator | A tool or method used to randomly select items from a population for inclusion in a simple random sample.
sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis.
sampling method | A specific procedure or technique used to select a subset of individuals from a population for data collection and analysis.
sampling with replacement | A sampling method in which an item selected from a population can be selected again in subsequent draws.
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
strata | Separate groups within a population created by dividing it based on shared attributes or characteristics for stratified sampling.
stratified random sample | A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum.
systematic random sample | A sampling method in which sample members are selected from a population according to a random starting point and a fixed, periodic interval.
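
The sampling methods above can be simulated with Python's standard `random` module, as in this sketch. It assumes a hypothetical population of 100 student IDs split into 4 homerooms of 25; the homerooms serve as strata for the stratified sample and as clusters for the cluster sample. A seeded `random.Random` makes the run reproducible; the specific IDs drawn depend on the seed and are not meaningful.

```python
import random

rng = random.Random(2024)  # fixed seed so the example is reproducible

# Hypothetical population: student IDs 1-100 in 4 homerooms of 25 students each.
population = list(range(1, 101))
homerooms = {h: population[h * 25:(h + 1) * 25] for h in range(4)}

# Simple random sample of 8, drawn without replacement (no ID can repeat).
srs = rng.sample(population, 8)

# Sampling with replacement: the same ID may be drawn more than once.
with_replacement = rng.choices(population, k=8)

# Stratified random sample: a simple random sample of 2 from each homeroom (stratum).
stratified = [sid for room in homerooms.values() for sid in rng.sample(room, 2)]

# Cluster sample: randomly choose 1 whole homeroom (cluster) and include everyone in it.
chosen_room = rng.choice(list(homerooms))
cluster = homerooms[chosen_room]

# Systematic random sample: random start within the first interval, then every k-th ID.
k = len(population) // 8  # sampling interval (12 here)
start = rng.randrange(k)
systematic = population[start::k][:8]

print("SRS:", sorted(srs))
print("with replacement:", with_replacement)
print("stratified:", stratified)
print("cluster (homeroom", chosen_room, "):", cluster)
print("systematic:", systematic)
```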

3.4 Potential Problems with Sampling

Term | Definition
bias | A systematic tendency for certain responses to be favored over others in a sample, resulting in a sample that does not accurately represent the population.
convenience sampling | A non-random sampling method where individuals are selected based on their accessibility or ease of inclusion, introducing potential bias.
non-random sampling | Sampling methods that do not use chance to select individuals from the population, introducing potential for bias.
nonresponse bias | Bias that occurs when individuals chosen for the sample cannot provide data or refuse to respond, and these individuals differ from those who do respond.
question wording bias | A type of response bias caused by confusing or leading questions in a survey or data collection instrument.
response bias | Bias that results from problems in the data gathering instrument or process, such as confusing or leading questions.
self-reported responses | Data collected directly from individuals about their own characteristics, behaviors, or opinions, which may introduce response bias.
undercoverage bias | Bias that occurs when part of the population has a reduced chance of being included in the sample, resulting in an unrepresentative sample.
voluntary response bias | Bias that occurs when a sample consists entirely of volunteers or people who choose to participate, making the sample unrepresentative of the population.

3.5 Introduction to Experimental Design

Term | Definition
blocking | A technique that groups experimental units into blocks where units within each block are similar with respect to at least one blocking variable.
blocking variable | A variable used to group experimental units into blocks so that variability in the response due to that variable can be separated from differences due to the treatments.
completely randomized design | An experimental design where treatments are assigned to experimental units completely at random to balance the effects of confounding variables.
confounding variable | A variable that is related to the explanatory variable and influences the response variable, potentially creating a false perception of association between them.
control group | A group in an experiment that receives no treatment or a standard/baseline treatment, used as a reference for comparison.
double-blind experiment | An experiment where neither the subjects nor the members of the research team who interact with them know which treatment a subject is receiving.
experimental unit | The individual (person, animal, plant, or object) to which a treatment is assigned in an experiment.
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
factor | An explanatory variable in an experiment whose levels are manipulated intentionally.
matched pairs design | A special case of a randomized block design in which experimental units are arranged in pairs matched on relevant factors; either the two treatments are randomly assigned within each pair, or each unit receives both treatments in a random order.
participant | Human subjects or individuals who are assigned treatments in an experiment.
placebo | An inactive substance given to a control group to determine if a treatment of interest has an effect.
placebo effect | A response to a placebo that occurs when experimental units react to receiving a treatment, even though the treatment is inactive.
random assignment | The process of using chance to allocate experimental units to treatment groups, which tends to balance other variables across the groups and so reduces bias.
randomized complete block design | An experimental design where treatments are assigned completely at random within each block to control for a blocking variable.
replication | The use of multiple experimental units in each treatment group to increase reliability and reduce the effect of random variation.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
single-blind experiment | An experiment where subjects do not know which treatment they are receiving, but members of the research team do, or vice versa.
treatment | A condition assigned to experimental units in an experiment.
treatment groups | Distinct groups in an experiment that receive different treatments or conditions being compared.
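
Random assignment for a completely randomized design and for a randomized complete block design can be sketched in Python as below. The 12 plants, the two fertilizer treatments, and the sunny/shady blocks are hypothetical, and the helper `completely_randomized` assumes the number of units is a multiple of the number of treatments so the groups come out equal in size.

```python
import random


def completely_randomized(units, treatments, rng):
    """Randomly assign units to treatments in equal-size groups.

    Assumes len(units) is a multiple of len(treatments).
    """
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random order of the units is a random assignment
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size] for i, t in enumerate(treatments)}


rng = random.Random(7)                        # fixed seed so the example is reproducible
units = [f"plant{i}" for i in range(1, 13)]   # 12 hypothetical experimental units
treatments = ["fertilizer A", "fertilizer B"]

# Completely randomized design: one random split of all 12 units.
crd = completely_randomized(units, treatments, rng)

# Randomized complete block design: a separate random split inside each block.
blocks = {"sunny bed": units[:6], "shady bed": units[6:]}  # blocking variable: sunlight
rcbd = {name: completely_randomized(members, treatments, rng) for name, members in blocks.items()}

print("completely randomized:", crd)
print("randomized block:", rcbd)
```

In the block version, each bed contributes 3 plants to each fertilizer, so differences in sunlight cannot pile up in one treatment group.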

3.6 Selecting an Experimental Design

TermDefinition
experimental designA structured plan for conducting an experiment that specifies how treatments will be assigned to experimental units and how data will be collected.
experimental unitThe individual (person, animal, plant, or object) to which a treatment is assigned in an experiment; human experimental units are usually called subjects or participants.

3.7 Inference and Experiments

TermDefinition
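experimental unitThe individual (person, animal, plant, or object) to which a treatment is assigned in an experiment; human experimental units are usually called subjects or participants.
generalizeThe process of extending conclusions from a study conducted on a sample to a larger population, which is justified only when the sample was randomly selected from that population.
random assignmentThe process of using a chance process to allocate experimental units to treatment groups, so that the groups tend to be similar before treatment and differences in responses can be attributed to the treatments rather than to how units were assigned.
random samplingA method of selecting a sample by a chance process, for example one in which each member of the population has an equal chance of being chosen, so that the sample tends to be representative of the population.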
representativeA characteristic of a sample that accurately reflects the key features and distribution of the larger population from which it was drawn.
statistical inferenceThe process of drawing conclusions about a population based on data collected from a sample.
statistically significantA result indicating that an observed difference is large enough that it is unlikely to have occurred by chance alone.
treatmentA specific condition applied to experimental units in an experiment.

🎲Unit 4 – Probability, Random Variables, and Probability Distributions

4.10 Introduction to the Binomial Distribution

TermDefinition
binomial distributionA probability distribution that describes the number of successes in a fixed number of independent trials, each with the same probability of success.
binomial probability functionThe formula P(X=x)=C(n,x)p^x(1-p)^(n-x) that calculates the probability of exactly x successes in n independent trials with probability of success p.
binomial random variableA random variable that counts the number of successes in a fixed number of repeated independent trials, where each trial has two possible outcomes.
independent trialsRepeated experiments or observations where the outcome of one trial does not affect the outcome of any other trial.
number of failuresThe count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition.
number of successesThe count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition.
probability distributionA function that describes the likelihood of all possible values of a random variable.
probability of successThe constant probability p that an individual trial results in a success in a binomial experiment.
random number generatorA tool, such as a calculator, software, or a table of random digits, that produces values by a chance process; used to select random samples, randomly assign treatments, or carry out simulations.
simulationA method of modeling random events so that simulated outcomes closely match real-world outcomes, used to estimate probabilities.
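The binomial probability function above can be evaluated directly. A minimal Python sketch, assuming illustrative values n = 10 and p = 0.2 that are not from the source; `binomial_pmf` is a hypothetical helper name, and `math.comb(n, x)` computes C(n, x).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for a binomial random variable."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of exactly 3 successes in 10 independent trials with p = 0.2
prob = binomial_pmf(3, 10, 0.2)

# Cumulative probability P(X <= 3): add the probabilities for 0, 1, 2, 3 successes
prob_at_most_3 = sum(binomial_pmf(k, 10, 0.2) for k in range(4))
```

Because the values 0 through n cover every possible outcome, the probabilities over all x sum to 1.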

4.1 Introducing Statistics

TermDefinition
patterns in dataObservable regularities or trends that appear in a dataset, which may or may not indicate non-random behavior.
variationDifferences in data that occur by chance due to the random nature of sampling, rather than from systematic causes.

4.11 Parameters for a Binomial Distribution

TermDefinition
binomial distributionA probability distribution that describes the number of successes in a fixed number of independent trials, each with the same probability of success.
meanThe average value of a dataset, represented by μ in the context of a population.
parameterA numerical summary that describes a characteristic of an entire population.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
random variableA variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities.
standard deviationA measure of how spread out data values are from the mean, represented by σ in the context of a population.
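For a binomial random variable, the mean and standard deviation have the standard shortcut formulas μ = np and σ = √(np(1-p)). The Python sketch below, using hypothetical values n = 10 and p = 0.2, checks those shortcuts against the general definitions computed directly from the probability distribution; the function names are illustrative only.

```python
from math import comb, sqrt

def binomial_mean_sd(n, p):
    """Shortcut formulas: mean = np, SD = sqrt(np(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))

def mean_sd_from_pmf(n, p):
    """Same quantities computed from the full probability distribution."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, sqrt(variance)

mu, sigma = binomial_mean_sd(10, 0.2)        # mean 2, SD about 1.265
mu_check, sigma_check = mean_sd_from_pmf(10, 0.2)
```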

4.12 The Geometric Distribution

TermDefinition
geometric distributionA probability distribution that models the number of trials needed to achieve the first success in a sequence of independent Bernoulli trials, each with the same probability of success.
geometric probability functionThe formula P(X=x)=(1-p)^(x-1)p that calculates the probability that the first success occurs on trial x.
geometric random variableA random variable that represents the number of the trial on which the first success occurs in a sequence of independent trials.
independent trialsRepeated experiments or observations where the outcome of one trial does not affect the outcome of any other trial.
meanThe average value of a dataset, represented by μ in the context of a population.
number of failuresThe count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition.
number of successesThe count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition.
parameterA numerical summary that describes a characteristic of an entire population.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
probability of successThe constant probability p that an individual trial results in a success, as in a binomial or geometric setting.
random variableA variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities.
standard deviationA measure of how spread out data values are from the mean, represented by σ in the context of a population.
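The geometric probability function can be checked numerically in the same way. A minimal Python sketch, assuming a hypothetical success probability p = 0.25; it also uses the standard results that a geometric random variable has mean 1/p and standard deviation √(1-p)/p. Function names are illustrative.

```python
def geometric_pmf(x, p):
    """P(X = x) = (1 - p)^(x - 1) p: first success on trial x, for x = 1, 2, 3, ..."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

def geometric_mean_sd(p):
    """Mean 1/p and SD sqrt(1 - p)/p for the trial number of the first success."""
    return 1 / p, (1 - p) ** 0.5 / p

prob = geometric_pmf(3, 0.25)          # two failures, then a success: 0.75 * 0.75 * 0.25
mu, sigma = geometric_mean_sd(0.25)    # on average the first success comes on trial 4
```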

4.2 Estimating Probabilities Using Simulation

TermDefinition
eventA collection of one or more outcomes from a random process.
law of large numbersThe principle that simulated or empirical probabilities tend to get closer to the true probability as the number of trials increases.
outcomeThe result of a single trial of a random process.
random processA process that generates results determined by chance, where the outcome cannot be predicted with certainty in advance.
relative frequencyThe proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
simulationA method of modeling random events so that simulated outcomes closely match real-world outcomes, used to estimate probabilities.
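The law of large numbers can be seen by simulating a random process and tracking the relative frequency of an event. A minimal Python sketch, assuming a hypothetical event (a fair six-sided die shows a 6, true probability 1/6); `simulate_proportion` and `roll_six` are illustrative names.

```python
import random

def simulate_proportion(event, trials, seed=None):
    """Estimate P(event) as the relative frequency of the event over repeated trials."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials

def roll_six(rng):
    """One trial of the random process: roll a fair die and check for a 6."""
    return rng.randint(1, 6) == 6

# Estimates tend to settle closer to 1/6 as the number of trials grows
for n in (10, 100, 10_000, 100_000):
    print(n, simulate_proportion(roll_six, n, seed=42))
```

Small simulations can land far from 1/6; only in the long run does the relative frequency reliably approach the true probability.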

4.3 Introduction to Probability

TermDefinition
complement of an eventThe set of all outcomes in the sample space that are not in event E, denoted E' or E^C, representing 'not E'.
equally likelyA condition where all outcomes in a sample space have the same probability of occurring.
eventA collection of one or more outcomes from a random process.
long runA large number of repetitions of a probability experiment where the relative frequency of an event approaches its true probability.
outcomeThe result of a single trial of a random process.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
random processA process that generates results determined by chance, where the outcome cannot be predicted with certainty in advance.
relative frequencyThe proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
sample spaceThe set of all possible non-overlapping outcomes of a random process.
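When outcomes are equally likely, the probability of an event is the number of outcomes in the event divided by the number of outcomes in the sample space, and the complement rule gives P(not E) = 1 - P(E). A minimal Python sketch, assuming the hypothetical random process of rolling two fair dice; `prob` is an illustrative helper.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered outcomes
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_sum_seven = prob(lambda outcome: sum(outcome) == 7)   # 6 of 36 outcomes
p_not_seven = prob(lambda outcome: sum(outcome) != 7)   # complement of "sum is 7"
```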

4.4 Mutually Exclusive Events

TermDefinition
intersectionThe set of outcomes that belong to both event A and event B, denoted A ∩ B.
joint probabilityThe probability that two events A and B both occur, denoted P(A ∩ B).
mutually exclusiveTwo events that cannot occur at the same time because they have no outcomes in common, so P(A ∩ B) = 0.

4.5 Conditional Probability

TermDefinition
conditional probabilityThe probability that one event will occur given that another event has already occurred, denoted P(A | B).
joint probabilityThe probability that two events A and B both occur, denoted P(A ∩ B).
multiplication ruleA probability rule stating that P(A ∩ B) = P(A) · P(B | A), used to find the probability that two events both occur.
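Conditional probability is computed as P(A | B) = P(A ∩ B) / P(B), and the multiplication rule rearranges that relationship. A minimal Python sketch, assuming a standard 52-card deck as the hypothetical random process; it checks the multiplication rule for two draws without replacement against a full enumeration. All names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard 52-card deck: each card is a (rank, suit) pair
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def p(event):
    """Probability of an event when each card is equally likely to be drawn."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def conditional(a, b):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return p(lambda card: a(card) and b(card)) / p(b)

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in ("J", "Q", "K")

p_heart_given_face = conditional(is_heart, is_face)   # 3 hearts among 12 face cards

# Multiplication rule, two draws without replacement:
# P(both hearts) = P(first heart) * P(second heart | first heart)
rule = Fraction(13, 52) * Fraction(12, 51)
pairs = list(permutations(deck, 2))
enumerated = Fraction(sum(1 for c1, c2 in pairs if is_heart(c1) and is_heart(c2)), len(pairs))
```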

4.6 Independent Events and Unions of Events

TermDefinition
addition ruleA probability rule stating that P(A ∪ B) = P(A) + P(B) - P(A ∩ B), used to find the probability of the union of two events.
conditional probabilityThe probability that one event will occur given that another event has already occurred, denoted P(A | B).
independent eventsEvents A and B are independent if knowing whether event A has occurred does not change the probability that event B will occur.
intersectionThe set of outcomes that belong to both event A and event B, denoted A ∩ B.
union of eventsThe event that event A occurs, event B occurs, or both occur, denoted A ∪ B; its probability is written P(A ∪ B).
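The addition rule and the definition of independence can both be verified by counting. A minimal Python sketch, assuming a standard 52-card deck as the hypothetical random process; the helper names are illustrative. It also shows that mutually exclusive events with nonzero probabilities are not independent, since knowing one occurred makes the other impossible.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def p(event):
    """Probability of an event when each card is equally likely to be drawn."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_spade(card):
    return card[1] == "spades"

def is_face(card):
    return card[0] in ("J", "Q", "K")

def independent(a, b):
    """A and B are independent when P(B | A) = P(B), that is, P(A ∩ B) / P(A) = P(B)."""
    return p(lambda card: a(card) and b(card)) / p(a) == p(b)

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = p(lambda card: is_heart(card) or is_face(card))
addition_rule = p(is_heart) + p(is_face) - p(lambda card: is_heart(card) and is_face(card))
```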

4.7 Introduction to Random Variables and Probability Distributions

TermDefinition
centerA measure indicating the middle or typical value of a distribution.
cumulative probability distributionA representation (as a table or function) showing the probability that a random variable is less than or equal to each of its possible values.
discrete random variableA random variable that takes on a countable number of distinct numerical values, often counts.
populationThe entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
probability distributionA function that describes the likelihood of all possible values of a random variable.
random processA process that generates results determined by chance, where the outcome cannot be predicted with certainty in advance.
random variableA variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities.
shapeThe overall form or pattern of a distribution, including characteristics like skewness and modality.
spreadA measure of how dispersed or variable the outcomes of a probability distribution are, such as range, variance, or standard deviation.
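A probability distribution and its cumulative probability distribution can be built from an equally likely sample space. A minimal Python sketch, assuming the hypothetical random variable X = number of heads in three fair coin flips; `pmf` and `cdf` are illustrative names for dictionaries mapping each value to P(X = x) and P(X ≤ x).

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Random process: three fair coin flips, giving 8 equally likely outcomes
outcomes = list(product("HT", repeat=3))

# Random variable X = number of heads in each outcome
counts = Counter(outcome.count("H") for outcome in outcomes)
values = sorted(counts)

pmf = {x: Fraction(counts[x], len(outcomes)) for x in values}      # P(X = x)
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))        # P(X <= x)
```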

4.8 Mean and Standard Deviation of Random Variables

TermDefinition
discrete random variableA random variable that takes on a countable number of distinct numerical values, often counts.
expected valueThe long-run average outcome of a random variable, equivalent to the mean of a discrete random variable.
meanThe average value of a dataset, represented by μ in the context of a population.
parameterA numerical summary that describes a characteristic of an entire population.
standard deviationA measure of how spread out data values are from the mean, represented by σ in the context of a population.
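For a discrete random variable, the mean (expected value) is μ = Σ x·P(x) and the standard deviation is σ = √(Σ (x - μ)²·P(x)). A minimal Python sketch, assuming a hypothetical distribution that is not from the source; `mean_and_sd` is an illustrative name.

```python
from math import isclose, sqrt

def mean_and_sd(distribution):
    """Mean and SD of a discrete random variable given as {value: probability}."""
    if not isclose(sum(distribution.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in distribution.items())
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return mu, sqrt(variance)

# Hypothetical probability distribution
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
mu, sigma = mean_and_sd(dist)   # 1.7 and 0.9
```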

4.9 Combining Random Variables

TermDefinition
independent random variablesRandom variables where knowing the value or probability distribution of one does not change the probability distribution of the other.
linear combinationsExpressions of the form aX + bY where X and Y are random variables and a and b are real number coefficients.
linear transformationsChanges to a random variable of the form Y = a + bX, where a and b are constants that shift and scale the distribution.
meanThe average value of a dataset, represented by μ in the context of a population.
probability distributionA function that describes the likelihood of all possible values of a random variable.
random variableA variable whose value is determined by the outcome of a random phenomenon and can take on different numerical values with associated probabilities.
standard deviationA measure of how spread out data values are from the mean, represented by σ in the context of a population.
varianceA measure of the spread of a probability distribution, denoted σ², equal to the mean (expected value) of the squared deviations from the mean; its square root is the standard deviation.
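Two standard rules govern these operations: for a linear transformation Y = a + bX, μ_Y = a + bμ_X and σ_Y = |b|σ_X; for independent X and Y, the mean of X - Y is μ_X - μ_Y while the variances add, σ²(X - Y) = σ²_X + σ²_Y. A minimal Python sketch that checks both rules by building the exact distributions, assuming two small hypothetical distributions; the helper names are illustrative.

```python
from itertools import product
from math import sqrt

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return mu, sum((x - mu) ** 2 * p for x, p in dist.items())

def linear_combination(dist_x, dist_y, a, b):
    """Exact distribution of aX + bY when X and Y are independent."""
    combined = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = a * x + b * y
        # Independence: P(X = x and Y = y) = P(X = x) * P(Y = y)
        combined[value] = combined.get(value, 0) + px * py
    return combined

X = {0: 0.5, 1: 0.5}
Y = {1: 0.2, 2: 0.3, 3: 0.5}
mx, vx = mean_var(X)
my, vy = mean_var(Y)

md, vd = mean_var(linear_combination(X, Y, 1, -1))      # D = X - Y
mt, vt = mean_var({3 + 2 * x: p for x, p in X.items()})  # T = 3 + 2X
```

Note that subtracting Y still adds its variance: the difference is at least as variable as either term.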

📊Unit 5 – Sampling Distributions

5.1 Introducing Statistics

TermDefinition
populationThe entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
sampleA subset of individuals or items selected from a population for the purpose of data collection and analysis.
statisticA numerical summary calculated from sample data, such as a sample mean, median, or standard deviation.
variationDifferences in data that occur by chance due to the random nature of sampling, rather than from systematic causes.

5.2 The Normal Distribution, Revisited

TermDefinition
areaThe region under the normal distribution curve, representing the probability or proportion of values within a specified interval.
bell-shapedThe characteristic shape of a normal distribution, with a peak at the center and tails that extend symmetrically on both sides.
boundariesThe endpoints of an interval that define where a specified area or probability begins and ends in a normal distribution.
continuous random variableA random variable that can take on any value in an interval of real numbers; probabilities correspond to areas under a density curve over intervals, so any single value has probability 0.
inequalitiesMathematical expressions using symbols such as <, >, ≤, or ≥ to describe the relationship between a variable and the boundaries of an interval.
intervalA range of values between two boundaries, used to represent a set of outcomes in a normal distribution.
normal curveThe bell-shaped graph of a normal distribution that is symmetric and mound-shaped.
normal distributionA probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
probability approximationUsing a known distribution (such as the normal distribution) to estimate probabilities for an unknown or complex distribution.
standard normal tableA reference table that provides the cumulative probabilities (areas under the curve) for the standard normal distribution.
symmetricalA property of a distribution where the left and right sides are mirror images of each other around the center.
z-scoreA standardized score calculated as (x - μ)/σ that measures how many standard deviations a data value x lies above or below the mean.
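Probabilities for a normal random variable are areas under the normal curve between interval boundaries. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+), assuming a hypothetical population with μ = 100 and σ = 15; `prob_between` and `z_score` are illustrative helpers. It also shows that standardizing with a z-score and using the standard normal gives the same area.

```python
from statistics import NormalDist

def prob_between(dist, lower, upper):
    """Area under the normal curve between two boundaries: P(lower < X < upper)."""
    return dist.cdf(upper) - dist.cdf(lower)

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mu) / sigma

# Hypothetical population: normal with mean 100 and standard deviation 15
X = NormalDist(mu=100, sigma=15)

p_within_1sd = prob_between(X, 85, 115)   # close to the empirical rule's 68%
p_within_2sd = prob_between(X, 70, 130)   # close to 95%
p_above_130 = 1 - X.cdf(130)

# Same upper-tail area found by standardizing and using the standard normal (mean 0, SD 1)
z = z_score(130, 100, 15)
p_above_z = 1 - NormalDist().cdf(z)
```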

5.3 The Central Limit Theorem

TermDefinition
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
central limit theoremA theorem stating that when the sample size is sufficiently large, the sampling distribution of the mean of a random variable will be approximately normally distributed.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null distributionThe probability distribution of the test statistic under the assumption that the null hypothesis is true.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
simulationA method of modeling random events so that simulated outcomes closely match real-world outcomes, used to estimate probabilities.
statisticA numerical summary calculated from sample data, such as a sample mean, median, or standard deviation.
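The central limit theorem can be illustrated by simulation: draw many samples from a strongly skewed population and look at the distribution of the sample means. A minimal Python sketch, assuming a hypothetical exponential population with mean 2 and standard deviation 2, samples of size n = 40, and 5,000 repetitions; function names are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def sample_means(population_draw, n, reps, seed=None):
    """Simulate the sampling distribution of the sample mean."""
    rng = random.Random(seed)
    return [fmean(population_draw(rng) for _ in range(n)) for _ in range(reps)]

def exponential_draw(rng):
    """One value from a right-skewed population with mean 2 and SD 2 (rate 0.5)."""
    return rng.expovariate(0.5)

means = sample_means(exponential_draw, n=40, reps=5000, seed=7)

# If the sampling distribution is approximately normal, roughly 95% of sample means
# fall within 1.96 standard deviations (2 / sqrt(40)) of the population mean
sd_of_means = 2 / sqrt(40)
coverage = fmean(1 if abs(m - 2) < 1.96 * sd_of_means else 0 for m in means)
```

A histogram of `means` would look roughly bell-shaped even though individual exponential values are heavily right-skewed.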

5.4 Biased and Unbiased Point Estimates

TermDefinition
biasedA property of an estimator where the average value of the estimator does not equal the population parameter being estimated.
estimatorA statistic used to estimate or approximate the value of a population parameter based on sample data.
population parameterA numerical characteristic of an entire population, such as the mean, proportion, or standard deviation.
sample statisticA numerical value calculated from sample data that is used to estimate the corresponding population parameter.
unbiasedA property of an estimator where the average value of the estimator equals the population parameter being estimated.
variabilityThe spread or dispersion of data values in a distribution.
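A common illustration of bias, assumed here rather than taken from the source, compares two estimators of a population variance: dividing the sum of squared deviations by n (biased, tends to underestimate) versus by n - 1 (unbiased). A minimal Python sketch using `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1) on samples from a hypothetical normal population with σ = 3, so the true variance is 9.

```python
import random
from statistics import fmean, pvariance, variance

def average_estimates(n, reps, seed=None):
    """Average two variance estimators over many samples from N(0, 3^2); true variance is 9."""
    rng = random.Random(seed)
    divide_by_n, divide_by_n_minus_1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0, 3) for _ in range(n)]
        divide_by_n.append(pvariance(sample))          # sum of squared deviations / n
        divide_by_n_minus_1.append(variance(sample))   # sum of squared deviations / (n - 1)
    return fmean(divide_by_n), fmean(divide_by_n_minus_1)

biased_avg, unbiased_avg = average_estimates(n=5, reps=10_000, seed=3)
# biased_avg tends toward 9 * 4/5 = 7.2; unbiased_avg tends toward 9
```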

5.5 Sampling Distributions for Sample Proportions

TermDefinition
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
categorical variableA variable that takes on values that are category names or group labels rather than numerical values.
independent samplesTwo or more separate groups of data where the values in one group do not influence or depend on the values in another group.
mean of the sampling distributionThe expected value of a sample statistic; for sample proportions, μp̂ = p.
parameterA numerical summary that describes a characteristic of an entire population.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
sample proportionThe proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).
sample size conditionThe requirement that np ≥ 10 and n(1-p) ≥ 10 must be satisfied for a sampling distribution of a sample proportion to be approximately normal.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling with replacementA sampling method in which an item selected from a population can be selected again in subsequent draws.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
standard deviation of the sampling distributionThe measure of variability in a sampling distribution; for sample proportions, σp̂ = √(p(1-p)/n).
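The mean, standard deviation, and large-counts condition for the sampling distribution of p̂ can be packaged into one small calculation. A minimal Python sketch, assuming hypothetical values p = 0.3 and n = 100; the SD formula assumes independent observations, which for sampling without replacement is usually justified by the sample being at most 10% of the population. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def phat_sampling_distribution(p, n):
    """Mean, SD, and large-counts check for the sampling distribution of p-hat."""
    mean = p
    sd = sqrt(p * (1 - p) / n)
    approx_normal = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, approx_normal

mean, sd, ok = phat_sampling_distribution(p=0.3, n=100)

# If approximately normal, estimate P(p-hat > 0.35) with a normal model
prob_above = 1 - NormalDist(mean, sd).cdf(0.35)
```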

5.6 Sampling Distributions for Differences in Sample Proportions

TermDefinition
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
categorical variableA variable that takes on values that are category names or group labels rather than numerical values.
difference in proportionsThe difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations.
difference in sample proportionsThe difference between two sample proportions (p̂₁ - p̂₂) used to compare proportions from two different samples.
independent populationsTwo populations from which samples are drawn such that the selection from one population does not affect the selection from the other.
mean of the sampling distributionThe expected value of a sample statistic; for sample proportions, μp̂ = p, and for a difference in sample proportions, the mean of p̂₁ - p̂₂ is p₁ - p₂.
normality conditionsThe requirements that must be met for a sampling distribution to be approximately normal, such as n₁p₁ ≥ 10, n₁(1-p₁) ≥ 10, n₂p₂ ≥ 10, and n₂(1-p₂) ≥ 10.
parameterA numerical summary that describes a characteristic of an entire population.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
sample proportionThe proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling with replacementA sampling method in which an item selected from a population can be selected again in subsequent draws.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
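standard deviation of the sampling distributionThe measure of variability in a sampling distribution; for sample proportions, σp̂ = √(p(1-p)/n), and for a difference in independent sample proportions, the standard deviation of p̂₁ - p̂₂ is √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂).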
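For independent samples, the sampling distribution of p̂₁ - p̂₂ has mean p₁ - p₂ and standard deviation √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂), and is approximately normal when all four normality-condition counts are at least 10. A minimal Python sketch, assuming hypothetical population proportions and sample sizes; the function name is illustrative.

```python
from math import sqrt

def diff_phat_sampling_distribution(p1, n1, p2, n2):
    """Mean, SD, and normality check for the sampling distribution of p1-hat - p2-hat."""
    mean = p1 - p2
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    counts = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    approx_normal = all(c >= 10 for c in counts)
    return mean, sd, approx_normal

# Hypothetical populations: p1 = 0.6 with n1 = 150, p2 = 0.5 with n2 = 200
mean, sd, ok = diff_phat_sampling_distribution(0.6, 150, 0.5, 200)
```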

5.7 Sampling Distributions for Sample Means

TermDefinition
normal distributionA probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
parameterA numerical summary that describes a characteristic of an entire population.
populationThe entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
population distributionThe distribution of all values of a variable across the entire population.
population meanThe average of all values in an entire population, denoted as μ.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population sizeThe total number of individuals or items in an entire population.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
random sampling with replacementA sampling method where each selected item is returned to the population before the next selection, allowing the same item to be selected multiple times.
random sampling without replacementA sampling method where each selected item is not returned to the population, so each item can only be selected once.
sample meanThe average of all values in a sample, denoted as x̄, used as an estimate of the population mean.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
standard deviationA measure of how spread out data values are from the mean, represented by σ in the context of a population.
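The sampling distribution of x̄ has mean μ and standard deviation σ/√n; it is exactly normal when the population is normal and approximately normal for large n by the central limit theorem. A minimal Python sketch, assuming a hypothetical normal population with μ = 50, σ = 8 and samples of size n = 16; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def xbar_sampling_distribution(mu, sigma, n):
    """Mean mu and SD sigma / sqrt(n) of the sampling distribution of x-bar."""
    return mu, sigma / sqrt(n)

m, s = xbar_sampling_distribution(50, 8, 16)    # mean 50, SD 2

# Probability that a sample mean exceeds 53 (z = 1.5)
p_xbar_above_53 = 1 - NormalDist(m, s).cdf(53)
```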

5.8 Sampling Distributions for Differences in Sample Means

TermDefinition
difference in sample meansThe result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂.
independent populationsTwo populations from which samples are drawn such that the selection from one population does not affect the selection from the other.
normal distributionA probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
parameterA numerical summary that describes a characteristic of an entire population.
population distributionThe distribution of all values of a variable across the entire population.
population meanThe average of all values in an entire population, denoted as μ.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population standard deviationA measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution.
probabilityThe likelihood or chance that a particular outcome or event will occur, expressed as a value between 0 and 1.
sample meanThe average of all values in a sample, denoted as x̄, used as an estimate of the population mean.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling with replacementA sampling method in which an item selected from a population can be selected again in subsequent draws.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
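When samples are independent and the population standard deviations are known, x̄₁ - x̄₂ has mean μ₁ - μ₂ and standard deviation √(σ₁²/n₁ + σ₂²/n₂). A minimal Python sketch, assuming hypothetical normal populations; the function name and all numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def diff_xbar_sampling_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Mean and SD of the sampling distribution of x1-bar - x2-bar for independent samples."""
    return mu1 - mu2, sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical populations: mean 70, SD 10, n = 25 and mean 65, SD 12, n = 36
mean, sd = diff_xbar_sampling_distribution(70, 10, 25, 65, 12, 36)

# Probability that the first sample mean comes out smaller than the second
p_negative = NormalDist(mean, sd).cdf(0)
```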

⚖️Unit 6 – Proportions

6.10 Setting Up a Test for the Difference of Two Population Proportions

TermDefinition
alternative hypothesisThe claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
categorical variableA variable that takes on values that are category names or group labels rather than numerical values.
difference of two population proportionsThe comparison between two population proportions, expressed as p₁ - p₂, to determine if they differ significantly.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
one-sided alternative hypothesisAn alternative hypothesis that specifies the direction of the difference, either p₁ < p₂ or p₁ > p₂.
pooled proportionA combined estimate of the population proportion calculated from both samples when assuming the null hypothesis is true: p̂c = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂).
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
simple random sampleA sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
statistical inferenceThe process of drawing conclusions about a population based on data collected from a sample.
two-sample z-testA hypothesis test used to compare the difference between two population proportions using the standard normal distribution.
two-sided alternative hypothesisAn alternative hypothesis that specifies the difference could be in either direction, stated as p₁ ≠ p₂.

6.1 Introducing Statistics

TermDefinition
distributionThe pattern of how data values are spread or arranged across a range.
populationThe entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
sampleA subset of individuals or items selected from a population for the purpose of data collection and analysis.
variationDifferences in data that occur by chance due to the random nature of sampling, rather than from systematic causes.

6.11 Carrying Out a Test for the Difference of Two Population Proportions

TermDefinition
difference in sample proportionsThe difference between two sample proportions (p̂₁ - p̂₂) used to compare proportions from two different samples.
difference of two population proportionsThe comparison between two population proportions, expressed as p₁ - p₂, to determine if they differ significantly.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
p-valueThe probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
pooled proportionA combined estimate of the population proportion calculated from both samples when assuming the null hypothesis is true: p̂c = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂).
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
reject the null hypothesisThe decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
significance levelThe threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
test statisticA calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
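The steps of a two-sample z-test for p₁ - p₂ (pooled proportion, normality check, standard error, test statistic, p-value, decision) fit in one short function. A minimal Python sketch, assuming hypothetical counts of 60 successes out of 100 and 45 out of 100, and α = 0.05; `two_proportion_z_test` is an illustrative name, and the standard normal comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Two-sample z-test of H0: p1 = p2, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # equals (n1*p1_hat + n2*p2_hat) / (n1 + n2)
    counts = (n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled))
    large_counts = all(c >= 10 for c in counts)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":     # Ha: p1 > p2
        p_value = upper_tail
    elif alternative == "less":      # Ha: p1 < p2
        p_value = NormalDist().cdf(z)
    else:                            # Ha: p1 != p2
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    return z, p_value, large_counts

z, p_value, ok = two_proportion_z_test(60, 100, 45, 100)
alpha = 0.05
reject = p_value <= alpha   # reject H0 when the p-value is at most the significance level
```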

6.2 Constructing a Confidence Interval for a Population Proportion

TermDefinition
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
categorical variableA variable that takes on values that are category names or group labels rather than numerical values.
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
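confidence levelThe long-run proportion of intervals produced by a given method that capture the true population parameter, typically expressed as a percentage such as 90%, 95%, or 99%; it describes the method, not the chance that one particular interval contains the parameter.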
critical valueA value from the standard normal distribution used to determine the margin of error for a given confidence level.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
margin of errorThe amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error.
number of failuresThe count of unfavorable outcomes in a sample, denoted as n(1-p̂), used to verify the normality condition.
number of successesThe count of favorable outcomes in a sample, denoted as np̂, used to verify the normality condition.
one-sample z-interval for a proportionA confidence interval procedure used to estimate a population proportion based on a single sample, using the standard normal (z) distribution.
population parameterA numerical characteristic of an entire population, such as the mean, proportion, or standard deviation.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample proportionThe proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sample statisticA numerical value calculated from sample data that is used to estimate the corresponding population parameter.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
standard normal distributionA normal distribution with mean 0 and standard deviation 1, used to determine critical values for confidence intervals.
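A one-sample z-interval for a proportion is p̂ ± z*·√(p̂(1-p̂)/n), where the critical value z* comes from the standard normal distribution. A minimal Python sketch, assuming hypothetical data of 540 successes in 1,000 trials; the function name is illustrative, and `NormalDist().inv_cdf` supplies z*.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Confidence interval p-hat ± z* SE, plus a check of the success/failure counts."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                     # standard error of p-hat
    margin = z_star * se                                   # margin of error
    large_counts = successes >= 10 and n - successes >= 10
    return p_hat - margin, p_hat + margin, large_counts

lower, upper, ok = one_proportion_z_interval(540, 1000)   # roughly (0.509, 0.571)
```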

6.3 Justifying a Claim Based on a Confidence Interval for a Population Proportion

TermDefinition
claimA statement or assertion about a population parameter that can be evaluated using statistical evidence.
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
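confidence levelThe long-run proportion of intervals produced by a given method that capture the true population parameter, typically expressed as a percentage such as 90%, 95%, or 99%; it describes the method, not the chance that one particular interval contains the parameter.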
margin of errorThe amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error.
one-sample proportionA confidence interval or hypothesis test that estimates or tests a single population proportion based on data from one sample.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
width of a confidence intervalThe range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval.
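
The sketch below captures the reasoning used to justify a claim from an interval. A claimed value of p that falls inside the interval is plausible, and one outside it is not. "Plausible" does not mean the claim is proven, because every other value in the interval is also plausible. The interval (0.532, 0.668) is a made-up example, and the helper names are hypothetical.

```python
from typing import Tuple


def assess_claim(interval: Tuple[float, float], claimed_value: float) -> str:
    """Hypothetical helper: is a claimed parameter value plausible given a CI?"""
    lower, upper = interval
    if lower <= claimed_value <= upper:
        return "plausible"          # inside the interval: the data do not rule it out
    return "not plausible"          # outside the interval: evidence against the claim


def interval_width(interval: Tuple[float, float]) -> float:
    """Upper bound minus lower bound (twice the margin of error for p-hat +/- ME)."""
    lower, upper = interval
    return upper - lower


ci = (0.532, 0.668)                    # hypothetical 95% interval for p
print(assess_claim(ci, 0.50))          # not plausible
print(assess_claim(ci, 0.65))          # plausible
print(round(interval_width(ci), 3))    # 0.136
```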

6.4 Setting Up a Test for a Population Proportion

TermDefinition
10% conditionThe requirement that sample size n is at most 10% of the population size N to ensure independence when sampling without replacement.
alternative hypothesisThe claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
categorical variableA variable that takes on values that are category names or group labels rather than numerical values.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
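number of failuresThe count of unfavorable outcomes in a sample, n(1-p̂), used to check the large-counts (normality) condition; when setting up a significance test, the condition is usually checked with the null value instead, n(1-p₀).
number of successesThe count of favorable outcomes in a sample, np̂, used to check the large-counts (normality) condition; when setting up a significance test, the condition is usually checked with the null value instead, np₀.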
one-sample z-test for a population proportionA hypothesis test used to determine whether a sample proportion provides evidence that a population proportion differs from a hypothesized value.
one-sided alternative hypothesisAn alternative hypothesis that specifies the direction of the difference, such as p < p₀ or p > p₀ for a single proportion (or p₁ < p₂ or p₁ > p₂ when comparing two proportions).
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample proportionThe proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
statistical inferenceThe process of drawing conclusions about a population based on data collected from a sample.
two-sided alternative hypothesisAn alternative hypothesis that allows the difference to be in either direction, such as p ≠ p₀ for a single proportion (or p₁ ≠ p₂ when comparing two proportions).
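
The conditions named above can be written out as a short checklist. This sketch assumes the common AP convention that np₀ and n(1 - p₀) must both be at least 10; some texts use 5. The helper name and the example numbers are hypothetical.

```python
from typing import Optional


def check_one_prop_test_conditions(n: int, p0: float, is_random: bool,
                                   population_size: Optional[int] = None) -> dict:
    """Hypothetical helper: conditions for a one-sample z-test of H0: p = p0."""
    results = {"random": is_random}
    # 10% condition: only needed when sampling without replacement
    if population_size is not None:
        results["ten_percent"] = n <= 0.10 * population_size
    # Large counts: use the null value p0 (not p-hat) when setting up a test
    results["large_counts"] = n * p0 >= 10 and n * (1 - p0) >= 10
    return results


print(check_one_prop_test_conditions(n=200, p0=0.5, is_random=True, population_size=5000))
print(check_one_prop_test_conditions(n=50, p0=0.1, is_random=True))
```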

6.5 Interpreting p-Values

TermDefinition
alternative hypothesisThe claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
null distributionThe probability distribution of the test statistic under the assumption that the null hypothesis is true.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
one-sample proportionA confidence interval or hypothesis test that estimates or tests a single population proportion based on data from one sample.
p-valueThe probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
probability modelA mathematical framework that describes the probability distribution of outcomes under specified assumptions.
sample statisticA numerical value calculated from sample data that is used to estimate the corresponding population parameter.
significance levelThe threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
test statisticA calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
theoretical distributionA probability distribution based on a mathematical model, such as the normal distribution, used to approximate the distribution of a test statistic.
z-statisticA standardized test statistic for a population proportion calculated as (sample statistic - null value) divided by the standard deviation of the statistic.
z-testA hypothesis test that uses the standard normal distribution to determine whether a sample statistic differs significantly from a hypothesized value of a population parameter.
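
Here is a minimal sketch of the z-statistic and p-value for a one-sample test of H₀: p = p₀. The standard normal curve serves as the approximate null distribution. The denominator uses p₀ because the calculation assumes the null hypothesis is true. The function name, the "alternative" labels, and the data (120 successes out of 200, p₀ = 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes: int, n: int, p0: float, alternative: str = "two-sided"):
    """Hypothetical helper: z-statistic and p-value for H0: p = p0."""
    p_hat = successes / n
    sd_null = sqrt(p0 * (1 - p0) / n)      # SD of p-hat assuming H0 is true
    z = (p_hat - p0) / sd_null             # (statistic - null value) / SD of statistic
    std_normal = NormalDist()              # approximate null distribution of z
    if alternative == "greater":           # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":            # Ha: p < p0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":       # Ha: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


z, p_value = one_prop_z_test(successes=120, n=200, p0=0.5)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```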

6.6 Concluding a Test for a Population Proportion

TermDefinition
alternative hypothesisThe claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
p-valueThe probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
reject the null hypothesisThe decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
significance levelThe threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
statistical evidenceInformation from sample data that supports or fails to support a hypothesis about a population parameter.
test statisticA calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
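
Concluding a test comes down to comparing the p-value with α and stating the result in context. The sketch below is one way to phrase it. A large p-value leads to "fail to reject H₀", which is not the same as showing H₀ is true. The helper name and its wording are hypothetical.

```python
def conclude(p_value: float, alpha: float, context: str) -> str:
    """Hypothetical helper: turn a p-value and alpha into a conclusion in context."""
    if p_value <= alpha:
        return (f"Because the p-value ({p_value:.4f}) <= alpha ({alpha}), reject H0. "
                f"There is convincing evidence that {context}.")
    # Failing to reject H0 does not show that H0 is true
    return (f"Because the p-value ({p_value:.4f}) > alpha ({alpha}), fail to reject H0. "
            f"There is not convincing evidence that {context}.")


print(conclude(0.0047, 0.05, "the population proportion differs from 0.5"))
```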

6.7 Potential Errors When Performing Tests

TermDefinition
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
parameterA numerical summary that describes a characteristic of an entire population.
power of a testThe probability that a statistical test will correctly reject a false null hypothesis.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
significance levelThe threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
Type I errorAn error that occurs when a null hypothesis is rejected when it is actually true; the probability of committing this error is equal to the significance level (α).
Type II errorAn error that occurs when a null hypothesis is not rejected when it is actually false.
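
The link between α, Type I and Type II errors, power, and sample size can be computed directly. This sketch covers an upper-tailed one-proportion z-test (Hₐ: p > p₀) and relies on the normal approximation. When the true p equals p₀, the rejection rate equals α, which is the Type I error probability. For a true p > p₀, the rejection rate is the power, and 1 - power is P(Type II error). The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed(n: int, p0: float, p_true: float, alpha: float = 0.05) -> float:
    """Hypothetical helper: approximate power of a z-test of H0: p = p0 vs Ha: p > p0."""
    sd_null = sqrt(p0 * (1 - p0) / n)                         # SD of p-hat if H0 is true
    cutoff = p0 + NormalDist().inv_cdf(1 - alpha) * sd_null  # reject H0 when p-hat >= cutoff
    sd_true = sqrt(p_true * (1 - p_true) / n)                 # SD of p-hat if p = p_true
    # Probability of rejecting H0 when the true proportion is p_true
    return 1 - NormalDist(p_true, sd_true).cdf(cutoff)


type_i = power_upper_tailed(n=100, p0=0.5, p_true=0.5)   # H0 true: rejection rate = alpha
power_100 = power_upper_tailed(n=100, p0=0.5, p_true=0.6)
power_400 = power_upper_tailed(n=400, p0=0.5, p_true=0.6)
print(f"P(Type I) = {type_i:.3f}")
print(f"power, n=100: {power_100:.3f}; P(Type II) = {1 - power_100:.3f}")
print(f"power, n=400: {power_400:.3f}")
```

With these numbers, the larger sample gives the higher power, which matches the glossary's point that power depends on sample size.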

6.8 Confidence Intervals for the Difference of Two Proportions

TermDefinition
10% conditionThe requirement that sample size n is at most 10% of the population size N to ensure independence when sampling without replacement.
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
categorical variableA variable that takes on values that are category names or group labels rather than numerical values.
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
difference in proportionsThe difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations.
difference of two population proportionsThe comparison between two population proportions, expressed as p₁ - p₂, to determine if they differ significantly.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample proportionThe proportion of individuals in a sample that have a particular characteristic, denoted as p-hat (p̂).
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
simple random sampleA sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
success-failure conditionA requirement that the number of successes and failures in each sample (np̂ and n(1-p̂)) be large enough, typically at least 10 (some texts use 5), so that the sampling distribution of the sample proportion is approximately normal; when testing, the null value p₀ is used in place of p̂.
test statisticA calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
two-sample z-intervalA confidence interval procedure that uses the standard normal distribution to estimate the difference between two population proportions based on sample data.
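
A two-sample z-interval for p₁ - p₂ follows the same pattern as the one-sample interval. The standard error now adds the variances of the two sample proportions. The sketch refuses to build an interval when the large-counts condition fails, assuming a threshold of 10. The helper name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95):
    """Hypothetical helper: two-sample z-interval for p1 - p2."""
    # Large counts: successes and failures in each sample should be at least 10
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("large-counts condition not met")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error of p1-hat - p2-hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


lower, upper = two_prop_z_interval(x1=60, n1=100, x2=45, n2=100)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```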

6.9 Justifying a Claim Based on a Confidence Interval for a Difference of Population Proportions

TermDefinition
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
difference in proportionsThe difference between two population proportions, calculated as p₁ - p₂, used to compare the prevalence of a characteristic across two populations.
population proportionThe true proportion or percentage of a characteristic in an entire population, typically denoted as p.
random samplingA method of selecting samples from a population by a chance process, such as giving each member an equal chance of being chosen, which avoids selection bias and tends to produce samples representative of the population.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
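
To justify a claim about p₁ - p₂, check where 0 falls relative to the interval. The sketch below encodes the three cases. The conclusion holds, roughly, at the significance level matching the confidence level. The helper name and the example intervals are hypothetical.

```python
def interpret_diff_interval(lower: float, upper: float) -> str:
    """Hypothetical helper: what a CI for p1 - p2 suggests about the two proportions."""
    if lower > 0:
        return "entire interval above 0: convincing evidence that p1 > p2"
    if upper < 0:
        return "entire interval below 0: convincing evidence that p1 < p2"
    return "interval contains 0: no difference is plausible"


print(interpret_diff_interval(0.013, 0.287))
print(interpret_diff_interval(-0.042, 0.118))
```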

😼Unit 7 – Means

7.1 Introducing Statistics

TermDefinition
probabilities of errorsThe likelihood or chance that errors will occur in statistical inference.
statistical inferenceThe process of drawing conclusions about a population based on data collected from a sample.
variationDifferences in data that occur by chance due to the random nature of sampling, rather than from systematic causes.

7.2 Constructing a Confidence Interval for a Population Mean

TermDefinition
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
confidence interval procedureA statistical method used to construct an interval estimate for a population parameter based on sample data.
critical valueA value from the standard normal distribution (z*) or a t-distribution (t*) that, multiplied by the standard error, gives the margin of error for a given confidence level.
degrees of freedomA parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution.
density curveA curve that lies on or above the horizontal axis and has a total area of 1 beneath it; the area under the curve over an interval gives the proportion of values (or probability) in that interval.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
margin of errorThe amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error.
matched pairsPaired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values.
mean differenceThe average of the differences between paired observations, denoted by μd, where the order of subtraction must be clearly defined.
normal distributionA probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
one-sample t-intervalA confidence interval for a population mean constructed using the t-distribution when the population standard deviation is unknown.
outlierA data point that is unusually small or large relative to the rest of the data.
population meanThe average of all values in an entire population, denoted as μ.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population standard deviationA measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample meanThe average of all values in a sample, denoted as x̄, used as an estimate of the population mean.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sample standard deviationThe standard deviation calculated for a sample, denoted by s, using the formula s = √(1/(n-1) ∑(xᵢ-x̄)²).
sample statisticA numerical value calculated from sample data that is used to estimate the corresponding population parameter.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
skewnessA measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
t-distributionA probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution.
tailsThe extreme regions at both ends of a probability distribution's density curve; a t-distribution has more area in its tails than the standard normal distribution.
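
The sketch below builds a one-sample t-interval for μ. It uses the sample standard deviation s (with n - 1 in the denominator), the standard error s/√n, and a t* critical value with df = n - 1. The Python standard library has no t-distribution, so t* is passed in from a t-table: about 2.262 for 95% confidence and df = 9. The data and helper name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_interval(data, t_star: float):
    """Hypothetical helper: one-sample t-interval for a population mean mu."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)              # sample SD, n - 1 in the denominator
    se = s / sqrt(n)             # standard error of x-bar
    me = t_star * se             # margin of error = t* x SE
    return x_bar - me, x_bar + me


data = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]  # hypothetical sample
# t* for 95% confidence with df = n - 1 = 9, read from a t-table (about 2.262)
lower, upper = one_sample_t_interval(data, t_star=2.262)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```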

7.3 Justifying a Claim About a Population Mean Based on a Confidence Interval

TermDefinition
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
confidence levelThe long-run proportion of intervals, constructed by the same method from repeated random samples, that capture the true population parameter, typically expressed as a percentage such as 90%, 95%, or 99%.
margin of errorThe amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error.
matched pairsPaired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values.
populationThe entire group of individuals or items from which a sample is drawn and about which conclusions are to be made.
population meanThe average of all values in an entire population, denoted as μ.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
sampleA subset of individuals or items selected from a population for the purpose of data collection and analysis.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
width of a confidence intervalThe range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval.
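
Because the width of a t-interval is 2 × (critical value) × s/√n, quadrupling the sample size roughly halves the width, and a larger critical value (higher confidence) widens it. For simplicity, this sketch holds the critical value fixed. In practice t* also shrinks slightly as the degrees of freedom grow. The critical values 1.98 and 2.63 are hypothetical stand-ins.

```python
from math import sqrt


def ci_width(critical_value: float, s: float, n: int) -> float:
    """Width of x-bar +/- critical_value * s / sqrt(n), i.e. twice the margin of error."""
    return 2 * critical_value * s / sqrt(n)


# Hypothetical values; the critical value is held fixed here for simplicity
w_100 = ci_width(critical_value=1.98, s=10.0, n=100)
w_400 = ci_width(critical_value=1.98, s=10.0, n=400)
w_99 = ci_width(critical_value=2.63, s=10.0, n=100)   # larger critical value (higher confidence)
print(round(w_100 / w_400, 6))   # 2.0: four times the sample size, half the width
print(w_99 > w_100)              # True: higher confidence, wider interval
```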

7.4 Setting Up a Test for a Population Mean

TermDefinition
10% conditionThe requirement that sample size n is at most 10% of the population size N to ensure independence when sampling without replacement.
alternative hypothesisThe claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
conditions for the testThe requirements that must be satisfied before conducting a hypothesis test for a population mean, including independence and normality of the sampling distribution.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
matched pairsPaired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values.
mean differenceThe average of the differences between paired observations, denoted by μd, where the order of subtraction must be clearly defined.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
one-sample t-testA hypothesis test used to determine whether a population mean differs from a hypothesized value when the population standard deviation is unknown.
outlierA data point that is unusually small or large relative to the rest of the data.
population meanThe average of all values in an entire population, denoted as μ.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
random sampleA sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
skewnessA measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center.
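
For t procedures, the normality condition is often checked with two rules of thumb. The sample size should be at least 30, or a small sample should show no outliers and no strong skewness. The sketch below flags outliers with the 1.5 × IQR rule. Quartile conventions differ slightly across software and calculators, and a graph of the data is still needed to judge skewness. The helper names and data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Values beyond 1.5 x IQR from the quartiles (the usual outlier rule)."""
    q1, _, q3 = quantiles(data, n=4)   # Python's default quartile method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


def normality_condition_ok(data) -> bool:
    """Hypothetical rule of thumb for t procedures on a single sample."""
    if len(data) >= 30:
        return True                    # large sample: x-bar approximately normal (CLT)
    # Small sample: require no outliers (a graph should also be checked for strong skewness)
    return not iqr_outliers(data)


sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]  # hypothetical
print(normality_condition_ok(sample))            # True
print(iqr_outliers(sample + [19.0]))             # [19.0]
```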

7.5 Carrying Out a Test for a Population Mean

TermDefinition
degrees of freedomA parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution.
matched pairsPaired observations where two measurements are taken on the same subject or on subjects that are matched according to specific criteria, used to analyze the mean difference between the paired values.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
p-valueThe probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
population meanThe average of all values in an entire population, denoted as μ.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
reject the null hypothesisThe decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
sample meanThe average of all values in a sample, denoted as x̄, used as an estimate of the population mean.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
significance levelThe threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
t-distributionA probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution.
test statisticA calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
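
For matched pairs, the test is a one-sample t-test on the differences. First fix the order of subtraction, then compute t = (d̄ - μ₀)/(s_d/√n) with df = n - 1. The sketch stops at the test statistic. The p-value requires a t-distribution, from a table or from software such as scipy.stats.t.sf, which is not used here. For a one-sided test at α = 0.05 with df = 5, a t-table gives t* ≈ 2.015. The data and helper name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after, mu0: float = 0.0):
    """Hypothetical helper: one-sample t-statistic on paired differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]   # order of subtraction: after - before
    n = len(diffs)
    d_bar = mean(diffs)                               # sample mean difference
    se = stdev(diffs) / sqrt(n)                       # standard error of d-bar
    t = (d_bar - mu0) / se
    return t, n - 1                                   # test statistic, degrees of freedom


before = [10, 12, 9, 11, 13, 10]   # hypothetical paired measurements
after = [12, 13, 11, 11, 15, 12]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} with df = {df}")
```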

7.6 Confidence Intervals for the Difference of Two Means

TermDefinition
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
confidence interval procedureA statistical method used to construct an interval estimate for a population parameter based on sample data.
critical valueA value from the standard normal distribution (z*) or a t-distribution (t*) that, multiplied by the standard error, gives the margin of error for a given confidence level.
degrees of freedomA parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution.
difference of population meansThe difference between the mean values of two distinct populations, calculated as μ₁ - μ₂.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
independent samplesTwo or more separate groups of data where the values in one group do not influence or depend on the values in another group.
margin of errorThe amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error.
normal distributionA probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population standard deviationsThe standard deviations σ₁ and σ₂ that measure the spread of each of two populations; when they are unknown, the sample standard deviations s₁ and s₂ are used as estimates.
quantitative variableA variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample meanThe average of all values in a sample, denoted as x̄, used as an estimate of the population mean.
sample standard deviationsThe measures of variability within each of the two samples, denoted as s₁ and s₂.
sample statisticA numerical value calculated from sample data that is used to estimate the corresponding population parameter.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
simple random sampleA sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
skewed distributionsDistributions that are not symmetric, with data concentrated on one side and a tail extending to the other side.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
t-distributionA probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution.
two-sample t-intervalA confidence interval procedure used to estimate the difference between two population means using sample data from two independent samples.
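
For two independent samples, the standard error of x̄₁ - x̄₂ is √(s₁²/n₁ + s₂²/n₂). Software usually takes the degrees of freedom from the Welch-Satterthwaite formula sketched below. A common hand-calculation alternative is the conservative df = min(n₁, n₂) - 1. The summary statistics, helper names, and the stand-in t* = 2.01 are hypothetical. A real t* should come from a t-table or software for the computed df.

```python
from math import sqrt


def two_sample_se_df(s1: float, n1: int, s2: float, n2: int):
    """Standard error of x1-bar - x2-bar and the Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


def two_sample_t_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_star):
    """Hypothetical helper: (x1-bar - x2-bar) +/- t* x SE."""
    se, _ = two_sample_se_df(s1, n1, s2, n2)
    diff = x1_bar - x2_bar
    return diff - t_star * se, diff + t_star * se


se, df = two_sample_se_df(s1=4.0, n1=25, s2=6.0, n2=30)   # hypothetical summaries
print(f"SE = {se:.4f}, df = {df:.1f}")
# t* should come from a t-table or software for this df; 2.01 is an approximate stand-in
print(two_sample_t_interval(52.0, 4.0, 25, 49.0, 6.0, 30, t_star=2.01))
```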

7.7 Justifying a Claim About the Difference of Two Means Based on a Confidence Interval

TermDefinition
confidence intervalA range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
difference in sample meansThe result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂.
difference of population meansThe difference between the mean values of two distinct populations, calculated as μ₁ - μ₂.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
random samplingA method of selecting samples from a population by a chance process, such as giving each member an equal chance of being chosen, which avoids selection bias and tends to produce samples representative of the population.
sample sizeThe number of observations or data points collected in a sample, denoted as n.
width of a confidence intervalThe range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval.

7.8 Setting Up a Test for the Difference of Two Population Means

TermDefinition
alternative hypothesisThe claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
approximately normalA distribution that closely follows the shape of a normal distribution, allowing for the use of normal probability methods.
difference of population meansThe difference between the mean values of two distinct populations, calculated as μ₁ - μ₂.
independenceThe condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
outlierA data point that is unusually small or large relative to the rest of the data.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
quantitative variableA variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacementA sampling method in which an item selected from a population cannot be selected again in subsequent draws.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
simple random sampleA sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
skewnessA measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center.
two-sample t-testA significance test that uses data from two independent random samples (or two groups in a randomized experiment) to determine whether there is convincing evidence of a difference between two population means.

7.9 Carrying Out a Test for the Difference of Two Population Means

TermDefinition
degrees of freedomA parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution.
difference in sample meansThe result of subtracting one sample mean from another sample mean, calculated as x̄₁ - x̄₂.
difference of population meansThe difference between the mean values of two distinct populations, calculated as μ₁ - μ₂.
normal distributionA probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
null hypothesisThe initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
p-valueThe probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
population meansThe average values of two distinct populations being compared, denoted as μ₁ and μ₂.
quantitative variableA variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
randomized experimentA study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
reject the null hypothesisThe decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
sampling distributionThe probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
significance levelThe threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance testA statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
simple random sampleA sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
standard errorAn estimate, computed from sample data, of the standard deviation of a statistic's sampling distribution; it measures how much the statistic typically varies across repeated samples.
statistical reasoningThe logical process of using sample data and significance test results to draw conclusions about populations and answer research questions.
t-distributionA probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution.
test statisticA calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
two-sample testA significance test used to compare the means of two different populations based on sample data from each population.
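
The two-sample t statistic compares the observed difference in sample means with the null difference (usually 0), measured in standard errors. The sketch below reports the conservative df = min(n₁, n₂) - 1. Software typically uses the Welch formula instead, which gives a larger df. The summary statistics and helper name are hypothetical.

```python
from math import sqrt


def two_sample_t_statistic(x1_bar, s1, n1, x2_bar, s2, n2, null_diff=0.0):
    """Hypothetical helper: t = ((x1-bar - x2-bar) - hypothesized mu1 - mu2) / SE."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of x1-bar - x2-bar
    return ((x1_bar - x2_bar) - null_diff) / se


t = two_sample_t_statistic(52.0, 4.0, 25, 49.0, 6.0, 30)   # hypothetical summaries
df_conservative = min(25, 30) - 1    # conservative df; software uses the Welch formula
print(f"t = {t:.3f}, conservative df = {df_conservative}")
```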

✳️Unit 8 – Chi–Squares

8.1 Introducing Statistics

TermDefinition
categorical dataData that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
expected countThe count that would be expected in each category (or, for a two-way table, each cell) if the null hypothesis were true; in a goodness-of-fit test, it equals the sample size times the category's null proportion.
observed countThe actual number of observations in each category (or, for a two-way table, each cell) in the collected data.
variationDifferences in data that occur by chance due to the random nature of sampling, rather than from systematic causes.

8.2 Setting Up a Chi Square Goodness of Fit Test

Term | Definition
alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
chi-square distributions | Probability distributions used to test the goodness of fit between observed and expected categorical data; they take only nonnegative values and are skewed to the right.
chi-square statistic | A test statistic that measures the distance between observed and expected counts relative to the expected counts, calculated as χ² = Σ(observed - expected)²/expected.
chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
degrees of freedom | A parameter that determines the shape of a chi-square distribution; for a chi-square goodness of fit test, the degrees of freedom equal the number of categories minus 1.
distribution of proportions | The way in which proportions are spread across the categories of a categorical variable.
expected count | The count that would be expected in each category if the null hypothesis were true; in a goodness of fit test, it equals the sample size times the null proportion for that category.
goodness of fit | A statistical test that determines how well observed data match the expected distribution specified by a hypothesis.
independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
null proportion | The hypothesized proportion for each category under the null hypothesis in a chi-square goodness of fit test.
observed count | The actual frequency or number of observations in each category from the collected data.
proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage.
random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
sample size | The number of observations or data points collected in a sample, denoted as n.
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
statistical inference | The process of drawing conclusions about a population based on data collected from a sample.
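The setup entries for a goodness of fit test (null proportions, expected counts, the chi-square statistic and degrees of freedom) can be tied together in a short sketch. It assumes the counts are supplied as plain lists in matching category order. The function name `gof_setup` and the die-rolling data are hypothetical.

```python
def gof_setup(observed, null_proportions):
    """Hypothetical helper: return (expected counts, chi-square statistic, degrees of freedom)."""
    if len(observed) != len(null_proportions):
        raise ValueError("need one null proportion per category")
    if abs(sum(null_proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    n = sum(observed)                                 # sample size
    expected = [n * p for p in null_proportions]      # expected count = n * null proportion
    # Chi-square statistic: sum of (observed - expected)^2 / expected over all categories
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                            # number of categories minus 1
    return expected, chi_sq, df


# Hypothetical example: a die rolled 60 times, testing H0 that all six faces are equally likely
expected, chi_sq, df = gof_setup([8, 12, 9, 11, 6, 14], [1 / 6] * 6)
```

In this example each expected count is 10, so χ² = (4 + 4 + 1 + 1 + 16 + 16)/10 = 4.2 with 5 degrees of freedom.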

8.3 Carrying Out a Chi Square Goodness of Fit Test

Term | Definition
chi-square distribution | A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics.
chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
degrees of freedom | A parameter that determines the shape of a chi-square distribution; for a chi-square goodness of fit test, the degrees of freedom equal the number of categories minus 1.
expected count | The count that would be expected in each category if the null hypothesis were true; in a goodness of fit test, it equals the sample size times the null proportion for that category.
null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true.
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
observed count | The actual frequency or number of observations in each category from the collected data.
p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions.
reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
theoretical distribution | A probability distribution based on a mathematical model, such as the normal distribution, used to approximate the distribution of a test statistic.
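Carrying out the test means finding the p-value as the upper-tail area of the chi-square distribution beyond the statistic, then comparing it with α. This standard-library-only sketch computes that tail area from the series for the regularized lower incomplete gamma function. The function names `chi2_sf` and `chi2_test_decision` are hypothetical. The series is intended for the moderate statistic values typical of textbook problems. In practice, a calculator, a table, or `scipy.stats.chi2.sf` gives the same tail area.

```python
from math import exp, lgamma, log


def chi2_sf(x, df, tol=1e-12, max_terms=10_000):
    """Hypothetical helper: upper-tail area P(X >= x) for a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    # Series: gamma(s, z) = z**s * e**(-z) * sum_{n>=0} z**n / (s (s+1) ... (s+n))
    term = total = 1.0 / s
    for n in range(1, max_terms):
        term *= z / (s + n)
        total += term
        if term < tol * total:
            break
    cdf = exp(s * log(z) - z - lgamma(s)) * total  # regularized lower incomplete gamma = CDF at x
    return max(0.0, 1.0 - cdf)


def chi2_test_decision(chi_sq, df, alpha=0.05):
    """Hypothetical helper: return (p_value, reject) for a chi-square test at significance level alpha."""
    p_value = chi2_sf(chi_sq, df)
    return p_value, p_value <= alpha
```

As a sanity check, with 1 degree of freedom a statistic of 3.841 gives a p-value of about 0.05, the familiar table critical value.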

8.4 Expected Counts in Two Way Tables

Term | Definition
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
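In a two-way table, each expected count equals (row total × column total) / table total. A minimal sketch, assuming the table is given as a list of rows of counts. The function name `expected_counts` and the example table are hypothetical.

```python
def expected_counts(table):
    """Hypothetical helper: expected count for each cell = (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) walks the columns
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table: row totals 30 and 70, column totals 40 and 60, table total 100
print(expected_counts([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
```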

8.5 Setting Up a Chi-Square Test for Homogeneity or Independence

Term | Definition
alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
categorical variable | A variable that takes on values that are category names or group labels rather than numerical values.
chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments.
chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent.
distribution | The pattern of how data values are spread or arranged across a range.
expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true.
homogeneity | In a chi-square test, the condition where the distribution of a categorical variable is the same across different groups or populations.
independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage.
randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
row and column variables | The two categorical variables displayed in a two-way table, with one variable defining the rows and the other defining the columns.
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
statistical inference | The process of drawing conclusions about a population based on data collected from a sample.
stratified random sample | A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
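Setting up a chi-square test includes checking conditions. Two of them can be checked numerically. The large counts condition requires every expected count to be at least 5. When sampling without replacement, the 10% condition requires the sample size to be at most 10% of the population size. The random condition comes from the study design (a random sample or a randomized experiment) and cannot be verified by code. This sketch assumes expected counts are passed as a list of rows. The function name `check_chi_square_conditions` is hypothetical.

```python
def check_chi_square_conditions(expected, sample_size=None, population_size=None):
    """Hypothetical helper: pass/fail flags for the numerically checkable chi-square conditions."""
    checks = {}
    # Large counts: every expected count should be at least 5
    checks["large_counts"] = all(e >= 5 for row in expected for e in row)
    # 10% condition: only relevant when sampling without replacement;
    # with several independent samples, apply it to each sample separately
    if sample_size is not None and population_size is not None:
        checks["ten_percent"] = sample_size <= 0.10 * population_size
    return checks


# Hypothetical example: expected counts from a 2x2 table, sample of 100 from a population of 5000
print(check_chi_square_conditions([[12, 18], [28, 42]], sample_size=100, population_size=5000))
```

For a goodness of fit test, the expected counts can be passed as a single row, e.g. `[[10, 10, 10]]`.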

8.6 Carrying Out a Chi-Square Test for Homogeneity or Independence

Term | Definition
chi-square distribution | A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics.
chi-square statistic | A test statistic that measures the distance between observed and expected counts relative to the expected counts, calculated as χ² = Σ(observed - expected)²/expected.
chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments.
chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent.
degrees of freedom | A parameter that determines the shape of a chi-square distribution; for a chi-square test for homogeneity or independence, the degrees of freedom equal (number of rows - 1) × (number of columns - 1).
expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true.
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data.
p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions.
reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
research question | The specific question about a population or populations that a statistical test is designed to answer.
significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.
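The full calculation for a test of homogeneity or independence runs from observed counts to expected counts, the chi-square statistic, (r - 1)(c - 1) degrees of freedom and the p-value. This self-contained, standard-library-only sketch does all four steps. It computes the upper-tail area with the same incomplete-gamma series idea a calculator or table would replace, and it assumes moderate statistic values. The function names and the example table are hypothetical.

```python
from math import exp, lgamma, log


def chi2_sf(x, df, tol=1e-12, max_terms=10_000):
    """Hypothetical helper: upper-tail area P(X >= x) for a chi-square distribution."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s  # series for the regularized lower incomplete gamma function
    for n in range(1, max_terms):
        term *= z / (s + n)
        total += term
        if term < tol * total:
            break
    return max(0.0, 1.0 - exp(s * log(z) - z - lgamma(s)) * total)


def chi_square_two_way(table):
    """Hypothetical helper: return (expected, chi_sq, df, p_value) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum (observed - expected)^2 / expected over every cell
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
    return expected, chi_sq, df, chi2_sf(chi_sq, df)


expected, chi_sq, df, p_value = chi_square_two_way([[10, 20], [30, 40]])
```

For the example table, χ² = 50/63 ≈ 0.79 with 1 degree of freedom, and the p-value is roughly 0.37. At α = 0.05 that would not lead to rejecting the null hypothesis.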

📈Unit 9 – Slopes

9.1 Introducing Statistics

Term | Definition
non-random variation | Variation in data points that follows a systematic or predictable pattern rather than occurring by chance.
scatter plots | A graph that displays the relationship between two quantitative variables, with each point representing an observation.
variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes.

9.2 Confidence Intervals for the Slope of a Regression Model

Term | Definition
confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
critical value | A value from the standard normal distribution (z*) or a t-distribution (t*) used to determine the margin of error for a given confidence level; confidence intervals for the slope of a regression model use t*.
explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable.
independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
linearity | The condition that the true relationship between two variables follows a straight line.
margin of error | The amount by which a sample statistic is likely to vary from the corresponding population parameter, calculated as the critical value times the standard error.
normality | The condition that data follows an approximately normal (bell-shaped) distribution.
population regression line | The true linear relationship μ_y = α + βx between the response and explanatory variables in the entire population.
random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x).
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
response variable | A variable whose values are being explained or predicted based on the explanatory variable.
sample regression line | The line ŷ = a + bx calculated from sample data that estimates the population regression line.
sample statistic | A numerical value calculated from sample data that is used to estimate the corresponding population parameter.
sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
skewed | A distribution that is not symmetric, with one tail longer or more pronounced than the other.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation.
standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population.
standard deviation of residuals | A measure of the spread of residuals around the regression line, estimated by s = √(Σ(y_i - ŷ_i)²/(n - 2)).
standard deviation of x values | A measure of the spread of the x-variable values in the sample, denoted as s_x in the standard error formula.
standard error of the slope | A measure of the variability of the slope estimate across different samples, calculated as s divided by (s_x times the square root of n - 1).
t-interval | A confidence interval procedure that uses the t-distribution, appropriate for estimating the slope of a regression model.
t* | The critical value from the t-distribution with n - 2 degrees of freedom used to construct a confidence interval for the slope of a regression model.
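The 9.2 entries combine into one calculation:

1. Fit the least-squares line ŷ = a + bx.
2. Estimate s = √(Σ(y_i - ŷ_i)²/(n - 2)).
3. Compute SE_b = s / (s_x√(n - 1)).
4. Form b ± t*·SE_b.

A minimal sketch, assuming t* has already been looked up for n - 2 degrees of freedom (3.182 is the 95% table value for df = 3). The function name `slope_confidence_interval` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def slope_confidence_interval(x, y, t_star):
    """Hypothetical helper: return (b, s, se_b, (lower, upper)) for a t-interval for the slope."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Least-squares slope b and intercept a
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]    # residual = y - ŷ
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))         # standard deviation of residuals
    se_b = s / (stdev(x) * sqrt(n - 1))                        # standard error of the slope
    margin = t_star * se_b                                     # margin of error
    return b, s, se_b, (b - margin, b + margin)


# Hypothetical data with n = 5, so df = 3 and t* = 3.182 for 95% confidence
b, s, se_b, interval = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
```

Here b = 0.6 and SE_b = √0.08 ≈ 0.283, so the 95% interval is roughly (-0.30, 1.50).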

9.3 Justifying a Claim About the Slope of a Regression Model Based on a Confidence Interval

Term | Definition
confidence interval | A range of values, calculated from sample data, that is likely to contain the true population parameter with a specified level of confidence.
population regression model | The true regression model for an entire population, as opposed to a sample-based regression model.
regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x).
repeated random sampling | The process of taking multiple random samples from a population, each of the same size, to understand the variability of sample statistics.
sample | A subset of individuals or items selected from a population for the purpose of data collection and analysis.
sample size | The number of observations or data points collected in a sample, denoted as n.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation.
width of a confidence interval | The range or span of a confidence interval, calculated as the difference between the upper and lower bounds of the interval.
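Justifying a claim with an interval comes down to two quick checks: the interval's width (upper bound minus lower bound), and whether the claimed slope lies inside it. A claimed value inside the interval is plausible. A value outside gives evidence against the claim. A minimal sketch, assuming the interval is supplied as a (lower, upper) pair. The function name `assess_slope_claim` and the example intervals are hypothetical.

```python
def assess_slope_claim(interval, claimed_slope=0.0):
    """Hypothetical helper: return (width, plausible) for a claimed population slope."""
    lower, upper = interval
    width = upper - lower                            # width = upper bound - lower bound
    plausible = lower <= claimed_slope <= upper      # inside the interval -> plausible value
    return width, plausible


# Hypothetical intervals: the first contains 0, so a slope of 0 (no linear relationship) is plausible
print(assess_slope_claim((-0.3, 1.5)))
print(assess_slope_claim((0.2, 0.9)))
```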

9.4 Setting Up a Test for the Slope of a Regression Model

Term | Definition
alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
linear relationship | A relationship between two variables that can be described by a straight line.
normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
outlier | A data point that is unusually small or large relative to the rest of the data.
random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x).
residual | The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
skewness | A measure of the asymmetry of a distribution, indicating whether data is concentrated more on one side of the center.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation.
standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population.
t-test for a slope | A hypothesis test used to determine whether the slope of a regression model is significantly different from zero, assessing whether there is a statistically significant linear relationship between variables.

9.5 Carrying Out a Test for the Slope of a Regression Model

Term | Definition
degrees of freedom | A parameter of the t-distribution that affects its shape; as degrees of freedom increase, the t-distribution approaches the normal distribution. For inference about the slope of a regression model, df = n - 2.
null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true.
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
population regression line | The true linear relationship μ_y = α + βx between the response and explanatory variables in the entire population.
regression model | A statistical model that describes the relationship between a response variable (y) and one or more explanatory variables (x).
reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
sampling distribution | The probability distribution of a sample statistic (such as a sample proportion) obtained from repeated sampling of a population.
significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
simple linear regression | A regression model that describes the linear relationship between one explanatory variable and one response variable.
slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
slope of a regression model | The coefficient that represents the rate of change in the predicted response variable for each unit increase in the explanatory variable in a linear regression equation.
standard error | An estimate, calculated from sample data, of the standard deviation of a sampling distribution; it measures the variability of a sample statistic across repeated samples.
t-distribution | A probability distribution used when the population standard deviation is unknown and the sample standard deviation is used instead, characterized by heavier tails than the normal distribution.
test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
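For a test of H₀: β = β₀, the test statistic is t = (b - β₀) / SE_b with df = n - 2. For a two-sided test, rejecting when |t| ≥ t* (the critical value for α/2 in each tail) gives the same decision as p-value ≤ α. This sketch assumes t* comes from a table or technology, because the Python standard library has no t-distribution. The function name `slope_t_test` and the inputs are hypothetical.

```python
def slope_t_test(b, se_b, n, t_star, beta_0=0.0):
    """Hypothetical helper: two-sided t test for a regression slope using a critical value t_star."""
    t = (b - beta_0) / se_b      # test statistic
    df = n - 2                   # degrees of freedom for slope inference
    reject = abs(t) >= t_star    # equivalent to p-value <= alpha for a two-sided test
    return t, df, reject


# Hypothetical inputs: b = 0.6, SE_b = √0.08, n = 5, 95% two-sided t* = 3.182 for df = 3
t, df, reject = slope_t_test(0.6, 0.08 ** 0.5, 5, 3.182)
```

Here t ≈ 2.12 falls short of t* = 3.182, so the null hypothesis of zero slope would not be rejected at α = 0.05.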
