AP Statistics Unit 1 Vocabulary

107 essential vocabulary terms and definitions for Unit 1 – Exploring One-Variable Data

1.10 The Normal Distribution

Term: Definition
approximately normally distributed: A description of data sets that closely follow the pattern of a normal distribution with a mound-shaped, symmetric curve.
empirical rule: A rule stating that for a normal distribution, approximately 68% of observations fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
mean: The average value of a dataset, represented by μ in the context of a population.
normal curve: The bell-shaped graph of a normal distribution that is symmetric and mound-shaped.
normal distribution: A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ).
normally distributed random variable: A random variable that follows a normal distribution, allowing for the calculation of probabilities for specific intervals.
parameter: A numerical summary that describes a characteristic of an entire population.
percentile: A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution.
population mean: The average of all values in an entire population, denoted as μ.
population means: The average values of two distinct populations being compared, denoted as μ₁ and μ₂.
population standard deviation: A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution.
proportion: A part or share of a whole, expressed as a fraction, decimal, or percentage.
relative position: The location of a data point within a data set, often expressed in comparison to other values or as a measure of how it ranks relative to the distribution.
standard deviation: A measure of how spread out data values are from the mean, represented by σ in the context of a population.
standard normal table: A reference table that provides the cumulative probabilities (areas under the curve) for the standard normal distribution.
z-score: A standardized score calculated as (xᵢ - μ)/σ that measures how many standard deviations a data value is from the mean.
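
The z-score formula and the empirical rule can be checked numerically. The sketch below is only an illustration: the population values (mean 100, standard deviation 15) and the helper names `z_score` and `area_within` are made up for this example, and it relies on `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) in place of a printed standard normal table.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical population parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
z = z_score(130.0, mu, sigma)  # (130 - 100) / 15

# The standard normal distribution has mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative area to the left of z, the quantity a standard normal table lists.
area_left = standard_normal.cdf(z)


def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Areas within 1, 2, and 3 standard deviations (roughly 68%, 95%, 99.7%).
empirical = {k: area_within(k) for k in (1, 2, 3)}

print(f"z = {z:.2f}, area to the left = {area_left:.4f}")
for k, area in empirical.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

Because `area_left` is a cumulative proportion, multiplying it by 100 gives the approximate percentile of the value 130 in this hypothetical population.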

1.2 The Language of Variation

Term: Definition
categorical variable: A variable that takes on values that are category names or group labels rather than numerical values.
quantitative variable: A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
variable: A characteristic that changes from one individual to another in a set of data.

1.3 Representing a Categorical Variable with Tables

Term: Definition
categorical data: Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
frequency table: A table that displays the number of cases or observations falling into each category.
percentage: A proportion expressed as a number out of 100, calculated by multiplying the relative frequency by 100.
proportion: A part or share of a whole, expressed as a fraction, decimal, or percentage.
rate: A ratio that compares two quantities with different units, often used to express frequency or occurrence per unit.
relative frequency: The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
relative frequency table: A table that displays the proportion or percentage of cases falling into each category.
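
As a rough illustration of how a frequency table, a relative frequency table, and percentages relate, the sketch below tallies a small made-up list of favorite colors. The data and variable names are hypothetical; `Counter` comes from Python's standard `collections` module.

```python
from collections import Counter

# Hypothetical categorical data: favorite colors of ten students.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "green", "blue", "red", "blue"]

# Frequency table: the count of observations in each category.
frequency = Counter(colors)
total = sum(frequency.values())

# Relative frequency table: each count divided by the total number of observations.
relative_frequency = {category: count / total
                      for category, count in frequency.items()}

# Percentage: the relative frequency multiplied by 100.
percentage = {category: 100 * share
              for category, share in relative_frequency.items()}

for category in sorted(frequency):
    print(f"{category}: count={frequency[category]}, "
          f"relative frequency={relative_frequency[category]:.2f}, "
          f"percentage={percentage[category]:.0f}%")
```

The relative frequencies across all categories add up to 1, and the percentages add up to 100.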

1.4 Representing a Categorical Variable with Graphs

Term: Definition
bar chart: A graph that displays frequencies or relative frequencies for categorical data using rectangular bars, where the height or length represents the count or proportion in each category.
bar graph: A graphical representation using rectangular bars to display the frequency or count of categories in a categorical variable.
categorical data: Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
frequencies: The count or number of observations falling within each category of categorical data.
frequency table: A table that displays the number of cases or observations falling into each category.
graphical representations: Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format.
relative frequency: The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.

1.5 Representing a Quantitative Variable with Graphs

Term: Definition
continuous variable: A variable that can take on infinitely many values that cannot be counted, with infinitely many possible values between any two given values.
cumulative graph: A graph that represents the number or proportion of a data set that is less than or equal to a given number.
discrete variable: A variable that can take on a countable number of values, which may be finite or countably infinite.
dotplot: A graph that represents each observation as a dot, with position on the horizontal axis corresponding to the data value, with nearly identical values stacked vertically.
histogram: A graph where the height of each bar represents the number or proportion of observations within an interval, with the ability to alter interval widths to change the appearance.
interval: A range of values between two boundaries, used to represent a set of outcomes in a normal distribution.
leaf: In a stem and leaf plot, usually the last digit of a data value.
quantitative variable: A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.
stem: In a stem and leaf plot, the first digit or digits of a data value.
stem and leaf plot: A graphical representation where each data value is split into a stem (first digit or digits) and a leaf (usually the last digit).
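
The stem and leaf split can be sketched in a few lines. This example assumes non-negative whole-number data and takes the leaf to be the last digit; the scores and the function name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (leading digits) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 75 -> stem 7, leaf 5
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical quiz scores.
scores = [62, 75, 71, 88, 93, 75, 84, 67]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

This sketch only prints stems that contain data. A hand-drawn plot would normally also list empty stems so that gaps in the distribution stay visible.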

1.6 Describing the Distribution of a Quantitative Variable

Term: Definition
bimodal: A distribution with two prominent peaks.
center: A measure indicating the middle or typical value of a distribution.
cluster: A concentration of data values in a distribution, usually separated from other values by gaps.
descriptive statistics: Methods used to summarize and describe the characteristics of a data set without making inferences about a larger population.
distribution: The pattern of how data values are spread or arranged across a range.
gap: A region of a distribution between two data values where there are no observed data.
outlier: A data point that is unusually small or large relative to the rest of the data.
quantitative data: Data that consists of numerical values that can be measured and analyzed mathematically.
shape: The overall form or pattern of a distribution, including characteristics like skewness and modality.
skewed left: A distribution with a longer tail extending to the left, where the mean is typically less than the median.
skewed right: A distribution with a longer tail extending to the right, where the mean is typically greater than the median.
symmetric: A distribution where the left half is the mirror image of the right half.
uniform: A distribution where each bar height is approximately the same with no prominent peaks.
unimodal: A distribution with one main peak.
variability: The spread or dispersion of data values in a distribution.

1.7 Summary Statistics for a Quantitative Variable

Term: Definition
first quartile: The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls.
interquartile range: A measure of variability calculated as the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of data.
mean: The average value of a dataset, represented by μ in the context of a population.
measures of center: Numerical summaries that describe the central tendency of a data set, including the mean and median.
measures of position: Numerical summaries that describe the location of data values within a distribution, including quartiles and percentiles.
measures of variability: Statistical measures that describe how spread out or dispersed data values are in a distribution.
median: The middle value when data are ordered; for an even number of data points, typically the average of the two middle values.
nonresistant: A characteristic of a statistic that is significantly affected or influenced by outliers; also called non-robust.
outlier: A data point that is unusually small or large relative to the rest of the data.
percentile: A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution.
Q1: The first quartile; the value below which 25% of the data falls.
Q3: The third quartile; the value below which 75% of the data falls.
quartile: One of the three values (Q1, the median, and Q3) that divide an ordered data set into four equal parts; Q1 and Q3 form the boundaries for the middle 50% of values.
range: A measure of variability calculated as the difference between the maximum and minimum data values in a dataset.
resistant: A characteristic of a statistic that is not greatly affected by outliers; also called robust.
sample standard deviation: The standard deviation calculated for a sample, denoted by s, using the formula s = √( (1/(n - 1)) ∑(xᵢ - x̄)² ).
sample variance: The square of the sample standard deviation, denoted by s², representing variability in squared units.
standard deviation: A measure of how spread out data values are from the mean, represented by σ in the context of a population.
statistic: A numerical summary calculated from sample data, such as a mean, median, or standard deviation.
third quartile: The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls.
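
The summary statistics defined above can be computed directly from their definitions. The sketch below is a minimal illustration on made-up data. It assumes the "median of each half" method for quartiles, leaving the overall median out of both halves when the number of values is odd; software packages often use other quartile conventions and may give slightly different results. The function name `quartiles` and both data sets are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)


# Hypothetical sample with one unusually large value (30).
data = [2, 4, 4, 5, 7, 9, 10, 12, 30]

q1, med, q3 = quartiles(data)
iqr = q3 - q1                      # interquartile range: Q3 - Q1
data_range = max(data) - min(data)  # range: maximum - minimum

# Sample standard deviation from the formula s = sqrt(sum((x - x_bar)^2) / (n - 1)).
x_bar = mean(data)
n = len(data)
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
sample_variance = s ** 2
s_library = stdev(data)  # the standard library's sample standard deviation

# Same sample with the extreme value replaced by 13.
tamer = [2, 4, 4, 5, 7, 9, 10, 12, 13]

print(f"Q1={q1}, median={med}, Q3={q3}, IQR={iqr}, range={data_range}")
print(f"s={s:.3f} (library: {s_library:.3f}), variance={sample_variance:.3f}")
print(f"mean with 30: {mean(data):.2f}, mean with 13: {mean(tamer):.2f}")
print(f"median with 30: {median(data)}, median with 13: {median(tamer)}")
```

Replacing the extreme value changes the mean but leaves the median at 7, which is what it means for the mean to be nonresistant and the median to be resistant.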

1.8 Graphical Representations of Summary Statistics

Term: Definition
boxplot: A graphical representation of the five-number summary showing the distribution of data through a box and whiskers.
first quartile: The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls.
five-number summary: A set of five values that describe a dataset: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
maximum: The largest value in a dataset.
mean: The average value of a dataset, represented by μ in the context of a population.
median: The middle value when data are ordered; for an even number of data points, typically the average of the two middle values.
minimum: The smallest value in a dataset.
outlier: A data point that is unusually small or large relative to the rest of the data.
quantitative data: Data that consists of numerical values that can be measured and analyzed mathematically.
quartile: One of the three values (Q1, the median, and Q3) that divide an ordered data set into four equal parts; Q1 and Q3 form the boundaries for the middle 50% of values.
skewed left: A distribution with a longer tail extending to the left, where the mean is typically less than the median.
skewed right: A distribution with a longer tail extending to the right, where the mean is typically greater than the median.
summary statistics: Numerical measures that describe key features of a dataset, such as center, spread, and shape.
symmetric distribution: A distribution where data is evenly distributed around the center, with the mean and median approximately equal.
third quartile: The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls.
whiskers: Lines extending from the ends of a boxplot that reach to the most extreme data points that are not outliers.
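
To make the link between the five-number summary and the whiskers concrete, the sketch below computes both for a small made-up data set. The definitions above do not say how outliers are identified, so this example assumes the common 1.5 × IQR rule (values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR count as outliers). The function names and the data are hypothetical, and the quartiles use the median-of-each-half method.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-each-half method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def whisker_ends(data, k=1.5):
    """Return (low whisker end, high whisker end, outliers) under the assumed k * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Whiskers reach the most extreme data points that are not outliers.
    return min(inside), max(inside), outliers


# Hypothetical data with one unusually large value.
data = [2, 4, 4, 5, 7, 9, 10, 12, 30]

summary = five_number_summary(data)
low, high, outliers = whisker_ends(data)

print("five-number summary:", summary)
print(f"whiskers extend from {low} to {high}; outliers: {outliers}")
```

Here the maximum (30) is flagged as an outlier, so the upper whisker stops at 12 instead of reaching the maximum.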

1.9 Comparing Distributions of a Quantitative Variable

Term: Definition
center: A measure indicating the middle or typical value of a distribution.
cluster: A concentration of data values in a distribution, usually separated from other values by gaps.
gap: A region of a distribution between two data values where there are no observed data.
graphical representations: Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format.
histogram: A graph where the height of each bar represents the number or proportion of observations within an interval, with the ability to alter interval widths to change the appearance.
independent samples: Two or more separate groups of data where the values in one group do not influence or depend on the values in another group.
mean: The average value of a dataset, represented by μ in the context of a population.
outlier: A data point that is unusually small or large relative to the rest of the data.
relative frequency: The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total.
side-by-side boxplots: A graphical representation that displays multiple boxplots arranged next to each other to compare the distributions of different groups or samples.
standard deviation: A measure of how spread out data values are from the mean, represented by σ in the context of a population.
summary statistics: Numerical measures that describe key features of a dataset, such as center, spread, and shape.
variability: The spread or dispersion of data values in a distribution.