| Term | Definition |
|---|---|
| approximately normally distributed | A description of data sets that closely follow the pattern of a normal distribution with a mound-shaped, symmetric curve. |
| empirical rule | A rule stating that for a normal distribution, approximately 68% of observations fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| normal curve | The bell-shaped graph of a normal distribution that is symmetric and mound-shaped. |
| normal distribution | A probability distribution that is mound-shaped and symmetric, characterized by a population mean (μ) and population standard deviation (σ). |
| normally distributed random variable | A random variable that follows a normal distribution, allowing for the calculation of probabilities for specific intervals. |
| parameter | A numerical summary that describes a characteristic of an entire population. |
| percentile | A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution. |
| population mean | The average of all values in an entire population, denoted as μ. |
| population means | The average values of two distinct populations being compared, denoted as μ₁ and μ₂. |
| population standard deviation | A measure of the spread or dispersion of all values in a population, denoted by σ, which is a parameter of the normal distribution. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| relative position | The location of a data point within a data set, often expressed in comparison to other values or as a measure of how it ranks relative to the distribution. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| standard normal table | A reference table that provides the cumulative probabilities (areas under the curve) for the standard normal distribution. |
| z-score | A standardized score calculated as (xᵢ - μ)/σ that measures how many standard deviations a data value is from the mean. |
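
The z-score and empirical rule entries above can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the helper names and the example values (μ = 100, σ = 15, x = 130) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# The standard normal distribution has mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Empirical rule: roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))

# Hypothetical example: a value of 130 in a population with mu=100, sigma=15.
z = z_score(130, mu=100, sigma=15)
# The cumulative area to the left of z is what a standard normal table lists.
percentile = standard_normal.cdf(z)
print(z, round(percentile, 4))
```

The cumulative area returned by `cdf` plays the role of a standard normal table lookup, so multiplying it by 100 gives the percentile of the value.
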

| Term | Definition |
|---|---|
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| variable | A characteristic that changes from one individual to another in a set of data. |

| Term | Definition |
|---|---|
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| frequency table | A table that displays the number of cases or observations falling into each category. |
| percentage | A proportion expressed as a number out of 100, calculated by multiplying the relative frequency by 100. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| rate | A ratio that compares two quantities with different units, often used to express frequency or occurrence per unit. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |
| relative frequency table | A table that displays the proportion or percentage of cases falling into each category. |
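
A frequency table and a relative frequency table can be built from raw categorical data in a few lines. This is a sketch only; the function names and the `colors` sample are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Count of observations in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Proportion of observations in each category (proportions sum to 1)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Hypothetical categorical data.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]

freq = frequency_table(colors)
rel_freq = relative_frequency_table(colors)

# Percentages are relative frequencies multiplied by 100.
percentages = {category: 100 * p for category, p in rel_freq.items()}
print(freq)
print(rel_freq)
print(percentages)
```
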

| Term | Definition |
|---|---|
| bar chart | A graph that displays frequencies or relative frequencies for categorical data using rectangular bars, where the height or length represents the count or proportion in each category. |
| bar graph | A graphical representation using rectangular bars to display the frequency or count of categories in a categorical variable. |
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| frequencies | The count or number of observations falling within each category of categorical data. |
| frequency table | A table that displays the number of cases or observations falling into each category. |
| graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |

| Term | Definition |
|---|---|
| continuous variable | A variable that can take on uncountably many values, with infinitely many possible values between any two given values. |
| cumulative graph | A graph that represents the number or proportion of a data set that is less than or equal to a given number. |
| discrete variable | A variable that can take on a countable number of values, which may be finite or countably infinite. |
| dotplot | A graph that represents each observation as a dot positioned on the horizontal axis according to its data value; identical or nearly identical values are stacked vertically. |
| histogram | A graph where the height of each bar represents the number or proportion of observations within an interval; changing the interval widths can change the graph's appearance. |
| interval | A range of values between two boundaries, used to represent a set of outcomes in a normal distribution. |
| leaf | In a stem and leaf plot, usually the last digit of a data value. |
| quantitative variable | A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis. |
| stem | In a stem and leaf plot, the first digit or digits of a data value. |
| stem and leaf plot | A graphical representation where each data value is split into a stem (first digit or digits) and a leaf (usually the last digit). |
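
The stem, leaf, and stem and leaf plot entries above can be illustrated with a short sketch. It assumes non-negative whole-number data and takes the last digit as the leaf; the function name and sample data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem).

    Assumes non-negative integers; this is an illustrative simplification.
    """
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data set.
scores = [12, 15, 21, 27, 27, 34, 9]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```
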

| Term | Definition |
|---|---|
| bimodal | A distribution with two prominent peaks. |
| center | A measure indicating the middle or typical value of a distribution. |
| cluster | A concentration of data, usually separated from other data by gaps in the distribution. |
| descriptive statistics | Methods used to summarize and describe the characteristics of a data set without making inferences about a larger population. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| gap | A region of a distribution between two data values where there are no observed data. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| quantitative data | Data that consists of numerical values that can be measured and analyzed mathematically. |
| shape | The overall form or pattern of a distribution, including characteristics like skewness and modality. |
| skewed left | A distribution with a longer tail extending to the left, where the mean is typically less than the median. |
| skewed right | A distribution with a longer tail extending to the right, where the mean is typically greater than the median. |
| symmetric | A distribution where the left half is the mirror image of the right half. |
| uniform | A distribution where each bar height is approximately the same with no prominent peaks. |
| unimodal | A distribution with one main peak. |
| variability | The spread or dispersion of data values in a distribution. |
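
The skewed left and skewed right entries above relate the mean to the median. The sketch below compares the two for small hypothetical data sets; it is a rough illustration of the typical pattern, not a test for skewness.

```python
from statistics import mean, median

def compare_center(values):
    """Return (mean, median) so the two measures of center can be compared."""
    return mean(values), median(values)

# Hypothetical data sets chosen to show each shape.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]

right_mean, right_median = compare_center(right_skewed)
left_mean, left_median = compare_center(left_skewed)
sym_mean, sym_median = compare_center(symmetric)

print(right_mean, right_median)  # mean is pulled above the median
print(left_mean, left_median)    # mean is pulled below the median
print(sym_mean, sym_median)      # mean and median agree
```
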

| Term | Definition |
|---|---|
| first quartile | The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls. |
| interquartile range | A measure of variability calculated as the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of data. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| measures of center | Numerical summaries that describe the central tendency of a data set, including the mean and median. |
| measures of position | Numerical summaries that describe the location of data values within a distribution, including quartiles and percentiles. |
| measures of variability | Statistical measures that describe how spread out or dispersed data values are in a distribution. |
| median | The middle value when data are ordered; for an even number of data points, typically the average of the two middle values. |
| nonresistant | A characteristic of a statistic that is significantly affected or influenced by outliers; also called non-robust. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| percentile | A value such that p% of the data is less than or equal to it, used to describe the position of a data point within a distribution. |
| Q1 | The first quartile; the value below which 25% of the data falls. |
| Q3 | The third quartile; the value below which 75% of the data falls. |
| quartile | A value that divides an ordered data set into four equal parts; Q1 and Q3 form the boundaries for the middle 50% of values. |
| range | A measure of variability calculated as the difference between the maximum and minimum data values in a dataset. |
| resistant | A characteristic of a statistic that is not greatly affected by outliers; also called robust. |
| sample standard deviation | The standard deviation calculated for a sample, denoted by s, using the formula s = √(∑(xᵢ - x̄)² / (n - 1)). |
| sample variance | The square of the sample standard deviation, denoted by s², representing variability in squared units. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| statistic | A numerical summary calculated from sample data, such as the sample mean, median, or standard deviation. |
| third quartile | The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls. |
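
The quartile, interquartile range, range, and sample standard deviation entries above can be computed directly from their definitions. This is a minimal sketch: the function names and data are hypothetical, and for an odd number of values it assumes the median is excluded from both halves, which is one common convention among several.

```python
import math
from statistics import median, stdev

def quartiles(values):
    """Return (Q1, median, Q3), with Q1 and Q3 as medians of the two halves.

    Assumption: with an odd number of values, the middle value is left out
    of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x_i - x_bar)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

# Hypothetical data set.
data = [2, 4, 4, 5, 7, 9, 10, 12]

q1, med, q3 = quartiles(data)
iqr = q3 - q1                          # spread of the middle 50% of the data
data_range = max(data) - min(data)     # maximum minus minimum
s = sample_sd(data)
sample_variance = s ** 2               # variability in squared units

print(q1, med, q3, iqr, data_range)
print(s, stdev(data))                  # the two values should agree
```

The median and IQR are resistant to outliers, while the mean, range, and standard deviation are not, which is why both kinds of summary are usually reported together.
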

| Term | Definition |
|---|---|
| boxplot | A graphical representation of the five-number summary showing the distribution of data through a box and whiskers. |
| first quartile | The median of the lower half of an ordered data set, denoted as Q1, marking the boundary below which 25% of the data falls. |
| five-number summary | A set of five values that describe a dataset: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. |
| maximum | The largest value in a dataset. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| median | The middle value when data are ordered; for an even number of data points, typically the average of the two middle values. |
| minimum | The smallest value in a dataset. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| quantitative data | Data that consists of numerical values that can be measured and analyzed mathematically. |
| quartile | A value that divides an ordered data set into four equal parts; Q1 and Q3 form the boundaries for the middle 50% of values. |
| skewed left | A distribution with a longer tail extending to the left, where the mean is typically less than the median. |
| skewed right | A distribution with a longer tail extending to the right, where the mean is typically greater than the median. |
| summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape. |
| symmetric distribution | A distribution where data is evenly distributed around the center, with the mean and median approximately equal. |
| third quartile | The median of the upper half of an ordered data set, denoted as Q3, marking the boundary below which 75% of the data falls. |
| whiskers | Lines extending from each end of the box in a boxplot to the most extreme data points that are not outliers. |
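
The boxplot, five-number summary, and whiskers entries above can be tied together in a short sketch. The glossary does not say how outliers are identified; the code assumes the common 1.5 × IQR rule, and the function names, the `k` parameter, and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

def boxplot_summary(values, k=1.5):
    """Five-number summary, whisker ends, and outliers.

    Assumption: a point is an outlier if it lies more than k * IQR
    below Q1 or above Q3 (k = 1.5 is a common convention).
    """
    data = sorted(values)
    q1, med, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "five_number": (data[0], q1, med, q3, data[-1]),
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outliers
        "outliers": outliers,
    }

# Hypothetical data set with one unusually large value.
summary = boxplot_summary([2, 4, 4, 5, 7, 9, 10, 30])
print(summary)
```
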

| Term | Definition |
|---|---|
| center | A measure indicating the middle or typical value of a distribution. |
| cluster | A concentration of data, usually separated from other data by gaps in the distribution. |
| gap | A region of a distribution between two data values where there are no observed data. |
| graphical representations | Visual displays such as bar charts, pie charts, or other graphs used to present data in a visual format. |
| histogram | A graph where the height of each bar represents the number or proportion of observations within an interval; changing the interval widths can change the graph's appearance. |
| independent samples | Two or more separate groups of data where the values in one group do not influence or depend on the values in another group. |
| mean | The average value of a dataset, represented by μ in the context of a population. |
| outlier | A data point that is unusually small or large relative to the rest of the data. |
| relative frequency | The proportion of observations in a category, expressed as a decimal, fraction, or percentage of the total. |
| side-by-side boxplots | A graphical representation that displays multiple boxplots arranged next to each other to compare the distributions of different groups or samples. |
| standard deviation | A measure of how spread out data values are from the mean, represented by σ in the context of a population. |
| summary statistics | Numerical measures that describe key features of a dataset, such as center, spread, and shape. |
| variability | The spread or dispersion of data values in a distribution. |