| Term | Definition |
|---|---|
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data. |
| variation | Differences in data that occur by chance due to the random nature of sampling, rather than from systematic causes. |
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| chi-square distributions | A family of right-skewed probability distributions, one for each number of degrees of freedom, that take only nonnegative values and serve as the reference distributions for chi-square test statistics. |
| chi-square statistic | A test statistic that measures how far the observed counts fall from the expected counts, relative to the expected counts: the sum of (observed - expected)² / expected over all categories. |
| chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
| degrees of freedom | A parameter of the chi-square distribution that determines its shape; for a goodness-of-fit test it equals the number of categories minus 1. |
| distribution of proportions | The way in which proportions are spread across the categories of a categorical variable. |
| expected count | The count that would be expected in each category if the null hypothesis were true; in a goodness-of-fit test, the sample size multiplied by that category's null proportion. |
| goodness of fit | A statistical test that determines how well observed data match the expected distribution specified by a hypothesis. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| null proportion | The hypothesized proportion for each category under the null hypothesis in a chi-square goodness of fit test. |
| observed count | The actual frequency or number of observations in each category from the collected data. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| sample size | The number of observations or data points collected in a sample, denoted as n. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
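
As an illustration of how the sample size, null proportions, and expected counts in this table fit together, here is a minimal Python sketch. The data and the helper name `expected_counts` are hypothetical, and the check that every expected count is at least 5 is a commonly taught rule of thumb assumed here rather than something stated in the glossary.

```python
def expected_counts(n, null_proportions):
    """Expected count for each category: sample size times its null proportion."""
    if abs(sum(null_proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    return [n * p for p in null_proportions]


# Hypothetical data: observed counts in three categories and the
# proportions claimed by the null hypothesis.
observed = [52, 110, 38]
null_props = [0.30, 0.50, 0.20]

n = sum(observed)                        # sample size
expected = expected_counts(n, null_props)

# Assumed rule of thumb: every expected count should be at least 5
# before the chi-square distribution is used as an approximation.
large_counts_ok = all(e >= 5 for e in expected)
```

Note that the condition is checked on the expected counts, not the observed counts.
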
| Term | Definition |
|---|---|
| chi-square distribution | A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics. |
| chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
| degrees of freedom | A parameter of the chi-square distribution that determines its shape; for a goodness-of-fit test it equals the number of categories minus 1. |
| expected count | The count that would be expected in each category if the null hypothesis were true; in a goodness-of-fit test, the sample size multiplied by that category's null proportion. |
| null distribution | The probability distribution of the test statistic under the assumption that the null hypothesis is true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| observed count | The actual frequency or number of observations in each category from the collected data. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| significance test | A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| theoretical distribution | A probability distribution based on a mathematical model, such as the normal distribution, used to approximate the distribution of a test statistic. |
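
The terms above combine into a goodness-of-fit calculation roughly as follows. This is a sketch under stated assumptions: the counts are made up, `chi2_sf` is a hypothetical helper that evaluates the chi-square upper tail in closed form for whole-number degrees of freedom, and the significance level of 0.05 is chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square distribution
    with a positive whole number of degrees of freedom (hypothetical helper)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


# Hypothetical data: observed counts and null proportions for three categories.
observed = [52, 110, 38]
null_props = [0.30, 0.50, 0.20]
n = sum(observed)
expected = [n * p for p in null_props]

# Test statistic: sum of (observed - expected)^2 / expected
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # number of categories minus 1
p_value = chi2_sf(statistic, df)

alpha = 0.05                             # assumed significance level
reject_null = p_value <= alpha
```

Only the upper tail is used because large values of the statistic, in either direction of disagreement, count as evidence against the null hypothesis.
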
| Term | Definition |
|---|---|
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| expected count | The count expected in each cell of a two-way table if the null hypothesis of independence or homogeneity were true, computed as (row total × column total) / table total. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
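
To make the expected-count idea for a two-way table concrete, the following Python sketch computes an expected count for every cell as (row total × column total) / table total. The table values and the function name `expected_table` are hypothetical and serve only as an illustration.

```python
def expected_table(observed):
    """Expected count per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical 2x3 two-way table of observed counts
# (rows: two groups, columns: three response categories).
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

expected = expected_table(observed)
for row in expected:
    print(row)
```

The expected table keeps the same row and column totals as the observed table; only the way counts are spread across the cells changes.
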
| Term | Definition |
|---|---|
| alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
| association | The relationship between two variables where knowing the value of one variable provides information about the other variable. |
| categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
| categorical variable | A variable that takes on values that are category names or group labels rather than numerical values. |
| chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
| chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments. |
| chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent. |
| distribution | The pattern of how data values are spread or arranged across a range. |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| homogeneity | In a chi-square test, the condition where the distribution of a categorical variable is the same across different groups or populations. |
| independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
| randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
| row and column variables | The two categorical variables displayed in a two-way table, with one variable defining the rows and the other defining the columns. |
| sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
| simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen. |
| statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
| stratified random sample | A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
| Term | Definition |
|---|---|
| chi-square distribution | A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics. |
| chi-square statistic | A test statistic that measures how far the observed counts fall from the expected counts, relative to the expected counts: the sum of (observed - expected)² / expected over all cells of the two-way table. |
| chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments. |
| chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent. |
| degrees of freedom | A parameter of the chi-square distribution that determines its shape; for a test on a two-way table it equals (number of rows - 1) × (number of columns - 1). |
| expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true. |
| null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
| observed count | The actual frequency or number of observations in each cell of a contingency table from the collected data. |
| p-value | The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true. |
| probability model | A mathematical framework that describes the probability distribution of outcomes under specified assumptions. |
| reject the null hypothesis | The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis. |
| research question | The specific question about a population or populations that a statistical test is designed to answer. |
| significance level | The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test. |
| test statistic | A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data. |
| two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns. |
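
A test for independence or homogeneity on a two-way table can be sketched in Python along these lines. The data, the function name `chi_square_two_way`, and the significance level of 0.05 are assumptions made for illustration. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper tail equals exp(-x/2); other degrees of freedom need a general chi-square routine.

```python
import math


def chi_square_two_way(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis
            expected = row_totals[i] * col_totals[j] / table_total
            statistic += (obs - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2x3 two-way table of observed counts.
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

statistic, df = chi_square_two_way(observed)

# Shortcut valid only when df == 2: P(X >= x) = exp(-x / 2).
if df != 2:
    raise ValueError("closed-form p-value below assumes df == 2")
p_value = math.exp(-statistic / 2)

alpha = 0.05                             # assumed significance level
reject_null = p_value <= alpha
```

The same calculation serves both tests; what differs is how the data were collected and how the hypotheses are worded.
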