🎳Intro to Econometrics Unit 5 Review

5.3 Chi-square tests

Written by the Fiveable Content Team • Last updated August 2025

Chi-square test overview

Chi-square tests are non-parametric statistical tests used to analyze categorical data. They compare observed frequencies (what you actually count in your data) to expected frequencies (what you'd expect if the null hypothesis were true). The gap between those two sets of numbers drives the entire test.

In econometrics, chi-square tests show up in three main contexts:

  • Goodness of fit: Does your data match a hypothesized distribution?
  • Independence: Are two categorical variables related or unrelated?
  • Homogeneity: Do different populations share the same distribution of a categorical variable?

Hypothesis testing with chi-square

Like any hypothesis test, you start by setting up two competing claims:

  • The null hypothesis (H_0) states there's no significant association or difference.
  • The alternative hypothesis (H_a) states there is a significant association or difference.

You then calculate a test statistic from your data, and either compare it to a critical value from the chi-square distribution table or look at the p-value. If the test statistic is large enough (or the p-value small enough), you reject H_0.

Chi-square distribution properties

The chi-square distribution is a continuous probability distribution that arises from summing squared standard normal random variables. A few key features:

  • It's always right-skewed and non-negative (values range from 0 to infinity).
  • Its shape depends entirely on the degrees of freedom (df). With low df, the distribution is heavily skewed right. As df increases, it becomes more symmetric and starts to resemble a normal distribution.
  • You'll use chi-square distribution tables (or software) to find critical values for your tests.

Degrees of freedom in chi-square

Degrees of freedom (df) represent the number of independent pieces of information that go into calculating the test statistic. The formula depends on which test you're running:

  • Goodness of fit: df = k - 1, where k is the number of categories.
  • Independence or homogeneity (contingency table): df = (r - 1) \times (c - 1), where r is the number of rows and c is the number of columns.

The degrees of freedom determine which chi-square distribution you compare your test statistic against, so getting this right is essential.
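
As a sketch of how this works in practice, the critical values for these degrees of freedom can be looked up with software instead of a printed table. The category and table sizes below are made up for illustration, and SciPy is assumed to be available:

```python
# Sketch: chi-square critical values at alpha = 0.05 via SciPy,
# replacing a printed distribution-table lookup.
from scipy.stats import chi2

alpha = 0.05

# Goodness of fit with k = 6 categories: df = k - 1 = 5
crit_gof = chi2.ppf(1 - alpha, 6 - 1)

# Independence test on a 3x4 contingency table: df = (3 - 1) * (4 - 1) = 6
crit_ind = chi2.ppf(1 - alpha, (3 - 1) * (4 - 1))

print(round(crit_gof, 3))  # 11.07
print(round(crit_ind, 3))  # 12.592
```

These match the familiar table values for df = 5 and df = 6 at the 5% significance level.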

Chi-square goodness of fit test

The goodness of fit test checks whether the observed frequency distribution of a single categorical variable matches a hypothesized theoretical distribution (uniform, normal, Poisson, etc.). You're asking: "Could my data have plausibly come from this distribution?"

Observed vs expected frequencies

  • Observed frequencies (O_i) are the actual counts you collect from your data for each category.
  • Expected frequencies (E_i) are what you'd predict for each category if the hypothesized distribution were correct.

To calculate expected frequencies, multiply the total sample size by the probability of each category under the hypothesized distribution:

E_i = N \times p_i

where N is the total sample size and p_i is the hypothesized probability for category i.

Calculating the chi-square statistic

The test statistic measures how far your observed data deviates from what's expected:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

Here's the step-by-step process:

  1. For each category, subtract the expected frequency from the observed frequency: (O_i - E_i).

  2. Square that difference: (O_i - E_i)^2.

  3. Divide by the expected frequency: \frac{(O_i - E_i)^2}{E_i}.

  4. Sum across all k categories.

A larger \chi^2 value means a bigger discrepancy between what you observed and what the hypothesized distribution predicts.
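
The steps above can be sketched with scipy.stats.chisquare. The die-roll counts below are made-up illustration data, not from the text:

```python
# Sketch: goodness-of-fit test for a fair die (hypothesized uniform
# distribution). Observed counts are hypothetical illustration data.
from scipy.stats import chisquare

observed = [8, 12, 9, 11, 14, 6]   # rolls landing on faces 1-6, N = 60
expected = [10] * 6                # E_i = N * p_i = 60 * (1/6) per face

# chisquare computes sum((O_i - E_i)^2 / E_i) with df = k - 1 = 5
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 2))  # 4.2
print(p > 0.05)        # True: fail to reject H_0 (consistent with a fair die)
```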

Interpreting the p-value

The p-value tells you the probability of getting a chi-square statistic at least as large as yours, assuming H_0 is true.

  • p-value < 0.05 (at the 5% significance level): Reject H_0. Your data doesn't fit the hypothesized distribution.
  • p-value ≥ 0.05: Fail to reject H_0. Your data is consistent with the hypothesized distribution.

Note that failing to reject doesn't prove the distribution is correct. It just means you don't have enough evidence to say it's wrong.

Limitations of goodness of fit test

  • Expected frequency rule: Each category should have an expected frequency of at least 5. If not, consider combining categories or using an alternative test.
  • No direction or magnitude info: The test tells you that the fit is poor, not where or how much it's off in a meaningful way.
  • Sensitive to category choices: How you define your categories can change the result. Arbitrary binning decisions matter.
  • Alternatives for small samples: Fisher's exact test or the likelihood ratio test may be more reliable when expected frequencies are low.

Chi-square test for independence

This test determines whether two categorical variables are independent or associated. For example, you might ask: "Is there a relationship between a consumer's income bracket and their preferred payment method?"

Contingency tables for categorical data

A contingency table (also called a cross-tabulation) organizes the data for two categorical variables. Rows represent one variable's categories, columns represent the other's, and each cell contains the observed count for that combination.

The marginal totals (row sums and column sums) along the edges are critical because you'll use them to calculate expected frequencies.

Null vs alternative hypotheses

  • H_0: The two variables are independent. Knowing the category of one variable tells you nothing about the other.
  • H_a: The two variables are dependent (associated). The distribution of one variable changes depending on the category of the other.

Assumptions of the test

Before running the test, verify these conditions:

  • The sample is randomly selected from the population.
  • Observations are independent of each other (one observation doesn't influence another).
  • Expected frequencies in each cell should be at least 5. If this isn't met, consider Fisher's exact test or collapse categories.

Calculating expected frequencies

If the two variables truly are independent, the expected frequency for any cell is:

E_{ij} = \frac{R_i \times C_j}{N}

where R_i is the total for row i, C_j is the total for column j, and N is the grand total.

The logic: under independence, the proportion in any cell should just reflect the product of the marginal proportions.

Computing the chi-square statistic

The formula extends naturally from the goodness of fit version, but now you sum over every cell in the table:

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

Steps:

  1. Calculate the expected frequency for each cell using the formula above.

  2. For each cell, compute (O_{ij} - E_{ij})^2 / E_{ij}.

  3. Sum all those values across every cell in the table.

Determining the critical value

The critical value comes from the chi-square distribution table, using:

  • Degrees of freedom: df = (r - 1) \times (c - 1)
  • Significance level: typically \alpha = 0.05

If your calculated \chi^2 exceeds the critical value, you reject H_0 and conclude the variables are associated.
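
A minimal sketch of the full independence test using scipy.stats.chi2_contingency; the 2×3 table of counts is hypothetical (say, income bracket in rows, preferred payment method in columns):

```python
# Sketch: chi-square test of independence on a hypothetical 2x3 table.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 20, 10],
                  [20, 30, 40]])

stat, p, df, expected = chi2_contingency(table, correction=False)
print(df)               # (2 - 1) * (3 - 1) = 2
print(expected[0, 0])   # R_1 * C_1 / N = 60 * 50 / 150 = 20.0
print(p < 0.05)         # True: reject H_0, the variables are associated
```

Note that chi2_contingency computes the expected frequencies for you from the marginal totals, exactly as in the formula above.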

[Figure: hypothesis testing with chi-square, from "Why It Matters: Chi-Square Tests," Concepts in Statistics]

Making decisions based on p-value

You can also use the p-value approach:

  • p-value < \alpha: Reject H_0. There's a statistically significant association between the variables.
  • p-value ≥ \alpha: Fail to reject H_0. You don't have sufficient evidence of an association.

Both the critical value approach and the p-value approach will always give you the same conclusion. Use whichever your course emphasizes.

Chi-square test for homogeneity

The homogeneity test compares the distribution of a single categorical variable across two or more separate populations. For instance: "Do consumers in different age groups have the same distribution of brand preferences?"

Comparing multiple populations

The data is organized in a contingency table just like the independence test. The difference is conceptual: here, each column (or row) represents a distinct population that was sampled separately, and you're comparing their distributions.

Null vs alternative hypotheses

  • H_0: All populations have the same distribution of the categorical variable.
  • H_a: At least one population has a different distribution.

Calculating the test statistic

The mechanics are identical to the independence test:

  1. Compute expected frequencies: E_{ij} = \frac{R_i \times C_j}{N}

  2. Calculate \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

  3. Degrees of freedom: df = (r - 1) \times (c - 1)

The formulas are the same. What changes is the research question and how the data was collected (separate samples from distinct populations vs. one sample classified on two variables).

Interpreting the results

  • If \chi^2 exceeds the critical value (or p-value < 0.05), reject H_0. The populations don't share the same distribution.
  • If you fail to reject, the data is consistent with the populations being homogeneous.

Rejecting H_0 tells you the distributions differ, but it doesn't tell you which populations differ from each other. Follow-up pairwise comparisons may be needed.
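
Computationally this is the same call as the independence test; a sketch with three separately sampled groups (all counts hypothetical):

```python
# Sketch: homogeneity test. Each row is a separately drawn sample
# (an age group); columns are brand preferences. Counts are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency

samples = np.array([[25, 15, 10],   # ages 18-34, n = 50
                    [20, 20, 10],   # ages 35-54, n = 50
                    [15, 25, 10]])  # ages 55+,   n = 50

stat, p, df, _ = chi2_contingency(samples, correction=False)
print(df)         # (3 - 1) * (3 - 1) = 4
print(p >= 0.05)  # True: fail to reject H_0, consistent with homogeneity
```

The difference from the independence test lives in the sampling design and the conclusion you draw, not in the arithmetic.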

Applications of chi-square tests

Market research and consumer preferences

Chi-square tests help researchers determine whether consumer preferences differ across demographic groups. For example, a test for independence could examine whether product choice is associated with age group. A test for homogeneity could compare brand preference distributions across income levels to guide marketing strategy.

Quality control and defect analysis

In manufacturing, the goodness of fit test can check whether defect counts follow a Poisson distribution, which would suggest the process is stable. The independence test can explore whether defect types are related to production factors like shift or machine, helping pinpoint quality problems.

Demographic and social science research

Researchers use chi-square tests to study relationships between categorical variables like education level and employment status. The homogeneity test can compare characteristics across populations (urban vs. rural, different regions) to identify disparities that inform policy.

Limitations and alternatives to chi-square

Small sample size and low expected frequencies

The chi-square approximation breaks down when expected cell frequencies are too small (below 5). Results become unreliable because the test statistic no longer follows the chi-square distribution closely. When this happens, you need an alternative approach.

Fisher's exact test for small samples

Fisher's exact test calculates the exact probability of observing your data (or something more extreme) under H_0, rather than relying on the chi-square approximation. It's most commonly applied to 2×2 contingency tables with small samples. The trade-off: it can be computationally intensive for larger tables, but it gives accurate results when the chi-square test can't.
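
A sketch using scipy.stats.fisher_exact on a small hypothetical 2×2 table (N = 16) where three of the four expected frequencies fall below 5, so the chi-square approximation would be unreliable:

```python
# Sketch: Fisher's exact test on a small hypothetical 2x2 table.
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 5]]

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(round(p, 3))  # 0.035
print(p < 0.05)     # True: reject H_0 despite the tiny sample
```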

Yates' correction for continuity

Yates' correction adjusts for the fact that the chi-square distribution is continuous while your data is discrete. It subtracts 0.5 from the absolute difference between observed and expected frequencies before squaring:

\chi^2_{Yates} = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i}

This correction is typically applied to 2×2 tables when sample sizes are moderate. Be aware that it can be overly conservative, making it harder to reject H_0.
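
To see the conservatism directly, here is a sketch comparing the same hypothetical 2×2 table with and without the correction (SciPy applies Yates' correction via the correction flag):

```python
# Sketch: the effect of Yates' continuity correction on a 2x2 table.
# Counts are hypothetical illustration data.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[15, 5],
                  [10, 20]])

stat_plain, p_plain, _, _ = chi2_contingency(table, correction=False)
stat_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

# The corrected statistic is smaller, so the corrected p-value is larger
# (more conservative).
print(stat_yates < stat_plain)  # True
print(p_yates > p_plain)        # True
```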

Likelihood ratio tests as an alternative

The likelihood ratio test (LRT) compares the likelihood of the data under H_0 to the likelihood under H_a. The test statistic is:

G^2 = 2 \sum O_{ij} \ln\left(\frac{O_{ij}}{E_{ij}}\right)

LRT tends to perform better than the standard chi-square test when sample sizes are small or data is sparse. It follows the same chi-square distribution asymptotically, so you interpret it the same way (compare to a critical value or use a p-value). The downside is that it can require more computational effort.
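
A sketch of the G^2 statistic, computed both by hand from the formula above and via SciPy's lambda_="log-likelihood" option (the counts are hypothetical):

```python
# Sketch: likelihood ratio (G^2) statistic on a hypothetical 2x3 table,
# by hand and via scipy.stats.chi2_contingency.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 20, 10],
                  [20, 30, 40]])

# SciPy computes G^2 when lambda_="log-likelihood"
g2, p, df, expected = chi2_contingency(table, lambda_="log-likelihood")

# Same statistic by hand: G^2 = 2 * sum(O * ln(O / E))
g2_manual = 2 * np.sum(table * np.log(table / expected))
print(round(g2, 6) == round(g2_manual, 6))  # True
print(p < 0.05)                             # True: same interpretation as chi-square
```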