8.6 Carrying Out a Chi-Square Test for Homogeneity or Independence

3 min read, January 7, 2023

Jed Quiaoit

Josh Argo

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F8-ZrRrRA3LC38j.png?alt=media&token=abf66b1a-bf3e-4c46-8a28-ea6384a58182

Source: Aponia Data

Test, Test, Test!

Now that we have chosen the correct test, checked our necessary conditions, and written our hypotheses for our test, it is now time to actually carry out the test! As with our GOF test, the test will consist of two mathematical elements: the test statistic (χ2 statistic) and our p-value. 😳

Test Statistic

The next step is to calculate the test statistic, which in this case is the chi-squared statistic. This is done by comparing the observed frequencies in the contingency table to the expected frequencies, which are calculated based on the assumption that the null hypothesis is true. For each cell, the expected frequency is (row total × column total) / table total. The formula for the chi-squared statistic is: 🪑

χ2 = ∑ (O - E)^2 / E,

where O is the observed frequency and E is the expected frequency. The sum is taken over all cells in the contingency table.

The formula for our χ2 value also appears on the formula sheet given for the exam. A much easier way of finding the test statistic is to use our graphing calculator. 📱
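For readers who want to see the arithmetic behind the calculator output, here is a minimal Python sketch of the χ2 formula above. The counts in `observed` are made up for illustration, and the helper names `expected_counts` and `chi_square_statistic` are hypothetical. Each expected count is taken as (row total × column total) / table total, the usual calculation under the null hypothesis.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Made-up 2x2 table of counts (rows = groups, columns = responses).
observed = [[30, 20],
            [20, 30]]

print(expected_counts(observed))       # every expected count is 25.0
print(chi_square_statistic(observed))  # 4 cells * (5^2 / 25) = 4.0
```

The table should contain only the category counts; any "Total" row or column is left out before passing it in.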

Degrees of Freedom

Our degrees of freedom are found by multiplying the number of rows minus 1 by the number of columns minus 1. 🔐

(number of rows - 1) * (number of columns - 1)

Let's say that you have a contingency table with 3 rows and 4 columns. To find the degrees of freedom, you would first subtract 1 from the number of rows to get 3 - 1 = 2. Then, you would subtract 1 from the number of columns to get 4 - 1 = 3. Finally, you would multiply these two values together to get the degrees of freedom, which in this case would be 2 * 3 = 6.

Hence, the degrees of freedom for a contingency table with 3 rows and 4 columns would be 6.
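The same calculation can be sketched in a few lines of Python. The function name `degrees_of_freedom` is hypothetical, and the table below is a placeholder whose cell values do not matter; only its shape is used.

```python
def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1) for a two-way table."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Placeholder table with 3 rows and 4 columns; only its shape matters here.
table = [[0] * 4 for _ in range(3)]
print(degrees_of_freedom(table))  # (3 - 1) * (4 - 1) = 6
```

Only the category rows and columns count toward the shape, so a "Total" row or column should be dropped first.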

P-Value

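Once you finally get your χ2 value, you find your p-value: the probability, assuming the null hypothesis is true, of getting a χ2 value at least as large as the one you calculated. This probability comes from the χ2 distribution with the degrees of freedom found above. As always, if our p is low, we reject the H0. 🅿️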

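As mentioned above, the best way of doing all of this together is using your graphing calculator and performing the χ2 test for two-way tables (on TI calculators this is typically the χ2-Test option, which takes the observed counts as a matrix, rather than the χ2 GOF-Test). Just be sure to write out your χ2 value, your degrees of freedom, and your p-value from your calculator output.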
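To show what the calculator is doing when it reports a p-value, here is a rough Python sketch that uses only the standard library. The function name `chi_square_p_value` is hypothetical. It relies on the closed-form series for the upper tail of a χ2 distribution, which applies when the degrees of freedom are a positive whole number, as they always are for a two-way table.

```python
import math


def chi_square_p_value(chi2, df):
    """Upper-tail probability P(X >= chi2) for a chi-square distribution
    with a positive integer df, using the closed-form series."""
    if chi2 <= 0:
        return 1.0
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * series,
    # where the series is the sum of x^(j-1) / (1*3*...*(2j-1)), j = 1 .. (df-1)/2
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= chi2 / (2 * j - 1)
        total += term
    tail = math.sqrt(2 * chi2 / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# A chi-square statistic of 4.0 with 1 degree of freedom (a 2x2 table).
print(round(chi_square_p_value(4.0, 1), 4))  # about 0.0455
```

A quick sanity check is to feed in a critical value from a χ2 table: for example, 12.592 with 6 degrees of freedom should give a p-value very close to 0.05.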

Conclusion

Just as we concluded hypothesis tests in previous units, we must compare our p-value from our calculator to a given ɑ value. If it is less than our alpha, we reject the H0 and have convincing evidence for the Ha. Otherwise, we fail to reject the null and do not have convincing evidence for the Ha. Remember two things: ❗

  1. Never “accept” anything!

  2. Include context!

"Since our p-value (~0) is less than 0.05, we reject the null hypothesis. We have convincing evidence that at least one of the proportions for how people rank on the happiness scale is incorrect."

Template 

  • First part -- "Since our (p value) is </> 0.05, we reject/fail to reject our null."

  • Second part:

    • -- For a test for independence: "We have/do not have convincing evidence that there is an association between variable x and variable y in our intended population."

    • -- For a test for homogeneity: "We have/do not have convincing evidence that the distribution of the categorical variable is different between population A and population B."
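The two-part template above can be sketched as a small Python helper. The function name `conclusion` and the example context (grade level and favorite subject) are hypothetical and are only there to show where the context goes.

```python
def conclusion(p_value, alpha, claim):
    """Fill in the two-part template: the decision, then the evidence in context."""
    if p_value < alpha:
        comparison, decision, evidence = "less than", "reject", "have"
    else:
        comparison, decision, evidence = "not less than", "fail to reject", "do not have"
    return (
        f"Since our p-value ({p_value}) is {comparison} {alpha}, "
        f"we {decision} the null hypothesis. "
        f"We {evidence} convincing evidence that {claim}."
    )


# Hypothetical context for a test for independence.
print(conclusion(0.012, 0.05,
                 "there is an association between grade level and favorite subject"))
```

Note that the helper never says "accept": the only two decisions it can produce are "reject" and "fail to reject".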

🎥  Watch: AP Stats Unit 8 - Chi Squared Tests

Key Terms to Review (6)

Alpha (ɑ)

: Alpha (ɑ) refers to the significance level used in statistical tests. It represents the probability of making a Type I error by rejecting the null hypothesis when it is actually true.

Alternative Hypothesis (Ha)

: The alternative hypothesis, denoted as Ha, is a statement that contradicts or challenges the null hypothesis. It suggests that there is a significant relationship or difference between variables being studied.

Contingency Table

: A contingency table is a table that displays the frequencies or counts of two categorical variables. It shows how the categories of one variable are distributed across the categories of another variable.

Null Hypothesis (H0)

: The null hypothesis is a statement that assumes there is no significant difference or relationship between variables in a statistical analysis.

Test for Homogeneity

: A statistical test used to compare the distributions of multiple groups or populations based on one categorical variable. It determines whether the proportions within each group are similar or significantly different.

Test for Independence

: A statistical test used to determine if there is a relationship between two categorical variables. It assesses whether the occurrence of one variable is independent of the occurrence of another variable.

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

