Statistical Inference

Statistical Inference Unit 7 – Hypothesis Testing: Principles & Single Tests

Hypothesis testing is a statistical method used to make decisions about populations based on sample data. It involves formulating null and alternative hypotheses, collecting data, and calculating test statistics to determine whether to reject or fail to reject the null hypothesis. Key concepts include p-values, significance levels, and types of errors. The process involves stating hypotheses, choosing a test statistic, collecting data, determining p-values, and interpreting results. Various types of tests are used depending on the research question and data characteristics.

What's the Big Idea?

  • Hypothesis testing is a statistical method used to make decisions or draw conclusions about a population based on sample data
  • Involves formulating a null hypothesis ($H_0$) and an alternative hypothesis ($H_a$) about a population parameter
  • Collect sample data and calculate a test statistic to determine whether to reject or fail to reject the null hypothesis
  • The decision is based on the p-value: the probability, assuming the null hypothesis is true, of obtaining data at least as extreme as the data actually observed
  • Hypothesis testing allows researchers to make evidence-based decisions in various fields (psychology, medicine, business)
  • The significance level ($\alpha$) is the probability of rejecting the null hypothesis when it is actually true (Type I error)
    • Commonly set at 0.05, meaning that when the null hypothesis is true there is a 5% chance of wrongly rejecting it
  • The power of a test is the probability of rejecting the null hypothesis when the alternative hypothesis is true; it equals $1 - \beta$, where $\beta$ is the probability of a Type II error
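The trade-off between $\alpha$, $\beta$ and power can be made concrete with a short calculation. The Python sketch below (standard library only, Python 3.8+) assumes an upper-tail one-sample z-test with a known population standard deviation; the helper name `z_test_power` and the numbers passed to it are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of an upper-tail one-sample z-test of H0: mu = mu0 vs Ha: mu > mu0,
    assuming sigma is known and the true mean is mu_true (hypothetical helper)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)        # reject H0 when Z >= z_crit
    shift = (mu_true - mu0) / (sigma / n ** 0.5)  # mean of Z when mu = mu_true
    beta = STD_NORMAL.cdf(z_crit - shift)         # probability of a Type II error
    return 1 - beta                               # power = 1 - beta

print(round(z_test_power(100, 100, 15, 25), 3))  # 0.05: with no true effect, the rejection rate is alpha
print(round(z_test_power(100, 106, 15, 25), 3))
print(round(z_test_power(100, 106, 15, 25, alpha=0.01), 3))  # smaller alpha -> lower power
```

With no real effect the rejection probability is just $\alpha$; tightening $\alpha$ from 0.05 to 0.01 lowers the power (raises $\beta$) for the same effect and sample size.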

Key Concepts You Need to Know

  • Null hypothesis ($H_0$): A statement of no effect or no difference, assumed to be true unless evidence suggests otherwise
  • Alternative hypothesis ($H_a$): A statement that contradicts the null hypothesis, representing the researcher's claim or theory
  • Test statistic: A value calculated from the sample data used to determine whether to reject the null hypothesis (e.g., z-score, t-score, chi-square)
  • P-value: The probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed
  • Significance level ($\alpha$): The predetermined probability threshold for rejecting the null hypothesis, typically set at 0.05
  • Type I error: Rejecting the null hypothesis when it is actually true (false positive)
  • Type II error: Failing to reject the null hypothesis when it is actually false (false negative)
  • One-tailed test: A hypothesis test where the alternative hypothesis specifies a direction (greater than or less than)
  • Two-tailed test: A hypothesis test where the alternative hypothesis does not specify a direction (not equal to)
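One-tailed and two-tailed tests can give different p-values for the same test statistic. This sketch assumes a z statistic whose null distribution is standard normal; the helper `z_p_value` and the observed value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z, tail="two-sided"):
    """p-value for an observed z statistic under a standard normal null distribution.
    tail: "greater" (Ha: parameter > value), "less" (Ha: <), or "two-sided" (Ha: !=)."""
    phi = NormalDist().cdf
    if tail == "greater":
        return 1 - phi(z)            # area in the upper tail
    if tail == "less":
        return phi(z)                # area in the lower tail
    if tail == "two-sided":
        return 2 * (1 - phi(abs(z)))  # both tails beyond |z|
    raise ValueError(f"unknown tail: {tail!r}")

z = 1.8  # hypothetical observed test statistic
for tail in ("greater", "two-sided"):
    p = z_p_value(z, tail)
    print(f"{tail:>9}: p = {p:.4f}, reject at alpha=0.05? {p <= 0.05}")
```

Here the one-tailed p-value is about 0.036 (reject at 0.05) while the two-tailed p-value is about 0.072 (fail to reject), because the two-tailed test splits $\alpha$ across both directions.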

The Hypothesis Testing Process

  1. State the null and alternative hypotheses based on the research question or problem
  2. Choose an appropriate test statistic and significance level ($\alpha$)
  3. Collect sample data and calculate the test statistic
  4. Determine the p-value associated with the test statistic
  5. Compare the p-value to the significance level ($\alpha$)
    • If p-value ≤ $\alpha$, reject the null hypothesis in favor of the alternative hypothesis
    • If p-value > $\alpha$, fail to reject the null hypothesis
  6. Interpret the results in the context of the research question or problem
  7. Consider the limitations and potential sources of error in the study
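Steps 2 to 5 can be sketched in code for a height question ($H_0$: μ = 170 cm vs $H_a$: μ ≠ 170 cm). The sketch assumes a two-sided one-sample z-test with a known population standard deviation of 5 cm; that value, the ten heights, and the helper name `one_sample_z_test` are all hypothetical.

```python
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0, assuming sigma is known."""
    n = len(sample)
    xbar = fmean(sample)
    z = (xbar - mu0) / (sigma / n ** 0.5)             # step 3: test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))            # step 4: two-sided p-value
    decision = "reject H0" if p <= alpha else "fail to reject H0"  # step 5
    return z, p, decision

heights = [172, 168, 175, 171, 169, 174, 173, 170, 176, 172]  # hypothetical data (cm)
z, p, decision = one_sample_z_test(heights, mu0=170, sigma=5)
print(f"z = {z:.3f}, p = {p:.4f} -> {decision}")
```

With these numbers the sample mean is 172 cm, z is about 1.26 and the p-value is about 0.21, so the decision is to fail to reject $H_0$. Step 6 (interpretation) would then say the data do not provide enough evidence that the mean height differs from 170 cm.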

Types of Hypotheses

  • One-sample hypothesis: Tests whether a population parameter (mean, proportion) differs from a specified value
    • Example: Testing if the average height of a population differs from 170 cm
  • Two-sample hypothesis: Compares two population parameters to determine if they are significantly different
    • Example: Comparing the mean test scores of two different teaching methods
  • Paired-sample hypothesis: Tests the difference between two related or dependent samples
    • Example: Measuring blood pressure before and after a treatment for the same group of patients
  • ANOVA (Analysis of Variance): Tests the difference between three or more population means
    • Example: Comparing the average yield of four different fertilizer treatments
  • Chi-square test: Tests the association between two categorical variables
    • Example: Determining if there is a relationship between gender and political party affiliation
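The paired-sample case reduces to a one-sample test on the within-subject differences. The sketch below computes only the paired t statistic and its degrees of freedom, because the Python standard library has no t distribution; the p-value would come from a t distribution with n − 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel(after, before)`, which should report the same statistic). The blood-pressure readings and the helper name `paired_t_statistic` are hypothetical.

```python
from statistics import fmean, stdev

def paired_t_statistic(before, after):
    """Paired-sample t statistic for H0: mean difference (after - before) = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / n ** 0.5)     # stdev uses the n - 1 denominator
    return t, n - 1                                   # statistic and degrees of freedom

# Hypothetical systolic blood pressure (mmHg) for the same 6 patients
before = [140, 152, 138, 145, 150, 147]
after = [135, 149, 136, 139, 146, 144]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} on {df} df")
```

Treating these columns as two independent samples would ignore that each pair comes from the same patient, which usually inflates the standard error.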

Common Test Statistics

  • Z-test: Used for testing hypotheses about population means or proportions when the sample size is large or the population standard deviation is known
  • T-test: Used for testing hypotheses about population means when the population standard deviation is unknown and must be estimated from the sample; this is the usual situation and matters most when the sample size is small
    • One-sample t-test: Tests if a sample mean differs from a hypothesized population mean
    • Independent samples t-test: Compares the means of two independent groups
    • Paired samples t-test: Compares two related or dependent measurements by testing whether the mean of their paired differences is zero
  • Chi-square test: Used for testing the association between two categorical variables
    • Goodness-of-fit test: Compares observed frequencies to expected frequencies for a single categorical variable
    • Test of independence: Determines if two categorical variables are independent or associated
  • F-test (ANOVA): Used for comparing the means of three or more groups or treatments
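A chi-square test of independence compares observed counts with the counts expected if the two variables were independent, $E_{ij} = (\text{row}_i \text{ total})(\text{column}_j \text{ total}) / n$. The sketch below is restricted to a 2×2 table, where the test has 1 degree of freedom and a chi-square(1) variable is the square of a standard normal, so the p-value can be computed with the standard library alone. The counts and the helper name `chi_square_2x2` are hypothetical; larger tables would need a chi-square distribution function such as SciPy's `scipy.stats.chi2.sf`.

```python
from statistics import NormalDist

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of observed counts
    (no continuity correction). Returns (statistic, p_value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (2 - 1) = 1, so P(chi2_1 >= stat) = P(|Z| >= sqrt(stat))
    p = 2 * (1 - NormalDist().cdf(stat ** 0.5))
    return stat, p

# Hypothetical counts: rows = group A / group B, columns = prefers X / prefers Y
observed = [[30, 20], [18, 32]]
stat, p = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic will differ from this uncorrected sketch unless `correction=False` is passed.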

Interpreting Results

  • If the p-value is less than or equal to the significance level ($\alpha$), reject the null hypothesis
    • Conclude that there is sufficient evidence to support the alternative hypothesis
    • Example: If p-value ≤ 0.05, conclude that there is a statistically significant difference between the groups
  • If the p-value is greater than the significance level ($\alpha$), fail to reject the null hypothesis
    • Conclude that there is not enough evidence to support the alternative hypothesis
    • Example: If p-value > 0.05, conclude that the difference between the groups is not statistically significant; this does not prove that the groups are equal
  • Confidence intervals can be used to estimate the range of plausible values for the population parameter
    • A 95% confidence interval means that if the study were repeated many times, about 95% of the intervals constructed this way would contain the true population parameter
  • Effect size measures the magnitude of the difference or relationship between variables
    • Examples: Cohen's d, Pearson's r, eta-squared
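Cohen's d for two independent groups divides the difference in sample means by the pooled standard deviation. The sketch below assumes two independent samples; the scores and the helper name `cohens_d` are hypothetical. Cohen's rough conventions treat d ≈ 0.2, 0.5 and 0.8 as small, medium and large effects, though what counts as meaningful depends on the field.

```python
from statistics import fmean, variance

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # pooled variance weights each sample variance (n - 1 denominator) by its df
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (fmean(group1) - fmean(group2)) / pooled_var ** 0.5

# Hypothetical test scores under two teaching methods
method_a = [78, 82, 85, 80, 75]
method_b = [72, 76, 79, 74, 70]
d = cohens_d(method_a, method_b)
print(f"Cohen's d = {d:.2f}")
```

Unlike a p-value, d does not shrink or grow just because the sample size changes, which is why it complements the significance decision.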

Real-World Applications

  • Medical research: Testing the effectiveness of a new drug compared to a placebo
  • Psychology: Comparing the mean scores of two therapy techniques on reducing anxiety
  • Business: Determining if a new marketing campaign significantly increases sales
  • Education: Testing if a new teaching method improves student performance compared to traditional methods
  • Environmental science: Comparing the average pollution levels between two cities
  • Quality control: Testing if the proportion of defective products exceeds a specified threshold
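The quality-control case can be sketched as an upper-tail one-sample z-test for a proportion, $H_0: p = p_0$ vs $H_a: p > p_0$, using the normal approximation (a common rule of thumb asks for $n p_0 \ge 10$ and $n(1 - p_0) \ge 10$). The batch counts, the 5% threshold, and the helper name `one_proportion_z_test` are hypothetical.

```python
from statistics import NormalDist

def one_proportion_z_test(defects, n, p0, alpha=0.05):
    """Upper-tail z-test of H0: p = p0 vs Ha: p > p0 (normal approximation).
    The standard error uses p0 because it is computed assuming H0 is true."""
    p_hat = defects / n
    se = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)   # upper-tail area
    return z, p_value, p_value <= alpha

# Hypothetical batch: 34 defective items out of 500, threshold 5%
z, p_value, exceeds = one_proportion_z_test(34, 500, 0.05)
print(f"z = {z:.3f}, p = {p_value:.4f}, evidence that defect rate exceeds 5%: {exceeds}")
```

Here the observed defect rate is 6.8%, z is about 1.85 and the one-tailed p-value is about 0.03, so at $\alpha$ = 0.05 the batch would be flagged.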
  • Market research: Determining if there is an association between age and product preference

Potential Pitfalls and Limitations

  • Sampling bias: When the sample is not representative of the population, leading to inaccurate conclusions
  • Type I error (false positive): Rejecting the null hypothesis when it is actually true
    • Can be reduced by decreasing the significance level ($\alpha$), but this may increase the risk of Type II error
  • Type II error (false negative): Failing to reject the null hypothesis when it is actually false
    • Can be reduced by increasing the sample size or using a more powerful test
  • Violation of assumptions: Most hypothesis tests rely on certain assumptions about the data (normality, homogeneity of variance)
    • Violations can lead to invalid results and conclusions
  • Multiple testing: Conducting many hypothesis tests on the same data increases the probability of making at least one Type I error (the family-wise error rate)
    • The Bonferroni correction (testing each of m hypotheses at $\alpha / m$) or other methods can be used to adjust the significance level for multiple comparisons
  • Practical significance vs. statistical significance: A statistically significant result may not be practically meaningful or important
    • Consider the effect size and real-world implications of the findings
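The multiple-testing problem and the Bonferroni correction can both be shown in a few lines. The family-wise error formula $1 - (1 - \alpha)^m$ assumes m independent tests with every null hypothesis true; the Bonferroni rule (compare each p-value with $\alpha / m$) keeps the family-wise error rate at or below $\alpha$ even without independence. The p-values and helper names here are hypothetical.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** m

def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject H0_i only if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

print(round(family_wise_error_rate(0.05, 20), 3))       # 0.642
print(bonferroni_reject([0.001, 0.012, 0.030, 0.200]))  # [True, True, False, False]
```

With 20 independent tests at $\alpha$ = 0.05, the chance of at least one false positive is about 64%. With four tests the Bonferroni threshold is 0.0125, so a p-value of 0.030 that would pass an unadjusted 0.05 cutoff is no longer significant.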


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
