Fiveable

❤️‍🩹Intro to Public Health Unit 4 Review

4.1 Descriptive and Inferential Statistics

Written by the Fiveable Content Team • Last updated September 2025

Biostatistics and data analysis are crucial tools in public health. They help researchers make sense of health data, identify trends, and draw meaningful conclusions. Descriptive and inferential statistics are two key approaches used to analyze and interpret public health information.

Descriptive statistics summarize data, showing patterns and characteristics. Inferential statistics use sample data to make predictions about larger populations. Both are essential for understanding health trends, evaluating interventions, and informing policy decisions in public health.

Descriptive vs Inferential Statistics

Characteristics and Applications

  • Descriptive statistics summarize main features of datasets using measures of central tendency, variability, and distribution shape
  • Inferential statistics use sample data to draw conclusions about larger populations through hypothesis testing and parameter estimation
  • Public health research employs both to analyze health trends, evaluate interventions, and inform policy decisions
  • Descriptive statistics explore data initially and present findings comprehensibly to stakeholders and the public (health department reports, disease prevalence)
  • Inferential statistics draw conclusions about population health from limited samples, accounting for uncertainty (vaccine efficacy trials, risk factor studies)

Choosing Between Descriptive and Inferential Methods

  • Choice depends on research question, study design, and available data in public health investigations
  • Descriptive statistics suit exploratory analysis and summarizing known information (describing obesity rates)
  • Inferential statistics allow generalizing findings beyond immediate data (estimating population-wide disease prevalence from a sample)
  • Combining both provides comprehensive understanding from data exploration to population-level conclusions
  • Descriptive analysis usually precedes inferential analysis and informs the choice of inferential methods (identifying skewed distributions to select non-parametric tests)

Measures of Central Tendency and Variability

Central Tendency Measures

  • Include mean, median, and mode providing insights into typical or average values
  • Arithmetic mean calculated by summing all values and dividing by number of observations
    • Sensitive to extreme values or outliers
    • Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  • Median represents middle value in ordered dataset
    • Less affected by outliers, useful for skewed distributions (income data, hospital length of stay)
  • Mode identifies most frequent value in dataset
    • Useful for categorical data or discrete numeric data (most common disease in a clinic)
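
As a rough illustration of how an outlier pulls the mean but leaves the median and mode alone, the sketch below uses Python's standard `statistics` module. The lengths of stay are made-up values chosen for the example, not real data.

```python
import statistics

# Hypothetical hospital lengths of stay in days; the 30-day stay is an outlier.
length_of_stay = [2, 3, 3, 4, 5, 6, 30]

mean_los = statistics.mean(length_of_stay)      # sum of values / number of values
median_los = statistics.median(length_of_stay)  # middle value of the sorted data
mode_los = statistics.mode(length_of_stay)      # most frequent value

print(f"mean={mean_los:.2f}, median={median_los}, mode={mode_los}")
```

Here the single long stay lifts the mean to about 7.6 days, while the median (4) and mode (3) stay close to what most patients experienced.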

Variability Measures

  • Quantify spread or dispersion of data points including range, variance, and standard deviation
  • Range calculated as difference between maximum and minimum values
    • Simple but sensitive to outliers
  • Variance measures average squared deviation from mean (the sample version divides by n − 1 rather than n)
    • Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  • Standard deviation is square root of variance
    • Provides measure of average deviation from mean in original units (blood pressure variability)
    • Formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
  • Coefficient of variation (CV) allows comparison of variability between datasets with different units or scales
    • Calculated as ratio of standard deviation to mean
    • Formula: $CV = \frac{s}{\bar{x}} \times 100\%$
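
The variability measures above can be computed in a few lines. This is a minimal sketch using the standard library; the blood pressure readings are hypothetical, and `statistics.variance` and `statistics.stdev` use the n − 1 (sample) denominator.

```python
import statistics

# Hypothetical systolic blood pressure readings in mmHg.
systolic_bp = [118, 122, 125, 130, 135]

data_range = max(systolic_bp) - min(systolic_bp)    # maximum minus minimum
sample_variance = statistics.variance(systolic_bp)  # divides by n - 1
sample_sd = statistics.stdev(systolic_bp)           # square root of the variance
cv_percent = sample_sd / statistics.mean(systolic_bp) * 100

print(f"range={data_range}, variance={sample_variance}, "
      f"sd={sample_sd:.2f}, CV={cv_percent:.1f}%")
```

For these five readings the variance is 44.5 mmHg², the standard deviation is about 6.67 mmHg, and the CV is about 5.3%, which could then be compared with the CV of a variable measured on a different scale.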

Sampling and Population Parameters

Sampling Concepts

  • Sampling selects subset of individuals from larger population to make inferences
  • Population parameters are fixed, unknown values describing entire population characteristics (true population mean μ, proportion π)
  • Sample statistics estimate corresponding population parameters (sample mean x̄, sample proportion p̂)
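  • Central Limit Theorem states that the sampling distribution of the mean approaches a normal distribution as sample size increases
    • Holds for almost any population distribution shape, provided the population variance is finite
    • Supports normal-based (parametric) methods for large samples (n > 30 is a common rule of thumb; strongly skewed data may need more)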
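
One way to see the Central Limit Theorem at work is a small simulation. The sketch below is illustrative only: it assumes an exponential population (strongly right-skewed, with mean 1 and standard deviation 1), and the sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50   # observations per sample
N_SAMPLES = 2000   # number of repeated samples

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1 / math.sqrt(SAMPLE_SIZE)  # population SD / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {theoretical_se:.3f}")
```

Even though individual observations are far from normal, the sample means cluster around the population mean with a spread close to σ/√n, which is what the theorem predicts.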

Estimation and Error

  • Standard error quantifies how much a sample statistic varies from sample to sample
    • Decreases as sample size increases, improving precision of estimates
    • Formula for standard error of mean: $SE_{\bar{x}} = \frac{s}{\sqrt{n}}$
  • Confidence intervals provide range of plausible values for population parameters
    • Width reflects precision of estimate (narrower intervals indicate greater precision)
    • 95% confidence interval for mean (large-sample approximation): $\bar{x} \pm (1.96 \times SE_{\bar{x}})$
  • Sampling error is difference between sample statistic and true population parameter
    • Inherent limitation in statistical inference
    • Considered when interpreting results (margin of error in polls)
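
The standard error and 95% confidence interval formulas above translate directly into code. This is a minimal sketch: `mean_confidence_interval` is a hypothetical helper name, the readings are simulated rather than real, and the 1.96 multiplier assumes a sample large enough for the normal approximation (a t critical value would be used for small samples).

```python
import math
import random
import statistics

def mean_confidence_interval(values, z=1.96):
    """Return (lower, upper, standard_error) using mean ± z * SE."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # s / sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin, standard_error

random.seed(1)
# Hypothetical sample: 100 simulated systolic blood pressure readings (mmHg).
readings = [random.gauss(120, 15) for _ in range(100)]

lower, upper, se = mean_confidence_interval(readings)
print(f"SE = {se:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

Quadrupling the sample size would roughly halve the standard error, and with it the width of the interval.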

Statistical Methods for Public Health Data

Categorical Data Analysis

  • Chi-square tests analyze categorical data
    • Compare observed frequencies to expected frequencies (goodness-of-fit test)
    • Examine associations between variables (test of independence)
    • Example: Analyzing relationship between smoking status and lung cancer incidence
  • Fisher's exact test used for small sample sizes or sparse data
    • Provides exact p-values for 2x2 contingency tables
    • Useful in rare disease studies or small clinical trials
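
To make the two tests concrete, the sketch below implements both for a 2x2 table using only the standard library. The function names and the smoking and lung cancer counts are hypothetical. The chi-square p-value uses the fact that, with 1 degree of freedom, the upper-tail probability equals `erfc(sqrt(statistic / 2))`; the Fisher p-value is the common two-sided version that sums the probabilities of all tables no more likely than the observed one.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail chi-square probability with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed_probability = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(smallest, largest + 1)
        if table_probability(x) <= observed_probability * (1 + 1e-9)
    )

# Hypothetical counts: rows are smokers / non-smokers,
# columns are lung cancer / no lung cancer.
statistic, p_value = chi_square_2x2(30, 70, 10, 90)
print(f"chi-square = {statistic:.2f}, p = {p_value:.5f}")

# Hypothetical small study where expected counts are too low for chi-square.
print(f"Fisher exact p = {fisher_exact_2x2(3, 1, 1, 3):.4f}")
```

In practice a statistics package would normally be used; the hand-rolled versions are only meant to show what the tests compute.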

Comparing Groups and Relationships

  • T-tests compare means between two groups (independent samples t-test, paired t-test)
    • Example: Comparing blood pressure between treatment and control groups
  • ANOVA (Analysis of Variance) compares means among three or more groups
    • One-way ANOVA for single factor, two-way ANOVA for two factors
    • Example: Comparing BMI across different age groups and genders
  • Correlation analysis measures strength and direction of linear relationships between continuous variables
    • Pearson's r common for normally distributed data
    • Spearman's rho for ordinal data or when normality is doubtful (rank-based, captures monotonic relationships)
    • Example: Correlation between air pollution levels and respiratory disease rates
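
The sketch below shows what an independent-samples t-test and the two correlation coefficients compute. It assumes the classic pooled-variance (equal variances) form of the t-test and stops at the t statistic and degrees of freedom, since the p-value requires the t distribution from a statistics package. All function names and data values are hypothetical.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for the pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical systolic blood pressure (mmHg) in treatment and control groups.
treatment = [118, 121, 125, 119, 123, 120]
control = [128, 131, 126, 133, 129, 127]
t, df = pooled_t_statistic(treatment, control)
print(f"t = {t:.2f} on {df} degrees of freedom")

# Hypothetical air pollution index and respiratory admissions per 10,000.
pollution = [12, 18, 25, 31, 40, 55]
admissions = [3.1, 3.4, 4.0, 4.2, 5.5, 9.0]
print(f"Pearson r = {pearson_r(pollution, admissions):.3f}")
print(f"Spearman rho = {spearman_rho(pollution, admissions):.3f}")
```

Because Spearman's rho works on ranks, it reaches 1 for any strictly increasing relationship, while Pearson's r reaches 1 only when the relationship is exactly linear.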

Regression Analysis

  • Simple linear regression predicts continuous outcome based on one predictor variable
    • Formula: $y = \beta_0 + \beta_1 x + \epsilon$
  • Multiple linear regression uses multiple predictors
    • Formula: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon$
  • Logistic regression applied when outcome is binary
    • Exponentiated coefficients give odds ratios for risk factors
    • Example: Predicting likelihood of heart disease based on risk factors
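
As a small worked example, the sketch below fits a simple linear regression by ordinary least squares and computes an odds ratio from a 2x2 table. The odds ratio is included because, for a logistic regression with a single binary predictor, the exponentiated slope equals this cross-product ratio; fitting a full logistic model is left to a statistics package. Function names and all data values are hypothetical.

```python
import statistics

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                     # estimate of beta_1
    intercept = mean_y - slope * mean_x   # estimate of beta_0
    return intercept, slope

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product odds ratio for a 2x2 exposure-by-outcome table."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical data: age in years and systolic blood pressure in mmHg.
age = [30, 40, 50, 60, 70]
systolic = [115, 121, 128, 133, 141]
beta_0, beta_1 = fit_simple_linear_regression(age, systolic)
print(f"predicted SBP = {beta_0:.1f} + {beta_1:.2f} * age")

# Hypothetical counts: heart disease among smokers vs non-smokers.
print(f"odds ratio = {odds_ratio(30, 70, 10, 90):.2f}")
```

An odds ratio above 1 indicates higher odds of the outcome in the exposed group; a value of 1 indicates no association.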

Advanced Techniques

  • Survival analysis analyzes time-to-event data
    • Kaplan-Meier curves estimate survival probabilities
    • Cox proportional hazards models assess effect of variables on survival
    • Example: Analyzing time to cancer recurrence after different treatments
  • Non-parametric tests used when data violate parametric assumptions
    • Mann-Whitney U test (alternative to independent t-test)
    • Kruskal-Wallis test (alternative to one-way ANOVA)
    • Wilcoxon signed-rank test (alternative to paired t-test)
    • Example: Comparing patient satisfaction scores across different hospitals
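
Two of the techniques above are simple enough to sketch by hand. The first function is a basic Kaplan-Meier estimator, which multiplies together the proportion surviving at each event time; the second computes the Mann-Whitney U statistic by counting, across all pairs, how often a value in one group exceeds a value in the other. Function names and data are hypothetical, and neither function produces confidence intervals or p-values.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- True if the event occurred at that time, False if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, occurred in zip(times, events) if occurred})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, occurred in zip(times, events) if ti == t and occurred)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); ties between groups count as half a win each."""
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical months to cancer recurrence; False marks censored follow-up.
months = [6, 7, 10, 15, 19, 25]
recurred = [True, False, True, True, False, True]
for time, probability in kaplan_meier(months, recurred):
    print(f"month {time}: estimated recurrence-free probability {probability:.3f}")

# Hypothetical patient satisfaction scores (1-10) at two hospitals.
hospital_a = [7, 8, 6, 9, 8]
hospital_b = [5, 6, 7, 4, 6]
print(mann_whitney_u(hospital_a, hospital_b))
```

Censored subjects still count in the at-risk group up to their last follow-up, which is what distinguishes the Kaplan-Meier estimate from simply dividing survivors by the starting total.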