Epidemiological Study Designs

Why This Matters

Understanding study designs is foundational to everything you'll encounter in epidemiology. You can't interpret a research finding, evaluate an intervention, or critique a public health policy without knowing how the evidence was generated. Every study design comes with trade-offs between internal validity (how confident you can be about causation), external validity (how well findings generalize), and feasibility (time, cost, and ethical constraints). These trade-offs appear constantly on exams, especially when you're asked to recommend an appropriate design for a given research question.

You're being tested on your ability to match study designs to research scenarios, identify potential biases, and interpret the appropriate measures of association (relative risk, odds ratios, prevalence). Don't just memorize definitions. Know what each design can and cannot tell you about causation, which biases threaten each design, and when one approach is preferred over another.


Experimental Designs: Testing Interventions

These designs involve researcher-controlled manipulation of exposures or interventions. Because investigators assign participants to groups, experimental designs offer the strongest evidence for causation. But they come with significant ethical and practical constraints.

Randomized Controlled Trials (RCTs)

RCTs are the gold standard for causal inference. Random assignment tends to balance both known and unknown confounders across groups (on average, and more reliably in larger trials), so a difference in outcomes can be attributed to the intervention rather than to pre-existing differences between participants.

  • Measures efficacy directly by comparing outcomes between intervention and control groups, allowing calculation of absolute risk reduction and relative risk reduction
  • Blinding (single, double, or triple) further strengthens internal validity by preventing knowledge of group assignment from influencing behavior or outcome assessment
  • Ethical limitations restrict use. You cannot randomize participants to harmful exposures (smoking, toxins), making RCTs unsuitable for many etiologic questions about disease causation
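
The two efficacy measures named above can be computed from the event counts in each arm. The following is a minimal sketch; the function name and the trial counts are hypothetical and chosen only for illustration.

```python
def risk_reduction(events_control, n_control, events_treated, n_treated):
    """Return (ARR, RRR) from event counts in the two trial arms."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated   # absolute risk reduction
    rrr = arr / risk_control            # relative risk reduction
    return arr, rrr


# Hypothetical trial: 40 of 200 controls and 20 of 200 treated participants
# experience the outcome.
arr, rrr = risk_reduction(40, 200, 20, 200)
print(f"ARR = {arr:.2f}, RRR = {rrr:.2f}")
```

In this made-up trial the risk falls from 20% to 10%, so the absolute reduction is 10 percentage points while the relative reduction is 50%. The same intervention can look very different depending on which measure is reported.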

Quasi-Experimental Studies

These designs test interventions but lack randomization. Instead, they rely on pre-existing groups, natural experiments, or before-and-after comparisons when random assignment is infeasible or unethical.

  • Real-world applicability makes them essential for evaluating policy changes (e.g., the effect of a soda tax on obesity rates), community interventions, and program implementations
  • Common subtypes include interrupted time series, difference-in-differences, and regression discontinuity designs
  • Confounding is the major threat. Without randomization, systematic differences between groups may explain observed effects rather than the intervention itself. Researchers must address this through careful design choices or statistical adjustment
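
As one illustration of addressing confounding through design, a difference-in-differences estimate subtracts the change seen in a comparison group from the change seen in the group that received the policy. This sketch assumes the two groups would have followed parallel trends without the policy; the function name and all numbers are hypothetical.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    change_treated = treated_after - treated_before
    change_control = control_after - control_before
    return change_treated - change_control


# Hypothetical mean weekly sugary-drink servings per person.
# Taxed city: 10.0 -> 7.0; comparison city without the tax: 10.5 -> 9.5.
effect = difference_in_differences(10.0, 7.0, 10.5, 9.5)
print(f"Estimated policy effect: {effect:+.1f} servings per week")
```

A simple before-and-after comparison in the taxed city would credit the policy with the full drop of 3.0 servings. Subtracting the comparison city's drop of 1.0 removes the part of the change that would likely have happened anyway.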

Compare: RCTs vs. Quasi-experimental studies: both test interventions, but RCTs control confounding through randomization while quasi-experiments must address it through design or analysis. If an exam asks which design provides stronger causal evidence, RCT wins. If it asks what's practical for evaluating a new public health policy already being rolled out, quasi-experimental is your answer.


Observational Analytic Designs: Following Exposure to Outcome

These designs observe natural variation in exposures without manipulation. The key distinction is directionality: do you start with exposure status and follow forward, or start with disease status and look backward?

Cohort Studies

Cohort studies start with exposure and follow forward in time. You classify participants as exposed or unexposed, then watch both groups to see who develops the outcome. Because exposure is recorded before the outcome occurs, this design establishes temporal sequence, which is essential for causal inference.

  • Calculates incidence and relative risk directly. This is the only observational design that can measure true disease rates in exposed vs. unexposed populations. Relative risk ($RR = \frac{\text{incidence in exposed}}{\text{incidence in unexposed}}$) is the key measure of association
  • Can be prospective (enrolling participants now and following them into the future) or retrospective (using historical records to reconstruct a cohort from the past)
  • Resource-intensive and inefficient for rare diseases. You may need to follow thousands of participants for years to observe enough outcomes. The Framingham Heart Study, for example, has tracked participants since 1948
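
The relative risk calculation can be sketched directly from cohort counts. The function name and the cohort numbers below are hypothetical.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed divided by incidence in the unexposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed
# participants develop the outcome during follow-up.
rr = relative_risk(30, 1000, 10, 1000)
print(f"RR = {rr:.1f}")
```

An RR above 1 means the outcome was more common in the exposed group, an RR of 1 means no association, and an RR below 1 suggests the exposure may be protective.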

Case-Control Studies

Case-control studies work backward from disease. You identify people who already have the outcome (cases) and people who don't (controls), then compare their past exposure histories.

  • Efficient for rare outcomes. You can study diseases with incidence of 1 in 100,000 without needing massive sample sizes, because you're selecting cases directly rather than waiting for them to occur
  • Calculates odds ratios, not relative risk. Because you're sampling based on disease status, you can't calculate true incidence rates. The odds ratio ($OR = \frac{a \times d}{b \times c}$, where $a$ and $c$ are the exposed and unexposed cases and $b$ and $d$ are the exposed and unexposed controls) approximates relative risk when the disease is rare (the "rare disease assumption")
  • Vulnerable to recall bias and selection bias. Cases may remember or report exposures differently than controls, and controls may not accurately represent the source population from which cases arose
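
A short sketch can show both the cross-product odds ratio and why the rare disease assumption matters. It assumes the usual 2x2 layout (a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases); all function names and counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a = exposed cases,   b = exposed non-cases (controls)
    c = unexposed cases, d = unexposed non-cases (controls)
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in exposed divided by risk in unexposed (needs full cohort counts)."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical case-control data: 40 of 100 cases and 20 of 100 controls
# report the exposure.
print(f"Case-control OR: {odds_ratio(40, 20, 60, 80):.2f}")

# Hypothetical full-population tables, both built to have a true RR of 3.
rare = (30, 9970, 10, 9990)     # risks of 0.3% vs. 0.1%
common = (600, 400, 200, 800)   # risks of 60% vs. 20%
for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {relative_risk(*table):.2f}, "
          f"OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the OR is nearly identical to the RR of 3, while with the common outcome the OR is 6, double the RR. That gap is why the OR should only be read as an approximate RR when the disease is rare.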

Nested Case-Control Studies

This is a hybrid design embedded within an existing cohort. As cases arise during cohort follow-up, controls are sampled from the same cohort members who were still at risk at the time each case occurred.

  • Avoids recall bias because exposure data were collected prospectively, before anyone knew who would develop the disease
  • Cost-efficient for expensive biomarker or laboratory analyses, since you only need to process stored samples from cases and selected controls rather than the entire cohort
  • Maintains the advantages of the cohort framework (known source population, prospective data) while dramatically reducing costs

Compare: Cohort vs. Case-control: cohort studies follow exposure forward and calculate relative risk; case-control studies work backward from disease and calculate odds ratios. For rare diseases, case-control is practical. For rare exposures, cohort is preferred. Exam questions often ask you to justify design choice based on disease rarity, so keep this distinction sharp.


Descriptive and Hypothesis-Generating Designs

These designs describe patterns and generate hypotheses but cannot establish causation. They're the starting point of epidemiologic investigation, not the endpoint.

Cross-Sectional Studies

Cross-sectional studies are a snapshot design that measures exposure and outcome simultaneously in a defined population at a single point in time.

  • Useful for estimating prevalence (the proportion of a population with a condition at a given time), not incidence
  • Cannot establish temporality. Because exposure and outcome are measured together, you cannot determine which came first. Did depression cause unemployment, or did unemployment cause depression? A cross-sectional study can't answer that
  • Commonly used in national health surveys (NHANES, BRFSS) for population surveillance and health planning
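
Point prevalence is simply existing cases divided by the population surveyed at that moment. This is a minimal sketch with a hypothetical function name and made-up survey counts.

```python
def prevalence(existing_cases, population_surveyed):
    """Proportion of the surveyed population that has the condition right now."""
    return existing_cases / population_surveyed


# Hypothetical survey: 450 of 5,000 respondents currently have the condition.
overall = prevalence(450, 5000)
print(f"Point prevalence: {overall:.1%}")
```

The count includes everyone who has the condition at the time of the survey, regardless of when it began. That is why this figure describes prevalence and says nothing about how quickly new cases arise (incidence).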

Ecological Studies

Ecological studies use populations, not individuals, as the unit of analysis. They compare disease rates across countries, regions, or time periods using aggregate data.

  • Useful for generating hypotheses about environmental or policy-level factors. For example, comparing per-capita alcohol consumption and liver disease rates across countries
  • Quick and inexpensive because they often rely on existing data sources
  • Ecological fallacy is the critical limitation. Associations observed at the group level may not hold for individuals within those groups. A country with high fat consumption and high heart disease rates doesn't prove that the individuals eating more fat are the ones getting heart disease

Case Series and Case Reports

These provide detailed clinical documentation of individual cases without a comparison group.

  • Often the first signal of emerging diseases, adverse drug reactions, or unusual clinical presentations. Early reports of AIDS in the 1980s and vaping-related lung injury (EVALI) in 2019 were case series
  • Purely descriptive: they provide clinical detail but no measure of association or risk
  • Hypothesis-generating only. They identify patterns that require analytic studies (case-control, cohort) to confirm

Compare: Cross-sectional vs. Ecological studies: both are descriptive, but cross-sectional collects individual-level data while ecological uses population-level data. Cross-sectional can identify individual-level associations (though not causal ones); ecological cannot make individual-level inferences due to ecological fallacy.


Longitudinal Designs: Tracking Change Over Time

These designs follow participants over extended periods to capture temporal relationships and disease progression. The defining feature is repeated observation of the same individuals.

Longitudinal Studies

Longitudinal studies take repeated measurements on the same individuals over time. This captures within-person change, disease natural history, and long-term exposure effects.

  • Can be observational or experimental. "Longitudinal" describes the temporal structure (repeated measures over time), not the level of researcher control
  • Particularly valuable for studying developmental trajectories, aging, and chronic disease progression
  • Attrition threatens validity. Loss to follow-up introduces bias if dropouts differ systematically from those who remain. For example, if sicker participants drop out, the remaining sample will look healthier than the true population
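
The attrition example can be made concrete with a toy cohort in which lower-scoring (sicker) participants are the ones who drop out. Every value below is invented for illustration.

```python
from statistics import mean

# Hypothetical participants: (baseline health score, completed follow-up?)
cohort = [
    (40.0, False), (45.0, False), (50.0, False),   # sicker participants drop out
    (55.0, True), (60.0, True), (70.0, True), (80.0, True), (90.0, True),
]

true_mean = mean(score for score, _ in cohort)
observed_mean = mean(score for score, stayed in cohort if stayed)

print(f"Mean score, everyone enrolled: {true_mean:.2f}")
print(f"Mean score, completers only:   {observed_mean:.2f}")
```

Because dropout here depends on health status, the completers-only mean overstates how healthy the enrolled group really was. If dropout were unrelated to health, the two means would differ only by chance.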

Compare: Longitudinal vs. Cross-sectional: longitudinal follows individuals over time and can establish temporal sequence; cross-sectional captures one moment and cannot. If an exam scenario asks about tracking disease progression or determining whether exposure precedes outcome, longitudinal is required.


Evidence Synthesis: Combining Studies

These methods don't generate new data but systematically aggregate existing evidence. They sit at the top of the evidence hierarchy when done properly.

Systematic Reviews and Meta-Analyses

A systematic review uses a structured, reproducible protocol to identify, evaluate, and synthesize all relevant studies on a specific question. A meta-analysis goes one step further by pooling the data statistically.

  • Systematic reviews use predefined search strategies, inclusion/exclusion criteria, and quality assessment tools to minimize selection bias in evidence synthesis
  • Meta-analyses calculate an overall effect size with increased statistical precision and power beyond any single study. Results are often displayed in a forest plot, which shows individual study estimates and the pooled estimate
  • Publication bias threatens validity. Studies with positive or statistically significant results are more likely to be published, potentially skewing pooled estimates. Funnel plots and statistical tests (e.g., Egger's test) help detect this
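
To illustrate how pooling increases precision, here is a sketch of one common approach, fixed-effect inverse-variance weighting. The text above does not say which pooling model is used, so treat that choice as an assumption; the study estimates and standard errors are hypothetical.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted average of study estimates and its SE."""
    weights = [1 / se ** 2 for se in standard_errors]   # precise studies count more
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical log relative risks and standard errors from three studies.
log_rrs = [0.40, 0.25, 0.30]
ses = [0.20, 0.10, 0.25]

pooled, pooled_se = fixed_effect_pool(log_rrs, ses)
print(f"Pooled RR = {math.exp(pooled):.2f}, pooled SE (log scale) = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the gain in precision described above. A random-effects model would be a reasonable alternative when the studies are expected to estimate different underlying effects.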

Quick Reference Table

| Concept | Best Examples |
|---|---|
| Strongest causal evidence | RCTs, Cohort studies |
| Efficient for rare diseases | Case-control, Nested case-control |
| Prevalence estimation | Cross-sectional studies |
| Hypothesis generation | Ecological studies, Case series |
| Policy/intervention evaluation | Quasi-experimental, RCTs |
| Long-term exposure effects | Longitudinal, Cohort studies |
| Evidence synthesis | Meta-analyses, Systematic reviews |
| First signal of emerging diseases | Case reports, Case series |

Self-Check Questions

  1. A researcher wants to study risk factors for a rare childhood cancer. Which study design is most efficient, and what measure of association would be calculated?

  2. Compare cohort and case-control studies: What can cohort studies calculate that case-control studies cannot, and why?

  3. An FRQ describes a study comparing heart disease rates between countries with different dietary fat consumption. What type of study is this, and what major limitation threatens its conclusions?

  4. Why are RCTs considered the gold standard for causal inference, yet inappropriate for studying whether smoking causes lung cancer?

  5. A nested case-control study is conducted within an ongoing cohort. What advantages does this hybrid design offer over a traditional case-control study conducted in the general population?
