When you're studying epidemiology, you're learning to think like a disease detective—and every detective needs the right tools for the job. The type of data you collect determines what questions you can answer: Can you prove causation, or only suggest association? Can you track disease over time, or just capture a moment? Understanding these distinctions is fundamental to study design, causal inference, and evidence evaluation—concepts that appear repeatedly on exams.
You're being tested on your ability to match research questions to appropriate data types, recognize the strengths and limitations of each approach, and interpret findings within their methodological constraints. Don't just memorize definitions—know what each data type can and cannot tell you, and when you'd choose one over another. That's where the real exam points are.
The timing of data collection fundamentally shapes what conclusions you can draw. Cross-sectional studies capture a single moment, while longitudinal approaches track changes over time—each answering different epidemiological questions.
Compare: Cross-sectional vs. longitudinal data. Both can assess associations, but only longitudinal data establishes temporal sequence, because a cross-sectional snapshot records exposure and outcome at the same moment and cannot show which came first. If an exam question asks whether exposure preceded outcome, longitudinal is your answer.
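To make the temporal-sequence point concrete, here is a minimal Python sketch using made-up records. The `Person` class, its `exposure_time`/`outcome_time` fields, and both helper functions are hypothetical names invented for illustration. A cross-sectional snapshot reduces each person to "exposed?" and "diseased?", while longitudinal records keep the timestamps needed to say which came first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical follow-up record: time (e.g., years since enrollment) when
    # exposure began and when the outcome was diagnosed; None means never observed.
    exposure_time: Optional[float]
    outcome_time: Optional[float]


def cross_sectional_view(p: Person, survey_time: float) -> tuple[bool, bool]:
    """Snapshot at one moment: only whether each is present, with no ordering."""
    exposed = p.exposure_time is not None and p.exposure_time <= survey_time
    diseased = p.outcome_time is not None and p.outcome_time <= survey_time
    return exposed, diseased


def temporal_order(p: Person) -> str:
    """Longitudinal records keep the timestamps, so the ordering is recoverable."""
    if p.exposure_time is None or p.outcome_time is None:
        return "not both observed"
    if p.exposure_time < p.outcome_time:
        return "exposure first"
    if p.outcome_time < p.exposure_time:
        return "outcome first"
    return "same time"


people = [Person(1.0, 4.0), Person(3.0, 2.0)]
for p in people:
    print(cross_sectional_view(p, survey_time=5.0), temporal_order(p))
# (True, True) exposure first
# (True, True) outcome first
```

Both people look identical in the snapshot, yet their longitudinal histories point in opposite directions, which is exactly the ambiguity cross-sectional data cannot resolve.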
Study design determines whether researchers start with exposure and look for outcomes, or start with outcomes and investigate past exposures. This directionality affects efficiency, bias potential, and the types of measures you can calculate.
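Compare: Cohort vs. case-control data. Both assess exposure-outcome relationships, but cohort studies move forward in time (exposure → outcome) while case-control studies move backward (outcome → exposure). Cohort data gives you incidence, and therefore relative risk; case-control data gives you odds ratios, because the investigator chooses how many cases and controls to enroll, so incidence cannot be calculated directly.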
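The measures in this comparison come from a standard 2×2 table (a = exposed with outcome, b = exposed without, c = unexposed with, d = unexposed without). The sketch below uses invented counts to show three things: the odds ratio approximates the risk ratio when the outcome is rare, the two diverge when the outcome is common, and in a case-control table only the odds ratio is unaffected by how many controls the investigator samples. The function names are hypothetical.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cumulative incidence in the exposed divided by that in the unexposed.
    a, b = exposed with / without outcome; c, d = unexposed with / without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a*d)/(b*c); usable for cohort or case-control tables."""
    return (a * d) / (b * c)


# Hypothetical cohort, rare outcome: 20/10,000 exposed vs 10/10,000 unexposed.
print(round(risk_ratio(20, 9_980, 10, 9_990), 3))  # 2.0
print(round(odds_ratio(20, 9_980, 10, 9_990), 3))  # 2.002

# Hypothetical cohort, common outcome: the OR drifts away from the RR.
print(round(risk_ratio(4_000, 6_000, 2_000, 8_000), 3))  # 2.0
print(round(odds_ratio(4_000, 6_000, 2_000, 8_000), 3))  # 2.667

# Hypothetical case-control table: 30 cases, controls chosen by the investigator.
# Doubling the controls leaves the OR unchanged but shifts the naive "risk ratio".
print(odds_ratio(20, 40, 10, 40), odds_ratio(20, 80, 10, 80))  # 2.0 2.0
print(round(risk_ratio(20, 40, 10, 40), 3), round(risk_ratio(20, 80, 10, 80), 3))  # 1.667 1.8
```

The last two lines show why a "risk" computed from case-control data is meaningless: its denominators reflect a sampling decision, not the population at risk.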
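Where you draw your analytical boundaries, individual people or entire groups, determines what inferences are valid. Ecological studies are efficient because they often rely on existing group-level data, but they carry a unique interpretive risk: the ecological fallacy, which assumes that an association observed between populations also holds for the individuals within them.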
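The ecological fallacy is easiest to see with numbers. In this minimal sketch with made-up data for three hypothetical populations (A, B, C), the population averages line up perfectly in one direction while individuals inside every population show the opposite trend. The `pearson` helper is written out by hand so the example needs only the standard library.

```python
from math import sqrt
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up individual data for three populations: (exposure, outcome) lists.
groups = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

# An ecological analysis sees only one summary point per population.
group_x = [mean(xs) for xs, _ in groups.values()]
group_y = [mean(ys) for _, ys in groups.values()]
print("between populations:", round(pearson(group_x, group_y), 3))  # 1.0

# Individual-level data reveal the opposite direction inside every population.
for name, (xs, ys) in groups.items():
    print(f"within {name}:", round(pearson(xs, ys), 3))  # -1.0 each
```

A perfect positive correlation across populations coexists here with a perfect negative correlation within each one, so the group-level result says nothing reliable about any individual.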
Compare: Ecological vs. surveillance data. Both operate at the population level, but ecological data compares across populations to find associations, while surveillance data monitors within populations over time to detect changes. Surveillance is about action; ecological analysis is about hypothesis generation.
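Surveillance analysis usually asks whether current counts are unusual relative to a population's own history. The sketch below assumes one deliberately simple rule, flagging any week above the baseline mean plus two standard deviations. Real surveillance systems use a range of more careful aberration-detection methods; the function name, counts, and threshold here are illustrative assumptions only.

```python
from statistics import mean, stdev


def flag_aberrations(baseline: list[int], current: list[int], k: float = 2.0) -> list[int]:
    """Return indices of current weeks whose counts exceed mean + k * SD of the baseline.
    The 2-SD default is an illustrative assumption, not an official algorithm."""
    threshold = mean(baseline) + k * stdev(baseline)
    return [i for i, count in enumerate(current) if count > threshold]


# Hypothetical weekly case counts reported to a local health department.
baseline_weeks = [12, 15, 11, 14, 13, 12, 16, 14]
this_season = [13, 15, 14, 29, 31]
print(flag_aberrations(baseline_weeks, this_season))  # [3, 4]
```

The comparison is always within the same population over time, which is what distinguishes this use of population-level data from an ecological comparison across populations.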
The gold standard for causal inference, the randomized controlled trial (RCT), requires experimental manipulation, but ethical and practical constraints often limit researchers to observational approaches. Understanding this hierarchy of evidence is essential for evaluating study validity.
Compare: Experimental vs. observational data. RCTs can establish causation because random assignment balances both known and unknown confounders between groups, while observational studies (cohort, case-control, cross-sectional) can only demonstrate association. However, when randomization is unethical or impractical, well-designed observational studies remain essential.
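The reason randomization supports causal claims can be simulated. This sketch invents a population in which age (a stand-in confounder) ranges from 18 to 90, then compares two ways of forming a treated group: a coin flip, as in an RCT, and a hypothetical self-selection rule in which older people opt in more often, as might happen in observational data. All numbers, names, and the opt-in rule are assumptions for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: age stands in for a confounder of treatment and outcome.
ages = [rng.randint(18, 90) for _ in range(10_000)]

# Experimental data: a coin flip, not the participant, decides who is treated.
randomized = [rng.random() < 0.5 for _ in ages]

# Observational data: assume older people are more likely to opt in.
self_selected = [rng.random() < (age - 18) / 72 for age in ages]


def age_gap(assignment: list[bool]) -> float:
    """Mean age of the treated group minus mean age of the untreated group."""
    treated = [a for a, t in zip(ages, assignment) if t]
    untreated = [a for a, t in zip(ages, assignment) if not t]
    return mean(treated) - mean(untreated)


print(f"randomized gap:    {age_gap(randomized):.1f} years")
print(f"self-selected gap: {age_gap(self_selected):.1f} years")
```

With randomization the two groups end up with nearly the same average age, so an outcome difference cannot be blamed on age. Under the assumed opt-in rule the treated group is more than 20 years older on average, and any outcome difference would be confounded by age.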
| Concept | Best Examples |
|---|---|
| Single time point measurement | Cross-sectional data |
| Tracking changes over time | Longitudinal data, time-series data, cohort data |
| Retrospective exposure assessment | Case-control data |
| Prospective outcome tracking | Cohort data (prospective design) |
| Population-level analysis | Ecological data, surveillance data |
| Establishing causation | Experimental data (RCTs) |
| Hypothesis generation | Qualitative data, ecological data, cross-sectional data |
| Rare disease investigation | Case-control data |
A researcher wants to determine whether a new vaccine causes reduced infection rates. Which data type provides the strongest evidence for causation, and why can't observational data achieve the same level of certainty?
Compare and contrast cohort data and case-control data: What measure of association does each calculate, and which would you choose to study a disease affecting only 1 in 100,000 people?
A study finds that countries with higher chocolate consumption have more Nobel Prize winners. What type of data is this, and what logical error should you be cautious about when interpreting these findings?
Which two data types both involve tracking information over time but differ in whether they follow the same individuals? Explain how this difference affects the research questions each can answer.
An FRQ asks you to design a study investigating why patients in a specific community don't adhere to diabetes medication regimens. Which data type would best capture the contextual factors involved, and how might you combine it with another data type for a more complete picture?