The type of data you collect in an epidemiological study determines what questions you can answer. Can you prove causation, or only suggest association? Can you track disease over time, or just capture a single moment? These distinctions are fundamental to study design, causal inference, and evidence evaluation.
On exams, you'll need to match research questions to appropriate data types, recognize the strengths and limitations of each approach, and interpret findings within their methodological constraints. Don't just memorize definitions. Know what each data type can and cannot tell you, and when you'd choose one over another.
The timing of data collection shapes what conclusions you can draw. Cross-sectional studies capture a single moment, while longitudinal approaches track changes over time. Each answers different epidemiological questions.
Cross-sectional data is like a photograph of population health at one point in time. It measures prevalence (the proportion of people with a condition at that moment), not incidence (new cases over time).
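The difference between prevalence and incidence is easiest to see with numbers. The sketch below is a minimal illustration: the function names and all counts are hypothetical, and the incidence figure assumes a one-year follow-up of people who were disease-free at the start, which a single cross-sectional survey cannot supply.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of a population that has the condition at one point in time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """Proportion of initially disease-free people who become cases during follow-up."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk


# Hypothetical numbers, for illustration only.
surveyed = 10_000                 # people in the cross-sectional survey
existing = 500                    # people who already have the condition
at_risk = surveyed - existing     # disease-free at baseline, so able to become new cases
new_over_one_year = 190           # new cases among the at-risk group (assumed follow-up data)

point_prevalence = prevalence(existing, surveyed)
one_year_incidence = cumulative_incidence(new_over_one_year, at_risk)

print(f"Prevalence: {point_prevalence:.1%}")
print(f"One-year cumulative incidence: {one_year_incidence:.1%}")
```

Note that the denominators differ: prevalence uses everyone surveyed, while incidence uses only those at risk of becoming a new case.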
Longitudinal data tracks the same subjects over multiple time points, letting researchers observe how health status develops and changes.
Time-series data analyzes trends across sequential time points, typically at the population level rather than tracking specific individuals.
Compare: Cross-sectional vs. longitudinal data. Both can assess associations, but only longitudinal data establishes temporal sequence. If an exam question asks about determining whether exposure preceded outcome, longitudinal is your answer.
Study design determines whether researchers start with exposure and look for outcomes, or start with outcomes and investigate past exposures. This directionality affects efficiency, bias potential, and the types of measures you can calculate.
Cohort studies follow groups of exposed and unexposed individuals forward in time to see who develops the outcome. This is the classic prospective design, though retrospective cohorts also exist (using historical records to reconstruct the same forward-looking logic).
Case-control studies work in the opposite direction. You start by identifying cases (people with the disease) and controls (people without it), then look backward to compare their past exposures.
Compare: Cohort vs. case-control data. Both assess exposure-outcome relationships, but cohort studies move forward (exposure → outcome) while case-control studies move backward (outcome → exposure). Cohort data gives you incidence and relative risk; case-control data gives you odds ratios.
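Both measures come from the same 2×2 table of exposure by outcome. The sketch below uses the standard formulas; the cell labels `a`, `b`, `c`, `d`, the function names, and the counts are illustrative assumptions, not data from any real study.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed (cohort data).

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a*d)/(b*c), the measure available from case-control data."""
    return (a * d) / (b * c)


# Hypothetical 2x2 table, for illustration only.
a, b = 30, 70   # exposed: 30 developed the outcome, 70 did not
c, d = 10, 90   # unexposed: 10 developed the outcome, 90 did not

rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)

print(f"Relative risk: {rr:.2f}")
print(f"Odds ratio:    {odds:.2f}")
```

Relative risk needs the incidence in each exposure group, which only a cohort provides. In a case-control study the researcher fixes the number of cases and controls, so the row totals do not reflect true risk and the odds ratio is the measure to report.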
Where you draw your analytical boundaries determines what inferences are valid. Studying groups rather than individuals offers efficiency but carries unique interpretive risks.
Ecological studies analyze aggregate data at the group or population level, comparing disease rates across countries, states, or communities rather than tracking individuals.
Surveillance involves the systematic, ongoing collection of health data for monitoring and response. It includes passive reporting systems (where clinicians report cases), active case finding (where health departments seek out cases), and sentinel surveillance networks (selected sites that report on specific conditions).
Compare: Ecological vs. surveillance data. Both operate at the population level, but ecological data compares across populations to find associations, while surveillance data monitors within populations over time to detect changes. Surveillance is about action; ecological analysis is about hypothesis generation.
The gold standard for causal inference requires experimental manipulation, but ethical and practical constraints often limit researchers to observational approaches. Understanding this hierarchy of evidence is essential for evaluating study validity.
In experimental studies, the researcher controls the intervention and uses randomization to assign participants to groups. The randomized controlled trial (RCT) is the strongest design for establishing causality.
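Random assignment means chance alone, not the researcher or the participant, decides who receives the intervention. The sketch below shows simple 1:1 randomization using Python's standard library; the `randomize` helper, the participant IDs, and the seed are hypothetical, and real trials often use more elaborate schemes such as blocked or stratified randomization.

```python
import random


def randomize(participant_ids, seed=None):
    """Shuffle participants and split them into two equal-sized arms (simple 1:1 allocation)."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible for auditing
    shuffled = list(participant_ids)   # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant IDs, for illustration only.
participants = [f"P{n:03d}" for n in range(1, 21)]
arms = randomize(participants, seed=42)

print(len(arms["intervention"]), "assigned to intervention")
print(len(arms["control"]), "assigned to control")
```

Because assignment does not depend on any participant characteristic, known and unknown confounders tend to balance across the two arms as the sample grows.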
Quantitative data consists of numerical measurements amenable to statistical analysis: counts, rates, proportions, and continuous measurements like blood pressure or BMI.
Qualitative data captures experiences, perceptions, and context through non-numerical methods like interviews, focus groups, and ethnographic observation.
Compare: Experimental vs. observational data. RCTs can establish causation through controlled intervention, while observational studies (cohort, case-control, cross-sectional) can demonstrate association but not definitively prove causation. However, when randomization is unethical or impractical, well-designed observational studies remain essential tools.
| Concept | Best Examples |
|---|---|
| Single time point measurement | Cross-sectional data |
| Tracking changes over time | Longitudinal data, time-series data, cohort data |
| Retrospective exposure assessment | Case-control data |
| Prospective outcome tracking | Cohort data (prospective design) |
| Population-level analysis | Ecological data, surveillance data |
| Establishing causation | Experimental data (RCTs) |
| Hypothesis generation | Qualitative data, ecological data, cross-sectional data |
| Rare disease investigation | Case-control data |
A researcher wants to determine whether a new vaccine causes reduced infection rates. Which data type provides the strongest evidence for causation, and why can't observational data achieve the same level of certainty?
Compare cohort data and case-control data: What measure of association does each calculate, and which would you choose to study a disease affecting only 1 in 100,000 people?
A study finds that countries with higher chocolate consumption have more Nobel Prize winners. What type of data is this, and what logical error should you watch for when interpreting these findings?
Which two data types both involve tracking information over time but differ in whether they follow the same individuals? Explain how this difference affects the research questions each can answer.
You need to design a study investigating why patients in a specific community don't adhere to diabetes medication regimens. Which data type would best capture the contextual factors involved, and how might you combine it with another data type for a more complete picture?