Cohort studies follow groups of people over time to see whether a particular exposure leads to a particular outcome. By tracking who develops disease and who doesn't, researchers can measure incidence directly and calculate how much an exposure increases (or decreases) risk. This makes cohort studies one of the strongest observational designs for investigating causal relationships.

Features of Cohort Studies

A cohort study starts by defining groups based on their exposure status, not their disease status. Researchers identify a group of people who share a common experience or characteristic (the "cohort"), then follow them forward to see what happens.

Key characteristics:

Exposure-defined groups: Participants are classified as exposed or unexposed at the start, then both groups are followed over time
Longitudinal design: Data collection happens over a period of time, not at a single snapshot
Incidence measurement: Because you're watching outcomes develop in real time, you can calculate incidence rates and risks directly
Multiple outcomes from one exposure: A single cohort study can track several different health outcomes at once. For example, a cohort of people exposed to air pollution could be monitored for asthma, lung cancer, and cardiovascular disease simultaneously.
Well-suited for rare exposures: If you want to study an uncommon exposure like a specific occupational chemical, you can deliberately recruit people with that exposure and follow them. This would be very difficult with a case-control design.
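Incidence from a cohort can be expressed as a risk (new cases divided by the number of people at risk at baseline) or as a rate (new cases divided by total person-time at risk). Here is a minimal Python sketch using a hypothetical 5-year cohort. The `Participant` record and the data are illustrative assumptions, and the risk calculation assumes nobody was lost to follow-up.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    years_followed: float  # time at risk until the outcome or the end of follow-up
    had_outcome: bool      # True if the outcome occurred during follow-up

def cumulative_incidence(cohort):
    """Risk: new cases / people at risk at baseline (assumes complete follow-up)."""
    cases = sum(p.had_outcome for p in cohort)
    return cases / len(cohort)

def incidence_rate(cohort):
    """Rate: new cases / total person-time at risk."""
    cases = sum(p.had_outcome for p in cohort)
    person_time = sum(p.years_followed for p in cohort)
    return cases / person_time

# Hypothetical cohort of 10: two people develop the outcome (at years 2 and 3);
# the other eight stay disease-free for the full 5 years.
cohort = [Participant(2.0, True), Participant(3.0, True)]
cohort += [Participant(5.0, False) for _ in range(8)]

print(cumulative_incidence(cohort))  # 0.2 (2 cases / 10 people)
print(incidence_rate(cohort))        # 2 cases / 45 person-years, about 0.044
```

Cases stop contributing person-time once they develop the outcome, which is why the denominator here is 45 rather than 50 person-years.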

A classic example: researchers followed cohorts of smokers and non-smokers over decades to compare lung cancer incidence between the two groups.

Prospective vs. Retrospective Designs

Cohort studies come in two main forms, depending on when the exposure and follow-up happen relative to the start of the study.

Prospective cohort studies enroll participants in the present, collect baseline exposure data, and then follow them forward into the future to observe outcomes as they occur. The Framingham Heart Study is a well-known example: researchers enrolled thousands of adults and have tracked cardiovascular outcomes for decades. The advantage here is that exposure data is collected before outcomes develop, which reduces biases (such as recall bias) that arise when exposure is measured after participants or researchers already know the outcome. The downside is that these studies can take years or even decades to produce results.

Retrospective cohort studies use historical records to identify a cohort that was exposed in the past, then trace their outcomes from that point forward to the present. For instance, researchers might use factory employment records to identify workers exposed to asbestos in the 1970s, then check medical records and death certificates to see what diseases developed. These studies are faster and cheaper, but they depend on the quality and completeness of existing records.

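Both designs maintain the core logic of a cohort study: define groups by exposure, then compare outcomes. The difference is timing. In a prospective study, the outcomes have not yet occurred when the researcher begins; in a retrospective study, both the exposures and the outcomes already lie in the past and are reconstructed from records.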


Cohort Study Analysis and Interpretation

Measures of Association in Cohorts

Because cohort studies follow people over time and measure who develops disease, they can directly calculate incidence. This allows for two important measures of association:

Relative risk (RR) compares the incidence of disease in the exposed group to the incidence in the unexposed group:

$RR = \frac{\text{Incidence in exposed}}{\text{Incidence in unexposed}}$

An RR of 1.0 means no difference between groups
An RR greater than 1.0 suggests the exposure increases risk
An RR less than 1.0 suggests the exposure is protective

For example, if the incidence of lung cancer is 15 per 1,000 in smokers and 3 per 1,000 in non-smokers, the RR = 15/3 = 5.0. Smokers have five times the risk.
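The calculation and the interpretation rules above can be sketched in a few lines of Python. The function names are illustrative, and the example assumes groups of exactly 1,000 smokers and 1,000 non-smokers so that the counts match the rates in the text.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """RR = incidence in exposed / incidence in unexposed."""
    # Cross-multiplied form of (a / n1) / (c / n0), avoiding intermediate fractions.
    return (cases_exposed * n_unexposed) / (cases_unexposed * n_exposed)

def interpret(rr, tol=1e-9):
    """Apply the RR = 1, RR > 1, RR < 1 rules of thumb."""
    if abs(rr - 1.0) <= tol:
        return "no difference between groups"
    return "exposure associated with increased risk" if rr > 1 else "exposure appears protective"

rr = relative_risk(15, 1000, 3, 1000)
print(rr, interpret(rr))  # 5.0 exposure associated with increased risk
```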

Hazard ratio (HR) is used in time-to-event (survival) analysis. It accounts for participants entering and leaving the study at different times or being followed for different durations (censoring). The HR compares the instantaneous rate at which events occur (the hazard) in the exposed vs. unexposed groups. You'll often see HRs reported in studies that use Cox proportional hazards regression, which assumes this ratio stays constant over the follow-up period.
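Fitting a Cox model requires a statistics library, but the core idea of comparing event rates while allowing unequal follow-up can be shown with person-time. The sketch below is a simplification, not Cox regression: it assumes a constant hazard in each group (an exponential survival model), in which case the estimated hazard is events divided by person-time and the HR reduces to an incidence rate ratio. The data and function names are hypothetical.

```python
def hazard_estimate(follow_up):
    """Constant-hazard estimate: events / total person-time.

    follow_up is a list of (time_at_risk, event_occurred) tuples. Participants
    who drop out or reach the end of the study event-free are censored: they
    contribute their time at risk with event_occurred=False.
    """
    events = sum(1 for _, event in follow_up if event)
    person_time = sum(t for t, _ in follow_up)
    return events / person_time

def hazard_ratio_constant(exposed, unexposed):
    """HR under the constant-hazard assumption (equals the rate ratio)."""
    return hazard_estimate(exposed) / hazard_estimate(unexposed)

# Hypothetical (years at risk, event) records with unequal follow-up and censoring.
exposed = [(1.5, True), (4.0, False), (2.0, True), (3.0, False), (5.0, False)]
unexposed = [(5.0, False), (2.5, False), (4.0, True), (5.0, False), (3.5, False)]

print(hazard_ratio_constant(exposed, unexposed))  # (2 / 15.5) / (1 / 20), about 2.58
```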


Bias and Confounding in Cohorts

Even well-designed cohort studies face threats to validity. Recognizing these is critical for interpreting results.

Selection bias can occur in several ways:

Loss to follow-up: If participants who drop out differ systematically from those who stay (e.g., sicker people are more likely to leave), the remaining sample no longer represents the original cohort
Healthy worker effect: Occupational cohorts tend to be healthier than the general population simply because people must be healthy enough to work, which can make an exposure look less harmful than it actually is when workers are compared with the general population
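A small hypothetical calculation shows why differential loss to follow-up matters. Assume a cohort of 1,000 in which 100 people truly develop the outcome, and that outcomes are only observed for people who stay in the study. The dropout proportions below are invented for illustration.

```python
def observed_risk(n_total, n_cases, dropout_cases, dropout_noncases):
    """Risk estimated only among participants who remain under follow-up."""
    cases_seen = n_cases * (1 - dropout_cases)
    noncases_seen = (n_total - n_cases) * (1 - dropout_noncases)
    return cases_seen / (cases_seen + noncases_seen)

true_risk = 100 / 1000

# Non-differential loss: 20% of everyone drops out, so the estimate stays at 0.10.
print(observed_risk(1000, 100, 0.20, 0.20))

# Differential loss: 50% of people who will become ill drop out vs. 10% of others,
# so the observed risk falls to 50 / 860, about 0.058.
print(observed_risk(1000, 100, 0.50, 0.10))
```

The point is not the size of the loss but whether it differs by outcome status: equal dropout shrinks the sample without shifting the estimate, while dropout concentrated among future cases biases it downward.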

Information bias arises from errors in measuring exposure or outcome:

In retrospective designs, exposure data may be incomplete or inaccurate because it relies on existing records
Measurement methods may change over a long follow-up period, introducing inconsistency

Confounding occurs when a third variable is associated with both the exposure and the outcome (and is not simply a step on the causal path between them), distorting the apparent relationship. For example, in a study of coffee drinking and heart disease, smoking could be a confounder if coffee drinkers are more likely to smoke, because smoking independently raises the risk of heart disease.
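The coffee and smoking example can be made concrete with hypothetical counts. In the made-up data below, coffee has no effect on heart disease within either smoking group, but coffee drinkers are more often smokers, so the crude comparison makes coffee look harmful.

```python
def relative_risk(a, n1, c, n0):
    """(a / n1) / (c / n0): a, c = cases; n1, n0 = exposed and unexposed group sizes."""
    return (a * n0) / (c * n1)

# Hypothetical counts per smoking stratum:
# (cases among coffee drinkers, coffee drinkers, cases among non-drinkers, non-drinkers)
strata = {
    "smokers": (80, 800, 20, 200),     # risk 0.10 in both coffee groups
    "non-smokers": (4, 200, 16, 800),  # risk 0.02 in both coffee groups
}

# Crude RR: collapse the strata, ignoring smoking entirely.
a, n1, c, n0 = (sum(col) for col in zip(*strata.values()))
print("crude RR:", relative_risk(a, n1, c, n0))  # 84/1000 vs 36/1000, about 2.33

# Stratum-specific RRs: compare coffee vs. no coffee within each smoking group.
for name, counts in strata.items():
    print(name, relative_risk(*counts))  # 1.0 in both strata
```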

Strategies to control confounding include:

Matching exposed and unexposed participants on key confounders at enrollment
Stratification during analysis to examine the association within subgroups
Multivariable regression to statistically adjust for multiple confounders simultaneously
Propensity score methods to balance groups on observed characteristics
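Stratification can be taken one step further by pooling the stratum-specific estimates into a single adjusted RR. The Mantel-Haenszel estimator is one standard way to do this; it is used here as an illustrative choice, since the strategies above do not name a specific pooling method. The counts are hypothetical and constructed so that the RR is 2.0 within each stratum, while the exposure is concentrated in the higher-risk stratum.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled RR across strata.

    Each stratum is (a, n1, c, n0): a and c are cases among the exposed and
    unexposed; n1 and n0 are the numbers exposed and unexposed in that stratum.
    RR_MH = sum(a * n0 / N) / sum(c * n1 / N), where N = n1 + n0.
    """
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator

def crude_rr(strata):
    """RR after collapsing all strata into one table (ignores the confounder)."""
    a, n1, c, n0 = (sum(col) for col in zip(*strata))
    return (a * n0) / (c * n1)

# Hypothetical strata of a confounder (e.g., smokers, non-smokers).
strata = [
    (80, 800, 10, 200),  # stratum 1: risk 0.10 exposed vs. 0.05 unexposed
    (4, 200, 8, 800),    # stratum 2: risk 0.02 exposed vs. 0.01 unexposed
]

print("crude RR:", crude_rr(strata))                      # about 4.67
print("Mantel-Haenszel RR:", mantel_haenszel_rr(strata))  # 2.0
```

Because the RR is identical in both strata here, the pooled estimate recovers it exactly; with real data the stratum RRs usually differ, and the Mantel-Haenszel value is a weighted summary of them.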

Note: randomization is used in experimental studies (like RCTs), not in observational cohort studies. Cohort studies rely on the analytical methods listed above to handle confounding.

Strengths and Limitations of Cohorts

Strengths	Limitations
Establishes temporal sequence (exposure before outcome)	Time-consuming and expensive, especially prospective designs
Directly calculates incidence rates and relative risk	Not efficient for studying rare outcomes (you'd need a huge sample)
Can study multiple outcomes from a single exposure	Vulnerable to loss to follow-up over long study periods
Well-suited for rare exposures	Confounding can be difficult to fully control
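To see why rare outcomes demand very large cohorts, you can use the usual normal-approximation formula for the sample size needed to compare two proportions. This sketch assumes a two-sided alpha of 0.05, 80% power, equal group sizes, and no loss to follow-up; `n_per_group` is a hypothetical helper, not a standard library function.

```python
import math
from statistics import NormalDist

def n_per_group(p0, rr, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a given RR.

    p0: risk of the outcome in the unexposed group over follow-up
    rr: relative risk to detect, so the exposed risk is p1 = rr * p0
    """
    p1 = rr * p0
    p_bar = (p1 + p0) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

# Same RR of 2.0, progressively rarer outcomes.
for p0 in (0.10, 0.01, 0.001):
    print(p0, n_per_group(p0, rr=2.0))
```

Each tenfold drop in baseline risk raises the required size of each group by roughly tenfold, from about 200 per group at a 10% risk to tens of thousands at 0.1%.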

How cohort studies compare to other designs:

Cohort vs. case-control: Cohort studies are better for rare exposures and for studying multiple outcomes. Case-control studies are better for rare diseases because they start by selecting people who already have the outcome, making them faster and cheaper for uncommon conditions.

Cohort vs. cross-sectional: Cohort studies establish temporality (you know exposure came before outcome) and measure incidence. Cross-sectional studies capture a single point in time, measure prevalence rather than incidence, and cannot determine which came first. Cross-sectional designs are quicker and less expensive, but they're weaker for causal inference.
