Experimental design is the backbone of Unit 3 and shows up repeatedly throughout AP Statistics—from understanding how data should be collected to interpreting results in inference problems. When you see an FRQ asking whether a study can establish causation or merely association, you're being tested on these principles. The concepts here—randomization, control, blocking, and replication—aren't just vocabulary words; they're the tools that separate valid experiments from flawed ones.
Here's the key insight: every design choice exists to solve a specific problem. Randomization eliminates confounding. Blocking reduces variability. Control groups provide a baseline. Blinding prevents bias. Don't just memorize these terms—understand what problem each principle solves, because that's exactly what the exam will ask you to explain.
Establishing Causation: The Core Principles
The fundamental goal of an experiment is to establish a cause-and-effect relationship. These three principles work together to make that possible—without all three, your experiment cannot support causal conclusions.
Randomization
Random assignment of treatments—ensures each experimental unit has an equal chance of receiving any treatment, creating roughly equivalent groups
Eliminates confounding variables by distributing both known and unknown lurking variables roughly evenly across treatment groups
Enables causal inference—this is the key distinction between experiments and observational studies; without randomization, you can only claim association
Replication
Multiple experimental units per treatment—allows you to estimate the natural variability in responses and distinguish real effects from random chance
Increases statistical power by reducing the standard error of treatment effect estimates (recall: SE = s/√n; see the sketch after this list)
Validates results by demonstrating that observed differences are consistent across many subjects, not just a few unusual cases
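The sketch below puts numbers on that formula (a minimal Python sketch; s = 10 is an arbitrary assumed standard deviation). Quadrupling the sample size halves the standard error:

```python
import math

s = 10.0  # assumed sample standard deviation of individual responses
for n in (5, 20, 80):
    se = s / math.sqrt(n)  # standard error of the sample mean: SE = s / sqrt(n)
    print(f"n = {n:3d}  ->  SE = {se:.2f}")
# Output: 4.47, 2.24, 1.12 -- each quadrupling of n halves the SE
```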
Control Groups
Provide a baseline for comparison—without knowing what happens with no treatment, you can't measure the treatment's effect
Can be placebo or standard treatment depending on the research question and ethical considerations
Isolate the treatment effect by holding all other conditions constant between groups
Compare: Randomization vs. Replication—both are essential for valid experiments, but they solve different problems. Randomization creates comparable groups (controls confounding), while replication provides enough data to detect effects (controls variability). If an FRQ asks why an experiment can establish causation, mention both.
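To see randomization and replication working together, here is a minimal Python sketch of a completely randomized assignment (the unit labels and group sizes are illustrative assumptions): 20 units split by chance into two treatments with 10 replicates each.

```python
import random

units = [f"plant_{i}" for i in range(1, 21)]  # 20 experimental units (hypothetical labels)
random.shuffle(units)                         # randomization: chance alone decides the grouping

treatment_group = units[:10]  # replication: 10 units receive the treatment
control_group = units[10:]    # ...and 10 serve as the control

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```

Because chance alone decides the split, any lurking variable is expected to be spread roughly evenly across both groups.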
Reducing Bias: Blinding and Placebo Controls
Even with perfect randomization, human psychology can introduce systematic errors. These techniques address bias that comes from participants' and researchers' expectations—the mind can create effects that don't actually exist.
Blinding
Single-blind design—participants don't know which treatment they receive, preventing their expectations from influencing their responses
Double-blind design—neither participants nor researchers measuring outcomes know group assignments, eliminating bias from both sides
Essential for subjective outcomes like pain levels, mood, or perceived improvement where expectations strongly influence reported results
Placebo Effect
Psychological response to perceived treatment—participants may improve simply because they believe they're receiving help, not because the treatment works
Requires placebo control groups to separate genuine treatment effects from expectation-driven changes
Particularly important in medical and behavioral studies where outcomes depend partly on participant beliefs and attitudes
Compare: Single-blind vs. Double-blind—single-blind controls participant bias only, while double-blind also prevents researchers from unconsciously treating groups differently or interpreting results favorably. Double-blind is the gold standard, but single-blind may be necessary when researchers must know treatments (e.g., surgical procedures).
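One practical way to implement double-blinding is to dispense treatments under opaque codes and keep the key with a third party. A minimal sketch, with hypothetical subject labels and kit-code scheme:

```python
import random

subjects = [f"S{i:02d}" for i in range(1, 9)]
random.shuffle(subjects)                     # random assignment to treatments
treatments = ["drug"] * 4 + ["placebo"] * 4  # balanced groups of four

# Opaque kit codes hide the assignment from both subjects and researchers.
kit_numbers = random.sample(range(1000, 10000), len(subjects))
dispensing_list = {s: f"KIT-{k}" for s, k in zip(subjects, kit_numbers)}
unblinding_key = {f"KIT-{k}": t for k, t in zip(kit_numbers, treatments)}

print(dispensing_list)  # staff see only subject -> kit code, never the treatment
# unblinding_key stays sealed with a third party until the analysis is complete
```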
Controlling Variability: Blocking Strategies
When experimental units differ in ways that affect the response, blocking groups similar units together before randomization. This reduces "noise" in your data—think of it as sorting before shuffling.
Blocking
Groups similar experimental units based on a characteristic expected to affect the response (age, gender, location, baseline ability)
Randomization occurs within blocks—each block contains all treatments, so block differences can't confound treatment comparisons
Reduces unexplained variability by accounting for known sources of variation, making treatment effects easier to detect
Randomized Block Design
Divide units into homogeneous blocks first, then randomly assign treatments within each block (see the sketch after this list)
Each treatment appears in every block—this ensures fair comparison across all levels of the blocking variable
More precise than completely randomized design when the blocking variable is strongly related to the response
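A minimal sketch of within-block randomization (the blocking variable, plant size, and the unit labels are hypothetical):

```python
import random

# Hypothetical blocking variable: plant size, expected to affect yield
blocks = {
    "small":  ["plant_1", "plant_2", "plant_3", "plant_4"],
    "medium": ["plant_5", "plant_6", "plant_7", "plant_8"],
    "large":  ["plant_9", "plant_10", "plant_11", "plant_12"],
}
treatments = ["fertilizer", "control"]

assignment = {}
for block_name, units in blocks.items():
    random.shuffle(units)  # randomization happens WITHIN each block
    half = len(units) // 2
    for unit in units[:half]:
        assignment[unit] = treatments[0]
    for unit in units[half:]:
        assignment[unit] = treatments[1]

print(assignment)  # every treatment appears in every block
```

Because every treatment appears in every block, differences between blocks cannot masquerade as treatment effects.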
Matched-Pairs Design
Special case of blocking with only two treatments—pairs are formed based on similar characteristics, then one member of each pair is randomly assigned to each treatment
Each subject can serve as their own control in before/after designs, eliminating individual differences entirely
Analyzed using paired differences (x̄_d and s_d), which typically have smaller variability than two independent samples
Compare: Randomized Block Design vs. Matched-Pairs—both use blocking, but matched-pairs is specifically for two-treatment comparisons and often uses subjects as their own controls. On FRQs, identify matched-pairs when the same subjects receive both treatments or when subjects are explicitly paired before assignment.
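A minimal sketch of the matched-pairs analysis, using made-up before/after data: compute the difference for each subject, then summarize those differences with x̄_d and s_d.

```python
import statistics

# Hypothetical before/after data: each subject serves as their own control
before = [12.1, 14.3, 11.8, 15.0, 13.2]
after = [10.4, 13.1, 10.9, 13.8, 12.0]

diffs = [b - a for b, a in zip(before, after)]  # paired differences
d_bar = statistics.mean(diffs)                  # x̄_d, the mean difference
s_d = statistics.stdev(diffs)                   # s_d, the SD of the differences
print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.2f}")
```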
Experimental Design Structures
Different research questions call for different experimental frameworks. The choice depends on how many factors you're studying, what resources you have, and the nature of your experimental units.
Completely Randomized Design (CRD)
All units assigned to treatments purely by chance—no blocking or matching, the simplest experimental structure
Best for homogeneous populations where experimental units are similar enough that blocking wouldn't help
Straightforward analysis but may miss effects if there's substantial variability among units that blocking could control
Factorial Design
Studies multiple factors simultaneously—for example, a 2×2 factorial examines two factors, each at two levels, creating four treatment combinations (see the sketch after this list)
Reveals interaction effects—whether the effect of one factor depends on the level of another factor
More efficient than one-factor-at-a-time experiments because every observation provides information about all factors
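A minimal sketch of setting up a 2×2 factorial (both factors and their levels are hypothetical), assigning 12 units across the four treatment combinations:

```python
from itertools import product
import random

# Two hypothetical factors, each at two levels -> four treatment combinations
fertilizer = ["low", "high"]
watering = ["daily", "weekly"]
treatments = list(product(fertilizer, watering))  # 4 combinations

units = [f"plot_{i}" for i in range(1, 13)]  # 12 plots: 3 replicates per combination
random.shuffle(units)
assignment = {u: treatments[i % 4] for i, u in enumerate(units)}
print(assignment)
```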
Crossover Design
Each participant receives all treatments in sequence—subjects serve as their own controls, dramatically reducing between-subject variability (see the sketch after this list)
Requires washout periods between treatments so that lingering effects of one treatment don't influence the response to the next
Ideal for chronic conditions where treatment effects are temporary and reversible
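A minimal sketch of a two-treatment crossover schedule with counterbalanced, randomized order (subject labels and treatment names are hypothetical):

```python
import random

subjects = [f"S{i}" for i in range(1, 7)]
orders = [("drug_A", "drug_B"), ("drug_B", "drug_A")] * 3  # counterbalanced: 3 per order

random.shuffle(subjects)  # which subject gets which order is decided by chance
schedule = dict(zip(subjects, orders))
for subj, (first, second) in schedule.items():
    print(f"{subj}: phase 1 = {first} -> washout -> phase 2 = {second}")
```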
Compare: Completely Randomized vs. Randomized Block Design—CRD is simpler but ignores known sources of variability, while RBD accounts for them through blocking. Choose CRD when units are homogeneous; choose RBD when you can identify a variable that affects the response. FRQs often ask you to justify why blocking improves an experiment.
Threats to Validity: Confounding and Bias
Understanding what can go wrong helps you design better experiments and critique flawed studies—a common FRQ task is identifying problems in a described study.
Confounding Variables
Alternative explanations for observed effects—variables associated with both the treatment and the response that make it impossible to isolate the treatment's true effect
Managed through randomization (distributes confounders evenly), blocking (controls known confounders), or holding constant (same conditions for all groups)
The reason observational studies can't establish causation—without random assignment, treatment groups may differ systematically
Bias Reduction Techniques
Randomization eliminates selection bias by preventing systematic differences between treatment groups
Blinding eliminates response and measurement bias by preventing expectations from influencing outcomes or data collection
Standardized protocols reduce procedural bias by ensuring all subjects are treated identically except for the treatment itself
Sample Size Determination
Larger samples increase power—the ability to detect a real treatment effect when one exists
Calculated from effect size, variability, and significance level—smaller expected effects or higher variability require larger samples (see the sketch after this list)
Balances precision against practical constraints like cost, time, and availability of experimental units
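The exact calculation goes beyond what the AP exam requires, but one standard normal-approximation formula shows how the three inputs interact. A minimal sketch, assuming σ = 8, a minimum detectable difference of 5, α = 0.05, and 80% power:

```python
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate per-group n for detecting a mean difference delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

print(round(n_per_group(sigma=8, delta=5)))  # about 40 per group under these assumptions
```

Since n grows with 1/δ², halving the difference you want to detect roughly quadruples the required sample size, which is exactly where the practical trade-offs come from.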
Compare: Confounding vs. Bias—both threaten validity, but confounding is about alternative explanations (a third variable causes both treatment and response), while bias is about systematic errors in measurement or selection. Randomization addresses confounding; blinding addresses bias.
A researcher wants to test whether a new fertilizer increases tomato yield. She has 30 plants of varying ages and sizes. Should she use a completely randomized design or a randomized block design? Explain what blocking variable she might use and why it would improve the experiment.
What do randomization and blinding have in common, and how do they differ? Which one allows an experiment to establish causation, and which one prevents psychological bias?
An experiment compares two pain medications by giving each participant both drugs (one per week) in random order. What design is this, and why is a "washout period" necessary between treatments?
A study finds that coffee drinkers have lower rates of heart disease. A journalist claims coffee prevents heart disease. Explain why this conclusion is flawed and identify at least one potential confounding variable.
Compare matched-pairs design and randomized block design. When would you choose matched-pairs over blocking with multiple treatments? How does the analysis of matched-pairs data differ from analyzing two independent samples?