Fiveable

📊Honors Statistics Unit 1 Review

1.4 Experimental Design and Ethics

Written by the Fiveable Content Team • Last updated August 2025

Experimental Design

Experimental design is how researchers structure studies to draw valid cause-and-effect conclusions. Without careful design, bias and confounding variables can creep in and make your results meaningless. This section covers the core components of randomized experiments, common sources of bias, and how to evaluate whether a study's conclusions actually hold up.

Components of Randomized Experiments

A randomized experiment assigns subjects to treatment groups using a random process. This is the single most important feature of a true experiment because it makes the groups roughly equivalent on both known and unknown characteristics, reducing the influence of confounding variables.

Here are the key building blocks:

  • Explanatory variable (independent variable): the variable the researcher deliberately manipulates. For example, a researcher might vary drug dosage across groups (0 mg, 50 mg, 100 mg).
  • Response variable (dependent variable): the outcome you measure to see if the treatment had an effect. In a drug trial, this could be the change in blood pressure.
  • Treatments: the specific conditions applied to each group. If you're testing two dosage levels plus a placebo, that's three treatments.
  • Control group: receives no treatment or a standard/baseline treatment (like a placebo or the current standard of care). This group exists so you can compare the treatment groups against something meaningful.
  • Sample size: the number of subjects in the study. Larger samples give the experiment more statistical power, meaning a better chance of detecting a real effect if one exists. A study with 20 subjects is far less reliable than one with 2,000.
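The sample-size point above can be made concrete with a quick simulation. The sketch below (hypothetical numbers: a small true effect of 0.2 standard deviations, tested with a two-sample z-test) estimates power, the chance of detecting the effect, for a study of 20 subjects per group versus 2,000 per group.

```python
import random
import statistics

def simulated_power(n, effect=0.2, sims=500, seed=1):
    """Estimate the power of a two-group comparison by simulation.

    Each simulated experiment draws n control values from N(0, 1) and
    n treatment values from N(effect, 1), then applies a two-sample
    z-test (sigma known to be 1). Power = fraction of simulated
    experiments that reach significance at alpha = 0.05.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided critical value for alpha = 0.05
    hits = 0
    for _ in range(sims):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(effect, 1) for _ in range(n)]
        diff = statistics.mean(treated) - statistics.mean(control)
        se = (1 / n + 1 / n) ** 0.5  # standard error of the difference
        if abs(diff) / se > z_crit:
            hits += 1
    return hits / sims

small = simulated_power(20)    # tiny study: usually misses the effect
large = simulated_power(2000)  # large study: almost always detects it
```

With these assumptions the small study detects the effect only around 10% of the time, while the large study detects it essentially every time, which is exactly what "more statistical power" means.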

The combination of random assignment + a control group is what allows experiments (unlike observational studies) to establish causation.
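Random assignment itself is simple to carry out. Here is a minimal sketch (subject labels and treatment names are hypothetical) that shuffles the subjects and deals them into equal-sized groups, including a placebo control:

```python
import random

def randomize(subjects, treatments, seed=42):
    """Randomly assign subjects to treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = subjects[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal shuffled subjects round-robin into the treatment groups
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 13)]  # 12 hypothetical subjects
groups = randomize(subjects, ["placebo", "50 mg", "100 mg"])
```

Because the shuffle is random, any systematic difference between the groups (age, prior health, motivation) is left to chance rather than to the researcher's choices.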

[Image: Components of randomized experiments | Boundless Psychology]


Sources of Experimental Bias

Even well-designed experiments can be undermined by bias. Here are the major threats:

Lurking (confounding) variables are factors not accounted for in the design that influence the response variable. If you're testing whether a new teaching method improves test scores but one group happens to have stronger students, prior ability is a confound. Randomization is the primary defense, though researchers can also control for known confounders through techniques like blocking or stratification.
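Blocking can be sketched in code as well. In this hypothetical version of the teaching-method example, students are first grouped by prior ability and then randomized separately within each block, so both methods get a balanced mix of strong and average students:

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, treatments, seed=7):
    """Randomize within blocks so every treatment receives a balanced
    share of each block (e.g., 'strong' and 'average' students)."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of[s]].append(s)   # group subjects by block
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)            # randomize within the block
        for i, s in enumerate(members):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical class of 8 students, blocked by prior ability
ability = {"A": "strong", "B": "strong", "C": "strong", "D": "strong",
           "E": "average", "F": "average", "G": "average", "H": "average"}
assignment = block_randomize(list(ability), ability, ["new method", "standard"])
```

Within each ability block the two methods are split evenly, so prior ability can no longer masquerade as a treatment effect.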

Placebo effect is a real, measurable response that occurs simply because a subject believes they're receiving treatment. Someone given a sugar pill might report feeling better. To account for this, researchers give the control group a placebo (an inactive treatment that looks identical to the real one) so that any improvement beyond the placebo group's response can be attributed to the actual treatment.

Blinding prevents the power of suggestion from distorting results:

  • Single-blind: subjects don't know which treatment they're receiving (real drug vs. placebo).
  • Double-blind: neither the subjects nor the researchers interacting with them know the assignments. This prevents researchers from unconsciously treating groups differently or interpreting results with bias.
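One common way to implement double-blinding in practice is coded kits: each subject receives a coded package, and only an independent keyholder knows which codes are drug and which are placebo. A minimal sketch (the KIT-### labels and treatment names are hypothetical):

```python
import random

def blind_assignments(n_subjects, treatments, seed=3):
    """Create coded labels so neither subjects nor researchers can tell
    which treatment a code refers to; only the separately held key can."""
    rng = random.Random(seed)
    # Balanced list of treatments, shuffled so the code numbers carry
    # no pattern (KIT-000 is not always the drug, and so on)
    pool = [treatments[i % len(treatments)] for i in range(n_subjects)]
    rng.shuffle(pool)
    key = {f"KIT-{i:03d}": t for i, t in enumerate(pool)}  # third party only
    codes = list(key)  # the only thing researchers and subjects see
    return codes, key

codes, key = blind_assignments(6, ["drug", "placebo"])
# Researchers record outcomes against codes like 'KIT-004'; the key
# stays sealed until data collection ends.
```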

Sampling bias occurs when the sample doesn't represent the target population. Convenience sampling (surveying only people who happen to walk by your table) is a classic example. Random sampling techniques like simple random sampling help ensure representativeness.
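A simple random sample is straightforward once you have a sampling frame, a list of the whole target population. A short sketch with a hypothetical frame:

```python
import random

# Hypothetical sampling frame: a list of everyone in the target population
population = [f"person_{i}" for i in range(1, 1001)]

rng = random.Random(0)
# Simple random sample: every possible group of 50 is equally likely,
# unlike a convenience sample of whoever walks by
sample = rng.sample(population, 50)
```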


Experimental Validity and Significance

Validity tells you whether a study's conclusions are trustworthy:

  • Internal validity: How well does the study establish a cause-and-effect relationship? High internal validity means you've controlled for confounding variables and can confidently say the treatment caused the observed change.
  • External validity (generalizability): Can the results be applied beyond the specific study? A drug tested only on 20-year-old males may not generalize to elderly women.

These two types of validity often trade off. A tightly controlled lab experiment may have high internal validity but low external validity because the setting is artificial.

Causation vs. correlation is a distinction you'll see throughout statistics. Two variables can be correlated (they move together) without one causing the other. Only a properly designed randomized experiment can establish causation. Observational studies, no matter how large, can only show association.
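You can watch a lurking variable manufacture a correlation in a few lines. In this hypothetical simulation, daily temperature drives both ice cream sales and swimming accidents; the two outcomes end up strongly correlated even though neither causes the other:

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((sum((x - mx) ** 2 for x in xs)
                   * sum((y - my) ** 2 for y in ys)) ** 0.5)

rng = random.Random(5)

# Lurking variable: temperature drives both measured variables
temp = [rng.gauss(25, 5) for _ in range(500)]
ice_cream = [t + rng.gauss(0, 2) for t in temp]  # sales track temperature
drownings = [t + rng.gauss(0, 2) for t in temp]  # so do swimming accidents

r = corr(ice_cream, drownings)  # strong correlation, zero causation
```

Banning ice cream would do nothing for swimming safety; only an experiment that manipulates one variable while randomizing everything else could reveal that.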

Statistical significance indicates whether the results are unlikely to have occurred by chance alone. Researchers use p-values and confidence intervals to assess this. A result is typically considered statistically significant if the p-value falls below a predetermined threshold (often 0.05), suggesting the observed effect is probably real rather than random noise.
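One intuitive way to see what "unlikely by chance alone" means is a permutation test: repeatedly shuffle the group labels and check how often random relabeling produces a difference as large as the one observed. A sketch with hypothetical outcome scores:

```python
import random
import statistics

def permutation_p_value(group_a, group_b, sims=5000, seed=11):
    """Two-sided permutation test: the fraction of random relabelings
    whose mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    extreme = 0
    for _ in range(sims):
        rng.shuffle(pooled)  # pretend the labels were assigned at random
        diff = abs(statistics.mean(pooled[:n_a])
                   - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / sims

treated = [6.1, 5.8, 7.0, 6.4, 6.9, 6.2]  # hypothetical outcome scores
control = [5.0, 5.3, 4.8, 5.6, 5.1, 5.4]
p = permutation_p_value(treated, control)
```

Here the groups barely overlap, so almost no shuffled relabeling matches the observed difference and the p-value lands well below 0.05: the result would be called statistically significant.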

Ethical Considerations

Ethics in Statistical Studies

Research involving human subjects carries serious ethical responsibilities. These aren't just guidelines; they're enforced requirements.

Protection of human subjects rests on three core practices:

  • Informed consent: Participants must be told what the study involves, what risks they face, and that they can withdraw at any time without penalty. This is typically documented through a signed consent form. Deception in studies (common in psychology) requires special justification and debriefing afterward.
  • Confidentiality and anonymity: Personal information must be protected. Researchers replace names with unique ID numbers and store identifying data separately from responses. There's a distinction here: anonymity means even the researchers can't link data to individuals, while confidentiality means they can but won't disclose it.
  • Minimizing harm: Physical and psychological risks must be kept as low as possible and clearly communicated to participants before they consent.
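The ID-number practice above can be sketched in code. This hypothetical helper splits identified records into a de-identified data file and a separate key linking IDs back to names; in a real study the key would live under much stricter access controls than the responses:

```python
import random

def pseudonymize(records, seed=99):
    """Split identified records into (key, de-identified data).

    The key mapping IDs to names would be stored separately from the
    response data, so the responses alone identify no one.
    """
    rng = random.Random(seed)
    ids = list(range(1000, 1000 + len(records)))
    rng.shuffle(ids)  # IDs carry no hint of enrollment order
    key, deidentified = {}, []
    for pid, record in zip(ids, records):
        key[pid] = record["name"]
        deidentified.append({"id": pid, "response": record["response"]})
    return key, deidentified

records = [{"name": "Ana", "response": 7},
           {"name": "Ben", "response": 4}]
key, data = pseudonymize(records)
# 'data' contains no names; only the separately stored 'key' links back.
```

Note this scheme gives confidentiality, not anonymity: the researchers could re-link the data via the key, they just commit not to.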

Prevention of data falsification protects the integrity of science:

  • Data integrity means recording and reporting data accurately. Fabricating data, manipulating results, or selectively omitting inconvenient findings are all forms of scientific misconduct.
  • Replication and transparency require researchers to share their methods, data, and analysis openly enough that others can reproduce the study. This is a key safeguard against fraud.
  • Peer review is the process where other experts evaluate a study before publication. It helps catch errors, weak methodology, and ethical violations. When misconduct is discovered after publication, papers are retracted.

Institutional Review Boards (IRBs) are committees that review and approve research involving human subjects before the study begins. They evaluate whether the study's benefits justify its risks and whether proper protections are in place. IRBs operate under the principles of the Belmont Report (1979), which established three foundational ethical principles:

  • Respect for persons: Treat individuals as autonomous agents; provide extra protection for those with diminished autonomy.
  • Beneficence: Maximize benefits and minimize harm.
  • Justice: Distribute the burdens and benefits of research fairly across populations (don't exploit vulnerable groups for research that only benefits the privileged).