Policy evaluation is how governments figure out whether programs are actually solving problems or wasting resources. For this course, you need to distinguish between different evaluation approaches and explain when and why each one is appropriate. Expect questions that ask you to recommend an evaluation strategy for a given scenario or identify the strengths and limitations of different approaches.
These methods fall into distinct categories based on what question they're trying to answer: Is this policy working? How do we know it's the policy causing the change? What do people actually think about it? How is implementation going? Don't just memorize the names. Know what type of evidence each method produces and when evaluators would choose one approach over another.
These approaches attempt to establish causal relationships, demonstrating that a policy actually caused an observed outcome rather than merely being correlated with it. The key mechanism is comparison: what happened to people affected by the policy versus what would have happened without it.
RCTs are considered the gold standard for causal inference. Participants are randomly assigned to a treatment group (receives the policy intervention) and a control group (does not). Because assignment is random, any difference in outcomes between the two groups can be attributed to the policy itself rather than to pre-existing differences between the groups. This eliminates selection bias, which is the major weakness of most other methods.
The tradeoff: RCTs are expensive, time-consuming, and raise ethical concerns. You can't randomly deny people access to essential services like healthcare or housing just to create a control group. For example, a city testing a new homelessness prevention program would face serious ethical pushback if it randomly excluded eligible families.
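To make the mechanics concrete, here is a minimal simulation (the population size, baseline scores, and the 5-point effect are all made-up illustration values) of how random assignment lets a simple difference in group means estimate a causal effect:

```python
import random
import statistics

random.seed(42)

# Simulate 1,000 eligible participants with varying baseline outcomes.
baseline = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment: each participant flips a fair coin.
treated = [random.random() < 0.5 for _ in range(1000)]

# Hypothetical intervention adds 5 points on average (an assumed effect).
outcome = [
    b + (5 if t else 0) + random.gauss(0, 2)
    for b, t in zip(baseline, treated)
]

treatment_group = [y for y, t in zip(outcome, treated) if t]
control_group = [y for y, t in zip(outcome, treated) if not t]

# Because assignment was random, pre-existing differences wash out, and
# the difference in group means estimates the effect of the intervention.
effect = statistics.mean(treatment_group) - statistics.mean(control_group)
print(f"Estimated effect: {effect:.2f}")  # close to the true effect of 5
```

Without randomization (say, if motivated people opted in), the same difference in means would mix the policy's effect with selection bias, which is exactly the problem RCTs are designed to eliminate.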
Regression analysis is a statistical technique that lets evaluators isolate the relationship between a policy and an outcome while controlling for other factors (called confounding variables) that might distort the picture. For instance, if crime drops after a new policing strategy is introduced, regression can help account for other things that changed at the same time, like economic conditions or seasonal trends.
This method works well for before-and-after comparisons using existing data, making it more practical than an RCT in many situations. However, results are only as reliable as the data and the assumptions built into the model. If an important variable is left out, the conclusions can be misleading.
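As an illustration of controlling for a confounder, the sketch below simulates a setting where better local economies both drive policy adoption and reduce crime; the effect sizes are invented, and the two-step "residualize, then regress" shortcut (the Frisch-Waugh-Lovell approach) stands in for full multiple regression:

```python
import random
import statistics

random.seed(7)
n = 2000

# Confounder: local economic conditions (higher = stronger economy).
econ = [random.gauss(0, 1) for _ in range(n)]

# Places with stronger economies are more likely to adopt the policy,
# so treatment is correlated with the confounder.
policy = [
    1.0 if random.random() < 0.5 + 0.2 * max(-1.0, min(1.0, e)) else 0.0
    for e in econ
]

# Assumed true model: outcome = 2*policy + 3*econ + noise.
outcome = [2 * p + 3 * e + random.gauss(0, 1) for p, e in zip(policy, econ)]

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

# Naive comparison: biased upward, because econ drives both adoption and outcome.
naive = slope(policy, outcome)

# Regression adjustment: strip the part of policy explained by econ,
# then regress the outcome on the leftover variation.
b = slope(econ, policy)
m_p, m_e = statistics.mean(policy), statistics.mean(econ)
policy_resid = [p - (m_p + b * (e - m_e)) for p, e in zip(policy, econ)]
adjusted = slope(policy_resid, outcome)

print(f"naive estimate:    {naive:.2f}")    # inflated by confounding
print(f"adjusted estimate: {adjusted:.2f}") # close to the true effect of 2
```

The adjusted estimate recovers the policy's effect only because the confounder was measured and included; this is the "omitted variable" vulnerability the paragraph above describes.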
Compare: RCTs vs. Regression Analysis: both seek causal evidence, but RCTs create comparison through randomization while regression creates it statistically. If a question asks about evaluating a new job training program, RCTs give cleaner results, but regression works when randomization isn't feasible or ethical.
These methods ask the fundamental question: Did the policy achieve what it was supposed to achieve? They focus on measuring results rather than understanding the internal mechanics of a program.
CBA compares total costs to total benefits, both expressed in monetary terms. This forces policymakers to explicitly weigh trade-offs and opportunity costs. If a highway expansion costs $100 million but produces $150 million in economic benefits (reduced commute times, fewer accidents, increased commerce), the net benefit is $50 million.
The strength of CBA is that it allows direct comparison across very different policy options using a common metric. The weakness is that some outcomes are genuinely hard to monetize. How do you assign a dollar value to a life saved, a species preserved, or a community's sense of safety? These judgment calls introduce subjectivity into what looks like an objective calculation.
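The core arithmetic behind CBA is straightforward. The sketch below computes the net present value of a hypothetical project, discounting future benefits because a dollar today is worth more than a dollar later; the cost, benefit stream, and 3% discount rate are all assumed figures:

```python
# Hypothetical highway project: a year-0 construction cost, then ten
# years of annual benefits (time savings, fewer accidents), discounted.
costs = {0: 100_000_000}                          # $100M up front (assumed)
benefits = {t: 15_000_000 for t in range(1, 11)}  # $15M/year for 10 years

def npv(costs, benefits, rate, horizon):
    """Net present value: discounted benefits minus discounted costs."""
    return sum(
        (benefits.get(t, 0) - costs.get(t, 0)) / (1 + rate) ** t
        for t in range(horizon + 1)
    )

value = npv(costs, benefits, rate=0.03, horizon=10)
print(f"NPV at 3%: ${value:,.0f}")
# A positive NPV means discounted benefits exceed costs, so the project
# passes the cost-benefit test under these assumed numbers.
```

Note that the answer is sensitive to the discount rate and to whatever dollar values were assigned to hard-to-monetize outcomes, which is where the subjectivity described above enters.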
Impact assessment evaluates the broader effects of a policy on target populations and the surrounding environment. Unlike CBA, it doesn't try to convert everything into dollars. Instead, it captures the full range of consequences, including unintended ones, both positive and negative.
For example, an impact assessment of a new factory regulation might find that it reduced pollution (intended) but also caused small manufacturers to relocate to neighboring states (unintended). This comprehensive view makes impact assessment especially useful for informing policy redesign.
Performance measurement tracks key performance indicators (KPIs) over time to see whether a program is hitting its targets. Think of metrics like "number of students graduating," "average wait time at the DMV," or "percentage of applicants processed within 30 days."
This method is great for accountability and transparency because it gives stakeholders concrete numbers. But it tells you what is happening, not why. If graduation rates are flat despite a new tutoring program, performance measurement flags the problem but won't explain the cause.
Compare: CBA vs. Impact Assessment: CBA quantifies everything in dollars for direct comparison, while impact assessment captures qualitative and distributional effects that resist monetization. Use CBA when comparing budget alternatives; use impact assessment when equity concerns or unintended consequences matter most.
Sometimes policies fail not because they're bad ideas but because they're poorly implemented. These methods examine how programs operate in practice.
Process evaluation examines whether a program is operating as it was designed to. It identifies gaps between policy on paper and policy in practice. For example, a federal nutrition program might look great in its guidelines, but a process evaluation could reveal that local offices lack the staff to process applications on time, creating long delays that discourage eligible families from participating.
This type of evaluation reveals barriers and facilitators to effective implementation, explaining why some sites succeed while others struggle. You can't fix what you don't understand, and process evaluation pinpoints where the breakdown is occurring.
Program evaluation is the most comprehensive approach. It assesses a program's design, implementation, and outcomes together, combining multiple methods to build a complete picture. A program evaluation typically uses a mixed-methods approach, integrating quantitative data (numbers served, costs incurred, outcome metrics) with qualitative data (participant interviews, staff perspectives).
The goal is to generate evidence-based recommendations for improving current programs and designing future ones. Because of its breadth, program evaluation is often what agencies commission when they need to decide whether to continue, expand, or restructure a program.
Compare: Process Evaluation vs. Program Evaluation: process evaluation zooms in on implementation fidelity, while program evaluation takes a broader view encompassing design and outcomes. Think of process evaluation as one component within a comprehensive program evaluation.
Policies affect real people whose perspectives matter both ethically and practically. These methods capture human experiences and viewpoints.
Surveys collect standardized data from large populations, gauging opinions, behaviors, and experiences related to a policy's impacts. A state agency might survey 5,000 residents to measure public support for a proposed transit expansion, or a school district might survey parents about satisfaction with a new curriculum.
Surveys are relatively affordable and can reach many people quickly. But their validity depends on careful design. Poorly worded questions can lead respondents toward a particular answer, unrepresentative samples can skew results, and low response rates can make findings unreliable.
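Sampling error is one survey concern you can quantify. A standard back-of-envelope margin of error for an estimated proportion, assuming a simple random sample, the least favorable split (p = 0.5), and 95% confidence, looks like this:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A 5,000-resident transit survey, as in the example above:
moe = margin_of_error(5000)
print(f"+/- {moe:.1%}")  # about +/- 1.4 percentage points
```

This formula captures only sampling error; it says nothing about the question-wording, sample-representativeness, or nonresponse problems noted above, which can bias results far more than a 1.4-point margin suggests.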
Stakeholder analysis maps the interests, influence, and needs of individuals and groups affected by a policy. It identifies who stands to gain, who stands to lose, and how much power each group has to shape the outcome.
This is a strategic tool. It helps policymakers anticipate resistance, identify potential allies, and build coalitions. Policies designed without stakeholder input often face implementation challenges because affected groups weren't consulted and may actively oppose the program.
A case study is a deep examination of a specific implementation instance, exploring how a policy plays out in a real-world context with all its complexity. Rather than surveying hundreds of sites superficially, a case study might spend months examining one or two sites in detail through interviews, observation, and document review.
Case studies generate rich qualitative understanding of mechanisms, context, and nuance that quantitative methods miss. They complement statistical approaches by explaining the "why" behind the numbers.
Compare: Surveys vs. Case Studies: surveys provide breadth (many respondents, standardized questions), while case studies provide depth (few cases, rich detail). Surveys are better for measuring how widespread something is; case studies are better for understanding how and why something is happening.
| Evaluation Question | Best Methods |
|---|---|
| Did the policy cause the outcome? | RCTs, Regression Analysis |
| Is the policy worth the investment? | Cost-Benefit Analysis |
| What effects did the policy produce? | Impact Assessment, Program Evaluation |
| Is the program running as intended? | Process Evaluation, Performance Measurement |
| What do affected people think? | Surveys, Stakeholder Analysis |
| How does implementation work in context? | Case Studies, Process Evaluation |
| Should we continue or expand the program? | Program Evaluation, CBA, Impact Assessment |
A city wants to know whether its new after-school program is actually reducing juvenile crime or whether crime was already declining. Which evaluation method would provide the strongest causal evidence, and why might it be difficult to implement?
Compare and contrast Cost-Benefit Analysis and Impact Assessment. In what situation would a policymaker choose one over the other?
Which two methods would you combine to understand both whether a policy is working and why it's working (or failing) in different locations?
A state agency has limited budget and needs to quickly assess public support for a proposed policy change. Which method is most appropriate, and what validity concerns should they address?
If a question describes a program that's achieving good outcomes in some districts but poor outcomes in others, which evaluation approach would best explain this variation? What would it examine?