🫘Intro to Public Policy Unit 12 Review

12.1 Evaluation Research Methods

Written by the Fiveable Content Team • Last updated August 2025

Policy Evaluation in the Policy Cycle

Purpose and Importance of Policy Evaluation

Policy evaluation is how we figure out whether a policy or program actually works. It provides evidence-based feedback to policymakers, stakeholders, and the public, moving decisions beyond guesswork and political intuition.

At its core, evaluation answers three questions: Is the policy meeting its objectives? Where can it improve? What should happen next? The answers promote accountability and transparency, and they directly inform whether a policy gets continued, modified, or terminated. Evaluation findings also shape practical decisions like budget allocations and staffing levels.

Evaluation can happen at different stages of the policy cycle:

  • Ex-ante evaluation takes place before implementation. It assesses potential impact and feasibility, helping identify challenges and unintended consequences before money is spent. An environmental impact assessment is a classic example.
  • Ongoing (mid-term) evaluation happens during implementation. It monitors progress in real time so adjustments can be made. Think of formative assessment in education, where teachers check student understanding throughout a unit rather than only at the end.
  • Ex-post evaluation occurs after a policy or program is completed. It measures overall effectiveness and outcomes. For instance, an impact evaluation of a social welfare program would assess whether it actually reduced poverty rates among its target population.

Timing and Use of Policy Evaluation

The timing of an evaluation shapes what questions it can answer. Ex-ante evaluation is forward-looking: it asks "What's likely to happen?" Ongoing evaluation asks "Is this working so far, and what needs fixing?" Ex-post evaluation looks back and asks "Did it work, and was it worth it?"

Each type feeds into different decisions. Mid-term findings might lead to course corrections while a program is still running. Ex-post findings might justify scaling a successful pilot to new regions, or shutting down an ineffective program. In all cases, evaluation findings can be communicated to the public through annual reports and public hearings, strengthening democratic accountability.

Formative vs. Summative Evaluation

These two categories describe the purpose of an evaluation, not just its timing.

Formative Evaluation

Formative evaluation focuses on process. It's conducted during development and implementation to provide ongoing feedback that improves the policy or program as it unfolds.

  • Aims to identify strengths, weaknesses, and areas for refinement in design, delivery, and management
  • Tends to be exploratory and flexible, relying heavily on qualitative methods like interviews, focus groups, and observations
  • Supports iterative improvement, similar to how agile project management uses short feedback loops to adapt a product during development

A good example: usability testing of a new government online platform. Evaluators watch real users navigate the site, identify where people get confused, and recommend changes before the full launch.

Summative Evaluation

Summative evaluation focuses on results. It's conducted after a policy or program wraps up (or reaches a major milestone) to judge overall effectiveness.

  • Aims to determine whether the policy achieved its intended objectives and to what extent
  • Tends to be more structured, relying on quantitative methods like surveys, experiments, and statistical analysis
  • Provides a final judgment on the merit and worth of a program, often through tools like cost-benefit analysis

The findings directly inform high-stakes decisions: Should the program continue? Should it expand to new populations? Should it be terminated? Some legislation even includes sunset provisions that automatically end a program unless evaluation evidence justifies renewal.
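The cost-benefit analysis mentioned above can be sketched in a few lines: discount a stream of costs and benefits to present value, then compare them. The figures, discount rate, and time horizon below are invented for illustration, not drawn from any real program.

```python
# Toy cost-benefit sketch: discount yearly costs and benefits to present
# value, then compute net benefit and the benefit-cost ratio.
# All numbers are illustrative (in $ millions, years 0 through 3).
discount_rate = 0.03
costs = [100.0, 20.0, 20.0, 20.0]      # large upfront cost, then upkeep
benefits = [0.0, 60.0, 60.0, 60.0]     # benefits start in year 1

def present_value(stream, r):
    # Each year-t amount is divided by (1 + r)^t
    return sum(x / (1 + r) ** t for t, x in enumerate(stream))

pv_costs = present_value(costs, discount_rate)
pv_benefits = present_value(benefits, discount_rate)
net_benefit = pv_benefits - pv_costs
bc_ratio = pv_benefits / pv_costs

print(f"Net benefit: {net_benefit:.1f}M, B/C ratio: {bc_ratio:.2f}")
```

A positive net benefit (or a benefit-cost ratio above 1) is the usual quantitative signal that a program was "worth it," though real evaluations also weigh distributional effects that a single ratio hides.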

The key distinction: Formative evaluation asks "How can we make this better?" Summative evaluation asks "Did this work?"

Research Methods for Policy Evaluation

Quantitative Methods

Quantitative methods collect numerical data that can be measured, compared, and analyzed statistically. They're especially useful when you need to assess a policy's impact across a large population.

  • Surveys collect data from a large sample using standardized questionnaires (often with tools like Likert scales for measuring attitudes). They can be administered online, by mail, or in person. Their standardization makes results comparable across groups.
  • Experiments randomly assign participants to treatment and control groups to test the causal impact of a policy. Randomized controlled trials (RCTs) are the gold standard here. A/B testing in digital policy platforms works on the same principle.
  • Statistical analysis uses mathematical techniques to describe patterns and make inferences from data. Regression analysis, for example, can isolate the effect of one variable while controlling for others.
  • Administrative data analysis uses data that government agencies already collect (like unemployment insurance claims or tax records) to assess policy performance. This avoids the cost of new data collection, though the data wasn't designed for evaluation purposes, which can limit what questions it answers.
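The logic of an RCT can be sketched with simulated data: randomly assign participants, then compare average outcomes between groups. The sample size, outcome scale, and "true effect" of +2.0 below are all made-up assumptions for illustration.

```python
import random
import statistics

random.seed(42)

def simulate_outcome(treated: bool) -> float:
    # Hypothetical data-generating process: baseline outcome plus a
    # true treatment effect of +2.0 (illustrative only)
    base = random.gauss(10.0, 3.0)
    return base + (2.0 if treated else 0.0)

# Randomly assign 1,000 participants to treatment or control
participants = [random.random() < 0.5 for _ in range(1000)]
outcomes = [simulate_outcome(t) for t in participants]

treated = [y for y, t in zip(outcomes, participants) if t]
control = [y for y, t in zip(outcomes, participants) if not t]

# Because randomization balances other characteristics on average,
# the simple difference in means estimates the treatment effect
ate = statistics.mean(treated) - statistics.mean(control)
print(f"Estimated effect: {ate:.2f}")  # should land near the true +2.0
```

This is why RCTs have strong internal validity: with random assignment, nothing but the treatment systematically differs between the two groups, so no regression controls are needed to recover the effect.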

Qualitative Methods

Qualitative methods explore the why and how behind policy outcomes. They capture context, meaning, and individual experience in ways that numbers alone cannot.

  • Interviews are in-depth, one-on-one conversations that explore a person's perspectives and experiences. They can be structured (fixed questions), semi-structured (guided but flexible), or unstructured (open-ended). Life history interviews, for instance, trace how a policy affected someone's trajectory over time.
  • Focus groups bring together a small group (typically 6-12 people) to discuss a specific topic, guided by a moderator. They're valuable for understanding group dynamics and generating new ideas through participant interaction.
  • Observations involve systematically recording behaviors and interactions in natural settings. A researcher might observe how a new classroom policy plays out in practice. Observations can be participant (the researcher joins the activity) or non-participant (the researcher watches from outside).
  • Document analysis examines written materials like reports, memos, legislative records, and media coverage. Content analysis techniques can systematically code these documents to identify patterns in how a policy was discussed, implemented, or received.
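The content-analysis coding described in the last bullet can be sketched as a simple keyword count across documents. The documents and the coding scheme below are invented for illustration; real coding schemes are developed from theory and refined by multiple coders.

```python
import re
from collections import Counter

# Hypothetical excerpts from evaluation documents (invented text)
documents = [
    "The program improved access but funding remained a barrier.",
    "Stakeholders praised improved access; oversight was weak.",
    "Funding shortfalls limited oversight and access in year two.",
]

# Predefined codes: themes the analyst wants to track
codes = {"access", "funding", "oversight"}

# Count how often each code appears across all documents
counts = Counter()
for doc in documents:
    for word in re.findall(r"[a-z]+", doc.lower()):
        if word in codes:
            counts[word] += 1

print(counts.most_common())  # [('access', 3), ('funding', 2), ('oversight', 2)]
```

In practice, analysts code passages rather than single words and check inter-coder reliability, but the core move is the same: turn qualitative text into countable patterns.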

Strengths and Limitations of Evaluation Methods

No single method is perfect. Strong evaluations often combine multiple methods (called mixed methods) to offset the weaknesses of any one approach.

Strengths

| Method | Key Strengths |
| --- | --- |
| Surveys | Reach large, representative samples; produce generalizable findings; relatively quick and cost-effective, especially online |
| Interviews | Allow deep exploration of individual experiences; can uncover unexpected insights and nuances that surveys miss |
| Focus groups | Reveal group dynamics and generate new ideas through discussion; leverage collective knowledge of participants |
| Experiments | Establish causal relationships between a policy and its outcomes; strong internal validity (confidence that the policy, not something else, caused the result) |
| Observations | Provide direct evidence of behavior in real-world settings; capture complexity and context that self-reported data might miss |

Limitations

| Method | Key Limitations |
| --- | --- |
| Surveys | Vulnerable to response bias (people answer inaccurately), social desirability bias (people give "acceptable" answers), and low response rates that skew results |
| Interviews | Time-consuming and resource-intensive; findings are hard to generalize to larger populations; interviewer bias can shape responses |
| Focus groups | Susceptible to groupthink (dominant voices drowning out others) and social desirability bias; small samples limit generalizability |
| Experiments | May have limited external validity (results from controlled conditions don't always transfer to messy real-world settings); can raise ethical concerns about withholding a potentially beneficial program from the control group |
| Observations | Subject to observer bias (researchers see what they expect); reactivity can occur when people change behavior because they know they're being watched (the Hawthorne effect) |

Understanding these trade-offs is central to designing credible evaluations. When you read an evaluation report, always ask: What methods did they use, and what are the blind spots of those methods?