Assessment in Education
Assessment and evaluation sit at the heart of teaching. Assessment is the process of gathering and interpreting information about student learning to make instructional decisions. Evaluation takes it a step further: it means making judgments about the quality of that learning based on the data you've collected.
These two processes work together to serve several purposes:
- Diagnosing student strengths and weaknesses
- Monitoring progress over time
- Providing feedback to both students and teachers
- Determining grades or placement
When done well, assessment and evaluation don't just measure learning. They actually promote it by helping teachers adjust their instruction and helping students understand where they stand.
Uses of Assessment and Evaluation Data
Assessment data gets used at every level of the education system, and the purpose shifts depending on the scale:
- Classroom level: Teachers use data to plan lessons and differentiate instruction for individual students.
- School and district level: Administrators identify areas for improvement and shape professional development priorities.
- State and national level: Policymakers use data to set curriculum standards and hold schools accountable.
Two additional uses are worth understanding. First, data can be disaggregated by student subgroups (race/ethnicity, socioeconomic status, language proficiency) to reveal achievement gaps and equity issues that overall averages might hide. Second, longitudinal data tracks student progress over months or years, which helps evaluate whether programs and interventions are actually working.
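To make the disaggregation and longitudinal ideas concrete, here is a minimal sketch in Python using pandas. The records, subgroup labels, and column names are invented for illustration, not a standard data schema:

```python
import pandas as pd

# Hypothetical assessment records; all columns and values are illustrative.
scores = pd.DataFrame({
    "subgroup": ["A", "A", "B", "B", "A", "B"],
    "term":     ["fall", "fall", "fall", "spring", "spring", "spring"],
    "score":    [78, 85, 62, 70, 88, 66],
})

# The overall average can hide gaps that disaggregation reveals.
print(scores["score"].mean())                      # overall average (~74.8)
print(scores.groupby("subgroup")["score"].mean())  # ~83.7 vs 66.0 by subgroup

# Longitudinal view: track each subgroup's average across terms.
print(scores.groupby(["term", "subgroup"])["score"].mean())
```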
Formative vs. Summative Assessment
The distinction between formative and summative assessment is one of the most fundamental ideas in this unit. The difference comes down to when and why you assess.
Formative Assessment
Formative assessment happens during the learning process. Its primary purpose is to monitor student understanding and guide instruction in real time. Think of it as a check-in, not a final judgment.
Common formative assessment strategies include:
- Exit tickets: Brief questions or prompts at the end of a lesson to check for understanding. A teacher might ask students to write down one thing they learned and one thing they're still confused about.
- Think-pair-share: Students think about a question individually, discuss their answer with a partner, then share with the class. This gives every student a chance to process the material, not just the ones who raise their hands.
- Graphic organizers: Visual tools (concept maps, Venn diagrams, flowcharts) that help students organize relationships between ideas.
- Peer feedback: Students evaluate each other's work using established criteria, which reinforces their own understanding of the standards.
The key feature of formative assessment is that the results are used to adjust teaching and learning, not to assign a final grade.

Summative Assessment
Summative assessment happens after a unit, course, or learning period. Its primary purpose is to evaluate what students have learned and assign grades or scores.
Common summative assessment strategies include:
- Performance tasks: Complex, authentic tasks that require students to apply knowledge and skills (e.g., designing an experiment, writing a research paper).
- Portfolios: Collections of student work over time that demonstrate growth and achievement.
- Rubrics: Scoring guides that define specific criteria and performance levels for a task. Rubrics make grading more transparent and consistent.
- Standardized tests: These can be norm-referenced (comparing a student's performance to other students) or criterion-referenced (measuring performance against a fixed standard); the sketch below contrasts the two scoring approaches.
A helpful way to remember the distinction: formative assessment is for learning; summative assessment is of learning.
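As a minimal sketch of the norm-referenced versus criterion-referenced distinction, the snippet below scores the same hypothetical student both ways. The class scores and the cut score are invented for illustration:

```python
scores = [55, 62, 70, 74, 81, 88, 93]  # hypothetical class results
student = 74

# Norm-referenced: position relative to other test-takers (percentile rank).
percentile = sum(s < student for s in scores) / len(scores) * 100
print(f"Percentile rank: {percentile:.0f}")  # 43 for this class

# Criterion-referenced: performance against a fixed standard (cut score).
CUT_SCORE = 70  # assumed proficiency threshold
print("Proficient" if student >= CUT_SCORE else "Not yet proficient")
```

Note that the same score of 74 can look unimpressive against a strong class (norm-referenced) while still clearing the fixed standard (criterion-referenced).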
Principles of Assessment Design
Alignment and Validity
A well-designed assessment starts with alignment: the assessment should match the learning objectives and the instructional activities. If you taught students to analyze primary sources but your test only asks them to recall dates, the assessment isn't aligned.
Validity means the assessment actually measures what it's supposed to measure. There are three main types:
- Content validity: Does the assessment adequately cover the content that was taught? A final exam that only tests material from the last two weeks of a ten-week unit has weak content validity.
- Construct validity: Does the assessment measure the underlying trait or ability it claims to measure? A reading comprehension test given in English to a student who speaks limited English may actually be measuring language proficiency, not reading comprehension.
- Criterion validity: Does performance on this assessment predict performance on a related measure? For example, do scores on a practice SAT correlate with actual SAT scores?
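As a concrete sketch of checking criterion validity, the snippet below correlates hypothetical practice-test and actual-test scores for the same five students. The numbers are invented, and `statistics.correlation` requires Python 3.10 or later:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical paired scores for five students.
practice = [1050, 1120, 1200, 1280, 1350]
actual   = [1080, 1150, 1180, 1310, 1400]

# A strong positive Pearson r is evidence of criterion validity:
# the practice test predicts performance on the real measure.
print(f"Pearson r = {correlation(practice, actual):.2f}")  # ~0.98 here
```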
Strategies for strengthening validity:
- Develop assessments directly from clearly defined learning objectives and standards.
- Use a test blueprint (a planning document that maps questions to specific content areas and skill levels) to ensure adequate coverage; see the sketch after this list.
- Have content experts review the assessment and pilot it with a representative group of students before full use.
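A test blueprint can be as simple as a table of planned item counts per content area and skill level. The sketch below checks a draft test against such a plan; the content areas, skill levels, and counts are all invented:

```python
from collections import Counter

# Planned item counts per (content area, skill level) cell -- illustrative only.
blueprint = {
    ("fractions", "recall"): 4, ("fractions", "apply"): 3,
    ("decimals",  "recall"): 2, ("decimals",  "apply"): 3,
}

# The cells each draft item actually covers, tallied from the item list.
draft_items = ([("fractions", "recall")] * 4 + [("fractions", "apply")] * 2
               + [("decimals", "recall")] * 2 + [("decimals", "apply")] * 3)

written = Counter(draft_items)
for cell, planned in blueprint.items():
    if written[cell] != planned:
        print(f"{cell}: planned {planned}, drafted {written[cell]}")
# -> ('fractions', 'apply'): planned 3, drafted 2
```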

Fairness and Accessibility
Assessments should measure what students know, not disadvantage them because of who they are. Bias can creep in through cultural references, language complexity, or formats that don't account for diverse learners.
Strategies for enhancing fairness and accessibility:
- Use multiple measures of student learning rather than relying on a single test score. This reduces the impact of any one biased instrument.
- Provide accommodations for students with disabilities or language needs, such as extended time, read-aloud options, or translated materials.
- Apply universal design principles: build accessibility into the assessment from the start rather than retrofitting it later.
- Review assessments for bias by checking whether questions contain cultural assumptions, unfamiliar contexts, or language that might confuse certain student groups.
Authenticity and Feedback
Authentic assessment asks students to apply knowledge and skills to real-world tasks or problems, rather than just recalling isolated facts. A science student designing a water filtration system is doing authentic work; filling in bubbles on a multiple-choice sheet about water filtration is less so.
Feedback is what makes assessment useful for learning. It should be clear (students understand what to improve), timely (given soon enough to act on), and actionable (pointing toward specific next steps).
Strategies for enhancing authenticity and feedback:
- Use performance tasks or project-based assessments that mirror real-world contexts.
- Provide rubrics so students know the expectations before they begin.
- Build in formative assessment throughout a unit so feedback is ongoing, not just a grade at the end.
- Involve students in the process through self-assessment and peer feedback, which builds metacognitive skills.
Validity and Reliability in Assessment
Validity and reliability are the two pillars of assessment quality. Validity asks, "Are we measuring the right thing?" Reliability asks, "Are we measuring it consistently?"
Reliability
Reliability refers to the consistency of assessment results. If a student took the same test under the same conditions twice, would they get a similar score? If two teachers graded the same essay, would they give it similar marks?
There are three main types:
- Test-retest reliability: Consistency of scores when the same test is given to the same students at different times.
- Parallel forms reliability: Consistency of scores across two equivalent versions of a test (e.g., Form A and Form B of a standardized exam).
- Inter-rater reliability: Consistency of scores when different people grade the same student work. This is especially important for subjective assessments like essays or presentations.
Strategies for improving reliability:
- Use clear, detailed scoring criteria and rubrics so grading is less subjective.
- Train all raters on how to apply the rubric consistently before they begin scoring.
- Use multiple measures rather than relying on a single assessment.
- Analyze reliability using statistical methods such as Cronbach's alpha (for internal consistency) or Cohen's kappa (for inter-rater agreement); see the sketch below.
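As a minimal sketch of those two statistics, the snippet below computes Cronbach's alpha by hand from its standard formula and Cohen's kappa via scikit-learn. The score matrix and rater labels are made up for illustration:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Cronbach's alpha from a students-by-items score matrix (toy data).
X = np.array([[3, 4, 3, 4],
              [2, 2, 3, 2],
              [4, 5, 4, 5],
              [3, 3, 2, 3]])
k = X.shape[1]
item_var  = X.var(axis=0, ddof=1).sum()  # sum of per-item variances
total_var = X.sum(axis=1).var(ddof=1)    # variance of each student's total
alpha = k / (k - 1) * (1 - item_var / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # ~0.93 for this toy matrix

# Cohen's kappa: agreement between two raters scoring the same six essays.
rater1 = ["A", "B", "B", "C", "A", "B"]
rater2 = ["A", "B", "C", "C", "A", "B"]
print(f"Cohen's kappa = {cohen_kappa_score(rater1, rater2):.2f}")  # 0.75 here
```

Kappa is preferred over simple percent agreement because it discounts the agreement two raters would reach by chance alone.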
Why Both Validity and Reliability Matter
An assessment can be reliable without being valid. Imagine a bathroom scale that always reads five pounds too heavy. It's consistent (reliable), but it's not giving you the right number (not valid). In education, a test might produce consistent scores but still fail to measure what it's supposed to.
On the other hand, an assessment can't truly be valid if it isn't reliable. If results are inconsistent, you can't trust that they're measuring the intended learning outcome.
When assessments lack validity or reliability, the consequences are real: students get misidentified for intervention programs, teachers make instructional decisions based on flawed data, and policy decisions rest on inaccurate information.
To maintain both, educators should:
- Use multiple measures to triangulate data and reduce measurement error
- Provide clear rubrics and train raters to reduce subjectivity
- Regularly review and revise assessments based on validity and reliability evidence
- Use statistical analysis to identify weak or problematic test items