Actor-critic models

Actor-critic models are reinforcement learning systems with two parts: the actor picks actions, and the critic judges how good those actions were. In Intro to Cognitive Science, they model learning, choice, and feedback in minds and machines.

Last updated July 2026

What are actor-critic models?

In Intro to Cognitive Science, actor-critic models are a reinforcement learning setup in which one part of the system chooses actions and another part evaluates them. The actor is the decision-maker, using a policy to pick what to do next. The critic is the evaluator, estimating how good the chosen action was by comparing the actual outcome with the outcome it expected.

That split matters because each part gets a simpler job: the critic only has to learn predictions, and the actor only has to adjust its policy using the critic’s feedback. If the outcome is better than the critic expected, the action gets reinforced. If it is worse, the policy shifts away from that choice.

The critic usually works with a value function, which is a prediction of future reward from a state or from a state-action pair. That value estimate acts like a baseline. By scoring each outcome relative to the baseline rather than by its raw reward, the model reduces the noise that random exploration and variable rewards add to each update, so learning is more stable and often faster than in methods that update the policy from raw reward alone.
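
The baseline effect can be seen in a small simulation. The sketch below is illustrative only: the two-action task, the payoff numbers, and the function names are assumptions, not something from the course. It draws the actor’s update signal many times, once using raw reward and once using reward minus a baseline, and compares how much the signal varies.

```python
import math
import random
import statistics

def softmax(prefs):
    """Turn action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_samples(baseline, n=5000, seed=0):
    """Sample the update signal for action 0's preference, n times."""
    rng = random.Random(seed)
    probs = softmax([0.0, 0.0])      # two equally likely actions
    mean_reward = [10.0, 11.0]       # hypothetical payoffs: action 1 is slightly better
    samples = []
    for _ in range(n):
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.gauss(mean_reward[action], 1.0)
        # Derivative of log pi(action) with respect to action 0's preference
        grad_log_pi = (1.0 if action == 0 else 0.0) - probs[0]
        samples.append((reward - baseline) * grad_log_pi)
    return samples

raw = gradient_samples(baseline=0.0)
centered = gradient_samples(baseline=10.5)   # assumed baseline: average reward under this policy

print("mean, no baseline:      ", statistics.mean(raw))
print("mean, with baseline:    ", statistics.mean(centered))
print("variance, no baseline:  ", statistics.variance(raw))
print("variance, with baseline:", statistics.variance(centered))
```

Both versions point the update in the same average direction (away from action 0, which pays less), but the baseline version should vary far less from sample to sample. In a full actor-critic model the baseline is not a fixed number like this; it is the critic’s learned value estimate.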

In cognitive science, actor-critic models show up when the course talks about decision-making as an adaptive process. They are a good fit for problems where you do not know the best action right away, such as navigating a maze, choosing a move in a game, or learning a motor skill through practice. The model captures a very human pattern: you try something, get feedback, and gradually adjust what you do next.

These models also work with both discrete and continuous action spaces. That means the action can be a small set of choices, like left or right, or a smooth control signal, like adjusting force or angle in robotics. When the course connects machine learning to cognition, actor-critic models are one of the clearest examples of how systems can learn from consequences instead of being told the answer in advance.
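
To make the discrete versus continuous distinction concrete, here is a minimal sketch of how an actor might sample each kind of action. The softmax and Gaussian policies are common choices rather than the only ones, and the preference values, mean, and spread below are made-up numbers for illustration.

```python
import math
import random

rng = random.Random(0)

def sample_discrete(preferences):
    """Softmax policy over a small set of actions; returns the chosen index."""
    m = max(preferences)
    weights = [math.exp(p - m) for p in preferences]
    return rng.choices(range(len(preferences)), weights=weights, k=1)[0]

def sample_continuous(mean, std):
    """Gaussian policy: the actor supplies a mean and spread; the action is a real number."""
    return rng.gauss(mean, std)

actions = ["left", "right"]
choice = actions[sample_discrete([0.2, 1.5])]   # hypothetical learned preferences
force = sample_continuous(mean=2.0, std=0.3)    # hypothetical force, arbitrary units
print(choice, round(force, 3))
```

In both cases the actor defines a probability distribution and the action is drawn from it. Learning changes the distribution: the preferences in the discrete case, the mean and spread in the continuous case.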

Why actor-critic models matter in Intro to Cognitive Science

Actor-critic models matter in Intro to Cognitive Science because they connect two big course ideas at once: how agents learn from feedback, and how decision-making can be modeled computationally. They give you a concrete way to talk about behavior that changes over time, not just a single choice made in isolation.

They also help explain why reinforcement learning is different from supervised learning. There is no labeled correct answer after each action. Instead, the agent has to estimate whether the outcome was better or worse than expected, which is closer to real-world learning, especially in tasks with delayed rewards.

This term also shows up when the course compares machine intelligence to human cognition. If a person practices a skill, they do not usually improve by memorizing a fixed answer key. They improve through trial, error, feedback, and adjustment. Actor-critic models capture that loop in a simplified computational form.

You will often use this concept to interpret how a system balances exploring new actions against relying on what it has already learned. The actor tries new actions, while the critic keeps learning from the results. That balance is useful for explaining why some systems learn efficiently in complex environments and why value estimates can stabilize behavior over time.

How actor-critic models connect across the course

Reinforcement Learning

Actor-critic models are one kind of reinforcement learning. They fit the basic reinforcement learning loop of action, feedback, and update, but they split that loop into separate jobs. That split makes them useful for explaining how a system can learn from reward signals over many steps instead of from direct instruction.

Policy Gradient

The actor side of an actor-critic model is a policy gradient learner: it adjusts the policy so that actions with good outcomes become more likely. Plain policy gradient methods, such as REINFORCE, score each action by the total reward that actually followed it. Actor-critic models add a critic that estimates value, and the actor uses that estimate to reduce noisy updates. That extra feedback often makes learning more stable than using policy updates alone.
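
Written as update rules, the contrast looks like this. The notation is standard textbook notation rather than something defined in this guide: θ stands for the policy’s parameters, π_θ for the policy, α for a learning rate, G_t for the total discounted reward observed after time t, V for the critic’s value estimate, and γ for the discount factor. This is a simplified sketch of one common form of each method, not the only way to write them.

```latex
\begin{align*}
\text{REINFORCE:} \quad & \theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
\text{Actor-critic:} \quad & \theta \leftarrow \theta + \alpha \, \delta_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
\qquad \delta_t = r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t)
\end{align*}
```

The only change is the weight on the update. REINFORCE waits for the full return G_t, which can vary a lot from one episode to the next. The actor-critic version uses the critic’s one-step prediction error, which is available right away and is usually less noisy.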

Temporal Difference Learning

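The critic often learns with temporal difference methods, which update a prediction using the gap between the old value estimate and a new target: the reward just received plus the discounted value estimate for the next state. Because the critic can update after every step instead of waiting for the final outcome, actor-critic models are especially good for step-by-step learning where rewards may come later. The critic’s prediction error is what guides the actor’s next policy update.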
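
In symbols, a one-step temporal difference update for the critic can be sketched as follows. The symbols are assumptions in the sense that this guide does not define them: V(s) is the critic’s value estimate for state s, r_{t+1} is the reward received after acting, γ is a discount factor between 0 and 1, and α is the critic’s learning rate.

```latex
\begin{align*}
\delta_t &= r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t) \\
V(s_t) &\leftarrow V(s_t) + \alpha \, \delta_t
\end{align*}
```

For example, with made-up numbers V(s_t) = 0.5, r_{t+1} = 1, γ = 0.9, and V(s_{t+1}) = 0.6, the prediction error is 1 + 0.54 − 0.5 = 1.04. The outcome was better than expected, so with α = 0.1 the critic raises its estimate to 0.5 + 0.104 = 0.604, and the actor makes the action it just took more likely.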

Deep Reinforcement Learning

When actor-critic models use neural networks for the actor, critic, or both, they become deep reinforcement learning systems. This matters in cognitive science because it shows how larger, more flexible models can handle complicated tasks with many possible states, such as game playing or robot control.

Are actor-critic models on the Intro to Cognitive Science exam?

A quiz question might give you a learning scenario and ask you to identify which part is the actor and which part is the critic. You may also need to trace the update cycle: action is chosen, reward is observed, the critic compares outcome to expectation, and the actor changes its policy. If you see a prompt about why a model learns more smoothly than simple trial-and-error, mention the critic’s value estimate and baseline. On essay or discussion questions, use actor-critic models to connect reinforcement learning to behavior change, motor learning, or decision-making in cognitive systems.
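
The update cycle described above can be traced in a short program. This is a minimal sketch under stated assumptions, not a reference implementation: the five-state corridor task, the learning rates, the step cap, and all names are invented for illustration, and it uses a lookup table rather than a neural network.

```python
import math
import random

N_STATES, GOAL = 5, 4        # hypothetical corridor: states 0..4, reward at state 4
GAMMA, ALPHA_CRITIC, ALPHA_ACTOR = 0.9, 0.1, 0.1

def policy(prefs_s):
    """Softmax over the two action preferences (0 = left, 1 = right)."""
    m = max(prefs_s)
    exps = [math.exp(p - m) for p in prefs_s]
    total = sum(exps)
    return [e / total for e in exps]

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_STATES                       # critic: V(s)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: action preferences
    for _ in range(episodes):
        state = 0
        for _ in range(100):                        # step cap per episode
            probs = policy(prefs[state])
            # 1. The actor chooses an action.
            action = 0 if rng.random() < probs[0] else 1
            # 2. The environment returns the next state and the reward.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == GOAL
            reward = 1.0 if done else 0.0
            # 3. The critic compares outcome to expectation (prediction error).
            target = reward + (0.0 if done else GAMMA * values[next_state])
            delta = target - values[state]
            values[state] += ALPHA_CRITIC * delta
            # 4. The actor shifts its policy in the direction the critic indicates.
            for a in (0, 1):
                grad = (1.0 if a == action else 0.0) - probs[a]
                prefs[state][a] += ALPHA_ACTOR * delta * grad
            if done:
                break
            state = next_state
    return values, prefs

values, prefs = train()
for s in range(GOAL):
    print("state", s, "value", round(values[s], 2),
          "P(right)", round(policy(prefs[s])[1], 2))
```

The numbered comments map onto the cycle a quiz question might ask you to trace. After training, the actor should prefer moving right in every state, and the critic’s values should be higher for states closer to the reward.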

Actor-critic models vs Policy Gradient

Plain policy gradient methods update a policy directly from the rewards that followed each action, while actor-critic models add a critic that estimates value and provides feedback for the actor. That extra critic usually lowers variance and makes learning more stable. If a question mentions both action selection and a separate value evaluator, it is pointing to actor-critic, not plain policy gradient.

Key things to remember about actor-critic models

Actor-critic models split learning into two jobs: the actor chooses actions and the critic evaluates them.
The critic uses value estimates to decide whether an outcome was better or worse than expected.
This setup is useful when rewards are delayed or noisy, because the critic gives the actor a steadier signal.
In Intro to Cognitive Science, actor-critic models are a clean way to describe learning, decision-making, and adaptation.
You can apply the term to both simple choice tasks and more complex systems like robotics or game play.

Frequently asked questions about actor-critic models

What are actor-critic models in Intro to Cognitive Science?

Actor-critic models are reinforcement learning systems with two components. The actor selects an action, and the critic evaluates how good that action was using a value estimate. In cognitive science, they are used to model learning from feedback and changing choices over time.

How does the critic help the actor?

The critic compares what happened to what was expected, then sends that feedback to the actor. If the result is better than expected, the actor strengthens that choice. If the result is worse, the actor shifts its policy away from it. This makes learning less noisy.

Are actor-critic models the same as policy gradient?

Not exactly. Actor-critic models build on policy gradient methods by adding one thing: a critic that estimates value and stabilizes learning. Plain policy gradient methods update the policy from observed rewards alone. That extra evaluation step usually reduces variance and helps the model converge more smoothly.

Where do actor-critic models show up in cognitive science?

They show up in topics like decision-making, learning from reward, and machine learning models of cognition. You might see them in examples involving navigation, game playing, or motor control, where a system has to try actions, receive feedback, and improve over time.