Policy gradient methods are a class of reinforcement learning algorithms that optimize a parameterized policy directly, using gradient ascent on the expected return. Rather than deriving the policy from an estimated value function, these methods adjust the policy parameters themselves to maximize the expected cumulative reward, which makes them especially useful in complex environments with large or continuous action spaces.
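
To make the idea of gradient ascent on policy parameters concrete, here is a minimal sketch of a REINFORCE-style update on a toy multi-armed bandit. Everything specific in it is an assumption chosen for illustration: the softmax policy, the function names `softmax` and `train_bandit`, the Gaussian reward noise, the running-average baseline, and the learning rate. It is one simple instance of the approach, not a definitive implementation.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_means, steps=5000, lr=0.1, seed=0):
    """REINFORCE-style gradient ascent on a toy multi-armed bandit.

    reward_means holds the hypothetical mean reward of each arm
    (an illustrative assumption, not taken from any real task).
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_means)  # policy parameters, one per action
    baseline = 0.0                     # running mean reward, reduces variance

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Noisy reward drawn around the chosen arm's mean (assumed unit std).
        reward = rng.gauss(reward_means[action], 1.0)

        # For a softmax policy, d/d theta[k] of log pi(action)
        # equals 1[k == action] - probs[k].
        advantage = reward - baseline
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi  # ascent step

        # Update the baseline only after the parameter step, so it
        # depends on past rewards and not on the current one.
        baseline += (reward - baseline) / t

    return softmax(theta)


if __name__ == "__main__":
    # Three arms with assumed mean rewards; the policy should come to
    # favor the last arm.
    print(train_bandit([0.0, 0.5, 2.0]))
```

Note that no value function is learned here: the only learned quantities are the policy parameters `theta`, and the baseline is just a running average used to reduce the variance of the gradient estimate. In environments with states and multi-step episodes, the reward in the update would typically be replaced by the return following each action.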