
RLHF

Reinforcement learning from human feedback (RLHF), also known as reinforcement learning from human preferences, is a technique that trains a “reward model” from human feedback. That reward model is then used as the reward function for optimizing an agent’s policy with reinforcement learning (RL), typically via an optimization algorithm such as Proximal Policy Optimization (PPO).
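
The sketch below illustrates the two stages described above in a minimal, toy form: a small reward model is fit on human preference pairs with a pairwise (Bradley–Terry) loss, and the frozen model then scores new samples to produce the scalar rewards an RL algorithm such as PPO would consume. Random feature vectors stand in for a real language model, and all names, dimensions, and hyperparameters here are illustrative assumptions rather than any particular library’s API.

```python
# Toy RLHF sketch: (1) train a reward model on human preference pairs,
# (2) use it as a reward function for a downstream RL step (e.g. PPO).
import torch
import torch.nn as nn

EMBED_DIM = 16  # assumed size of a (prompt, response) feature vector


class RewardModel(nn.Module):
    """Maps a (prompt, response) representation to a scalar reward."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise (Bradley–Terry) loss: push the human-preferred response
    # to receive a higher scalar reward than the rejected one.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()


# --- Stage 1: fit the reward model on human comparisons (toy random data) ---
reward_model = RewardModel(EMBED_DIM)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

chosen = torch.randn(64, EMBED_DIM)    # features of responses humans preferred
rejected = torch.randn(64, EMBED_DIM)  # features of responses humans rejected

for _ in range(100):
    loss = preference_loss(reward_model(chosen), reward_model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# --- Stage 2: the frozen reward model scores fresh policy samples; these ---
# --- scalar rewards are what a PPO-style update would maximize.          ---
reward_model.eval()
with torch.no_grad():
    new_samples = torch.randn(8, EMBED_DIM)  # features of newly sampled responses
    rewards = reward_model(new_samples)      # reward signal for the policy update
print(rewards)
```

In a full RLHF pipeline, the random feature vectors would be replaced by representations of real prompt–response pairs, and the rewards from stage 2 would drive the PPO policy update, usually combined with a KL penalty that keeps the policy close to its supervised starting point.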