Reinforcement Learning from Human Feedback
RLHF is a technique for fine-tuning generative models using a secondary model, often called a reward model, that is trained to estimate the perceived quality of the generative model's outputs. Human raters provide the labels used to train this reward model, and its predictions then serve as the reward signal when the generative model is fine-tuned with a reinforcement learning algorithm.
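The sketch below illustrates this two-stage structure under deliberately simplified assumptions: generated outputs are represented by small feature vectors, human labels are binary acceptable/unacceptable judgements, the reward model is a logistic regressor, and the "generative model" is a categorical policy over a fixed candidate set updated with a REINFORCE-style policy gradient. All names and numbers are illustrative; production systems use large neural networks, preference comparisons, and algorithms such as PPO, but the flow of information (human labels → reward model → RL update of the generator) is the same.

```python
# Minimal RLHF-style loop (toy sketch, not a production implementation).
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: train the reward model on human-labelled outputs --------------
# Each row is a feature vector describing one generated output; each label is
# a human rater's judgement (1 = acceptable, 0 = not acceptable). The "true"
# preference direction here is synthetic, purely for illustration.
features = rng.normal(size=(200, 4))
human_labels = (features @ np.array([1.0, -0.5, 0.2, 0.0]) > 0).astype(float)

w = np.zeros(4)                          # reward-model parameters
for _ in range(500):                     # gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-features @ w))
    w -= 0.1 * features.T @ (p - human_labels) / len(human_labels)

def reward(x):
    """Scalar reward: the reward model's estimated probability of 'acceptable'."""
    return 1.0 / (1.0 + np.exp(-x @ w))

# --- Stage 2: fine-tune the generator against the learned reward ------------
# The generator is a softmax distribution over 8 candidate outputs, each with
# its own feature vector; REINFORCE pushes probability mass toward candidates
# the reward model scores highly.
candidates = rng.normal(size=(8, 4))
logits = np.zeros(8)                     # policy parameters

for step in range(300):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    i = rng.choice(8, p=probs)           # sample an output from the policy
    r = reward(candidates[i])            # score it with the reward model
    grad = -probs                        # d log pi(i) / d logits
    grad[i] += 1.0
    logits += 0.05 * r * grad            # REINFORCE update

probs = np.exp(logits - logits.max())
probs /= probs.sum()
best = int(np.argmax(probs))
print("highest-probability candidate:", best, "reward model score:", reward(candidates[best]))
```

After training, the policy concentrates on candidates that the reward model (and, by proxy, the human raters whose labels trained it) rates highly, which is the essential mechanism RLHF relies on.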