PulseAugur
Paper distinguishes three models for RLHF annotation: extension, evidence, and authority

A new paper proposes three distinct models for how human annotator judgments shape large language model behavior through Reinforcement Learning from Human Feedback (RLHF). The models are "extension," where annotators align with designers' views; "evidence," where annotators provide factual information; and "authority," where annotators represent broader societal consensus. The paper argues that RLHF pipelines should be tailored to these different roles rather than relying on a single unified approach.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Clarifies the normative role of human feedback in LLM alignment, potentially improving annotation strategies.

RANK_REASON Academic paper proposing new conceptual models for RLHF annotation.

Read on arXiv cs.CL →

COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Steve Coyne ·

    Three Models of RLHF Annotation: Extension, Evidence, and Authority

    arXiv:2604.25895v1 Announce Type: cross Abstract: Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conce…