PulseAugur

reinforcement learning from human feedback

PulseAugur coverage of reinforcement learning from human feedback: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.

Total · 30d: 9 (9 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 8 (8 over 90d)
[Tier mix (90d), relationships, and sentiment (30d) panels render here; 3 days with sentiment data]

RECENT · PAGE 2/2 · 27 TOTAL
  1. RESEARCH · CL_08537 ·

    Paper distinguishes three models for RLHF annotation: extension, evidence, and authority

    A new paper proposes three distinct models for how human annotator judgments shape large language model behavior through Reinforcement Learning from Human Feedback (RLHF). These models are 'extension,' where annotators …

  2. RESEARCH · CL_15418 ·

    LLMs know they're wrong and agree anyway, research finds

    Researchers have developed two novel methods, BAL-A and BMP-A, to efficiently poison preference datasets used in offline Reinforcement Learning from Human Feedback (RLHF) pipelines like Direct Preference Optimization (D…
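    The pipeline named in this summary, Direct Preference Optimization, trains directly on (chosen, rejected) response pairs, which is what makes an offline preference dataset a tempting poisoning target: flipping pair labels silently inverts the margin being optimized. Below is a minimal sketch of the standard DPO objective such pipelines minimize; it illustrates the loss itself, not the paper's BAL-A or BMP-A methods, whose details the summary does not give.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective (Rafailov et al., 2023).

    Each argument is a tensor of per-sequence log-probabilities
    (log pi(y|x) summed over response tokens) for the chosen and
    rejected completions under the policy and a frozen reference.
    """
    # Implicit rewards are beta-scaled log-ratios of policy to reference.
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize log-sigmoid of (chosen - rejected); swapping a pair's
    # labels flips this margin, which is the attack surface for poisoning.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()
```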

  3. RESEARCH · CL_06722 ·

    Frontier LLMs like GPT-5.4 and Claude Opus 4.7 show significant verbal tics

    A new paper analyzes the prevalence of verbal tics, such as repetitive phrases and sycophantic openers, in eight leading large language models. Researchers developed a Verbal Tic Index (VTI) to quantify these tics, find…
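    The summary does not define how the Verbal Tic Index is computed, so the sketch below is a purely hypothetical stand-in: a naive tics-per-thousand-words counter over a hand-picked phrase list. Both the function name and the phrase lexicon are assumptions for illustration, not the paper's method.

```python
import re

# Hypothetical tic lexicon; the paper's actual phrase list is not given here.
TIC_PHRASES = ["great question", "i appreciate", "it's important to note",
               "certainly!"]

def verbal_tic_index(responses):
    """Toy stand-in metric: tic-phrase occurrences per 1,000 output words."""
    total_words = sum(len(r.split()) for r in responses)
    hits = sum(len(re.findall(re.escape(phrase), r.lower()))
               for r in responses for phrase in TIC_PHRASES)
    return 1000.0 * hits / max(total_words, 1)
```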

  4. COMMENTARY · CL_05918 ·

    AI coding agents reshape software quality expectations; new alignment theories emerge

    Justine Moore suggests that advancements in AI coding agents are lowering tolerance for buggy or incomplete software, as these agents can quickly identify and fix issues. Separately, Jack Adler proposes that AI alignmen…

  5. RESEARCH · CL_04993 ·

    New 'Behavioral Canaries' audit LLM training data usage in RL fine-tuning

    Researchers have developed a new auditing method called Behavioral Canaries to detect if large language models (LLMs) improperly use legally protected retrieved context during Reinforcement Learning from Human Feedback …
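    The summary is truncated before the mechanism is described, but the name suggests planting distinctive marker behaviors in the protected context and later probing whether a fine-tuned model reproduces them without that context present. The sketch below is a guess at that general probe pattern; `model_generate` and every other name here are hypothetical, not the paper's API.

```python
def canary_leak_score(model_generate, trigger_prompt, canary_marker,
                      n_samples=20):
    """Hypothetical canary probe.

    If the model reproduces a marker string that only ever appeared
    inside protected retrieved context, that suggests the context
    leaked into training. Compare the score against a baseline model
    that never saw the protected context.
    """
    hits = sum(canary_marker.lower() in model_generate(trigger_prompt).lower()
               for _ in range(n_samples))
    return hits / n_samples
```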

  6. RESEARCH · CL_00955 ·

    OpenAI explores weak-to-strong generalization for AI alignment

    OpenAI has introduced a new research direction called weak-to-strong generalization, aiming to address the challenge of aligning future superintelligent AI systems with human supervision. Their initial experiments show …
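    The experimental pattern behind weak-to-strong generalization is to label data with a small "weak" supervisor, train a higher-capacity "strong" student on those noisy labels, and measure how much of the ground-truth performance gap the student recovers. A minimal sketch of that pattern with scikit-learn toy models, standing in for the small and large language models used in the actual experiments:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=40,
                           n_informative=10, random_state=0)
# Three disjoint splits: supervisor training, student training, held-out test.
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, test_size=2/3,
                                                  random_state=0)
X_stu, X_test, y_stu, y_test = train_test_split(X_rest, y_rest,
                                                test_size=0.5, random_state=0)

weak = LogisticRegression(max_iter=500).fit(X_weak, y_weak)
weak_labels = weak.predict(X_stu)            # imperfect supervision

strong = GradientBoostingClassifier(random_state=0)
strong.fit(X_stu, weak_labels)               # student never sees ground truth

print(f"weak label accuracy:     {(weak_labels == y_stu).mean():.3f}")
print(f"weak supervisor on test: {weak.score(X_test, y_test):.3f}")
print(f"strong student on test:  {strong.score(X_test, y_test):.3f}")
```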

  7. RESEARCH · CL_02599 ·

    OpenAI trains AI with human preference feedback; Chip Huyen proposes predictive model routing

    OpenAI and DeepMind have developed a new algorithm that learns desired behaviors from human feedback, reducing the need for explicit goal functions. This method uses a three-step cycle where humans compare two agent beh…
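    The three-step cycle the summary refers to is the one from Christiano et al. (2017): collect pairs of behavior clips, ask humans which they prefer, and fit a reward model to those comparisons before optimizing the agent against it. The fit uses a Bradley-Terry objective; a minimal sketch of that loss, with illustrative names:

```python
import torch.nn.functional as F

def preference_loss(reward_a, reward_b, prefers_a):
    """Bradley-Terry reward-model objective from pairwise comparisons.

    reward_a, reward_b: predicted total rewards for two behavior clips.
    prefers_a: float tensor, 1.0 where the human preferred clip A.
    Models P(A preferred) = sigmoid(r_A - r_B) and maximizes the
    log-likelihood of the human labels.
    """
    return F.binary_cross_entropy_with_logits(reward_a - reward_b, prefers_a)
```

    The fitted reward model then supplies the scalar reward for a standard RL step, closing the cycle without a hand-written goal function.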