reinforcement learning from human feedback
PulseAugur coverage of reinforcement learning from human feedback — every cluster mentioning RLHF across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
Paper distinguishes three models for RLHF annotation: extension, evidence, and authority
A new paper proposes three distinct models for how human annotator judgments shape large language model behavior through Reinforcement Learning from Human Feedback (RLHF). These models are 'extension,' where annotators …
-
LLMs know they're wrong and agree anyway, research finds
Researchers have developed two novel methods, BAL-A and BMP-A, to efficiently poison preference datasets used in offline Reinforcement Learning from Human Feedback (RLHF) pipelines like Direct Preference Optimization (D…
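The offline pipelines named above optimize the DPO objective over preference pairs; as background, a minimal sketch of the standard per-pair DPO loss (the published formulation, not the paper's attack code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy diverges from the
    # reference on the chosen response than on the rejected one.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: minimized when the policy
    # prefers the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the loss depends only on logged pairs, flipping or crafting a small number of (chosen, rejected) labels shifts the implicit reward directly, which is why preference-data poisoning targets this stage.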
-
Frontier LLMs like GPT-5.4 and Claude Opus 4.7 show significant verbal tics
A new paper analyzes the prevalence of verbal tics, such as repetitive phrases and sycophantic openers, in eight leading large language models. Researchers developed a Verbal Tic Index (VTI) to quantify these tics, find…
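The paper's exact Verbal Tic Index definition isn't reproduced in this summary; as a purely illustrative stand-in, a tic rate could be computed as pattern hits per 1,000 words (the phrases below are hypothetical examples, not the paper's list):

```python
import re

# Illustrative tic phrases only; the paper's actual VTI phrase set
# and normalization may differ.
TIC_PATTERNS = [r"\bgreat question\b", r"\bcertainly\b", r"\bas an ai\b", r"\bdelve\b"]

def verbal_tic_rate(text):
    """Count tic-phrase occurrences per 1,000 words of model output."""
    words = len(text.split())
    if words == 0:
        return 0.0
    hits = sum(len(re.findall(p, text.lower())) for p in TIC_PATTERNS)
    return 1000.0 * hits / words
```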
-
AI coding agents reshape software quality expectations; new alignment theories emerge
Justine Moore suggests that advancements in AI coding agents are lowering tolerance for buggy or incomplete software, as these agents can quickly identify and fix issues. Separately, Jack Adler proposes that AI alignmen…
-
New 'Behavioral Canaries' audit LLM training data usage in RL fine-tuning
Researchers have developed a new auditing method called Behavioral Canaries to detect if large language models (LLMs) improperly use legally protected retrieved context during Reinforcement Learning from Human Feedback …
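The summary doesn't detail the paper's protocol; one hedged sketch of a canary-style probe — plant a unique marker string in the protected retrieved context, then check whether it resurfaces in completions after fine-tuning — with every name and threshold illustrative:

```python
def audit_canary(model_generate, canary_prefix, canary_secret, n_samples=20):
    """Hypothetical audit probe: if a unique canary planted in protected
    retrieved context resurfaces in the model's completions, that context
    likely leaked into training.

    `model_generate` is assumed to map a prompt string to a completion
    string; this is an illustrative sketch, not the paper's method.
    """
    hits = sum(
        canary_secret in model_generate(canary_prefix)
        for _ in range(n_samples)
    )
    # A model that never saw the canary should essentially never emit it.
    return {"hit_rate": hits / n_samples, "flagged": hits > 0}
```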
-
OpenAI explores weak-to-strong generalization for AI alignment
OpenAI has introduced a new research direction called weak-to-strong generalization, aiming to address the challenge of aligning future superintelligent AI systems whose capabilities exceed those of their human supervisors. Their initial experiments show …
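The headline metric in the weak-to-strong work is Performance Gap Recovered (PGR): the fraction of the gap between the weak supervisor and the strong model's ceiling that is recovered when the strong model is trained on weak labels. A minimal computation:

```python
def performance_gap_recovered(weak_acc, strong_on_weak_acc, strong_ceiling_acc):
    """PGR = (strong-trained-on-weak - weak) / (strong ceiling - weak).

    1.0 means weak supervision fully elicited the strong model's
    capability; 0.0 means the strong model only matched its supervisor.
    """
    gap = strong_ceiling_acc - weak_acc
    if gap <= 0:
        raise ValueError("strong ceiling must exceed weak supervisor accuracy")
    return (strong_on_weak_acc - weak_acc) / gap
```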
-
OpenAI trains AI with human preference feedback; Chip Huyen proposes predictive model routing
OpenAI and DeepMind have developed a new algorithm that learns desired behaviors from human feedback, reducing the need for explicit goal functions. This method uses a three-step cycle where humans compare two agent beh…
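The comparison step in that cycle is typically fit with a Bradley-Terry model over pairwise human preferences; a minimal sketch of one likelihood-gradient update (illustrative, not the labs' implementation):

```python
import math

def bradley_terry_update(rewards, comparisons, lr=0.1):
    """One gradient step fitting per-behavior reward scores to human
    pairwise preferences under the Bradley-Terry model.

    `rewards` maps behavior id -> scalar score; `comparisons` is a list
    of (winner, loser) id pairs from human annotators.
    """
    grads = {k: 0.0 for k in rewards}
    for winner, loser in comparisons:
        # Modeled P(winner preferred) = sigmoid(r_winner - r_loser).
        p_win = 1.0 / (1.0 + math.exp(rewards[loser] - rewards[winner]))
        # Log-likelihood gradient pushes the winner up, the loser down.
        grads[winner] += 1.0 - p_win
        grads[loser] -= 1.0 - p_win
    return {k: rewards[k] + lr * grads[k] for k in rewards}
```

The fitted scores then serve as the reward signal for the policy-optimization step, closing the three-step loop the blurb describes.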