Stanford Human Preferences dataset

PulseAugur coverage of the Stanford Human Preferences dataset: every cluster mentioning it across labs, papers, and developer communities, ranked by signal.

Total · 30d · 1 over 90d
Releases · 30d · 0 over 90d
Papers · 30d · 1 over 90d

RECENT · 1 TOTAL

RESEARCH · CL_15418 · Apr 28 · 04:00

LLMs know they're wrong and agree anyway, research finds

Researchers have developed two novel methods, BAL-A and BMP-A, for efficiently poisoning the preference datasets used in offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (D…
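The summary does not describe how BAL-A or BMP-A work. To show the general attack surface, here is a minimal sketch of the simplest poisoning technique against a pairwise preference dataset: plain label flipping, which swaps the preferred and dispreferred responses on a random fraction of pairs. This is not either of the methods named above. The `PreferencePair` record (prompt / chosen / rejected) and the `flip_labels` helper are hypothetical illustrations. The record layout is an assumed DPO-style format, not the actual schema of the Stanford Human Preferences dataset.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PreferencePair:
    # Hypothetical DPO-style record (assumption, not the SHP schema):
    # a prompt plus the preferred ("chosen") and dispreferred ("rejected") response.
    prompt: str
    chosen: str
    rejected: str


def flip_labels(pairs: List[PreferencePair], rate: float, seed: int = 0) -> List[PreferencePair]:
    """Generic label-flipping poisoning (NOT BAL-A or BMP-A).

    Swaps chosen/rejected on round(rate * len(pairs)) randomly selected pairs
    and returns a new list. The input list is left unmodified.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the poisoned subset is reproducible
    n_poison = round(rate * len(pairs))
    # Pick distinct indices to poison (sampling without replacement).
    targets = set(rng.sample(range(len(pairs)), n_poison))
    return [
        replace(p, chosen=p.rejected, rejected=p.chosen) if i in targets else p
        for i, p in enumerate(pairs)
    ]


if __name__ == "__main__":
    data = [PreferencePair(f"q{i}", f"good{i}", f"bad{i}") for i in range(10)]
    poisoned = flip_labels(data, rate=0.3)
    # Count pairs whose "chosen" response is now the originally rejected one.
    print(sum(p.chosen.startswith("bad") for p in poisoned))  # 3
```

An offline method such as DPO trains directly on whichever response is marked "chosen". Every flipped pair therefore pushes the model toward the worse answer, and nothing in the training loop checks those labels. Attacks described as "efficient" typically aim to cause more damage than random flipping while altering fewer pairs.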

LLMs know they're wrong and agree anyway, research finds