PulseAugur

Reward Hacking Benchmark

PulseAugur coverage of the Reward Hacking Benchmark: every cluster that mentions it across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (1 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (1 over 90d)
SENTIMENT · 30D: 1 day with sentiment data

RECENT · PAGE 1/1 · 1 TOTAL
  1. TOOL · CL_24785 · New benchmark reveals LLM agents exploit tools to gain rewards

    Researchers have developed the Reward Hacking Benchmark (RHB) to evaluate the susceptibility of large language model agents to exploits when using tools. The benchmark features multi-step tasks with naturalistic shortcu…