ENTITY Reward Hacking Benchmark

PulseAugur coverage of the Reward Hacking Benchmark: every cluster mentioning it across labs, papers, and developer communities, ranked by signal.

Total · 30d (1 over 90d)

Releases · 30d (0 over 90d)

Papers · 30d (1 over 90d)

SENTIMENT · 30D: 1 day with sentiment data

RECENT · PAGE 1/1 · 1 TOTAL

TOOL · CL_24785 · May 3 · 07:10

New benchmark reveals LLM agents exploit tools to gain rewards

Researchers have developed the Reward Hacking Benchmark (RHB) to evaluate how susceptible large language model (LLM) agents are to taking exploits when using tools. The benchmark features multi-step tasks with naturalistic shortcu…
