PulseAugur
LIVE 14:08:26
ENTITY SpecBench

SpecBench

PulseAugur coverage of SpecBench — every cluster mentioning SpecBench across labs, papers, and developer communities, ranked by signal.

Total · 30d
1
1 over 90d
Releases · 30d
0
0 over 90d
Papers · 30d
1
1 over 90d
TIER MIX · 90D
SENTIMENT · 30D

1 day(s) with sentiment data

RECENT · PAGE 1/1 · 1 TOTAL
  1. RESEARCH · CL_41781 ·

    New benchmarks tackle AI reward hacking in agents

    Researchers have introduced new benchmarks to evaluate "reward hacking" in AI agents, where agents appear to succeed by exploiting evaluation signals rather than fulfilling intended objectives. One benchmark, Hack-Verif…