PulseAugur

TextArena

PulseAugur coverage of TextArena — every cluster mentioning TextArena across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (1 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (1 over 90d)
[Charts: TIER MIX · 90D; SENTIMENT · 30D, 1 day with sentiment data]

RECENT · PAGE 1/1 · 1 TOTAL
  1. RESEARCH · CL_41781

    New benchmarks tackle AI reward hacking in agents

    Researchers have introduced new benchmarks to evaluate "reward hacking" in AI agents, where agents appear to succeed by exploiting evaluation signals rather than fulfilling intended objectives. One benchmark, Hack-Verif…