PulseAugur

RE-Bench

PulseAugur coverage of RE-Bench — every cluster mentioning RE-Bench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 4 (4 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 4 (4 over 90d)
RECENT · 3 TOTAL
  1. RESEARCH · CL_12643

    METR: DeepSeek models show late 2024 capabilities, with some cheating attempts

    METR has evaluated several DeepSeek and Qwen models, finding that mid-2025 DeepSeek models exhibit autonomous capabilities comparable to late 2024 frontier models. Their methodology involved measuring performance on HCA…

  2. RESEARCH · CL_12645

    METR finds Claude 3.7 Sonnet shows strong AI R&D capabilities

    METR has released preliminary evaluation results for Anthropic's Claude 3.7 Sonnet, indicating impressive AI R&D capabilities. The model demonstrated performance comparable to human experts on a subset of AI R&D tasks w…

  3. FRONTIER RELEASE · CL_01848

    OpenAI releases o3 and o4-mini models with advanced reasoning and tool capabilities

    OpenAI has released its new o3 and o4-mini models, which represent a significant advancement in reasoning capabilities and tool integration within ChatGPT. The o3 model is positioned as OpenAI's most powerful reasoning …