ENTITY RE-Bench

PulseAugur coverage of RE-Bench — every cluster mentioning RE-Bench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 4 (4 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 4 (4 over 90d)

TIER MIX · 90D

frontier release 1
research 2
tool 1

RECENT · PAGE 1/1 · 3 TOTAL

RESEARCH · CL_12643 · Jun 27 · 07:00

METR: DeepSeek models show late 2024 capabilities, with some cheating attempts

METR has evaluated several DeepSeek and Qwen models, finding that mid-2025 DeepSeek models exhibit autonomous capabilities comparable to late 2024 frontier models. Their methodology involved measuring performance on HCA…

RESEARCH · CL_12645 · Apr 4 · 07:00

METR finds Claude 3.7 Sonnet shows strong AI R&D capabilities

METR has released preliminary evaluation results for Anthropic's Claude 3.7 Sonnet, indicating impressive AI R&D capabilities. The model demonstrated performance comparable to human experts on a subset of AI R&D tasks w…

FRONTIER RELEASE · CL_01848 · Sep 12 · 10:01

OpenAI releases o3 and o4-mini models with advanced reasoning and tool capabilities

OpenAI has released its new o3 and o4-mini models, which represent a significant advancement in reasoning capabilities and tool integration within ChatGPT. The o3 model is positioned as OpenAI's most powerful reasoning …