PulseAugur

Baidu's DuMate agent tops PinchBench and DeepResearch benchmarks

Baidu's DuMate agent ranks first on two agent benchmarks, PinchBench and DeepResearch Bench. On PinchBench, which evaluates multi-step reasoning and tool use in real-world work scenarios, DuMate took the top two positions with overall scores of 93.3% and 93.2%, ahead of models from Anthropic and OpenAI. The result is attributed to DuMate's end-to-end collaborative Harness architecture, which decides whether to run each task locally or in the cloud and optimizes context assembly. DuMate also leads DeepResearch Bench, a benchmark for complex research tasks that exercises information retrieval and analysis.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Demonstrates advanced agent capabilities, potentially setting new standards for AI task execution and research.

RANK_REASON Product release and benchmark performance announcement for an AI agent.


COVERAGE [2]

  1. 雷峰网 (Leiphone) TIER_1 中文(ZH) ·

    Baidu's DuMate Tops PinchBench, Surpassing Anthropic to Win Global Lobster Execution Championship

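    In the early hours of May 8, Baidu's DuMate reached the top of the agent evaluation benchmark PinchBench and took three of the top five places, surpassing Anthropic and OpenAI to win the global "lobster" execution championship. DuMate also ranked first on a separate leaderboard, DeepResearch.

    PinchBench is described as the benchmark in the OpenClaw track that best reflects an agent's real working ability. It tests multi-step reasoning, tool calling, and end-to-end task completion across 147 tasks in 23 real work scenarios, and ranks entries on three dimensions combined: success rate, speed, and cost. The leaderboard shows DuMate taking the top two places with overall scores of 93.3% and 93.2%. For comparison, Anthro…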

  2. Mastodon — sigmoid.social TIER_1 한국어(KO) ·

    A tweet from Baidu Inc. (@Baidu_Inc) reporting that its DuMate agent ranked first on both PinchBench and DeepResearch Bench. The post emphasizes agent benchmark performance and frames the result as an important update demonstrating the product's competitiveness and real-world task execution ability. https://x.com/Baidu_Inc/status/2052672359283458273 #agent #benchmark #ai #deeprsearch #productivi…