PulseAugur

Terminal-Bench-2.0

PulseAugur coverage of Terminal-Bench-2.0 — every cluster mentioning Terminal-Bench-2.0 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 9 (9 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 5 (5 over 90d)
SENTIMENT · 30D: 5 days with sentiment data

RECENT · PAGE 1/1 · 9 TOTAL
  1. TOOL · CL_79558

    Self-Harness enables LLM agents to improve their own operational harnesses

    Researchers have developed a novel method called Self-Harness, enabling LLM-based agents to autonomously improve their own operational harnesses. This iterative process involves identifying model-specific failure patter…

  2. TOOL · CL_68283

    Research: Interaction trajectories boost AI agent generalization

    A new research paper explores the effectiveness of interaction trajectories for training AI agents, finding that standalone performance doesn't dictate teaching efficacy. Surprisingly, agents fine-tuned on trajectories …

  3. TOOL · CL_60204

    AI coding agents: GPT-5.5, Claude Sonnet 4.6, Gemini 3.5 Flash compared

    A recent comparison evaluated three AI coding agents: OpenAI's Codex (powered by GPT-5.5), Anthropic's Claude Code (using Claude Sonnet 4.6), and Google's Antigravity (with Gemini 3.5 Flash). The experiment focused on r…

  4. TOOL · CL_35928

    Local LLMs struggle with real-world terminal tasks despite benchmark success

    Local large language models often perform poorly on multi-step terminal tasks despite excelling at standard benchmarks like MMLU. This discrepancy arises because traditional benchmarks measure single-turn reasoning, fai…

  5. TOOL · CL_34986

    Llama.cpp adds MTP, new Gemma-4 finetune released, Qwen 3.6 excels locally

    The llama.cpp project has integrated Multi-Token Prediction (MTP), leading to an 11.5% speed increase for 27B Qwen models in local inference. A new finetuned Gemma-4 model, optimized for creative writing and a…

  6. SIGNIFICANT · CL_26039

    Qwen 3.6-Plus excels in complex AI agent tasks and coding

    Alibaba's Qwen 3.6-Plus model has demonstrated advanced capabilities in complex decision-making and agentic coding tasks, according to a recent evaluation. The model successfully generated a detailed implementation plan…

  7. RESEARCH · CL_07734

    Poolside AI releases open-weight Laguna XS.2 and M.1 coding models

    Poolside AI has released two new agentic coding models, Laguna M.1 and Laguna XS.2, along with their agent training and operation runtime. Laguna M.1 is a large Mixture of Experts (MoE) model trained on 30T tokens using…

  8. RESEARCH · CL_47566

    Anthropic's 'Mythos' AI too risky for public release

    Anthropic has developed a new AI model named Claude Mythos, which demonstrates significant advancements in benchmark performance, particularly in identifying software vulnerabilities. Due to its advanced capabilities in…

  9. FRONTIER RELEASE · CL_01718

    Google DeepMind launches Gemini 3 Pro with advanced coding and agentic capabilities

    Google DeepMind has launched Gemini 3 Pro, its latest and most intelligent model, which demonstrates significant improvements in reasoning and coding capabilities. This new model surpasses previous versions and excels…