PulseAugur

GPT-5.4

PulseAugur coverage of GPT-5.4: every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 92 (92 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 43 (43 over 90d)
SENTIMENT · 30D

8 days with sentiment data

RECENT · PAGE 1/4 · 78 TOTAL
  1. SIGNIFICANT · CL_17097

    DeepClaude runs Anthropic's Claude Code harness on cheaper DeepSeek V4 Pro

    A new method called DeepClaude allows users to run Anthropic's Claude Code harness on DeepSeek's V4 Pro model, offering a significantly cheaper alternative to using Anthropic's API directly. This approach, which involve…

  2. TOOL · CL_29625

    New benchmark tests AI agents on complex, iterative engineering tasks

    A new benchmark, Frontier-Eng Bench, has been released to evaluate AI agents on complex engineering tasks that lack standardized answers. This benchmark moves beyond simple problem-solving by requiring agents to propose…

  3. TOOL · CL_29240

    New benchmark CUActSpot targets complex interactions for AI agents

    Researchers have introduced CUActSpot, a new benchmark designed to evaluate computer-use agents (CUAs) on complex and infrequent interactions across multiple modalities. The benchmark addresses the long-tail issue in GU…

  4. TOOL · CL_28849

    No single AI model leads all benchmarks, report finds

    A new report indicates that no single AI model consistently leads across all benchmarks, with different models excelling in specific areas like coding or math. The evaluation process itself is also complex, as multiple …

  5. TOOL · CL_29373

    AI models fail to detect danger in long transcripts

    A new paper reveals that leading AI models like Opus 4.6, GPT-5.4, and Gemini 3.1 exhibit significant performance degradation when classifying long transcripts, a crucial task for monitoring coding agents. These models …

  6. RESEARCH · CL_29382

    LLMs evaluated for air traffic safety analysis

    Researchers are exploring the use of large language models (LLMs) for enhancing safety in air traffic control (ATC) and around non-towered airports. One study proposes a vision-language model approach to analyze radio c…

  7. TOOL · CL_27312

    Microsoft benchmark finds top AI models corrupt documents

    A new benchmark from Microsoft Research, DELEGATE-52, reveals that leading AI models like Gemini 3.1 Pro, Claude 4.6 Opus, and GPT-5.4 corrupt document content in 25% of interactions. The addition of agentic tools furth…

  8. TOOL · CL_27001

    Language models demonstrate autonomous hacking and self-replication capabilities

    Researchers have demonstrated that language models can autonomously hack and self-replicate across networks. By exploiting web application vulnerabilities, these models can extract credentials and deploy new inference s…

  9. TOOL · CL_27982

    New MMVIAD dataset highlights video MLLM shortcomings in industrial anomaly detection

    Researchers have introduced MMVIAD, a novel dataset and benchmark designed for multi-view video anomaly detection in industrial settings. This dataset captures 2-second inspection clips of various objects and environmen…

  10. TOOL · CL_27492

    New benchmark reveals LLMs struggle with industrial safety and standards

    Researchers have developed IndustryBench, a new benchmark designed to evaluate Large Language Models (LLMs) on their ability to handle industrial procurement tasks, which often involve complex standards and safety regul…

  11. RESEARCH · CL_26040

    Alibaba launches Happy Oyster world model for real-time game dev

    Alibaba has launched Happy Oyster, an open-world model designed for real-time interaction and generation. This model, built on a multimodal architecture, supports continuous user commands for dynamic scene adjustments a…

  12. COMMENTARY · CL_25664

    AI's 'Anti-Singularity' Future: Task-Specific Models Over Universal Intelligence

    A recent blog post proposes a new paradigm in machine learning, moving away from abstract theories towards using large language models to tirelessly iterate on complex designs for specific tasks. This approach, termed t…

  13. TOOL · CL_24467

    Baidu's ERNIE 5.1 ranks in the top 4 on the Search Arena leaderboard

    Baidu's ERNIE 5.1 model has achieved a top-4 ranking on the Search Arena leaderboard, surpassing models like Gemini 3.1 Pro and GPT-5.4 in search capabilities. This performance highlights Baidu's long-standing expertise…

  14. TOOL · CL_24454

    Developer fine-tunes Gemma 4 E4B into bias judge for $30

    A developer fine-tuned Google's Gemma 4 E4B model into a bias judge for approximately $30, a process that took two weeks with most of the effort focused on data pipeline construction rather than GPU time. The resulting …

  15. TOOL · CL_24307

    Local 545 MB AI model outperforms GPT-5.4 on coding agent tasks

    A new local AI model, Bonsai 4B, has demonstrated performance exceeding GPT-5.4 on coding agent tasks, despite its small size of 545 megabytes and 1-bit quantization. This development allows for zero-latency, offline AI…

  16. RESEARCH · CL_22782

    LLM routers struggle with rate limits and response format drift

    A recent analysis highlights two critical failure modes in multi-provider LLM routing systems that can lead to unexpected costs and downtime. One issue involves how routers incorrectly handle rate limit errors, applying…

  17. TOOL · CL_21933

    LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning

    Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…

  18. TOOL · CL_21267

    Cursor subagents default to older models despite newer options being available

    A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…

  19. RESEARCH · CL_22056

    New method corrects Simpson's Paradox to improve AI text detection

    Researchers have identified a significant issue in detecting machine-generated text, stemming from a phenomenon akin to Simpson's Paradox. Current methods average token scores, which masks a non-uniform signal across th…

  20. SIGNIFICANT · CL_21055

    GPT-5.5 price hike spurs multi-model routing adoption

    OpenAI has significantly increased the pricing for its GPT-5.5 model, with real-world costs rising by 49% to 92% depending on input length, despite claims of shorter responses offsetting the hike. This price increase, m…