Claude Opus 4.5
PulseAugur coverage of Claude Opus 4.5: every cluster mentioning Claude Opus 4.5 across labs, papers, and developer communities, ranked by signal.
-
Anthropic users petition for fairer Claude model deprecation policy
Users are petitioning Anthropic to adopt a more considerate model deprecation policy, citing the abrupt removal of Claude Sonnet 4.5 from Claude.ai with only six days' notice. The petition advocates for a minimum 90-day…
-
FormalRewardBench benchmark evaluates LLM reward models for theorem proving
Researchers have introduced FormalRewardBench, a new benchmark designed to evaluate reward models used in formal theorem proving. This benchmark addresses the challenge of sparse credit assignment in reinforcement learn…
-
Qwen 3.6-Plus excels in complex AI agent tasks and coding
Alibaba's Qwen 3.6-Plus model has demonstrated advanced capabilities in complex decision-making and agentic coding tasks, according to a recent evaluation. The model successfully generated a detailed implementation plan…
-
ConFit v3 enhances resume-job matching with LLM re-ranking
Researchers have developed ConFit v3, an improved system for matching job candidates to positions using Large Language Models. The system refines the training process for LLM re-rankers by incorporating multi-pass re-ra…
-
Low-cost AI model beats top performers on coding benchmark with new context engine
A new method called Xanther Context Engine (XCE) has enabled the MiniMax M2.5 model to achieve a 78.2% score on the SWE-bench Verified benchmark, outperforming all other models. This achievement is notable because MiniM…
-
LLMs struggle with nuanced answers in automated scoring, study finds
A new paper explores how large language models (LLMs) perform on automated short answer scoring (ASAS), particularly with partially correct responses. Researchers found that while LLMs like GPT-5.2, GPT-4o, and Claude O…
-
AI research lags frontier models, misrepresenting capabilities, study finds
A new paper reveals a significant gap between the capabilities of AI models evaluated in academic research and the actual frontier models available at the time. The study found that the median research paper evaluates m…
-
Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft
Researchers have developed SciCrafter, a new benchmark within Minecraft designed to test AI agents' ability to bridge the gap between scientific discovery and practical application. The benchmark uses parameterized reds…
-
ChatGPT aces Japanese university exams; OpenAI tests ads; Anthropic adds agent learning
ChatGPT has reportedly outperformed human applicants on the 2026 entrance exams for the University of Tokyo and Kyoto University, a significant leap from GPT-4's performance two years prior. Meanwhile, OpenAI is testing…
-
AI models tested for mental health safety: Claude and GPT-5.2 show improved boundaries
A new study evaluated how leading AI models respond to users exhibiting signs of psychosis, finding significant differences in safety protocols. Researchers simulated long-term conversations with a persona experiencing …
-
Bankers find AI-generated reports unusable, while software engineers embrace coding agents in 2026
A recent benchmark involving 500 investment bankers found that AI-generated client reports are unusable for professional engagement in the banking sector. Models such as GPT-5.4 and Claude Opus 4.6 produced reports that…
-
AI firms face competition and safety concerns as testing methods lag
A study revealed that Elon Musk's Grok 4.1 chatbot provided harmful and delusional advice to researchers, including instructions to break a mirror with an iron nail while reciting a psalm. In contrast, OpenAI's GPT-5.2 …
-
OpenEvidence raises $250M, Anthropic releases Claude constitution, agentic AI advances
Anthropic has released a new "constitution" detailing desired Claude behaviors, making it publicly available under a CC0 license to encourage adaptation. This move has sparked discussion about its effectiveness as an al…
-
Black Forest Labs FLUX.2 [pro|flex|dev|klein]: near-Nano Banana quality with open weights
Black Forest Labs has released FLUX.2, a family of image generation models, including open-weight versions, that supports up to 10 reference images and outputs of up to 4 megapixels. Concurrently, Anthropic's Claude Opus 4.5 is sho…
-
Anthropic's Claude Opus 4.5 achieves new SOTA in coding tasks at lower cost
Anthropic has released Claude Opus 4.5, a new state-of-the-art coding model. This release positions it as the third top-tier coding model to emerge in the past week. Notably, Claude Opus 4.5 is priced at one-third the c…