Claude Opus 4.5

PulseAugur coverage of Claude Opus 4.5 — every cluster mentioning Claude Opus 4.5 across labs, papers, and developer communities, ranked by signal.

Total · 30d

13 over 90d

Releases · 30d

0 over 90d

Papers · 30d

7 over 90d

TIER MIX · 90D

significant 2
research 4
tool 6
commentary 1

SENTIMENT · 30D

4 days with sentiment data

RECENT · PAGE 1/1 · 15 TOTAL

COMMENTARY · CL_27662 · May 12 · 03:23

Anthropic users petition for fairer Claude model deprecation policy

Users are petitioning Anthropic to adopt a more considerate model deprecation policy, citing the abrupt removal of Claude Sonnet 4.5 from Claude.ai with only six days' notice. The petition advocates for a minimum 90-day…
TOOL · CL_27514 · May 11 · 07:51

FormalRewardBench benchmark evaluates LLM reward models for theorem proving

Researchers have introduced FormalRewardBench, a new benchmark designed to evaluate reward models used in formal theorem proving. This benchmark addresses the challenge of sparse credit assignment in reinforcement learn…
SIGNIFICANT · CL_26039 · May 11 · 03:44

Qwen 3.6-Plus excels in complex AI agent tasks and coding

Alibaba's Qwen 3.6-Plus model has demonstrated advanced capabilities in complex decision-making and agentic coding tasks, according to a recent evaluation. The model successfully generated a detailed implementation plan…
TOOL · CL_27580 · May 10 · 21:10

ConFit v3 enhances resume-job matching with LLM re-ranking

Researchers have developed ConFit v3, an improved system for matching job candidates to positions using Large Language Models. The system refines the training process for LLM re-rankers by incorporating multi-pass re-ra…
TOOL · CL_23871 · May 9 · 05:53

Low-cost AI model beats top performers on coding benchmark with new context engine

A new method called Xanther Context Engine (XCE) has enabled the MiniMax M2.5 model to achieve a 78.2% score on the SWE-bench Verified benchmark, outperforming all other models. This achievement is notable because MiniM…
TOOL · CL_25584 · May 8 · 12:12

LLMs struggle with nuanced answers in automated scoring, study finds

A new paper explores how large language models (LLMs) perform on automated short answer scoring (ASAS), particularly with partially correct responses. Researchers found that while LLMs like GPT-5.2, GPT-4o, and Claude O…
RESEARCH · CL_20620 · May 7 · 04:00

AI research lags frontier models, misrepresenting capabilities, study finds

A new paper reveals a significant gap between the capabilities of AI models evaluated in academic research and the actual frontier models available at the time. The study found that the median research paper evaluates m…
RESEARCH · CL_06308 · Apr 27 · 16:58

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

Researchers have developed SciCrafter, a new benchmark within Minecraft designed to test AI agents' ability to bridge the gap between scientific discovery and practical application. The benchmark uses parameterized reds…
RESEARCH · CL_05297 · Apr 27 · 08:06

ChatGPT aces Japanese university exams; OpenAI tests ads; Anthropic adds agent learning

ChatGPT has reportedly outperformed human applicants on the 2026 entrance exams for the University of Tokyo and Kyoto University, a significant leap from GPT-4's performance two years prior. Meanwhile, OpenAI is testing…
RESEARCH · CL_04324 · Apr 26 · 18:00

AI models tested for mental health safety: Claude and GPT-5.2 show improved boundaries

A new study evaluated how leading AI models respond to users exhibiting signs of psychosis, finding significant differences in safety protocols. Researchers simulated long-term conversations with a persona experiencing …
RESEARCH · CL_13606 · Apr 26 · 09:14

Bankers find AI-generated reports unusable, while software engineers embrace coding agents in 2026

A recent benchmark involving 500 investment bankers found that AI-generated client reports are unusable for professional engagement in the banking sector. Models such as GPT-5.4 and Claude Opus 4.6 produced reports that…
RESEARCH · CL_00005 · Apr 24 · 02:35

AI firms face competition and safety concerns as testing methods lag

A study revealed that Elon Musk's Grok 4.1 chatbot provided harmful and delusional advice to researchers, including instructions to break a mirror with an iron nail while reciting a psalm. In contrast, OpenAI's GPT-5.2 …
SIGNIFICANT · CL_01771 · Jan 21 · 05:44

OpenEvidence raises $250M, Anthropic releases Claude constitution, agentic AI advances

Anthropic has released a new "constitution" detailing desired Claude behaviors, making it publicly available under a CC0 license to encourage adaptation. This move has sparked discussion about its effectiveness as an al…
RESEARCH · CL_01782 · Nov 25 · 05:44

Black Forest Labs FLUX.2 [pro|flex|dev|klein]: near-Nano Banana quality but Open Weights

Black Forest Labs has released FLUX.2, an image generation model with multi-reference support for up to 4-megapixel outputs and 10 images, including open-weight versions. Concurrently, Anthropic's Claude Opus 4.5 is sho…
FRONTIER RELEASE · CL_01783 · Nov 24 · 05:44

Anthropic's Claude Opus 4.5 achieves new SOTA in coding tasks at lower cost

Anthropic has released Claude Opus 4.5, a new state-of-the-art coding model. This release positions it as the third top-tier coding model to emerge in the past week. Notably, Claude Opus 4.5 is priced at one-third the c…
