PulseAugur

GPQA Diamond

PulseAugur coverage of GPQA Diamond: every cluster mentioning GPQA Diamond across labs, papers, and developer communities, ranked by signal.

Total · 30d: 4 (4 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 4 (4 over 90d)
TIER MIX · 90D
RECENT · 8 TOTAL
  1. RESEARCH · CL_21935

    Apple's RVPO framework enhances LLM alignment by penalizing reward variance

    Researchers have introduced Reward-Variance Policy Optimization (RVPO), a novel framework designed to improve the alignment of large language models with multiple objectives. Unlike existing methods that average rewards…

  2. COMMENTARY · CL_20705

    AI models: Choose benchmarks over hype for true performance

    A recent analysis argues that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …

  3. TOOL · CL_20624

    New fine-tuning method boosts LLM knowledge injection without paraphrasing

    Researchers have developed a new fine-tuning method called Diffusion-Inspired Masked Fine-Tuning (DMT) for autoregressive large language models (LLMs). This technique aims to improve the injection of factual knowledge i…

  4. RESEARCH · CL_14447

    New method enhances LLM reasoning diversity without sacrificing stability

    Researchers have introduced Expert-Sample, a novel training-free method designed to enhance the performance of fine-grained Mixture-of-Experts (MoE) models. The technique addresses the trade-off between diversity and s…

  5. RESEARCH · CL_14144

    State Stream Transformer V2 enhances LLM reasoning with parallel training and latent state streaming

    Researchers have developed the State Stream Transformer (SST) V2, an architectural innovation designed to enhance latent space reasoning in language models. Unlike standard transformers that reset context at each step, …

  6. RESEARCH · CL_03564

    FINAL-Bench/Darwin-36B-Opus · Hugging Face

    The Darwin-36B-Opus model, a 36-billion-parameter mixture-of-experts language model, has been released. It was created using the Darwin V7 evolutionary breeding engine, combining aspects of Qwen/Qwen3.6-35B-A3B and a Cl…

  7. RESEARCH · CL_02960

    Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

    Researchers have developed a new framework called Verbal Process Supervision (VPS) that enhances the reasoning capabilities of large language models without requiring gradient updates. This method utilizes structured na…

  8. FRONTIER RELEASE · CL_02231

    OpenAI's GPT-5.2 advances science and math, with evaluations showing low catastrophic risk

    OpenAI has released GPT-5.2, a new model demonstrating significant advancements in mathematical and scientific reasoning. The model achieved high scores on benchmarks like GPQA Diamond and FrontierMath, indicating impro…