AIME 2025
PulseAugur coverage of AIME 2025 — every cluster mentioning AIME 2025 across labs, papers, and developer communities, ranked by signal.
-
Ideogram 4.0 leads open image model releases; Microsoft details MAI-Thinking-1
Ideogram has released version 4.0 of its open-source image generation model, which is now considered the best available in its category. This release, alongside Reve's advancements, highlights significant progress in AI…
-
NVIDIA quantizes Alibaba's Qwen3.6-35B model for efficient deployment
NVIDIA has released a quantized version of Alibaba's Qwen3.6-35B-A3B model, named nvidia/Qwen3.6-35B-A3B-NVFP4. This model utilizes the NVFP4 data type, reducing memory requirements by approximately 3.06x while maintain…
-
New benchmark reveals LLM reasoning failures and Claude's refusals
Researchers have developed the Robust Reasoning Benchmark (RRB), a new evaluation pipeline that tests large language models on mathematical problems with deliberate textual perturbations. The benchmark revealed that whi…
-
New methods enhance on-policy distillation for LLM training
Researchers have developed new methods to improve on-policy distillation (OPD), a technique for training smaller language models using larger ones. One approach, TIP, identifies informative tokens by analyzing student e…
-
NVIDIA Star Elastic embeds multiple reasoning models in one checkpoint
NVIDIA researchers have introduced Star Elastic, a novel post-training method that embeds multiple reasoning models of varying parameter sizes within a single checkpoint. This approach allows for the extraction of small…
-
New RLVR method enhances LLM reasoning with positive-negative prompt pairing
Researchers have developed a new method called prompt-efficient RLVR that improves the training of large language models for reasoning tasks. This technique focuses on selecting prompts that provide both positive anchor…
-
New RL method optimizes agent training by controlling rollout pass rates
Researchers have developed a new technique called Prefix Sampling (PS) to improve the efficiency of reinforcement learning (RL) for AI agents. This method addresses wasted compute on rollout groups with skewed pass rate…
-
Process supervision via verbal critique improves reasoning in large language models
Researchers have developed a new framework called Verbal Process Supervision (VPS) that enhances the reasoning capabilities of large language models without requiring gradient updates. This method utilizes structured na…