PulseAugur

GSM8K

PulseAugur coverage of GSM8K — every cluster mentioning GSM8K across labs, papers, and developer communities, ranked by signal.

Total · 30d: 21 (21 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 21 (21 over 90d)
[Chart panels: TIER MIX · 90D, RELATIONSHIPS, SENTIMENT · 30D (4 days with sentiment data)]

RECENT · PAGE 1/2 · 21 TOTAL
  1. TOOL · CL_30784 ·

    New framework CANTANTE optimizes LLM agent systems via credit attribution

    Researchers have introduced CANTANTE, a new framework designed to optimize multi-agent systems powered by large language models. This system addresses the challenge of assigning credit for performance by decomposing sys…

  2. TOOL · CL_29427 ·

    New Yoked Feature Preference Optimization enhances LLM math reasoning

    Researchers have introduced Yoked Feature Preference Optimization (YFPO), a novel framework designed to enhance the mathematical reasoning capabilities of large language models. Unlike existing methods that rely solely …

  3. TOOL · CL_28283 ·

    AI reasoning studies flawed by focus on final answer, not computation

    A new research paper identifies a significant flaw in chain-of-thought (CoT) corruption studies, which are used to evaluate the faithfulness of AI reasoning. The study found that these evaluations often mistakenly ident…

  4. TOOL · CL_25615 ·

    New RL algorithm fix boosts GSM8K accuracy by 45 points

    Researchers have identified a critical issue in the Group Relative Policy Optimization (GRPO) algorithm when applied to binary rewards, leading to "gradient starvation." This occurs when all responses in a group are eit…
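
For context on the failure mode: GRPO's group-relative advantage is (reward - group mean) / group std, so a group whose binary rewards all agree contributes exactly zero advantage. A minimal sketch illustrating the problem (not the paper's fix, which is truncated above):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Standard group-relative advantage: (r - mean) / std within a group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed group: informative, non-zero advantages.
print(grpo_advantages([1, 0, 0, 1]))   # [ 1. -1. -1.  1.]

# All-correct (or all-wrong) group under a binary reward: every
# advantage is exactly zero, so the policy gradient for these prompts
# vanishes -- the "gradient starvation" described above.
print(grpo_advantages([1, 1, 1, 1]))   # [0. 0. 0. 0.]
```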

  5. TOOL · CL_25616 ·

    New research reveals "coupling tax" limits LLM reasoning accuracy

    A new research paper introduces the concept of a "coupling tax" in large language models, highlighting how shared token budgets for reasoning and final answers can hinder accuracy. The study found that for certain tasks…
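
As a hedged illustration of one way to decouple the budgets (the paper's actual mitigation is not shown in the excerpt; the model name and the two-stage scheme here are assumptions), using the standard Hugging Face generate API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Arbitrary small model for illustration only.
name = "Qwen/Qwen2.5-1.5B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Q: A pen costs $3 and a notebook $5. What do 4 of each cost? Think step by step."
ids = tok(prompt, return_tensors="pt").input_ids

# Stage 1: the reasoning trace gets its own token budget.
cot = model.generate(ids, max_new_tokens=256)

# Stage 2: the answer gets a separate, guaranteed budget, so a long
# chain of thought can no longer crowd it out of a shared allowance.
ids2 = tok(tok.decode(cot[0]) + "\nFinal answer:", return_tensors="pt").input_ids
ans = model.generate(ids2, max_new_tokens=16)
print(tok.decode(ans[0][ids2.shape[1]:], skip_special_tokens=True))
```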

  6. TOOL · CL_25591 ·

    LLM framework CIKA pinpoints causally relevant math concepts

    Researchers have developed a new framework called CIKA to improve large language model (LLM) mathematical reasoning by identifying causally relevant concepts. Unlike previous methods that struggled with spurious associa…

  7. TOOL · CL_25604 ·

    LoRA rank allocation fails in RL fine-tuning, study finds

    A new study on the Qwen 2.5 1.5B model reveals that adaptive rank allocation techniques, effective in supervised fine-tuning, do not translate to reinforcement learning with Group Relative Policy Optimization (GRPO). Re…

  8. TOOL · CL_22493 ·

    AI models use policy-guided routing for cost-effective reasoning on math tasks

    Researchers have developed a new method for cost-effective reasoning in large language models by implementing a policy-guided stepwise model routing system. This approach formulates the routing of intermediate chain-of-…
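
A hypothetical sketch of the idea, with `policy_score`, `small_model`, and `large_model` as stand-ins (the paper's routing policy and cost model are only partially quoted above):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    policy_score: Callable[[str], float]  # estimated step difficulty in [0, 1]
    small_model: Callable[[str], str]     # cheap generator
    large_model: Callable[[str], str]     # expensive generator
    threshold: float = 0.5

    def step(self, state: str) -> str:
        # Route easy steps to the cheap model, hard steps to the big one.
        model = self.large_model if self.policy_score(state) > self.threshold else self.small_model
        return state + "\n" + model(state)

def solve(router: Router, question: str, max_steps: int = 8) -> str:
    # Extend the chain of thought one routed step at a time.
    state = question
    for _ in range(max_steps):
        state = router.step(state)
        if "Final answer:" in state:
            break
    return state
```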

  9. RESEARCH · CL_18290 ·

    QKVShare framework enables efficient quantized KV-cache handoff for on-device LLMs

    Researchers have developed QKVShare, a framework designed to improve the efficiency of transferring latent context between agents in multi-agent LLM systems operating on edge devices. This approach utilizes quantized KV…
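
The quantization half of this is standard; a minimal sketch of symmetric int8 KV-cache quantization (QKVShare's actual scheme, group sizes, and handoff protocol are not specified in the excerpt):

```python
import torch

def quantize_kv(t: torch.Tensor):
    """Symmetric per-tensor int8 quantization of a K or V cache tensor."""
    scale = t.abs().max() / 127.0
    q = torch.clamp((t / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# A (batch, heads, seq, head_dim) cache entry: 4 bytes/value shrinks to
# 1 byte/value on the wire, plus one fp32 scale, before the receiving
# agent restores an approximate cache.
k = torch.randn(1, 8, 512, 64)
qk, s = quantize_kv(k)
err = (dequantize_kv(qk, s) - k).abs().mean()
print(qk.dtype, f"mean abs error {err:.4f}")
```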

  10. RESEARCH · CL_18265 ·

    Researchers find Transformers know counts but struggle to output them

    A new paper identifies a specific bottleneck in Transformer models that hinders their ability to perform counting tasks. Researchers found that while models like Pythia, Qwen3, and Mistral store count information accura…

  11. RESEARCH · CL_11818 ·

    New LenVM model offers token-level length control for LLMs

    Researchers have developed a new framework called the Length Value Model (LenVM) that predicts the remaining generation length for tokens in large language models. This token-level approach models length as a value esti…
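
A hedged sketch of what a token-level length value head could look like (the `LengthValueHead` module and its training target here are assumptions, not LenVM's published architecture):

```python
import torch
import torch.nn as nn

class LengthValueHead(nn.Module):
    """Hypothetical head: regress remaining generation length per token
    from the LM's hidden states, treating length as a value estimate."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2),
            nn.GELU(),
            nn.Linear(hidden_size // 2, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) -> (batch, seq): predicted tokens remaining.
        return self.mlp(hidden_states).squeeze(-1)

# The target at position t of a length-T sequence is simply T - t,
# regressed over teacher-forced hidden states.
h = torch.randn(2, 10, 768)
head = LengthValueHead(768)
target = torch.arange(9, -1, -1).float().expand(2, 10)  # remaining tokens
loss = nn.functional.mse_loss(head(h), target)
```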

  12. RESEARCH · CL_11738 ·

    BoostLoRA method grows adapter rank to surpass full fine-tuning

    Researchers have introduced BoostLoRA, a novel parameter-efficient fine-tuning method designed to enhance model expressivity without increasing inference overhead. This technique iteratively trains and merges small adap…
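
The merge step relies on standard LoRA algebra: a trained adapter B·A scaled by alpha/r folds into the base weight, so inference sees a single dense matrix. A sketch of the iterate-and-merge loop (BoostLoRA's actual schedule and adapter sizes are assumptions here):

```python
import torch

def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, r: int) -> torch.Tensor:
    """Fold a trained low-rank adapter into the base weight (standard
    LoRA merge); inference then costs a single dense matmul."""
    return W + (alpha / r) * (B @ A)

d, r = 1024, 8
W = torch.randn(d, d)
for _ in range(3):                      # iterate: train a small adapter...
    A = torch.randn(r, d) * 0.01        # (stand-ins for trained factors)
    B = torch.randn(d, r) * 0.01
    W = merge_lora(W, A, B, alpha=16, r=r)  # ...then merge and repeat
# The effective update rank can grow with each round (up to 3*r here)
# while inference overhead stays zero, which is the claimed appeal.
```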

  13. RESEARCH · CL_14144 ·

    State Stream Transformer V2 enhances LLM reasoning with parallel training and latent state streaming

    Researchers have developed the State Stream Transformer (SST) V2, an architectural innovation designed to enhance latent space reasoning in language models. Unlike standard transformers that reset context at each step, …

  14. RESEARCH · CL_10517 ·

    IBM's new 8B Granite 4.1 model outperforms older 32B MoE version

    IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…

  15. RESEARCH · CL_06627 ·

    New research reveals hidden states in LLMs contain task-solving information

    Researchers have investigated the information encoded within the hidden states of language models during chain-of-thought (CoT) reasoning. By using activation patching on the GSM8K dataset, they found that individual Co…
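
Activation patching itself is a standard technique: cache an activation from a clean run and splice it into a corrupted run, then compare outputs. A minimal PyTorch sketch (the layer choice and evaluation metric are the paper's, not shown here):

```python
import torch

def run_with_patch(model, layer, clean_ids, corrupt_ids):
    """Cache `layer`'s output on the clean input, then splice it into a
    run on the corrupted input. Assumes the hooked module returns a
    plain tensor (index into the tuple for many transformer blocks)."""
    cache = {}

    def save(mod, inputs, output):
        cache["act"] = output

    def patch(mod, inputs, output):
        return cache["act"]          # returning a value overrides the output

    h = layer.register_forward_hook(save)
    with torch.no_grad():
        model(clean_ids)             # clean run: record the activation
    h.remove()

    h = layer.register_forward_hook(patch)
    with torch.no_grad():
        out = model(corrupt_ids)     # corrupted run with clean activation
    h.remove()
    return out                       # compare logits vs. an unpatched run
```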

  16. RESEARCH · CL_06321 ·

    Researchers launch GAMMAF, an open-source framework for benchmarking LLM multi-agent system security

    Researchers have introduced GAMMAF, an open-source framework designed to benchmark anomaly detection methods in Large Language Model (LLM) multi-agent systems. This platform addresses the lack of standardized evaluation…

  17. RESEARCH · CL_05211 ·

    Language agents use auction to cut communication costs and boost reasoning

    Researchers have developed a new framework called DALA (Dynamic Auction-based Language Agent) to improve communication efficiency in multi-agent systems powered by large language models. This system treats communication…
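
A hypothetical sketch of auction-gated messaging, where agents bid for a fixed number of broadcast slots per round (DALA's actual mechanism and pricing are only partially described above; all names are illustrative):

```python
from typing import List, Tuple

def auction_round(bids: List[Tuple[str, float, str]], slots: int):
    """bids: (agent_id, bid, message). The highest bidders fill the
    broadcast slots; everyone else stays silent this round, capping
    per-round communication cost."""
    winners = sorted(bids, key=lambda b: b[1], reverse=True)[:slots]
    return [(agent, msg) for agent, _, msg in winners]

bids = [("planner", 0.9, "Subgoal: isolate x."),
        ("critic", 0.4, "Looks fine so far."),
        ("solver", 0.7, "x = 12 after substitution.")]
print(auction_round(bids, slots=2))   # planner and solver broadcast
```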

  18. RESEARCH · CL_05134 ·

    Multi-Token Prediction via Self-Distillation

    Researchers have developed a novel self-distillation technique to accelerate language model inference. This method transforms a standard autoregressive model into a faster multi-token predictor without needing auxiliary…
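
One generic form of this idea trains extra prediction heads against the frozen base model's own teacher-forced next-token distributions, so no auxiliary teacher model is needed. A sketch under that assumption (not necessarily the paper's recipe):

```python
import torch
import torch.nn.functional as F

def self_distill_loss(base_logits, head_logits):
    """An extra head predicting token t+k is trained toward the frozen
    base model's own next-token distribution at position t+k-1."""
    teacher = F.softmax(base_logits.detach(), dim=-1)
    return F.kl_div(F.log_softmax(head_logits, dim=-1),
                    teacher, reduction="batchmean")

# Toy shapes: batch 2, seq 16, vocab 100. head_logits come from a new
# k-step head reading position t; base_logits come from the same model
# read one step later under teacher forcing.
base = torch.randn(2, 16, 100)
head = torch.randn(2, 16, 100, requires_grad=True)
loss = self_distill_loss(base, head)
loss.backward()
```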

  19. RESEARCH · CL_05034 ·

    New research suggests LLM self-correction can degrade performance if not carefully managed

    A new research paper introduces a control-theoretic framework to analyze when iterative self-correction in large language models (LLMs) is beneficial or detrimental. The study proposes a diagnostic based on error correc…
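
The control-theoretic intuition can be shown numerically: model each correction round as scaling the current error by a factor rho, so iteration contracts for rho < 1 and diverges for rho > 1. A toy illustration (the paper's actual diagnostic is more involved than this sketch):

```python
def iterate(error: float, rho: float, rounds: int):
    """Track error across self-correction rounds under a fixed gain rho:
    rho < 1 means each pass fixes more than it breaks; rho > 1 means
    each pass introduces more errors than it removes."""
    trace = [error]
    for _ in range(rounds):
        error *= rho
        trace.append(error)
    return trace

print(iterate(1.0, 0.7, 5))  # contractive: [1.0, 0.7, 0.49, ...]
print(iterate(1.0, 1.3, 5))  # expansive:   [1.0, 1.3, 1.69, ...]
```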

  20. RESEARCH · CL_04999 ·

    Researchers explore optimal LoRA placement in hybrid language models

    A new paper explores the optimal placement of LoRA adapters in hybrid language models, which combine attention and recurrent components. The research demonstrates that adapting the attention pathway is more effective th…