PulseAugur

GSM8K

PulseAugur coverage of GSM8K — every cluster mentioning GSM8K across labs, papers, and developer communities, ranked by signal.

Total · 30d: 21 (21 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 21 (21 over 90d)
[Chart panels: TIER MIX · 90D, RELATIONSHIPS, SENTIMENT · 30D (4 days with sentiment data)]

RECENT · PAGE 1/2 · 21 TOTAL
  1. TOOL · CL_30784 ·

    New framework CANTANTE optimizes LLM agent systems via credit attribution

    Researchers have introduced CANTANTE, a new framework designed to optimize multi-agent systems powered by large language models. This system addresses the challenge of assigning credit for performance by decomposing sys…

  2. TOOL · CL_29427 ·

    New Yoked Feature Preference Optimization enhances LLM math reasoning

    Researchers have introduced Yoked Feature Preference Optimization (YFPO), a novel framework designed to enhance the mathematical reasoning capabilities of large language models. Unlike existing methods that rely solely …

  3. TOOL · CL_28283 ·

    AI reasoning studies flawed by focus on final answer, not computation

    A new research paper identifies a significant flaw in chain-of-thought (CoT) corruption studies, which are used to evaluate the faithfulness of AI reasoning. The study found that these evaluations often mistakenly ident…

  4. TOOL · CL_25615 ·

    New RL algorithm fix boosts GSM8K accuracy by 45 points

    Researchers have identified a critical issue in the Group Relative Policy Optimization (GRPO) algorithm when applied to binary rewards, leading to "gradient starvation." This occurs when all responses in a group are eit…
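
For context on the failure mode: GRPO's group-relative advantage is (reward - group mean) / group std, so a group whose binary rewards all agree contributes exactly zero advantage. A minimal sketch illustrating the problem (not the paper's fix, which is truncated above):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Standard group-relative advantage: (r - mean) / std within a group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed group: informative, non-zero advantages.
print(grpo_advantages([1, 0, 0, 1]))   # [ 1. -1. -1.  1.]

# All-correct (or all-wrong) group under a binary reward: every
# advantage is exactly zero, so the policy gradient for these prompts
# vanishes -- the "gradient starvation" described above.
print(grpo_advantages([1, 1, 1, 1]))   # [0. 0. 0. 0.]
```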

  5. TOOL · CL_25616 ·

    New research reveals "coupling tax" limits LLM reasoning accuracy

    A new research paper introduces the concept of a "coupling tax" in large language models, highlighting how shared token budgets for reasoning and final answers can hinder accuracy. The study found that for certain tasks…
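
As a hedged illustration of one way to decouple the budgets (the paper's actual mitigation is not shown in the excerpt; the model name and the two-stage scheme here are assumptions), using the standard Hugging Face generate API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Arbitrary small model for illustration only.
name = "Qwen/Qwen2.5-1.5B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Q: A pen costs $3 and a notebook $5. What do 4 of each cost? Think step by step."
ids = tok(prompt, return_tensors="pt").input_ids

# Stage 1: the reasoning trace gets its own token budget.
cot = model.generate(ids, max_new_tokens=256)

# Stage 2: the answer gets a separate, guaranteed budget, so a long
# chain of thought can no longer crowd it out of a shared allowance.
ids2 = tok(tok.decode(cot[0]) + "\nFinal answer:", return_tensors="pt").input_ids
ans = model.generate(ids2, max_new_tokens=16)
print(tok.decode(ans[0][ids2.shape[1]:], skip_special_tokens=True))
```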

  6. TOOL · CL_25591 ·

    LLM framework CIKA pinpoints causally relevant math concepts

    Researchers have developed a new framework called CIKA to improve large language model (LLM) mathematical reasoning by identifying causally relevant concepts. Unlike previous methods that struggled with spurious associa…

  7. TOOL · CL_25604 ·

    LoRA rank allocation fails in RL fine-tuning, study finds

    A new study on the Qwen 2.5 1.5B model reveals that adaptive rank allocation techniques, effective in supervised fine-tuning, do not translate to reinforcement learning with Group Relative Policy Optimization (GRPO). Re…

  8. TOOL · CL_22493 ·

    AI models use policy-guided routing for cost-effective reasoning on math tasks

    Researchers have developed a new method for cost-effective reasoning in large language models by implementing a policy-guided stepwise model routing system. This approach formulates the routing of intermediate chain-of-…
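
A hypothetical sketch of the idea, with `policy_score`, `small_model`, and `large_model` as stand-ins (the paper's routing policy and cost model are only partially quoted above):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    policy_score: Callable[[str], float]  # estimated step difficulty in [0, 1]
    small_model: Callable[[str], str]     # cheap generator
    large_model: Callable[[str], str]     # expensive generator
    threshold: float = 0.5

    def step(self, state: str) -> str:
        # Route easy steps to the cheap model, hard steps to the big one.
        model = self.large_model if self.policy_score(state) > self.threshold else self.small_model
        return state + "\n" + model(state)

def solve(router: Router, question: str, max_steps: int = 8) -> str:
    # Extend the chain of thought one routed step at a time.
    state = question
    for _ in range(max_steps):
        state = router.step(state)
        if "Final answer:" in state:
            break
    return state
```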

  9. RESEARCH · CL_18290 ·

    QKVShare framework enables efficient quantized KV-cache handoff for on-device LLMs

    Researchers have developed QKVShare, a framework designed to improve the efficiency of transferring latent context between agents in multi-agent LLM systems operating on edge devices. This approach utilizes quantized KV…
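
The quantization half of this is standard; a minimal sketch of symmetric int8 KV-cache quantization (QKVShare's actual scheme, group sizes, and handoff protocol are not specified in the excerpt):

```python
import torch

def quantize_kv(t: torch.Tensor):
    """Symmetric per-tensor int8 quantization of a K or V cache tensor."""
    scale = t.abs().max() / 127.0
    q = torch.clamp((t / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# A (batch, heads, seq, head_dim) cache entry: 4 bytes/value shrinks to
# 1 byte/value on the wire, plus one fp32 scale, before the receiving
# agent restores an approximate cache.
k = torch.randn(1, 8, 512, 64)
qk, s = quantize_kv(k)
err = (dequantize_kv(qk, s) - k).abs().mean()
print(qk.dtype, f"mean abs error {err:.4f}")
```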

  10. RESEARCH · CL_18265 ·

    Researchers find Transformers know counts but struggle to output them

    A new paper identifies a specific bottleneck in Transformer models that hinders their ability to perform counting tasks. Researchers found that while models like Pythia, Qwen3, and Mistral store count information accura…

  11. RESEARCH · CL_11818 ·

    New LenVM model offers token-level length control for LLMs

    Researchers have developed a new framework called the Length Value Model (LenVM) that predicts the remaining generation length for tokens in large language models. This token-level approach models length as a value esti…
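
A hedged sketch of what a token-level length value head could look like (the `LengthValueHead` module and its training target here are assumptions, not LenVM's published architecture):

```python
import torch
import torch.nn as nn

class LengthValueHead(nn.Module):
    """Hypothetical head: regress remaining generation length per token
    from the LM's hidden states, treating length as a value estimate."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2),
            nn.GELU(),
            nn.Linear(hidden_size // 2, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) -> (batch, seq): predicted tokens remaining.
        return self.mlp(hidden_states).squeeze(-1)

# The target at position t of a length-T sequence is simply T - t,
# regressed over teacher-forced hidden states.
h = torch.randn(2, 10, 768)
head = LengthValueHead(768)
target = torch.arange(9, -1, -1).float().expand(2, 10)  # remaining tokens
loss = nn.functional.mse_loss(head(h), target)
```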

  12. RESEARCH · CL_11738 ·

    BoostLoRA method grows adapter rank to surpass full fine-tuning

    Researchers have introduced BoostLoRA, a novel parameter-efficient fine-tuning method designed to enhance model expressivity without increasing inference overhead. This technique iteratively trains and merges small adap…
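
The merge step relies on standard LoRA algebra: a trained adapter B·A scaled by alpha/r folds into the base weight, so inference sees a single dense matrix. A sketch of the iterate-and-merge loop (BoostLoRA's actual schedule and adapter sizes are assumptions here):

```python
import torch

def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, r: int) -> torch.Tensor:
    """Fold a trained low-rank adapter into the base weight (standard
    LoRA merge); inference then costs a single dense matmul."""
    return W + (alpha / r) * (B @ A)

d, r = 1024, 8
W = torch.randn(d, d)
for _ in range(3):                      # iterate: train a small adapter...
    A = torch.randn(r, d) * 0.01        # (stand-ins for trained factors)
    B = torch.randn(d, r) * 0.01
    W = merge_lora(W, A, B, alpha=16, r=r)  # ...then merge and repeat
# The effective update rank can grow with each round (up to 3*r here)
# while inference overhead stays zero, which is the claimed appeal.
```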

  13. RESEARCH · CL_14144 ·

    State Stream Transformer V2 enhances LLM reasoning with parallel training and latent state streaming

    Researchers have developed the State Stream Transformer (SST) V2, an architectural innovation designed to enhance latent space reasoning in language models. Unlike standard transformers that reset context at each step, …

  14. RESEARCH · CL_10517 ·

    IBM's new 8B Granite 4.1 model outperforms older 32B MoE version

    IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…

  15. RESEARCH · CL_06627 ·

    New research reveals hidden states in LLMs contain task-solving information

    Researchers have investigated the information encoded within the hidden states of language models during chain-of-thought (CoT) reasoning. By using activation patching on the GSM8K dataset, they found that individual Co…
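
Activation patching itself is a standard technique: cache an activation from a clean run and splice it into a corrupted run, then compare outputs. A minimal PyTorch sketch (the layer choice and evaluation metric are the paper's, not shown here):

```python
import torch

def run_with_patch(model, layer, clean_ids, corrupt_ids):
    """Cache `layer`'s output on the clean input, then splice it into a
    run on the corrupted input. Assumes the hooked module returns a
    plain tensor (index into the tuple for many transformer blocks)."""
    cache = {}

    def save(mod, inputs, output):
        cache["act"] = output

    def patch(mod, inputs, output):
        return cache["act"]          # returning a value overrides the output

    h = layer.register_forward_hook(save)
    with torch.no_grad():
        model(clean_ids)             # clean run: record the activation
    h.remove()

    h = layer.register_forward_hook(patch)
    with torch.no_grad():
        out = model(corrupt_ids)     # corrupted run with clean activation
    h.remove()
    return out                       # compare logits vs. an unpatched run
```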

  16. RESEARCH · CL_06321 ·

    Researchers launch GAMMAF, an open-source framework for benchmarking LLM multi-agent system security

    Researchers have introduced GAMMAF, an open-source framework designed to benchmark anomaly detection methods in Large Language Model (LLM) multi-agent systems. This platform addresses the lack of standardized evaluation…

  17. RESEARCH · CL_05211 ·

    Language agents use auction to cut communication costs and boost reasoning

    Researchers have developed a new framework called DALA (Dynamic Auction-based Language Agent) to improve communication efficiency in multi-agent systems powered by large language models. This system treats communication…
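
A hypothetical sketch of auction-gated messaging, where agents bid for a fixed number of broadcast slots per round (DALA's actual mechanism and pricing are only partially described above; all names are illustrative):

```python
from typing import List, Tuple

def auction_round(bids: List[Tuple[str, float, str]], slots: int):
    """bids: (agent_id, bid, message). The highest bidders fill the
    broadcast slots; everyone else stays silent this round, capping
    per-round communication cost."""
    winners = sorted(bids, key=lambda b: b[1], reverse=True)[:slots]
    return [(agent, msg) for agent, _, msg in winners]

bids = [("planner", 0.9, "Subgoal: isolate x."),
        ("critic", 0.4, "Looks fine so far."),
        ("solver", 0.7, "x = 12 after substitution.")]
print(auction_round(bids, slots=2))   # planner and solver broadcast
```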

  18. RESEARCH · CL_05134 ·

    Multi-Token Prediction via Self-Distillation

    Researchers have developed a novel self-distillation technique to accelerate language model inference. This method transforms a standard autoregressive model into a faster multi-token predictor without needing auxiliary…
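
One generic form of this idea trains extra prediction heads against the frozen base model's own teacher-forced next-token distributions, so no auxiliary teacher model is needed. A sketch under that assumption (not necessarily the paper's recipe):

```python
import torch
import torch.nn.functional as F

def self_distill_loss(base_logits, head_logits):
    """An extra head predicting token t+k is trained toward the frozen
    base model's own next-token distribution at position t+k-1."""
    teacher = F.softmax(base_logits.detach(), dim=-1)
    return F.kl_div(F.log_softmax(head_logits, dim=-1),
                    teacher, reduction="batchmean")

# Toy shapes: batch 2, seq 16, vocab 100. head_logits come from a new
# k-step head reading position t; base_logits come from the same model
# read one step later under teacher forcing.
base = torch.randn(2, 16, 100)
head = torch.randn(2, 16, 100, requires_grad=True)
loss = self_distill_loss(base, head)
loss.backward()
```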

  19. RESEARCH · CL_05034 ·

    New research suggests LLM self-correction can degrade performance if not carefully managed

    A new research paper introduces a control-theoretic framework to analyze when iterative self-correction in large language models (LLMs) is beneficial or detrimental. The study proposes a diagnostic based on error correc…
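
The control-theoretic intuition can be shown numerically: model each correction round as scaling the current error by a factor rho, so iteration contracts for rho < 1 and diverges for rho > 1. A toy illustration (the paper's actual diagnostic is more involved than this sketch):

```python
def iterate(error: float, rho: float, rounds: int):
    """Track error across self-correction rounds under a fixed gain rho:
    rho < 1 means each pass fixes more than it breaks; rho > 1 means
    each pass introduces more errors than it removes."""
    trace = [error]
    for _ in range(rounds):
        error *= rho
        trace.append(error)
    return trace

print(iterate(1.0, 0.7, 5))  # contractive: [1.0, 0.7, 0.49, ...]
print(iterate(1.0, 1.3, 5))  # expansive:   [1.0, 1.3, 1.69, ...]
```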

  20. RESEARCH · CL_04999 ·

    Researchers explore optimal LoRA placement in hybrid language models

    A new paper explores the optimal placement of LoRA adapters in hybrid language models, which combine attention and recurrent components. The research demonstrates that adapting the attention pathway is more effective th…