PulseAugur

Process Reward Models

PulseAugur coverage of Process Reward Models: every cluster that mentions Process Reward Models across labs, papers, and developer communities, ranked by signal.

Total · 30d: 8 (8 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 8 (8 over 90d)
SENTIMENT · 30D: 3 days with sentiment data

RECENT · PAGE 1/1 · 8 TOTAL
  1. TOOL · CL_80056

    New PRISM framework tackles bias in AI reasoning models

    Researchers have identified a significant bias in Process Reward Models (PRMs) stemming from imbalanced training data, which leads to an overemphasis on plausible but incorrect reasoning steps. This bias can actively mi…

  2. TOOL · CL_79720

    AI multimodal reasoning improved by worst dimension optimization

    Researchers have developed a new method called Worst Dimension Optimization to improve multimodal reasoning in AI systems. This technique addresses the issue where current reward models might overlook failures in specif…

  3. RESEARCH · CL_77162

    StainFlow improves GUI agent training with novel reward model

    Researchers have introduced StainFlow, a novel process reward model designed to enhance the training of GUI agents. This method addresses the sparsity of feedback in reinforcement learning by providing finer-grained tra…

  4. TOOL · CL_36565

    New distributional PRM predicts reward reliability for better reasoning

    Researchers have developed BetaPRM, a new distributional process reward model that predicts not only the success probability of a reasoning step but also the reliability of that prediction. This approach uses a Beta bel…

  5. TOOL · CL_18581

    AI researchers develop controllable data synthesis for process reward models

    Researchers have developed a new framework for creating synthetic process supervision data tailored for Process Reward Models (PRMs). This method allows for controlled injection of errors into reasoning chains, ensuring…

  6. TOOL · CL_15917

    New GR-Ben benchmark evaluates AI's general reasoning and error detection

    Researchers have introduced GR-Ben, a new benchmark designed to evaluate the error detection capabilities of process reward models (PRMs) on reasoning tasks beyond mathematics. The benchmark co…

  7. RESEARCH · CL_24786

    Unsupervised Process Reward Models reduce need for human supervision

    Researchers have developed a method for training unsupervised Process Reward Models (uPRMs) without human supervision of step-by-step reasoning. This new approach uses LLM next-token pro…

  8. RESEARCH · CL_10096

    Survey details process reward models for fine-grained LLM reasoning alignment

    This survey paper systematically reviews Process Reward Models (PRMs), which evaluate and guide Large Language Models (LLMs) at the reasoning step or trajectory level, unlike traditional outcome-based models. It details…