Process Reward Models

PulseAugur coverage of Process Reward Models: every cluster mentioning them across labs, papers, and developer communities, ranked by signal.

ACTIVITY · 90D · 8 total · 0 releases · 8 papers

SENTIMENT · 30D · 3 day(s) with sentiment data

RECENT · PAGE 1/1 · 8 TOTAL

TOOL · CL_80056 · Jun 9 · 04:00

New PRISM framework tackles bias in AI reasoning models

Researchers have identified a significant bias in Process Reward Models (PRMs) stemming from imbalanced training data, which leads to an overemphasis on plausible but incorrect reasoning steps. This bias can actively mi…
TOOL · CL_79720 · Jun 9 · 04:00

AI multimodal reasoning improved by worst dimension optimization

Researchers have developed a new method called Worst Dimension Optimization to improve multimodal reasoning in AI systems. This technique addresses the issue where current reward models might overlook failures in specif…
RESEARCH · CL_77162 · Jun 5 · 08:17

StainFlow improves GUI agent training with novel reward model

Researchers have introduced StainFlow, a novel process reward model designed to enhance the training of GUI agents. This method addresses the sparsity of feedback in reinforcement learning by providing finer-grained tra…
TOOL · CL_36565 · May 15 · 01:57

New distributional PRM predicts reward reliability for better reasoning

Researchers have developed BetaPRM, a new distributional process reward model that predicts not only the success probability of a reasoning step but also the reliability of that prediction. This approach uses a Beta bel…
TOOL · CL_18581 · May 6 · 04:00

AI researchers develop controllable data synthesis for process reward models

Researchers have developed a new framework for creating synthetic process supervision data tailored for Process Reward Models (PRMs). This method allows for controlled injection of errors into reasoning chains, ensuring…
TOOL · CL_15917 · May 5 · 04:00

New GR-Ben benchmark evaluates AI's general reasoning and error detection

Researchers have introduced GR-Ben, a new benchmark designed to evaluate the error detection capabilities of process reward models (PRMs) across a wider range of reasoning tasks beyond just mathematics. The benchmark co…
RESEARCH · CL_24786 · May 4 · 09:36

Unsupervised Process Reward Models reduce need for human supervision

Researchers have developed a method for training unsupervised Process Reward Models (uPRMs) that removes the need for human-labeled step-by-step reasoning supervision. This new approach uses LLM next-token pro…
RESEARCH · CL_10096 · Apr 30 · 04:00

Survey details process reward models for fine-grained LLM reasoning alignment

This survey paper systematically reviews Process Reward Models (PRMs), which evaluate and guide Large Language Models (LLMs) at the reasoning step or trajectory level, unlike traditional outcome-based models. It details…
