Reinforcement Learning with Verifiable Rewards
PulseAugur coverage of Reinforcement Learning with Verifiable Rewards (RLVR) — every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
-
New Listwise Policy Optimization method enhances LLM reasoning and stability
Researchers have introduced Listwise Policy Optimization (LPO), a new framework for training large language models (LLMs) that enhances their reasoning capabilities. LPO operates by explicitly defining a target distribu…
-
LLM reasoning emerges via Inverse Tree Freezing, improving multi-step thinking
Researchers have developed a new framework called Inverse Tree Freezing to understand how large language models (LLMs) achieve complex reasoning. The framework views the LLM's learning process as a random walk on a 'Concep…
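The truncated summary mentions modeling learning as a random walk on a concept graph. As a generic illustration only (the paper's actual graph, dynamics, and the Inverse Tree Freezing mechanism are not described here; the toy graph below is invented for the sketch), a uniform random walk over concepts looks like this:

```python
import random

random.seed(42)

# Hypothetical toy 'concept graph': edges link concepts that can be
# chained in a single reasoning step. Purely illustrative.
concept_graph = {
    "arithmetic": ["algebra"],
    "algebra": ["arithmetic", "calculus"],
    "calculus": ["algebra", "analysis"],
    "analysis": ["calculus"],
}

def random_walk(graph, start, steps):
    """Uniform random walk: each step hops to a random neighboring concept."""
    node, path = start, [start]
    for _ in range(steps):
        node = random.choice(graph[node])
        path.append(node)
    return path

path = random_walk(concept_graph, "arithmetic", 5)
print(" -> ".join(path))
```

This shows only the random-walk primitive, not how the paper derives multi-step reasoning from it.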
-
RLVR training dynamics reveal implicit curriculum in reasoning models
Researchers have developed a theory explaining how reinforcement learning with verifiable rewards (RLVR) aids large reasoning models in overcoming long-horizon challenges. Their analysis reveals that RLVR training natur…
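One common intuition behind an implicit curriculum in RLVR (a general property of group-normalized binary rewards, not necessarily this paper's exact argument) is that GRPO-style advantages carry the strongest signal on problems the model solves about half the time, and no signal on problems it always solves or always fails. A minimal numeric sketch:

```python
import math

def group_advantages(rewards):
    """GRPO-style group-normalized advantages for binary verifiable rewards."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    if std == 0:
        return [0.0] * len(rewards)  # all-correct or all-wrong: no gradient signal
    return [(r - mean) / std for r in rewards]

# Signal magnitude vs. difficulty: fraction of a group of 8 rollouts solved.
# Mean |advantage| is zero at 0/8 and 8/8 and peaks near 4/8, so training
# pressure concentrates on problems at the edge of the model's ability.
for solved in range(9):
    rewards = [1.0] * solved + [0.0] * (8 - solved)
    advs = group_advantages(rewards)
    signal = sum(abs(a) for a in advs) / len(advs)
    print(f"{solved}/8 solved -> mean |advantage| = {signal:.3f}")
```

As easy problems saturate, the peak of this signal shifts toward harder problems, which is one way an implicit curriculum can emerge without any explicit scheduling.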
-
Systematic errors in RLVR verifiers can cause model performance collapse
A new research paper explores the impact of systematic errors in verifiers used for Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. Unlike previous assumptions that errors only slow down …
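To make the distinction between random noise and systematic verifier error concrete (a hypothetical sketch; the verifier, the "fraction" failure mode, and the function names below are invented for illustration, not taken from the paper): a verifier that consistently rejects one class of correct answers makes that class indistinguishable from failure, so RLVR actively optimizes the policy away from it rather than merely learning more slowly.

```python
def biased_verifier(is_correct, answer_style):
    """Hypothetical RLVR verifier with a *systematic* error: correct answers
    written in an unsupported style are always scored as wrong (a consistent
    false negative, not random noise)."""
    if not is_correct:
        return 0.0
    return 0.0 if answer_style == "fraction" else 1.0

# A correct fraction-style answer earns the same reward as a wrong answer,
# so policy-gradient updates suppress the correct behavior itself.
print(biased_verifier(True, "fraction"))   # correct, but rewarded like a failure
print(biased_verifier(True, "decimal"))    # correct and rewarded
print(biased_verifier(False, "decimal"))   # genuinely wrong
```

Because the error is correlated with a specific behavior rather than averaging out, sustained training against it can collapse performance on the affected slice instead of just slowing convergence.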