RLVR
PulseAugur coverage of RLVR — every cluster mentioning RLVR across labs, papers, and developer communities, ranked by signal.
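For orientation, the RLVR setup the items below all reference can be sketched in a few lines: a programmatic verifier emits a binary reward, and group-relative advantages (as in group-based RLVR variants) turn those rewards into a learning signal. All names here are illustrative, not drawn from any single paper.

```python
# Minimal sketch of the RLVR reward signal: a verifier returns a binary
# reward (1.0 if the model's answer passes a programmatic check, else 0.0),
# and each sampled answer's advantage is its reward minus the group mean.

def verify(answer: str, reference: str) -> float:
    """Toy verifier: exact-match check on the final answer string."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_advantages(answers: list[str], reference: str) -> list[float]:
    """Group-relative advantages over a batch of samples for one prompt."""
    rewards = [verify(a, reference) for a in answers]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]
```

The binary verifier is what distinguishes RLVR from reward-model-based RLHF: the signal is exact but sparse, which is the root of the credit-assignment and collapse issues several of the items below address.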
1 day with sentiment data
-
New RLRT method enhances LLM reasoning by reversing teacher signals
Researchers have developed a new method called RLRT, which reverses the typical self-distillation process in large language models. Instead of a teacher model guiding a student, RLRT identifies and reinforces the studen…
-
New S-trace method improves RLVR efficiency and credit assignment
Researchers have introduced Selective Eligibility Traces (S-trace), a novel method designed to enhance the reasoning capabilities of large language models within the Reinforcement Learning with Verifiable Rewards (RLVR)…
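The classic eligibility-trace mechanism behind this item can be sketched as follows; the "selective" gating is paraphrased here as a simple per-step keep mask, which is a placeholder for the paper's actual selection criterion.

```python
# Accumulating eligibility traces for step-level credit assignment:
# each step's weight decays by gamma * lam and is bumped when the step
# is kept. Masked-out steps add nothing to the trace.

def traced_weights(n_steps: int, gamma: float, lam: float,
                   keep: list[bool]) -> list[float]:
    """Per-step trace weights; the keep mask is a hypothetical stand-in
    for S-trace's selection rule."""
    weights = []
    trace = 0.0
    for t in range(n_steps):
        trace = gamma * lam * trace + (1.0 if keep[t] else 0.0)
        weights.append(trace)
    return weights
```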
-
P^2O method enhances LLM reasoning by optimizing prompts and policies
Researchers have developed a new method called P^2O (Joint Policy and Prompt Optimization) to address the issue of advantage collapse in Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. T…
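"Advantage collapse" in group-based RLVR is easy to state concretely: if every sampled answer to a prompt receives the same binary reward, group-relative advantages are all zero and the prompt contributes no gradient. A minimal detection check (illustrative, not the P^2O mechanism itself):

```python
# A prompt's sample group has collapsed when all rewards are identical,
# so reward - group_mean is zero for every sample.

def has_collapsed(rewards: list[float]) -> bool:
    """True when the group yields no learning signal."""
    return len(set(rewards)) == 1
```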
-
New theory explains RLVR optimization dynamics and step-size thresholds
Researchers have developed a theoretical framework for Reinforcement Learning with Verifiable Rewards (RLVR), a technique used to fine-tune large language models with binary feedback. The study introduces a 'Gradient Ga…
-
New Listwise Policy Optimization method enhances LLM reasoning and stability
Researchers have introduced Listwise Policy Optimization (LPO), a new framework for training large language models (LLMs) that enhances their reasoning capabilities. LPO operates by explicitly defining a target distribu…
-
RLVR training dynamics reveal implicit curriculum in reasoning models
Researchers have developed a theory explaining how reinforcement learning with verifiable rewards (RLVR) aids large reasoning models in overcoming long-horizon challenges. Their analysis reveals that RLVR training natur…
-
Systematic errors in RLVR verifiers can cause model performance collapse
A new research paper explores the impact of systematic errors in verifiers used for Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. Unlike previous assumptions that errors only slow down …
-
New STEER method tackles entropy collapse in LLM reasoning training
Researchers have developed a new method called STEER to address entropy collapse in Reinforcement Learning with Verifiable Rewards (RLVR), a technique crucial for improving LLM reasoning. Existing methods for mitigating…
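Entropy collapse means the policy's token distribution becomes too peaked during RL training, killing exploration. The common baseline mitigation the item alludes to (not STEER itself) is an entropy bonus added to the objective, sketched here:

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def loss_with_entropy_bonus(policy_loss: float, probs: list[float],
                            beta: float = 0.01) -> float:
    """Subtract beta * entropy so minimizing the loss keeps entropy up.
    beta is an illustrative coefficient, not a value from the paper."""
    return policy_loss - beta * entropy(probs)
```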
-
JURY-RL framework enhances LLM reasoning with label-free verifiable rewards
Researchers have developed JURY-RL, a novel framework for label-free reinforcement learning with verifiable rewards (RLVR) designed to improve the reasoning capabilities of large language models. This method separates t…
-
New method uses hidden states to improve AI reasoning credit assignment
Researchers have developed a new method called Span-level Hidden state Enabled Advantage Reweighting (SHEAR) to improve credit assignment in reinforcement learning for language models. SHEAR leverages the Wasserstein di…
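As a minimal, hypothetical stand-in for the Wasserstein distance SHEAR computes over hidden states, the 1-D earth mover's distance between two equal-size empirical samples reduces to sorting both and averaging the absolute differences:

```python
# 1-D Wasserstein-1 distance between equal-size empirical samples.
# SHEAR's actual distance is over high-dimensional hidden states; this
# scalar version only illustrates the underlying quantity.

def wasserstein_1d(xs: list[float], ys: list[float]) -> float:
    assert len(xs) == len(ys), "equal-size empirical samples assumed"
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```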
-
The State Of LLMs 2025: Progress, Problems, and Predictions
The year 2025 was marked by significant advancements in large language models, particularly in the development of reasoning capabilities. A key breakthrough was DeepSeek's R1 model, which demonstrated that reasoning ski…