ENTITY Group Relative Policy Optimization

PulseAugur coverage of Group Relative Policy Optimization — every cluster mentioning Group Relative Policy Optimization across labs, papers, and developer communities, ranked by signal.

Total · 16 over 90d

Releases · 0 over 90d

Papers · 16 over 90d

Sentiment · 30d · 4 days with sentiment data

RECENT · PAGE 1/1 · 14 TOTAL

TOOL · CL_29245 · May 12 · 17:59

AlphaGRPO framework boosts multimodal AI generation with self-reflection

Researchers have introduced AlphaGRPO, a new framework designed to improve multimodal generation in Unified Multimodal Models (UMMs). This approach uses Group Relative Policy Optimization (GRPO) to enable models to perf…
RESEARCH · CL_27590 · May 10 · 14:06

New methods enhance LLM reasoning for long-context and multilingual tasks

Researchers have developed new methods for improving large language model reasoning capabilities, particularly for long-context and multilingual tasks. One approach, OGLS-SD, uses outcome-guided logit steering to calibr…
TOOL · CL_27737 · May 9 · 10:51

New Co-Distillation Method Boosts Small Language Model Reasoning

Researchers have developed CoDistill-GRPO, a novel co-distillation method to enhance the reasoning abilities of smaller language models. This technique trains a large and small model simultaneously, allowing them to lea…
TOOL · CL_25792 · May 8 · 09:37

New Diffusion-APO method aligns video diffusion models with user intent

Researchers have introduced Diffusion-APO, a new method for aligning video diffusion models with human preferences. This approach addresses the gap between training noise distributions and real-world inference by synchr…
TOOL · CL_21953 · May 8 · 04:00

New S-trace method improves RLVR efficiency and credit assignment

Researchers have introduced Selective Eligibility Traces (S-trace), a novel method designed to enhance the reasoning capabilities of large language models within the Reinforcement Learning with Verifiable Rewards (RLVR)…
RESEARCH · CL_21818 · May 7 · 12:30

Pest-Thinker uses RL to help MLLMs reason like entomologists

Researchers have developed Pest-Thinker, a novel reinforcement learning framework designed to enhance the reasoning capabilities of multimodal large language models (MLLMs) for agricultural pest identification. This sys…
TOOL · CL_20382 · May 7 · 04:00

Researchers improve medical VQA with trajectory-aware process supervision

Researchers have developed a novel method to improve medical visual question answering (VQA) systems by incorporating trajectory-aware process supervision. This approach utilizes a two-stage training framework, starting…
TOOL · CL_15707 · May 5 · 04:00

Researchers use RL to improve MLLM regression on imbalanced data

Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
RESEARCH · CL_15881 · May 5 · 04:00

Judge-R1 framework enhances legal document generation with agentic information retrieval

Researchers have developed Judge-R1, a new framework to improve the automated drafting of legal judgment documents. This system uses an agentic approach to collect relevant legal information and a reinforcement learning…
RESEARCH · CL_18799 · May 5 · 03:36

New DGPO framework improves LLM reasoning credit assignment

Researchers have introduced Distribution Guided Policy Optimization (DGPO), a new reinforcement learning framework designed to improve how large language models handle complex reasoning tasks. Current methods struggle w…
RESEARCH · CL_11889 · May 1 · 04:00

New game theory framework optimizes LLMs for answer correctness

Researchers have introduced a new game-theoretical framework called Distributional Alignment Games for optimizing language models based on the correctness of their final answers. This approach tackles the computational …
RESEARCH · CL_13003 · Apr 28 · 11:01

SymphonyGen uses 3D hierarchical framework for controllable orchestral music generation

Researchers have developed SymphonyGen, a novel 3D hierarchical framework designed for generating complex orchestral music. This system addresses the challenge of balancing high-level musical structure with detailed mul…
RESEARCH · CL_06777 · Apr 28 · 04:00

Study finds synthetic reward hacking data doesn't reflect real-world AI behavior

A new study published on arXiv investigates the discrepancy between synthetic and naturally occurring reward hacking in code generation models. Researchers found that monitors trained on synthetic hacking data do not ge…
RESEARCH · CL_05420 · Apr 21 · 08:37

Researchers propose Objective-aware Trajectory Credit Assignment for visual generation

Researchers have developed a new framework called Objective-aware Trajectory Credit Assignment (OTCA) to improve the training of visual generative models using reinforcement learning. Current methods often assign reward…
