Group Relative Policy Optimization (GRPO)
PulseAugur coverage of Group Relative Policy Optimization (GRPO): every cluster mentioning GRPO across labs, papers, and developer communities, ranked by signal.
-
New SLAS method enhances text-to-image model training
Researchers have developed a new method called Super-Linear Advantage Shaping (SLAS) to improve text-to-image models trained with reinforcement learning. This technique addresses reward hacking by reshaping the policy s…
-
LoRA rank allocation fails in RL fine-tuning, study finds
A new study on the Qwen 2.5 1.5B model reveals that adaptive rank allocation techniques, effective in supervised fine-tuning, do not translate to reinforcement learning with Group Relative Policy Optimization (GRPO). Re…
-
New SRPO method enhances multimodal reasoning in vision-language models
Researchers have introduced Structured Role-aware Policy Optimization (SRPO), a novel method to enhance the reasoning abilities of large vision-language models (LVLMs). SRPO addresses the limitation of current reinforce…