ENTITY Kaiqi Zhang


PulseAugur coverage of Kaiqi Zhang: every cluster mentioning Kaiqi Zhang across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (1 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (1 over 90d)


RECENT · 1 TOTAL

TOOL · CL_22111 · May 8 · 04:00

P^2O method enhances LLM reasoning by optimizing prompts and policies

Researchers have developed a new method called P^2O (Joint Policy and Prompt Optimization) to address the issue of advantage collapse in Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. T…