PulseAugur

Kaiqi Zhang

PulseAugur coverage of Kaiqi Zhang: every cluster mentioning Kaiqi Zhang across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (1 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (1 over 90d)
RECENT · 1 TOTAL
  1. TOOL · CL_22111

    P^2O method enhances LLM reasoning by optimizing prompts and policies

    Researchers have developed a new method called P^2O (Joint Policy and Prompt Optimization) to address the issue of advantage collapse in Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. T…