ENTITY
DPO
PulseAugur coverage of DPO — every cluster mentioning DPO across labs, papers, and developer communities, ranked by signal.
Total · 30d: 40 (40 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 34 (34 over 90d)
RECENT · PAGE 1/1 · 2 TOTAL
- New TBPO method optimizes language models at token level
  Researchers have introduced Token-level Bregman Preference Optimization (TBPO), a new method for aligning language models using pairwise preferences. Unlike existing approaches that focus on full sequences, TBPO operate…
- EvoPref algorithm enhances LLM alignment with evolutionary optimization
  Researchers have developed EvoPref, a novel multi-objective evolutionary algorithm designed to improve the alignment of large language models (LLMs). Unlike traditional gradient-based methods that can lead to preference…