On-Policy Distillation
PulseAugur coverage of On-Policy Distillation: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
ProteinOPD framework enhances protein design alignment with 8x speedup
Researchers have developed ProteinOPD, a new framework for aligning protein language models (PLMs) with desired functions. This method adapts pretrained PLMs into specialized teachers and distills their knowledge into a…
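The summary is cut off, but the recipe it builds on is the standard on-policy distillation loop: the student generates samples, the teacher re-scores them, and the student minimizes the per-token reverse KL to the teacher. Below is a minimal sketch under that assumption, written for Hugging Face-style causal LMs with placeholder checkpoints; ProteinOPD itself targets protein language models, which this toy text setup does not reproduce.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")         # small student
teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # larger teacher

batch = tok(["Distillation works by"], return_tensors="pt")

# On-policy: the student generates its own continuations.
with torch.no_grad():
    seqs = student.generate(**batch, max_new_tokens=32, do_sample=True,
                            pad_token_id=tok.eos_token_id)

# Re-score the student's samples with both models.
student_logits = student(seqs).logits           # (B, T, V), grads flow here
with torch.no_grad():
    teacher_logits = teacher(seqs).logits       # (B, T, V), frozen teacher

# Per-token reverse KL, KL(student || teacher), over the full vocabulary.
log_p_s = F.log_softmax(student_logits, dim=-1)
log_p_t = F.log_softmax(teacher_logits, dim=-1)
reverse_kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(-1)  # (B, T)

# In practice prompt and padding positions would be masked out here.
loss = reverse_kl.mean()
loss.backward()  # an optimizer step on the student's parameters follows
```

The "on-policy" part is the first step: because the student is graded on its own samples rather than the teacher's, it gets feedback on exactly the states it will visit at inference time.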
New methods enhance on-policy distillation for LLMs
Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL divergence…
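The blurb names the ingredient but not the construction, so the sketch below is only a guess at the shape of such a baseline, not vOPD's actual estimator. It rests on one identity that does follow from the definitions: the sampled-token log-ratio log p_s(x_t) - log p_t(x_t) has the per-token reverse KL as its expectation under the student, so subtracting the analytically computed KL centers the estimator at zero without biasing it.

```python
import torch

def centered_log_ratio(log_p_s, log_p_t, tokens):
    """Sampled-token log-ratio with an analytic reverse-KL baseline.

    log_p_s, log_p_t: (B, T, V) log-probabilities from student and teacher.
    tokens:           (B, T) token ids sampled from the student.
    """
    # Per-token log-ratio at the sampled token: log p_s(x_t) - log p_t(x_t).
    lr = (log_p_s - log_p_t).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    # Analytic per-token reverse KL over the full vocabulary; it equals
    # E[lr] under the student, so it can serve as a control variate.
    kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(-1)
    # Centered estimator: same expectation structure, lower variance.
    return lr - kl.detach(), kl
```

Whether vOPD applies such a baseline per token, per sequence, or inside a policy-gradient weight is not recoverable from the truncated summary.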
Researchers refine on-policy distillation for more stable LLM training
Researchers have identified significant empirical failure modes in on-policy distillation (OPD), a technique used for post-training large language models. The standard implementation, which relies on sampled-token log-ratios…
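The instability is easy to reproduce in miniature: the sampled-token log-ratio is an unbiased estimator of the reverse KL, but its per-token standard deviation can dwarf the quantity it estimates. A toy check with made-up distributions:

```python
import torch

torch.manual_seed(0)
V = 50  # toy vocabulary size
log_p_s = torch.log_softmax(torch.randn(V), dim=-1)  # toy student distribution
log_p_t = torch.log_softmax(torch.randn(V), dim=-1)  # toy teacher distribution

# Exact reverse KL, KL(student || teacher), over the full vocabulary.
exact_kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum()

# Sampled-token log-ratios: draw tokens from the student, look up the ratio.
samples = torch.multinomial(log_p_s.exp(), 100_000, replacement=True)
lr = (log_p_s - log_p_t)[samples]

print(f"exact reverse KL: {exact_kl.item():.3f}")
print(f"estimator mean:   {lr.mean().item():.3f}  (matches: unbiased)")
print(f"estimator std:    {lr.std().item():.3f}  (per-token noise)")
```

Averaging across a long batch hides this noise; anything applied per token, such as clipping or importance weighting, sees the full variance, which is one plausible reading of the failure modes flagged in this cluster.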