Researchers have introduced EvoLM, a post-training method that lets language models improve themselves without external supervision. The method alternates between training a rubric generator, which writes instance-specific evaluation criteria, and training a policy that uses those criteria as its reward signal. In experiments, a Qwen3-8B model trained with EvoLM generated rubrics that surpassed GPT-4.1 on a benchmark, and the co-trained policy achieved high performance on a separate evaluation suite.
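The alternating loop described above can be sketched in miniature. Everything here is an illustrative assumption, not the paper's code: the rubric generator and policy update are stand-in functions, and the reward is a toy keyword check standing in for a model-based rubric score.

```python
def generate_rubric(task: str) -> list[str]:
    # Stand-in for the rubric-generator model: emits instance-specific
    # criteria for this particular task (assumed format, not EvoLM's).
    return [f"mentions {task}", "gives an example", "states a conclusion"]

def score(response: str, rubric: list[str]) -> float:
    # Toy reward signal: fraction of criteria met, where a criterion
    # counts as satisfied if its last word appears in the response.
    met = sum(1 for c in rubric if c.split()[-1] in response)
    return met / len(rubric)

def train_step(policy: dict, reward: float) -> dict:
    # Stand-in for a policy-gradient update: just tracks a running
    # average reward instead of updating model weights.
    policy["avg_reward"] = 0.9 * policy["avg_reward"] + 0.1 * reward
    return policy

tasks = ["sorting", "caching"]
policy = {"avg_reward": 0.0}
for _ in range(2):
    # Alternation: refresh per-instance rubrics, then update the policy
    # against the rewards those rubrics induce.
    rubrics = {t: generate_rubric(t) for t in tasks}
    for t in tasks:
        response = f"On {t}: here is an example, and a conclusion."
        policy = train_step(policy, score(response, rubrics[t]))
```

In the real method both components are language models trained jointly; this sketch only shows the control flow of the alternation.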
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT This method could reduce reliance on human annotation and proprietary models for LLM training, potentially accelerating self-improvement cycles.
RANK_REASON This is a research paper detailing a new method for self-improving language models.