Researchers have developed a method to mitigate bias in large language models at decoding time, without altering the model's weights. The approach uses a separate Process Reward Model (PRM) to score candidate tokens for fairness and fluency. Among the variants tested, a sequential critique-and-revise scheme proved most effective, improving bias scores by up to 0.40 while maintaining fluency. The framework was evaluated on GPT-4o-mini, Llama 3.2 3B, Gemma 3 4B, and Qwen 2.5 3B.
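The summary does not reproduce the paper's interface, so the Python sketch below only illustrates the general shape of a decode-time critique-and-revise loop: a frozen model drafts a response, the PRM scores it for fairness and fluency, and low-scoring drafts are sent back to the same model for revision. The `generate` and `prm_score` callables, the `threshold` and `max_rounds` parameters, and the revision-prompt wording are all illustrative assumptions, not the authors' implementation.

```python
from typing import Callable


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],     # frozen LM: prompt -> continuation
    prm_score: Callable[[str], float],  # PRM: text -> fairness/fluency reward in [0, 1]
    threshold: float = 0.8,             # assumed acceptance bar
    max_rounds: int = 3,                # assumed revision budget
) -> str:
    """Draft, score with the PRM, and re-prompt the frozen model to revise
    until the reward clears the threshold or the budget runs out."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        if prm_score(draft) >= threshold:
            break
        revision_prompt = (
            f"{prompt}\n\nDraft response:\n{draft}\n\n"
            "Revise the draft to remove biased or stereotyped phrasing "
            "while keeping it fluent and on-topic:"
        )
        draft = generate(revision_prompt)
    return draft


if __name__ == "__main__":
    # Toy stubs standing in for a frozen LM and a trained PRM.
    responses = iter([
        "engineers are mostly men",
        "engineering attracts people of all backgrounds",
    ])
    generate = lambda _prompt: next(responses)
    prm = lambda text: 0.2 if "mostly men" in text else 0.9
    print(critique_and_revise("Describe who becomes an engineer.", generate, prm))
```

Because the loop only re-prompts the frozen model at inference time, no weight updates are needed, consistent with the summary's claim that the method leaves the model's weights untouched.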
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT: Offers a new technique for reducing LLM bias without costly retraining, potentially making safer models more accessible.
RANK_REASON: The cluster contains an academic paper detailing a new method for LLM bias mitigation.