PulseAugur
research · 2 sources

New DPO method boosts NMT model performance with preference-based post-training

Researchers have developed a new post-training method for neural machine translation (NMT) systems that uses reinforcement learning and Direct Preference Optimization (DPO). The framework requires only a general text corpus and feedback from an expert translator, which can be human or AI. In experiments on English-to-German translation, applying this DPO-driven approach to the gemma3-1b model significantly improved translation quality, raising the COMET score from 0.703 to 0.747.

Summary written by gemini-2.5-flash-lite from 2 sources.
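The summary does not reproduce the paper's exact training objective, but the standard DPO loss it builds on is well known: the policy is pushed to assign a higher (reference-adjusted) log-probability to the expert-preferred translation than to the rejected one. A minimal pure-Python sketch for a single preference pair (illustrative only; the function name and the `beta=0.1` default are assumptions, not taken from the paper):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    logp_* are sequence log-probabilities of the preferred (chosen)
    and dispreferred (rejected) translations under the policy being
    trained; ref_logp_* are the same quantities under the frozen
    reference model. beta scales the implicit reward.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # loss = -log sigmoid(margin) = log(1 + exp(-margin)),
    # computed in a numerically stable branch for each sign of margin
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When both models score the pair identically the margin is zero and the loss is log 2; as the policy favors the preferred translation more strongly than the reference does, the loss falls toward zero.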

IMPACT Enhances NMT models using preference-based post-training, potentially improving translation accuracy for various language pairs.

RANK_REASON This is a research paper detailing a new method for improving NMT models.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Mehrdad Ghassabi, Spehr Rajabi, Hamidreza Baradaran Kashani, Sadra Hakim, Mahshid Keivandarian

    Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation

    arXiv:2604.25702v1 Announce Type: new Abstract: Contemporary neural machine translation (NMT) systems are almost exclusively built by training on supervised parallel data. Despite the tremendous progress achieved, these systems still exhibit persistent translation errors. This pa…

  2. arXiv cs.CL TIER_1 · Mahshid Keivandarian

    Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation

    Contemporary neural machine translation (NMT) systems are almost exclusively built by training on supervised parallel data. Despite the tremendous progress achieved, these systems still exhibit persistent translation errors. This paper proposes that a post-training paradigm based…