PulseAugur
research · [2 sources] ·

CRAFT method speeds up training data selection for sequence-to-sequence models

Researchers have proposed CRAFT (Clustered Regression for Adaptive Filtering of Training data), a method for efficiently selecting a small, high-quality subset of a large training corpus when fine-tuning sequence-to-sequence models. The approach decomposes the joint source-target distribution and uses a two-stage selection process to match the validation distribution and minimize expected distances. On English-Hindi translation, CRAFT reportedly achieved a higher BLEU score than existing selection methods while sharply reducing selection time.
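The two-stage idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes each example is already represented by a single embedding vector, that cluster centroids are given, and it does not model the source-target decomposition at all. The function names `nearest` and `select_subset` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def select_subset(train, valid, centroids, budget):
    """Illustrative two-stage selection (an assumption, not the paper's algorithm).

    Stage 1: split the budget across clusters in proportion to how the
             validation points fall into those clusters.
    Stage 2: inside each cluster, keep the training points with the
             smallest mean distance to that cluster's validation points.
    Returns sorted indices into train; rounding of the per-cluster quotas
    means the total can differ slightly from budget.
    """
    valid_by_cluster = defaultdict(list)
    for v in valid:
        valid_by_cluster[nearest(v, centroids)].append(v)

    train_by_cluster = defaultdict(list)
    for idx, t in enumerate(train):
        train_by_cluster[nearest(t, centroids)].append(idx)

    chosen = []
    for cluster, v_points in valid_by_cluster.items():
        quota = round(budget * len(v_points) / len(valid))

        def mean_dist(idx, v_points=v_points):
            return sum(math.dist(train[idx], v) for v in v_points) / len(v_points)

        ranked = sorted(train_by_cluster.get(cluster, []), key=mean_dist)
        chosen.extend(ranked[:quota])
    return sorted(chosen)


# Toy 2-D "embeddings": five training points near each of two centroids.
train = [(0, 0), (1, 0), (0, 1), (3, 3), (4, 4),
         (10, 10), (9, 10), (10, 9), (7, 7), (6, 6)]
valid = [(0, 0), (1, 1), (0, 1), (10, 10)]
centroids = [(0, 0), (10, 10)]

picked = select_subset(train, valid, centroids, budget=4)
print(picked)
```

In this toy setup three of the four validation points fall in the first cluster, so three of the four selected training points come from it. A real pipeline would need learned embeddings and a clustering step, neither of which is shown here.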

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Could accelerate fine-tuning of sequence-to-sequence models by enabling rapid selection of high-quality training data subsets.

RANK_REASON Academic paper detailing a new method for training data selection.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Parthasarathi Panda, Asheswari Swain, Subhrakanta Panda ·

    CRAFT: Clustered Regression for Adaptive Filtering of Training data

    arXiv:2604.22693v1. Selecting a small, high-quality subset from a large corpus for fine-tuning is increasingly important as corpora grow to tens of millions of datapoints, making full fine-tuning expensive and often unnecessary. We propose CRAFT (Clust…

  2. arXiv cs.CL TIER_1 · Subhrakanta Panda ·

    CRAFT: Clustered Regression for Adaptive Filtering of Training data

    Selecting a small, high-quality subset from a large corpus for fine-tuning is increasingly important as corpora grow to tens of millions of datapoints, making full fine-tuning expensive and often unnecessary. We propose CRAFT (Clustered Regression for Adaptive Filtering of Traini…