New benchmark and training method boost Indic language speech recognition

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have introduced Vividh-ASR, a complexity-stratified benchmark for evaluating automatic speech recognition (ASR) in Indic languages such as Hindi and Malayalam. The benchmark targets "studio-bias," the authors' term for a pattern in which fine-tuning a multilingual ASR model on a low-resource language improves accuracy on read speech but degrades it on spontaneous audio. In experiments varying learning rates and curriculum ordering, they found that the choice of training strategy significantly affects performance, particularly on spontaneous audio, and they developed a parameter-efficient training recipe called reverse multi-stage fine-tuning (R-MFT). The study also found that effective adaptation is concentrated in the model's decoder, leaving the encoder's core acoustic representations intact.
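One way to make the read-versus-spontaneous gap concrete is to compute word error rate (WER) separately for each speech condition and compare the two. The sketch below is a minimal illustration, not the paper's evaluation code: the tier names `read` and `spontaneous`, the helper names `word_errors` and `tier_wer`, and the transcripts are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def tier_wer(samples: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Corpus-level WER per tier from (tier, reference, hypothesis) triples."""
    errors: Dict[str, int] = {}
    words: Dict[str, int] = {}
    for tier, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[tier] = errors.get(tier, 0) + e
        words[tier] = words.get(tier, 0) + n
    return {t: errors[t] / words[t] for t in errors if words[t] > 0}


# Hypothetical transcripts for illustration only (not data from the paper).
samples = [
    ("read", "the train leaves at nine", "the train leaves at nine"),
    ("read", "please open the door", "please open the door"),
    ("spontaneous", "so yeah I was going there", "so I was going"),
    ("spontaneous", "um can you say that again", "can you say again"),
]
wer = tier_wer(samples)
# A positive gap means the model does worse on spontaneous speech.
gap = wer["spontaneous"] - wer["read"]
```

Running the same comparison before and after fine-tuning would show whether the gap widens, which is the pattern the summary describes as studio-bias. Real evaluations usually normalize text (punctuation, casing, script variants) before scoring, a step this sketch omits.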


IMPACT Introduces a new benchmark and training method to improve speech recognition for low-resource Indic languages, potentially enabling broader AI accessibility.

RANK_REASON The cluster describes a new academic paper introducing a benchmark and training methodology for speech recognition.


COVERAGE [1]

arXiv cs.CL TIER_1 · Kumarmanas Nethil · 2026-05-13 06:55

Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and…

