PulseAugur
research · [2 sources] ·

LASE model improves cross-script voice cloning by making embeddings language-uninformative

A new arXiv preprint introduces LASE, a Language-Adversarial Speaker Encoder, to improve multilingual voice cloning. Off-the-shelf speaker encoders do not keep a speaker's identity stable across scripts, and the failure is accent-conditional: it shows up when non-Indic voices are projected into Indic languages. LASE is trained with a supervised contrastive loss plus a cross-entropy language objective applied through gradient reversal, so the resulting embeddings stay speaker-informative while becoming language-uninformative. The preprint reports that this narrows the cross-script identity gap and improves cross-script speaker recall while using substantially less training data.

Summary written by gemini-2.5-flash-lite from 2 sources.
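
The two training objectives named in the summary can be illustrated with a small standard-library Python sketch. It is not the paper's implementation: the supervised contrastive loss follows one common formulation, the language classifier is assumed to be a plain linear softmax head, and every name and value here (`supervised_contrastive_loss`, `grad_reverse_backward`, `lam`, `temperature`, the toy embeddings and weights) is hypothetical. The point it shows is that gradient reversal is the identity on the forward pass and negates the gradient on the backward pass, so an encoder update raises the language classifier's loss instead of lowering it.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def supervised_contrastive_loss(embeddings, speaker_ids, temperature=0.1):
    """Mean over anchors of the negative log-softmax similarity to same-speaker samples."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and speaker_ids[j] == speaker_ids[i]]
        if not positives:
            continue  # an anchor with no same-speaker partner contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def grad_reverse_forward(x):
    """Forward pass of gradient reversal: the identity."""
    return list(x)

def grad_reverse_backward(grad_output, lam=1.0):
    """Backward pass of gradient reversal: negate and scale the incoming gradient."""
    return [-lam * g for g in grad_output]

def language_loss_and_grad(embedding, weights, language_id):
    """Softmax cross-entropy of a linear language head, plus d(loss)/d(embedding)."""
    logits = [dot(row, embedding) for row in weights]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[language_id])
    d_logits = [p - (1.0 if k == language_id else 0.0) for k, p in enumerate(probs)]
    d_embedding = [sum(d_logits[k] * weights[k][d] for k in range(len(weights)))
                   for d in range(len(embedding))]
    return loss, d_embedding

# Toy data (assumed): two speakers, two utterances each.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
speaker_ids = [0, 0, 1, 1]
print("supervised contrastive loss:", supervised_contrastive_loss(embeddings, speaker_ids))

# Hypothetical two-language linear head applied to one embedding.
weights = [[2.0, -1.0], [-1.0, 2.0]]
emb = [1.0, 0.0]
loss_before, grad = language_loss_and_grad(grad_reverse_forward(emb), weights, 0)
reversed_grad = grad_reverse_backward(grad, lam=1.0)

# An ordinary gradient-descent step on the encoder output, using the reversed gradient.
lr = 0.5
emb_after = [e - lr * g for e, g in zip(emb, reversed_grad)]
loss_after, _ = language_loss_and_grad(emb_after, weights, 0)
print("language loss before and after encoder step:", loss_before, loss_after)
```

In this toy setup the language loss goes up after the encoder step, which is the adversarial effect: the classifier head would still be trained to predict the language, while the encoder is pushed toward embeddings from which the language is harder to read.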

IMPACT Better speaker-identity preservation in cross-script voice cloning, which could make multilingual TTS systems more consistent across languages.

RANK_REASON The cluster contains an arXiv preprint detailing a new method for speaker encoding in multilingual voice cloning.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Venkata Pushpak Teja Menta ·

    LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

    arXiv:2605.00777v1. A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pa…

  2. arXiv cs.CL TIER_1 · Venkata Pushpak Teja Menta ·

    LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

    A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, H…