Text To Speech
PulseAugur coverage of Text To Speech — every cluster mentioning Text To Speech across labs, papers, and developer communities, ranked by signal.
-
End-to-end training unifies TTS components for better speech generation
Researchers have developed a novel end-to-end training framework for discrete token Large Language Model (LLM) based Text-to-Speech (TTS) systems. This approach unifies the training of the speech tokenizer, LLM, a flow-…
-
Tongyi Labs releases top-charting STT/TTS models, open weights questioned
Tongyi Labs has released new speech-to-text (STT) and text-to-speech (TTS) models that are reportedly topping charts. The models were released without significant fanfare, leading to community questions about whether th…
-
LLMs generate synthetic conversations to boost ASR training
Researchers have developed a novel method to enhance Automatic Speech Recognition (ASR) training for low-resource languages by generating synthetic conversational data. This pipeline uses LLMs to create dialogues, maps …
-
AMD EPYC CPUs show competitive performance for LLM and TTS inference workloads
A recent analysis by Leaseweb benchmarks the performance of AMD EPYC 9334 CPUs for Large Language Model (LLM) and Text-to-Speech (TTS) inference workloads. The study reveals that while GPUs offer higher throughput, CPUs…
-
Sakana AI's KAME architecture injects LLM knowledge into speech AI without latency
Sakana AI has developed KAME, a novel tandem architecture for speech-to-speech AI that aims to combine the speed of direct systems with the knowledge depth of LLM-based approaches. KAME operates with two asynchronous co…
-
Tamazight single-speaker speech dataset released on Hugging Face
A new single-speaker speech dataset for the Tamazight language has been released on Hugging Face and the Mozilla Data Collective. This dataset is intended for use in AI applications such as automatic speech recognition …
-
Researchers enhance elderly ASR with LLM paraphrasing and speech synthesis
Researchers have developed a novel data augmentation technique to improve automatic speech recognition (ASR) for elderly individuals. This method utilizes large language models to paraphrase existing transcripts, genera…
-
New benchmark evaluates Indic TTS accent fidelity across six dimensions
Researchers have introduced PSP, a new benchmark designed to evaluate the accent accuracy of text-to-speech (TTS) systems for Indic languages. Unlike existing metrics that focus on intelligibility and naturalness, PSP s…
-
New study evaluates 7 TTS systems for 10 Indian languages
Researchers have developed a new framework for evaluating Text-to-Speech (TTS) systems in Indian languages, addressing the high variance typically seen in crowdsourced evaluations. This framework uses controlled, multid…