Researchers have developed a metric-based approach to assessing the quality of text-to-speech (TTS) systems through voice mapping. The study evaluated six influential TTS models, including VITS, Glow-TTS, and Tacotron 2, using metrics such as crest factor, spectrum balance, and smoothed cepstral peak prominence (CPPs). The findings indicate that voice range is a key indicator of model capability: VITS shows the broadest range, while Glow-TTS excels in soft phonation. The research also found that CPPs values of 7 to 8 dB correlate with natural voice quality, whereas values above 10 dB can sound robotic.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces new metrics for evaluating TTS naturalness and expressiveness, potentially guiding future model development.
RANK_REASON Academic paper proposing a new evaluation framework for TTS systems.
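
Of the metrics named in the summary, crest factor is the simplest to illustrate. The sketch below uses the textbook definition (peak amplitude divided by RMS amplitude, expressed in dB); whether the paper computes it per glottal cycle, per frame, or with some other windowing is not stated here, so the function name `crest_factor_db` and the test signals are illustrative assumptions rather than the authors' implementation.

```python
import math


def crest_factor_db(samples):
    """Crest factor (peak-to-RMS ratio) of a signal, in dB.

    Textbook definition only; the paper's exact windowing is an assumption
    left out of this sketch.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    return 20.0 * math.log10(peak / rms)


# Hypothetical test signals: one second of a 100 Hz tone sampled at 16 kHz.
sample_rate = 16000
sine = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]
square = [1.0 if s >= 0 else -1.0 for s in sine]

sine_cf = crest_factor_db(sine)      # about 3.01 dB, since peak/RMS = sqrt(2)
square_cf = crest_factor_db(square)  # 0 dB, since peak equals RMS
```

A pure sine has a crest factor of about 3 dB and a square wave 0 dB. Waveforms with sharper peaks relative to their average energy score higher, which is what makes the measure sensitive to waveform shape.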