UMo architecture enables real-time co-speech avatar animation

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have introduced UMo (Unified Sparse Motion Modeling), an architecture for real-time co-speech avatar animation. The system unifies text, audio, and motion data in a single formulation, which is intended to make the generated facial expressions and gestures more expressive and coherent. UMo combines a sparse Mixture-of-Experts framework with a keyframe-centric approach to produce high-fidelity animation at low latency, positioning it as a practical option for interactive media and virtual production.
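The summary does not say how UMo selects or combines its experts. As a generic illustration of what "sparse Mixture-of-Experts" usually means, the sketch below implements standard top-k gating: a gate scores every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights renormalised over the selected experts. The linear gate, the function names, and the toy experts are hypothetical assumptions for illustration; this is not UMo's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(x: Vector, gate_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear gate: one score per expert, computed as W_g @ x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in gate_weights]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k best-scoring experts and renormalise their weights to sum to 1."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def sparse_moe(
    x: Vector,
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    k: int = 2,
) -> Vector:
    """Weighted sum of the outputs of only the k selected experts."""
    if len(gate_weights) != len(experts):
        raise ValueError("need exactly one gate row per expert")
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate_logits(x, gate_weights), k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

Because only k experts run per input, compute grows with k rather than with the total number of experts, which is the usual reason sparse MoE is paired with low-latency goals. Some MoE variants instead apply the softmax over all experts before selecting; the choice above is one common option, not necessarily UMo's.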

IMPACT This research offers a practical solution for generating high-fidelity, real-time animations for digital avatars, potentially enhancing virtual interactions and media production.

RANK_REASON The cluster contains a new academic paper detailing a novel architecture for a specific AI application.

COVERAGE [1]

arXiv cs.CV TIER_1 · Yanwen Guo · 2026-05-14 11:56

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars

Speech-driven gestures and facial animations are fundamental to expressive digital avatars in games, virtual production, and interactive media. However, existing methods are either limited to a single modality for audio motion alignment, failing to fully utilize the potential of …
