Researchers have explored how large language models can process user input while simultaneously generating spoken responses in full-duplex dialogue systems. They compared two methods: channel fusion, which integrates user input directly into the LLM's input stream, and cross-attention routing, which stores user input in an external memory accessed via cross-attention. Channel fusion improved semantic grounding and question-answering accuracy but was susceptible to context corruption during interruptions. Cross-attention routing preserved the generation context and was therefore more robust to interruptions, though it scored lower on question answering.
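The structural difference between the two methods can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the shapes, names, and toy attention function below are illustrative assumptions. The point is that channel fusion splices user tokens into the model's own input sequence, while cross-attention routing keeps them in a separate memory that the unchanged generation context queries.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                      # toy hidden size (assumption)
assistant_ctx = rng.normal(size=(5, d))    # context the model is generating from
user_tokens = rng.normal(size=(3, d))      # incoming user-speech features

# Channel fusion: user tokens are concatenated into the input stream.
# An interruption mutates this fused sequence directly, which is how
# context corruption can arise.
fused = np.concatenate([assistant_ctx, user_tokens], axis=0)

# Cross-attention routing: user tokens sit in an external memory; the
# assistant context queries it, so the generation context stays intact.
def cross_attend(queries, memory):
    scores = queries @ memory.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ memory

routed = assistant_ctx + cross_attend(assistant_ctx, user_tokens)

# An interruption swaps the memory, not the generation context.
interruption = rng.normal(size=(2, d))
routed_after = assistant_ctx + cross_attend(assistant_ctx, interruption)

print(fused.shape)   # (8, 8): the fused sequence grew
print(routed.shape)  # (5, 8): context length is preserved
```

In the fusion case the sequence the LLM conditions on changes length and content mid-generation; in the routing case only the attention output changes, which matches the reported robustness trade-off.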
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Investigates architectural choices for LLMs in real-time spoken dialogue, informing the design of future voice assistants and conversational AI.
RANK_REASON Academic paper detailing a study on LLM dialogue system architecture.