New HERMES and DSCache methods improve streaming video understanding with KV cache

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 3 sources

Researchers have developed new methods to make multimodal large language models (MLLMs) more efficient at understanding streaming video. One approach, HERMES, treats the KV cache as a hierarchical memory system; according to the summary, this yields faster processing and better accuracy while using less memory. Another method, DSCache, decouples past and present KV caches and uses position-agnostic encoding, which lets it handle unbounded streams and generalize to sequences longer than those the models were trained on.
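
The summary does not describe either algorithm in detail. As a rough illustration of the general idea of treating the KV cache as tiered memory, the Python sketch below keeps recent frames' key/value entries at full resolution and mean-pools older ones into a fixed-size long-term store. Everything here (the class name `TwoTierKVMemory`, the mean-pooling rule, the capacities) is a hypothetical assumption chosen for illustration, not the HERMES design.

```python
from collections import deque


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class TwoTierKVMemory:
    """Hypothetical two-tier KV memory (illustration only, not HERMES).

    Recent (key, value) pairs stay at full resolution in a short-term buffer.
    When that buffer overflows, its oldest `pool_size` entries are averaged
    into a single long-term entry; the long-term store keeps at most
    `long_capacity` entries and drops the oldest one when full.
    """

    def __init__(self, short_capacity, long_capacity, pool_size):
        if short_capacity < 1 or long_capacity < 1 or pool_size < 1:
            raise ValueError("capacities and pool_size must be >= 1")
        self.short_capacity = short_capacity
        self.pool_size = pool_size
        self.short = deque()
        self.long = deque(maxlen=long_capacity)

    def append(self, key, value):
        """Add one frame's key/value vectors, compressing old entries if needed."""
        self.short.append((key, value))
        if len(self.short) > self.short_capacity:
            count = min(self.pool_size, len(self.short))
            oldest = [self.short.popleft() for _ in range(count)]
            pooled_key = mean_pool([k for k, _ in oldest])
            pooled_value = mean_pool([v for _, v in oldest])
            self.long.append((pooled_key, pooled_value))

    def entries(self):
        """Entries visible to attention: long-term summaries first, then recent frames."""
        return list(self.long) + list(self.short)


# Usage: one 1-dimensional key/value per frame, 12 frames.
mem = TwoTierKVMemory(short_capacity=4, long_capacity=3, pool_size=2)
for t in range(12):
    mem.append([float(t)], [2.0 * t])
print(len(mem.entries()))  # 7, i.e. long_capacity + short_capacity
```

The point of the sketch is that memory stays bounded (at most `long_capacity + short_capacity` entries) however long the stream runs, while older content survives only in compressed form.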

IMPACT: New KV cache management techniques could substantially improve real-time video analysis in multimodal LLMs.

RANK_REASON: Two arXiv papers introduce novel architectures for efficient streaming video understanding using KV cache mechanisms.

COVERAGE [3]

arXiv cs.AI TIER_1 · Yiwei Wang · 2026-05-08 15:40

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

Online streaming video understanding requires models to process continuous visual inputs and respond to user queries in real time, where the unbounded stream and unpredictable query timing turn memory management into a central challenge. Existing methods typically compress visual…

arXiv cs.CL TIER_1 · Haowei Zhang, Shudong Yang, Jinlan Fu, See-Kiong Ng, Xipeng Qiu · 2026-05-08 04:00

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

arXiv:2601.14724v4 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging…

arXiv cs.CV TIER_1 · Zhanzhong Pang, Dibyadip Chatterjee, Fadime Sener, Angela Yao · 2026-05-05 04:00

Decouple and Cache: KV Cache Construction for Streaming Video Understanding

arXiv:2605.01858v1 Announce Type: new Abstract: Streaming video understanding requires processing unbounded video streams with limited memory and computation, posing two key challenges. First, continuously constructing new and evicting old key-value (KV) caches is required for unb…
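
The abstract above is truncated, so the sketch below does not reproduce DSCache. It only illustrates one general reason position-agnostic storage can help with unbounded streams: if cached keys are stored without positional encoding and positions are assigned only at read time, relative to the current window, the position indices never exceed the cache size. The names `BoundedRawKVCache` and `apply_rope` are hypothetical, and standard rotary position embedding (RoPE) is an assumed stand-in for whatever encoding the paper actually uses; a real model would also position the queries, which is omitted here.

```python
import math
from collections import deque


def apply_rope(vec, pos, base=10000.0):
    """Standard rotary position embedding (RoPE) on consecutive pairs of an even-length vector."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # theta_j = base^(-2j/d) with i = 2j
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


class BoundedRawKVCache:
    """Hypothetical FIFO KV cache (illustration only, not DSCache).

    Keys are stored WITHOUT positional encoding. Positions are assigned only
    when the cache is read, relative to the current window, so they always
    fall in [0, capacity) no matter how long the stream runs.
    """

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # appending when full evicts the oldest entry

    def append(self, key, value):
        self.buf.append((key, value))

    def positioned_keys(self):
        return [apply_rope(k, pos) for pos, (k, _) in enumerate(self.buf)]


# Usage: 1000 frames stream through a cache that holds 8.
cache = BoundedRawKVCache(capacity=8)
for t in range(1000):
    cache.append([1.0, 0.0], [float(t)])
print(len(cache.positioned_keys()))  # 8; positions used are 0..7
```

Had the keys been rotated at insertion time with their absolute frame index, the last frame would carry position 999, far outside a window the model may have been trained on; re-indexing at read time avoids that, at the cost of recomputing the encoding on every read.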
