Researchers have developed new methods to accelerate large language model (LLM) inference. UniVer offers a unified approach to multi-step and multi-draft speculative decoding, improving acceptance length by up to 8.5%. Speculative Speculative Decoding (SSD) parallelizes verification and speculation, and its optimized algorithm, Saguaro, achieves up to a 5x speedup over autoregressive decoding. Additionally, SpecKV introduces an adaptive controller that dynamically selects the speculation length from model-compression and draft-model signals, yielding a 56.0% improvement over fixed-length speculation.
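All three methods build on the same draft-then-verify loop at the core of speculative decoding. The sketch below illustrates that loop, with a simple acceptance-rate heuristic for adjusting speculation length in the spirit of an adaptive controller; the draft_model and target_model callables and the adjustment rule are illustrative stand-ins under stated assumptions, not APIs or algorithms from the papers.

```python
import random

# Illustrative stand-ins for a cheap draft model and an expensive target
# model (hypothetical; real systems would run neural networks here).
def draft_model(prefix):
    return (sum(prefix) * 31 + 7) % 100

def target_model(prefix):
    # Agrees with the drafter ~80% of the time, mimicking a well-aligned pair.
    guess = draft_model(prefix)
    return guess if random.random() < 0.8 else (guess + 1) % 100

def speculative_decode(prompt, max_new_tokens, k=4, k_max=8):
    """Draft-then-verify loop with a toy adaptive speculation length."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Speculate: draft k candidate tokens cheaply and autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)

        # 2. Verify: keep the longest prefix the target model agrees with
        #    (checked token by token here; in practice one batched forward pass).
        accepted = 0
        for i, t in enumerate(draft):
            if target_model(tokens + draft[:i]) != t:
                break
            accepted += 1
        tokens.extend(draft[:accepted])

        # 3. On rejection, take the target model's own token so every
        #    iteration still emits at least one verified token.
        if accepted < k:
            tokens.append(target_model(tokens))

        # 4. Toy adaptive controller: lengthen speculation when everything
        #    was accepted, shorten it after rejections (an assumption for
        #    illustration, not SpecKV's actual policy).
        k = min(k_max, k + 1) if accepted == k else max(1, accepted)
    return tokens[: len(prompt) + max_new_tokens]

print(speculative_decode([1, 2, 3], max_new_tokens=20))
```

The speedup comes from the target model validating several drafted tokens per forward pass rather than emitting one token at a time; acceptance length, the number of drafted tokens that survive verification, is the metric behind UniVer's 8.5% figure.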
Summary written by gemini-2.5-flash-lite from 7 sources.
IMPACT New speculative decoding techniques promise significant speedups in LLM inference, potentially reducing computational costs and latency.
RANK_REASON Multiple arXiv papers introduce novel techniques for accelerating LLM inference.