Researchers have developed VECA, a novel Vision Transformer architecture that addresses the quadratic computational cost of self-attention on high-resolution images. VECA achieves linear-time attention by introducing a small set of learned 'core' embeddings that act as a communication interface for patch tokens. In this core-periphery structure, patch tokens interact indirectly through the cores rather than attending to one another directly, reducing complexity from quadratic to linear in the number of patches and enabling elastic trade-offs between compute and accuracy.
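The mechanism described above can be illustrated with a minimal sketch. This is not the paper's actual implementation: it assumes a two-step cross-attention (patches → cores, then cores → patches), in the spirit of latent-bottleneck designs, and the function and variable names (`core_attention`, `K` cores) are hypothetical. With N patch tokens and K ≪ N cores, both steps cost O(N·K·d), linear in N.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def core_attention(patches, cores):
    """Hypothetical core-periphery attention sketch.

    Step 1: the K cores gather information from all N patches (K x N scores).
    Step 2: each patch reads back from the K updated cores (N x K scores).
    Total cost is O(N*K*d) -- linear in N, unlike O(N^2*d) self-attention.
    """
    d = patches.shape[-1]
    scale = 1.0 / np.sqrt(d)
    gathered = softmax(cores @ patches.T * scale) @ patches    # (K, d)
    return softmax(patches @ gathered.T * scale) @ gathered    # (N, d)

rng = np.random.default_rng(0)
N, K, d = 196, 8, 64                      # e.g. 14x14 patches, 8 cores
patches = rng.standard_normal((N, d))
cores = rng.standard_normal((K, d))       # learned parameters in a real model
out = core_attention(patches, cores)
print(out.shape)                          # (196, 64)
```

Doubling the image resolution quadruples N but leaves K fixed, so cost grows linearly rather than quadratically; varying K at inference time is one plausible way to realize the compute-accuracy trade-off the summary mentions.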
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new attention mechanism that could enable Vision Transformers to scale more efficiently to higher resolutions and complex tasks.
RANK_REASON The cluster contains a new academic paper detailing a novel model architecture.