PulseAugur

softmax attention

PulseAugur coverage of softmax attention: every cluster mentioning the entity across labs, papers, and developer communities, ranked by signal.
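For context on the entity itself: softmax attention is the standard scaled dot-product mechanism at the core of transformer models. The NumPy sketch below is a minimal illustration of that textbook form only; it is not drawn from any cluster on this page, and the shapes and variable names are illustrative.

# Minimal sketch of scaled dot-product softmax attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    # Q, K, V: (seq_len, d). Returns (seq_len, d).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) pairwise similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# Example: 4 tokens with 8-dimensional heads (toy shapes).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (4, 8)

The (n, n) score matrix is what gives softmax attention its quadratic cost in sequence length, the bottleneck several of the clusters below target.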

Total · 30d: 6 · 6 over 90d
Releases · 30d: 0 · 0 over 90d
Papers · 30d: 6 · 6 over 90d
TIER MIX · 90D
RECENT · PAGE 1/1 · 6 TOTAL
  1. RESEARCH · CL_20487

    New research explains how transformers perform in-context learning via gradient descent

    Two new arXiv papers explore the theoretical underpinnings of in-context learning (ICL) in transformers. One paper demonstrates how transformers can perform in-context logistic regression by implicitly executing normali…

  2. RESEARCH · CL_15493

    Linearizing Vision Transformer with Test-Time Training

    Researchers have developed a method to adapt pretrained softmax attention models to linear-complexity architectures using Test-Time Training (TTT). This approach addresses the representational gap between different atte… (a generic sketch of linear-complexity attention follows this list).

  3. RESEARCH · CL_14475

    Transformers' expressive power explained by new measure-theoretic framework

    Researchers have introduced a new measure-theoretic framework to understand the expressive power of Transformer architectures in modeling contextual relations. This framework connects standard softmax attention to entro…

  4. RESEARCH · CL_11887

    Sigmoid attention improves biological foundation models with faster, more stable training

    Researchers have developed a new attention mechanism called Sigmoid Attention, which offers significant improvements for training biological foundation models. The approach leads to better learned representations… (a generic sketch of sigmoid attention follows this list).

  5. RESEARCH · CL_06270

    Kwai Summary Attention compresses historical contexts for efficient long-context LLMs

    Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…

  6. RESEARCH · CL_05008

    New architectures and frameworks target LLM serving bottlenecks for long contexts

    Researchers have developed novel architectures and techniques to address the escalating latency and energy consumption challenges in serving large language models (LLMs) with long contexts. One approach, AMMA, proposes …
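Several clusters above (items 2, 5, and 6) concern replacements for the quadratic softmax score matrix. As a reference point for item 2, the sketch below shows generic kernelized linear attention, the usual meaning of "linear-complexity attention": a feature map phi replaces the softmax so that keys and values can be summarized once in a (d, d) matrix. This is an assumption-laden illustration, not the cluster's Test-Time Training method; phi and all names here are hypothetical.

# Generic kernelized linear attention (illustrative; not the TTT method from item 2).
import numpy as np

def phi(x):
    # A common positive feature map, ELU(x) + 1; an assumed choice, not from the paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    # Q, K, V: (n, d). Cost is O(n * d^2) instead of O(n^2 * d).
    Qp, Kp = phi(Q), phi(K)           # (n, d) positive features
    KV = Kp.T @ V                     # (d, d) summary of all keys and values
    Z = Qp @ Kp.sum(axis=0) + eps     # (n,) per-query normalizer
    return (Qp @ KV) / Z[:, None]     # (n, d) attention output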
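For item 4, the general form that "sigmoid attention" usually refers to gates each query-key score independently with a sigmoid instead of normalizing whole rows with a softmax, so keys do not compete for a fixed probability budget. The sketch below shows only that generic form; the cluster's exact bias and scaling choices are not known from the summary, and the bias term here is an assumption.

# Generic elementwise sigmoid attention (illustrative; details may differ from item 4).
import numpy as np

def sigmoid_attention(Q, K, V, bias=0.0):
    # Q, K, V: (n, d). bias is a hypothetical offset (e.g. to keep gates small at long n).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + bias    # (n, n) pairwise scores
    gates = 1.0 / (1.0 + np.exp(-scores))   # elementwise sigmoid; rows need not sum to 1
    return gates @ V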