PulseAugur

softmax attention

PulseAugur coverage of softmax attention: every cluster mentioning the entity across labs, papers, and developer communities, ranked by signal.
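For context on the entity itself: softmax attention is the standard scaled dot-product mechanism at the core of transformer models. The NumPy sketch below is a minimal illustration of that textbook form only; it is not drawn from any cluster on this page, and the shapes and variable names are illustrative.

# Minimal sketch of scaled dot-product softmax attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    # Q, K, V: (seq_len, d). Returns (seq_len, d).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) pairwise similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# Example: 4 tokens with 8-dimensional heads (toy shapes).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (4, 8)

The (n, n) score matrix is what gives softmax attention its quadratic cost in sequence length, the bottleneck several of the clusters below target.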

Total · 30d: 6 · 6 over 90d
Releases · 30d: 0 · 0 over 90d
Papers · 30d: 6 · 6 over 90d
TIER MIX · 90D
RECENT · PAGE 1/1 · 6 TOTAL
  1. RESEARCH · CL_20487

    New research explains how transformers perform in-context learning via gradient descent

    Two new arXiv papers explore the theoretical underpinnings of in-context learning (ICL) in transformers. One paper demonstrates how transformers can perform in-context logistic regression by implicitly executing normali…

  2. RESEARCH · CL_15493

    Linearizing Vision Transformer with Test-Time Training

    Researchers have developed a method to adapt pretrained softmax attention models to linear-complexity architectures using Test-Time Training (TTT). This approach addresses the representational gap between different atte… (a generic sketch of linear-complexity attention follows this list).

  3. RESEARCH · CL_14475

    Transformers' expressive power explained by new measure-theoretic framework

    Researchers have introduced a new measure-theoretic framework to understand the expressive power of Transformer architectures in modeling contextual relations. This framework connects standard softmax attention to entro…

  4. RESEARCH · CL_11887

    Sigmoid attention improves biological foundation models with faster, more stable training

    Researchers have developed a new attention mechanism called Sigmoid Attention, which offers significant improvements for training biological foundation models. The approach leads to better learned representations… (a generic sketch of sigmoid attention follows this list).

  5. RESEARCH · CL_06270

    Kwai Summary Attention compresses historical contexts for efficient long-context LLMs

    Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…

  6. RESEARCH · CL_05008

    New architectures and frameworks target LLM serving bottlenecks for long contexts

    Researchers have developed novel architectures and techniques to address the escalating latency and energy consumption challenges in serving large language models (LLMs) with long contexts. One approach, AMMA, proposes …
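Several clusters above (items 2, 5, and 6) concern replacements for the quadratic softmax score matrix. As a reference point for item 2, the sketch below shows generic kernelized linear attention, the usual meaning of "linear-complexity attention": a feature map phi replaces the softmax so that keys and values can be summarized once in a (d, d) matrix. This is an assumption-laden illustration, not the cluster's Test-Time Training method; phi and all names here are hypothetical.

# Generic kernelized linear attention (illustrative; not the TTT method from item 2).
import numpy as np

def phi(x):
    # A common positive feature map, ELU(x) + 1; an assumed choice, not from the paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    # Q, K, V: (n, d). Cost is O(n * d^2) instead of O(n^2 * d).
    Qp, Kp = phi(Q), phi(K)           # (n, d) positive features
    KV = Kp.T @ V                     # (d, d) summary of all keys and values
    Z = Qp @ Kp.sum(axis=0) + eps     # (n,) per-query normalizer
    return (Qp @ KV) / Z[:, None]     # (n, d) attention output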
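For item 4, the general form that "sigmoid attention" usually refers to gates each query-key score independently with a sigmoid instead of normalizing whole rows with a softmax, so keys do not compete for a fixed probability budget. The sketch below shows only that generic form; the cluster's exact bias and scaling choices are not known from the summary, and the bias term here is an assumption.

# Generic elementwise sigmoid attention (illustrative; details may differ from item 4).
import numpy as np

def sigmoid_attention(Q, K, V, bias=0.0):
    # Q, K, V: (n, d). bias is a hypothetical offset (e.g. to keep gates small at long n).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + bias    # (n, n) pairwise scores
    gates = 1.0 / (1.0 + np.exp(-scores))   # elementwise sigmoid; rows need not sum to 1
    return gates @ V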