Researchers have developed a Graph Memory Transformer (GMT) that replaces the standard Feed-Forward Network (FFN) sublayer in decoder-only transformers with an explicit learned memory graph. The architecture keeps causal self-attention intact; in place of the FFN, a memory cell routes token representations over a bank of centroids connected by a directed transition matrix. The 82.2M-parameter GMT trains stably and offers inspectable components, but it currently underperforms a dense GPT-style baseline in validation loss and perplexity, while showing comparable zero-shot benchmark behavior.
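The paper's exact formulation isn't given in this summary, but a minimal PyTorch sketch of such a memory cell, under assumed shapes and a single routing hop, could look like the following. The class name `GraphMemoryCell`, the soft centroid assignment, and the row-normalized transition matrix are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMemoryCell(nn.Module):
    """Illustrative FFN replacement: tokens are soft-assigned to a bank
    of learned centroids, routed one hop along a learned directed
    transition matrix, and read back as a mixture of centroids.
    (Sketch only; names and shapes are assumptions.)"""

    def __init__(self, d_model: int, num_centroids: int):
        super().__init__()
        # Learned centroid bank, one d_model-dim vector per memory slot.
        self.centroids = nn.Parameter(0.02 * torch.randn(num_centroids, d_model))
        # Unnormalized logits for the directed K x K transition graph.
        self.transition_logits = nn.Parameter(torch.zeros(num_centroids, num_centroids))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        # Soft assignment of each token over the centroid bank.
        assign = F.softmax(x @ self.centroids.t(), dim=-1)   # (B, S, K)
        # One hop along the directed transition matrix (rows sum to 1).
        trans = F.softmax(self.transition_logits, dim=-1)    # (K, K)
        routed = assign @ trans                              # (B, S, K)
        # Read out a mixture of centroid vectors as the sublayer output.
        return routed @ self.centroids                       # (B, S, d_model)
```

In this sketch the cell would drop into each decoder block where the FFN normally sits, with the usual residual connection around it; the centroid bank and transition matrix are the kind of directly inspectable components the summary alludes to.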
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT: Introduces a novel transformer architecture that may offer greater interpretability and different scaling properties.
RANK_REASON: The cluster describes a research paper introducing a novel transformer architecture.