PulseAugur

Hugging Face introduces Graph Memory Transformer replacing FFNs with learned memory graphs

Researchers have developed a Graph Memory Transformer (GMT) that replaces the standard Feed-Forward Network (FFN) sublayer in decoder-only transformers with an explicit learned memory graph. The architecture retains causal self-attention but swaps each FFN for a memory cell that routes token representations over a bank of centroids connected by a directed transition matrix. The 82.2M-parameter GMT trains stably and exposes inspectable components, but it currently underperforms a dense GPT-style baseline on validation loss and perplexity, while showing comparable zero-shot benchmark behavior.
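The summary names the key pieces of the memory cell: a bank of centroids and a directed transition matrix over them. The paper's exact routing rule is not given here, so the following is only a minimal illustrative sketch, assuming soft assignment of tokens to centroids, one hop along the transition graph, and a residual read-back; all function and variable names are hypothetical.

```python
import numpy as np

def graph_memory_sublayer(x, centroids, transition):
    """Hypothetical sketch of a graph-memory sublayer (not the paper's exact rule).

    x:          (seq_len, d_model) token representations
    centroids:  (K, d_model) learned memory bank
    transition: (K, K) row-stochastic directed transition matrix
    """
    # Soft-assign each token to centroids via dot-product similarity + softmax.
    logits = x @ centroids.T                                   # (seq_len, K)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Route the assignment one hop along the directed memory graph.
    routed = weights @ transition                              # (seq_len, K)
    # Read back a mixture of centroids and add it residually.
    return x + routed @ centroids                              # (seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 tokens, d_model = 8
centroids = rng.normal(size=(16, 8))   # K = 16 memory centroids
transition = rng.random((16, 16))
transition /= transition.sum(axis=-1, keepdims=True)  # make rows stochastic
out = graph_memory_sublayer(x, centroids, transition)
print(out.shape)
```

In a full model this sublayer would sit where the FFN normally does, after the causal self-attention block; the centroids and transition matrix would be trained parameters, which is what makes the component inspectable.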

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Introduces a novel architecture for transformers that may offer greater interpretability and potentially different scaling properties.

RANK_REASON The cluster describes a research paper introducing a novel transformer architecture.

Read on Hugging Face Daily Papers →

COVERAGE [1]

  1. Hugging Face Daily Papers TIER_1

    Graph Memory Transformer (GMT)

    We investigate whether the Feed-Forward Network (FFN) sublayer in a decoder-only transformer can be replaced by an explicit learned memory graph while preserving the surrounding autoregressive architecture. The proposed Graph Memory Transformer (GMT) keeps causal self-attention i…