Researchers have analyzed signal propagation in normalization-free transformers using the averaged partial Jacobian norm (APJN). Their theory describes how attention mechanisms shape APJN growth with depth in deep vision transformers. The study indicates that transformers with LayerNorm exhibit power-law APJN growth, while those relying only on elementwise nonlinearities are subcritical and require careful initialization and optimization to train stably.
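As a rough illustration of the quantity involved, the sketch below estimates an APJN-like statistic numerically for a toy residual stack, with and without LayerNorm. This is not the paper's method: the exact APJN definition, normalization, and block structure here (residual MLP blocks, GELU, width/depth/seed counts) are assumptions for illustration, and the metric is taken as the squared Frobenius norm of the layer-to-input Jacobian divided by width, averaged over random initializations.

```python
# Minimal numerical sketch of an APJN-like quantity for a toy residual stack.
# Assumptions (not from the summarized paper): APJN(0 -> l) is estimated as
# ||d h^l / d h^0||_F^2 / WIDTH, averaged over random seeds; blocks are
# pre-LN (or norm-free) residual MLP blocks, which only stand in for the
# attention blocks discussed in the summary.
import torch

WIDTH, DEPTH, SEEDS = 64, 16, 8

def make_block(use_layernorm: bool) -> torch.nn.Module:
    layers = []
    if use_layernorm:
        layers.append(torch.nn.LayerNorm(WIDTH))
    layers.append(torch.nn.Linear(WIDTH, WIDTH))
    layers.append(torch.nn.GELU())
    return torch.nn.Sequential(*layers)

def apjn_per_depth(use_layernorm: bool) -> list:
    """Return APJN(0 -> l) estimates for l = 1..DEPTH, averaged over seeds."""
    norms = torch.zeros(DEPTH)
    for seed in range(SEEDS):
        torch.manual_seed(seed)
        blocks = [make_block(use_layernorm) for _ in range(DEPTH)]
        x0 = torch.randn(WIDTH)

        def forward_to(num_blocks, x):
            h = x
            for block in blocks[:num_blocks]:
                h = h + block(h)  # residual connection
            return h

        for l in range(1, DEPTH + 1):
            # Full Jacobian of layer-l activations w.r.t. the input.
            J = torch.autograd.functional.jacobian(lambda x: forward_to(l, x), x0)
            norms[l - 1] += (J ** 2).sum() / WIDTH
    return (norms / SEEDS).tolist()

if __name__ == "__main__":
    for name, flag in [("pre-LN residual", True), ("norm-free residual", False)]:
        print(name, ["%.3g" % v for v in apjn_per_depth(flag)])
```

Plotting the printed values against depth on a log-log or log-linear scale is one way to see whether growth looks power-law or exponential for a given block design; the specific curves depend on the assumed toy architecture, not on the paper's transformer analysis.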
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides theoretical insights into transformer training stability, potentially guiding future architecture design.
RANK_REASON Academic paper analyzing signal propagation in transformer architectures.