New diagnostic uses loss gradients to reveal feature coupling hidden by optimizer updates

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 3 sources

Researchers have proposed a diagnostic for how neural networks learn that analyzes loss gradients rather than optimizer updates. The approach, termed Gradient-Direction Sensitivity (GDS), replaces the rolling SVD of AdamW updates with a rolling SVD of loss gradients. With this change, the measured perturbative coupling between SED directions and linear centroids rises by one to two orders of magnitude, which the study presents as a clearer diagnostic of feature formation in parameter space. In addition, constraining attention updates to a rank-3 subspace identified with GDS accelerated grokking by approximately 2.3 times.
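To make the idea of a rolling SVD and a rank-3 constraint concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the window construction (flattening each matrix into a row), the function names `top_directions` and `project`, the 8x8 shape, and the synthetic gradients are all assumptions chosen for illustration. In the paper's setting the window would hold either recent AdamW updates or recent loss gradients; only the second case is simulated here.

```python
import numpy as np

def top_directions(window, k=3):
    """Return the top-k singular values and right singular vectors of a
    window of matrices, each flattened into one row (illustrative choice)."""
    stacked = np.stack([w.ravel() for w in window])      # (steps, n_params)
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return s[:k], vt[:k]                                  # rows of vt are orthonormal

def project(update, basis):
    """Project an update onto the span of the orthonormal rows of `basis`."""
    flat = update.ravel()
    return (basis.T @ (basis @ flat)).reshape(update.shape)

rng = np.random.default_rng(0)
shape = (8, 8)                                            # hypothetical weight-matrix shape

# Toy "loss gradients": a hidden rank-3 signal plus small noise (synthetic data).
true_basis = np.linalg.qr(rng.normal(size=(64, 3)))[0].T  # (3, 64), orthonormal rows
grads = [
    (rng.normal(size=3) @ true_basis + 0.01 * rng.normal(size=64)).reshape(shape)
    for _ in range(20)
]

# SVD over the window of gradients gives a rank-3 basis in parameter space.
sing_vals, basis = top_directions(grads, k=3)

# Constrain an arbitrary update to that rank-3 subspace.
raw_update = rng.normal(size=shape)
constrained_update = project(raw_update, basis)
```

Passing a window of optimizer updates instead of gradients to `top_directions` would give the other variant of the diagnostic; the source reports that the two choices differ by one to two orders of magnitude in measured coupling, which this toy example does not attempt to reproduce.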


IMPACT Introduces a novel diagnostic for understanding feature formation in neural networks, potentially improving training efficiency.



COVERAGE [3]

arXiv cs.LG TIER_1 · Yongzhong Xu · 2026-04-29 04:00

Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

arXiv:2604.25143v1. Abstract: We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured p…

arXiv cs.LG TIER_1 · Yongzhong Xu · 2026-04-28 02:44

Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured perturbative coupling between SED directions and …

Hugging Face Daily Papers TIER_1 · 2026-04-28 02:44

Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured perturbative coupling between SED directions and …
