Researchers have identified a hidden failure mode that arises when gradient modification techniques are combined with the Adam optimizer in continual learning. The issue, most pronounced with shared-routing projection methods, can cause severe performance degradation as models forget previously learned tasks. It stems from Adam's second-moment pathway: when a projection shrinks the gradient, the second-moment estimate shrinks with it, so the per-parameter denominator drops and the effective learning rate inflates. The proposed fix, adaptive decoupled moment routing, sends the modified gradient to the first moment while computing second-moment statistics from the unmodified gradient, preventing performance collapse across methods and model scales.
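The mechanics can be sketched in a few lines. Below is a minimal PyTorch sketch, assuming decoupled moment routing amounts to feeding the modified (projected) gradient into Adam's first moment while the second moment accumulates the raw gradient; the function and parameter names (`adam_step_decoupled_moments`, `raw_grad`, `mod_grad`) are illustrative, not the paper's API.

```python
import torch

def adam_step_decoupled_moments(param, raw_grad, mod_grad, state,
                                lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam-style step with decoupled moment routing (illustrative).

    The modified gradient (e.g. after a shared-routing projection) drives
    the update direction via the first moment; the second moment keeps
    raw-gradient statistics so the step-size denominator is not shrunk
    by the projection.

    Expected state: {"step": 0, "m": zeros_like(param), "v": zeros_like(param)}
    """
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]

    # First moment: tracks the *modified* gradient (the update direction).
    state["m"].mul_(beta1).add_(mod_grad, alpha=1 - beta1)

    # Second moment: tracks the *raw* gradient. Routing mod_grad here
    # instead would shrink the denominator whenever the projection shrinks
    # the gradient, inflating the effective per-parameter learning rate.
    state["v"].mul_(beta2).addcmul_(raw_grad, raw_grad, value=1 - beta2)

    # Standard Adam bias correction and parameter update.
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    param.add_(m_hat / (v_hat.sqrt() + eps), alpha=-lr)
```

Plain Adam applied to a modified gradient is equivalent to replacing `raw_grad` with `mod_grad` in the second-moment update above, which is exactly the inflated-step pathway the paper identifies.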
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Identifies a critical failure mode in common continual learning setups, potentially undermining model robustness and prompting re-evaluation of existing methods.
RANK_REASON Academic paper detailing a novel failure mode and a proposed solution in continual learning.