Researchers have identified a hidden failure mode that arises when gradient modification techniques are combined with the Adam optimizer in continual learning. The issue, most pronounced with shared-routing projection methods, can cause severe performance degradation as models forget previously learned tasks. It stems from Adam's second-moment pathway: when a projection shrinks the gradient, the second-moment estimate shrinks with it, so the per-parameter denominator drops and the effective learning rate inflates. The proposed fix, adaptive decoupled moment routing, sends the modified gradient to the first moment while computing second-moment statistics from the unmodified gradient, preventing performance collapse across methods and model scales.
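The mechanics can be sketched in a few lines. Below is a minimal PyTorch sketch, assuming decoupled moment routing amounts to feeding the modified (projected) gradient into Adam's first moment while the second moment accumulates the raw gradient; the function and parameter names (`adam_step_decoupled_moments`, `raw_grad`, `mod_grad`) are illustrative, not the paper's API.

```python
import torch

def adam_step_decoupled_moments(param, raw_grad, mod_grad, state,
                                lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam-style step with decoupled moment routing (illustrative).

    The modified gradient (e.g. after a shared-routing projection) drives
    the update direction via the first moment; the second moment keeps
    raw-gradient statistics so the step-size denominator is not shrunk
    by the projection.

    Expected state: {"step": 0, "m": zeros_like(param), "v": zeros_like(param)}
    """
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]

    # First moment: tracks the *modified* gradient (the update direction).
    state["m"].mul_(beta1).add_(mod_grad, alpha=1 - beta1)

    # Second moment: tracks the *raw* gradient. Routing mod_grad here
    # instead would shrink the denominator whenever the projection shrinks
    # the gradient, inflating the effective per-parameter learning rate.
    state["v"].mul_(beta2).addcmul_(raw_grad, raw_grad, value=1 - beta2)

    # Standard Adam bias correction and parameter update.
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    param.add_(m_hat / (v_hat.sqrt() + eps), alpha=-lr)
```

Plain Adam applied to a modified gradient is equivalent to replacing `raw_grad` with `mod_grad` in the second-moment update above, which is exactly the inflated-step pathway the paper identifies.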
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Identifies a critical failure mode in common continual learning setups, potentially undermining model robustness and prompting re-evaluation of existing methods.
RANK_REASON Academic paper detailing a novel failure mode and a proposed solution in continual learning.