AdamW
PulseAugur coverage of AdamW — every cluster mentioning AdamW across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
Tilde Research launches Aurora optimizer to fix neuron death in Muon
Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a critical issue in the popular Muon optimizer where a significant number of neurons become pe…
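The snippet does not say how Aurora detects or revives dead neurons, so the following is only a generic PyTorch sketch of one common diagnostic, counting units that never activate on a batch; the function name and threshold are placeholders, not Aurora's criterion.

```python
import torch

@torch.no_grad()
def dead_neuron_fraction(activations: torch.Tensor, eps: float = 0.0) -> float:
    """Fraction of hidden units that never exceed `eps` on this batch.

    `activations` is a (batch, hidden) post-nonlinearity tensor; a unit that is
    zero for every example is counted as "dead". Generic diagnostic only.
    """
    alive = (activations > eps).any(dim=0)      # (hidden,) bool
    return 1.0 - alive.float().mean().item()

# Usage idea: attach a forward hook to a layer, collect its post-ReLU output,
# and log dead_neuron_fraction(acts) over training to watch for neuron death.
```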
-
Paper details uniform scaling limits in AdamW-trained transformers
Researchers have published a paper detailing uniform scaling limits in transformers trained with the AdamW optimizer. The study models hidden-state dynamics as an interacting particle system, demonstrating convergence t…
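The summary is truncated, so the precise limit object is unclear; as a rough illustration only, interacting-particle analyses of transformers typically let each token's hidden state evolve under a drift that depends on the empirical measure of all states, which converges to a mean-field limit. The generic form below is an assumption, not the paper's stated result.

```latex
% Generic interacting-particle view of token hidden states (illustrative only):
\frac{\mathrm{d}}{\mathrm{d}t}\, x_t^{i} = b\!\left(x_t^{i}, \mu_t\right),
\qquad
\mu_t = \frac{1}{n}\sum_{j=1}^{n} \delta_{x_t^{j}}
\;\xrightarrow[n \to \infty]{}\; \bar{\mu}_t .
```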
-
Muown optimizer improves LLM training by controlling row-norm drift
Researchers have developed Muown, a novel optimization method designed to improve the training of large language models. Muown addresses issues with the Muon optimizer, specifically the upward drift of spectral norms in…
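The snippet mentions upward drift of norms but cuts off before the fix, so the sketch below is not Muown's algorithm; it is just a simple way to monitor and cap per-row norms of a weight matrix after each step, with the cap value as a placeholder.

```python
import torch

@torch.no_grad()
def cap_row_norms(weight: torch.Tensor, max_norm: float = 1.0) -> None:
    """Rescale any row of `weight` whose L2 norm exceeds `max_norm`.

    Illustrative control of row-norm drift; Muown's actual mechanism is not
    described in the snippet.
    """
    row_norms = weight.norm(dim=1, keepdim=True).clamp_min(1e-12)
    scale = (max_norm / row_norms).clamp(max=1.0)
    weight.mul_(scale)
```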
-
New research links optimizers to mode connectivity in neural networks
Researchers have explored the role of optimizers in mode connectivity within neural networks, a concept previously underexplored. Their work demonstrates that solutions generated by a single optimizer, such as AdamW or …
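A standard way to probe mode connectivity, which work like this typically builds on, is to evaluate the loss along the straight line between two trained solutions and look for a barrier; a minimal sketch, with model and loader names as placeholders:

```python
import copy
import torch

@torch.no_grad()
def linear_interpolation_losses(model_a, model_b, loss_fn, loader, steps=11):
    """Loss along theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b.

    A pronounced bump in the middle indicates a loss barrier, i.e. the two
    solutions are not linearly mode-connected.
    """
    probe = copy.deepcopy(model_a)
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        for p, pa, pb in zip(probe.parameters(),
                             model_a.parameters(),
                             model_b.parameters()):
            p.copy_((1 - alpha) * pa + alpha * pb)
        total, n = 0.0, 0
        for x, y in loader:
            total += loss_fn(probe(x), y).item() * len(x)
            n += len(x)
        losses.append(total / n)
    return losses
```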
-
OrScale optimization method improves neural network training
Researchers have introduced OrScale, a novel optimization technique designed to enhance neural network training. OrScale builds upon the Muon method by incorporating layer-wise trust-ratio scaling, which measures the Fr…
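The snippet cuts off at "Fr…", which presumably refers to Frobenius norms; layer-wise trust-ratio scaling in the LARS/LAMB sense multiplies each layer's update by the ratio of parameter norm to update norm. A minimal sketch of that generic idea, not OrScale's exact rule:

```python
import torch

@torch.no_grad()
def trust_ratio_scaled_update(param: torch.Tensor,
                              update: torch.Tensor,
                              lr: float,
                              eps: float = 1e-8) -> None:
    """Apply `update` scaled by the trust ratio ||W||_F / ||update||_F.

    Generic LARS/LAMB-style scaling; how OrScale combines it with Muon's
    orthogonalized updates is not given in the snippet.
    """
    w_norm = param.norm()                       # Frobenius norm of the layer
    u_norm = update.norm().clamp_min(eps)
    trust = torch.where(w_norm > 0, w_norm / u_norm, torch.ones_like(w_norm))
    param.add_(update, alpha=-lr * trust.item())
```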
-
New principle optimizes AI model training by aligning gradients and updates
Researchers have introduced a new principle called Greedy Alignment for selecting and tuning optimizer hyperparameters in machine learning. This principle treats optimizers as causal filters that map gradients to update…
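The snippet frames optimizers as filters from gradients to updates; without the full text, one natural diagnostic consistent with that framing is the cosine alignment between each step's gradient and the descent direction the optimizer actually took. A hedged sketch, not the paper's criterion:

```python
import torch

@torch.no_grad()
def gradient_update_alignment(grads, updates, eps: float = 1e-12) -> float:
    """Cosine similarity between the gradient and the negated parameter update.

    `grads` and `updates` are matching lists of per-parameter tensors, where
    the update is the parameter delta applied at the step. Illustrative only.
    """
    g = torch.cat([t.flatten() for t in grads])
    d = -torch.cat([t.flatten() for t in updates])   # descent direction taken
    return (g @ d / (g.norm() * d.norm() + eps)).item()
```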
-
GONO optimizer adapts Adam's momentum using directional consistency for better convergence
Researchers have introduced the GONO framework, an optimization signal designed to improve deep learning training by addressing the decoupling of directional alignment and loss convergence. Unlike existing optimizers th…
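The description is truncated, so the exact signal is unclear; purely as an illustration, one way to adapt Adam-style momentum with a directional-consistency signal is to scale the effective beta1 by the cosine between the current gradient and the momentum buffer. This rule is an assumption, not GONO's published update.

```python
import torch

@torch.no_grad()
def consistency_scaled_momentum(grad: torch.Tensor,
                                exp_avg: torch.Tensor,
                                beta1: float = 0.9,
                                eps: float = 1e-12) -> torch.Tensor:
    """Trust momentum more when it is directionally consistent with the gradient.

    Illustrative only; GONO's actual adaptation rule is not in the snippet.
    """
    cos = (grad.flatten() @ exp_avg.flatten()) / (
        grad.norm() * exp_avg.norm() + eps)
    beta = beta1 * cos.clamp(min=0.0).item()    # low consistency -> less momentum
    exp_avg.mul_(beta).add_(grad, alpha=1 - beta)
    return exp_avg
```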
-
New research links optimizer choice to reduced forgetting in LLM finetuning
Researchers have explored the impact of optimizer consistency during the fine-tuning of large language models. One study suggests that using the same optimizer for both pre-training and fine-tuning leads to less knowled…
-
Meta AI launches NeuralBench to standardize brain signal AI model evaluation
Meta AI has introduced NeuralBench, an open-source framework designed to standardize the evaluation of AI models that analyze brain signals. The initial release, NeuralBench-EEG v1.0, is the most extensive benchmark of …
-
New MetaAdamW optimizer uses self-attention for adaptive learning rates
Researchers have developed MetaAdamW, a novel optimizer that enhances adaptive learning rates and weight decay by employing a self-attention mechanism. This Transformer-based approach dynamically adjusts hyperparameters…
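The snippet only says hyperparameters are adjusted by a Transformer-style mechanism; the toy below attends over per-layer training statistics and emits learning-rate multipliers, and every module and feature choice in it is an assumption rather than MetaAdamW's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperparamAttention(nn.Module):
    """Toy controller: self-attention over per-layer stats -> LR multipliers.

    Input features per layer might be (grad_norm, param_norm, step_fraction);
    the output is a positive multiplier for that layer's base learning rate.
    Purely illustrative, not MetaAdamW's published architecture.
    """
    def __init__(self, n_features: int = 3, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, layer_stats: torch.Tensor) -> torch.Tensor:
        # layer_stats: (1, n_layers, n_features)
        h = self.embed(layer_stats)
        h, _ = self.attn(h, h, h)
        return F.softplus(self.head(h)).squeeze(-1)   # (1, n_layers)
```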
-
New FIBER optimizer enhances differential privacy for AI training
Researchers have introduced FIBER, a novel differentially private optimizer designed to enhance the performance of models trained with temporally filtered gradients. FIBER addresses the issue of miscalibrated bias corre…
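The snippet cuts off at the bias-correction issue, so what follows is only the standard DP-SGD style aggregation that any such optimizer builds on (per-example clipping plus Gaussian noise); FIBER's temporal filtering and correction are not reproduced.

```python
import torch

@torch.no_grad()
def dp_noisy_mean_grad(per_example_grads: torch.Tensor,
                       clip_norm: float,
                       noise_multiplier: float) -> torch.Tensor:
    """Clip each example's gradient to `clip_norm`, average, add Gaussian noise.

    `per_example_grads` has shape (batch, dim). FIBER's filtering and bias
    correction sit on top of a step like this and are not shown here.
    """
    norms = per_example_grads.norm(dim=1, keepdim=True).clamp_min(1e-12)
    clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)
    mean = clipped.mean(dim=0)
    noise = torch.randn_like(mean) * (
        noise_multiplier * clip_norm / len(per_example_grads))
    return mean + noise
```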
-
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…
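For context on the headline, the classic two-sided Shampoo update (Gupta et al., 2018) preconditions the gradient on both sides as shown below; the AdamW-style variant analyzed in the paper presumably adds decoupled weight decay, but its exact form is not given in the snippet.

```latex
% Classic two-sided Shampoo preconditioning (the paper's AdamW-style variant
% with decoupled weight decay is not reproduced here):
L_t = L_{t-1} + G_t G_t^{\top}, \qquad R_t = R_{t-1} + G_t^{\top} G_t,
\qquad
W_{t+1} = W_t - \eta \, L_t^{-1/4} \, G_t \, R_t^{-1/4} .
```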
-
AdaFRUGAL paper introduces dynamic controls for memory-efficient LLM training
Researchers have developed AdaFRUGAL, a new framework designed to make training Large Language Models (LLMs) more memory-efficient. Unlike previous methods that required manual tuning of hyperparameters, AdaFRUGAL autom…
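The snippet truncates before explaining what is automated; FRUGAL-style memory savings typically come from keeping full optimizer state only for a small subset of parameters and updating the rest statelessly, so the sketch below illustrates that general pattern. The split rule and the `adam_step` helper are hypothetical placeholders, not AdaFRUGAL's dynamic controls.

```python
import torch

@torch.no_grad()
def mixed_state_step(params, grads, stateful_mask, adam_step, lr=1e-3):
    """Update masked parameters with a full Adam-style step (`adam_step` keeps
    their state) and the remaining parameters with a stateless signSGD step,
    so the two Adam moment tensors are never allocated for them.

    Generic memory-saving pattern only; AdaFRUGAL's controls for choosing the
    split are not described in the snippet.
    """
    for p, g, stateful in zip(params, grads, stateful_mask):
        if stateful:
            adam_step(p, g)                      # full Adam state kept
        else:
            p.add_(torch.sign(g), alpha=-lr)     # stateless fallback update
```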
-
New research reveals gradient-direction sensitivity in optimizers for AI models
Researchers have identified a new method for analyzing how neural networks learn by examining loss gradients instead of optimizer updates. This approach, termed Gradient-Direction Sensitivity (GDS), reveals a stronger c…
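The acronym GDS is defined here but the metric itself is cut off; a heavily hedged stand-in for the same idea is to track how the loss-gradient direction changes between consecutive steps, independent of what the optimizer does with it.

```python
import torch

@torch.no_grad()
def gradient_direction_drift(prev_grads, curr_grads, eps: float = 1e-12) -> float:
    """1 - cosine similarity between full gradients at consecutive steps.

    Looks only at loss gradients, not at the optimizer's updates; this is an
    illustrative proxy, not the paper's Gradient-Direction Sensitivity metric.
    """
    g0 = torch.cat([t.flatten() for t in prev_grads])
    g1 = torch.cat([t.flatten() for t in curr_grads])
    cos = (g0 @ g1) / (g0.norm() * g1.norm() + eps)
    return 1.0 - cos.item()
```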
-
New Rose optimizer offers low VRAM, fast convergence, and great results
A new PyTorch optimizer named Rose has been released under the Apache 2.0 license. Developed by Matthew K., Rose is designed to be stateless, offering significantly lower VRAM usage compared to optimizers like AdamW, wi…
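Rose's algorithm is not described in the snippet; as a point of comparison for "stateless", AdamW stores two extra tensors per parameter (first and second moments), while a fully stateless update such as signSGD with decoupled weight decay stores none. A minimal sketch of the latter, shown only for contrast and not as Rose itself:

```python
import torch
from torch.optim import Optimizer

class StatelessSignSGD(Optimizer):
    """Sign-based update with decoupled weight decay and no per-parameter state.

    Illustrates why a stateless optimizer needs far less VRAM than AdamW,
    which keeps exp_avg and exp_avg_sq buffers; this is not Rose.
    """
    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.mul_(1 - group["lr"] * group["weight_decay"])  # decoupled decay
                p.add_(torch.sign(p.grad), alpha=-group["lr"])
        return None
```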