AdamW
PulseAugur coverage of AdamW — every cluster mentioning AdamW across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
Tilde Research launches Aurora optimizer to fix neuron death in Muon
Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a critical issue in the popular Muon optimizer where a significant number of neurons become pe…
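The snippet does not say how Aurora detects or revives dead neurons, so the following is only a generic PyTorch sketch of one common diagnostic, counting units that never activate on a batch; the function name and threshold are placeholders, not Aurora's criterion.

```python
import torch

@torch.no_grad()
def dead_neuron_fraction(activations: torch.Tensor, eps: float = 0.0) -> float:
    """Fraction of hidden units that never exceed `eps` on this batch.

    `activations` is a (batch, hidden) post-nonlinearity tensor; a unit that is
    zero for every example is counted as "dead". Generic diagnostic only.
    """
    alive = (activations > eps).any(dim=0)      # (hidden,) bool
    return 1.0 - alive.float().mean().item()

# Usage idea: attach a forward hook to a layer, collect its post-ReLU output,
# and log dead_neuron_fraction(acts) over training to watch for neuron death.
```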
-
Paper details uniform scaling limits in AdamW-trained transformers
Researchers have published a paper detailing uniform scaling limits in transformers trained with the AdamW optimizer. The study models hidden-state dynamics as an interacting particle system, demonstrating convergence t…
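The summary is truncated, so the precise limit object is unclear; as a rough illustration only, interacting-particle analyses of transformers typically let each token's hidden state evolve under a drift that depends on the empirical measure of all states, which converges to a mean-field limit. The generic form below is an assumption, not the paper's stated result.

```latex
% Generic interacting-particle view of token hidden states (illustrative only):
\frac{\mathrm{d}}{\mathrm{d}t}\, x_t^{i} = b\!\left(x_t^{i}, \mu_t\right),
\qquad
\mu_t = \frac{1}{n}\sum_{j=1}^{n} \delta_{x_t^{j}}
\;\xrightarrow[n \to \infty]{}\; \bar{\mu}_t .
```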
-
Muown optimizer improves LLM training by controlling row-norm drift
Researchers have developed Muown, a novel optimization method designed to improve the training of large language models. Muown addresses issues with the Muon optimizer, specifically the upward drift of spectral norms in…
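The snippet mentions upward drift of norms but cuts off before the fix, so the sketch below is not Muown's algorithm; it is just a simple way to monitor and cap per-row norms of a weight matrix after each step, with the cap value as a placeholder.

```python
import torch

@torch.no_grad()
def cap_row_norms(weight: torch.Tensor, max_norm: float = 1.0) -> None:
    """Rescale any row of `weight` whose L2 norm exceeds `max_norm`.

    Illustrative control of row-norm drift; Muown's actual mechanism is not
    described in the snippet.
    """
    row_norms = weight.norm(dim=1, keepdim=True).clamp_min(1e-12)
    scale = (max_norm / row_norms).clamp(max=1.0)
    weight.mul_(scale)
```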
-
New research links optimizers to mode connectivity in neural networks
Researchers have explored the role of optimizers in mode connectivity within neural networks, a concept previously underexplored. Their work demonstrates that solutions generated by a single optimizer, such as AdamW or …
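A standard way to probe mode connectivity, which work like this typically builds on, is to evaluate the loss along the straight line between two trained solutions and look for a barrier; a minimal sketch, with model and loader names as placeholders:

```python
import copy
import torch

@torch.no_grad()
def linear_interpolation_losses(model_a, model_b, loss_fn, loader, steps=11):
    """Loss along theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b.

    A pronounced bump in the middle indicates a loss barrier, i.e. the two
    solutions are not linearly mode-connected.
    """
    probe = copy.deepcopy(model_a)
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        for p, pa, pb in zip(probe.parameters(),
                             model_a.parameters(),
                             model_b.parameters()):
            p.copy_((1 - alpha) * pa + alpha * pb)
        total, n = 0.0, 0
        for x, y in loader:
            total += loss_fn(probe(x), y).item() * len(x)
            n += len(x)
        losses.append(total / n)
    return losses
```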
-
OrScale optimization method improves neural network training
Researchers have introduced OrScale, a novel optimization technique designed to enhance neural network training. OrScale builds upon the Muon method by incorporating layer-wise trust-ratio scaling, which measures the Fr…
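The snippet cuts off at "Fr…", which presumably refers to Frobenius norms; layer-wise trust-ratio scaling in the LARS/LAMB sense multiplies each layer's update by the ratio of parameter norm to update norm. A minimal sketch of that generic idea, not OrScale's exact rule:

```python
import torch

@torch.no_grad()
def trust_ratio_scaled_update(param: torch.Tensor,
                              update: torch.Tensor,
                              lr: float,
                              eps: float = 1e-8) -> None:
    """Apply `update` scaled by the trust ratio ||W||_F / ||update||_F.

    Generic LARS/LAMB-style scaling; how OrScale combines it with Muon's
    orthogonalized updates is not given in the snippet.
    """
    w_norm = param.norm()                       # Frobenius norm of the layer
    u_norm = update.norm().clamp_min(eps)
    trust = torch.where(w_norm > 0, w_norm / u_norm, torch.ones_like(w_norm))
    param.add_(update, alpha=-lr * trust.item())
```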
-
New principle optimizes AI model training by aligning gradients and updates
Researchers have introduced a new principle called Greedy Alignment for selecting and tuning optimizer hyperparameters in machine learning. This principle treats optimizers as causal filters that map gradients to update…
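The snippet frames optimizers as filters from gradients to updates; without the full text, one natural diagnostic consistent with that framing is the cosine alignment between each step's gradient and the descent direction the optimizer actually took. A hedged sketch, not the paper's criterion:

```python
import torch

@torch.no_grad()
def gradient_update_alignment(grads, updates, eps: float = 1e-12) -> float:
    """Cosine similarity between the gradient and the negated parameter update.

    `grads` and `updates` are matching lists of per-parameter tensors, where
    the update is the parameter delta applied at the step. Illustrative only.
    """
    g = torch.cat([t.flatten() for t in grads])
    d = -torch.cat([t.flatten() for t in updates])   # descent direction taken
    return (g @ d / (g.norm() * d.norm() + eps)).item()
```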
-
GONO optimizer adapts Adam's momentum using directional consistency for better convergence
Researchers have introduced the GONO framework, an optimization signal designed to improve deep learning training by addressing the decoupling of directional alignment and loss convergence. Unlike existing optimizers th…
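The description is truncated, so the exact signal is unclear; purely as an illustration, one way to adapt Adam-style momentum with a directional-consistency signal is to scale the effective beta1 by the cosine between the current gradient and the momentum buffer. This rule is an assumption, not GONO's published update.

```python
import torch

@torch.no_grad()
def consistency_scaled_momentum(grad: torch.Tensor,
                                exp_avg: torch.Tensor,
                                beta1: float = 0.9,
                                eps: float = 1e-12) -> torch.Tensor:
    """Trust momentum more when it is directionally consistent with the gradient.

    Illustrative only; GONO's actual adaptation rule is not in the snippet.
    """
    cos = (grad.flatten() @ exp_avg.flatten()) / (
        grad.norm() * exp_avg.norm() + eps)
    beta = beta1 * cos.clamp(min=0.0).item()    # low consistency -> less momentum
    exp_avg.mul_(beta).add_(grad, alpha=1 - beta)
    return exp_avg
```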
-
New research links optimizer choice to reduced forgetting in LLM finetuning
Researchers have explored the impact of optimizer consistency during the fine-tuning of large language models. One study suggests that using the same optimizer for both pre-training and fine-tuning leads to less knowled…
-
Meta AI launches NeuralBench to standardize brain signal AI model evaluation
Meta AI has introduced NeuralBench, an open-source framework designed to standardize the evaluation of AI models that analyze brain signals. The initial release, NeuralBench-EEG v1.0, is the most extensive benchmark of …
-
New MetaAdamW optimizer uses self-attention for adaptive learning rates
Researchers have developed MetaAdamW, a novel optimizer that enhances adaptive learning rates and weight decay by employing a self-attention mechanism. This Transformer-based approach dynamically adjusts hyperparameters…
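The snippet only says hyperparameters are adjusted by a Transformer-style mechanism; the toy below attends over per-layer training statistics and emits learning-rate multipliers, and every module and feature choice in it is an assumption rather than MetaAdamW's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperparamAttention(nn.Module):
    """Toy controller: self-attention over per-layer stats -> LR multipliers.

    Input features per layer might be (grad_norm, param_norm, step_fraction);
    the output is a positive multiplier for that layer's base learning rate.
    Purely illustrative, not MetaAdamW's published architecture.
    """
    def __init__(self, n_features: int = 3, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, layer_stats: torch.Tensor) -> torch.Tensor:
        # layer_stats: (1, n_layers, n_features)
        h = self.embed(layer_stats)
        h, _ = self.attn(h, h, h)
        return F.softplus(self.head(h)).squeeze(-1)   # (1, n_layers)
```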
-
New FIBER optimizer enhances differential privacy for AI training
Researchers have introduced FIBER, a novel differentially private optimizer designed to enhance the performance of models trained with temporally filtered gradients. FIBER addresses the issue of miscalibrated bias corre…
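The snippet cuts off at the bias-correction issue, so what follows is only the standard DP-SGD style aggregation that any such optimizer builds on (per-example clipping plus Gaussian noise); FIBER's temporal filtering and correction are not reproduced.

```python
import torch

@torch.no_grad()
def dp_noisy_mean_grad(per_example_grads: torch.Tensor,
                       clip_norm: float,
                       noise_multiplier: float) -> torch.Tensor:
    """Clip each example's gradient to `clip_norm`, average, add Gaussian noise.

    `per_example_grads` has shape (batch, dim). FIBER's filtering and bias
    correction sit on top of a step like this and are not shown here.
    """
    norms = per_example_grads.norm(dim=1, keepdim=True).clamp_min(1e-12)
    clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)
    mean = clipped.mean(dim=0)
    noise = torch.randn_like(mean) * (
        noise_multiplier * clip_norm / len(per_example_grads))
    return mean + noise
```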
-
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…
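For context on the headline, the classic two-sided Shampoo update (Gupta et al., 2018) preconditions the gradient on both sides as shown below; the AdamW-style variant analyzed in the paper presumably adds decoupled weight decay, but its exact form is not given in the snippet.

```latex
% Classic two-sided Shampoo preconditioning (the paper's AdamW-style variant
% with decoupled weight decay is not reproduced here):
L_t = L_{t-1} + G_t G_t^{\top}, \qquad R_t = R_{t-1} + G_t^{\top} G_t,
\qquad
W_{t+1} = W_t - \eta \, L_t^{-1/4} \, G_t \, R_t^{-1/4} .
```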
-
AdaFRUGAL paper introduces dynamic controls for memory-efficient LLM training
Researchers have developed AdaFRUGAL, a new framework designed to make training Large Language Models (LLMs) more memory-efficient. Unlike previous methods that required manual tuning of hyperparameters, AdaFRUGAL autom…
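The snippet truncates before explaining what is automated; FRUGAL-style memory savings typically come from keeping full optimizer state only for a small subset of parameters and updating the rest statelessly, so the sketch below illustrates that general pattern. The split rule and the `adam_step` helper are hypothetical placeholders, not AdaFRUGAL's dynamic controls.

```python
import torch

@torch.no_grad()
def mixed_state_step(params, grads, stateful_mask, adam_step, lr=1e-3):
    """Update masked parameters with a full Adam-style step (`adam_step` keeps
    their state) and the remaining parameters with a stateless signSGD step,
    so the two Adam moment tensors are never allocated for them.

    Generic memory-saving pattern only; AdaFRUGAL's controls for choosing the
    split are not described in the snippet.
    """
    for p, g, stateful in zip(params, grads, stateful_mask):
        if stateful:
            adam_step(p, g)                      # full Adam state kept
        else:
            p.add_(torch.sign(g), alpha=-lr)     # stateless fallback update
```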
-
New research reveals gradient-direction sensitivity in optimizers for AI models
Researchers have identified a new method for analyzing how neural networks learn by examining loss gradients instead of optimizer updates. This approach, termed Gradient-Direction Sensitivity (GDS), reveals a stronger c…
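The acronym GDS is defined here but the metric itself is cut off; a heavily hedged stand-in for the same idea is to track how the loss-gradient direction changes between consecutive steps, independent of what the optimizer does with it.

```python
import torch

@torch.no_grad()
def gradient_direction_drift(prev_grads, curr_grads, eps: float = 1e-12) -> float:
    """1 - cosine similarity between full gradients at consecutive steps.

    Looks only at loss gradients, not at the optimizer's updates; this is an
    illustrative proxy, not the paper's Gradient-Direction Sensitivity metric.
    """
    g0 = torch.cat([t.flatten() for t in prev_grads])
    g1 = torch.cat([t.flatten() for t in curr_grads])
    cos = (g0 @ g1) / (g0.norm() * g1.norm() + eps)
    return 1.0 - cos.item()
```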
-
New Rose optimizer offers low VRAM, fast convergence, and great results
A new PyTorch optimizer named Rose has been released under the Apache 2.0 license. Developed by Matthew K., Rose is designed to be stateless, offering significantly lower VRAM usage compared to optimizers like AdamW, wi…
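Rose's algorithm is not described in the snippet; as a point of comparison for "stateless", AdamW stores two extra tensors per parameter (first and second moments), while a fully stateless update such as signSGD with decoupled weight decay stores none. A minimal sketch of the latter, shown only for contrast and not as Rose itself:

```python
import torch
from torch.optim import Optimizer

class StatelessSignSGD(Optimizer):
    """Sign-based update with decoupled weight decay and no per-parameter state.

    Illustrates why a stateless optimizer needs far less VRAM than AdamW,
    which keeps exp_avg and exp_avg_sq buffers; this is not Rose.
    """
    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.mul_(1 - group["lr"] * group["weight_decay"])  # decoupled decay
                p.add_(torch.sign(p.grad), alpha=-group["lr"])
        return None
```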