PulseAugur

Muown optimizer improves LLM training by controlling row-norm drift

Researchers have developed Muown, a novel optimization method designed to improve the training of large language models. Muown addresses a known issue with the Muon optimizer: the upward drift of spectral norms in weight matrices during training. By treating row-magnitude vectors as explicit optimization variables, Muown improves perplexity and learning-rate stability across model scales, outperforming existing optimizers such as AdamW and Lion.
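The row-magnitude idea described above can be illustrated with a minimal sketch. This is not the paper's actual update rule; the functions `split_rows` and `recombine` and the clamping step are hypothetical, shown only to make concrete what "treating row-magnitude vectors as explicit variables" means: each weight row is factored into a magnitude and a unit direction, so the magnitudes can be controlled directly instead of drifting implicitly.

```python
import numpy as np

def split_rows(W):
    """Factor W into per-row magnitudes r and unit-norm directions D, so W = r * D."""
    r = np.linalg.norm(W, axis=1, keepdims=True)   # explicit row-magnitude vector
    D = W / np.maximum(r, 1e-12)                   # unit-norm row directions
    return r, D

def recombine(r, D):
    """Rebuild the weight matrix from magnitudes and directions."""
    return r * D

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))

r, D = split_rows(W)
# With magnitudes exposed as variables, they can be updated or bounded directly,
# independent of direction -- preventing unchecked growth of matrix norms.
r_clamped = np.minimum(r, 1.0)
W_controlled = recombine(r_clamped, D)

assert np.allclose(recombine(r, D), W)             # factorization is lossless
assert np.linalg.norm(W_controlled, axis=1).max() <= 1.0 + 1e-9
```

The clamp here stands in for whatever per-row update the actual method applies; the point is only that the row norms become first-class quantities the optimizer can act on.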

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Improves LLM training efficiency and stability, potentially enabling larger models and faster development cycles.

RANK_REASON The cluster contains an academic paper detailing a new optimization method for language model training.

Read on Hugging Face Daily Papers →

COVERAGE [2]

  1. Hugging Face Daily Papers TIER_1

    Muown: Row-Norm Control for Muon Optimization

    Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drifts upward over training. Thro…

  2. arXiv cs.LG TIER_1 · Niao He
