PulseAugur

OrScale optimization method improves neural network training

Researchers have introduced OrScale, a novel optimization technique designed to improve neural network training. OrScale builds on the Muon method by adding layer-wise trust-ratio scaling, which uses the Frobenius norm of the actual parameter-space direction being applied to set each layer's step size. The approach, detailed in a new paper, aims to improve on existing optimizers such as Muon and AdamW, particularly for language models.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Introduces a new optimization technique that shows empirical improvements on benchmarks, potentially enhancing model training efficiency.

RANK_REASON The cluster contains a new academic paper detailing a novel research method.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Yang You

    OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling

    Muon improves neural-network training by orthogonalizing matrix-valued updates, but it leaves each layer's update magnitude controlled mostly by a global learning rate. We introduce OrScale, a trust-ratio extension of Muon built on a simple rule: the denominator of a layer-wise r…
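The abstract describes two ingredients: Muon-style orthogonalization of matrix-valued updates, and a layer-wise trust ratio that rescales each layer's step. A minimal sketch of how these could combine is below, assuming the trust ratio takes the LARS-style form ||W||_F / ||direction||_F; the function names, the Newton-Schulz coefficients, and the exact ratio are assumptions, not the paper's actual formulation (which is truncated above).

```python
import numpy as np

def orthogonalize(update, steps=5):
    # Newton-Schulz iteration approximating the nearest
    # semi-orthogonal matrix, as Muon does for matrix-valued
    # updates. The simple cubic variant here is an assumption.
    X = update / (np.linalg.norm(update) + 1e-7)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

def orscale_step(weight, grad, lr=0.02, eps=1e-7):
    # Hypothetical layer-wise trust-ratio update: orthogonalize
    # the gradient, then scale the step by the ratio of the
    # weight norm to the Frobenius norm of the direction applied.
    direction = orthogonalize(grad)
    trust = np.linalg.norm(weight) / (np.linalg.norm(direction) + eps)
    return weight - lr * trust * direction
```

With this scaling, the effective per-layer step size tracks each layer's own weight magnitude rather than being governed solely by the global learning rate, which is the gap in Muon that the abstract highlights.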