FineWeb-Edu
PulseAugur coverage of FineWeb-Edu — every cluster mentioning FineWeb-Edu across labs, papers, and developer communities, ranked by signal.
2 day(s) with sentiment data
-
Muown optimizer improves LLM training by controlling row-norm drift
Researchers have developed Muown, a novel optimization method designed to improve the training of large language models. Muown addresses issues with the Muon optimizer, specifically the upward drift of spectral norms in…
-
OrScale optimization method improves neural network training
Researchers have introduced OrScale, a novel optimization technique designed to enhance neural network training. OrScale builds upon the Muon method by incorporating layer-wise trust-ratio scaling, which measures the Fr…
-
Researchers explore growing Transformers with modular composition and layer-wise expansion
Researchers have explored a method for training Transformer models by incrementally adding new layers to a frozen base, maintaining a constant budget for trainable parameters. This approach, termed 'Growing Transformers…
-
OpenMythos project reconstructs Anthropic's secretive Claude Mythos AI model
A new open-source project called OpenMythos has been released, aiming to theoretically reconstruct the architecture of Anthropic's Claude Mythos model. This project implements a Recurrent-Depth Transformer (RDT) with a …