SGD
PulseAugur coverage of SGD — every cluster mentioning SGD across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
SGD Learns k-Juntas Efficiently with Temporal Correlations
Researchers have demonstrated that temporal correlations in data can significantly improve the efficiency of gradient-based learning methods for specific sparse problems. By using samples generated from a random walk on…
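The snippet only names the setting, so the following is a minimal toy sketch under stated assumptions: a k-junta target (here the majority of k hidden coordinates) over {-1, +1}^d, inputs produced by a random walk that flips one coordinate per step, and plain SGD on squared loss; the paper's actual model and guarantees are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, steps, lr = 30, 3, 50_000, 0.01
junta = rng.choice(d, size=k, replace=False)           # hidden relevant coordinates

def target(x):
    return np.sign(x[junta].sum())                     # depends only on the k junta coordinates

w = np.zeros(d)
x = rng.choice([-1.0, 1.0], size=d)                    # start of the random walk
for _ in range(steps):
    x = x.copy()
    x[rng.integers(d)] *= -1.0                         # temporally correlated sample: flip one bit
    y = target(x)
    w -= lr * (w @ x - y) * x                          # one SGD step on squared loss

print("weight mass on the junta:", np.abs(w[junta]).sum() / np.abs(w).sum())
```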
-
New theory boosts generalization for decentralized learning
Researchers have developed a new high-probability learning theory for decentralized stochastic gradient descent (D-SGD). This theory aims to close a gap in generalization guarantees between traditional SGD and D-SGD, ta…
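For context, a single D-SGD round in its standard form (not the paper's specific analysis) is a local stochastic gradient step followed by gossip averaging with neighbours through a doubly stochastic mixing matrix:

```python
import numpy as np

# Standard D-SGD round: local SGD step, then neighbour averaging on a ring of 4 nodes.
rng = np.random.default_rng(0)
n_nodes, dim, lr = 4, 5, 0.1
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])                # doubly stochastic mixing matrix
targets = rng.normal(size=(n_nodes, dim))             # heterogeneous local objectives
params = np.zeros((n_nodes, dim))

for _ in range(200):
    grads = params - targets + 0.1 * rng.normal(size=params.shape)  # noisy local gradients
    params = params - lr * grads                      # local SGD step on each node
    params = W @ params                               # gossip averaging with neighbours

print("consensus distance:", np.linalg.norm(params - params.mean(axis=0)))
```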
-
New R-SGD-Mini method tackles heavy-tailed noise in optimization
Researchers have introduced a new optimization method called Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling (R-SGD-Mini). This method is designed to handle heavy-tailed noise in gradient cal…
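A sketch of the medoid idea as read from the summary (the paper's full R-SGD-Mini recipe may differ): among the per-sample gradients in a mini-batch, step with the one minimising the summed distance to the rest, which is far less sensitive to heavy-tailed outliers than the mean.

```python
import numpy as np

def medoid(grads):
    # medoid = the gradient with the smallest total distance to all other gradients
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    return grads[np.argmin(dists.sum(axis=1))]

rng = np.random.default_rng(0)
w, lr = np.zeros(3), 0.1
for _ in range(100):
    clean = np.tile(w - np.ones(3), (8, 1))            # true per-sample gradient of 0.5||w - 1||^2
    noise = rng.standard_t(df=1.5, size=clean.shape)   # heavy-tailed per-sample noise
    w -= lr * medoid(clean + noise)                    # robust step instead of the mini-batch mean

print(w)   # drifts toward the optimum at [1, 1, 1] despite outliers
```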
-
GONO optimizer adapts Adam's momentum using directional consistency for better convergence
Researchers have introduced the GONO framework, an optimization signal designed to improve deep learning training by addressing the decoupling of directional alignment and loss convergence. Unlike existing optimizers th…
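The summary only states that momentum is adapted via directional consistency, so the sketch below is an assumed reading, not the GONO update itself: the momentum contribution is scaled by the cosine similarity between the fresh gradient and the momentum buffer.

```python
import numpy as np

# Illustrative only: "directional consistency" is ASSUMED here to mean the cosine
# similarity between the current gradient and the momentum buffer; GONO's actual rule may differ.
def gono_like_step(w, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    cos = float(grad @ m) / (np.linalg.norm(grad) * np.linalg.norm(m) + eps)
    consistency = max(cos, 0.0)                            # trust momentum only when aligned
    update = (consistency * m + (1 - consistency) * grad) / (np.sqrt(v) + eps)
    return w - lr * update, m, v

w, m, v = np.ones(4), np.zeros(4), np.zeros(4)
w, m, v = gono_like_step(w, grad=2 * w, m=m, v=v)          # one step on f(w) = ||w||^2
```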
-
New analysis reveals how step size impacts SGD alignment phenomenon
This paper analyzes the phenomenon of "suspicious alignment" in stochastic gradient descent (SGD) when dealing with ill-conditioned optimization problems. The study focuses on how step size selection influences the alig…
-
New principle optimizes AI model training by aligning gradients and updates
Researchers have introduced a new principle called Greedy Alignment for selecting and tuning optimizer hyperparameters in machine learning. This principle treats optimizers as causal filters that map gradients to update…
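As an assumed illustration of the "optimizers as filters from gradients to updates" framing (the paper's exact alignment criterion is not shown in this snippet), one could greedily keep the hyperparameter whose resulting update best aligns, by cosine similarity, with the current gradient:

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

rng = np.random.default_rng(0)
grad, momentum_buf = rng.normal(size=10), rng.normal(size=10)
candidates = [0.0, 0.5, 0.9, 0.99]                         # candidate momentum coefficients
updates = {b1: b1 * momentum_buf + (1 - b1) * grad for b1 in candidates}
best_b1 = max(candidates, key=lambda b1: cosine(grad, updates[b1]))
print("greedily selected beta1:", best_b1)
```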
-
SignSGD and Muon optimizers' performance gains theoretically explained
Researchers have theoretically analyzed why sign-based optimization algorithms like SignSGD and Muon can outperform standard SGD in training large models. A new study suggests that SignSGD's advantage stems from its eff…
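The SignSGD update the analysis concerns is standard: step by the element-wise sign of the gradient rather than its magnitude, which makes progress per coordinate independent of curvature scale (shown here on a toy ill-conditioned quadratic).

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.diag([100.0, 1.0])                      # ill-conditioned curvature
w, lr = np.array([5.0, 5.0]), 0.05
for _ in range(200):
    grad = H @ w + 0.1 * rng.normal(size=2)    # stochastic gradient of 0.5 * w^T H w
    w -= lr * np.sign(grad)                    # SignSGD: magnitude-free step
print(w)                                       # both coordinates shrink at the same rate
```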
-
Bringing Order to Asynchronous SGD: Towards Optimality under Data-Dependent Delays with Momentum
Researchers have developed a new asynchronous framework for stochastic gradient descent (SGD) that aims to improve distributed training efficiency. This method uses momentum to preserve information from delayed gradient…
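A minimal simulation of the setting (illustrative only; the paper's delay-handling rule is not reproduced): gradients are computed at stale parameters, arrive a few steps later, and are folded into a momentum buffer that keeps older directions alive.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
target = np.ones(4)
w, m, lr, beta, delay = np.zeros(4), np.zeros(4), 0.1, 0.5, 2
pending = deque()

for step in range(300):
    pending.append(w - target + 0.1 * rng.normal(size=4))   # gradient at the current (soon stale) w
    if len(pending) > delay:                                 # arrives 'delay' steps later
        g = pending.popleft()
        m = beta * m + g                                     # momentum over delayed gradients
        w -= lr * m
print(np.round(w, 3))                                        # approaches the target despite staleness
```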
-
Anon optimizer offers tunable adaptivity, outperforming Adam and SGD on key tasks
Researchers have introduced Anon, a novel optimizer designed to bridge the performance gap between adaptive methods like Adam and non-adaptive methods like SGD. Anon features continuously tunable adaptivity, allowing it…
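The snippet does not say how the adaptivity knob is parameterised, so the sketch below only illustrates one common way to interpolate between SGD and Adam, a tunable exponent on the second-moment denominator; it should not be read as Anon's actual update.

```python
import numpy as np

# Illustrative only: p = 0 reduces to momentum SGD, p = 0.5 to an Adam-style step;
# Anon's actual parameterisation of "tunable adaptivity" may differ.
def tunable_step(w, grad, m, v, p, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    return w - lr * m / (v + eps) ** p, m, v

w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for p in (0.0, 0.25, 0.5):                                   # sweep adaptivity from SGD-like to Adam-like
    w_p, _, _ = tunable_step(w, grad=2 * w, m=m, v=v, p=p)
```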
-
The Measure of Deception: An Analysis of Data Forging in Machine Unlearning
Two new research papers explore vulnerabilities and detection methods in machine unlearning, a process designed to remove specific data from trained models for privacy compliance. One paper, "DurableUn," reveals that lo…
-
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…
-
New theories explore how pre-training and sparse connectivity enhance deep learning generalization
Three new papers explore the theoretical underpinnings of generalization in deep learning. One paper identifies pre-training as a critical factor for weak-to-strong generalization, demonstrating its emergence through a …
-
New DALS framework optimizes learning rates for neural network training
Researchers have introduced a new framework called Discriminative Adaptive Layer Scaling (DALS) to optimize learning rates in neural networks. DALS categorizes the evolution of learning rate strategies into five generat…
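To make the layer-wise idea concrete, the sketch below shows generic per-layer learning-rate scaling with a LARS-style trust ratio; this is a stand-in for what layer-wise scaling looks like, not the DALS rule itself.

```python
import numpy as np

def layerwise_scaled_step(layers, grads, base_lr=0.1, eps=1e-8):
    # Scale each layer's step by the ratio of its weight norm to its gradient norm
    # (LARS-style trust ratio, used here only as a generic example of layer-wise scaling).
    new_layers = []
    for w, g in zip(layers, grads):
        scale = np.linalg.norm(w) / (np.linalg.norm(g) + eps)
        new_layers.append(w - base_lr * scale * g)
    return new_layers

rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
grads = [0.01 * l for l in layers]                           # layers with very different gradient scales
layers = layerwise_scaled_step(layers, grads)
```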
-
New research shows immediate derivatives suffice for online recurrent adaptation
Researchers have developed a new method for online recurrent adaptation that significantly reduces computational requirements. Their approach, termed 'Immediate Derivatives Suffice,' eliminates the need for propagating …
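An assumed reading of the idea, sketched on a vanilla RNN: the recursive sensitivity term that RTRL would carry is dropped, the previous hidden state is treated as a constant, and only the immediate one-step derivative of the recurrent weights is used for the online update.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, lr = 8, 0.05
W = 0.1 * rng.normal(size=(n_h, n_h))       # recurrent weights, trained online
U = 0.1 * rng.normal(size=(n_h, 1))         # input weights (kept fixed here for brevity)
v = 0.1 * rng.normal(size=n_h)              # readout (kept fixed here for brevity)
h = np.zeros(n_h)

for t in range(500):
    x = np.array([np.sin(0.1 * t)])
    h_prev = h
    h = np.tanh(W @ h_prev + U @ x)
    y_hat, y = v @ h, np.sin(0.1 * (t + 1))                  # predict the next input value
    delta = (y_hat - y) * v * (1 - h**2)                     # backprop through the current step only
    W -= lr * np.outer(delta, h_prev)                        # immediate derivative w.r.t. W
```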
-
Spectral optimizers like Muon show sharp capacity scaling in associative memory tasks
A new paper analyzes the performance of spectral optimizers, like Muon, in training large language models by examining their effectiveness in learning associative memory. The research demonstrates that Muon significantl…
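A simplified sketch of a Muon-style step: the momentum matrix is pushed toward its nearest orthogonal factor with a classical Newton-Schulz iteration before being applied (Muon's production variant uses tuned polynomial coefficients, not reproduced here).

```python
import numpy as np

def orthogonalize(M, iters=5):
    X = M / (np.linalg.norm(M) + 1e-8)                       # scale so the iteration converges
    for _ in range(iters):
        X = 0.5 * X @ (3.0 * np.eye(X.shape[1]) - X.T @ X)   # Newton-Schulz toward the polar factor
    return X

def muon_like_step(W, grad, momentum, lr=0.02, beta=0.95):
    momentum = beta * momentum + grad
    return W - lr * orthogonalize(momentum), momentum

rng = np.random.default_rng(0)
W, momentum = rng.normal(size=(16, 8)), np.zeros((16, 8))
W, momentum = muon_like_step(W, grad=0.1 * W, momentum=momentum)   # one step on a toy objective
```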
-
Researchers analyze Adam's tradeoffs and enhance SignSGD with hybrid switching strategy
Two new research papers explore advancements in optimization algorithms for machine learning. One paper provides a theoretical analysis of the Adam optimizer, detailing its performance under non-stationary objectives an…
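The switching criterion is not given in this snippet; the sketch below assumes the simplest possible schedule, warming up with Adam-style steps and then switching to SignSGD after a fixed step count.

```python
import numpy as np

def hybrid_step(w, grad, m, v, t, switch_at=100, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    if t < switch_at:
        update = m / (np.sqrt(v) + eps)        # Adam-like warm-up phase
    else:
        update = np.sign(grad)                 # SignSGD phase
    return w - lr * update, m, v

w, m, v = np.ones(4), np.zeros(4), np.zeros(4)
for t in range(200):
    w, m, v = hybrid_step(w, grad=2 * w, m=m, v=v, t=t)
```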
-
Decentralized learning research shows single global merge improves performance
Researchers have demonstrated that concentrating communication in the later stages of decentralized learning can significantly improve global test performance, even under high data heterogeneity. A single global merging…
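The core idea can be sketched in a few lines: workers train purely locally on heterogeneous objectives, and parameters are averaged exactly once at the end (a minimal toy setup, not the paper's experimental protocol).

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr, steps = 4, 5, 0.1, 200
local_targets = rng.normal(size=(n_workers, dim))           # heterogeneous data: one target per worker
global_target = local_targets.mean(axis=0)                  # optimum of the averaged objective
params = np.zeros((n_workers, dim))

for _ in range(steps):
    grads = params - local_targets + 0.05 * rng.normal(size=params.shape)
    params -= lr * grads                                     # purely local training, no communication

merged = params.mean(axis=0)                                 # the single global merge
print("error of merged model:", np.linalg.norm(merged - global_target))
```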
-
Researchers explore complex SGD and directional bias in kernel Hilbert spaces
Researchers have introduced a novel variant of Stochastic Gradient Descent (SGD) designed for complex-valued neural networks. This new method, termed complex SGD, offers convergence guarantees even without analyticity c…
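A standard construction makes the complex-valued step concrete (the paper's specific variant and its guarantees are not reproduced here): for f(w) = |w*x - y|^2 the steepest-descent direction is the conjugate Wirtinger derivative, (w*x - y) * conj(x).

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = 1.0 + 2.0j, -3.0 + 0.5j
w, lr = 0.0 + 0.0j, 0.05
for _ in range(100):
    residual = w * x - y
    w -= lr * residual * np.conj(x)           # complex SGD step via the Wirtinger gradient
print(w, "target:", y / x)                    # converges to y / x
```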
-
LoRA fine-tuning research suggests rank 1 is sufficient, proposes data-aware initialization
Three new research papers explore methods to optimize LoRA fine-tuning for large language models. One paper proposes reducing the LoRA rank threshold to 1 for binary classification tasks, showing competitive performance…
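For reference, the rank-1 LoRA form the first paper argues can suffice is the generic low-rank adaptation structure with two trainable vectors (W stays frozen); the paper's full training recipe and data-aware initialization are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 6, 4
W = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
a = rng.normal(size=(d_in, 1)) * 0.01         # trainable
b = np.zeros((d_out, 1))                      # trainable, zero-init so training starts exactly at W

def adapted_forward(x):
    return (W + b @ a.T) @ x                  # rank-1 low-rank adaptation of the frozen weight

x = rng.normal(size=(d_in,))
print(adapted_forward(x))
```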
-
Papers challenge deep learning theory with generalization bound critiques
Two papers, one from 2016 by Zhang et al. and another from 2019 by Nagarajan and Kolter, are discussed for their impact on deep learning theory. The 2016 paper demonstrated that standard neural networks could easily mem…