PulseAugur

Researchers propose per-sample clipping for robust and fast AI model training

Researchers have developed a new training method, per-sample clipped SGD (PS-Clip-SGD), that improves robustness and speed on non-convex optimization problems. The method comes with theoretical convergence guarantees even under heavy-tailed gradient noise. In empirical tests, PS-Clip-SGD outperformed standard techniques when training AlexNet on CIFAR-100, and it also showed benefits when combined with gradient accumulation.
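The core idea is to clip each sample's gradient to a fixed norm threshold before averaging, so a single heavy-tailed outlier cannot dominate the batch update. A minimal NumPy sketch of that idea (the threshold value, shapes, and function name here are illustrative assumptions, not details from the paper):

```python
import numpy as np

def ps_clip_gradient(per_sample_grads, clip_norm=1.0):
    """Average per-sample gradients after clipping each one to `clip_norm`.

    `per_sample_grads` has shape (batch, dim): each row is one sample's
    gradient. `clip_norm` is an illustrative threshold, not a value
    taken from the paper.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale each row by min(1, clip_norm / norm); guard against zero norms.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    return clipped.mean(axis=0)

# One heavy-tailed outlier gradient is tamed before averaging:
grads = np.array([[0.1, 0.0],
                  [0.0, 0.2],
                  [100.0, 0.0]])  # outlier row
g = ps_clip_gradient(grads, clip_norm=1.0)
```

In this sketch the outlier row is rescaled to unit norm before the mean is taken, whereas a plain SGD average would be dominated by it. The same per-sample clipping can be applied to each micro-batch gradient under gradient accumulation, which is one reading of the benefit the summary mentions.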

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT Introduces a novel training technique that could lead to more efficient and stable model development.

RANK_REASON Academic paper detailing a new optimization method for machine learning.

Read on arXiv stat.ML →

COVERAGE [3]

  1. Hugging Face Daily Papers TIER_1

    Robust and Fast Training via Per-Sample Clipping

    We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal in-expectation convergence rates for non-convex op…

  2. arXiv stat.ML TIER_1 · Davide Nobile, Philipp Grohs

    Robust and Fast Training via Per-Sample Clipping

    arXiv:2605.02701v1 Announce Type: cross Abstract: We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal …

  3. arXiv stat.ML TIER_1 · Philipp Grohs

    Robust and Fast Training via Per-Sample Clipping

    We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal in-expectation convergence rates for non-convex op…