Pulse

last 48h

[4/4] 97 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

RESEARCH · X — Together (inference / OSS) English(EN) · 23h · X

RT @vipulved: PSA: Just added a few thousand chips, including B200s and B300s to our Dedicated Model Inference (https://t.co/sD3mEZtSAa).…

Together AI has added a few thousand chips, including NVIDIA B200 and B300 accelerators, to its Dedicated Model Inference service, expanding the capacity available for deploying and running models on dedicated hardware.

IMPACT Increases available compute for AI model inference, potentially lowering costs and improving performance for users.

RESEARCH · X — SemiAnalysis English(EN) · 1mo · [3 sources] · X

@manicely6005 The public documentation can be found here too (3/3)

NVIDIA has open-sourced parts of its cuDNN library after 12 years as closed source. The release includes more than 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. Most of the kernel code is written in Python CuTe-DSL, and public documentation is now available.

IMPACT Open-sourcing of cuDNN kernels could accelerate research and development in AI infrastructure and model optimization.

RESEARCH · X — Qwen (Alibaba) English(EN) · 1mo · [3 sources] · X

Forward and backward benchmark results across common configurations. https://t.co/IHMCZRw9AW

Alibaba's Qwen team has released FlashQLA, a set of high-performance linear attention kernels developed using TileLang. The kernels are designed to make attention in large language models more efficient. The team also shared forward and backward benchmark results across common configurations.

IMPACT Introduces optimized kernels that could improve LLM inference speed and efficiency.

RESEARCH · X — Google DeepMind English(EN) · 1mo · [6 sources] · X

This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres. 🧵 https://t.co/YRmPrqIbYE

Google DeepMind has introduced Decoupled DiLoCo, an approach to training advanced AI models across multiple data centers that is designed to be resilient and flexible. It can train models such as Google's 12B Gemma model across geographically dispersed regions over low-bandwidth networks, and it can mix hardware generations, such as TPU v6e and TPU v5p. The system is designed to be self-healing: it isolates failed units, continues training through artificially induced hardware failures, and reintegrates units when they come back online, which addresses the synchronization stalls that typically halt distributed training.

IMPACT Enables more robust and flexible large-scale AI model training, potentially reducing costs and increasing accessibility.
