PulseAugur
research · 1 source

Together AI boosts training speed 90% with NVIDIA Blackwell

Together AI has announced improved performance for its GPU clusters built on NVIDIA's Blackwell platform, using its own Together Kernel Collection. The company reports 90% faster training for a 70B-parameter model compared with NVIDIA's H100 platform, at more than 15,000 tokens per second per node. According to Together AI, the gains come from custom FP8 kernels that take advantage of Blackwell's new Tensor Cores and on-chip memory, and these kernels outperform earlier solutions such as FlashAttention-3.
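The two headline numbers can be combined to estimate the H100 throughput they imply. The sketch below assumes that "90% faster" means the Blackwell per-node throughput is 1.9 times the H100 per-node throughput. The summary does not confirm that both figures use the same metric, so treat the result as a rough estimate. Because the source says "over 15,000", the estimate is a lower bound. The helper name `implied_baseline` is hypothetical.

```python
# Hypothetical helper: back out the baseline implied by a reported speedup.
# Assumption: "90% faster" means new throughput = (1 + 0.90) x baseline throughput.

def implied_baseline(new_throughput: float, speedup_pct: float) -> float:
    """Return the baseline throughput implied by a percentage speedup."""
    return new_throughput / (1 + speedup_pct / 100)

# Reported as "over 15,000" tokens/s per node, so this baseline is a lower bound.
blackwell_tps = 15_000
h100_tps = implied_baseline(blackwell_tps, 90)
print(f"Implied H100 baseline: ~{h100_tps:,.0f} tokens/s per node")
```

Under these assumptions the implied H100 baseline is roughly 7,900 tokens per second per node.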

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Could speed up AI training and inference, potentially lowering costs and enabling larger models and faster development cycles.

RANK_REASON This is a significant announcement about infrastructure optimization and performance gains for AI training, involving major hardware and software components.

Source: Together AI blog
