
Nous Research cuts LLM pre-training time by 2.5x with Token Superposition

Nous Research has developed Token Superposition Training (TST), a method designed to significantly accelerate the pre-training of large language models. The technique reduces pre-training time by up to 2.5x for models ranging from 270 million to 10 billion parameters, without altering the model's architecture or how it performs inference. TST achieves this by modifying the training loop in two phases: an initial 'superposition' phase in which contiguous token embeddings are averaged into bags and processed together, followed by a 'recovery' phase that reverts to standard next-token training. In experiments, TST reached a lower final training loss with substantially less compute time than conventional pre-training.
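
None of the sources include code, so the following is only a minimal PyTorch sketch of the two-phase loop the summary describes. The bag size, the 80/20 phase split, the soft multi-token loss used during the superposition phase, the tiny model dimensions, and the random stand-in data are illustrative assumptions rather than details of Nous Research's TST, and causal masking is omitted for brevity.

```python
# Illustrative sketch of a two-phase "token superposition" pre-training loop.
# NOT Nous Research's implementation: bag size, phase split, and the Phase-1
# loss target are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, seq_len, bag_size = 1000, 64, 128, 4

embed = nn.Embedding(vocab_size, d_model)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = nn.Linear(d_model, vocab_size)
params = list(embed.parameters()) + list(backbone.parameters()) + list(lm_head.parameters())
opt = torch.optim.AdamW(params, lr=3e-4)

def superposed_step(tokens):
    """Phase 1: average contiguous token embeddings into bags, so the backbone
    processes seq_len / bag_size positions instead of seq_len."""
    x = embed(tokens)                                           # (B, T, d)
    B, T, d = x.shape
    bags = x.view(B, T // bag_size, bag_size, d).mean(dim=2)    # (B, T/bag, d)
    h = backbone(bags)
    logits = lm_head(h[:, :-1])                                 # predict the following bag
    # Assumed objective: score every token of the next bag (soft multi-token loss).
    targets = tokens.view(B, T // bag_size, bag_size)[:, 1:]    # (B, T/bag - 1, bag)
    logp = F.log_softmax(logits, dim=-1)
    return -logp.gather(-1, targets).mean()

def standard_step(tokens):
    """Phase 2 ('recovery'): ordinary next-token prediction at full resolution."""
    x = embed(tokens)
    h = backbone(x)
    logits = lm_head(h[:, :-1])
    return F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))

total_steps, phase1_frac = 200, 0.8   # assumed 80/20 split between the two phases
for step in range(total_steps):
    tokens = torch.randint(0, vocab_size, (8, seq_len))   # stand-in for real data
    loss = superposed_step(tokens) if step < phase1_frac * total_steps else standard_step(tokens)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

One plausible source of the wall-clock saving is visible in superposed_step: during the superposition phase the backbone processes seq_len / bag_size positions rather than seq_len, while the recovery phase restores ordinary next-token prediction, so the trained model is used at inference exactly like a conventionally trained one.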

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Accelerates LLM pre-training, potentially reducing compute costs and time for developing new large language models.

RANK_REASON Research paper detailing a novel method for accelerating LLM pre-training.

Read on MarkTechPost →

COVERAGE [4]

  1. MarkTechPost TIER_1 · Asif Razzaq ·

    Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

    Nous Research releases Token Superposition Training (TST), a two-phase pre-training method that cuts wall-clock training time by up to 2.5x at matched FLOPs by averaging contiguous token embeddings into bags during Phase 1 and reverting to standard next-token prediction in Pha…

  2. Mastodon — mastodon.social TIER_1 · [email protected] ·

    Nous Research has released Token Superposition Training, a technique that speeds up LLM pre-training by up to 2.5x across models from 270M to 10B parameters. The approach could reduce compute costs significantly for AI labs. https://www.marktechpost.com/2026/05/13/nous-research…

  3. Mastodon — mastodon.social TIER_1 · aihaberleri ·

    📰 Token Superposition Training: Nous Research Speeds LLM Pre-Training 2.5x in 2026 Nous Research has unveiled Token Superposition Training (TST), a novel two-phase method that accelerates large language model pre-training by up to 2.5 times without altering model architecture or …

  4. Mastodon — mastodon.social TIER_1 Türkçe(TR) · aihaberleri ·

    📰 LLM Training Speeds Up 250% with Token Superposition (2026) Nous Research has announced Token Superposition Training, a groundbreaking method that accelerates pre-training of large language models (LLMs) by up to 2.5x across 270M to 10B parameters. The technique draws on existing superposition theo…