NVIDIA enables 4-bit LLM pretraining with NVFP4 methodology

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

NVIDIA has developed a new 4-bit pretraining methodology called NVFP4, designed to overcome the challenges of reduced dynamic range and increased quantization error in narrower floating-point formats. This method, supported by NVIDIA's Blackwell Tensor Cores, was validated by successfully pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens. The resulting model achieved performance comparable to an FP8 baseline, demonstrating the viability of 4-bit precision for large-scale model training.
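To make the dynamic-range problem concrete, here is a minimal sketch of block-scaled 4-bit quantization. It is not NVIDIA's implementation. The value grid is the standard E2M1 FP4 set of magnitudes. The block size of 16, the plain Python float used as the scale, and the helper names `quantize_block` and `quantize_dequantize` are assumptions made for illustration and do not come from the summary.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign bit handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]


def quantize_block(block):
    """Quantize one block to the FP4 grid with a shared scale.

    Returns (quantized grid values, scale). The scale is kept as an ordinary
    float here; a real microscaling format would store it in a narrow type.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block), 1.0
    scale = amax / FP4_MAX  # map the block's largest magnitude onto 6.0
    quantized = []
    for x in block:
        # Round-to-nearest on the grid (ties go to the smaller magnitude).
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(math.copysign(mag, x))
    return quantized, scale


def quantize_dequantize(values, block_size=16):
    """Quantize block by block, then map back to real values."""
    out = []
    for start in range(0, len(values), block_size):
        q, scale = quantize_block(values[start:start + block_size])
        out.extend(v * scale for v in q)
    return out


if __name__ == "__main__":
    # One block of small values next to one block of large values.
    vals = [0.01 * i for i in range(16)] + [100.0 * i for i in range(16)]
    print(quantize_dequantize(vals, block_size=16)[:16])  # small values survive
    print(quantize_dequantize(vals, block_size=32)[:16])  # small values flush to zero
```

With a separate scale per 16-element block, the small values keep some resolution. With a single scale across all 32 values, the large entries set the range and every small entry rounds to zero.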


IMPACT Enables more efficient training of large language models by reducing precision requirements, potentially lowering compute costs and accelerating development.

RANK_REASON The cluster describes a new pretraining methodology and its validation, presented as a research finding.


infra
paper


COVERAGE [1]

MarkTechPost TIER_1 · Asif Razzaq · 2026-05-18 08:42

NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

NVIDIA introduces a 4-bit pretraining methodology built around the NVFP4 microscaling format, combining selective BF16 layers, 16×16 Random Hadamard Transforms on Wgrad inputs, 2D weight scaling, and stochastic rounding on gradients, validated on a 12B hybrid Mamba-Transform…
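Two of the ingredients named in the excerpt, stochastic rounding and a 16×16 random Hadamard transform, can be sketched generically as follows. This is an illustration under stated assumptions, not NVIDIA's kernels. The function names, the Sylvester construction, and the random sign vector are choices made for this example, and the code does not show where in the training graph each step is applied.

```python
import math
import random

# Magnitudes representable by a 4-bit E2M1 float.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def stochastic_round(x, rng):
    """Round x to a neighbouring grid point, unbiased in expectation.

    The upper neighbour is chosen with probability equal to the fractional
    distance from the lower one; out-of-range inputs saturate at 6.0.
    """
    mag = min(abs(x), FP4_GRID[-1])
    lo = max(g for g in FP4_GRID if g <= mag)
    hi = min(g for g in FP4_GRID if g >= mag)
    if hi == lo:
        return math.copysign(lo, x)
    p_up = (mag - lo) / (hi - lo)
    return math.copysign(hi if rng.random() < p_up else lo, x)


def hadamard(n):
    """Orthonormal Hadamard matrix via Sylvester construction (n a power of 2)."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    norm = 1.0 / math.sqrt(n)
    return [[v * norm for v in row] for row in h]


def random_hadamard_transform(vec, rng):
    """Apply random sign flips, then the Hadamard matrix, to one vector."""
    n = len(vec)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    flipped = [s * v for s, v in zip(signs, vec)]
    h = hadamard(n)
    return [sum(h[i][j] * flipped[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(2.5, rng) for _ in range(20000)]
    print(sum(samples) / len(samples))  # close to 2.5 on average

    spike = [0.0] * 16
    spike[3] = 8.0  # a single outlier
    print(random_hadamard_transform(spike, rng))  # energy spread over all 16 entries
```

Stochastic rounding leaves the average of the rounded values close to the input, which matters when many small gradient contributions are accumulated. The orthonormal transform spreads a single outlier across the whole 16-element vector without changing its norm, so a shared block scale wastes less range on one extreme value.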

