LLMs are more than just next token predictors, argues LessWrong

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Describing Large Language Models (LLMs) as merely "next token predictors" is misleading, according to a LessWrong post by Adam Newgas. Pre-training does consist of predicting the next token in a sequence, but predicting accurately forces a model to learn grammar, complex language structure, and even factual information. The author argues that this training regime, applied at scale, pushes LLMs beyond simple guessing and towards a more sophisticated understanding.
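To make the training objective concrete, here is a minimal sketch of next-token prediction using a toy bigram counter. It is not taken from the LessWrong post and is not how real LLMs are built: production models use neural networks conditioned on long contexts, and the function names, corpus, and count-based approach below are all illustrative assumptions. The point is only that the loss is low when the model has captured regularities in the text.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each previous token (hypothetical helper)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the raw counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

def avg_cross_entropy(counts, tokens):
    """Average negative log-probability assigned to each actual next token."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_token_probs(counts, prev).get(nxt, 0.0)
        losses.append(-math.log(p) if p > 0 else float("inf"))
    return sum(losses) / len(losses)

# Toy corpus, assumed for illustration only.
corpus = "the cat sat on the mat".split()
counts = train_bigram(corpus)
probs = next_token_probs(counts, "the")   # "the" is followed by "cat" or "mat"
loss = avg_cross_entropy(counts, corpus)  # loss on the training text itself
```

In this toy case the only uncertainty is what follows "the", so the average loss is small but nonzero. Driving that number down on large, varied text is what, per the post's argument, requires a model to pick up structure and facts rather than guess.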


IMPACT Challenges a common framing of LLM capabilities, potentially influencing how their limitations are discussed.

RANK_REASON Opinion piece arguing against a common framing of LLMs.


COVERAGE [1]

LessWrong (AI tag) TIER_1 · Adam Newgas · 2026-05-17 11:58

Next Token Prediction is a Misleading Term

I’m fed up of hearing about how LLMs are next token predictors, and therefore they <cannot do some task> <aren’t really doing cognition> <are just guessing>. There’s lots of philosophical objections, b…

