Ollama v0.23.1 adds Gemma 4 MTP for faster coding on Macs

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Ollama has released version 0.23.1, which adds Gemma 4 MTP (Multi-token Processing) speculative decoding to the MLX runner on Macs. According to the release notes, this can give a speedup of over 2x for the Gemma 4 31B model on coding tasks. The update also includes threading fixes for MLX and MLX-C.
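To make the speedup claim concrete, here is a minimal sketch of greedy speculative decoding in general. It is not Ollama's or MLX's implementation: the "models" are toy integer functions, the names `speculative_decode`, `target_model`, and `draft_model` are hypothetical, and exact-match acceptance stands in for the probabilistic acceptance rule that production runners typically use. The idea is that a cheap draft model proposes several tokens, and the expensive target model checks them together, so the output is unchanged while the number of target passes drops.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding with toy next-token functions.

    Returns the generated sequence and the number of verification rounds.
    In a real runner each round would be a single batched forward pass of
    the target model; here it is simulated with per-token calls.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The draft model cheaply proposes k tokens.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target model verifies the proposal (one round).
        rounds += 1
        ctx = list(out)
        accepted = []
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            # All k tokens matched: the same round yields one extra token.
            accepted.append(target(ctx))
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


def target_model(ctx: List[Token]) -> Token:
    # Toy "large" model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[Token]) -> Token:
    # Toy "small" model: agrees with the target except after a 7.
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


tokens, rounds = speculative_decode(target_model, draft_model, [0], max_new=12)
baseline = greedy_decode(target_model, [0], max_new=12)
```

In this toy run the speculative output is identical to the baseline while needing far fewer verification rounds than the 12 sequential calls of plain greedy decoding. The gain in practice depends on how often the draft agrees with the target, which may be why the release notes single out coding tasks.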

IMPACT Speeds up Gemma 4 inference on Mac hardware through Ollama's MLX runner, potentially shortening development workflows.

RANK_REASON This is a software release for a tool that facilitates running models, not a release of a frontier model itself.

COVERAGE [1]

Mastodon — fosstodon.org TIER_1 · 2026-05-05 21:37

⚙️ New Ollama Release! ⚙️
Version: v0.23.1
Release Notes:
## Gemma 4 MTP (Multi-token Processing) for the MLX runner
Gemma 4 MTP speculative decoding is now supported on Macs. This can give over a 2x speed increase for the Gemma 4 31B model on coding tasks.
ollama run gemma4:…

LINKS github.com/…/15845 github.com/…/pulls
