Llama.cpp adds MTP for Mac, improves offline builds

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 2 sources

The llama.cpp project has introduced Multi-Token Prediction (MTP) support on its Metal backend for Mac hardware, showing potential gains in token generation speed. Initial tests on an M2 Ultra indicate that prompt processing speed remains consistent, while token generation speed becomes more variable with MTP enabled, especially at higher context lengths. The coverage also addresses building llama.cpp on air-gapped Macs, which requires specific flags to disable UI downloads during the build process.

IMPACT Improves performance and usability for local LLM inference on Mac hardware.

RANK_REASON The article discusses improvements and features for an existing open-source software project, rather than a new model release or significant industry-wide event.

COVERAGE [2]

dev.to — LLM tag TIER_1 · SomeOddCodeGuy · 2026-05-18 00:13

Llama.cpp's New MTP on MacOS

So I decided to test out the new MTP in llama.cpp on Metal using my M2 Ultra, and figured I'd toss the results up here. This isn't meant to show the maximum tps you can get on Mac hardware; I'd have run it on the M5 Max or M3 Ultra if that were the case. My goal…

dev.to — LLM tag TIER_1 · SomeOddCodeGuy · 2026-05-18 00:03

Building and Running Llama.cpp on an Air-Gapped Mac

If you ever tried to run Llama.cpp on a MacOS device that doesn't have internet on it, you've probably hit the annoying GateKeeper errors that it's downloaded from the internet and you should delete it. Generally I just build from source to avoid that, but I ran into something…
