The llama.cpp project has introduced a new Metal Performance Tensors (MTP) feature for Mac hardware that shows potential gains in token generation speed. Initial tests on an M2 Ultra indicate that prompt processing speed is unchanged with MTP enabled, while token generation speed becomes more variable, especially at longer context lengths. Separately, the project has addressed problems with building llama.cpp on air-gapped Macs. Such builds require specific flags that disable UI downloads during the build process.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Improves performance and usability for local LLM inference on Mac hardware.
RANK_REASON The article discusses improvements and features for an existing open-source software project, rather than a new model release or significant industry-wide event.