A user on Mastodon shared a tip for improving performance in llama.cpp, a popular inference engine for large language models. The key suggestion is the "-ncmoe" flag (short for "--n-cpu-moe"), which keeps the expert weights of a mixture-of-experts model in system RAM while the rest of the model runs on the GPU. This is reported to be crucial for getting good throughput from MoE models on setups with only 8GB or 12GB of VRAM.
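A rough sketch of how the flag is typically used (the model file and layer count below are illustrative placeholders, not from the source):

```shell
# Offload all layers to the GPU with -ngl, then use --n-cpu-moe (-ncmoe)
# to keep the MoE expert weights of the first N layers in system RAM.
# The dense attention/shared weights stay on the GPU, so they can fit
# in 8-12 GB of VRAM; tune N upward until the model no longer OOMs.
./llama-server -m some-moe-model-q4_k_m.gguf -ngl 99 --n-cpu-moe 24
```

Because the expert weights dominate an MoE model's size but only a few experts are active per token, moving them to CPU costs far less speed than offloading whole layers would.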
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This optimization tip could make large mixture-of-experts models practical to run on consumer-grade hardware with limited VRAM.
RANK_REASON A user-shared tip for optimizing a specific software tool.