Topic

#llama-cpp

5 articles exploring llama-cpp. Expert insights and analysis from our editorial team.


Articles

Open Source

Off Grid v0.0.88 Ships Hexagon HTP Acceleration: Auditability Is the Real Edge Over Apple Intelligence

Off Grid v0.0.88 ships Hexagon HTP/NPU text acceleration with a self-reported 3× speed gain. Auditability of the MIT source is its genuine advantage over Apple Intelligence.

Developer Tools

LiteRT-LM v0.10.1 Ships Gemma 4 MTP Heads That llama.cpp Can't Access

LiteRT-LM v0.10.1 ships Gemma 4 with Qualcomm NPU acceleration, but Google stripped MTP heads from public weights, locking peak Gemma 4 throughput to its own runtime.

Infrastructure & Runtime

MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference

MLX delivers 20–87% faster generation on Apple Silicon for models under 14B parameters; llama.cpp wins for cross-platform use and long contexts.

9 min read
Agents & Frameworks

GGML Joins Hugging Face: What It Means for Local AI

Hugging Face acquired ggml-org, the team behind llama.cpp, on February 20, 2026. This strategic move ensures the long-term sustainability of the world's most popular local AI inference framework while accelerating its integration with the broader ML ecosystem.

8 min read
Infrastructure & Runtime

The Complete Guide to Local LLMs in 2026

Why [running AI on your own hardware](/articles/vllm-block-level-preemption-and-flexkv-shift-the-long-context-bottleneck-from/) is becoming the default choice for privacy-conscious developers and enterprises alike.