Ollama is now powered by MLX on Apple Silicon in preview
Ollama is now optimized for Apple Silicon through MLX, which matters for local LLM inference.
The Ollama 0.19 preview on Apple Silicon uses MLX to reach up to 1810 tokens/s prefill and 112 tokens/s decode with Qwen3.5-35B-A3B in NVFP4 format, about double the speed of 0.18. It takes advantage of the M5's GPU Neural Accelerators and unified memory, and adds improved caching for coding agents like Claude Code. A Mac with more than 32GB of RAM is needed for optimal performance.
Upgrade to Ollama 0.19 on Apple Silicon to leverage MLX acceleration and NVFP4 for faster local inference with coding agents.
The update improves local inference performance for AI agent development, enabling faster iteration with coding agents like Claude Code without cloud latency or costs.
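To check the prefill and decode figures on your own machine, one option is to read the timing fields that Ollama's `/api/generate` endpoint returns with a non-streaming response. The sketch below is a minimal example, not an official benchmark: it assumes a local server on the default port, the helper names `throughput` and `measure` are made up for illustration, and the model tag you pass in is whatever you have pulled locally.

```python
import json
import urllib.request

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def throughput(stats):
    """Return (prefill_tps, decode_tps) from a /api/generate response dict.

    prompt_eval_* covers prompt processing (prefill);
    eval_* covers token generation (decode).
    """
    prefill = stats["prompt_eval_count"] / (stats["prompt_eval_duration"] / NS_PER_S)
    decode = stats["eval_count"] / (stats["eval_duration"] / NS_PER_S)
    return prefill, decode


def measure(model, prompt, host="http://localhost:11434"):
    """Send one non-streaming request to a local Ollama server and time it.

    Assumption: the server is running and `model` is already pulled.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return throughput(json.load(resp))
```

Calling `measure("<your-model-tag>", "<a long prompt>")` on 0.18 and again on 0.19 gives a like-for-like comparison. A long prompt makes the prefill number more meaningful, and a prompt that is already cached may report little or no prompt evaluation, so vary the prompt between runs.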