Ollama is now powered by MLX on Apple Silicon in preview

8.3 relevance

Ollama's preview build for Apple Silicon now runs on MLX, which matters for local LLM inference.

2026-03-31 general Hacker News (100+)

Summary

The Ollama 0.19 preview on Apple Silicon uses MLX to reach up to 1810 tokens/s prefill and 112 tokens/s decode with Qwen3.5-35B-A3B in NVFP4 format, twice the speed of 0.18. It takes advantage of the M5's GPU Neural Accelerators and unified memory, and adds improved caching for coding agents such as Claude Code. A Mac with more than 32 GB of RAM is recommended for best performance.
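To check prefill and decode throughput on your own machine, one option is to read the timing fields that Ollama's `/api/generate` endpoint returns (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The following is a minimal sketch, not the benchmark method behind the figures above: the helper names `throughput` and `generate` are illustrative, the default host assumes a local server on port 11434, and the model tag is a hypothetical placeholder to replace with whatever tag you have pulled.

```python
import json
import urllib.request

NS_PER_S = 1e9  # Ollama reports durations in nanoseconds


def throughput(resp: dict) -> dict:
    """Compute prefill and decode tokens/s from an /api/generate response."""

    def rate(count_key: str, duration_key: str) -> float:
        count = resp.get(count_key, 0)
        duration_ns = resp.get(duration_key, 0)
        # A field can be absent or zero (e.g. a fully cached prompt); report 0.0 then.
        return count / (duration_ns / NS_PER_S) if duration_ns else 0.0

    return {
        "prefill_tok_s": rate("prompt_eval_count", "prompt_eval_duration"),
        "decode_tok_s": rate("eval_count", "eval_duration"),
    }


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)


if __name__ == "__main__":
    # Hypothetical model tag: substitute the tag of a model you have pulled.
    resp = generate("your-model-tag", "Explain unified memory in one paragraph.")
    print(throughput(resp))
```

A single short prompt gives a noisy reading, especially for prefill; averaging several runs with a long prompt gives a number that is easier to compare across versions.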

Key Takeaway

Upgrade to Ollama 0.19 on Apple Silicon to leverage MLX acceleration and NVFP4 for faster local inference with coding agents.

Why it matters

Faster local inference shortens the iteration loop for AI agent development: coding agents such as Claude Code can run against a local model without cloud latency or cost.