
How I Replaced Gemini with a Self-Hosted LLM for Two Production Apps

Relevance: 7.3
Score Breakdown
technical depth: 8
novelty: 7
actionability: 8
community: 5
strategic: 5
personal: 9


A practical, directly relevant guide to replacing a hosted LLM with a self-hosted one.

AI/ML dev.to
Summary

A developer migrated two production apps, a terminal-style portfolio and the PayChasers email generator, from Gemini 3 Flash to a self-hosted Qwen 3.5 model served through Ollama. The motivations were cost shape, privacy, and the desire to treat inference as shared infrastructure rather than a metered API. The model runs on a Mac mini at home, exposed through a Cloudflare Tunnel reverse proxy with no open ports, and an Oracle Cloud ARM instance serves as a fallback. The move required building a lightweight proxy and accepting the security tradeoffs of routing production data through personal hardware.
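The summary does not show the proxy itself, so the following is only a minimal sketch of the primary-plus-fallback idea, assuming both machines expose Ollama's standard `/api/generate` endpoint. The host names, the model tag, and the function names (`build_request`, `http_send`, `generate`) are hypothetical placeholders, not the author's code.

```python
import json
import urllib.request

# Hypothetical endpoints: placeholders, not the author's real hosts.
PRIMARY_URL = "https://llm.example.com"          # e.g. home machine behind a tunnel
FALLBACK_URL = "https://llm-backup.example.com"  # e.g. cloud ARM instance
MODEL = "qwen-placeholder"  # assumed tag; use whatever `ollama list` reports


def build_request(base_url, prompt, model=MODEL):
    """Build a non-streaming POST for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def http_send(req, timeout=30.0):
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate(prompt, backends=(PRIMARY_URL, FALLBACK_URL), send=http_send):
    """Try each backend in order and return the first generated text."""
    last_error = None
    for base_url in backends:
        try:
            return send(build_request(base_url, prompt))["response"]
        except (OSError, ValueError, KeyError) as exc:
            # Network failure, malformed JSON, or missing field: try the next one.
            last_error = exc
    raise RuntimeError("all backends failed") from last_error
```

Passing `send` as a parameter keeps the fallback logic testable without a network. A real deployment would likely also need authentication in front of the tunnel, which this sketch leaves out.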

Author

Simangaliso Vilakazi
