How I Replaced Gemini with a Self-Hosted LLM for Two Production Apps

A practical guide to replacing a hosted LLM with a self-hosted one.

AI/ML dev.to

Summary

A developer migrated two production apps, a terminal-style portfolio and the PayChasers email generator, from Gemini 3 Flash to a self-hosted Qwen 3.5 model served by Ollama. Three things drove the move: the cost shape of a metered API, privacy, and the desire to treat inference as shared infrastructure. The model runs on a Mac mini at home and is exposed through a Cloudflare Tunnel, so no inbound ports are open, while an Oracle Cloud ARM instance serves as a fallback. The move required building a lightweight proxy and accepting the security tradeoffs of routing production data through personal hardware.
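The primary-plus-fallback arrangement described above could be sketched roughly as follows. This is an illustration only, not the author's proxy: the hostnames, the `qwen3.5` model tag, and the function names are assumptions, and the summary does not say where the failover logic actually lives. The only real API used is Ollama's `/api/generate` endpoint with `stream` set to false, which returns the generated text in the `response` field.

```python
import json
import urllib.request
from typing import Callable, Sequence

# Hypothetical endpoints; the summary does not give real hostnames.
PRIMARY_URL = "https://llm.example.com"          # Mac mini behind the tunnel (assumed)
FALLBACK_URL = "https://llm-backup.example.com"  # Oracle Cloud ARM instance (assumed)
MODEL_TAG = "qwen3.5"  # assumed Ollama model tag, not confirmed by the source


def ollama_generate(base_url: str, prompt: str, timeout: float = 30.0) -> str:
    """POST a prompt to Ollama's /api/generate and return the generated text."""
    payload = json.dumps(
        {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


def generate_with_fallback(
    prompt: str,
    backends: Sequence[str] = (PRIMARY_URL, FALLBACK_URL),
    send: Callable[[str, str], str] = ollama_generate,
) -> str:
    """Try each backend in order and return the first successful completion."""
    errors = []
    for base_url in backends:
        try:
            return send(base_url, prompt)
        except (OSError, ValueError, KeyError) as exc:
            # OSError covers connection failures, timeouts, and urllib's
            # URLError/HTTPError; ValueError and KeyError cover malformed replies.
            errors.append(f"{base_url}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

An app would call `generate_with_fallback("Draft a payment reminder email")` and never need to know which machine answered. Passing `send` as a parameter keeps the failover logic testable without a running Ollama server.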

Author

Simangaliso Vilakazi