Building a Local LLM Gateway on a Mac Mini
Why a gateway at all
The cluster has ten pods that want to talk to an LLM, and I don’t want any of them holding API keys. Instead I stood up a small OpenAI-compatible proxy on the Mac Mini next to the Kubernetes box and pointed everything at it over WireGuard. This post is the boring, production-flavored version of that build.
If you haven’t read the intro post yet, that’s where this blog — and the broader #homelab project — kicks off.
Architecture
One service, three responsibilities: auth, routing, streaming. Clients speak the OpenAI /v1/chat/completions shape; the proxy rewrites requests to Ollama's native API where it has to and forwards the stream back untouched.
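The rewrite step can be sketched as a single translation function. This is a hypothetical sketch, not the gateway's actual code: the function name and defaults are made up, but the field names follow the public OpenAI chat-completions schema on one side and Ollama's /api/chat payload (with sampling knobs under "options") on the other.

```python
# Hypothetical sketch: translate an OpenAI-style chat request into
# Ollama's native /api/chat payload. openai_to_ollama is an invented
# name; the field mapping reflects the two public request schemas.

def openai_to_ollama(body: dict) -> dict:
    """Rewrite an OpenAI chat-completions request for Ollama's /api/chat."""
    out = {
        "model": body["model"],
        "messages": body["messages"],  # same role/content shape on both sides
        "stream": body.get("stream", False),
    }
    # OpenAI's top-level sampling parameters move under Ollama's "options".
    options = {}
    if "temperature" in body:
        options["temperature"] = body["temperature"]
    if "max_tokens" in body:
        options["num_predict"] = body["max_tokens"]
    if options:
        out["options"] = options
    return out
```

Because the message list passes through unchanged, the streamed response can be forwarded chunk-for-chunk; only the request envelope needs touching.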
Things I care about:
- Latency (local LAN, sub-10ms)
- Per-service quotas (tokens per minute, not requests per minute)
- Model allow-lists — I don’t want a rogue job pulling 70B weights
Deployment
The only thing running on bare metal is the Mac Mini. Everything else — the gateway’s Kubernetes shadow, the cert-manager Issuer, the WireGuard sidecar — lives in the cluster. More #ai and #ollama notes coming as I wire up the dashboards.
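The "Kubernetes shadow" pattern is worth spelling out: a selector-less Service in the cluster, backed by a manually managed EndpointSlice that points at the Mac Mini's WireGuard address. The manifest below is a hedged sketch; the namespace, port, and 10.8.0.2 address are placeholders, not the real values.

```yaml
# Hypothetical shadow for the gateway: pods dial llm-gateway.ai.svc as
# if it were in-cluster, and traffic lands on the Mac Mini over WireGuard.
apiVersion: v1
kind: Service
metadata:
  name: llm-gateway
  namespace: ai
spec:
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: llm-gateway-1
  namespace: ai
  labels:
    kubernetes.io/service-name: llm-gateway  # binds the slice to the Service
addressType: IPv4
ports:
  - port: 8080
endpoints:
  - addresses:
      - 10.8.0.2  # placeholder WireGuard address of the Mac Mini
```

Because the Service has no selector, Kubernetes never tries to reconcile the endpoints itself; the slice is the single place the Mac Mini's address lives.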
