Building a Local LLM Gateway on a Mac Mini
Why a gateway at all
The cluster has ten pods that want to talk to an LLM, and I don’t want any of them holding API keys. Instead I stood up a small OpenAI-compatible proxy on the Mac Mini next to the Kubernetes box and pointed everything at it over WireGuard. This post is the boring, production-flavored version of that build.
If you haven’t read the intro post yet, that’s where this blog — and the broader #homelab project — kicks off.
Architecture
One service, three responsibilities: auth, routing, streaming. Clients speak the OpenAI /v1/chat/completions shape; the proxy rewrites requests to Ollama's native API where it has to and forwards the stream back untouched.
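The rewrite step can be sketched as a single translation function. This is a hypothetical sketch, not the gateway's actual code: the function name and defaults are made up, but the field names follow the public OpenAI chat-completions schema on one side and Ollama's /api/chat payload (with sampling knobs under "options") on the other.

```python
# Hypothetical sketch: translate an OpenAI-style chat request into
# Ollama's native /api/chat payload. openai_to_ollama is an invented
# name; the field mapping reflects the two public request schemas.

def openai_to_ollama(body: dict) -> dict:
    """Rewrite an OpenAI chat-completions request for Ollama's /api/chat."""
    out = {
        "model": body["model"],
        "messages": body["messages"],  # same role/content shape on both sides
        "stream": body.get("stream", False),
    }
    # OpenAI's top-level sampling parameters move under Ollama's "options".
    options = {}
    if "temperature" in body:
        options["temperature"] = body["temperature"]
    if "max_tokens" in body:
        options["num_predict"] = body["max_tokens"]
    if options:
        out["options"] = options
    return out
```

Because the message list passes through unchanged, the streamed response can be forwarded chunk-for-chunk; only the request envelope needs touching.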
Things I care about:
- Latency (local LAN, sub-10ms)
- Per-service quotas (tokens per minute, not requests per minute)
- Model allow-lists — I don’t want a rogue job pulling 70B weights
Deployment
The only thing running on bare metal is the Mac Mini. Everything else — the gateway’s Kubernetes shadow, the cert-manager Issuer, the WireGuard sidecar — lives in the cluster. More #ai and #ollama notes coming as I wire up the dashboards.
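The "Kubernetes shadow" pattern is worth spelling out: a selector-less Service in the cluster, backed by a manually managed EndpointSlice that points at the Mac Mini's WireGuard address. The manifest below is a hedged sketch; the namespace, port, and 10.8.0.2 address are placeholders, not the real values.

```yaml
# Hypothetical shadow for the gateway: pods dial llm-gateway.ai.svc as
# if it were in-cluster, and traffic lands on the Mac Mini over WireGuard.
apiVersion: v1
kind: Service
metadata:
  name: llm-gateway
  namespace: ai
spec:
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: llm-gateway-1
  namespace: ai
  labels:
    kubernetes.io/service-name: llm-gateway  # binds the slice to the Service
addressType: IPv4
ports:
  - port: 8080
endpoints:
  - addresses:
      - 10.8.0.2  # placeholder WireGuard address of the Mac Mini
```

Because the Service has no selector, Kubernetes never tries to reconcile the endpoints itself; the slice is the single place the Mac Mini's address lives.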
