The M4 Mac Mini starts at $599. After 3 months of running it as an always-on AI server, here's the honest verdict: electricity costs about $6/year, it runs Qwen 2.5 7B at 35–42 tok/s (3× faster than a Hetzner CX32), and it breaks even against a Hetzner CX32 subscription in about 80 months.
The Purchase Decision in Numbers
| Platform | 1 Year | 3 Years | 5 Years |
|---|---|---|---|
| Mac Mini M4 (incl. power) | $605 | $617 | $629 |
| Hetzner CX32 (€6.80/mo) | $88 | $264 | $440 |
| DigitalOcean ($24/mo) | $288 | $864 | $1,440 |
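Break-even against the Hetzner CX32: ~80 months (6.7 years) on the purchase price alone, and closer to 88 months once the $6/year of power is counted. Against the $24/mo DigitalOcean droplet it is about 25 months. You don't buy the Mac for the savings. You buy it for the speed, the privacy, and the permanence.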
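The break-even figure is plain arithmetic, so it is worth rerunning with your own prices. The following is a minimal sketch that assumes the table's numbers ($599 hardware, $6/year power, $88/year for the CX32, $24/month for DigitalOcean); the function name `breakeven_months` is illustrative, not part of any tool.

```sh
# breakeven_months HARDWARE_USD VPS_USD_PER_MONTH POWER_USD_PER_YEAR
# Prints the month in which cumulative VPS fees catch up with hardware + power.
breakeven_months() {
  awk -v hw="$1" -v vps="$2" -v power="$3" 'BEGIN {
    saving = vps - power / 12          # net monthly saving from owning the box
    if (saving <= 0) { print "never"; exit }
    printf "%.0f\n", hw / saving
  }'
}

breakeven_months 599 7.33 0   # CX32, ignoring power: prints 82
breakeven_months 599 7.33 6   # CX32, with $6/year power: prints 88
breakeven_months 599 24 6     # DigitalOcean at $24/month: prints 25
```

Counting power moves the CX32 result by a few months, which is why any single break-even number should be read as approximate.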
Power Consumption: Real Numbers
| Scenario | Watts | kWh/year | Cost/year* |
|---|---|---|---|
| Idle (no inference) | 4 W | 35 | $3.50 |
| Active inference (7B) | 15–20 W | — | — |
| Always-on, 4h inference/day | 6–8 W avg | 52–70 | $5–7 |
*Assumes $0.10/kWh; the US residential average is higher, so scale accordingly. Even at $0.30/kWh (California, much of Europe) the always-on figure is $15–21/year, which is still trivial.
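To adapt the table to your own tariff, the conversion is watts × 8,760 hours ÷ 1,000 × price per kWh. A small sketch follows; the helper names `annual_power` and `average_watts` are made up for illustration, and the inputs are the wattages from the table.

```sh
# annual_power WATTS USD_PER_KWH -> yearly energy and cost for a constant draw
annual_power() {
  awk -v w="$1" -v price="$2" 'BEGIN {
    kwh = w * 24 * 365 / 1000
    printf "%.0f kWh/year, $%.2f/year\n", kwh, kwh * price
  }'
}

# average_watts IDLE_W ACTIVE_W ACTIVE_HOURS_PER_DAY -> time-weighted average draw
average_watts() {
  awk -v idle="$1" -v active="$2" -v h="$3" 'BEGIN {
    printf "%.1f\n", (active * h + idle * (24 - h)) / 24
  }'
}

annual_power 4 0.10     # idle only: prints "35 kWh/year, $3.50/year"
average_watts 4 20 4    # 4 h/day of inference at 20 W: prints 6.7
```

Feeding the averaged wattage back into `annual_power` with your local price gives the "always-on" row for your situation.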
Performance: MLX vs Ollama
The Mac Mini's real advantage is MLX, Apple's machine-learning framework built for Apple Silicon. In these tests it ran 3–4× faster than Ollama's CPU inference.
| Model | MLX (tok/s) | Ollama CPU (tok/s) | RTX 3060 (tok/s) |
|---|---|---|---|
| Qwen 2.5 7B Q4 | 35–42 | 8–12 | 42–50 |
| Llama 3.3 14B Q4 | 20–25 | 4–6 | 25–30 |
| Llama 3.3 70B Q4 | ✗ (OOM) | ✗ (OOM) | 8–12 |
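If you want to check the Ollama column on your own machine, the `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), and throughput is the ratio of the two. A hedged sketch: the helper name `tokens_per_second` is illustrative, and the commented pipeline assumes Ollama is running locally and `jq` is installed.

```sh
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) exit 1                 # refuse to divide by zero
    printf "%.1f\n", n / (ns / 1e9)     # nanoseconds -> seconds
  }'
}

tokens_per_second 380 10000000000   # 380 tokens in 10 s: prints 38.0

# On the Mac (assumes a running Ollama and jq):
#   curl -s http://localhost:11434/api/generate \
#     -d '{"model":"qwen2.5:7b-instruct-q4_k_m","prompt":"Hello","stream":false}' \
#     | jq -r '"\(.eval_count) \(.eval_duration)"' \
#     | { read n ns; tokens_per_second "$n" "$ns"; }
```

Run it a few times with a longer prompt and output; single short generations tend to give noisy numbers.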
Setup: MLX + Open WebUI
Step 1: Install MLX
# Install Python 3.11 and create a virtual environment with it
brew install python@3.11
python3.11 -m venv ~/venv-mlx
source ~/venv-mlx/bin/activate
pip install mlx-lm
# Test it immediately
python -m mlx_lm.generate --model mlx-community/Qwen2.5-7B-Instruct-4bit --prompt "Hello" --max-tokens 50
# Expect ~38 tokens/sec for generation. The first run also downloads the model (roughly 4 GB).
Step 2: Install Ollama (simpler, but slower)
brew install ollama
ollama serve &
ollama pull qwen2.5:7b-instruct-q4_k_m
# Ollama is simpler than MLX but 3–4× slower
# Use MLX if you want max speed, Ollama if you want simplicity
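Before moving on, it can help to confirm the pull actually landed. The sketch below parses the output of `ollama list`, assuming its usual layout of a header row followed by one row per model with the name in the first column; `has_model` is a hypothetical helper name.

```sh
# has_model NAME
# Reads `ollama list` output on stdin; succeeds if NAME appears in the first column.
has_model() {
  awk -v name="$1" '
    NR > 1 && $1 == name { found = 1 }   # skip the header row
    END { exit !found }
  '
}

# Usage on the Mac:
#   ollama list | has_model qwen2.5:7b-instruct-q4_k_m \
#     || ollama pull qwen2.5:7b-instruct-q4_k_m
```

This makes the pull step safe to rerun in a setup script without downloading the model twice.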
Step 3: Open WebUI via Docker
# Install Docker Desktop from docker.com
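# Start Open WebUI. Inside the container, localhost is the container itself,
# so point it at Ollama on the Mac via host.docker.internal.
# The named volume keeps accounts and chats across container upgrades.
docker run -d -p 8080:8080 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data --restart always --name open-webui ghcr.io/open-webui/open-webui:latest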
# Open http://localhost:8080 — create an account
# Your private ChatGPT interface is ready
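The container can take a little while to start serving on first launch, so a script that opens the UI right after `docker run` may hit a connection error. One way to handle that is a small retry helper. This is a generic sketch rather than anything Open WebUI provides; `retry` is an illustrative name.

```sh
# retry TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or TRIES attempts have been made.
retry() {
  tries=$1; delay=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    "$@" && return 0                      # success: stop retrying
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$tries" ]; then  # no sleep after the final failure
      sleep "$delay"
    fi
  done
  return 1
}

# Usage: wait up to about a minute for the UI to answer
#   retry 30 2 curl -fsS -o /dev/null http://localhost:8080 && echo "Open WebUI is up"
```

The helper uses plain global variables to stay POSIX-compatible, so avoid reusing the names `tries`, `delay`, and `attempt` in the calling script.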
Step 4: Auto-start on Boot
cat > ~/Library/LaunchAgents/com.local.ollama.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key><string>com.local.ollama</string>
<key>ProgramArguments</key>
<array>
<string>/opt/homebrew/bin/ollama</string>
<string>serve</string>
</array>
<key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.local.ollama.plist
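To confirm the agent took effect, `launchctl list` prints one line per job with three columns: PID (or `-` when the job is not running), last exit status, and label. A minimal sketch that interprets that output; `agent_state` is a hypothetical helper name.

```sh
# agent_state LABEL
# Reads `launchctl list` output on stdin and describes the job with that label.
agent_state() {
  awk -v label="$1" '
    $3 == label {
      found = 1
      state = ($1 == "-") ? "loaded, not running" : "running (pid " $1 ")"
      print state
    }
    END { if (!found) print "not loaded" }
  '
}

# Usage on the Mac:
#   launchctl list | agent_state com.local.ollama
```

If it reports "loaded, not running", checking that the path in ProgramArguments matches `which ollama` is a reasonable first step.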
Step 5: Share Across Your Home Network
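# Install Tailscale (free)
brew install tailscale
# The Homebrew formula needs its daemon running before the node can come up
sudo brew services start tailscale
tailscale up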
# Get your Tailscale IP
tailscale ip -4
# Output: 100.x.x.x
# From any device on your tailnet:
# Open WebUI: http://100.x.x.x:8080
# Ollama API: http://100.x.x.x:11434 (Ollama binds to 127.0.0.1 by default; start it with OLLAMA_HOST=0.0.0.0 to expose the API)
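A quick sanity check when scripting this: Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range, so the second octet falls between 64 and 127. The sketch below tests for that, which helps catch accidentally using a LAN address instead; `is_tailscale_ip` is an illustrative helper, not a Tailscale command.

```sh
# is_tailscale_ip ADDRESS
# Succeeds if ADDRESS is a dotted quad inside 100.64.0.0/10.
is_tailscale_ip() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { exit !($1 == 100 && $2 >= 64 && $2 <= 127) }
  '
}

# Usage on the Mac:
#   ip=$(tailscale ip -4)
#   is_tailscale_ip "$ip" && echo "Open WebUI: http://$ip:8080"
```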
3-Month Honest Verdict
Buy the Mac Mini if:
- You run Ollama 4+ hours/day
- Privacy matters: data stays at home
- You want the fastest inference under $1k
- You'll use it for 3+ years
- You want always-on with no monthly bill
Stay on a VPS if:
- You want the lowest 3-year cost (CX32 wins)
- You need 30B+ models (they need 32 GB+ of memory)
- You're still testing and not sure you'll use it
- You need a 99.95% uptime SLA
- Remote management is important to you
The honest recommendation: Start with a Hetzner CX32 ($88/year, no commitment). After 3 months of daily use, you'll know if you want the Mac Mini for the speed and permanence.