Mac Mini M4 for Ollama: 3 Months of Real Use — clawdVPS

The M4 Mac Mini starts at $599. After 3 months running it as an always-on AI server, here's the honest verdict: the electricity costs $6/year, it runs Qwen 2.5 7B at 35–42 tok/s (3× faster than a Hetzner CX32), and it breaks even against a VPS subscription in about 80 months.

The Purchase Decision in Numbers

Platform	1 Year	3 Years	5 Years
Mac Mini M4 (incl. power)	$605	$617	$629
Hetzner CX32 (€6.80/mo)	$88	$264	$440
DigitalOcean ($24/mo)	$288	$864	$1,440

Break-even: ~80 months (6.7 years). You don't buy the Mac for the savings. You buy it for the speed, the privacy, and the permanence.

🔓

Read the full guide — free

Enter your email to unlock this guide and all future ones. No spam, one click to unsubscribe.

Free forever. No credit card. Unsubscribe any time.

Power Consumption: Real Numbers

Scenario	Watts	kWh/year	Cost/year*
Idle (no inference)	4 W	35	$3.50
Active inference (7B)	15–20 W	—	—
Always-on, 4h inference/day	6–8 W avg	52–70	$5–7

*At $0.10/kWh US average. Even at $0.30/kWh (California, Europe) it's $15–21/year — still trivial.

Performance: MLX vs Ollama

The Mac Mini's real advantage is MLX — Apple's inference framework tuned for Apple Silicon. It's 3–4× faster than Ollama's CPU inference.

Model	MLX (tok/s)	Ollama CPU (tok/s)	RTX 3060
Qwen 2.5 7B Q4	35–42	8–12	42–50
Llama 3.3 14B Q4	20–25	4–6	25–30
Llama 3.3 70B Q4	✗ (OOM)	✗ (OOM)	8–12

Setup: MLX + Open WebUI

Step 1: Install MLX

# Install Python + MLX
brew install python@3.11
python3 -m venv ~/venv-mlx
source ~/venv-mlx/bin/activate
pip install mlx-lm

# Test it immediately
python -m mlx_lm.generate   --model mlx-community/Qwen2.5-7B-Instruct-4bit   --prompt "Hello"   --max-tokens 50
# Tokens/sec: ~38  ← first time includes download (~2 GB)

Step 2: Install Ollama (simpler, slightly slower)

brew install ollama
ollama serve &
ollama pull qwen2.5:7b-instruct-q4_k_m

# Ollama is simpler than MLX but 3–4× slower
# Use MLX if you want max speed, Ollama if you want simplicity

Step 3: Open WebUI via Docker

# Install Docker Desktop from docker.com

# Start Open WebUI
docker run -d -p 8080:8080   -e OLLAMA_BASE_URL=http://localhost:11434   --name open-webui   ghcr.io/open-webui/open-webui:latest

# Open http://localhost:8080 — create an account
# Your private ChatGPT interface is ready

Step 4: Auto-start on Boot

cat > ~/Library/LaunchAgents/com.local.ollama.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key><string>com.local.ollama</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.local.ollama.plist

Step 5: Share Across Your Home Network

# Install Tailscale (free)
brew install tailscale
tailscale up

# Get your Tailscale IP
tailscale ip -4
# Output: 100.x.x.x

# From any device on Tailscale:
# Open WebUI: http://100.x.x.x:8080
# Ollama API: http://100.x.x.x:11434

3-Month Honest Verdict

Buy it if…

You run Ollama 4+ hours/day
Privacy matters — data stays at home
You want the fastest <$1k inference
You'll use it for 3+ years
You want always-on with no monthly bill

Skip it if…

You want lowest 3-year cost (CX32 wins)
You need 30B+ models (need 32 GB+)
You're still testing / not sure you'll use it
You need 99.95% uptime SLA
Remote management is important to you

The honest recommendation: Start with a Hetzner CX32 ($88/year, no commitment). After 3 months of daily use, you'll know if you want the Mac Mini for the speed and permanence.