Hardware review 11 min read · Updated April 2026

Mac Mini M4 for Ollama: 3 Months of Real Use

We ran a Mac Mini M4 16GB as an always-on AI server for 3 months. Real power consumption, real benchmarks, and an honest answer on when it's worth the $599 upfront.

The M4 Mac Mini starts at $599. After 3 months running it as an always-on AI server, here's the honest verdict: electricity costs about $6/year, it runs Qwen 2.5 7B at 35–42 tok/s under MLX (3× faster than a Hetzner CX32), and it breaks even against a VPS subscription in about 80 months.

The Purchase Decision in Numbers

| Platform | 1 Year | 3 Years | 5 Years |
|---|---|---|---|
| Mac Mini M4 (incl. power) | $605 | $617 | $629 |
| Hetzner CX32 (€6.80/mo) | $88 | $264 | $440 |
| DigitalOcean ($24/mo) | $288 | $864 | $1,440 |

Break-even: ~80 months (6.7 years). You don't buy the Mac for the savings. You buy it for the speed, the privacy, and the permanence.
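To sanity-check the table and the break-even figure, here is a small sketch that uses only the numbers quoted above ($599 upfront, about $6/year in electricity, about $88/year for the CX32). The function and constant names are illustrative, and costs are assumed to accrue linearly. With these inputs the break-even lands somewhere in the 80s of months, depending on whether you count the Mac's power bill, which is the same ballpark as the rough 80-month estimate.

```python
# Inputs taken from the cost table; names are illustrative, not from any library.
MAC_UPFRONT_USD = 599.0        # purchase price
MAC_POWER_USD_PER_YEAR = 6.0   # electricity estimate from this guide
VPS_USD_PER_YEAR = 88.0        # Hetzner CX32 at roughly $88/year


def cumulative_cost(upfront: float, per_year: float, years: float) -> float:
    """Total spend after `years`, assuming costs accrue linearly."""
    return upfront + per_year * years


def break_even_months(upfront: float, own_per_year: float, rent_per_year: float) -> float:
    """Months until owning becomes cheaper than renting (inf if it never does)."""
    monthly_saving = (rent_per_year - own_per_year) / 12
    if monthly_saving <= 0:
        return float("inf")
    return upfront / monthly_saving


if __name__ == "__main__":
    for years in (1, 3, 5):
        mac = cumulative_cost(MAC_UPFRONT_USD, MAC_POWER_USD_PER_YEAR, years)
        vps = cumulative_cost(0.0, VPS_USD_PER_YEAR, years)
        print(f"{years} yr: Mac ${mac:.0f} vs VPS ${vps:.0f}")
    with_power = break_even_months(MAC_UPFRONT_USD, MAC_POWER_USD_PER_YEAR, VPS_USD_PER_YEAR)
    no_power = break_even_months(MAC_UPFRONT_USD, 0.0, VPS_USD_PER_YEAR)
    print(f"break-even: {no_power:.0f} months ignoring power, {with_power:.0f} months including it")
```

Swap in your own VPS price to see how quickly the picture changes: against a $24/month plan the same function gives a break-even of a little over two years.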

Power Consumption: Real Numbers

| Scenario | Watts | kWh/year | Cost/year* |
|---|---|---|---|
| Idle (no inference) | 4 W | 35 | $3.50 |
| Active inference (7B) | 15–20 W | | |
| Always-on, 4h inference/day | 6–8 W avg | 52–70 | $5–7 |

*At $0.10/kWh US average. Even at $0.30/kWh (California, Europe) it's $15–21/year — still trivial.
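The arithmetic behind the table is simple enough to rerun with your own electricity rate. This is a minimal sketch using the wattages measured above; the function names are ours, and the blended figure assumes the machine sits at idle draw whenever it isn't running inference. Blending 4 W idle with 15–20 W for 4 hours a day gives roughly 6 to 7 W, at the low end of the 6–8 W average in the table.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_kwh(avg_watts: float) -> float:
    """Energy used in a year at a constant average draw."""
    return avg_watts * HOURS_PER_YEAR / 1000


def annual_cost(avg_watts: float, usd_per_kwh: float) -> float:
    """Yearly electricity cost in USD at a flat tariff."""
    return annual_kwh(avg_watts) * usd_per_kwh


def blended_watts(idle_w: float, active_w: float, active_hours_per_day: float) -> float:
    """Time-weighted average draw over a 24-hour day."""
    idle_hours = 24 - active_hours_per_day
    return (idle_w * idle_hours + active_w * active_hours_per_day) / 24


if __name__ == "__main__":
    print(f"idle: {annual_kwh(4):.0f} kWh/year, ${annual_cost(4, 0.10):.2f}/year")
    for active_w in (15, 20):
        avg = blended_watts(4, active_w, 4)
        print(f"4h/day at {active_w} W: {avg:.1f} W avg, ${annual_cost(avg, 0.10):.2f}/year")
```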

Performance: MLX vs Ollama

The Mac Mini's real advantage is MLX — Apple's inference framework tuned for Apple Silicon. It's 3–4× faster than Ollama's CPU inference.

| Model | MLX (tok/s) | Ollama CPU (tok/s) | RTX 3060 (tok/s) |
|---|---|---|---|
| Qwen 2.5 7B Q4 | 35–42 | 8–12 | 42–50 |
| Llama 3.3 14B Q4 | 20–25 | 4–6 | 25–30 |
| Llama 3.3 70B Q4 | ✗ (OOM) | ✗ (OOM) | 8–12 |
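The OOM entries are easy to predict with a back-of-the-envelope estimate. The sketch below is not a measurement: it assumes about 4.5 bits per weight for a Q4 quantization and that roughly 70% of unified memory is realistically available to the model. Both figures are our assumptions, and it ignores the KV cache and context length entirely.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for Q4-style quantization.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, total_ram_gb: float,
         usable_fraction: float = 0.7, bits_per_weight: float = 4.5) -> bool:
    """Rough check: do the weights fit in the usable share of memory?

    usable_fraction=0.7 is an assumption; the OS and other apps need the rest.
    """
    return weights_gb(params_billion, bits_per_weight) <= total_ram_gb * usable_fraction


if __name__ == "__main__":
    for size in (7, 14, 30, 70):
        verdict = "fits" if fits(size, 16) else "does not fit"
        print(f"{size}B Q4: ~{weights_gb(size):.1f} GB of weights, {verdict} in 16 GB")
```

By this estimate a 70B model needs close to 40 GB for weights alone, which is why it fails on a 16 GB machine, while 7B and 14B models leave headroom.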

Setup: MLX + Open WebUI

Step 1: Install MLX

# Install Python + MLX
brew install python@3.11
python3.11 -m venv ~/venv-mlx   # call the Homebrew 3.11 interpreter explicitly
source ~/venv-mlx/bin/activate
pip install mlx-lm

# Test it immediately
python -m mlx_lm.generate \
  --model mlx-community/Qwen2.5-7B-Instruct-4bit \
  --prompt "Hello" \
  --max-tokens 50
# Tokens/sec: ~38  ← the first run also downloads the weights (roughly 4 GB for a 4-bit 7B model)

Step 2: Install Ollama (simpler, but slower)

brew install ollama
ollama serve &
ollama pull qwen2.5:7b-instruct-q4_k_m

# Ollama is simpler than MLX but 3–4× slower
# Use MLX if you want max speed, Ollama if you want simplicity
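If you want to reproduce the tok/s numbers on your own machine, Ollama's `/api/generate` endpoint reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds) in its final response. The sketch below assumes Ollama is listening on its default port and that the model tag pulled above is installed; the helper names are ours.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(prompt: str,
             model: str = "qwen2.5:7b-instruct-q4_k_m",
             base_url: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to /api/generate and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout: the first call may need to load the model into memory.
    with urllib.request.urlopen(request, timeout=300) as reply:
        return json.load(reply)
```

With the server running, `tokens_per_second(generate("Explain unified memory in two sentences."))` returns the generation speed for that request. Run it a few times and discard the first result, since it includes model load time.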

Step 3: Open WebUI via Docker

# Install Docker Desktop from docker.com

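# Start Open WebUI
# Inside the container, localhost is the container itself, so reach the
# host's Ollama through host.docker.internal (provided by Docker Desktop).
# The named volume keeps accounts and chats when the container is replaced.
docker run -d -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:latest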

# Open http://localhost:8080 — create an account
# Your private ChatGPT interface is ready

Step 4: Auto-start on Boot

cat > ~/Library/LaunchAgents/com.local.ollama.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key><string>com.local.ollama</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.local.ollama.plist
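If you'd rather not hand-edit XML, the same LaunchAgent can be generated with Python's standard `plistlib`. This is a sketch, not a required step: the function name is ours, and the optional `host` argument adds an `OLLAMA_HOST` environment variable, which is only needed if other devices should reach the Ollama API directly.

```python
import plistlib
from typing import Optional


def build_ollama_agent(binary: str = "/opt/homebrew/bin/ollama",
                       host: Optional[str] = None) -> bytes:
    """Return an XML plist equivalent to the LaunchAgent in this step.

    host is optional; for example "0.0.0.0" makes Ollama listen on all interfaces.
    """
    agent = {
        "Label": "com.local.ollama",
        "ProgramArguments": [binary, "serve"],
        "RunAtLoad": True,
    }
    if host is not None:
        agent["EnvironmentVariables"] = {"OLLAMA_HOST": host}
    return plistlib.dumps(agent)  # XML format by default


if __name__ == "__main__":
    print(build_ollama_agent().decode())
```

Write the returned bytes to `~/Library/LaunchAgents/com.local.ollama.plist` and load the agent with the same `launchctl load` command as above.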

Step 5: Share Across Your Home Network

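# Install Tailscale (free)
brew install tailscale
# The Homebrew formula ships the CLI and the daemon; tailscaled must be running first
sudo brew services start tailscale
tailscale up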

# Get your Tailscale IP
tailscale ip -4
# Output: 100.x.x.x

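# From any device on Tailscale:
# Open WebUI: http://100.x.x.x:8080
# Ollama API: http://100.x.x.x:11434
#   (Ollama binds to 127.0.0.1 by default; start it with OLLAMA_HOST=0.0.0.0
#    if other devices need to call the API directly)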
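A quick way to confirm both services are reachable from another device is a plain TCP connection test. The sketch below uses only the standard library; the helper names are ours, and the host you pass in is whatever `tailscale ip -4` printed on the Mac.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports used in this guide.
SERVICES = {"Open WebUI": 8080, "Ollama API": 11434}


def check_services(host: str, services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_reachable(host, port) for name, port in services.items()}
```

For example, `check_services("100.x.x.x")` with your real Tailscale address returns a dictionary of service names and booleans. If Open WebUI is reachable but the Ollama API is not, the usual cause is Ollama listening only on 127.0.0.1.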

3-Month Honest Verdict

Buy it if…
  • You run Ollama 4+ hours/day
  • Privacy matters — data stays at home
  • You want the fastest <$1k inference
  • You'll use it for 3+ years
  • You want always-on with no monthly bill
Skip it if…
  • You want lowest 3-year cost (CX32 wins)
  • You need 30B+ models (need 32 GB+)
  • You're still testing / not sure you'll use it
  • You need 99.95% uptime SLA
  • Remote management is important to you

The honest recommendation: Start with a Hetzner CX32 ($88/year, no commitment). After 3 months of daily use, you'll know if you want the Mac Mini for the speed and permanence.