Finding the cheapest server for Ollama isn't just about the monthly price. A server that's €2/mo cheaper but 3× slower costs you more in lost productivity. We tested five plans from four providers over 30 days with identical workloads on Qwen 2.5 7B (Q4_K_M), dropping to a 3B model on plans with too little RAM for 7B.
The Real Cost Comparison
| Provider / plan | vCPU | RAM | Price | TPS* | Setup |
|---|---|---|---|---|---|
| Hetzner CX22 | 2 vCPU | 4 GB | €5.89/mo | 2–3 (3B only) | Easy |
| Hetzner CX32 | 4 vCPU | 8 GB | €6.80/mo | 8–12 (7B) | Easy |
| Contabo VPS S | 4 vCPU | 6 GB | $5.99/mo | ~4 (3B) | Medium |
| DigitalOcean | 2 vCPU | 4 GB | $24/mo | 2–3 (3B) | Easiest |
| Vultr Cloud | 2 vCPU | 2 GB | $12/mo | — | Medium |
*TPS = tokens/second on Qwen 2.5 Q4_K_M (model size shown in parentheses), CPU inference, 2,048-token context
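To make the price-versus-speed trade-off concrete, here is a minimal Python sketch that turns the throughput ranges from the table into wall-clock time for one reply. The 500-token reply length is an assumption chosen for illustration, not a benchmark figure, and `reply_seconds` is a hypothetical helper name. Keep in mind that the CX22 numbers are for a 3B model and the CX32 numbers for 7B.

```python
def reply_seconds(tokens: int, tps_low: float, tps_high: float) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds to generate `tokens` tokens."""
    return tokens / tps_high, tokens / tps_low

# Throughput ranges copied from the table above (tokens/second).
PLANS = {
    "Hetzner CX22 (3B)": (2, 3),
    "Hetzner CX32 (7B)": (8, 12),
}

REPLY_TOKENS = 500  # assumed reply length, for illustration only

for name, (low, high) in PLANS.items():
    best, worst = reply_seconds(REPLY_TOKENS, low, high)
    print(f"{name}: {best:.1f}-{worst:.1f} s per {REPLY_TOKENS}-token reply")
```

Under that assumption, the CX22 needs roughly three to four minutes per reply, while the CX32 finishes in about 40 to 60 seconds with a larger model.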
Winner by Use Case
Hetzner CX32 — €6.80/mo
8 GB of RAM runs Qwen 2.5 7B Q4_K_M at 8–12 tok/s, with about 0.8 GB to spare. The Goldilocks zone: cheap enough to leave running, fast enough for real work. This is where most people should start.
Hetzner CX22 — €5.89/mo
Runs 3B models (Phi-3 Mini, Qwen 2.5 3B) at 2–3 tok/s. Fast enough for a playground, and €0.91/month cheaper than the CX32.
Vultr Cloud — $12/mo
Only 2 GB RAM — too little for even 3B models reliably. Either go cheaper (Hetzner) or more powerful. Nothing in between justifies this price.
DigitalOcean — $24/mo
99.95% uptime SLA vs Hetzner's 99%. Best docs, fastest support. Worth paying more than 3× the price of a CX32 only if downtime has a real dollar cost.
Benchmark Details: Hetzner CX32
Model: Qwen2.5-7B-Instruct-Q4_K_M
Hardware: Hetzner CX32 (4 vCPU, 8 GB RAM)
Backend: Ollama 0.3
Output: 8–12 tokens/second
Latency: 80–120 ms to first token, then streaming
RAM used: ~7.2 GB (leaves 0.8 GB headroom)
CPU load: 85–95% all cores
Context: 2,048 tokens
Compare: RTX 3060 = 40–50 tok/s. M4 Mac Mini (MLX) = 35–42 tok/s. The VPS can't match GPU or Apple Silicon — but at €6.80/month it doesn't need to.
RAM vs Model Size
| Model | Q4_K_M RAM | Q8 RAM | Min VPS |
|---|---|---|---|
| 3B (Phi, Qwen) | 2.5 GB | 3.5 GB | CX22 (€5.89) |
| 7B (Qwen, Llama) | 6–7 GB | 8–10 GB | CX32 (€6.80) |
| 13B (Llama 2) | 10–12 GB | 14–16 GB | CX42 (€16.40) |
| 30B | 22–24 GB | 32–40 GB | €32+ plan |
| 70B | 48 GB+ | 70 GB+ | GPU required |
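The table can be read as a simple lookup: take the upper end of the RAM range and pick the smallest plan that covers it. The sketch below does that in Python. The CX22 and CX32 RAM sizes come from the comparison table; the 16 GB figure for the CX42 is an assumption, since this guide does not state it, and `smallest_plan` is a hypothetical helper.

```python
from typing import Optional

# Plan RAM in GB. CX22 and CX32 come from the comparison table;
# the 16 GB figure for CX42 is an assumption, not stated in this guide.
PLAN_RAM_GB = {"CX22": 4, "CX32": 8, "CX42": 16}

# Upper-bound RAM needs in GB, copied from the table above.
MODEL_RAM_GB = {
    ("3B", "Q4_K_M"): 2.5, ("3B", "Q8"): 3.5,
    ("7B", "Q4_K_M"): 7.0, ("7B", "Q8"): 10.0,
    ("13B", "Q4_K_M"): 12.0, ("13B", "Q8"): 16.0,
}

def smallest_plan(size: str, quant: str, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the smallest plan whose RAM covers the model plus headroom, or None."""
    need = MODEL_RAM_GB[(size, quant)] + headroom_gb
    for plan, ram in sorted(PLAN_RAM_GB.items(), key=lambda item: item[1]):
        if ram >= need:
            return plan
    return None

print(smallest_plan("7B", "Q4_K_M"))    # CX32
print(smallest_plan("7B", "Q8"))        # CX42
print(smallest_plan("13B", "Q8", 1.0))  # None
```

With zero headroom the results match the Min VPS column for Q4_K_M. At Q8, a 7B model moves up a tier, and a 13B model leaves nothing spare on a 16 GB plan.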
Quick Setup: CX32 in 15 minutes
```bash
# 1. SSH into your VPS
ssh root@your.vps.ip

# 2. Install Ollama (-L follows redirects, -f fails on HTTP errors instead of piping an error page to sh)
curl -fsSL https://ollama.com/install.sh | sh
systemctl enable --now ollama

# 3. Pull the model (~4 GB, takes 2–3 min)
ollama pull qwen2.5:7b-instruct-q4_k_m

# 4. Test it
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b-instruct-q4_k_m",
  "prompt": "Explain Docker in one sentence.",
  "stream": false
}'
```
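To check your own tokens/second rather than relying on our numbers, the same endpoint can be called from Python. This is a minimal sketch using only the standard library. It assumes the non-streaming response carries `eval_count` (tokens generated) and `eval_duration` (nanoseconds), as described in Ollama's API documentation; `generate` and `tokens_per_second` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl test

def tokens_per_second(response: dict) -> float:
    """Compute generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)

def generate(prompt: str, model: str = "qwen2.5:7b-instruct-q4_k_m") -> dict:
    """Send one non-streaming request and return the parsed JSON response."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)

# On the VPS, with Ollama running:
#   result = generate("Explain Docker in one sentence.")
#   print(result["response"])
#   print(f"{tokens_per_second(result):.1f} tok/s")
```

Short prompts give noisy readings, so it is worth averaging over several longer generations before comparing against the benchmark figures.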
Annual Cost of Ownership
| Platform | 1 Year | 3 Years | 5 Years |
|---|---|---|---|
| Hetzner CX22 | €70.68 | €212 | €353 |
| Hetzner CX32 | €81.60 | €245 | €408 |
| DigitalOcean | $288 | $864 | $1,440 |
| Mac Mini M4 | $605 | $617 | $629 |
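Hetzner is the cheapest option at every horizon in this table; the Mac Mini only overtakes it after roughly 7 to 9 years, depending on the plan and the exchange rate. Against DigitalOcean, the Mac Mini is already cheaper by year 3. DigitalOcean only makes sense if you need its uptime SLA.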
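The break-even point can be checked directly from the table. The sketch below is a rough calculation, not a full cost model: the $599 purchase price and $6/year running cost are inferred from the Mac Mini row ($605, $617, $629), euro and dollar prices are compared at parity, and `break_even_years` is a hypothetical helper.

```python
def break_even_years(upfront: float, owned_per_year: float, rented_per_year: float) -> float:
    """Years until buying hardware becomes cheaper than renting."""
    saving_per_year = rented_per_year - owned_per_year
    if saving_per_year <= 0:
        raise ValueError("renting is never more expensive per year; no break-even")
    return upfront / saving_per_year

# Inferred from the Mac Mini row: $605, $617, $629 fits $599 upfront + $6/year.
MAC_UPFRONT, MAC_PER_YEAR = 599.0, 6.0

# Annual rental cost from the 1 Year column. Euro prices are compared
# to dollars at parity, which is a simplifying assumption.
RENTED = {"Hetzner CX22": 70.68, "Hetzner CX32": 81.60, "DigitalOcean": 288.0}

for name, per_year in RENTED.items():
    years = break_even_years(MAC_UPFRONT, MAC_PER_YEAR, per_year)
    print(f"{name}: Mac Mini breaks even after {years:.1f} years")
```

Under these assumptions the Mac Mini pays for itself in about 2 years against DigitalOcean, but takes about 8 years against the CX32 and about 9 against the CX22. A realistic exchange rate shortens the Hetzner figures somewhat.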