Affiliate links on this page — rates explained
🇩🇪
Best overall value

Hetzner Cloud

Best value VPS for Ollama. Community favourite since day one.

Starting at: €3.79/mo
Locations: Germany, Finland, Singapore
SLA: 99.9%+
Get started on Hetzner Cloud → See current pricing ↗

We earn €10 credit per referral if you sign up.

Prices shown are community-verified as of April 2026. Click the provider link above to confirm current rates — pricing changes without notice.

Plan                 RAM    CPU / GPU   Price/mo   Ollama use
CX11                 2GB    2 vCPU      €3.79      Phi-3 Mini only (3.8B)
CX22 (recommended)   4GB    2 vCPU      €5.89      7B models (Q4_K_M), the sweet spot
CX32                 8GB    4 vCPU      €12.49     14B models (Q4)
CX42                 16GB   8 vCPU      €24.49     34B models (Q4)
CX52                 32GB   16 vCPU     €49.49     70B models (Q4)

Community benchmarks

Measured by community members running real workloads. Numbers are tokens/second on the listed model.

22 tok/s: Qwen 2.5 Coder 7B (Q4_K_M) on CX22 (4GB). First token ~800ms. Great for code completion.
14 tok/s: Qwen 2.5 14B (Q4_K_M) on CX32 (8GB). First token ~1.2s. Solid for chat.
8 tok/s: Llama 3.3 70B (Q2_K) on CX42 (16GB). Needs Q2 quant at 16GB. Functional.
11 tok/s: Llama 3.3 70B (Q4_K_M) on CX52 (32GB). Comfortable 70B at Q4. Best value for large models.
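
To check a figure like these on your own server, one option is to read the timing fields Ollama returns with each response. The sketch below assumes the eval_count (generated tokens) and eval_duration (nanoseconds) fields of a non-streaming /api/generate response. tok_per_sec is a hypothetical helper name, and the sample values are made up for illustration, not measured.

```shell
# tok_per_sec EVAL_COUNT EVAL_DURATION_NS
# Converts Ollama's eval_count and eval_duration (nanoseconds) into tokens/second.
# Hypothetical helper, not part of Ollama.
tok_per_sec() {
  LC_ALL=C awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Illustrative values only: 220 tokens generated in 10 seconds.
tok_per_sec 220 10000000000   # prints 22.0
```

If you would rather not call the API, `ollama run MODEL --verbose` prints timing lines, including an eval rate, after each reply.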

Detailed pros & cons

What's good

✓ Best RAM-per-euro on the market

CX22 gives 4GB RAM for €5.89/mo; DigitalOcean charges $24/mo for the same amount of RAM. Ollama is memory-bound, so every GB matters.
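
The price-per-GB comparison is simple division, and it can be rerun whenever prices change. This is a minimal sketch using the prices and RAM sizes from the plan table on this page. eur_per_gb is a hypothetical helper name.

```shell
# eur_per_gb PRICE_EUR RAM_GB -> monthly euros per GB of RAM (hypothetical helper)
eur_per_gb() {
  LC_ALL=C awk -v p="$1" -v g="$2" 'BEGIN { printf "%.2f\n", p / g }'
}

# Plan, monthly price and RAM copied from the plan table on this page.
while read -r plan price ram; do
  printf '%s: EUR %s per GB\n' "$plan" "$(eur_per_gb "$price" "$ram")"
done <<'EOF'
CX11 3.79 2
CX22 5.89 4
CX32 12.49 8
CX42 24.49 16
CX52 49.49 32
EOF
```

With the table's figures, CX22 works out to about €1.47 per GB per month.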

✓ Fast NVMe SSDs on every tier

Model files load from NVMe in 5–15 seconds. Older HDDs or lower-tier providers can take 45–60+ seconds per model load.

✓ Under 60 seconds from order to running server

Hetzner provisions faster than any other provider we tested. Order at 11pm, SSH in by 11:01pm.

✓ Honest, transparent pricing

No egress surprises, no 'enhanced support' tiers, no hidden fees. The price on the website is what you pay.

✓ Free private networking

Spin up multiple inference nodes and connect them on a private network at zero cost. Good for when you scale beyond one server.

Watch out for

✗ Credits-only affiliate (not cash)

Referrals earn €10 in Hetzner account credits — not transferable cash. Great for cutting your own bill, not for building affiliate income.

✗ EU and Singapore data centres only

No US East or West Coast locations. For US-based users, expect 80–120ms latency to EU. Fine for batch jobs; noticeable for real-time chat.
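
To see what latency looks like from your own location, you can time the TCP connection to a server you already run. This is a rough sketch: connect_ms is a hypothetical helper, YOUR_SERVER_IP is a placeholder, and it assumes the Ollama port is reachable from your machine.

```shell
# connect_ms URL -> TCP connect time in milliseconds, as reported by curl.
# Hypothetical helper; it measures connection setup only, not model response time.
connect_ms() {
  curl -o /dev/null -s -w '%{time_connect}\n' "$1" |
    LC_ALL=C awk '{ printf "%.0f\n", $1 * 1000 }'
}

# Usage (YOUR_SERVER_IP is a placeholder):
#   connect_ms http://YOUR_SERVER_IP:11434/
```

Running it a few times and comparing against the 80–120ms range above gives a quick sense of whether real-time chat will feel responsive.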

✗ No GPU instances

CPU-only. For 70B+ models at high speed, or for workloads that require a GPU, pair Hetzner with a GPU provider such as RunPod and keep Hetzner for the CPU tasks.

✗ Weekend support is slower

Hetzner support is solid but not 24/7. Critical hardware issues get resolved — but you might wait until Monday for billing questions.

Best for

Developers and budget-conscious builders who want maximum value for CPU inference on 7B–34B models running 24/7.

Not for

US-only users who need <50ms latency, or GPU inference workloads.

Ollama setup guide for Hetzner Cloud

Step-by-step from account creation to first model response. Takes 15–20 minutes.

1. Create your account

Sign up at hetzner.com/cloud. Add a credit card or PayPal. New accounts may require a short identity check — usually resolves in under an hour.

2. Choose your plan

For a 7B model (best starting point): pick CX22 at €5.89/mo. Upgrading later is one click. Under-provisioning stalls model loads — don't go below 4GB for Ollama.

3. Deploy your server

Click 'Add Server'. Choose Ubuntu 24.04 LTS, pick the closest datacenter (Nuremberg or Helsinki for Europe, Singapore for Asia-Pacific). Name it something like 'ollama-1'.
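
The same server can also be created from the command line with Hetzner's hcloud CLI. This is a sketch rather than the required route. It assumes hcloud is installed, a context with an API token is configured, and an SSH key is already uploaded to the project. create_ollama_server and the key name my-key are hypothetical.

```shell
# create_ollama_server NAME TYPE LOCATION SSH_KEY_NAME
# Thin wrapper around `hcloud server create` (hypothetical helper).
create_ollama_server() {
  hcloud server create \
    --name "$1" \
    --type "$2" \
    --image ubuntu-24.04 \
    --location "$3" \
    --ssh-key "$4"
}

# Usage (nbg1 is Nuremberg; my-key is a placeholder SSH key name):
#   create_ollama_server ollama-1 cx22 nbg1 my-key
```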

4. Connect via SSH

Open your terminal:

ssh root@YOUR_SERVER_IP

Accept the fingerprint. You're in — it takes under 60 seconds from order to shell.

5. Install Ollama

One command installs Ollama and registers it as a systemd service:

curl -fsSL https://ollama.com/install.sh | sh

Verify it's running: systemctl status ollama
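
Besides checking the systemd unit, you can confirm that the HTTP API itself is answering. The sketch below polls Ollama's /api/version endpoint on the default port. wait_for_ollama is a hypothetical helper, and the 30-try default is an arbitrary choice.

```shell
# wait_for_ollama [BASE_URL] [TRIES]
# Polls /api/version once per second until Ollama answers or TRIES run out.
# Hypothetical helper; the defaults are assumptions, adjust as needed.
wait_for_ollama() {
  base="${1:-http://localhost:11434}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$base/api/version" >/dev/null 2>&1; then
      echo "ollama is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "ollama did not answer at $base" >&2
  return 1
}

# Usage on the server:
#   wait_for_ollama
```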

6. Pull your first model

ollama pull qwen2.5-coder:7b-instruct-q4_K_M

Downloads ~4.7GB. Takes 3–5 minutes. Then test:
ollama run qwen2.5-coder:7b-instruct-q4_K_M
Type a message, press Enter.
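
The interactive prompt is one way to test. The HTTP API is another, and it is what clients use later. This is a minimal sketch against the /api/generate endpoint on the default local port. generate_payload and ask are hypothetical helpers, and the payload builder does not escape quotes or backslashes in the prompt.

```shell
# generate_payload MODEL PROMPT -> JSON body for POST /api/generate
# Hypothetical helper. It does not escape quotes or backslashes in PROMPT.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# ask MODEL PROMPT -> sends one non-streaming request to the local Ollama API
ask() {
  curl -s http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}

# Usage on the server:
#   ask qwen2.5-coder:7b-instruct-q4_K_M "Write hello world in Go"
```

With "stream" set to false the reply arrives as a single JSON object, which is easier to read in a terminal than the default stream of chunks.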

7. Expose the API (for remote access)

Edit the systemd service to listen on all interfaces:

systemctl edit ollama

Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

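Then: systemctl restart ollama
Ollama now listens on port 11434 on every interface. The API has no built-in authentication, so restrict access to that port (for example with a firewall rule) before relying on it from outside.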
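
Because anyone who can reach port 11434 can use the API, it is worth limiting the port to the machines that need it. One way to do that is a ufw rule set like the sketch below. It assumes ufw is available on the Ubuntu image. restrict_ollama_port is a hypothetical helper and 203.0.113.5 is a documentation address standing in for your client's IP.

```shell
# restrict_ollama_port CLIENT_IP
# Keeps SSH reachable, allows port 11434 only from CLIENT_IP, then enables ufw.
# Hypothetical helper; run as root on the server.
restrict_ollama_port() {
  ufw allow 22/tcp &&
    ufw allow from "$1" to any port 11434 proto tcp &&
    ufw --force enable
}

# Usage (replace the placeholder address with your own):
#   restrict_ollama_port 203.0.113.5
```

The SSH rule comes first so that enabling the firewall does not lock you out of the session you are working in.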

8. Connect your client

In VS Code + Continue.dev: set Ollama base URL to http://YOUR_IP:11434.
For Open WebUI (Docker): -e OLLAMA_BASE_URL=http://YOUR_IP:11434
For LiteLLM: add the model as ollama/qwen2.5-coder:7b-instruct-q4_K_M.
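
Before configuring a client, it helps to confirm from your own machine that the server answers and has the model you pulled. The sketch below lists models through the /api/tags endpoint and shows one possible full Docker command for Open WebUI. list_models and run_open_webui are hypothetical helpers, and the published port, volume name and image tag are assumptions based on Open WebUI's usual quick-start, so check them against its current documentation.

```shell
# list_models BASE_URL -> prints the JSON list of models the server has pulled
list_models() {
  curl -fsS "$1/api/tags"
}

# run_open_webui BASE_URL -> starts Open WebUI in Docker, pointed at that Ollama
# Port 3000, the volume name and the image tag are assumptions.
run_open_webui() {
  docker run -d -p 3000:8080 \
    -e OLLAMA_BASE_URL="$1" \
    -v open-webui:/app/backend/data \
    --name open-webui \
    ghcr.io/open-webui/open-webui:main
}

# Usage (YOUR_IP is a placeholder):
#   list_models http://YOUR_IP:11434
#   run_open_webui http://YOUR_IP:11434
```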

Community setups

CX22 + Qwen 2.5 Coder 7B (Q4_K_M): 22 tok/s
"Solid for VS Code + Continue.dev. Zero lag on autocomplete. Wouldn't go below 4GB RAM." (@dev_dana)

CX32 + Ollama + LiteLLM + Qwen 2.5 14B: 14 tok/s
"LiteLLM routes 85% local, 15% to Claude API. Monthly bill dropped from $180 to $22." (@builder_ben)

Ready to get started?

Starting at €3.79/mo. No long-term commitment required.
