Affiliate links on this page — rates explained

🇩🇪

Best overall value

Hetzner Cloud

Best value VPS for Ollama. Community favourite since day one.

Starting at: €3.79/mo

Locations: Germany, Finland, Singapore

SLA: 99.9%+

Get started on Hetzner Cloud → See current pricing ↗

We earn €10 credit per referral if you sign up.

Plans & pricing

Verify current prices on Hetzner Cloud ↗

Prices shown are community-verified as of April 2026. Click the provider link above to confirm current rates — pricing changes without notice.

Plan	RAM	CPU	Price (per month)	Ollama use
CX11	2GB	2 vCPU	€3.79	Phi-3 Mini only (3.8B)
CX22 (Recommended)	4GB	2 vCPU	€5.89	7B models (Q4_K_M), the sweet spot
CX32	8GB	4 vCPU	€12.49	14B models (Q4)
CX42	16GB	8 vCPU	€24.49	34B models (Q4)
CX52	32GB	16 vCPU	€49.49	70B models (Q4)
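
One way to sanity-check the "Recommended" label is to divide each monthly price by the plan's RAM. The sketch below does that with awk, using only the figures from the table above. The function names eur_per_gb and best_value_plan are illustrative and not part of any Hetzner tooling.

```shell
#!/bin/sh
# Plan name, RAM in GB, monthly price in EUR (copied from the table above).
PLANS='CX11 2 3.79
CX22 4 5.89
CX32 8 12.49
CX42 16 24.49
CX52 32 49.49'

# Print EUR per GB of RAM for each "name ram price" line read from stdin.
eur_per_gb() {
  awk '{ printf "%s\t%.3f EUR/GB\n", $1, ($3 / $2) }'
}

# Print the name of the plan with the lowest EUR-per-GB ratio.
best_value_plan() {
  awk 'NR == 1 || ($3 / $2) < best { best = ($3 / $2); name = $1 }
       END { print name }'
}

printf '%s\n' "$PLANS" | eur_per_gb
printf 'Lowest cost per GB: %s\n' "$(printf '%s\n' "$PLANS" | best_value_plan)"
```

With the table's numbers, CX22 comes out lowest at roughly €1.47 per GB of RAM.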

Community benchmarks

Measured by community members running real workloads. Numbers are tokens/second on the listed model.

Speed	Model	Plan	Notes
22 tok/s	Qwen 2.5 Coder 7B (Q4_K_M)	CX22 (4GB)	First token ~800ms. Great for code completion.
14 tok/s	Qwen 2.5 14B (Q4_K_M)	CX32 (8GB)	First token ~1.2s. Solid for chat.
8 tok/s	Llama 3.3 70B (Q2_K)	CX42 (16GB)	Needs Q2 quant at 16GB. Functional.
11 tok/s	Llama 3.3 70B (Q4_K_M)	CX52 (32GB)	Comfortable 70B at Q4. Best value for large models.
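
To reproduce figures like these on your own server, Ollama's /api/generate response reports eval_count (tokens generated) and eval_duration (in nanoseconds), and ollama run --verbose prints a comparable eval rate. The helper below only does the division. Its name and the sample numbers are illustrative, not measurements.

```shell
#!/bin/sh
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# eval_count is the number of generated tokens; eval_duration is in nanoseconds.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) { print "duration must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", n * 1e9 / ns
  }'
}

# Illustrative input only: 220 tokens generated in 10 seconds.
tokens_per_second 220 10000000000   # prints 22.0

# To get real values, run a non-streaming request on the server and read
# eval_count and eval_duration from the JSON reply:
#   curl -s http://127.0.0.1:11434/api/generate \
#     -d '{"model":"qwen2.5-coder:7b-instruct-q4_K_M","prompt":"Say hi","stream":false}'
```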

Detailed pros & cons

What's good

✓ Best RAM-per-euro on the market

CX22 gives 4GB RAM for €5.89 — DigitalOcean charges $24 for the same RAM. Ollama is memory-bound; every GB matters.

✓ Fast NVMe SSDs on every tier

Model files load from NVMe in 5–15 seconds. Older HDDs or lower-tier providers can take 45–60+ seconds per model load.

✓ Under 60 seconds from order to running server

Hetzner provisions faster than any other provider we tested. Order at 11pm, SSH in by 11:01pm.

✓ Honest, transparent pricing

No egress surprises, no 'enhanced support' tiers, no hidden fees. The price on the website is what you pay.

✓ Free private networking

Spin up multiple inference nodes and connect them on a private network at zero cost. Good for when you scale beyond one server.

Watch out for

✗ Credits-only affiliate (not cash)

Referrals earn €10 in Hetzner account credits — not transferable cash. Great for cutting your own bill, not for building affiliate income.

✗ EU and Singapore data centres only

No US East or West Coast locations. For US-based users, expect 80–120ms latency to EU. Fine for batch jobs; noticeable for real-time chat.

✗ No GPU instances

CPU-only. For 70B+ models at high speed, or any workload that requires a GPU, pair a GPU provider such as RunPod with Hetzner and keep the CPU-side tasks on Hetzner.

✗ Weekend support is slower

Hetzner support is solid but not 24/7. Critical hardware issues get resolved — but you might wait until Monday for billing questions.
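
The 80–120ms latency figure above depends on where you connect from, so it is worth measuring yourself. A minimal sketch, assuming curl is installed on your client and the server has a reachable HTTP port: it reads curl's time_connect write-out variable and converts it to milliseconds. The function name connect_ms and the address are placeholders.

```shell
#!/bin/sh
# connect_ms URL
# Prints the TCP connect time to URL in whole milliseconds.
# curl reports time_connect in seconds; awk converts it to milliseconds.
connect_ms() {
  curl -o /dev/null -s -w '%{time_connect}\n' "$1" |
    awk '{ printf "%.0f\n", $1 * 1000 }'
}

# Example (placeholder address, run from your own machine):
#   connect_ms http://YOUR_SERVER_IP:11434/
```

Run it a few times and compare the typical value against what your use case tolerates.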

Best for

Developers and budget-conscious builders who want maximum value for CPU inference on 7B–34B models running 24/7.

Not for

US-only users who need <50ms latency, or GPU inference workloads.

Ollama setup guide for Hetzner Cloud

Step-by-step from account creation to first model response. Takes 15–20 minutes.

Create your account

Sign up at hetzner.com/cloud. Add a credit card or PayPal. New accounts may require a short identity check — usually resolves in under an hour.

Choose your plan

For a 7B model (best starting point): pick CX22 at €5.89/mo. Upgrading later is one click. Under-provisioning stalls model loads — don't go below 4GB for Ollama.

Deploy your server

Click 'Add Server'. Choose Ubuntu 24.04 LTS, pick the closest datacenter (Nuremberg or Helsinki for Europe, Singapore for Asia-Pacific). Name it something like 'ollama-1'.

Connect via SSH

Open your terminal:

ssh root@YOUR_SERVER_IP

Accept the fingerprint. You're in — it takes under 60 seconds from order to shell.

Install Ollama

One command installs Ollama and registers it as a systemd service:

curl -fsSL https://ollama.com/install.sh | sh

Verify it's running: systemctl status ollama
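
Besides systemctl, you can confirm that the HTTP API itself answers. The sketch below polls the root endpoint, which replies with the text "Ollama is running", and gives up after a few tries. The function name wait_for_ollama and its defaults are assumptions for illustration.

```shell
#!/bin/sh
# wait_for_ollama [BASE_URL] [MAX_TRIES]
# Polls the root endpoint once per second until it answers or tries run out.
# Returns 0 when Ollama responds, 1 otherwise.
wait_for_ollama() {
  base_url="${1:-http://127.0.0.1:11434}"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$base_url/" 2>/dev/null | grep -q 'Ollama is running'; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example, run on the server itself:
#   wait_for_ollama && echo "Ollama is up"
```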

Pull your first model

ollama pull qwen2.5-coder:7b-instruct-q4_K_M

The download is about 4.1GB and takes 3–5 minutes. Then test it:

ollama run qwen2.5-coder:7b-instruct-q4_K_M

Type a message and press Enter.

Expose the API (for remote access)

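Edit the systemd service so Ollama listens on all interfaces:

systemctl edit ollama

Add these lines:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then restart the service:

systemctl restart ollama

Ollama now listens on port 11434 on every network interface. The API has no built-in authentication, so restrict access to that port with a firewall.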
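
Binding to 0.0.0.0 makes the API reachable from any address, and Ollama does not add authentication of its own, so it helps to limit who can reach port 11434. The sketch below assumes ufw is installed (it usually is on Ubuntu images) and that you connect from one fixed address. The function name restrict_ollama_port and the address 203.0.113.5 are placeholders.

```shell
#!/bin/sh
# restrict_ollama_port CLIENT_IP
# Keeps SSH open, allows port 11434 only from CLIENT_IP, then enables ufw.
# Run as root on the server. Assumes ufw is installed.
restrict_ollama_port() {
  client_ip="$1"
  if [ -z "$client_ip" ]; then
    echo "usage: restrict_ollama_port CLIENT_IP" >&2
    return 1
  fi
  # Allow SSH first so enabling the firewall does not lock you out.
  ufw allow 22/tcp &&
    ufw allow from "$client_ip" to any port 11434 proto tcp &&
    ufw --force enable
}

# Example (placeholder address):
#   restrict_ollama_port 203.0.113.5
```

The --force flag skips ufw's interactive confirmation. Check the result with ufw status before closing your SSH session.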

Connect your client

In VS Code + Continue.dev: set the Ollama base URL to http://YOUR_SERVER_IP:11434.

For Open WebUI (Docker): pass -e OLLAMA_BASE_URL=http://YOUR_SERVER_IP:11434

For LiteLLM: add the model as ollama/qwen2.5-coder:7b-instruct-q4_K_M and set api_base to http://YOUR_SERVER_IP:11434.
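
Before configuring a client, it can save time to confirm from your own machine that the server answers and has the model you pulled. A rough sketch using Ollama's /api/tags endpoint, which lists installed models. The function name list_remote_models is illustrative, and the grep/sed parsing is a stand-in for a real JSON parser such as jq.

```shell
#!/bin/sh
# list_remote_models BASE_URL
# Calls /api/tags on the given Ollama server and prints one model name per line.
# Text-based parsing is fragile; prefer a JSON parser if one is available.
list_remote_models() {
  curl -fsS "$1/api/tags" |
    grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"name"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Example (placeholder address, run from your client):
#   list_remote_models http://YOUR_SERVER_IP:11434
```

If the model tag you pulled appears in the output, the client settings above should be able to reach it.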

Community setups

CX22 + Qwen 2.5 Coder 7B (Q4_K_M)

22 tok/s

"Solid for VS Code + Continue.dev. Zero lag on autocomplete. Wouldn't go below 4GB RAM."

@dev_dana

CX32 + Ollama + LiteLLM + Qwen 2.5 14B

14 tok/s

"LiteLLM routes 85% local, 15% to Claude API. Monthly bill dropped from $180 to $22."

@builder_ben

Ready to get started?

Starting at €3.79/mo. No long-term commitment required.

Get started on Hetzner Cloud →

Compare other providers

Best for beginners: DigitalOcean →
Best global coverage: Vultr →
Cheapest per GB RAM: Contabo →
View all 5 providers →