Hetzner Cloud
Best value VPS for Ollama. Community favourite since day one.
We earn €10 credit per referral if you sign up.
Plans & pricing
Prices shown are community-verified as of April 2026. Confirm current rates on Hetzner Cloud before ordering; pricing changes without notice.
| Plan | RAM | CPU | Price (€/mo) | Ollama use |
|---|---|---|---|---|
| CX11 | 2GB | 2 vCPU | €3.79 | Phi-3 Mini only (3.8B) |
| CX22 (recommended) | 4GB | 2 vCPU | €5.89 | 7B models (Q4_K_M), the sweet spot |
| CX32 | 8GB | 4 vCPU | €12.49 | 14B models (Q4) |
| CX42 | 16GB | 8 vCPU | €24.49 | 34B models (Q4) |
| CX52 | 32GB | 16 vCPU | €49.49 | 70B models (Q4) |
Community benchmarks
Measured by community members running real workloads. Numbers are tokens/second on the listed model.
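To reproduce a tokens/second figure on your own server, a minimal sketch follows. It assumes a non-streaming response from Ollama's /api/generate endpoint, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The function name `tokens_per_second` and the sample values are illustrative only, not community measurements.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed computed from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Made-up sample values for illustration only, not a measured benchmark.
sample = {"eval_count": 120, "eval_duration": 15_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Prompt processing is reported separately (`prompt_eval_count`, `prompt_eval_duration`), so this number covers generation only.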
Detailed pros & cons
What's good
CX22 gives 4GB RAM for €5.89 — DigitalOcean charges $24 for the same RAM. Ollama is memory-bound; every GB matters.
Model files load from NVMe in 5–15 seconds. Older HDDs or lower-tier providers can take 45–60+ seconds per model load.
Hetzner provisions faster than any other provider we tested. Order at 11pm, SSH in by 11:01pm.
No egress surprises, no 'enhanced support' tiers, no hidden fees. The price on the website is what you pay.
Spin up multiple inference nodes and connect them on a private network at zero cost. Good for when you scale beyond one server.
Watch out for
Referrals earn €10 in Hetzner account credits — not transferable cash. Great for cutting your own bill, not for building affiliate income.
No US East or West Coast locations. For US-based users, expect 80–120ms latency to EU. Fine for batch jobs; noticeable for real-time chat.
These plans are CPU-only. For 70B+ models at high speed, or any workload that needs a GPU, pair a GPU provider such as RunPod with Hetzner and keep the CPU tasks on Hetzner.
Hetzner support is solid but not 24/7. Critical hardware issues get resolved — but you might wait until Monday for billing questions.
Best for: developers and budget-conscious builders who want maximum value for CPU inference on 7B–34B models running 24/7.
Not ideal for: US-only users who need <50ms latency, or GPU inference workloads.
Ollama setup guide for Hetzner Cloud
Step-by-step from account creation to first model response. Takes 15–20 minutes.
Create your account
Sign up at hetzner.com/cloud. Add a credit card or PayPal. New accounts may require a short identity check — usually resolves in under an hour.
Choose your plan
For a 7B model (best starting point): pick CX22 at €5.89/mo. Upgrading later is one click. Under-provisioning stalls model loads — don't go below 4GB for Ollama.
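The sizing guidance can be written as a small lookup. This sketch only encodes figures from the plan table on this page (RAM, the largest Q4 model size listed per plan, monthly price); `Plan`, `PLANS` and `smallest_plan_for` are hypothetical names, and real memory needs also depend on context length and quantization.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    ram_gb: int
    max_params_b: float  # largest Q4 model size the page lists, in billions
    eur_per_month: float


# Values copied from the plan table on this page. CX11 (2GB) is left out
# because the guide advises against going below 4GB for Ollama.
PLANS = [
    Plan("CX22", 4, 7, 5.89),
    Plan("CX32", 8, 14, 12.49),
    Plan("CX42", 16, 34, 24.49),
    Plan("CX52", 32, 70, 49.49),
]


def smallest_plan_for(params_billion: float) -> Optional[Plan]:
    """Return the cheapest listed plan whose Q4 model ceiling covers the model."""
    for plan in PLANS:  # already ordered from cheapest to most expensive
        if params_billion <= plan.max_params_b:
            return plan
    return None  # larger than anything the table covers


plan = smallest_plan_for(7)
print(plan.name, plan.eur_per_month)
```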
Deploy your server
Click 'Add Server'. Choose Ubuntu 24.04 LTS, pick the closest datacenter (Nuremberg or Helsinki for Europe, Singapore for Asia-Pacific). Name it something like 'ollama-1'.
Connect via SSH
Open your terminal and run:
    ssh root@YOUR_SERVER_IP
Accept the fingerprint and you're in. It takes under 60 seconds from order to shell.
Install Ollama
One command installs Ollama and registers it as a systemd service:
    curl -fsSL https://ollama.com/install.sh | sh
Verify it's running:
    systemctl status ollama
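Besides the systemd status, you can confirm the HTTP API itself answers. A minimal sketch, assuming Ollama's default local address (127.0.0.1:11434) and its /api/version endpoint; the helper name `ollama_version` is illustrative.

```python
import json
import urllib.request
from typing import Optional


def ollama_version(base_url: str = "http://127.0.0.1:11434",
                   timeout: float = 3.0) -> Optional[str]:
    """Return the Ollama version string, or None if the API is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.loads(resp.read()).get("version")
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError covers a response body that is not valid JSON.
        return None


print(ollama_version() or "Ollama is not answering on 127.0.0.1:11434")
```

Run it on the server itself; at this point in the guide the API is only bound to localhost.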
Pull your first model
Pull the model:
    ollama pull qwen2.5-coder:7b-instruct-q4_K_M
This downloads ~4.1GB and takes 3–5 minutes. Then test:
    ollama run qwen2.5-coder:7b-instruct-q4_K_M
Type a message and press Enter.
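The same test can be scripted against the local HTTP API instead of the interactive prompt. A minimal sketch using only the standard library, assuming the default local address and the /api/generate endpoint with streaming disabled; `build_generate_request` and `generate` are illustrative helper names.

```python
import json
import urllib.request

MODEL = "qwen2.5-coder:7b-instruct-q4_K_M"


def build_generate_request(prompt: str, model: str = MODEL,
                           base_url: str = "http://127.0.0.1:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0, **kwargs) -> str:
    """Send the prompt and return the generated text (needs a running Ollama)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a one-line docstring for a sort function")` on the server returns the model's reply as a string. The generous timeout allows for the first call, which has to load the model into memory.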
Expose the API (for remote access)
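Edit the systemd service so Ollama listens on all interfaces:
    systemctl edit ollama
Add:
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"
Then:
    systemctl restart ollama
Ollama now listens on port 11434 on every interface. The API has no built-in authentication, so use a firewall to restrict port 11434 to the IP addresses that need it.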
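To check from your own machine that the port is actually reachable, and to get a rough feel for network latency, you can time a TCP connection. A minimal sketch; `connect_time_ms` is an illustrative helper, and the result measures connection setup only, not model response time.

```python
import socket
import time
from typing import Optional


def connect_time_ms(host: str, port: int = 11434,
                    timeout: float = 3.0) -> Optional[float]:
    """Time a TCP connection to host:port in milliseconds.

    Returns None if the port is closed, filtered by a firewall, or the
    host cannot be resolved.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None
```

Run `connect_time_ms("YOUR_SERVER_IP")` from the client side. A `None` result after the restart usually points to a firewall rule or to the override not having been applied.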
Connect your client
In VS Code + Continue.dev: set the Ollama base URL to http://YOUR_IP:11434.
For Open WebUI (Docker): pass -e OLLAMA_BASE_URL=http://YOUR_IP:11434 to docker run.
For LiteLLM: add the model as ollama/qwen2.5-coder:7b-instruct-q4_K_M.
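If you are writing your own client rather than using one of these tools, the remote API can be called directly. A minimal sketch, assuming Ollama's /api/chat endpoint with streaming disabled; `BASE_URL` keeps the guide's YOUR_IP placeholder, and `add_turn`, `reply_text` and `chat` are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://YOUR_IP:11434"  # placeholder from the guide; use your server's IP
MODEL = "qwen2.5-coder:7b-instruct-q4_K_M"


def add_turn(history: list, role: str, content: str) -> list:
    """Return a new message history with one more turn appended."""
    return history + [{"role": role, "content": content}]


def reply_text(response: dict) -> str:
    """Extract the assistant's text from a non-streaming /api/chat response."""
    return response["message"]["content"]


def chat(history: list, model: str = MODEL, base_url: str = BASE_URL,
         timeout: float = 300.0) -> str:
    """Send the whole history to the server and return the assistant's reply."""
    payload = json.dumps({"model": model, "messages": history, "stream": False})
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return reply_text(json.loads(resp.read()))
```

A conversation is kept by appending each reply before the next call: build `history = add_turn([], "user", "Explain mmap in one sentence")`, pass it to `chat`, then add the answer back with `add_turn(history, "assistant", answer)`.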
Community setups
"Solid for VS Code + Continue.dev. Zero lag on autocomplete. Wouldn't go below 4GB RAM."
"LiteLLM routes 85% local, 15% to Claude API. Monthly bill dropped from $180 to $22."
Ready to get started?
Starting at €3.79/mo. No long-term commitment required.