Affiliate links on this page — rates explained

Best for GPU inference

RunPod

Cheapest GPU cloud. Best for 70B+ models and high-throughput inference.

Starting at: ~$0.12/hr (spot)

Locations: US, EU, Canada (community GPU marketplace)

SLA: Variable (spot instances can be interrupted)

Get started on RunPod → See current pricing ↗

We earn 3–10% recurring revenue share for up to 1 year if you sign up.

Plans & pricing

Verify current prices on RunPod ↗

Prices shown are community-verified as of April 2026. Click the provider link above to confirm current rates — pricing changes without notice.

Plan	VRAM	vCPU	Price	Ollama use
RTX 3090	24GB	9	~$0.34/hr	13–34B at Q4, strong value
RTX 4090 (recommended)	24GB	9	~$0.44/hr	Fastest 24GB option for 34B at Q4
A40	48GB	–	~$0.79/hr	70B at Q4, datacenter-grade
A100 SXM	80GB	–	~$1.89/hr	70B at Q4–Q6, max speed
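
As a rough way to sanity-check which models fit a given card, weight size can be estimated as parameter count times bytes per weight. The sketch below is illustrative only: the bytes-per-weight values for Q4_K_M and Q8_0 are approximate averages, and the 2GB allowance for KV cache and runtime buffers is an assumed placeholder, not a measured figure.

```python
# Approximate bytes stored per model weight.
# FP16 is exactly 2 bytes; the quantized values are rough averages (assumption).
BYTES_PER_WEIGHT = {"fp16": 2.0, "q8_0": 1.0625, "q4_k_m": 0.6}


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_WEIGHT[precision]


def fits(params_billion: float, precision: str, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return weights_gb(params_billion, precision) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for size, precision, vram in [(34, "q4_k_m", 24), (70, "q4_k_m", 48), (70, "fp16", 80)]:
        print(f"{size}B {precision} on {vram}GB: fits={fits(size, precision, vram)}")
```

By this estimate a 70B model at FP16 needs about 140GB for weights alone, so it does not fit on any single card in the table, while 70B at Q4_K_M (roughly 42GB) fits on the A40 or A100. Long context windows need more headroom than the fixed overhead assumed here.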

Community benchmarks

Measured by community members running real workloads. Numbers are tokens/second on the listed model.

110 tok/s

Qwen 2.5 7B (Q4_K_M)

RTX 4090 (24GB)

GPU inference is in a different league from CPU inference.

52 tok/s

Llama 3.3 70B (Q4_K_M)

RTX 4090 (24GB)

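70B at 52 tok/s is comfortable for real-time chat. Q4_K_M weights for a 70B model are roughly 40GB, so this result needs more than a single 24GB card.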

95 tok/s

Llama 3.3 70B (FP16)

A100 (80GB)

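Unquantized FP16 weights, maximum quality, 95 tok/s. A 70B model at FP16 is about 140GB of weights, so this result needs more than a single 80GB card.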
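
To relate tokens/second to money and waiting time, the figures above can be converted into a cost per million generated tokens and a time per reply. This is a minimal sketch that assumes a single request stream running at the quoted speed for the whole billed hour; idle time, prompt processing, and batching are ignored.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """USD per one million generated tokens at a steady generation speed."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


def seconds_for_reply(reply_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate a reply of the given length."""
    return reply_tokens / tokens_per_second


if __name__ == "__main__":
    # Qwen 2.5 7B (Q4_K_M) on an RTX 4090 at ~$0.44/hr and 110 tok/s.
    print(round(cost_per_million_tokens(0.44, 110), 2))  # about 1.11 USD
    print(seconds_for_reply(550, 110))                   # 5.0 seconds
```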

Detailed pros & cons

What's good

✓ Cheapest GPU rental on the market

RTX 4090 at $0.44/hr vs Lambda Labs at $1.25/hr. For on-demand GPU inference, RunPod is consistently 60–70% cheaper than alternatives.

✓ No minimum commitment, billed per second

Start a pod, run a batch job, stop it. 3 hours of GPU time costs $1.32 on an RTX 4090. No monthly subscription required.
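
The arithmetic behind per-second billing is simple enough to script before launching a job. The sketch below assumes the hourly rate is prorated linearly per second and rounds partial seconds up; storage and network charges, if any, are not modeled.

```python
import math


def job_cost(price_per_hour: float, runtime_seconds: float) -> float:
    """Cost in USD of a pod billed per second at a prorated hourly rate."""
    billed_seconds = math.ceil(runtime_seconds)  # assume partial seconds round up
    return billed_seconds * price_per_hour / 3600


if __name__ == "__main__":
    three_hours = 3 * 3600
    print(round(job_cost(0.44, three_hours), 2))  # 1.32, matching the RTX 4090 example
    print(round(job_cost(0.44, 20 * 60), 2))      # a 20-minute job: about 0.15
```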

✓ 3–10% recurring affiliate commission

GPU users spend $100–$500/mo. 3% recurring for a year is often worth more than one-time payouts. Top earners make $50+/mo from a single active referral.

✓ Serverless inference (scale to zero)

RunPod Serverless auto-scales inference endpoints to zero when not in use. Pay only per request. Ideal for low-volume production apps.

✓ Community GPU marketplace

Individual providers list spare GPUs, driving prices down. The competitive marketplace ensures pricing stays below data-centre alternatives.

Watch out for

✗ Forget to stop = unexpected bill

The most common RunPod mistake. An RTX 4090 costs $10.56/day idle. Set a phone reminder or use RunPod's spend alerts. This is the #1 complaint from new users.

✗ Not for always-on cheap inference

24/7 RTX 4090 = ~$320/mo. A Hetzner CX32 running 14B models is €12.49/mo. Use RunPod for bursts, Hetzner for always-on.
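
The burst-versus-always-on trade-off can be expressed as a break-even number of GPU hours per month. This is an illustrative sketch only: it assumes a 30-day month and, for simplicity, treats the euro price as if it were in dollars. It compares cost alone, not the very different speed of CPU and GPU inference.

```python
HOURS_PER_MONTH = 24 * 30  # assumed 30-day month


def always_on_monthly_cost(price_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of leaving an hourly-billed pod running continuously."""
    return price_per_hour * hours


def break_even_hours(fixed_monthly_price: float, price_per_hour: float) -> float:
    """Hours per month at which hourly billing equals a fixed monthly price."""
    return fixed_monthly_price / price_per_hour


if __name__ == "__main__":
    print(round(always_on_monthly_cost(0.44), 1))   # 316.8, i.e. the ~$320/mo figure
    print(round(break_even_hours(12.49, 0.44), 1))  # about 28.4 hours per month
```

Under these assumptions, an RTX 4090 pod used for fewer than about 28 hours a month costs less than the fixed-price server.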

✗ Spot instances can be interrupted

Spot pricing saves ~30% but pods can be preempted. Use on-demand for anything important.

Best for

Batch inference jobs, 70B+ model testing, developers needing GPU bursts for specific tasks. Anyone who needs GPU power without 24/7 costs.

Not for

Personal always-on assistants — Hetzner/Contabo are 10–20x cheaper for 24/7 CPU inference.

Ollama setup guide for RunPod

Step-by-step from account creation to first model response. Takes 15–20 minutes.

Create your account and add credits

RunPod uses a pre-paid credit model. Add $20 to get started — that's ~45 hours on an RTX 4090. Credits never expire.

Choose your GPU

For 7B–34B at high speed: RTX 3090 or 4090 ($0.34–0.44/hr).
For 70B at Q4: A40 ($0.79/hr). A 70B model at FP16 is about 140GB of weights and does not fit in 48GB.
For maximum speed: A100 SXM ($1.89/hr).
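
The selection rule above amounts to picking the cheapest card with enough VRAM. A minimal sketch, using the approximate prices from the pricing table (they change without notice) and a hypothetical helper name cheapest_gpu:

```python
from typing import Optional

# (name, VRAM in GB, approximate on-demand price in USD/hr) from the pricing table.
GPUS = [
    ("RTX 3090", 24, 0.34),
    ("RTX 4090", 24, 0.44),
    ("A40", 48, 0.79),
    ("A100 SXM", 80, 1.89),
]


def cheapest_gpu(required_vram_gb: float) -> Optional[str]:
    """Name of the cheapest listed GPU with at least the required VRAM, or None."""
    candidates = [gpu for gpu in GPUS if gpu[1] >= required_vram_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda gpu: gpu[2])[0]


if __name__ == "__main__":
    print(cheapest_gpu(20))   # RTX 3090
    print(cheapest_gpu(44))   # A40
    print(cheapest_gpu(100))  # None: no single listed card is large enough
```

This picks on price only. If speed matters more than cost, the RTX 4090 or A100 SXM may be the better choice even when a cheaper card fits.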

Deploy with the Ollama template

Click 'Deploy' → select your GPU → search 'Ollama' in templates → select the official RunPod Ollama template → click 'Deploy On-Demand'.

Wait for the pod to start

RunPod pods start in 30–120 seconds. Watch the status change from 'Starting' to 'Running'. Note your Pod ID.

Connect to the pod

Click 'Connect' → 'Start Web Terminal' for a browser shell.

Or use the RunPod CLI, if your runpodctl version provides this subcommand (check runpodctl --help):
runpodctl connect YOUR_POD_ID

Pull your model

Inside the pod terminal:
ollama pull llama3.3:70b-instruct-q4_K_M

On an A100 pod, the 70B Q4 download takes ~8 minutes and the model loads into VRAM in ~15 seconds.
Then run it:
ollama run llama3.3:70b-instruct-q4_K_M

Expose the API endpoint

In pod settings: add port 11434 to HTTP Ports → note the public URL format:
https://YOUR_POD_ID-11434.proxy.runpod.net

Use this URL in Continue.dev, Open WebUI, or LiteLLM.
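
To check the endpoint from your own machine before wiring it into a tool, a short script against Ollama's /api/generate route is enough. This is a sketch using only the Python standard library; the pod ID is a placeholder, the helper names are hypothetical, and it assumes the pod's Ollama server is reachable on the proxied port without extra authentication.

```python
import json
import urllib.request


def proxy_url(pod_id: str, port: int = 11434) -> str:
    """Public proxy URL for a pod's exposed HTTP port, in the format shown above."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example (requires a running pod; YOUR_POD_ID is a placeholder):
# print(generate(proxy_url("YOUR_POD_ID"), "llama3.3:70b-instruct-q4_K_M", "Say hello."))
```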

CRITICAL: Stop your pod when done

RunPod bills by the second. An idle RTX 4090 costs $0.44/hr = $10.56/day if left running.

Stop the pod: RunPod Console → your pod → Stop.
Your volume (model files) persists — no need to re-download next time.
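
For batch jobs, one way to avoid a forgotten pod is to have the job script stop the pod itself when it finishes, even if the job fails. The sketch below is an assumption-laden example: it relies on the runpodctl CLI being installed in the pod, on its stop pod subcommand, and on the RUNPOD_POD_ID environment variable being set. The helper names are hypothetical, so verify each assumption on your own pod before relying on it.

```python
import subprocess
from typing import Callable, List, TypeVar

T = TypeVar("T")


def stop_command(pod_id: str) -> List[str]:
    """Command line assumed to stop a pod via the runpodctl CLI."""
    return ["runpodctl", "stop", "pod", pod_id]


def run_then_stop(job: Callable[[], T], pod_id: str, runner=subprocess.run) -> T:
    """Run the job, then always issue the stop command, even if the job raises."""
    try:
        return job()
    finally:
        runner(stop_command(pod_id), check=False)


# Example, inside a pod (assumes RUNPOD_POD_ID is set in the environment):
# import os
# run_then_stop(my_batch_job, os.environ["RUNPOD_POD_ID"])
```

The runner parameter exists so the function can be tried without touching a real pod, by passing a stand-in in place of subprocess.run.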

Community setups

RTX 4090 + vLLM + Llama 3.3 70B FP16

95 tok/s

"95 tok/s on 70B FP16. Unbeatable for batch jobs. Non-negotiable tip: stop the pod when done."

@gpu_power_user

Ready to get started?

Starting at ~$0.12/hr (spot). No long-term commitment required.

Get started on RunPod →

Compare other providers

Best overall value: Hetzner Cloud →
Best for beginners: DigitalOcean →
Best global coverage: Vultr →
View all 5 providers →