RunPod
Cheapest GPU cloud. Best for 70B+ models and high-throughput inference.
We earn 3–10% recurring revenue share for up to 1 year if you sign up.
Plans & pricing
Prices shown are community-verified as of April 2026. Verify current rates on RunPod before you deploy; pricing changes without notice.
| Plan | VRAM | vCPU | Price | Ollama use |
|---|---|---|---|---|
| RTX 3090 | 24GB | 9 | ~$0.34/hr | 13–34B (Q4), strong value |
| RTX 4090 (recommended) | 24GB | 9 | ~$0.44/hr | Fastest 34B (Q4) option |
| A40 | 48GB | – | ~$0.79/hr | 70B (Q4), datacenter-grade |
| A100 SXM | 80GB | – | ~$1.89/hr | 70B+ (Q4), max speed |
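As a rough way to read the table, the sketch below estimates how much memory a model's weights need and picks the cheapest listed GPU that fits. The plan names, VRAM sizes, and prices are copied from the table. The bits-per-weight figure (about 4.8 for a Q4_K_M-style quantization) and the 10% headroom for KV cache and runtime overhead are assumptions for illustration, not RunPod or Ollama figures.

```python
# Plans copied from the table above: (name, VRAM in GB, approx. USD per hour).
PLANS = [
    ("RTX 3090", 24, 0.34),
    ("RTX 4090", 24, 0.44),
    ("A40", 48, 0.79),
    ("A100 SXM", 80, 1.89),
]


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (weights only)."""
    return params_billion * bits_per_weight / 8


def cheapest_fit(params_billion: float, bits_per_weight: float = 4.8,
                 headroom: float = 1.1):
    """Return the cheapest plan whose VRAM covers weights plus headroom.

    bits_per_weight and headroom are illustrative assumptions.
    Returns None when no listed plan is large enough.
    """
    needed = weight_gb(params_billion, bits_per_weight) * headroom
    fits = [plan for plan in PLANS if plan[1] >= needed]
    return min(fits, key=lambda plan: plan[2]) if fits else None


for size in (8, 34, 70):
    print(f"{size}B ->", cheapest_fit(size))
```

Real memory use also depends on context length and batch size, so treat the result as a starting point and check actual usage once the model is loaded.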
Community benchmarks
Measured by community members running real workloads. Numbers are tokens/second on the listed model.
Detailed pros & cons
What's good
RTX 4090 at $0.44/hr vs Lambda Labs at $1.25/hr. For on-demand GPU inference, RunPod is consistently 60–70% cheaper than alternatives.
Start a pod, run a batch job, stop it. 3 hours of GPU time costs $1.32 on an RTX 4090. No monthly subscription required.
GPU users spend $100–$500/mo. 3% recurring for a year is often worth more than one-time payouts. Top earners make $50+/mo from a single active referral.
RunPod Serverless auto-scales inference endpoints to zero when not in use. Pay only per request. Ideal for low-volume production apps.
Individual providers list spare GPUs, driving prices down. The competitive marketplace ensures pricing stays below data-centre alternatives.
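The pricing claims above follow from simple arithmetic, sketched below. The hourly rates are the ones quoted in this review; the function names are illustrative only.

```python
def session_cost(hours: float, rate_per_hour: float) -> float:
    """Cost in USD of running a pod for `hours` at an hourly rate."""
    return round(hours * rate_per_hour, 2)


def percent_cheaper(rate: float, other_rate: float) -> float:
    """How much cheaper `rate` is than `other_rate`, as a percentage."""
    return round((1 - rate / other_rate) * 100, 1)


RTX_4090_RATE = 0.44  # USD/hr, as quoted above
LAMBDA_RATE = 1.25    # USD/hr, the comparison rate quoted above

print(session_cost(3, RTX_4090_RATE))
print(percent_cheaper(RTX_4090_RATE, LAMBDA_RATE))
```

With the quoted rates this reproduces the $1.32 three-hour session and a saving of roughly 65%, which sits inside the 60–70% range claimed.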
Watch out for
Forgetting to stop a pod is the most common RunPod mistake. An idle RTX 4090 costs $10.56/day. Set a phone reminder or use RunPod's spend alerts. This is the #1 complaint from new users.
24/7 RTX 4090 = ~$320/mo. A Hetzner CX32 running 14B models is €12.49/mo. Use RunPod for bursts, Hetzner for always-on.
Spot pricing saves ~30% but pods can be preempted. Use on-demand for anything important.
Best for: batch inference jobs, 70B+ model testing, and developers needing GPU bursts for specific tasks. Anyone who needs GPU power without 24/7 costs.
Not ideal for: personal always-on assistants. Hetzner/Contabo are 10–20x cheaper for 24/7 CPU inference.
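To see where the line between bursts and always-on use falls, the sketch below turns an hourly rate into a monthly bill for a given number of hours per day, and applies the ~30% spot discount mentioned above. The 30-day month and the function names are assumptions made for illustration.

```python
def monthly_cost(rate_per_hour: float, hours_per_day: float, days: int = 30) -> float:
    """Monthly cost in USD for a pod run `hours_per_day` hours every day."""
    return round(rate_per_hour * hours_per_day * days, 2)


def spot_rate(on_demand_rate: float, discount: float = 0.30) -> float:
    """Approximate spot rate, assuming a flat discount off on-demand."""
    return round(on_demand_rate * (1 - discount), 3)


RTX_4090_RATE = 0.44  # USD/hr, as quoted above

print("24/7:", monthly_cost(RTX_4090_RATE, 24))
print("2 h/day:", monthly_cost(RTX_4090_RATE, 2))
print("spot rate:", spot_rate(RTX_4090_RATE))
```

Running 24 hours a day lands close to the ~$320/mo figure quoted above, while a couple of hours a day costs a small fraction of that.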
Ollama setup guide for RunPod
Step-by-step from account creation to first model response. Takes 15–20 minutes.
Create your account and add credits
RunPod uses a pre-paid credit model. Add $20 to get started — that's ~45 hours on an RTX 4090. Credits never expire.
Choose your GPU
For 7B–34B at high speed: RTX 3090 or 4090 ($0.34–0.44/hr). For 70B (Q4-quantized): A40 ($0.79/hr). For maximum speed: A100 SXM ($1.89/hr).
Deploy with the Ollama template
Click 'Deploy' → select your GPU → search 'Ollama' in templates → select the official RunPod Ollama template → click 'Deploy On-Demand'.
Wait for the pod to start
RunPod pods start in 30–120 seconds. Watch the status change from 'Starting' to 'Running'. Note your Pod ID.
Connect to the pod
Click 'Connect' → 'Start Web Terminal' for a browser shell. Or use RunPod CLI: runpodctl connect YOUR_POD_ID
Pull your model
Inside the pod terminal, pull the model:
ollama pull llama3.3:70b-instruct-q4_K_M
On an A100, the 70B Q4 model downloads in ~8 minutes and loads into VRAM in ~15 seconds. Then run it:
ollama run llama3.3:70b-instruct-q4_K_M
Expose the API endpoint
In pod settings, add port 11434 to HTTP Ports, then note the public URL, which has this format:
https://YOUR_POD_ID-11434.proxy.runpod.net
Use this URL in Continue.dev, Open WebUI, or LiteLLM.
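Once the port is exposed, the endpoint can be called like any other Ollama server. The sketch below uses only the Python standard library and Ollama's `/api/generate` endpoint with streaming turned off. The URL format is the one given above; the helper names, the 300-second timeout, and the prompt are illustrative assumptions, and `YOUR_POD_ID` remains a placeholder.

```python
import json
import urllib.request


def proxy_url(pod_id: str, port: int = 11434) -> str:
    """Build the public proxy URL in the format shown above."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


# Example call (needs a running pod; replace the placeholder pod ID first):
# print(generate(proxy_url("YOUR_POD_ID"),
#                "llama3.3:70b-instruct-q4_K_M",
#                "Say hello in one sentence."))
```

The first request after a pod starts may be slow while the model loads into VRAM, which is why the timeout in this sketch is generous.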
CRITICAL: Stop your pod when done
RunPod bills by the second. An idle RTX 4090 costs $0.44/hr = $10.56/day if left running. Stop the pod: RunPod Console → your pod → Stop. Your volume (model files) persists — no need to re-download next time.
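Because billing is per second, both the cost of a forgotten pod and the runtime a prepaid balance buys are easy to estimate. The sketch below is a minimal calculation using the rates quoted in this guide; the function names are hypothetical, and storage or other charges are not modelled.

```python
def billed_cost(seconds: int, rate_per_hour: float) -> float:
    """Cost in USD for a pod billed per second at an hourly rate."""
    return round(seconds * rate_per_hour / 3600, 4)


def credit_runway_hours(credit_usd: float, rate_per_hour: float) -> float:
    """Hours of runtime a prepaid balance covers at a given rate."""
    return round(credit_usd / rate_per_hour, 1)


RTX_4090_RATE = 0.44  # USD/hr, as quoted above

print("20-minute session:", billed_cost(20 * 60, RTX_4090_RATE))
print("left running for a day:", billed_cost(24 * 3600, RTX_4090_RATE))
print("hours from $20 of credit:", credit_runway_hours(20, RTX_4090_RATE))
```

A full idle day comes to the $10.56 quoted above, and $20 of credit covers roughly 45 hours on an RTX 4090.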
Community setups
"95 tok/s on 70B FP16. Unbeatable for batch jobs. Non-negotiable tip: stop the pod when done."
Ready to get started?
Starting at ~$0.12/hr (spot). No long-term commitment required.