GPU Providers

gpuci supports six GPU providers for testing CUDA kernels. This guide covers setup, configuration, and best practices for each provider.

Provider Overview

Provider	Type	Best For	Pricing	Instance Lifecycle
SSH	Your machines	Dedicated hardware	Free	Always on
RunPod	Cloud	Quick testing	$0.20-4.00/hr	On-demand pods
Lambda Labs	Cloud	Production workloads	$1.10-3.00/hr	On-demand instances
Vast.ai	Marketplace	Cost optimization	$0.10-2.00/hr	Spot-like rentals
FluidStack	Enterprise	Large scale	Custom	Managed instances
Brev	Cloud	Development	$0.50-4.00/hr	CLI-managed

Quick Start

# gpuci.yml - Example with multiple providers
targets:
  # Your own machine
  - name: local-4090
    provider: ssh
    host: gpu.local
    user: ubuntu
    gpu: RTX 4090

  # Cloud providers (pick one or more)
  - name: runpod-a100
    provider: runpod
    gpu: "NVIDIA A100 80GB PCIe"

  - name: lambda-h100
    provider: lambdalabs
    gpu: gpu_1x_h100_pcie

  - name: vastai-4090
    provider: vastai
    gpu: RTX_4090
    max_price: 0.50

SSH Provider

Direct SSH connection to your own GPU machines. No cloud costs, full control.

Configuration

- name: my-server
  provider: ssh
  host: gpu.example.com     # Required: hostname or IP
  user: ubuntu              # Required: SSH username
  port: 22                  # Optional: SSH port (default: 22)
  key: ~/.ssh/id_rsa        # Optional: path to SSH private key
  gpu: RTX 4090             # Required: GPU name (for display)

Setup

Ensure SSH access to your GPU machine:
```
ssh ubuntu@gpu.example.com
```

Verify that the CUDA toolkit and NVIDIA driver are installed on the remote machine:

ssh ubuntu@gpu.example.com "nvcc --version && nvidia-smi"

(Optional) Add SSH key for passwordless access:
```
ssh-copy-id ubuntu@gpu.example.com
```
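The setup checks above can also be scripted as a preflight step. The following is a minimal sketch, not part of gpuci: the function names `build_ssh_command`, `parse_cuda_release`, and `preflight` are hypothetical, and it assumes key-based login, because `BatchMode=yes` makes `ssh` fail instead of prompting for a password.

```python
import os
import re
import subprocess


def build_ssh_command(host, user, port=22, key=None):
    """Build the ssh argv that runs the same CUDA check as the manual step."""
    # BatchMode=yes: fail fast instead of waiting on a password prompt.
    cmd = ["ssh", "-p", str(port), "-o", "BatchMode=yes"]
    if key:
        # Expand "~" ourselves, since no shell is involved.
        cmd += ["-i", os.path.expanduser(key)]
    cmd += [f"{user}@{host}", "nvcc --version && nvidia-smi"]
    return cmd


def parse_cuda_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def preflight(host, user, port=22, key=None, timeout=30):
    """Run the check over SSH; return the CUDA release, or None on failure."""
    result = subprocess.run(
        build_ssh_command(host, user, port, key),
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        return None
    return parse_cuda_release(result.stdout)


# Show the command that would be executed (no connection is made here).
print(" ".join(build_ssh_command("gpu.example.com", "ubuntu")))
```

Calling `preflight("gpu.example.com", "ubuntu")` returns a tuple such as `(12, 2)` when both `nvcc` and `nvidia-smi` succeed, and `None` when the connection or either command fails.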

Requirements on Remote Machine

NVIDIA GPU with drivers installed
CUDA toolkit with nvcc compiler
SSH access (password or key-based)
Sufficient disk space for kernel compilation

Troubleshooting

Connection refused: Check firewall and SSH service status

sudo systemctl status ssh
sudo ufw allow 22/tcp

CUDA not found: Ensure CUDA is in PATH

export PATH=/usr/local/cuda/bin:$PATH

RunPod Provider

RunPod offers on-demand GPU pods with simple pricing and fast spin-up times.

Installation

pip install "gpuci[runpod]"

Configuration

- name: runpod-a100
  provider: runpod
  gpu: "NVIDIA A100 80GB PCIe"    # Required: GPU type ID
  gpu_count: 1                     # Optional: number of GPUs (default: 1)
  image: runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04  # Optional
  volume_size: 20                  # Optional: volume size in GB

Setup

Get your API key from RunPod Settings
Set environment variable:
```
export RUNPOD_API_KEY=your_api_key_here
```

For CI/CD, add as a secret:

# GitHub Actions
env:
  RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }}

Available GPU Types

GPU Type ID	GPU	VRAM	Approx. Price
`NVIDIA RTX 4090`	RTX 4090	24GB	$0.44/hr
`NVIDIA A100 80GB PCIe`	A100	80GB	$1.89/hr
`NVIDIA A100-SXM4-80GB`	A100 SXM	80GB	$1.99/hr
`NVIDIA H100 PCIe`	H100	80GB	$3.89/hr
`NVIDIA L40S`	L40S	48GB	$1.14/hr

Get current availability:

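import os
import runpod

# Read the key from the environment rather than hardcoding it in source
runpod.api_key = os.environ["RUNPOD_API_KEY"]
print(runpod.get_gpus())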

How It Works

gpuci creates a new pod with your specified GPU
Waits for pod to reach RUNNING state
Connects via SSH (port 22 exposed automatically)
Uploads kernel, compiles, runs benchmarks
Terminates pod on completion (you only pay for runtime)
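
The lifecycle above can be sketched as a create, wait, run, terminate sequence in which termination always happens, even when the tests fail. This is an illustration only, not gpuci's actual implementation: `FakePod`, `run_on_pod`, and the `status()`/`terminate()` method names are hypothetical stand-ins for a real pod client.

```python
import time


class FakePod:
    """Stand-in for a real pod client; all method names are hypothetical."""

    def __init__(self, polls_until_running=2):
        self._polls_left = polls_until_running
        self.terminated = False

    def status(self):
        # Report STARTING a few times before switching to RUNNING.
        if self._polls_left > 0:
            self._polls_left -= 1
            return "STARTING"
        return "RUNNING"

    def terminate(self):
        self.terminated = True


def run_on_pod(create_pod, run_tests, timeout_s=600, poll_s=5, sleep=time.sleep):
    """Create a pod, wait for RUNNING, run the tests, and always terminate."""
    pod = create_pod()
    try:
        deadline = time.monotonic() + timeout_s
        while pod.status() != "RUNNING":
            if time.monotonic() >= deadline:
                raise TimeoutError("pod did not reach RUNNING in time")
            sleep(poll_s)
        return run_tests(pod)
    finally:
        # Runs on success, test failure, and timeout alike, so billing stops.
        pod.terminate()


# Demo with the fake pod and a no-op sleep so it finishes immediately.
pod = FakePod()
print(run_on_pod(lambda: pod, lambda p: "benchmarks done", sleep=lambda s: None))
print("terminated:", pod.terminated)
```

The `try`/`finally` block covers failures inside the Python process. A hard crash or a killed CI runner can still leave a pod running, which is why checking the provider console remains worthwhile.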

Lambda Labs Provider

Lambda Labs provides high-performance cloud GPUs with pre-installed CUDA and ML frameworks.

Configuration

- name: lambda-h100
  provider: lambdalabs
  gpu: gpu_1x_h100_pcie        # Required: instance type name
  region: us-west-1            # Optional: preferred region
  ssh_key_name: my-key         # Optional: SSH key name (uses first key if not specified)

Setup

Get your API key from Lambda Cloud API Keys
Set environment variable:
```
export LAMBDA_API_KEY=your_api_key_here
```
Add an SSH key to your account at Lambda SSH Keys

Available Instance Types

Instance Type	GPU	VRAM	Price
`gpu_1x_a10`	A10	24GB	$0.60/hr
`gpu_1x_a100`	A100 40GB	40GB	$1.10/hr
`gpu_1x_a100_sxm4`	A100 80GB	80GB	$1.29/hr
`gpu_1x_h100_pcie`	H100	80GB	$2.49/hr
`gpu_8x_a100`	8x A100	320GB	$8.80/hr
`gpu_8x_h100_sxm5`	8x H100	640GB	$23.92/hr

Check availability:

curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/instance-types
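
To turn that response into a quick answer to "where can I launch this instance type right now?", a small parser can help. The sketch below is hedged: it assumes the response is a JSON object with a top-level `data` mapping keyed by instance type name, where each entry carries a `regions_with_capacity_available` list of objects with a `name` field. Check that shape against a real response before relying on it. The function name `regions_with_capacity` and the sample payload are illustrative, not real API output.

```python
def regions_with_capacity(payload, instance_type):
    """Return the region names reporting capacity for `instance_type`.

    ASSUMPTION: `payload` has the shape
    {"data": {<type name>: {"regions_with_capacity_available": [{"name": ...}]}}}.
    """
    entry = payload.get("data", {}).get(instance_type)
    if entry is None:
        return []
    return [r["name"] for r in entry.get("regions_with_capacity_available", [])]


# Hand-written sample in the assumed shape (not captured from the API).
sample = {
    "data": {
        "gpu_1x_h100_pcie": {
            "instance_type": {"name": "gpu_1x_h100_pcie"},
            "regions_with_capacity_available": [{"name": "us-west-1"}],
        },
        "gpu_1x_a10": {
            "instance_type": {"name": "gpu_1x_a10"},
            "regions_with_capacity_available": [],
        },
    }
}

print(regions_with_capacity(sample, "gpu_1x_h100_pcie"))
```

For live data, redirect the `curl` output to a file and pass the result of `json.load` to the function.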

Regions

us-west-1 (California)
us-south-1 (Texas)
us-east-1 (New York)
me-central-1 (Middle East)
asia-south-1 (India)

Vast.ai Provider

Vast.ai is a GPU marketplace where you can rent from independent providers at competitive prices.

Installation

pip install "gpuci[vastai]"

Configuration

- name: vastai-4090
  provider: vastai
  gpu: RTX_4090                # Required: GPU name
  max_price: 0.50              # Optional: max $/hour (filters expensive offers)
  min_gpu_ram: 24              # Optional: minimum GPU RAM in GB
  image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel  # Optional: Docker image

Setup

Create account at Vast.ai
Get your API key from Account Settings
Set environment variable:
```
export VASTAI_API_KEY=your_api_key_here
```
Add credits to your account (prepaid model)

GPU Names

GPU Name	Typical VRAM	Price Range
`RTX_3090`	24GB	$0.15-0.30/hr
`RTX_4090`	24GB	$0.30-0.50/hr
`A100_PCIE`	40/80GB	$0.80-1.50/hr
`A100_SXM4`	80GB	$1.00-2.00/hr
`H100_PCIE`	80GB	$2.00-3.00/hr
`H100_SXM5`	80GB	$2.50-4.00/hr

Search Filters

gpuci automatically searches for:

Unrented, rentable machines
Verified hosts (more reliable)
Offers that match your GPU name, RAM, and price requirements

The cheapest matching offer is selected automatically.
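
The selection rule described here (filter, then take the cheapest) can be sketched in a few lines. This is an illustration under stated assumptions, not gpuci's code: the function `pick_offer` and the offer fields `gpu_name`, `gpu_ram_gb`, `price_per_hour`, `rented`, and `verified` are hypothetical names chosen for the example and do not mirror the Vast.ai API schema.

```python
def pick_offer(offers, gpu, max_price=None, min_gpu_ram=None):
    """Return the cheapest unrented, verified offer that meets the limits.

    Returns None when nothing matches. Field names are hypothetical.
    """
    candidates = [
        o for o in offers
        if o["gpu_name"] == gpu
        and not o["rented"]
        and o["verified"]
        and (max_price is None or o["price_per_hour"] <= max_price)
        and (min_gpu_ram is None or o["gpu_ram_gb"] >= min_gpu_ram)
    ]
    return min(candidates, key=lambda o: o["price_per_hour"], default=None)


# Made-up offers for demonstration only.
offers = [
    {"id": 1, "gpu_name": "RTX_4090", "gpu_ram_gb": 24, "price_per_hour": 0.45,
     "rented": False, "verified": True},
    {"id": 2, "gpu_name": "RTX_4090", "gpu_ram_gb": 24, "price_per_hour": 0.32,
     "rented": False, "verified": False},  # cheaper, but host is unverified
    {"id": 3, "gpu_name": "RTX_4090", "gpu_ram_gb": 24, "price_per_hour": 0.38,
     "rented": False, "verified": True},
    {"id": 4, "gpu_name": "RTX_4090", "gpu_ram_gb": 24, "price_per_hour": 0.30,
     "rented": True, "verified": True},    # cheapest, but already rented
]

best = pick_offer(offers, "RTX_4090", max_price=0.50, min_gpu_ram=24)
print(best["id"] if best else "no matching offer")
```

Note how the two cheapest offers are skipped because one is unverified and the other is already rented. Lowering `max_price` below every remaining candidate yields `None`, which corresponds to the "No instances available" case.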

Notes

Vast.ai uses a marketplace model - prices and availability fluctuate
Offers are sorted by price; gpuci picks the cheapest match
Instances are destroyed after testing (you only pay for runtime)
Consider setting max_price to avoid expensive offers during high demand

FluidStack Provider

FluidStack provides enterprise-grade GPU cloud infrastructure.

Configuration

- name: fluidstack-h100
  provider: fluidstack
  gpu: H100_SXM_80GB           # Required: GPU type
  ssh_key_name: my-key         # Optional: SSH key name

Setup

Create account at FluidStack Console
Get your API key from API Keys

Set environment variable:

export FLUIDSTACK_API_KEY=your_api_key_here

Add an SSH key at SSH Keys

Available GPU Types

GPU Type	GPU	VRAM
`RTX_A6000_48GB`	RTX A6000	48GB
`A100_PCIE_40GB`	A100	40GB
`A100_SXM_80GB`	A100 SXM	80GB
`H100_PCIE_80GB`	H100 PCIe	80GB
`H100_SXM_80GB`	H100 SXM	80GB

Notes

Enterprise pricing - contact FluidStack for rates
Best for organizations needing guaranteed capacity
SLA-backed availability

Brev Provider

NVIDIA Brev (formerly Brev.dev) provides managed cloud GPUs with CLI-based management.

Configuration

- name: brev-h100
  provider: brev
  gpu: H100                    # Required: GPU type for display

Setup

Install Brev CLI:

curl -fsSL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-latest.sh | bash

Login:
```
brev login
```
Verify:
```
brev ls
```

How It Works

Unlike the other cloud providers, Brev is driven through its CLI rather than an HTTP API:

gpuci runs brev create to create an instance
Polls brev ls until instance is ready
Extracts SSH info and connects via paramiko
Runs kernel tests
Runs brev delete to clean up
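
A CLI-driven flow like the one above might look roughly as follows. Treat this as a sketch resting on several assumptions rather than a description of gpuci's internals: it assumes `brev create` and `brev delete` take the instance name as a positional argument, and that the instance's line in `brev ls` output contains the word `RUNNING` once it is ready. The helper names `run_cli`, `is_ready`, and `with_brev_instance` are hypothetical, and the SSH step is left to the `run_tests` callback.

```python
import subprocess
import time


def run_cli(args):
    """Run a CLI command and return its stdout; raises on a non-zero exit."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def is_ready(ls_output, name):
    # ASSUMPTION: a ready instance appears on a line with its name and RUNNING.
    return any(name in line and "RUNNING" in line for line in ls_output.splitlines())


def with_brev_instance(name, run_tests, run=run_cli, attempts=60, poll_s=10,
                       sleep=time.sleep):
    """Create an instance, poll until ready, run tests, and always delete it."""
    run(["brev", "create", name])  # ASSUMPTION: name is a positional argument
    try:
        for _ in range(attempts):
            if is_ready(run(["brev", "ls"]), name):
                return run_tests(name)
            sleep(poll_s)
        raise TimeoutError(f"{name} was not ready after {attempts} polls")
    finally:
        # Clean up whether the tests passed, failed, or timed out.
        run(["brev", "delete", name])
```

Passing `run` and `sleep` as parameters keeps the flow testable without the Brev CLI installed: a test can substitute a function that returns canned `brev ls` output.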

Notes

Requires Brev CLI installed locally
Uses your Brev account credentials
Good for development and testing
GUI available at console.brev.dev

Environment Variables Reference

Provider	Variable	Description
RunPod	`RUNPOD_API_KEY`	API key from RunPod settings
Lambda Labs	`LAMBDA_API_KEY`	API key from Lambda Cloud
Vast.ai	`VASTAI_API_KEY`	API key from Vast.ai account
FluidStack	`FLUIDSTACK_API_KEY`	API key from FluidStack console

Note: SSH and Brev providers don't require environment variables (SSH uses keys, Brev uses CLI login).
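
Before starting a run, it can be useful to check that every configured provider has its key set. The sketch below uses the provider names from the configuration examples and the variable names from the table above. The helper `missing_keys` is a hypothetical name and not part of gpuci.

```python
import os

# Provider name (as used in gpuci.yml) -> required environment variable.
REQUIRED_ENV = {
    "runpod": "RUNPOD_API_KEY",
    "lambdalabs": "LAMBDA_API_KEY",
    "vastai": "VASTAI_API_KEY",
    "fluidstack": "FLUIDSTACK_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the sorted variable names that are unset or empty.

    Providers without an entry in REQUIRED_ENV (ssh, brev) are ignored.
    """
    env = os.environ if env is None else env
    return sorted({
        REQUIRED_ENV[p]
        for p in providers
        if p in REQUIRED_ENV and not env.get(REQUIRED_ENV[p])
    })


# Demo against an explicit mapping instead of the real environment.
demo_env = {"RUNPOD_API_KEY": "example-value"}
print(missing_keys(["ssh", "runpod", "vastai"], env=demo_env))
```

Omitting the `env` argument checks the real process environment, so `missing_keys(["runpod", "lambdalabs"])` can run as the first step of a CI job and fail early with a clear message.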

GitHub Actions Integration

Add your provider API keys as secrets in your repository settings, then pass them to gpuci as environment variables in the workflow:

# .github/workflows/gpu-test.yml
name: GPU Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install gpuci
        run: pip install gpuci[cloud]

      - name: Run GPU tests
        env:
          RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }}
          LAMBDA_API_KEY: ${{ secrets.LAMBDA_API_KEY }}
          VASTAI_API_KEY: ${{ secrets.VASTAI_API_KEY }}
        run: gpuci test kernel.cu

Cost Optimization Tips

Use SSH for frequent testing: If you have dedicated hardware, the SSH provider is free.
Set max_price on Vast.ai: Prevents accidentally renting expensive machines during high demand.
Use appropriate GPU: Don't rent an H100 for simple kernels that run fine on a 4090.
Clean up instances: gpuci automatically terminates instances, but verify in provider dashboards.
Use spot/preemptible instances when available: Some providers offer discounted interruptible instances.
Test locally first: Debug on local GPU before running on expensive cloud instances.
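
To make the "use an appropriate GPU" tip concrete, here is a rough cost comparison. It assumes simple pro rata billing at the approximate RunPod hourly rates listed earlier in this guide, and it ignores minimum billing increments, storage charges, and spin-up time, all of which vary by provider. The function name `estimate_cost` is illustrative.

```python
def estimate_cost(price_per_hour, minutes):
    """Dollars for a run of `minutes` at an hourly rate, billed pro rata."""
    return price_per_hour * minutes / 60


# Approximate hourly rates taken from the RunPod table in this guide.
rates = {"NVIDIA RTX 4090": 0.44, "NVIDIA H100 PCIe": 3.89}

for gpu, rate in rates.items():
    print(f"{gpu}: ${estimate_cost(rate, 10):.2f} for a 10-minute run")
```

At these rates the H100 run costs roughly nine times as much as the RTX 4090 run, which adds up quickly when tests run on every push.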

Troubleshooting

"API key not set"

Ensure the environment variable is set:

echo $RUNPOD_API_KEY  # Should show your key

"No instances available"

Try a different GPU type
Try a different region
Check provider status page
For Vast.ai, increase max_price

"SSH connection failed"

Instance may still be initializing
Check that SSH key is added to provider account
Verify firewall allows SSH (port 22)

"CUDA not found on remote"

Use a CUDA-enabled Docker image
Verify CUDA toolkit is installed
Check PATH includes /usr/local/cuda/bin

Instance not terminating

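If gpuci crashes before cleanup, instances may keep running and accruing charges. Check each provider's dashboard and terminate any leftovers manually: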

RunPod: Check console
Lambda Labs: Check instances
Vast.ai: Check instances
FluidStack: Check console
