42 changes: 26 additions & 16 deletions README.md
@@ -21,28 +21,38 @@ Train multi-step agents for real-world tasks using GRPO.

</div>

## 📏 RULER: Zero-Shot Agent Rewards
## 🚀 W&B Training: Serverless RL

**RULER** (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—**no labeled data, expert feedback, or reward engineering required**.
**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.

✨ **Key Benefits:**

- **2-3x faster development** - Skip reward function engineering entirely
- **General-purpose** - Works across any task without modification
- **Strong performance** - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
- **Easy integration** - Drop-in replacement for manual reward functions
- **40% lower cost** - Multiplexing on a shared, production-grade inference cluster
- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs
- **Zero infra headaches** - Fully managed infrastructure that stays healthy
- **Instant deployment** - Every checkpoint instantly available via W&B Inference

```python
# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass

# After: One line with RULER
judged_group = await ruler_score_group(group, "openai/o3")
# Before: Hours of GPU setup and infra management
# RuntimeError: CUDA error: out of memory 😢

# After: Serverless RL with instant feedback
import art
from art.serverless.backend import ServerlessBackend

model = art.TrainableModel(
    project="voice-agent",
    name="agent-001",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)

backend = ServerlessBackend(
    api_key="your_wandb_api_key",
)
await model.register(backend)
# Edit and iterate in minutes, not hours!
```

[📖 Learn more about RULER →](https://art.openpipe.ai/fundamentals/ruler)
[📖 Learn more about W&B Training →](https://docs.wandb.ai/guides/training)

## ART Overview

@@ -52,10 +62,10 @@ ART is an open-source RL framework that improves agent reliability by allowing L

| Agent Task | Example Notebook | Description | Comparative Performance |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
| **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
| **ART•E [RULER]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 7B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
| **2048** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 3B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
| **Tic Tac Toe** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/tic_tac_toe/display-benchmarks.ipynb) |
| **Codenames** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/codenames/win_rate_over_time.png" height="72"> [benchmarks](https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) |
1 change: 1 addition & 0 deletions docs/docs.json
@@ -64,6 +64,7 @@
"group": "Features",
"pages": [
"features/checkpoint-forking",
"features/checkpoint-deletion",
"features/additional-histories",
"features/mcp-rl"
]
111 changes: 111 additions & 0 deletions docs/features/checkpoint-deletion.mdx
@@ -0,0 +1,111 @@
---
title: Deleting Checkpoints
description: Learn how to automatically delete low-performing model checkpoints
---

Training jobs can run for thousands of steps, and each step generates a new model checkpoint. For most training runs, these checkpoints are LoRAs that take up 80-150MB of disk space each. To reduce storage overhead, you can set up automatic deletion of all but your best-performing and most recent checkpoints.

## Deleting low-performing checkpoints

To delete all but the most recent and best-performing checkpoints of a model, call the `delete_checkpoints` method as shown below.

```python
import art
# also works with LocalBackend and SkyPilotBackend
from art.serverless.backend import ServerlessBackend

model = art.TrainableModel(
    name="agent-001",
    project="checkpoint-deletion-demo",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
backend = ServerlessBackend()
# in order for the model to know where to look for its existing checkpoints,
# we have to point it to the correct backend
await model.register(backend)

# deletes all but the most recent checkpoint
# and the checkpoint with the highest val/reward
await model.delete_checkpoints()
```

By default, `delete_checkpoints` ranks existing checkpoints by their `val/reward` score and erases all but the highest-performing and most recent. However, it can be configured to rank checkpoints by any metric you pass it.

```python
await model.delete_checkpoints(best_checkpoint_metric="train/eval_1_score")
```

Keep in mind that once checkpoints are deleted, they generally cannot be recovered, so use this method with caution.
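If you want an extra safeguard, one option is to gate deletion behind an explicit opt-in so exploratory runs keep every checkpoint. A minimal sketch, where the `PRUNE_CHECKPOINTS` environment variable is just an illustrative name of your own choosing:

```python
import os

# only prune checkpoints when explicitly enabled for this run
if os.environ.get("PRUNE_CHECKPOINTS") == "1":
    await model.delete_checkpoints(best_checkpoint_metric="val/reward")
```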


## Deleting within a training loop

Below is a simple example of a training loop that trains a model for 50 steps before exiting. By default, the LoRA checkpoint generated by each step will automatically be saved in the storage mechanism your backend uses (in this case W&B Artifacts).

```python
import art
from art.serverless.backend import ServerlessBackend

from .rollout import rollout
from .scenarios import load_train_scenarios

TRAINING_STEPS = 50

model = art.TrainableModel(
    name="agent-001",
    project="checkpoint-deletion-demo",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
backend = ServerlessBackend()
await model.register(backend)

train_scenarios = load_train_scenarios()

# training loop
for step in range(await model.get_step(), TRAINING_STEPS):
    train_groups = await art.gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
            for scenario in train_scenarios
        ),
        pbar_desc=f"gather(train:{step})",
    )
    # trains the model and automatically persists each LoRA as a W&B Artifact
    # ~120MB per step
    await model.train(
        train_groups,
        config=art.TrainConfig(learning_rate=5e-5),
    )

# ~6GB of storage used by checkpoints
```

However, since each LoRA checkpoint generated by this training run is ~120MB, the full 50-step run will require ~6GB of storage for the model checkpoints alone. To reduce that overhead, let's add checkpoint deletion to each step.


```python
...
# training loop
for step in range(await model.get_step(), TRAINING_STEPS):
    train_groups = await art.gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
            for scenario in train_scenarios
        ),
        pbar_desc=f"gather(train:{step})",
    )
    # trains the model and automatically persists each LoRA as a W&B Artifact
    # ~120MB per step
    await model.train(
        train_groups,
        config=art.TrainConfig(learning_rate=5e-5),
    )
    # delete all but the most recent checkpoint and the best-performing one on train/reward
    await model.delete_checkpoints(best_checkpoint_metric="train/reward")

# ~240MB of storage used by checkpoints
```

With this change, we've reduced the total amount of storage used by checkpoints from 6GB to 240MB, while preserving the checkpoint that performed the best on `train/reward`.
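For reference, here's the storage math behind that reduction, assuming ~120MB per LoRA checkpoint as above:

```python
steps = 50     # one LoRA checkpoint per training step
size_mb = 120  # approximate size of each checkpoint

print(f"without deletion: ~{steps * size_mb // 1000}GB")  # ~6GB
print(f"with deletion:    ~{2 * size_mb}MB")              # most recent + best = ~240MB
```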
87 changes: 59 additions & 28 deletions docs/fundamentals/art-backend.mdx
@@ -8,12 +8,14 @@ ART divides the logic for training an agent into two distinct abstractions. The

While the backend's training and inference settings are highly configurable, they also ship with **intelligent defaults** that save beginners time when getting started. However, there are a few important considerations to keep in mind before running your first training job.


<div className="cards-container">

<div className="card-wrapper">
<Card
title="LocalBackend"
icon="laptop-code"
href="/fundamentals/art-backend#localbackend"
title="ServerlessBackend"
icon="bolt"
href="/fundamentals/art-backend#serverlessbackend"
horizontal={true}
arrow={true}
></Card>
@@ -27,32 +29,44 @@ While the backend's training and inference settings are highly configurable, the
arrow={true}
></Card>
</div>
<div className="card-wrapper">
<Card
title="LocalBackend"
icon="laptop-code"
href="/fundamentals/art-backend#localbackend"
horizontal={true}
arrow={true}
></Card>
</div>
</div>

## Remote or local training
## Managed, remote, or local training

ART provides two backend classes, `LocalBackend` and `SkyPilotBackend`. If your agent is already running on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` instead.
ART provides three backend classes:

Both `LocalBackend` and `SkyPilotBackend` implement the `art.Backend` class, and once initialized, the client interacts with them in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance.
* `ServerlessBackend` - train remotely on autoscaling GPUs
* `SkyPilotBackend` - train remotely on self-managed infra
* `LocalBackend` - run your agent and training code on the same machine

### LocalBackend
If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` or `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters. `SkyPilotBackend` lets you use your own infra.

The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.
All three backend types implement the `art.Backend` class, and the client interacts with each of them in exactly the same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend` and forwards requests from the client to the remote instance. `ServerlessBackend` runs within W&B Training clusters and autoscales GPUs to meet training and inference demand.

To declare a `LocalBackend` instance, follow the code sample below:
### ServerlessBackend

Setting up `ServerlessBackend` requires a W&B API key. Once you have one, you can provide it to `ServerlessBackend` either as an environment variable or initialization argument.

```python
from art.local import LocalBackend
from art.serverless.backend import ServerlessBackend

backend = LocalBackend(
# set to True if you want your backend to shut down automatically
# when your client process ends
in_process: False,
# local path where the backend will store trajectory logs and model weights
path: './.art',
backend = ServerlessBackend(
    api_key="my-api-key",
    # or set WANDB_API_KEY in the environment
)
```
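If you'd rather not pass the key directly, here is a minimal sketch of the environment-variable approach (assuming `ServerlessBackend` falls back to `WANDB_API_KEY`, as the comment above notes; in practice you'd export the variable in your shell or `.env` file rather than setting it in code):

```python
import os

from art.serverless.backend import ServerlessBackend

os.environ["WANDB_API_KEY"] = "my-api-key"  # normally set outside your code

backend = ServerlessBackend()  # no api_key argument needed
```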

As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.

### SkyPilotBackend

To use SkyPilotBackend, you'll need to install the optional dependency:
@@ -106,46 +120,63 @@
uv run sky down my-cluster
```

### LocalBackend

The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.

To declare a `LocalBackend` instance, follow the code sample below:

```python
from art.local import LocalBackend

backend = LocalBackend(
    # set to True if you want your backend to shut down automatically
    # when your client process ends
    in_process=False,
    # local path where the backend will store trajectory logs and model weights
    path="./.art",
)
```
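For quick experiments in a notebook, you may prefer to run the backend in-process so it shuts down with your client; a minimal sketch using the same arguments as above:

```python
from art.local import LocalBackend

# shuts down automatically when the client process ends
backend = LocalBackend(in_process=True, path="./.art")
```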

## Using a backend

Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.

```python
LOCAL = True
BACKEND_TYPE = "serverless"

if LOCAL:
from art.local import LocalBackend
backend = LocalBackend()
else:
if BACKEND_TYPE == "serverless":
    from art.serverless.backend import ServerlessBackend
    backend = ServerlessBackend()
elif BACKEND_TYPE == "remote":
    from art.skypilot import SkyPilotBackend
    backend = await SkyPilotBackend.initialize_cluster(
        cluster_name="my-cluster",
        gpu="H100",
    )
else:
    from art.local import LocalBackend
    backend = LocalBackend()

model = art.TrainableModel(...)

await model.register(backend)

# ...training code...

if BACKEND_TYPE == "remote":
    # optionally shut down the SkyPilot cluster when training finishes
    await backend.down()
```

To see `LocalBackend` and `SkyPilotBackend` in action, try the examples below.
To see `LocalBackend` and `ServerlessBackend` in action, try the examples below.

<div className="cards-container">
<div className="card-wrapper">
<Card
title="2048 Notebook"
icon="laptop-code"
icon="bolt"
href="https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb"
horizontal={true}
arrow={true}
>
Use LocalBackend to train an agent to play 2048.
Use ServerlessBackend to train an agent to play 2048.
</Card>
</div>
<div className="card-wrapper">