
Commit e5a322e

WIP: More documentation (#432)
* Document checkpoint deletion
* Document ServerlessBackend
* Update quickstart
* Update notebook order
* Update notebooks in README
* Update Installation + Setup
* Update initial blurb
* Simplify quickstart
* Update import
* Update example
* Update 2048 notebook link
* Await
1 parent 10437df commit e5a322e

File tree

9 files changed: +276 -91 lines changed


README.md

Lines changed: 26 additions & 16 deletions
@@ -21,28 +21,38 @@ Train multi-step agents for real-world tasks using GRPO.

</div>

-## 📏 RULER: Zero-Shot Agent Rewards
+## 🚀 W&B Training: Serverless RL

-**RULER** (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—**no labeled data, expert feedback, or reward engineering required**.
+**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment, and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.

**Key Benefits:**

-- **2-3x faster development** - Skip reward function engineering entirely
-- **General-purpose** - Works across any task without modification
-- **Strong performance** - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
-- **Easy integration** - Drop-in replacement for manual reward functions
+- **40% lower cost** - Multiplexing on a shared production-grade inference cluster
+- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs
+- **Zero infra headaches** - Fully managed infrastructure that stays healthy
+- **Instant deployment** - Every checkpoint instantly available via W&B Inference

```python
-# Before: Hours of reward engineering
-def complex_reward_function(trajectory):
-    # 50+ lines of careful scoring logic...
-    pass
-
-# After: One line with RULER
-judged_group = await ruler_score_group(group, "openai/o3")
+# Before: Hours of GPU setup and infra management
+# RuntimeError: CUDA error: out of memory 😢
+
+# After: Serverless RL with instant feedback
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+    project="voice-agent",
+    name="agent-001",
+    base_model="Qwen/Qwen2.5-14B-Instruct"
+)
+
+backend = ServerlessBackend(
+    api_key="your_wandb_api_key"
+)
+await model.register(backend)
+# Edit and iterate in minutes, not hours!
```

-[📖 Learn more about RULER ](https://art.openpipe.ai/fundamentals/ruler)
+[📖 Learn more about W&B Training ](https://docs.wandb.ai/guides/training)

## ART Overview

@@ -52,10 +62,10 @@ ART is an open-source RL framework that improves agent reliability by allowing L

| Agent Task | Example Notebook | Description | Comparative Performance |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
+| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
| **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
-| **ART•E [RULER]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 7B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
-| **2048** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 3B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
| **Tic Tac Toe** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/tic_tac_toe/display-benchmarks.ipynb) |
| **Codenames** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/codenames/win_rate_over_time.png" height="72"> [benchmarks](https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) |
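
To round out the quickstart snippet above: once `await model.register(backend)` completes, the registered model can be queried for rollouts through an OpenAI-compatible client. The sketch below is illustrative rather than part of this commit; it assumes ART's `model.openai_client()` helper returns an OpenAI-compatible async client and that `model.name` identifies the served checkpoint.

```python
# Illustrative sketch (not in this commit); assumes model.openai_client()
# returns an OpenAI-compatible async client pointed at the backend's
# inference endpoint, and that model.name selects the served checkpoint.
client = model.openai_client()

completion = await client.chat.completions.create(
    model=model.name,
    messages=[
        {"role": "system", "content": "You are a helpful voice agent."},
        {"role": "user", "content": "What's on my calendar today?"},
    ],
)
print(completion.choices[0].message.content)
```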

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@
      "group": "Features",
      "pages": [
        "features/checkpoint-forking",
+        "features/checkpoint-deletion",
        "features/additional-histories",
        "features/mcp-rl"
      ]

docs/features/checkpoint-deletion.mdx (new file)

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+---
+title: Deleting Checkpoints
+description: Learn how to automatically delete low-performing model checkpoints
+---
+
+Training jobs can run for thousands of steps, and each step generates a new model checkpoint. For most training runs, these checkpoints are LoRAs that take up 80-150MB of disk space. To reduce storage overhead and preserve only the best checkpoint from your runs, you can set up automatic deletion of all but your best-performing and most recent checkpoints.
+
+## Deleting low-performing checkpoints
+
+To delete all but the most recent and best-performing checkpoints of a model, call the `delete_checkpoints` method as shown below.
+
+```python
+import art
+# also works with LocalBackend and SkyPilotBackend
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+    name="agent-001",
+    project="checkpoint-deletion-demo",
+    base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+# in order for the model to know where to look for its existing checkpoints,
+# we have to point it to the correct backend
+await model.register(backend)
+
+# deletes all but the most recent checkpoint
+# and the checkpoint with the highest val/reward
+await model.delete_checkpoints()
+```
+
+By default, `delete_checkpoints` ranks existing checkpoints by their `val/reward` score and erases all but the highest-performing and most recent. However, `delete_checkpoints` can be configured to use any metric it is passed.
+
+```python
+await model.delete_checkpoints(best_checkpoint_metric="train/eval_1_score")
+```
+
+Keep in mind that once checkpoints are deleted, they generally cannot be recovered, so use this method with caution.
+
+
+## Deleting within a training loop
+
+Below is a simple example of a training loop that trains a model for 50 steps before exiting. By default, the LoRA checkpoint generated by each step will automatically be saved in the storage mechanism your backend uses (in this case, W&B Artifacts).
+
+```python
+
+import art
+from art.serverless.backend import ServerlessBackend
+
+from .rollout import rollout
+from .scenarios import load_train_scenarios
+
+TRAINING_STEPS = 50
+
+model = art.TrainableModel(
+    name="agent-001",
+    project="checkpoint-deletion-demo",
+    base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+await model.register(backend)
+
+
+train_scenarios = load_train_scenarios()
+
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+    train_groups = await art.gather_trajectory_groups(
+        (
+            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+            for scenario in train_scenarios
+        ),
+        pbar_desc=f"gather(train:{step})",
+    )
+    # trains model and automatically persists each LoRA as a W&B Artifact
+    # ~120MB per step
+    await model.train(
+        train_groups,
+        config=art.TrainConfig(learning_rate=5e-5),
+    )
+
+# ~6GB of storage used by checkpoints
+```
+
+However, since each LoRA checkpoint generated by this training run is ~120MB, in total this training run will require ~6GB of storage for the model checkpoints alone. To reduce our storage overhead, let's implement checkpoint deletion on each step.
+
+
+```python
+...
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+    train_groups = await art.gather_trajectory_groups(
+        (
+            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+            for scenario in train_scenarios
+        ),
+        pbar_desc=f"gather(train:{step})",
+    )
+    # trains model and automatically persists each LoRA as a W&B Artifact
+    # ~120MB per step
+    await model.train(
+        train_groups,
+        config=art.TrainConfig(learning_rate=5e-5),
+    )
+    # clear all but the most recent and best-performing checkpoint on the train/reward metric
+    await model.delete_checkpoints(best_checkpoint_metric="train/reward")
+
+# ~240MB of storage used by checkpoints
+```
+
+With this change, we've reduced the total amount of storage used by checkpoints from 6GB to 240MB, while preserving the checkpoint that performed the best on `train/reward`.
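
A possible variation on the loop documented above (illustrative, not part of this commit): if pruning after every step is more churn than you want, the same `delete_checkpoints` call can run periodically, here every 10 steps, while still bounding checkpoint storage. The sketch assumes the same `rollout` and `load_train_scenarios` helpers as the documented example.

```python
...
# training loop with periodic checkpoint pruning
for step in range(await model.get_step(), TRAINING_STEPS):
    train_groups = await art.gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
            for scenario in train_scenarios
        ),
        pbar_desc=f"gather(train:{step})",
    )
    await model.train(
        train_groups,
        config=art.TrainConfig(learning_rate=5e-5),
    )
    # prune on every 10th step instead of every step
    if step % 10 == 9:
        await model.delete_checkpoints(best_checkpoint_metric="train/reward")

# at most ~10 intermediate checkpoints (~1.2GB at ~120MB each) accumulate
# between prunes, versus ~6GB with no pruning at all
```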

docs/fundamentals/art-backend.mdx

Lines changed: 59 additions & 28 deletions
@@ -8,12 +8,14 @@ ART divides the logic for training an agent into two distinct abstractions. The

While the backend's training and inference settings are highly configurable, they're also set up to use **intelligent defaults** that save beginners time while getting started. However, there are a few important considerations to take before running your first training job.

+
<div className="cards-container">
+
  <div className="card-wrapper">
    <Card
-      title="LocalBackend"
-      icon="laptop-code"
-      href="/fundamentals/art-backend#localbackend"
+      title="ServerlessBackend"
+      icon="bolt"
+      href="/fundamentals/art-backend#serverlessbackend"
      horizontal={true}
      arrow={true}
    ></Card>
@@ -27,32 +29,44 @@ While the backend's training and inference settings are highly configurable, the
      arrow={true}
    ></Card>
  </div>
+  <div className="card-wrapper">
+    <Card
+      title="LocalBackend"
+      icon="laptop-code"
+      href="/fundamentals/art-backend#localbackend"
+      horizontal={true}
+      arrow={true}
+    ></Card>
+  </div>
</div>

-## Remote or local training
+## Managed, remote, or local training

-ART provides two backend classes, `LocalBackend` and `SkyPilotBackend`. If your agent is already running on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` instead.
+ART provides three backend classes:

-Both `LocalBackend` and `SkyPilotBackend` implement the `art.Backend` class, and once initialized, the client interacts with them in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance.
+* `ServerlessBackend` - train remotely on autoscaling GPUs
+* `SkyPilotBackend` - train remotely on self-managed infra
+* `LocalBackend` - run your agent and training code on the same machine

-### LocalBackend
+If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` or `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters. `SkyPilotBackend` lets you use your own infra.

-The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.
+All three backend types implement the `art.Backend` class, and the client interacts with all three in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance. `ServerlessBackend` runs within W&B Training clusters and autoscales GPUs to meet training and inference demand.

-To declare a `LocalBackend` instance, follow the code sample below:
+### ServerlessBackend
+
+Setting up `ServerlessBackend` requires a W&B API key. Once you have one, you can provide it to `ServerlessBackend` either as an environment variable or an initialization argument.

```python
-from art.local import LocalBackend
+from art.serverless.backend import ServerlessBackend

-backend = LocalBackend(
-    # set to True if you want your backend to shut down automatically
-    # when your client process ends
-    in_process: False,
-    # local path where the backend will store trajectory logs and model weights
-    path: './.art',
+backend = ServerlessBackend(
+    api_key="my-api-key",
+    # or set WANDB_API_KEY in the environment
)
```

+As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.
+
### SkyPilotBackend

To use SkyPilotBackend, you'll need to install the optional dependency:
@@ -106,46 +120,63 @@ backend.down()
uv run sky down my-cluster
```

+### LocalBackend
+
+The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.
+
+To declare a `LocalBackend` instance, follow the code sample below:
+
+```python
+from art.local import LocalBackend
+
+backend = LocalBackend(
+    # set to True if you want your backend to shut down automatically
+    # when your client process ends
+    in_process=False,
+    # local path where the backend will store trajectory logs and model weights
+    path='./.art',
+)
+```
+
## Using a backend

Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.

```python
-LOCAL = True
+BACKEND_TYPE = "serverless"

-if LOCAL:
-    from art.local import LocalBackend
-    backend = LocalBackend()
-else:
+if BACKEND_TYPE == "serverless":
+    from art.serverless.backend import ServerlessBackend
+    backend = await ServerlessBackend()
+elif BACKEND_TYPE == "remote":
    from art.skypilot import SkyPilotBackend
    backend = await SkyPilotBackend.initialize_cluster(
        cluster_name="my-cluster",
        gpu="H100"
    )
+else:
+    from art.local import LocalBackend
+    backend = LocalBackend()

model = art.TrainableModel(...)

await model.register(backend)

# ...training code...
-
-if not LOCAL:
-    # optionally shut down the cluster when training finishes
-    await backend.down()
```

-To see `LocalBackend` and `SkyPilotBackend` in action, try the examples below.
+To see `LocalBackend` and `ServerlessBackend` in action, try the examples below.

<div className="cards-container">
  <div className="card-wrapper">
    <Card
      title="2048 Notebook"
-      icon="laptop-code"
+      icon="bolt"
      href="https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb"
      horizontal={true}
      arrow={true}
    >
-      Use LocalBackend to train an agent to play 2048.
+      Use ServerlessBackend to train an agent to play 2048.
    </Card>
  </div>
  <div className="card-wrapper">
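
One small supplement to the `ServerlessBackend` section above (illustrative, not part of this commit): the docs note that the W&B API key can be supplied through the environment instead of the `api_key` argument. Below is a minimal sketch of that path, assuming the `WANDB_API_KEY` variable referenced in the diff's comment is honored when no argument is passed.

```python
import os

import art
from art.serverless.backend import ServerlessBackend

# Assumes WANDB_API_KEY was exported before this process started,
# e.g. `export WANDB_API_KEY=...`
assert "WANDB_API_KEY" in os.environ, "set WANDB_API_KEY to use ServerlessBackend"

backend = ServerlessBackend()  # no api_key argument; read from the environment

model = art.TrainableModel(
    project="voice-agent",
    name="agent-001",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
await model.register(backend)
```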
