README.md

Train multi-step agents for real-world tasks using GRPO.

</div>
## 🚀 W&B Training: Serverless RL
**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.
✨ **Key Benefits:**
- **40% lower cost** - Multiplexing on a shared production-grade inference cluster
- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs
- **Zero infra headaches** - Fully managed infrastructure that stays healthy
- **Instant deployment** - Every checkpoint instantly available via W&B Inference
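
Getting started takes just a few lines of code. Here's a minimal sketch (the model and project names are placeholders, and your rollout and reward logic live elsewhere):

```python
import art
from art.serverless.backend import ServerlessBackend

# define a trainable model and point it at the managed backend
model = art.TrainableModel(
    name="agent-001",        # placeholder name
    project="my-project",    # placeholder project
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
backend = ServerlessBackend()  # reads WANDB_API_KEY from the environment
await model.register(backend)
```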
---
description: Learn how to automatically delete low-performing model checkpoints
---
Training jobs can run for thousands of steps, and each step generates a new model checkpoint. For most training runs, these checkpoints are LoRAs that take up 80-150MB of disk space each. To reduce storage overhead and preserve only the best checkpoints from your runs, you can set up automatic deletion of all but your best-performing and most recent checkpoints.
## Deleting low-performing checkpoints
To delete all but the most recent and best-performing checkpoints of a model, call the `delete_checkpoints` method as shown below.
```python
import art
# also works with LocalBackend and SkyPilotBackend
from art.serverless.backend import ServerlessBackend

model = art.TrainableModel(
    name="agent-001",
    project="checkpoint-deletion-demo",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
backend = ServerlessBackend()
# in order for the model to know where to look for its existing checkpoints,
# we have to point it to the correct backend
await model.register(backend)

# deletes all but the most recent checkpoint
# and the checkpoint with the highest val/reward
await model.delete_checkpoints()
```
By default, `delete_checkpoints` ranks existing checkpoints by their `val/reward` score and erases all but the highest-performing and most recent. However, `delete_checkpoints` can be configured to rank checkpoints by any metric you pass it.
Keep in mind that once checkpoints are deleted, they generally cannot be recovered, so use this method with caution.
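
For example, to rank checkpoints by training reward rather than the default validation reward (the `best_checkpoint_metric` parameter name below is an assumption; check the `delete_checkpoints` signature in your version of ART):

```python
# rank checkpoints by train/reward instead of the default val/reward
# (parameter name assumed for illustration)
await model.delete_checkpoints(best_checkpoint_metric="train/reward")
```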
## Deleting within a training loop
Below is a simple example of a training loop that trains a model for 50 steps before exiting. By default, the LoRA checkpoint generated by each step will automatically be saved in the storage mechanism your backend uses (in this case W&B Artifacts).
```python
import art
from art.serverless.backend import ServerlessBackend

from .rollout import rollout
from .scenarios import load_train_scenarios

TRAINING_STEPS = 50

model = art.TrainableModel(
    name="agent-001",
    project="checkpoint-deletion-demo",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
backend = ServerlessBackend()
await model.register(backend)

train_scenarios = load_train_scenarios()

# training loop
for step in range(await model.get_step(), TRAINING_STEPS):
    train_groups = await art.gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
            for scenario in train_scenarios
        ),
        pbar_desc=f"gather(train:{step})",
    )
    # trains model and automatically persists each LoRA as a W&B Artifact
    # ~120MB per step
    await model.train(
        train_groups,
        config=art.TrainConfig(learning_rate=5e-5),
    )

# ~6GB of storage used by checkpoints
```
However, since each LoRA checkpoint generated by this training run is ~120MB, the run will require ~6GB of storage for the model checkpoints alone. To reduce our storage overhead, let's implement checkpoint deletion on each step.
```python
...

# training loop
for step in range(await model.get_step(), TRAINING_STEPS):
    train_groups = await art.gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
            for scenario in train_scenarios
        ),
        pbar_desc=f"gather(train:{step})",
    )
    # trains model and automatically persists each LoRA as a W&B Artifact
    # ~120MB per step
    await model.train(
        train_groups,
        config=art.TrainConfig(learning_rate=5e-5),
    )
    # clear all but the most recent checkpoint and the best-performing
    # checkpoint on the train/reward metric (parameter name assumed)
    await model.delete_checkpoints(best_checkpoint_metric="train/reward")
```
With this change, we've reduced the total amount of storage used by checkpoints from 6GB to 240MB (two retained checkpoints at ~120MB each), while preserving the checkpoint that performed the best on `train/reward`.
docs/fundamentals/art-backend.mdx

ART divides the logic for training an agent into two distinct abstractions.
While the backend's training and inference settings are highly configurable, they also use **intelligent defaults** that save beginners time when getting started. However, there are a few important considerations to keep in mind before running your first training job.
    arrow={true}
  ></Card>
</div>
<div className="card-wrapper">
  <Card
    title="LocalBackend"
    icon="laptop-code"
    href="/fundamentals/art-backend#localbackend"
    horizontal={true}
    arrow={true}
  ></Card>
</div>
</div>
## Managed, remote, or local training
ART provides three backend classes:
* `ServerlessBackend` - train remotely on autoscaling GPUs
* `SkyPilotBackend` - train remotely on self-managed infra
* `LocalBackend` - run your agent and training code on the same machine
If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` or `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters. `SkyPilotBackend` lets you use your own infra.
All three backend types implement the `art.Backend` class and the client interacts with all three in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance. `ServerlessBackend` runs within W&B Training clusters and autoscales GPUs to meet training and inference demand.
### ServerlessBackend
Setting up `ServerlessBackend` requires a W&B API key. Once you have one, you can provide it to `ServerlessBackend` either as an environment variable or an initialization argument.
```python
from art.serverless.backend import ServerlessBackend

backend = ServerlessBackend(
    api_key="my-api-key",
    # or set WANDB_API_KEY in the environment
)
```
As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.
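
Each deployed checkpoint can then be queried like any other hosted model. Below is a minimal sketch that assumes W&B Inference's OpenAI-compatible endpoint; the model id is hypothetical, so look up the exact id for your checkpoint in your W&B project:

```python
from openai import OpenAI

client = OpenAI(
    # W&B Inference exposes an OpenAI-compatible API
    base_url="https://api.inference.wandb.ai/v1",
    api_key="my-wandb-api-key",
)
response = client.chat.completions.create(
    # hypothetical model id; find the real one in your W&B project
    model="my-entity/my-project/agent-001",
    messages=[{"role": "user", "content": "Hello, agent!"}],
)
print(response.choices[0].message.content)
```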
### SkyPilotBackend
To use SkyPilotBackend, you'll need to install the optional dependency:
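
As a sketch, assuming ART is installed from the `openpipe-art` package and the extra is named `skypilot` (check the installation docs for the exact command):

```bash
# assumed package and extra names
pip install "openpipe-art[skypilot]"
```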
When you're finished with the cluster, you can shut it down from Python:

```python
backend.down()
```

or from the command line:

```bash
uv run sky down my-cluster
```
### LocalBackend
The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent is executing on. This is a good fit if you're already running your agent on a machine with a GPU.
To declare a `LocalBackend` instance, follow the code sample below:
```python
from art.local import LocalBackend

backend = LocalBackend(
    # set to True if you want your backend to shut down automatically
    # when your client process ends
    in_process=False,
    # local path where the backend will store trajectory logs and model weights
    path='./.art',
)
```
## Using a backend
Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.
```python
BACKEND_TYPE = "serverless"

if BACKEND_TYPE == "serverless":
    from art.serverless.backend import ServerlessBackend

    backend = ServerlessBackend()
```