diff --git a/README.md b/README.md
index ce38b7e6..103f7140 100644
--- a/README.md
+++ b/README.md
@@ -21,28 +21,38 @@ Train multi-step agents for real-world tasks using GRPO.
-## 📏 RULER: Zero-Shot Agent Rewards
+## 🚀 W&B Training: Serverless RL
-**RULER** (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—**no labeled data, expert feedback, or reward engineering required**.
+**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.
✨ **Key Benefits:**
-- **2-3x faster development** - Skip reward function engineering entirely
-- **General-purpose** - Works across any task without modification
-- **Strong performance** - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
-- **Easy integration** - Drop-in replacement for manual reward functions
+- **40% lower cost** - Multiplexing on shared production-grade inference cluster
+- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs
+- **Zero infra headaches** - Fully managed infrastructure that stays healthy
+- **Instant deployment** - Every checkpoint instantly available via W&B Inference
```python
-# Before: Hours of reward engineering
-def complex_reward_function(trajectory):
- # 50+ lines of careful scoring logic...
- pass
-
-# After: One line with RULER
-judged_group = await ruler_score_group(group, "openai/o3")
+# Before: Hours of GPU setup and infra management
+# RuntimeError: CUDA error: out of memory 😢
+
+# After: Serverless RL with instant feedback
+import art
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+ project="voice-agent",
+ name="agent-001",
+ base_model="Qwen/Qwen2.5-14B-Instruct"
+)
+
+backend = ServerlessBackend(
+ api_key="your_wandb_api_key"
+)
+await model.register(backend)
+# Edit and iterate in minutes, not hours!
```
-[📖 Learn more about RULER →](https://art.openpipe.ai/fundamentals/ruler)
+[📖 Learn more about W&B Training →](https://docs.wandb.ai/guides/training)
## ART Overview
@@ -52,10 +62,10 @@ ART is an open-source RL framework that improves agent reliability by allowing L
| Agent Task | Example Notebook | Description | Comparative Performance |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
+| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
| **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
-| **ART•E [RULER]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 7B learns to search emails using RULER | [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
-| **2048** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 3B learns to play 2048 | [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
| **Tic Tac Toe** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | [benchmarks](/examples/tic_tac_toe/display-benchmarks.ipynb) |
| **Codenames** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | [benchmarks](https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) |
diff --git a/docs/docs.json b/docs/docs.json
index a520c93b..23409152 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -64,6 +64,7 @@
"group": "Features",
"pages": [
"features/checkpoint-forking",
+ "features/checkpoint-deletion",
"features/additional-histories",
"features/mcp-rl"
]
diff --git a/docs/features/checkpoint-deletion.mdx b/docs/features/checkpoint-deletion.mdx
new file mode 100644
index 00000000..d58ddff4
--- /dev/null
+++ b/docs/features/checkpoint-deletion.mdx
@@ -0,0 +1,111 @@
+---
+title: Deleting Checkpoints
+description: Learn how to automatically delete low-performing model checkpoints
+---
+
+Training jobs can run for thousands of steps, and each step generates a new model checkpoint. For most training runs, these checkpoints are LoRAs that take up 80-150MB of disk space each. To reduce storage overhead while keeping the checkpoints that matter, you can set up automatic deletion of all but your best-performing and most recent checkpoints.
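+
+As a rough back-of-the-envelope (assuming ~120MB per LoRA checkpoint, in line with the example later in this guide), the storage cost of a long run adds up quickly:
+
+```python
+# hypothetical storage estimate: 1,000 steps at ~120MB per LoRA checkpoint
+steps = 1_000
+checkpoint_mb = 120
+print(f"~{steps * checkpoint_mb / 1024:.0f}GB of checkpoints")  # ~117GB
+```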
+
+## Deleting low-performing checkpoints
+
+To delete all but the most recent and best-performing checkpoints of a model, call the `delete_checkpoints` method as shown below.
+
+```python
+import art
+# also works with LocalBackend and SkyPilotBackend
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+ name="agent-001",
+ project="checkpoint-deletion-demo",
+ base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+# in order for the model to know where to look for its existing checkpoints,
+# we have to point it to the correct backend
+await model.register(backend)
+
+# deletes all but the most recent checkpoint
+# and the checkpoint with the highest val/reward
+await model.delete_checkpoints()
+```
+
+By default, `delete_checkpoints` ranks existing checkpoints by their `val/reward` score and erases all but the highest-performing and most recent ones. However, `delete_checkpoints` can be configured to rank checkpoints by any metric you pass it.
+
+```python
+await model.delete_checkpoints(best_checkpoint_metric="train/eval_1_score")
+```
+
+Keep in mind that once checkpoints are deleted, they generally cannot be recovered, so use this method with caution.
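+
+If you want an extra margin of safety in automated scripts, one option is to gate deletion behind a simple condition. The sketch below is only an illustration (the `get_step` helper is the same one used in the training loop later in this guide); adjust the threshold to your own run:
+
+```python
+# only prune once several checkpoints exist, so the val/reward comparison is meaningful
+if await model.get_step() >= 5:
+    await model.delete_checkpoints()
+```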
+
+
+## Deleting within a training loop
+
+Below is a simple example of a training loop that trains a model for 50 steps before exiting. By default, the LoRA checkpoint generated by each step will automatically be saved in the storage mechanism your backend uses (in this case W&B Artifacts).
+
+```python
+import art
+from art.serverless.backend import ServerlessBackend
+
+from .rollout import rollout
+from .scenarios import load_train_scenarios
+
+TRAINING_STEPS = 50
+
+model = art.TrainableModel(
+ name="agent-001",
+ project="checkpoint-deletion-demo",
+ base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+await model.register(backend)
+
+
+train_scenarios = load_train_scenarios()
+
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+ train_groups = await art.gather_trajectory_groups(
+ (
+ art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+ for scenario in train_scenarios
+ ),
+ pbar_desc=f"gather(train:{step})",
+ )
+ # trains model and automatically persists each LoRA as a W&B Artifact
+ # ~120MB per step
+ await model.train(
+ train_groups,
+ config=art.TrainConfig(learning_rate=5e-5),
+ )
+
+# ~6GB of storage used by checkpoints
+```
+
+Since each LoRA checkpoint generated by this training run is ~120MB, the full 50-step run requires roughly 6GB of storage (50 × ~120MB) for model checkpoints alone. To reduce our storage overhead, let's call `delete_checkpoints` on each step.
+
+
+```python
+...
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+ train_groups = await art.gather_trajectory_groups(
+ (
+ art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+ for scenario in train_scenarios
+ ),
+ pbar_desc=f"gather(train:{step})",
+ )
+ # trains model and automatically persists each LoRA as a W&B Artifact
+ # ~120MB per step
+ await model.train(
+ train_groups,
+ config=art.TrainConfig(learning_rate=5e-5),
+ )
+ # clear all but the most recent and best-performing checkpoint on the train/reward metric
+ await model.delete_checkpoints(best_checkpoint_metric="train/reward")
+
+# ~240MB of storage used by checkpoints
+```
+
+With this change, we've reduced the total storage used by checkpoints from ~6GB to ~240MB (just the two retained ~120MB LoRAs), while preserving the checkpoint that performed best on `train/reward`.
diff --git a/docs/fundamentals/art-backend.mdx b/docs/fundamentals/art-backend.mdx
index 178b6d39..31f2ce51 100644
--- a/docs/fundamentals/art-backend.mdx
+++ b/docs/fundamentals/art-backend.mdx
@@ -8,12 +8,14 @@ ART divides the logic for training an agent into two distinct abstractions. The
While the backend's training and inference settings are highly configurable, they're also set up to use **intelligent defaults** that save beginners time while getting started. However, there are a few important considerations to take before running your first training job.
+
|
diff --git a/docs/getting-started/quick-start.mdx b/docs/getting-started/quick-start.mdx
index f2ab23ee..9d637082 100644
--- a/docs/getting-started/quick-start.mdx
+++ b/docs/getting-started/quick-start.mdx
@@ -4,7 +4,7 @@ description: "Get started with ART in a few quick steps."
icon: "forward"
---
-In this Quick Start tutorial, we'll be training Qwen 2.5 3B to play [2048](https://play2048.co/), a simple game that requires forward planning and basic math skills.
+In this Quick Start tutorial, we'll be training Qwen 2.5 14B to play [2048](https://play2048.co/), a simple game that requires forward planning and basic math skills.