
Commit e5a322e

WIP: More documentation (#432)
* Document checkpoint deletion
* Document ServerlessBackend
* Update quickstart
* Update notebook order
* Update notebooks in README
* Update Installation + Setup
* Update initial blurb
* Simplify quickstart
* Update import
* Update example
* Update 2048 notebook link
* Await
1 parent 10437df commit e5a322e

File tree

9 files changed: +276 -91 lines changed


README.md

Lines changed: 26 additions & 16 deletions
@@ -21,28 +21,38 @@ Train multi-step agents for real-world tasks using GRPO.

</div>

-## 📏 RULER: Zero-Shot Agent Rewards
+## 🚀 W&B Training: Serverless RL

-**RULER** (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—**no labeled data, expert feedback, or reward engineering required**.
+**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment, and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.

**Key Benefits:**

-- **2-3x faster development** - Skip reward function engineering entirely
-- **General-purpose** - Works across any task without modification
-- **Strong performance** - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
-- **Easy integration** - Drop-in replacement for manual reward functions
+- **40% lower cost** - Multiplexing on a shared production-grade inference cluster
+- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs
+- **Zero infra headaches** - Fully managed infrastructure that stays healthy
+- **Instant deployment** - Every checkpoint instantly available via W&B Inference

```python
-# Before: Hours of reward engineering
-def complex_reward_function(trajectory):
-    # 50+ lines of careful scoring logic...
-    pass
-
-# After: One line with RULER
-judged_group = await ruler_score_group(group, "openai/o3")
+# Before: Hours of GPU setup and infra management
+# RuntimeError: CUDA error: out of memory 😢
+
+# After: Serverless RL with instant feedback
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+    project="voice-agent",
+    name="agent-001",
+    base_model="Qwen/Qwen2.5-14B-Instruct"
+)
+
+backend = ServerlessBackend(
+    api_key="your_wandb_api_key"
+)
+await model.register(backend)
+# Edit and iterate in minutes, not hours!
```

-[📖 Learn more about RULER ](https://art.openpipe.ai/fundamentals/ruler)
+[📖 Learn more about W&B Training ](https://docs.wandb.ai/guides/training)

## ART Overview

@@ -52,10 +62,10 @@ ART is an open-source RL framework that improves agent reliability by allowing L

| Agent Task | Example Notebook | Description | Comparative Performance |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
+| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
| **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
-| **ART•E [RULER]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 7B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
-| **2048** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 3B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
| **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
| **Tic Tac Toe** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/tic_tac_toe/display-benchmarks.ipynb) |
| **Codenames** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/codenames/win_rate_over_time.png" height="72"> [benchmarks](https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) |
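
To round out the quickstart snippet above: once `await model.register(backend)` completes, the registered model can be queried for rollouts through an OpenAI-compatible client. The sketch below is illustrative rather than part of this commit; it assumes ART's `model.openai_client()` helper returns an OpenAI-compatible async client and that `model.name` identifies the served checkpoint.

```python
# Illustrative sketch (not in this commit); assumes model.openai_client()
# returns an OpenAI-compatible async client pointed at the backend's
# inference endpoint, and that model.name selects the served checkpoint.
client = model.openai_client()

completion = await client.chat.completions.create(
    model=model.name,
    messages=[
        {"role": "system", "content": "You are a helpful voice agent."},
        {"role": "user", "content": "What's on my calendar today?"},
    ],
)
print(completion.choices[0].message.content)
```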

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@
      "group": "Features",
      "pages": [
        "features/checkpoint-forking",
+        "features/checkpoint-deletion",
        "features/additional-histories",
        "features/mcp-rl"
      ]

docs/features/checkpoint-deletion.mdx (new file)

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+---
+title: Deleting Checkpoints
+description: Learn how to automatically delete low-performing model checkpoints
+---
+
+Training jobs can run for thousands of steps, and each step generates a new model checkpoint. For most training runs, these checkpoints are LoRAs that take up 80-150MB of disk space. To reduce storage overhead and preserve only the best checkpoint from your runs, you can set up automatic deletion of all but your best-performing and most recent checkpoints.
+
+## Deleting low-performing checkpoints
+
+To delete all but the most recent and best-performing checkpoints of a model, call the `delete_checkpoints` method as shown below.
+
+```python
+import art
+# also works with LocalBackend and SkyPilotBackend
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+    name="agent-001",
+    project="checkpoint-deletion-demo",
+    base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+# in order for the model to know where to look for its existing checkpoints,
+# we have to point it to the correct backend
+await model.register(backend)
+
+# deletes all but the most recent checkpoint
+# and the checkpoint with the highest val/reward
+await model.delete_checkpoints()
+```
+
+By default, `delete_checkpoints` ranks existing checkpoints by their `val/reward` score and erases all but the highest-performing and most recent. However, `delete_checkpoints` can be configured to use any metric it is passed.
+
+```python
+await model.delete_checkpoints(best_checkpoint_metric="train/eval_1_score")
+```
+
+Keep in mind that once checkpoints are deleted, they generally cannot be recovered, so use this method with caution.
+
+
+## Deleting within a training loop
+
+Below is a simple example of a training loop that trains a model for 50 steps before exiting. By default, the LoRA checkpoint generated by each step will automatically be saved in the storage mechanism your backend uses (in this case, W&B Artifacts).
+
+```python
+
+import art
+from art.serverless.backend import ServerlessBackend
+
+from .rollout import rollout
+from .scenarios import load_train_scenarios
+
+TRAINING_STEPS = 50
+
+model = art.TrainableModel(
+    name="agent-001",
+    project="checkpoint-deletion-demo",
+    base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+await model.register(backend)
+
+
+train_scenarios = load_train_scenarios()
+
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+    train_groups = await art.gather_trajectory_groups(
+        (
+            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+            for scenario in train_scenarios
+        ),
+        pbar_desc=f"gather(train:{step})",
+    )
+    # trains model and automatically persists each LoRA as a W&B Artifact
+    # ~120MB per step
+    await model.train(
+        train_groups,
+        config=art.TrainConfig(learning_rate=5e-5),
+    )
+
+# ~6GB of storage used by checkpoints
+```
+
+However, since each LoRA checkpoint generated by this training run is ~120MB, in total this training run will require ~6GB of storage for the model checkpoints alone. To reduce our storage overhead, let's implement checkpoint deletion on each step.
+
+
+```python
+...
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+    train_groups = await art.gather_trajectory_groups(
+        (
+            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+            for scenario in train_scenarios
+        ),
+        pbar_desc=f"gather(train:{step})",
+    )
+    # trains model and automatically persists each LoRA as a W&B Artifact
+    # ~120MB per step
+    await model.train(
+        train_groups,
+        config=art.TrainConfig(learning_rate=5e-5),
+    )
+    # clear all but the most recent and best-performing checkpoint on the train/reward metric
+    await model.delete_checkpoints(best_checkpoint_metric="train/reward")
+
+# ~240MB of storage used by checkpoints
+```
+
+With this change, we've reduced the total amount of storage used by checkpoints from 6GB to 240MB, while preserving the checkpoint that performed the best on `train/reward`.
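
A possible variation on the loop documented above (illustrative, not part of this commit): if pruning after every step is more churn than you want, the same `delete_checkpoints` call can run periodically, here every 10 steps, while still bounding checkpoint storage. The sketch assumes the same `rollout` and `load_train_scenarios` helpers as the documented example.

```python
...
# training loop with periodic checkpoint pruning
for step in range(await model.get_step(), TRAINING_STEPS):
    train_groups = await art.gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
            for scenario in train_scenarios
        ),
        pbar_desc=f"gather(train:{step})",
    )
    await model.train(
        train_groups,
        config=art.TrainConfig(learning_rate=5e-5),
    )
    # prune on every 10th step instead of every step
    if step % 10 == 9:
        await model.delete_checkpoints(best_checkpoint_metric="train/reward")

# at most ~10 intermediate checkpoints (~1.2GB at ~120MB each) accumulate
# between prunes, versus ~6GB with no pruning at all
```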

docs/fundamentals/art-backend.mdx

Lines changed: 59 additions & 28 deletions
@@ -8,12 +8,14 @@ ART divides the logic for training an agent into two distinct abstractions. The

While the backend's training and inference settings are highly configurable, they're also set up to use **intelligent defaults** that save beginners time while getting started. However, there are a few important considerations to take before running your first training job.

+
<div className="cards-container">
+
  <div className="card-wrapper">
    <Card
-      title="LocalBackend"
-      icon="laptop-code"
-      href="/fundamentals/art-backend#localbackend"
+      title="ServerlessBackend"
+      icon="bolt"
+      href="/fundamentals/art-backend#serverlessbackend"
      horizontal={true}
      arrow={true}
    ></Card>
@@ -27,32 +29,44 @@ While the backend's training and inference settings are highly configurable, the
      arrow={true}
    ></Card>
  </div>
+  <div className="card-wrapper">
+    <Card
+      title="LocalBackend"
+      icon="laptop-code"
+      href="/fundamentals/art-backend#localbackend"
+      horizontal={true}
+      arrow={true}
+    ></Card>
+  </div>
</div>

-## Remote or local training
+## Managed, remote, or local training

-ART provides two backend classes, `LocalBackend` and `SkyPilotBackend`. If your agent is already running on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` instead.
+ART provides three backend classes:

-Both `LocalBackend` and `SkyPilotBackend` implement the `art.Backend` class, and once initialized, the client interacts with them in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance.
+* `ServerlessBackend` - train remotely on autoscaling GPUs
+* `SkyPilotBackend` - train remotely on self-managed infra
+* `LocalBackend` - run your agent and training code on the same machine

-### LocalBackend
+If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` or `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters. `SkyPilotBackend` lets you use your own infra.

-The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.
+All three backend types implement the `art.Backend` class, and the client interacts with all three in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance. `ServerlessBackend` runs within W&B Training clusters and autoscales GPUs to meet training and inference demand.

-To declare a `LocalBackend` instance, follow the code sample below:
+### ServerlessBackend
+
+Setting up `ServerlessBackend` requires a W&B API key. Once you have one, you can provide it to `ServerlessBackend` either as an environment variable or an initialization argument.

```python
-from art.local import LocalBackend
+from art.serverless.backend import ServerlessBackend

-backend = LocalBackend(
-    # set to True if you want your backend to shut down automatically
-    # when your client process ends
-    in_process: False,
-    # local path where the backend will store trajectory logs and model weights
-    path: './.art',
+backend = ServerlessBackend(
+    api_key="my-api-key",
+    # or set WANDB_API_KEY in the environment
)
```

+As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.
+
### SkyPilotBackend

To use SkyPilotBackend, you'll need to install the optional dependency:
@@ -106,46 +120,63 @@ backend.down()
uv run sky down my-cluster
```

+### LocalBackend
+
+The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.
+
+To declare a `LocalBackend` instance, follow the code sample below:
+
+```python
+from art.local import LocalBackend
+
+backend = LocalBackend(
+    # set to True if you want your backend to shut down automatically
+    # when your client process ends
+    in_process=False,
+    # local path where the backend will store trajectory logs and model weights
+    path='./.art',
+)
+```
+
## Using a backend

Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.

```python
-LOCAL = True
+BACKEND_TYPE = "serverless"

-if LOCAL:
-    from art.local import LocalBackend
-    backend = LocalBackend()
-else:
+if BACKEND_TYPE == "serverless":
+    from art.serverless.backend import ServerlessBackend
+    backend = await ServerlessBackend()
+elif BACKEND_TYPE == "remote":
    from art.skypilot import SkyPilotBackend
    backend = await SkyPilotBackend.initialize_cluster(
        cluster_name="my-cluster",
        gpu="H100"
    )
+else:
+    from art.local import LocalBackend
+    backend = LocalBackend()

model = art.TrainableModel(...)

await model.register(backend)

# ...training code...
-
-if not LOCAL:
-    # optionally shut down the cluster when training finishes
-    await backend.down()
```

-To see `LocalBackend` and `SkyPilotBackend` in action, try the examples below.
+To see `LocalBackend` and `ServerlessBackend` in action, try the examples below.

<div className="cards-container">
  <div className="card-wrapper">
    <Card
      title="2048 Notebook"
-      icon="laptop-code"
+      icon="bolt"
      href="https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb"
      horizontal={true}
      arrow={true}
    >
-      Use LocalBackend to train an agent to play 2048.
+      Use ServerlessBackend to train an agent to play 2048.
    </Card>
  </div>
  <div className="card-wrapper">
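
One small supplement to the `ServerlessBackend` section above (illustrative, not part of this commit): the docs note that the W&B API key can be supplied through the environment instead of the `api_key` argument. Below is a minimal sketch of that path, assuming the `WANDB_API_KEY` variable referenced in the diff's comment is honored when no argument is passed.

```python
import os

import art
from art.serverless.backend import ServerlessBackend

# Assumes WANDB_API_KEY was exported before this process started,
# e.g. `export WANDB_API_KEY=...`
assert "WANDB_API_KEY" in os.environ, "set WANDB_API_KEY to use ServerlessBackend"

backend = ServerlessBackend()  # no api_key argument; read from the environment

model = art.TrainableModel(
    project="voice-agent",
    name="agent-001",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
await model.register(backend)
```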
