diff --git a/README.md b/README.md
index ce38b7e6..103f7140 100644
--- a/README.md
+++ b/README.md
@@ -21,28 +21,38 @@ Train multi-step agents for real-world tasks using GRPO.
-## 📏 RULER: Zero-Shot Agent Rewards
+## 🚀 W&B Training: Serverless RL
-**RULER** (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—**no labeled data, expert feedback, or reward engineering required**.
+**W&B Training (Serverless RL)** is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment, and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.
✨ **Key Benefits:**
-- **2-3x faster development** - Skip reward function engineering entirely
-- **General-purpose** - Works across any task without modification
-- **Strong performance** - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
-- **Easy integration** - Drop-in replacement for manual reward functions
+- **40% lower cost** - Multiplexing on a shared production-grade inference cluster
+- **28% faster training** - Scale to 2000+ concurrent requests across many GPUs
+- **Zero infra headaches** - Fully managed infrastructure that stays healthy
+- **Instant deployment** - Every checkpoint instantly available via W&B Inference
```python
-# Before: Hours of reward engineering
-def complex_reward_function(trajectory):
-    # 50+ lines of careful scoring logic...
-    pass
-
-# After: One line with RULER
-judged_group = await ruler_score_group(group, "openai/o3")
+# Before: Hours of GPU setup and infra management
+# RuntimeError: CUDA error: out of memory 😢
+
+# After: Serverless RL with instant feedback
+import art
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+    project="voice-agent",
+    name="agent-001",
+    base_model="Qwen/Qwen2.5-14B-Instruct"
+)
+
+backend = ServerlessBackend(
+    api_key="your_wandb_api_key"
+)
+await model.register(backend)
+# Edit and iterate in minutes, not hours!
``` -[📖 Learn more about RULER →](https://art.openpipe.ai/fundamentals/ruler) +[📖 Learn more about W&B Training →](https://docs.wandb.ai/guides/training) ## ART Overview @@ -52,10 +62,10 @@ ART is an open-source RL framework that improves agent reliability by allowing L | Agent Task | Example Notebook | Description | Comparative Performance | | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) | +| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | [benchmarks](/examples/2048/display_benchmarks.ipynb) | | **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] | | **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] | -| **ART•E [RULER]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 7B learns to search emails using RULER | [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) | -| **2048** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 3B learns to play 2048 | [benchmarks](/examples/2048/display_benchmarks.ipynb) | | **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] | | **Tic Tac Toe** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | [benchmarks](/examples/tic_tac_toe/display-benchmarks.ipynb) | | **Codenames** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | [benchmarks](https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | diff --git a/docs/docs.json b/docs/docs.json index a520c93b..23409152 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -64,6 +64,7 @@ "group": "Features", "pages": [ "features/checkpoint-forking", + "features/checkpoint-deletion", "features/additional-histories", "features/mcp-rl" ] diff --git a/docs/features/checkpoint-deletion.mdx b/docs/features/checkpoint-deletion.mdx new file mode 100644 index 00000000..d58ddff4 --- /dev/null +++ b/docs/features/checkpoint-deletion.mdx @@ -0,0 +1,111 @@ +--- +title: Deleting Checkpoints +description: Learn how to automatically delete 
low-performing model checkpoints
+---
+
+Training jobs can run for thousands of steps, and each step generates a new model checkpoint. For most training runs, these checkpoints are LoRAs that take up 80-150MB of disk space each. To reduce storage overhead and preserve only the checkpoints you need, you can set up automatic deletion of all but your best-performing and most recent checkpoints.
+
+## Deleting low-performing checkpoints
+
+To delete all but the most recent and best-performing checkpoints of a model, call the `delete_checkpoints` method as shown below.
+
+```python
+import art
+# also works with LocalBackend and SkyPilotBackend
+from art.serverless.backend import ServerlessBackend
+
+model = art.TrainableModel(
+    name="agent-001",
+    project="checkpoint-deletion-demo",
+    base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+# in order for the model to know where to look for its existing checkpoints,
+# we have to point it to the correct backend
+await model.register(backend)
+
+# deletes all but the most recent checkpoint
+# and the checkpoint with the highest val/reward
+await model.delete_checkpoints()
+```
+
+By default, `delete_checkpoints` ranks existing checkpoints by their `val/reward` score and erases all but the highest-performing and most recent. However, `delete_checkpoints` can be configured to rank by any metric you pass it.
+
+```python
+await model.delete_checkpoints(best_checkpoint_metric="train/eval_1_score")
+```
+
+Keep in mind that once checkpoints are deleted, they generally cannot be recovered, so use this method with caution.
+
+## Deleting within a training loop
+
+Below is a simple example of a training loop that trains a model for 50 steps before exiting. By default, the LoRA checkpoint generated by each step will automatically be saved in the storage mechanism your backend uses (in this case W&B Artifacts).
+
+```python
+import art
+from art.serverless.backend import ServerlessBackend
+
+from .rollout import rollout
+from .scenarios import load_train_scenarios
+
+TRAINING_STEPS = 50
+
+model = art.TrainableModel(
+    name="agent-001",
+    project="checkpoint-deletion-demo",
+    base_model="Qwen/Qwen2.5-14B-Instruct",
+)
+backend = ServerlessBackend()
+await model.register(backend)
+
+train_scenarios = load_train_scenarios()
+
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+    train_groups = await art.gather_trajectory_groups(
+        (
+            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+            for scenario in train_scenarios
+        ),
+        pbar_desc=f"gather(train:{step})",
+    )
+    # trains model and automatically persists each LoRA as a W&B Artifact
+    # ~120MB per step
+    await model.train(
+        train_groups,
+        config=art.TrainConfig(learning_rate=5e-5),
+    )
+
+# ~6GB of storage used by checkpoints
+```
+
+However, since each LoRA checkpoint generated by this training run is ~120MB, in total the run will require ~6GB of storage for the model checkpoints alone. To reduce our storage overhead, let's implement checkpoint deletion on each step.
+
+```python
+...
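+# ("..." above stands for the same setup as in the previous code block:
+#  imports, TrainableModel and ServerlessBackend creation, model.register(),
+#  and load_train_scenarios())
+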
+# training loop
+for step in range(await model.get_step(), TRAINING_STEPS):
+    train_groups = await art.gather_trajectory_groups(
+        (
+            art.TrajectoryGroup(rollout(model, scenario, step) for _ in range(8))
+            for scenario in train_scenarios
+        ),
+        pbar_desc=f"gather(train:{step})",
+    )
+    # trains model and automatically persists each LoRA as a W&B Artifact
+    # ~120MB per step
+    await model.train(
+        train_groups,
+        config=art.TrainConfig(learning_rate=5e-5),
+    )
+    # clear all but the most recent and best-performing checkpoint on the train/reward metric
+    await model.delete_checkpoints(best_checkpoint_metric="train/reward")
+
+# ~240MB of storage used by checkpoints
+```
+
+With this change, we've reduced the total amount of storage used by checkpoints from 6GB to 240MB, while preserving the checkpoint that performed the best on `train/reward`.
diff --git a/docs/fundamentals/art-backend.mdx b/docs/fundamentals/art-backend.mdx
index 178b6d39..31f2ce51 100644
--- a/docs/fundamentals/art-backend.mdx
+++ b/docs/fundamentals/art-backend.mdx
@@ -8,12 +8,14 @@ ART divides the logic for training an agent into two distinct abstractions. The
While the backend's training and inference settings are highly configurable, they're also set up to use **intelligent defaults** that save beginners time while getting started. However, there are a few important considerations to take before running your first training job.
+
+
@@ -27,32 +29,44 @@ While the backend's training and inference settings are highly configurable, the arrow={true} >
+
+ +
-## Remote or local training +## Managed, remote, or local training -ART provides two backend classes, `LocalBackend` and `SkyPilotBackend`. If your agent is already running on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` instead. +ART provides three backend classes: -Both `LocalBackend` and `SkyPilotBackend` implement the `art.Backend` class, and once initialized, the client interacts with them in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance. +* `ServerlessBackend` - train remotely on autoscaling GPUs +* `SkyPilotBackend` - train remotely on self-managed infra +* `LocalBackend` - run your agent and training code on the same machine -### LocalBackend +If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` or `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters. `SkyPilotBackend` lets you use your own infra. -The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU. +All three backend types implement the `art.Backend` class and the client interacts with all three in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance. `ServerlessBackend` runs within W&B Training clusters and autoscales GPUs to meet training and inference demand. -To declare a `LocalBackend` instance, follow the code sample below: +### ServerlessBackend + +Setting up `ServerlessBackend` requires a W&B API key. Once you have one, you can provide it to `ServerlessBackend` either as an environment variable or initialization argument. ```python -from art.local import LocalBackend +from art.serverless.backend import ServerlessBackend -backend = LocalBackend( - # set to True if you want your backend to shut down automatically - # when your client process ends - in_process: False, - # local path where the backend will store trajectory logs and model weights - path: './.art', +backend = ServerlessBackend( + api_key="my-api-key", + # or set WANDB_API_KEY in the environment ) ``` +As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference. + ### SkyPilotBackend To use SkyPilotBackend, you'll need to install the optional dependency: @@ -106,46 +120,63 @@ backend.down() uv run sky down my-cluster ``` +### LocalBackend + +The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU. 
+
+To declare a `LocalBackend` instance, follow the code sample below:
+
+```python
+from art.local import LocalBackend
+
+backend = LocalBackend(
+    # set to True if you want your backend to shut down automatically
+    # when your client process ends
+    in_process=False,
+    # local path where the backend will store trajectory logs and model weights
+    path='./.art',
+)
+```
+
## Using a backend

Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.

```python
-LOCAL = True
+BACKEND_TYPE = "serverless"

-if LOCAL:
-    from art.local import LocalBackend
-    backend = LocalBackend()
-else:
+if BACKEND_TYPE == "serverless":
+    from art.serverless.backend import ServerlessBackend
+    backend = ServerlessBackend()
+elif BACKEND_TYPE == "remote":
    from art.skypilot import SkyPilotBackend
    backend = await SkyPilotBackend.initialize_cluster(
        cluster_name="my-cluster",
        gpu="H100"
    )
+else:
+    from art.local import LocalBackend
+    backend = LocalBackend()

model = art.TrainableModel(...)
await model.register(backend)

# ...training code...
-
-if not LOCAL:
-    # optionally shut down the cluster when training finishes
-    await backend.down()
```

-To see `LocalBackend` and `SkyPilotBackend` in action, try the examples below.
+To see `LocalBackend` and `ServerlessBackend` in action, try the examples below.
- Use LocalBackend to train an agent to play 2048. + Use ServerlessBackend to train an agent to play 2048.
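+As noted in the ServerlessBackend section above, the backend can also pick up your W&B API key from the environment instead of an `api_key` argument. A minimal sketch of that pattern (assuming `WANDB_API_KEY` is already exported in your shell or notebook secrets):
+
+```python
+import os
+
+from art.serverless.backend import ServerlessBackend
+
+# fail fast if the key is missing, rather than erroring once training starts
+assert "WANDB_API_KEY" in os.environ, "export WANDB_API_KEY before creating the backend"
+
+backend = ServerlessBackend()  # reads WANDB_API_KEY from the environment
+```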
diff --git a/docs/fundamentals/art-client.mdx b/docs/fundamentals/art-client.mdx index 807177f1..431f78d9 100644 --- a/docs/fundamentals/art-client.mdx +++ b/docs/fundamentals/art-client.mdx @@ -8,30 +8,35 @@ One of ART's primary goals is to minimize the amount of setup necessary to begin If you're curious about how ART allows you to run training and inference either remotely or locally depending on your development machine, check out the backend docs below. Otherwise, let's dive deeper into the client! -
-
- - Run training and inference on your local machine. - -
-
- - Run training and inference on a separate ephemeral machine. - -
-
+ + + Run training and inference on autoscaling GPUs. + + + Run training and inference on a separate ephemeral machine. + + + Run training and inference on your local machine. + + ## Initializing the client @@ -55,14 +60,17 @@ model = art.TrainableModel( Once you've initialized your [backend](/fundamentals/art-backend), you can register it with your model. This sets up all the wiring to run inference and training. ```python -# local training -backend = art.LocalBackend() +# managed training +backend = ServerlessBackend() # remote training backend = SkyPilotBackend.initialize_cluster( cluster_name="art", gpu="H100" ) +# local training +backend = LocalBackend() + await model.register(backend) ``` diff --git a/docs/getting-started/about.mdx b/docs/getting-started/about.mdx index 31e648ba..d08a6727 100644 --- a/docs/getting-started/about.mdx +++ b/docs/getting-started/about.mdx @@ -55,6 +55,7 @@ Our docs will guide you through the process of training your own agents to opera - **Train from anywhere.** Run the ART client on your laptop and let the ART server kick off an ephemeral GPU-enabled environment, or run on a local GPU. - Integrations with hosted platforms like W&B, Langfuse, and OpenPipe provide flexible observability and **simplify debugging**. - ART is customizable with **intelligent defaults**. You can configure training parameters and inference engine configurations to meet specific needs, or take advantage of the defaults, which have been optimized for training efficiency and stability. +- Direct integration with autoscaling GPUs through [W&B Training](https://docs.wandb.ai/guides/training/), making training and inference **faster** and **cheaper**. ## Installation diff --git a/docs/getting-started/installation-setup.mdx b/docs/getting-started/installation-setup.mdx index 27cfbc9b..43aa3475 100644 --- a/docs/getting-started/installation-setup.mdx +++ b/docs/getting-started/installation-setup.mdx @@ -12,7 +12,7 @@ The ART client can be installed into projects designed to run on any machine tha pip install openpipe-art ``` -### Running the client and server locally +### Running the server locally The ART server can be run locally on any machine with a GPU. To install the backend dependencies required for training and inference, you can install the `backend` extra: @@ -39,7 +39,33 @@ await model.register(backend) ... the rest of your code ... ``` -### Running the client locally and connecting to a remote server +### Using a managed autoscaling backend + +Instead of managing the GPUs and training processes yourself, you can optionally send inference and training requests to the W&B Training cluster, which autoscales to match your job's demand. To do so, install `openpipe-art` without any extras and use `ServerlessBackend`: + +```bash +pip install openpipe-art +``` + + +```python +from art import TrainableModel, gather_trajectory_groups +from art.serverless.backend import ServerlessBackend + +backend = ServerlessBackend() + +model = TrainableModel( + name="agent-001", + project="my-agentic-task", + base_model="Qwen/Qwen2.5-14B-Instruct", +) + +await model.register(backend) + +... the rest of your code ... +``` + +### Running the server on remote dedicated GPUs The ART client can also be run locally and connected to a remote server, which ART will automatically provision for you. 
To use SkyPilot, you'll need to install the optional dependency: diff --git a/docs/getting-started/notebooks.mdx b/docs/getting-started/notebooks.mdx index 7e88e89e..a3e5477a 100644 --- a/docs/getting-started/notebooks.mdx +++ b/docs/getting-started/notebooks.mdx @@ -9,10 +9,10 @@ icon: "book" | Agent Task | Notebook | Description | Performance | | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | | +| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | | | **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] | | **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] | -| **ART•E [RULER]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 7B learns to search emails using RULER | | -| **2048** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 3B learns to play 2048 | | | **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] | | **Tic Tac Toe** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | | | **Codenames** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | | diff --git a/docs/getting-started/quick-start.mdx b/docs/getting-started/quick-start.mdx index f2ab23ee..9d637082 100644 --- a/docs/getting-started/quick-start.mdx +++ b/docs/getting-started/quick-start.mdx @@ -4,7 +4,7 @@ description: "Get started with ART in a few quick steps." icon: "forward" --- -In this Quick Start tutorial, we'll be training Qwen 2.5 3B to play [2048](https://play2048.co/), a simple game that requires forward planning and basic math skills. +In this Quick Start tutorial, we'll be training Qwen 2.5 14B to play [2048](https://play2048.co/), a simple game that requires forward planning and basic math skills. @@ -16,29 +16,26 @@ Total cost: Free! -## Step 1: Provision optional API keys +## Step 1: Provision W&B API key -[ART](https://github.com/OpenPipe/art) is an open source library and does not require an API key to run in Google Colab, which we'll use for this quick start. 
But if you're interested in seeing the progress of your training runs and inspecting your model's completions as it trains, you can optionally provision API keys from platforms like Weights & Biases. +[ART](https://github.com/OpenPipe/art) is an open source library and works across infra and observability providers. To keep things simple in this tutorial, we'll exclusively use Weights & Biases services, which means we'll only need to provision one API key. We'll use these services: -If you'd like to enable observability while working through this guide, create a W&B account and provision an API key. +* **W&B Training** - autoscale GPUs for inference and training +* **W&B Models** - record metrics like reward +* **W&B Weave** - record your model's traces as it generates completions +* **W&B Artifacts** - store and manage your model's checkpoints -- [Weights & Biases](https://wandb.ai/home) - -Once you have your Weights & Biases API key, open the [notebook](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) in Google Colab and set them in the **Environment Variables** cell. +Weights & Biases currently provides a small free tier for all the services we'll use during this quickstart, so you shouldn't need to add a credit card to get started. -Once your API keys are set, or if you won't need observability while completing this walkthrough, continue on to the next step. +- [Weights & Biases](https://wandb.ai/home) -## Step 2: Prepare your notebook +Once you have your Weights & Biases API key, open the [notebook](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) in Google Colab and set it in the **Environment Variables** cell. Then continue on to the next step. -If you haven't already, open the [notebook](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) in Google Colab and connect to a T4 runtime environment. +## Step 2: Run the notebook - - In the top bar of your Google Colab notebook, find *Runtime* > *Change runtime - type* to open the selection modal. Select **T4 GPU** and hit **Save**. - +At the top of the [notebook](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) you should see a small **Run all** button. Press it to begin training your model. -## Step 3: Run the notebook -From here on out, you can follow the instructions in the notebook! While training is occuring, remember to track your progress in [Weights & Biases](https://wandb.ai). +## Step 3: Track metrics -If you have questions along the way, please ask in the [Discord](https://discord.gg/zbBHRUpwf4). Happy training! +While your run progresses, observe its traces and metrics in your [W&B workspace](https://wandb.ai/home). You should start seeing some progress in the first 20-30 steps. If you have questions along the way, please ask in the [Discord](https://discord.gg/zbBHRUpwf4). Happy training!
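+For reference, the **Environment Variables** cell mentioned in Step 1 typically just exports your key before any training code runs. A minimal sketch (the exact cell contents in the Colab notebook may differ):
+
+```python
+import os
+
+# hypothetical placeholder; paste the API key you provisioned at https://wandb.ai/home
+os.environ["WANDB_API_KEY"] = "your-wandb-api-key"
+```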