
Commit 303cb64

Document Qwen3 14B instead of Qwen2.5 14B in non-tutorial examples (#437)

* Update docs to reference Qwen3 14B Instruct for most examples
* Update notebooks

1 parent f51e1e4 commit 303cb64

File tree

10 files changed: +38 −30 lines

README.md

Lines changed: 3 additions & 3 deletions

@@ -42,7 +42,7 @@ from art.serverless.backend import ServerlessBackend
 model = art.TrainableModel(
     project="voice-agent",
     name="agent-001",
-    base_model="Qwen/Qwen2.5-14B-Instruct"
+    base_model="OpenPipe/Qwen3-14B-Instruct"
 )

 backend = ServerlessBackend(
@@ -62,8 +62,8 @@ ART is an open-source RL framework that improves agent reliability by allowing L
 | Agent Task | Example Notebook | Description | Comparative Performance |
 | --- | --- | --- | --- |
-| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
-| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
+| **ART•E [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen3 14B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> [benchmarks](/dev/art-e/art_e/evaluate/display_benchmarks.ipynb) |
+| **2048 [Serverless]** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen3 14B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> [benchmarks](/examples/2048/display_benchmarks.ipynb) |
 | **ART•E LangGraph** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
 | **MCP•RL** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
 | **Temporal Clue** | [🏋️ Train agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
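The README hunk above swaps only the `base_model` while keeping the rest of the setup flow (`art.TrainableModel` plus `ServerlessBackend`, followed by registration) intact. A minimal stand-in sketch of that flow, using a plain dataclass and a hypothetical `register` coroutine rather than the real `art` package:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TrainableModelConfig:
    """Mirrors the fields shown in the README snippet (illustrative only)."""
    project: str
    name: str
    base_model: str


async def register(config: TrainableModelConfig) -> str:
    # Stand-in for `await model.register(backend)`: in ART this would
    # provision the model on the backend; here we just echo an identifier.
    return f"{config.project}/{config.name} ({config.base_model})"


config = TrainableModelConfig(
    project="voice-agent",
    name="agent-001",
    base_model="OpenPipe/Qwen3-14B-Instruct",
)
print(asyncio.run(register(config)))
# → voice-agent/agent-001 (OpenPipe/Qwen3-14B-Instruct)
```

The base model string is the only field the commit changes; everything else in the configuration is untouched.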

docs/features/checkpoint-deletion.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,7 +17,7 @@ from art.serverless.backend import ServerlessBackend
 model = art.TrainableModel(
     name="agent-001",
     project="checkpoint-deletion-demo",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )
 backend = ServerlessBackend()
 # in order for the model to know where to look for its existing checkpoints,
@@ -55,7 +55,7 @@ TRAINING_STEPS = 50
 model = art.TrainableModel(
     name="agent-001",
     project="checkpoint-deletion-demo",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )
 backend = ServerlessBackend()
 await model.register(backend)
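The checkpoint-deletion doc above notes that a re-registered model must first locate its existing checkpoints before any can be pruned. One plausible retention policy, sketched here in plain Python (the `prune_checkpoints` helper and its step-to-reward mapping are hypothetical illustrations, not ART's actual deletion API), keeps the latest step plus the best-scoring step and marks everything else for deletion:

```python
def prune_checkpoints(checkpoints: dict[int, float]) -> list[int]:
    """Return the steps safe to delete, keeping the latest step and the
    highest-reward step. `checkpoints` maps step -> eval reward.
    (Hypothetical helper; ART's real deletion behavior may differ.)"""
    if not checkpoints:
        return []
    latest = max(checkpoints)                          # most recent step
    best = max(checkpoints, key=checkpoints.__getitem__)  # highest reward
    keep = {latest, best}
    return sorted(step for step in checkpoints if step not in keep)


# Steps 10..50 with eval rewards; step 30 scored best, step 50 is latest.
rewards = {10: 0.41, 20: 0.55, 30: 0.71, 40: 0.63, 50: 0.68}
print(prune_checkpoints(rewards))
# → [10, 20, 40]
```

Keeping both the latest and the best checkpoint means training can resume from where it left off while the strongest model so far is never discarded.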

docs/features/checkpoint-forking.mdx

Lines changed: 3 additions & 3 deletions

@@ -36,7 +36,7 @@ async def train():
 model = art.TrainableModel(
     name="my-model-v2",
     project="my-project",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )

 # Copy the checkpoint from another model
@@ -104,14 +104,14 @@ Here's a practical example of using checkpoint forking to test a lower learning
 base_model = art.TrainableModel(
     name="summarizer-base",
     project="experiments",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )

 # Fork at step 1000 to try lower learning rate
 low_lr_model = art.TrainableModel(
     name="summarizer-low-lr",
     project="experiments",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )

 async def experiment():
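The forking doc above copies a checkpoint from one model into another so a hyperparameter variant (here, a lower learning rate) can continue from the same weights. The core semantics can be sketched with plain dictionaries (the `fork_checkpoint` function and the `runs` structure are illustrative stand-ins, not ART's checkpoint-copy API):

```python
from copy import deepcopy


def fork_checkpoint(runs: dict, src: str, dst: str, step: int) -> None:
    """Copy `src`'s checkpoint at `step` into `dst`, so training can
    continue under new hyperparameters without mutating the original.
    (Illustrative stand-in; ART's real call signature may differ.)"""
    runs.setdefault(dst, {})[step] = deepcopy(runs[src][step])


# Base run has a checkpoint at step 1000.
runs = {"summarizer-base": {1000: {"weights": "...", "lr": 1.2e-5}}}

# Fork at step 1000, then lower the learning rate on the fork only.
fork_checkpoint(runs, "summarizer-base", "summarizer-low-lr", step=1000)
runs["summarizer-low-lr"][1000]["lr"] = 1.2e-6

print(sorted(runs))
# → ['summarizer-base', 'summarizer-low-lr']
```

The deep copy matters: editing the fork's hyperparameters must leave the base run's checkpoint untouched, which is exactly what lets the two learning rates be compared fairly from an identical starting point.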

docs/features/mcp-rl.mdx

Lines changed: 2 additions & 2 deletions

@@ -104,8 +104,8 @@ from art.rewards import ruler_score_group
 from art import gather_trajectory_groups

 # Initialize the model
-model = art.RemoteModel(
-    model="Qwen/Qwen2.5-3B-Instruct",
+model = art.TrainableModel(
+    model="OpenPipe/Qwen3-14B-Instruct",
     openrouter_api_key="your_openrouter_key"
 )
docs/fundamentals/art-client.mdx

Lines changed: 1 addition & 1 deletion

@@ -53,7 +53,7 @@ model = art.TrainableModel(
     # for a given task to consistently group metrics
     project="my-agentic-task",
     # the model that you want to train from
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )
 ```

docs/getting-started/installation-setup.mdx

Lines changed: 3 additions & 3 deletions

@@ -31,7 +31,7 @@ backend = LocalBackend()
 model = TrainableModel(
     name="agent-001",
     project="my-agentic-task",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )

 await model.register(backend)
@@ -57,7 +57,7 @@ backend = ServerlessBackend()
 model = TrainableModel(
     name="agent-001",
     project="my-agentic-task",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )

 await model.register(backend)
@@ -87,7 +87,7 @@ backend = await SkyPilotBackend.initialize_cluster(
 model = TrainableModel(
     name="agent-001",
     project="my-agentic-task",
-    base_model="Qwen/Qwen2.5-14B-Instruct",
+    base_model="OpenPipe/Qwen3-14B-Instruct",
 )

 await model.register(backend)

docs/getting-started/notebooks.mdx

Lines changed: 8 additions & 8 deletions

@@ -9,13 +9,13 @@ icon: "book"

 | Agent Task | Notebook | Description | Performance |
 | --- | --- | --- | --- |
-| **ART•E [Serverless]** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen 2.5 14B learns to search emails using RULER | <a href="https://github.com/OpenPipe/ART/blob/main/dev/art-e/art_e/evaluate/display_benchmarks.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" width="72" style={{margin: "0"}} /></a> |
-| **2048 [Serverless]** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen 2.5 14B learns to play 2048 | <a href="https://github.com/OpenPipe/ART/blob/main/examples/2048/display_benchmarks.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" width="72" style={{margin: "0"}} /></a> |
-| **ART•E LangGraph** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
-| **MCP•RL** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
-| **Temporal Clue** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
-| **Tic Tac Toe** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen 2.5 3B learns to play Tic Tac Toe | <a href="https://github.com/OpenPipe/ART/blob/main/examples/tic_tac_toe/display-benchmarks.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg" width="72" style={{margin: "0"}} /></a> |
-| **Codenames** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen 2.5 3B learns to play Codenames | <a href="https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/codenames/win_rate_over_time.png" width="72" style={{margin: "0"}} /></a> |
-| **AutoRL [RULER]** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/auto_rl.ipynb) | Train Qwen 2.5 7B to master any task | [Link coming soon] |
+| **ART•E [Serverless]** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb) | Qwen3 14B learns to search emails using RULER | <a href="https://github.com/OpenPipe/ART/blob/main/dev/art-e/art_e/evaluate/display_benchmarks.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" width="72" style={{margin: "0"}} /></a> |
+| **2048 [Serverless]** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) | Qwen3 14B learns to play 2048 | <a href="https://github.com/OpenPipe/ART/blob/main/examples/2048/display_benchmarks.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" width="72" style={{margin: "0"}} /></a> |
+| **ART•E LangGraph** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb) | Qwen2.5 7B learns to search emails using LangGraph | [Link coming soon] |
+| **MCP•RL** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb) | Qwen2.5 3B masters the NWS MCP server | [Link coming soon] |
+| **Temporal Clue** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb) | Qwen2.5 7B learns to solve Temporal Clue | [Link coming soon] |
+| **Tic Tac Toe** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb) | Qwen2.5 3B learns to play Tic Tac Toe | <a href="https://github.com/OpenPipe/ART/blob/main/examples/tic_tac_toe/display-benchmarks.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg" width="72" style={{margin: "0"}} /></a> |
+| **Codenames** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb) | Qwen2.5 3B learns to play Codenames | <a href="https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb"><img src="https://github.com/OpenPipe/ART/raw/main/assets/benchmarks/codenames/win_rate_over_time.png" width="72" style={{margin: "0"}} /></a> |
+| **AutoRL [RULER]** | [🏋️&nbsp;Train&nbsp;agent](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/auto_rl.ipynb) | Train Qwen2.5 7B to master any task | [Link coming soon] |

 </div>

docs/getting-started/quick-start.mdx

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ description: "Get started with ART in a few quick steps."
 icon: "forward"
 ---

-In this Quick Start tutorial, we'll be training Qwen 2.5 14B to play [2048](https://play2048.co/), a simple game that requires forward planning and basic math skills.
+In this Quick Start tutorial, we'll be training Qwen3 14B Instruct to play [2048](https://play2048.co/), a simple game that requires forward planning and basic math skills.

 <Info>
docs/resources/models.mdx

Lines changed: 13 additions & 5 deletions

@@ -5,13 +5,21 @@ description: "Train open source models on ART."
 icon: "robot"
 ---

-## Recommended Models
+## Serverless Models

-- [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
+We currently only support the following model for serverless training. We are actively adding support for both larger and smaller models. If there's a particular model you'd like to see serverless support for, please send a request to [email protected].
+
+- [OpenPipe Qwen 3 14B Instruct](https://huggingface.co/OpenPipe/Qwen3-14B-Instruct)
   - Good balance of performance and size. Has support for tool calling and generally trains well. This is our recommended model for users new to RL.
-- [Qwen 2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+
+## Recommended Local Models
+
+If you're developing locally or on your own hardware, here are a couple of other models you could try in addition to the recommended serverless list.
+
+- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
   - Less capable than 14B, but smaller and faster
-- [Qwen 2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
+- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
   - More capable than 14B, but larger and slower

 ## More Models
@@ -24,7 +32,7 @@ Here are additional models that we've tested and found to work well with ART:
 - [Llama 3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
 - [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
 - [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
-- [Qwen 2.5 72B Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
+- [Qwen2.5 72B Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
 - Additionally, the [Qwen 3](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) family of models is well supported for single-turn workflows. For multi-turn workflows the Qwen 3 chat template removes the `<think>` tokens from previous turns, which makes training more complicated. It is still possible to use for multi-turn workflows by splitting each turn into a separate message history with our `additional_histories` trajectory parameter (see [Additional Histories](/features/additional-histories)).

 If you're curious about a model that is not listed above, ask in the Discord [#support](https://discord.com/channels/1359674493949448375/1359674622965973185) channel.
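The models doc above mentions that the Qwen 3 chat template drops `<think>` tokens from earlier turns, and that the workaround is to split a multi-turn rollout into one message history per turn via `additional_histories`. The splitting idea can be sketched in plain Python (the `split_into_histories` function is a hypothetical illustration; ART's actual trajectory format may differ):

```python
def split_into_histories(messages: list[dict]) -> list[list[dict]]:
    """Split one multi-turn chat into per-turn histories: one history per
    assistant message, each containing everything up to and including that
    reply. Each assistant turn then keeps its own `<think>` content, which
    the Qwen 3 template would otherwise strip from previous turns.
    (Illustrative sketch of the `additional_histories` idea.)"""
    histories = []
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            histories.append(messages[: i + 1])
    return histories


chat = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "<think>2+2=4</think>4"},
    {"role": "user", "content": "And doubled?"},
    {"role": "assistant", "content": "<think>4*2=8</think>8"},
]
print(len(split_into_histories(chat)))  # one history per assistant turn
# → 2
```

Because each history ends at its own assistant reply, the reasoning tokens for that turn are still present when the turn is used as a training example, sidestepping the template's `<think>`-stripping behavior.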
