
Commit 6e61bab

open-swe[bot] and bracesproul authored

feat: Update default model to Claude Sonnet 4.5 (#848)

* Apply patch [skip ci] (repeated 30 times)
* Empty commit to trigger CI
* make it opus 4.5

Co-authored-by: open-swe[bot] <[email protected]>
Co-authored-by: bracesproul <[email protected]>
1 parent bfb2ab7 commit 6e61bab

File tree

9 files changed (+66 / -45 lines changed)


apps/docs/faq.mdx

Lines changed: 2 additions & 1 deletion
```diff
@@ -6,7 +6,7 @@ description: Frequently Asked Questions
 <Accordion title="How much does an end to end Open SWE run cost?">
 The cost per run varies greatly based on the complexity of the task, the size of the repository, and the number of files that need to be changed.
 
-For most tasks, you can expect to pay between `$0.50` -> `$3.00` when using Claude Sonnet 4.
+For most tasks, you can expect to pay between `$0.50` -> `$3.00` when using Claude Opus 4.5.
 For the same tasks running on Claude Opus 4/4.1, you can expect to pay between `$1.50` -> `$9.00`.
 
 Always remember to monitor your runs if you're cost conscious. The most expensive run I've seen Open SWE complete was ~50M Opus 4 tokens, costing `$25.00`.
@@ -53,3 +53,4 @@ description: Frequently Asked Questions
 <Accordion title="Can I contribute to Open SWE?">
 Yes! We're always looking for contributors to help us improve Open SWE. Feel free to pick up an [open issue](https://github.com/langchain-ai/open-swe/issues) or submit a pull request with a new feature or bug fix.
 </Accordion>
+
```

apps/docs/usage/best-practices.mdx

Lines changed: 4 additions & 2 deletions
```diff
@@ -39,7 +39,7 @@ Submit separate requests for different features or fixes. This allows Open SWE t
 
 ## Model Selection
 
-- **Claude Sonnet 4 (Default)**: The default model for planning, writing code, and reviewing changes. This model offers the best balance of performance, speed and cost.
+- **Claude Opus 4.5 (Default)**: The default model for planning, writing code, and reviewing changes. This model offers the best balance of performance, speed and cost.
 - **Claude Opus 4.1**: A larger, more powerful model for difficult, or open-ended tasks. Opus 4.1 is more expensive and slower, but will provide better results for complex tasks.
 
 ### Avoid Other Models
@@ -50,7 +50,7 @@ Although Open SWE allows you to select any model from Anthropic, OpenAI and Goog
 
 ### `open-swe` vs `open-swe-max`
 
-**`open-swe`**: Uses Claude Sonnet 4
+**`open-swe`**: Uses Claude Opus 4.5
 
 - Suitable for most development tasks
 - Faster execution
@@ -81,3 +81,5 @@ If you're running Open SWE against an open-ended or very complex task, you may w
 In development environments, append `-dev` to all labels (e.g.,
 `open-swe-dev`, `open-swe-auto-dev`).
 </Note>
+
+
```
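For readers who trigger runs from automation rather than the web app, here is a minimal, hypothetical sketch (not part of this commit) of adding the `open-swe` label to an existing GitHub issue with Octokit; the owner, repo, and issue number are placeholders.

```ts
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Adding the label is what queues the issue for Open SWE; swap in
// "open-swe-auto" or "open-swe-max" for the other behaviors described above.
await octokit.rest.issues.addLabels({
  owner: "your-org", // placeholder
  repo: "your-repo", // placeholder
  issue_number: 123, // placeholder
  labels: ["open-swe"],
});
```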

apps/open-swe/src/graphs/manager/nodes/classify-message/prompts.ts

Lines changed: 4 additions & 4 deletions
```diff
@@ -71,10 +71,10 @@ Your source code is available in the GitHub repository: https://github.com/langc
 The website you're accessible through is: https://swe.langchain.com
 Your documentation is available at: https://docs.langchain.com/labs/swe
 You can be invoked by both the web app, or by adding a label to a GitHub issue. These label options are:
-- \`open-swe\` - trigger a standard Open SWE task. It will interrupt after generating a plan, and the user must approve it before it can continue. Uses Claude Sonnet 4 for all LLM requests.
-- \`open-swe-auto\` - trigger an 'auto' Open SWE task. It will not interrupt after generating a plan, and instead it will auto-approve the plan, and continue to the programming step without user approval. Uses Claude Sonnet 4 for all LLM requests.
-- \`open-swe-max\` - this label acts the same as \`open-swe\`, except it uses a larger, more powerful model for the planning and programming steps: Claude Opus 4.1. It still uses Claude Sonnet 4 for the reviewer step.
-- \`open-swe-max-auto\` - this label acts the same as \`open-swe-auto\`, except it uses a larger, more powerful model for the planning and programming steps: Claude Opus 4.1. It still uses Claude Sonnet 4 for the reviewer step.
+- \`open-swe\` - trigger a standard Open SWE task. It will interrupt after generating a plan, and the user must approve it before it can continue. Uses Claude Opus 4.5 for all LLM requests.
+- \`open-swe-auto\` - trigger an 'auto' Open SWE task. It will not interrupt after generating a plan, and instead it will auto-approve the plan, and continue to the programming step without user approval. Uses Claude Opus 4.5 for all LLM requests.
+- \`open-swe-max\` - this label acts the same as \`open-swe\`, except it uses a larger, more powerful model for the planning and programming steps: Claude Opus 4.1. It still uses Claude Opus 4.5 for the reviewer step.
+- \`open-swe-max-auto\` - this label acts the same as \`open-swe-auto\`, except it uses a larger, more powerful model for the planning and programming steps: Claude Opus 4.1. It still uses Claude Opus 4.5 for the reviewer step.
 
 Only provide this information if requested by the user.
 For example, if the user asks what you can do, you should provide the above information in your response.
```

apps/open-swe/src/utils/llms/model-manager.ts

Lines changed: 13 additions & 13 deletions
```diff
@@ -379,23 +379,23 @@ export class ModelManager {
   ): ModelLoadConfig | null {
     const defaultModels: Record<Provider, Record<LLMTask, string>> = {
       anthropic: {
-        [LLMTask.PLANNER]: "claude-sonnet-4-0",
-        [LLMTask.PROGRAMMER]: "claude-sonnet-4-0",
-        [LLMTask.REVIEWER]: "claude-sonnet-4-0",
-        [LLMTask.ROUTER]: "claude-3-5-haiku-latest",
-        [LLMTask.SUMMARIZER]: "claude-sonnet-4-0",
+        [LLMTask.PLANNER]: "claude-opus-4-5",
+        [LLMTask.PROGRAMMER]: "claude-opus-4-5",
+        [LLMTask.REVIEWER]: "claude-opus-4-5",
+        [LLMTask.ROUTER]: "claude-haiku-4-5-latest",
+        [LLMTask.SUMMARIZER]: "claude-opus-4-5",
       },
       "google-genai": {
-        [LLMTask.PLANNER]: "gemini-2.5-flash",
-        [LLMTask.PROGRAMMER]: "gemini-2.5-pro",
-        [LLMTask.REVIEWER]: "gemini-2.5-flash",
-        [LLMTask.ROUTER]: "gemini-2.5-flash",
-        [LLMTask.SUMMARIZER]: "gemini-2.5-pro",
+        [LLMTask.PLANNER]: "gemini-3-pro-preview",
+        [LLMTask.PROGRAMMER]: "gemini-3-pro-preview",
+        [LLMTask.REVIEWER]: "gemini-flash-latest",
+        [LLMTask.ROUTER]: "gemini-flash-latest",
+        [LLMTask.SUMMARIZER]: "gemini-3-pro-preview",
       },
       openai: {
-        [LLMTask.PLANNER]: "gpt-5",
-        [LLMTask.PROGRAMMER]: "gpt-5",
-        [LLMTask.REVIEWER]: "gpt-5",
+        [LLMTask.PLANNER]: "gpt-5-codex",
+        [LLMTask.PROGRAMMER]: "gpt-5-codex",
+        [LLMTask.REVIEWER]: "gpt-5-codex",
         [LLMTask.ROUTER]: "gpt-5-nano",
         [LLMTask.SUMMARIZER]: "gpt-5-mini",
       },
```
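As a rough illustration of how a table like `defaultModels` above can be consumed, the sketch below resolves a fallback model for a provider/task pair. The local `Provider` and `LLMTask` definitions and the `getDefaultModel` helper are simplified stand-ins written for this example, not the repository's actual implementations.

```ts
type Provider = "anthropic" | "google-genai" | "openai";

enum LLMTask {
  PLANNER = "planner",
  PROGRAMMER = "programmer",
  REVIEWER = "reviewer",
  ROUTER = "router",
  SUMMARIZER = "summarizer",
}

// Only the anthropic column is spelled out here; values mirror the diff above.
const defaultModels: Partial<Record<Provider, Record<LLMTask, string>>> = {
  anthropic: {
    [LLMTask.PLANNER]: "claude-opus-4-5",
    [LLMTask.PROGRAMMER]: "claude-opus-4-5",
    [LLMTask.REVIEWER]: "claude-opus-4-5",
    [LLMTask.ROUTER]: "claude-haiku-4-5-latest",
    [LLMTask.SUMMARIZER]: "claude-opus-4-5",
  },
};

// Hypothetical helper: return the fallback model name, or null if unknown.
function getDefaultModel(provider: Provider, task: LLMTask): string | null {
  return defaultModels[provider]?.[task] ?? null;
}

console.log(getDefaultModel("anthropic", LLMTask.PLANNER)); // "claude-opus-4-5"
```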

apps/web/src/components/v2/token-usage.tsx

Lines changed: 10 additions & 0 deletions
```diff
@@ -85,6 +85,16 @@ function getModelPricingPlaceholder(model: string): {
     outputPrice: 15.0,
     cachePrice: 0.3,
   },
+  "anthropic:claude-sonnet-4-5": {
+    inputPrice: 3.0,
+    outputPrice: 15.0,
+    cachePrice: 0.3,
+  },
+  "anthropic:claude-opus-4-5": {
+    inputPrice: 5.0,
+    outputPrice: 25.0,
+    cachePrice: 0.3,
+  },
 
   // Claude 3.7 models
   "anthropic:claude-3-7-sonnet": {
```

packages/shared/src/__tests__/caching.test.ts

Lines changed: 10 additions & 10 deletions
```diff
@@ -5,7 +5,7 @@ describe("tokenDataReducer", () => {
   it("should merge objects with the same model string", () => {
     const state: ModelTokenData[] = [
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 100,
         cacheReadInputTokens: 50,
         inputTokens: 200,
@@ -22,7 +22,7 @@ describe("tokenDataReducer", () => {
 
     const update: ModelTokenData[] = [
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 25,
         cacheReadInputTokens: 15,
         inputTokens: 75,
@@ -44,10 +44,10 @@ describe("tokenDataReducer", () => {
 
     // Find the merged anthropic model
     const mergedAnthropic = result.find(
-      (data) => data.model === "anthropic:claude-sonnet-4-0",
+      (data) => data.model === "anthropic:claude-sonnet-4-5",
     );
     expect(mergedAnthropic).toEqual({
-      model: "anthropic:claude-sonnet-4-0",
+      model: "anthropic:claude-sonnet-4-5",
       cacheCreationInputTokens: 125, // 100 + 25
       cacheReadInputTokens: 65, // 50 + 15
       inputTokens: 275, // 200 + 75
@@ -82,7 +82,7 @@ describe("tokenDataReducer", () => {
   it("should return update array when state is undefined", () => {
     const update: ModelTokenData[] = [
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 100,
         cacheReadInputTokens: 50,
         inputTokens: 200,
@@ -98,7 +98,7 @@ describe("tokenDataReducer", () => {
   it("should handle empty update array", () => {
     const state: ModelTokenData[] = [
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 100,
         cacheReadInputTokens: 50,
         inputTokens: 200,
@@ -114,7 +114,7 @@ describe("tokenDataReducer", () => {
   it("should handle multiple updates for the same model", () => {
     const state: ModelTokenData[] = [
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 100,
         cacheReadInputTokens: 50,
         inputTokens: 200,
@@ -124,14 +124,14 @@ describe("tokenDataReducer", () => {
 
     const update: ModelTokenData[] = [
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 25,
         cacheReadInputTokens: 15,
         inputTokens: 75,
         outputTokens: 60,
       },
       {
-        model: "anthropic:claude-sonnet-4-0",
+        model: "anthropic:claude-sonnet-4-5",
         cacheCreationInputTokens: 10,
         cacheReadInputTokens: 5,
         inputTokens: 30,
@@ -143,7 +143,7 @@ describe("tokenDataReducer", () => {
 
     expect(result).toHaveLength(1);
     expect(result[0]).toEqual({
-      model: "anthropic:claude-sonnet-4-0",
+      model: "anthropic:claude-sonnet-4-5",
      cacheCreationInputTokens: 135, // 100 + 25 + 10
      cacheReadInputTokens: 70, // 50 + 15 + 5
      inputTokens: 305, // 200 + 75 + 30
```
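For context on what these tests assert, the sketch below shows one way a reducer with this merge-by-model behavior could look. It is an illustration written for this commit page, not the actual `tokenDataReducer` from the shared package.

```ts
interface ModelTokenData {
  model: string;
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
  inputTokens: number;
  outputTokens: number;
}

function tokenDataReducerSketch(
  state: ModelTokenData[] | undefined,
  update: ModelTokenData[],
): ModelTokenData[] {
  // When there is no prior state, the update becomes the new state.
  if (!state) return update;

  const merged = new Map<string, ModelTokenData>();
  for (const entry of [...state, ...update]) {
    const existing = merged.get(entry.model);
    if (!existing) {
      merged.set(entry.model, { ...entry });
      continue;
    }
    // Entries sharing a model string have every token counter summed.
    existing.cacheCreationInputTokens += entry.cacheCreationInputTokens;
    existing.cacheReadInputTokens += entry.cacheReadInputTokens;
    existing.inputTokens += entry.inputTokens;
    existing.outputTokens += entry.outputTokens;
  }
  return [...merged.values()];
}
```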

packages/shared/src/open-swe/llm-task.ts

Lines changed: 5 additions & 5 deletions
```diff
@@ -29,23 +29,23 @@ export enum LLMTask {
 
 export const TASK_TO_CONFIG_DEFAULTS_MAP = {
   [LLMTask.PLANNER]: {
-    modelName: "anthropic:claude-sonnet-4-0",
+    modelName: "anthropic:claude-opus-4-5",
     temperature: 0,
   },
   [LLMTask.PROGRAMMER]: {
-    modelName: "anthropic:claude-sonnet-4-0",
+    modelName: "anthropic:claude-opus-4-5",
     temperature: 0,
   },
   [LLMTask.REVIEWER]: {
-    modelName: "anthropic:claude-sonnet-4-0",
+    modelName: "anthropic:claude-opus-4-5",
     temperature: 0,
   },
   [LLMTask.ROUTER]: {
-    modelName: "anthropic:claude-3-5-haiku-latest",
+    modelName: "anthropic:claude-haiku-4-5",
     temperature: 0,
   },
   [LLMTask.SUMMARIZER]: {
-    modelName: "anthropic:claude-3-5-haiku-latest",
+    modelName: "anthropic:claude-haiku-4-5",
     temperature: 0,
   },
 };
```
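A brief, hypothetical sketch of how these defaults might be consumed: a user-supplied override wins, otherwise the task falls back to `TASK_TO_CONFIG_DEFAULTS_MAP`. The import path and the `resolveModelName` helper are assumptions made for illustration, not the package's actual API.

```ts
import { LLMTask, TASK_TO_CONFIG_DEFAULTS_MAP } from "./llm-task"; // path assumed

// Hypothetical helper: prefer an explicit override, else use the task default.
function resolveModelName(task: LLMTask, override?: string): string {
  return override ?? TASK_TO_CONFIG_DEFAULTS_MAP[task].modelName;
}

// With no override, the router now resolves to "anthropic:claude-haiku-4-5".
console.log(resolveModelName(LLMTask.ROUTER));
```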

packages/shared/src/open-swe/models.ts

Lines changed: 8 additions & 0 deletions
```diff
@@ -8,10 +8,18 @@ export const MODEL_OPTIONS = [
   //   label: "Claude Opus 4 (Extended Thinking)",
   //   value: "anthropic:extended-thinking:claude-opus-4-0",
   // },
+  {
+    label: "Claude Sonnet 4.5",
+    value: "anthropic:claude-sonnet-4-5",
+  },
   {
     label: "Claude Sonnet 4",
     value: "anthropic:claude-sonnet-4-0",
   },
+  {
+    label: "Claude Opus 4.5",
+    value: "anthropic:claude-opus-4-5",
+  },
   {
     label: "Claude Opus 4.1",
     value: "anthropic:claude-opus-4-1",
```

packages/shared/src/open-swe/types.ts

Lines changed: 10 additions & 10 deletions
```diff
@@ -323,7 +323,7 @@ export const GraphConfigurationMetadata: {
   plannerModelName: {
     x_open_swe_ui_config: {
       type: "select",
-      default: "anthropic:claude-sonnet-4-0",
+      default: "anthropic:claude-opus-4-5",
       description:
         "The model to use for planning tasks. This model should be very good at generating code, and have strong context understanding and reasoning capabilities. It will be used for the most complex tasks throughout the agent.",
       options: MODEL_OPTIONS_NO_THINKING,
@@ -342,7 +342,7 @@ export const GraphConfigurationMetadata: {
   programmerModelName: {
     x_open_swe_ui_config: {
       type: "select",
-      default: "anthropic:claude-sonnet-4-0",
+      default: "anthropic:claude-opus-4-5",
       description:
         "The model to use for programming/other advanced technical tasks. This model should be very good at generating code, and have strong context understanding and reasoning capabilities. It will be used for the most complex tasks throughout the agent.",
       options: MODEL_OPTIONS_NO_THINKING,
@@ -361,7 +361,7 @@ export const GraphConfigurationMetadata: {
   reviewerModelName: {
     x_open_swe_ui_config: {
       type: "select",
-      default: "anthropic:claude-sonnet-4-0",
+      default: "anthropic:claude-opus-4-5",
       description:
         "The model to use for reviewer tasks. This model should be very good at generating code, and have strong context understanding and reasoning capabilities. It will be used for the most complex tasks throughout the agent.",
       options: MODEL_OPTIONS_NO_THINKING,
@@ -380,7 +380,7 @@ export const GraphConfigurationMetadata: {
   routerModelName: {
     x_open_swe_ui_config: {
       type: "select",
-      default: "anthropic:claude-3-5-haiku-latest",
+      default: "anthropic:claude-haiku-4-5",
       description:
         "The model to use for routing tasks, and other simple generations. This model should be good at tool calling/structured output.",
       options: MODEL_OPTIONS,
@@ -399,7 +399,7 @@ export const GraphConfigurationMetadata: {
   summarizerModelName: {
     x_open_swe_ui_config: {
       type: "select",
-      default: "anthropic:claude-sonnet-4-0",
+      default: "anthropic:claude-opus-4-5",
       description:
         "The model to use for summarizing the conversation history, or extracting key context from large inputs. This model should have strong context retention/understanding capabilities, and should be good at tool calling/structured output.",
       options: MODEL_OPTIONS_NO_THINKING,
@@ -537,7 +537,7 @@ export const GraphConfiguration = z.object({
 
   /**
    * The model ID to use for programming/other advanced technical tasks.
-   * @default "anthropic:claude-sonnet-4-0"
+   * @default "anthropic:claude-opus-4-5"
    */
   plannerModelName: withLangGraph(z.string().optional(), {
     metadata: GraphConfigurationMetadata.plannerModelName,
@@ -552,7 +552,7 @@ export const GraphConfiguration = z.object({
 
   /**
    * The model ID to use for programming/other advanced technical tasks.
-   * @default "anthropic:claude-sonnet-4-0"
+   * @default "anthropic:claude-opus-4-5"
    */
   programmerModelName: withLangGraph(z.string().optional(), {
     metadata: GraphConfigurationMetadata.programmerModelName,
@@ -567,7 +567,7 @@ export const GraphConfiguration = z.object({
 
   /**
    * The model ID to use for programming/other advanced technical tasks.
-   * @default "anthropic:claude-sonnet-4-0"
+   * @default "anthropic:claude-opus-4-5"
    */
   reviewerModelName: withLangGraph(z.string().optional(), {
     metadata: GraphConfigurationMetadata.reviewerModelName,
@@ -582,7 +582,7 @@ export const GraphConfiguration = z.object({
 
   /**
    * The model ID to use for routing tasks.
-   * @default "anthropic:claude-3-5-haiku-latest"
+   * @default "anthropic:claude-haiku-4-5"
    */
   routerModelName: withLangGraph(z.string().optional(), {
     metadata: GraphConfigurationMetadata.routerModelName,
@@ -597,7 +597,7 @@ export const GraphConfiguration = z.object({
 
   /**
    * The model ID to use for summarizing the conversation history, or extracting key context from large inputs.
-   * @default "anthropic:claude-sonnet-4-0"
+   * @default "anthropic:claude-opus-4-5"
   */
   summarizerModelName: withLangGraph(z.string().optional(), {
     metadata: GraphConfigurationMetadata.summarizerModelName,
```
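To show how the new defaults surface at runtime, here is a hedged sketch of resolving the planner model from a run configuration, falling back to the UI metadata default above. The `configurable` shape, the import path, and the `getPlannerModel` helper are assumptions made for this example, not the package's actual API.

```ts
import { GraphConfigurationMetadata } from "./types"; // path assumed

interface RunConfig {
  configurable?: { plannerModelName?: string };
}

// Hypothetical resolution: an explicit config value wins, otherwise the
// metadata default ("anthropic:claude-opus-4-5" after this commit) is used.
function getPlannerModel(config: RunConfig): string {
  return (
    config.configurable?.plannerModelName ??
    GraphConfigurationMetadata.plannerModelName.x_open_swe_ui_config.default
  );
}

console.log(getPlannerModel({}));
```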

0 commit comments
