Commit 5a3c6f8

zealoushacker, claude, bassil-anthropic, brigitanthropic, and parisac-ant authored
Adds zoom tool for Opus 4.5 (#309)
* Add browser automation tool from anthropics/anthropic#148306

  Implements a dedicated browser tool for web automation tasks as an alternative to full computer use. The browser tool provides specialized actions for navigating, clicking, typing, and scrolling in Firefox.

  Key features:
  - New BrowserTool20250910 with browser-specific actions
  - Auto-launch Firefox if not running
  - Graceful browser close functionality
  - Model mapping for bobcat models to use browser mode
  - Browser-specific system prompt and UI updates

  Changes:
  - Add computer_use_demo/tools/browser.py with full browser automation
  - Add browser_use_20250910 tool version to groups.py
  - Map bobcat-latest/bobcat-v17-prod to browser mode in streamlit.py
  - Add BROWSER_SYSTEM_PROMPT for browser-specific instructions
  - Fix session state handling and base64 encoding issues
  - Add comprehensive browser tool tests

  Original implementation by benkomalo, sagnik, and brigit

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Fix browser tool initialization and improve startup reliability

  - Fix session state initialization for tool_versions to prevent AttributeError
  - Improve browser window detection using xdotool instead of process check
  - Add window focus activation to ensure browser is ready for commands
  - Add polling mechanism to wait for Firefox window (up to 15s)
  - Add dynamic page title based on tool mode (browser vs computer use)

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* save but screenshot too big

* before going for everything

* actual multi-session

* everything works

* refactor

* Add browser-use-demo quickstart for Browser Use API

  Create dedicated quickstart for Browser Use API demonstration using Playwright with Chromium. Runs fully containerized for security and isolation.

  - Separate quickstart focused solely on browser automation (split from computer-use-demo)
  - Container-based Playwright Chrome browser for secure execution
  - Streamlit UI with inline action display showing tool usage
  - Support for Claude 4+ models with browser_use capability
  - Port 8080 for main UI, 6080 for NoVNC browser view

  Based on initial implementation by @bassil

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Update model name from bobcat to claude-boucle-eap

* Remove browser functionality from computer-use-demo

  Browser functionality has been moved to its own dedicated quickstart (browser-use-demo). This keeps computer-use-demo focused solely on computer use capabilities.

* Remove remaining browser-related files from computer-use-demo

  Remove README_LOCAL.md, run_local.py, and setup.py as these were for local browser mode which has been moved to browser-use-demo

* Improve browser resolution configuration and security

  - Set default resolution to 1920x1080 for better modern web compatibility
  - Add environment-based configuration via .env file for all settings
  - Remove bind mounts in favor of Docker watch for better security
  - Add validation script to ensure proper configuration at startup
  - Create docker-compose.yml for easier deployment
  - Update documentation with new setup instructions

  The container now fails fast with helpful error messages if not properly configured, and uses Docker's watch feature for development instead of bind mounts to prevent container from modifying host files.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Rename browser-use-demo to browser-tools-api-demo

  Changes:
  - Renamed directory from browser-use-demo to browser-tools-api-demo
  - Updated Python module from browser_use_demo to browser_tools_api_demo
  - Changed container user from browseruse to browsertoolsapi
  - Updated all references in documentation, Docker configs, and code
  - Removed docker run instructions in favor of docker-compose
  - Updated window titles and demo names throughout

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Change default model to claude-boucle-eap in browser-tools-api-demo

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Update README.md

  updated readme with additional instructions

* Create browser-tools-api.md

  Doc explaining difference between browser tools api & computer use

* Rename browser-tools-api.md to browser_tools_api.md

  renaming

* Update browser API beta flag from browser-use to browser-tools

  - Change API header from browser-use-2025-09-10 to browser-tools-2025-09-10
  - Rename constant BROWSER_USE_BETA_FLAG to BROWSER_TOOLS_BETA_FLAG

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Add Browser Tools API Demo to root README

  Added the Browser Tools API Demo quickstart to the main README with the same format as existing demos. This provides a complete reference implementation for browser automation using Claude's browser tools API.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Fix Xvfb startup issue with stale lock files in browser-tools-api-demo

  When docker-compose stops and restarts the container, Xvfb lock files persist, causing the startup script to incorrectly assume Xvfb is running. This leads to tint2 failing with "could not open display!" error.

  The fix enhances xvfb_startup.sh to:
  - Check if the display is actually accessible (not just if lock exists)
  - Clean up stale lock files and sockets when display is inaccessible
  - Start Xvfb fresh when needed

  This makes the container startup idempotent and resolves the restart issue.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Update browser_tools_api.md

  Fixed typos & added safety considerations section

* Update README.md

  Add safety section, made browser name consistent "browser tool API" and cleaned up typos

* Update browser_tools_api.md

  Changed "browser tool" --> "browser tools"

* Update README.md

  "Browser tool" --> "Browser tools"

* Update README.md

  Small edits to security considerations

* Update browser_tools_api.md

  Small changes to safety section

* Update README.md

  typo fix

* Add Playwright attribution and legal notices

  Add required attribution for Microsoft Playwright components used in the browser-tools-api-demo:
  - Add NOTICE file with Playwright attribution
  - Add modification headers to files derived from Playwright source (browser_dom_script.js, browser_element_script.js, browser.py)
  - Update README with reference to NOTICE file

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Add CHANGELOG.md and update attribution headers

  - Create CHANGELOG.md to track modification dates and details centrally
  - Update file headers to reference CHANGELOG.md instead of inline dates
  - Update NOTICE to reference CHANGELOG.md

  This allows easier maintenance of modification history going forward.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Add legal guidance to CLAUDE.md

  Add requirement to track copyright notice modifications in CHANGELOG.md files.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Fix browser state continuity and click targeting

  Previously, when issuing follow-up instructions, the browser would restart from the beginning instead of continuing from the current state. Additionally, clicks were sometimes missing their targets due to viewport sizing issues.

  Issues fixed:
  1. Message history not being preserved - streamlit was stripping out the full conversation context when preparing API messages
  2. Event loop incompatibility - using asyncio.run() for each turn created a new event loop, breaking Playwright's browser instance
  3. Browser window/viewport sizing causing click coordinate misalignment

  Changes:
  - Preserve full message history in streamlit.py to maintain conversation context
  - Implement persistent event loop to keep browser instance alive across turns
  - Integrate screenshots with user messages to show current browser state
  - Enable screenshot filtering (keep 3 most recent) to manage context size
  - Remove event loop reset logic in browser.py that was causing browser restarts
  - Fix browser window sizing and viewport configuration for accurate click targeting
  - Add helpful debug logging for browser state tracking

  The browser now successfully maintains state across multiple requests, continues from where it left off, and clicks work reliably.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Fix claude-sonnet-4-5-20250929 tool compatibility

  - Add EditTool20250728 with text_editor_20250728 API type and str_replace_based_edit_tool name
  - Update computer_use_20250124 to use EditTool20250728 instead of EditTool20250124
  - Add CLAUDE_4_5 model configuration using computer_use_20250124 tool version
  - Update model mapping to use CLAUDE_4_5 for claude-sonnet-4-5-20250929
  - Set claude-sonnet-4-5-20250929 as the default model

  This resolves the "does not support tool types: text_editor_20250124" error by using the official tool types supported by the model: bash_20250124, computer_20250124, text_editor_20250728.

  Uses official computer_use_20250124 tool version instead of inventing a new one.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* Update README to reflect Claude 4.5 Sonnet as default model

  Updates the default model reference from claude-sonnet-4-20250514 to claude-sonnet-4-5-20250929 in the README note section.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* remove unused import

* Add support for a new zoom action in computer_use_20251124

  This overhauls a bunch of things, including:
  - adds the new `computer_use_20251124` variant, with the zoom action
  - removes a bunch of older models that are deprecated
  - adds a bunch of newer models that were missing
  - replaces the buggy text editor 0429 with the correct 0728 version

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Bassil Shama <[email protected]>
Co-authored-by: brigitanthropic <[email protected]>
Co-authored-by: Alex Paris <[email protected]>
Co-authored-by: Ben Komalo <[email protected]>
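
The "Fix browser state continuity" entry in the commit message attributes browser restarts to calling `asyncio.run()` on every turn, which creates a fresh event loop each time. The sketch below illustrates one common way to keep a single loop alive across turns; it is not taken from the commit. `PersistentLoop`, `FakeBrowser`, and `launch` are hypothetical names, and `FakeBrowser` only imitates the loop-affinity of a real Playwright object.

```python
import asyncio
import threading


class PersistentLoop:
    """Run one asyncio event loop in a background thread for the app's lifetime."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        """Submit a coroutine to the shared loop and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()

    def close(self) -> None:
        """Stop the loop, wait for its thread, then release loop resources."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


class FakeBrowser:
    """Hypothetical stand-in for a browser object bound to the loop that created it."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()
        self.url = "about:blank"

    async def goto(self, url: str) -> str:
        # Imitates loop affinity: refuse calls made from any other event loop.
        if asyncio.get_running_loop() is not self.loop:
            raise RuntimeError("browser used from a different event loop")
        self.url = url
        return self.url


async def launch() -> FakeBrowser:
    """Create the fake browser inside whichever loop is currently running."""
    return FakeBrowser()
```

With this arrangement each turn calls `runner.run(...)` on the same `PersistentLoop` instance, so objects created on the first turn remain usable on later turns. Calling `asyncio.run(browser.goto(...))` instead would execute on a new loop and fail the affinity check.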
1 parent 9bcc95e commit 5a3c6f8

50 files changed: +4164 -111 lines changed

CLAUDE.md

Lines changed: 5 additions & 1 deletion
@@ -1,5 +1,9 @@
 # Claude Quickstarts Development Guide

+## Legal
+
+- When changes are made to files that have a copyright notice add them to that subdirectory's CHANGELOG.md file.
+
 ## Computer-Use Demo

 ### Setup & Development
@@ -55,4 +59,4 @@
 - **TypeScript**: Strict mode with proper type definitions
 - **Components**: Function components with type annotations
 - **Visualization**: Use Recharts library for data visualization
-- **State management**: React hooks for state
+- **State management**: React hooks for state

README.md

Lines changed: 7 additions & 1 deletion
@@ -22,10 +22,16 @@ A financial data analyst powered by Claude. This project demonstrates how to lev

 ### Computer Use Demo

-An environment and tools that Claude can use to control a desktop computer. This project demonstrates how to leverage the computer use capabilities of the new Claude 3.5 Sonnet model.
+An environment and tools that Claude can use to control a desktop computer. This project demonstrates how to leverage the computer use capabilities of Claude, including support for the latest `computer_use_20251124` tool version with zoom actions.

 [Go to Computer Use Demo Quickstart](./computer-use-demo)

+### Browser Tools API Demo
+
+A complete reference implementation for browser automation powered by Claude. This project demonstrates how to leverage Claude's browser tools API for web interaction, including navigation, DOM inspection, and form manipulation using Playwright.
+
+[Go to Browser Tools API Demo Quickstart](./browser-tools-api-demo)
+
 ## General Usage

 Each quickstart project comes with its own README and setup instructions. Generally, you'll follow these steps:

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+**/__pycache__
+**/*.pyc
+**/*.pyo
+**/*.pyd
+.Python
+*.egg-info/
+.git/
+.gitignore
+*.md
+.DS_Store
+tests/
+*.log
+.vscode/
+.idea/
+*.swp
+*.swo
+*~

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Anthropic API Configuration
+ANTHROPIC_API_KEY=your_anthropic_api_key_here
+
+# Display Configuration
+DISPLAY_WIDTH=1920
+DISPLAY_HEIGHT=1080
+DISPLAY_NUM=1
+
+# Browser Configuration
+BROWSER_WIDTH=1920
+BROWSER_HEIGHT=1080
+
+# Port Configuration (optional - defaults shown)
+VNC_PORT=5900
+STREAMLIT_PORT=8501
+NOVNC_PORT=6080
+HTTP_PORT=8080
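
The commit message mentions a validation script that makes the container fail fast when it is not properly configured. That script is not visible in this part of the diff, so the following is only a sketch of how a check over the variables above might look. The variable names and defaults come from the example file. The function name `validate_env` and the specific rules (placeholder detection, integer parsing, port range) are assumptions made for illustration.

```python
import os

# Names and defaults mirror the example environment file above.
# The validation rules below are illustrative assumptions, not the demo's script.
INT_DEFAULTS = {
    "DISPLAY_WIDTH": 1920,
    "DISPLAY_HEIGHT": 1080,
    "DISPLAY_NUM": 1,
    "BROWSER_WIDTH": 1920,
    "BROWSER_HEIGHT": 1080,
    "VNC_PORT": 5900,
    "STREAMLIT_PORT": 8501,
    "NOVNC_PORT": 6080,
    "HTTP_PORT": 8080,
}
PLACEHOLDER_KEY = "your_anthropic_api_key_here"


def validate_env(env=None) -> dict:
    """Return parsed integer settings, or raise ValueError listing every problem."""
    env = os.environ if env is None else env
    errors = []

    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key or api_key == PLACEHOLDER_KEY:
        errors.append("ANTHROPIC_API_KEY is missing or still set to the placeholder")

    config = {}
    for name, default in INT_DEFAULTS.items():
        raw = env.get(name, str(default))
        try:
            value = int(raw)
        except ValueError:
            errors.append(f"{name} must be an integer, got {raw!r}")
            continue
        # Display numbers may be 0; sizes and ports must be positive.
        minimum = 0 if name == "DISPLAY_NUM" else 1
        if value < minimum:
            errors.append(f"{name} must be >= {minimum}, got {value}")
        elif name.endswith("_PORT") and value > 65535:
            errors.append(f"{name} must be <= 65535, got {value}")
        else:
            config[name] = value

    if errors:
        # Report all problems at once so a misconfigured container fails fast.
        raise ValueError("; ".join(errors))
    return config
```

Collecting every error before raising means a single failed startup reports all misconfigured variables, rather than one per restart.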

browser-tools-api-demo/.gitignore

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+ENV/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+/tmp/
+
+# Environment
+.env
+.anthropic/
+
+# Test
+.pytest_cache/
+.coverage
+htmlcov/
+
+# Streamlit
+.streamlit/cache/

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Modifications to Microsoft Playwright Source
+
+This file tracks all modifications made to files derived from or inspired by Microsoft Playwright source code.
+
+## Modified Files
+
+### browser_tools_api_demo/browser_tool_utils/browser_dom_script.js
+- **Date Modified**: 9/23/25
+- **Original Source**: https://github.com/microsoft/playwright/blob/main/packages/injected/src/ariaSnapshot.ts
+- **Nature of Changes**: Adapted Playwright's accessibility tree generation for use with browser tools API. Implemented accessibility tree extraction with element reference tracking, visibility filtering, and YAML-formatted output.
+
+### browser_tools_api_demo/browser_tool_utils/browser_element_script.js
+- **Date Modified**: 9/23/25
+- **Original Source**: Microsoft Playwright element interaction patterns
+- **Nature of Changes**: Implemented element finding and interaction logic inspired by Playwright's approach to reliable element targeting and coordinate calculation.
+
+### browser_tools_api_demo/tools/browser.py
+- **Date Modified**: 9/23/25
+- **Original Source**: Microsoft Playwright click emulation implementation
+- **Nature of Changes**: Click emulation methods developed with reference to Playwright source code during debugging to ensure reliable mouse interactions.

browser-tools-api-demo/Dockerfile

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+FROM docker.io/ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV DEBIAN_PRIORITY=high
+
+# Install system dependencies
+RUN apt-get update && \
+apt-get -y upgrade && \
+apt-get -y install \
+# UI Requirements
+xvfb \
+xterm \
+xdotool \
+scrot \
+imagemagick \
+sudo \
+mutter \
+x11vnc \
+# Python/pyenv reqs
+build-essential \
+libssl-dev \
+zlib1g-dev \
+libbz2-dev \
+libreadline-dev \
+libsqlite3-dev \
+curl \
+git \
+libncursesw5-dev \
+xz-utils \
+tk-dev \
+libxml2-dev \
+libxmlsec1-dev \
+libffi-dev \
+liblzma-dev \
+# Network tools
+net-tools \
+netcat \
+# PPA req
+software-properties-common && \
+# Browser and desktop apps
+sudo apt-get install -y --no-install-recommends \
+x11-apps \
+tint2 \
+pcmanfm \
+unzip \
+# Playwright Chromium dependencies
+libnss3 \
+libnspr4 \
+libatk1.0-0 \
+libatk-bridge2.0-0 \
+libcups2 \
+libatspi2.0-0 \
+libxcomposite1 \
+libxdamage1 \
+libxrandr2 \
+libgbm1 \
+libxkbcommon0 \
+libpango-1.0-0 \
+libcairo2 \
+libasound2 && \
+apt-get clean
+
+# Install noVNC
+RUN git clone --branch v1.5.0 https://github.com/novnc/noVNC.git /opt/noVNC && \
+git clone --branch v0.12.0 https://github.com/novnc/websockify /opt/noVNC/utils/websockify && \
+ln -s /opt/noVNC/vnc.html /opt/noVNC/index.html
+
+# Setup user
+ENV USERNAME=browsertoolsapi
+ENV HOME=/home/$USERNAME
+RUN useradd -m -s /bin/bash -d $HOME $USERNAME
+RUN echo "${USERNAME} ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
+USER browsertoolsapi
+WORKDIR $HOME
+
+# Setup Python
+RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv && \
+cd ~/.pyenv && src/configure && make -C src && cd .. && \
+echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc && \
+echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc && \
+echo 'eval "$(pyenv init -)"' >> ~/.bashrc
+ENV PYENV_ROOT="$HOME/.pyenv"
+ENV PATH="$PYENV_ROOT/bin:$PATH"
+ENV PYENV_VERSION=3.11.6
+RUN eval "$(pyenv init -)" && \
+pyenv install $PYENV_VERSION && \
+pyenv global $PYENV_VERSION && \
+pyenv rehash
+
+ENV PATH="$HOME/.pyenv/shims:$HOME/.pyenv/bin:$PATH"
+
+RUN python -m pip install --upgrade pip==23.1.2 setuptools==58.0.4 wheel==0.40.0 && \
+python -m pip config set global.disable-pip-version-check true
+
+# Copy requirements and install dependencies
+COPY --chown=$USERNAME:$USERNAME browser_tools_api_demo/requirements.txt $HOME/browser_tools_api_demo/requirements.txt
+RUN python -m pip install -r $HOME/browser_tools_api_demo/requirements.txt
+
+# Install Playwright and Chromium
+RUN python -m playwright install chromium && \
+python -m playwright install-deps chromium
+
+# Setup desktop environment & app
+COPY --chown=$USERNAME:$USERNAME image/ $HOME
+COPY --chown=$USERNAME:$USERNAME browser_tools_api_demo/ $HOME/browser_tools_api_demo/
+
+ARG DISPLAY_NUM=1
+ARG HEIGHT=1080
+ARG WIDTH=1920
+ENV DISPLAY_NUM=${DISPLAY_NUM}
+ENV HEIGHT=${HEIGHT}
+ENV WIDTH=${WIDTH}
+
+ENTRYPOINT [ "./entrypoint.sh" ]

browser-tools-api-demo/NOTICE

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+This software includes components from the following open source projects:
+
+Microsoft Playwright
+Source: https://github.com/microsoft/playwright
+License: Apache License 2.0
+Copyright (c) Microsoft Corporation
+Modified files are marked with modification notices. See CHANGELOG.md for details.
