feat: Add browser session tools (Tetra + CDP + semantic queries) #35
paveldudka wants to merge 3 commits into main
…ery_elements)
- Add Tetra cloud browser integration via AgentQL SDK + Playwright
- New tools: create_session, close_session, query_data, query_elements
- Existing extract-web-data tool preserved (stateless REST API)
- Migrate from deprecated Server class to McpServer.registerTool()
- Stateful in-memory session management with CDP connection holding
- query_elements returns actionable CSS selectors for CDP interaction
- Clear tool descriptions guide agents on when to use each tool
- Bump version to 2.0.0
📝 Walkthrough
This pull request introduces browser automation and web data extraction capabilities to the MCP Server. It adds a new session management module that enables persistent browser sessions via Playwright and AgentQL, supporting both stateless REST-based extraction and interactive session-based queries. The entry point is refactored to register five tools: the existing extract-web-data plus the new create_session, close_session, query_data, and query_elements. Dependencies are updated to include agentql and playwright, and the package version is bumped to 2.0.0. Documentation is expanded with installation instructions for multiple IDEs, architecture details, and development guidance.
Sequence Diagram(s)
sequenceDiagram
participant Client as MCP Client
participant Server as MCP Server
participant SessionMgr as Session Manager
participant Browser as Playwright Browser
participant AgentQL as AgentQL Service
rect rgba(100, 200, 150, 0.5)
Note over Client,AgentQL: Create Browser Session Flow
Client->>Server: create_session(options)
Server->>SessionMgr: createSession(params)
SessionMgr->>AgentQL: Create remote session
AgentQL-->>SessionMgr: CDP URL + Streaming URL
SessionMgr->>Browser: Connect via CDP
Browser-->>SessionMgr: Browser instance ready
SessionMgr-->>Server: {session_id, cdp_url}
Server-->>Client: Session created
end
rect rgba(100, 150, 200, 0.5)
Note over Client,AgentQL: Query Data from Session
Client->>Server: query_data(session_id, query)
Server->>SessionMgr: queryData(sessionId, query)
SessionMgr->>Browser: Get page from context
Browser-->>SessionMgr: Page instance
SessionMgr->>AgentQL: Execute semantic query
AgentQL-->>SessionMgr: Query results
SessionMgr-->>Server: {data: result}
Server-->>Client: Query results
end
rect rgba(200, 150, 100, 0.5)
Note over Client,AgentQL: Close Session
Client->>Server: close_session(session_id)
Server->>SessionMgr: closeSession(sessionId)
SessionMgr->>Browser: Close browser
Browser-->>SessionMgr: Closed
SessionMgr->>AgentQL: Cleanup session
SessionMgr-->>Server: {ok: true}
Server-->>Client: Session closed
end
🚥 Pre-merge checks: ✅ 3 passed
🧹 Nitpick comments (2)
README.md (2)
19-29: Add language specifier to fenced code block.
This ASCII diagram code block is missing a language specifier. Use `text` or `plaintext` to satisfy markdown linting rules.
📝 Proposed fix: tag the opening fence of the block that starts with "I need web data" as `text`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 19 - 29, The fenced ASCII diagram starting with the line "I need web data" is missing a language tag; open the code fence that currently begins with ``` and add a language specifier such as text or plaintext (e.g., change ``` to ```text) so the diagram satisfies Markdown linting rules and renders as plain text—update the code block that contains the diagram lines including "Single URL, no interaction needed?" and the branches to extract-web-data / create_session accordingly.
180-199: Add language specifier to architecture diagram code block.
Same issue as the decision flow diagram: add `text` or `plaintext` for markdown lint compliance.
📝 Proposed fix: tag the opening fence of the architecture diagram block as `text`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 180 - 199, The fenced ASCII architecture diagram block in README.md lacks a language specifier; update the opening triple-backtick for that diagram (the block containing the Agent / MCP Server / Tetra Cloud Browser ASCII art) to include a language tag such as text or plaintext (e.g., ```text) so markdown linters accept it and rendering stays unchanged.
colriot left a comment
A few thoughts and questions to discuss:
- found elements can be used for clicks/interactions — who's gonna interact with them? Job for another MCP? Then we need to mention that and suggest options
- Did you observe cases when session is not closed after interaction? (minor thing, user can close themselves, but still)
- Naming — extract-web-data but query-data. I think we agreed to use extract-* (and get_elements) in all integrations, and even in general it just makes sense to have aligned naming within the same thing (plus underscores vs dashes)
- We have prompt and AQL inputs for our APIs. For the old one we used prompt only; for the new ones you are using AQL only. What do you think about aligning these?
- Also, how reliable is AQL query generation from simple examples in this PR?
┌────────────────────────┐ ┌─────────────────────────┐
│ MCP Server │ │ Tetra Cloud Browser │
│ │ │ (Chromium instance) │
│ extract-web-data ─REST─▶ AgentQL API │
Something is off with this line. What should it be pointing at?
│
Single URL, no interaction needed?
│
YES │ NO
NO branch looks disconnected
Found elements are returned as JSON that contains enough breadcrumbs for the AI to dynamically build a Playwright locator. That path is for cases when the AI is directly connected to the browser over CDP. So instead of analyzing HTML, the AI triggers …
I haven't, but Tetra currently has a guardrail to auto-terminate on inactivity (5 mins by default). It may be worth adding a guardrail in the MCP tool itself too, so we also clean up the browser from the in-memory map on timeout. I'll update.
Underscores vs dashes: legit. I'll update.
That sounds like an extra MCP tool, which we don't need at this point. I also don't want to switch to prompt-only: that would introduce an extra AI roundtrip (which we would be paying for) to generate a query that eventually calls exactly the same query-data API. This is already an AI path, so why not generate the query on the user side? It will be much faster. The consistency argument is overrated in this context, IMO.
I see closeAll is called on shutdown, which covers the happy path. The risk is ungraceful exits (SIGKILL, OOM, MCP client crash) where closeAll never runs and sessions leak. If Tetra has idle timeouts on sessions, this is probably fine in practice. If sessions are billed or have no TTL, it's worth adding a SIGTERM/SIGINT handler or heartbeat-based cleanup.
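A minimal sketch of the SIGTERM/SIGINT handler idea, assuming a `closeAll()` along the lines of the one mentioned above. The `Closable` type, the `sessions` map shape, and the `shutdown` helper are hypothetical stand-ins, not the PR's code. Note that SIGKILL and OOM kills cannot be intercepted, so this only narrows the leak window; server-side idle timeouts remain the backstop.

```typescript
// Hypothetical stand-in for a session entry; the real entry holds more fields.
type Closable = { close(): Promise<void> };

const sessions = new Map<string, Closable>();

// Close every tracked session, ignoring individual failures so that one
// broken session cannot block cleanup of the others.
async function closeAll(): Promise<void> {
  const entries = Array.from(sessions.values());
  sessions.clear();
  await Promise.all(entries.map((s) => s.close().catch(() => undefined)));
}

let shuttingDown = false;

// Idempotent shutdown: a second signal while cleanup is running is a no-op.
async function shutdown(signal: string): Promise<void> {
  if (shuttingDown) return;
  shuttingDown = true;
  console.error(`received ${signal}, closing ${sessions.size} session(s)`);
  await closeAll();
}

// Register handlers for the signals that can actually be caught.
for (const signal of ["SIGINT", "SIGTERM"] as const) {
  process.once(signal, () => {
    void shutdown(signal).finally(() => process.exit(0));
  });
}
```

The handlers exit with code 0 for simplicity; a real server might prefer the conventional 128 + signal number.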
const tetraSession = await createBrowserSession(sessionOpts);
const cdpUrl = tetraSession.cdpUrl;

// Connect Playwright to hold the connection alive
const browser = await chromium.connectOverCDP(cdpUrl);

const sessionId = `sess_${++idCounter}_${Date.now()}`;
const streamingUrl = tetraSession.getPageStreamingUrl(0);

sessions.set(sessionId, {
  sessionId,
  cdpUrl,
  streamingUrl,
  browser,
  tetraSession,
  createdAt: new Date(),
});

return {
  session_id: sessionId,
  cdp_url: cdpUrl,
  streaming_url: streamingUrl,
};
If connectOverCDP() or getPageStreamingUrl() throws after createBrowserSession() succeeds, I believe the cloud browser is leaked.
We can wrap these lines (62-84) in try/catch and call await tetraSession.close() in the catch.
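The suggested try/catch could look roughly like the sketch below. It injects the two steps as hypothetical `createRemote` and `connect` functions so it runs without the SDK; in the PR they would correspond to `createBrowserSession()` and `chromium.connectOverCDP()`. It also assumes the remote session object exposes `close()`, as the comment above implies.

```typescript
// Hypothetical minimal shapes; the real SDK objects carry more members.
interface RemoteSession {
  cdpUrl: string;
  close(): Promise<void>;
}

interface LocalBrowser {
  close(): Promise<void>;
}

interface Deps {
  createRemote(): Promise<RemoteSession>;
  connect(cdpUrl: string): Promise<LocalBrowser>;
}

// Create the remote browser, then attach to it. If anything after the
// remote creation fails, release the remote browser before rethrowing.
async function createSessionSafely(
  deps: Deps
): Promise<{ remote: RemoteSession; browser: LocalBrowser }> {
  const remote = await deps.createRemote();
  try {
    const browser = await deps.connect(remote.cdpUrl);
    return { remote, browser };
  } catch (err) {
    // Best-effort cleanup: a failing close() must not mask the original error.
    await remote.close().catch(() => undefined);
    throw err;
  }
}
```

The original error is rethrown unchanged, so the tool can still report why session creation failed.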
const entry = sessions.get(sessionId);
if (entry) {
  try {
    await entry.browser.close();
browser.close() disconnects the local CDP client but never calls tetraSession.close(). Remote browsers may keep running and billing.
We should add await entry.tetraSession.close() before or after browser.close().
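One way to close both sides is sketched below. The `SessionEntry` shape is reduced to the two members that matter here, and `tetraSession.close()` is assumed to exist as the comment suggests. Each close is attempted independently so a failure in one does not skip the other.

```typescript
// Reduced, hypothetical view of the session entry stored in the map.
interface SessionEntry {
  browser: { close(): Promise<void> };
  tetraSession: { close(): Promise<void> };
}

const sessions = new Map<string, SessionEntry>();

// Disconnect the local CDP client and release the remote browser.
// Returns the errors encountered instead of throwing, so callers can
// report partial cleanup.
async function closeSession(
  sessionId: string
): Promise<{ ok: boolean; errors: string[] }> {
  const entry = sessions.get(sessionId);
  if (!entry) return { ok: false, errors: [`unknown session: ${sessionId}`] };

  // Remove first so a concurrent call cannot close the same entry twice.
  sessions.delete(sessionId);

  const errors: string[] = [];
  await entry.browser.close().catch((e: unknown) => errors.push(String(e)));
  await entry.tetraSession.close().catch((e: unknown) => errors.push(String(e)));
  return { ok: errors.length === 0, errors };
}
```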
await wrappedPage.queryElements(query, {
  includeHidden: options.include_hidden ?? false,
  mode: (options.mode ?? "fast") as "standard" | "fast",
});
getLastResponse() race in queryElements here: two concurrent calls on the same page can return each other's selectors. Since selectors are used for clicks/typing, this is silent data corruption.
Let's serialize queries per page, or use an API that returns results directly.
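Per-page serialization can be done with a small promise chain keyed by the page object, as in the sketch below. `runExclusive` is a hypothetical helper name; the idea is that the query call and the follow-up read of the last response both happen inside one task, so no other query on the same page can interleave between them.

```typescript
// Tail of the pending work for each key (for example, a page object).
// A WeakMap lets entries disappear once the page is garbage collected.
const queues = new WeakMap<object, Promise<unknown>>();

// Run `task` only after every previously queued task for `key` has settled.
function runExclusive<T>(key: object, task: () => Promise<T>): Promise<T> {
  const previous = queues.get(key) ?? Promise.resolve();
  // Start the task whether the previous one resolved or rejected.
  const next = previous.then(task, task);
  // Store a non-rejecting tail so one failure does not poison the queue.
  queues.set(key, next.catch(() => undefined));
  return next;
}
```

Tasks queued under different keys still run concurrently, so independent pages are not slowed down.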
"node-fetch": "^3.3.2"
"agentql": "^1.17.0",
"node-fetch": "^3.3.2",
"playwright": "^1.58.2"
I think we can get away with playwright-core here. As far as I can tell, the MCP server only uses connectOverCDP, while the full playwright package downloads ~300MB of unused browser binaries.
  proxy_url?: string;
}

const sessions = new Map<string, SessionEntry>();
Unbounded Map: no cap, no reaper. A buggy agent loop can exhaust memory and Tetra resources. I think we want session limits or a TTL.
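A cap plus an idle reaper could be sketched as follows. `MAX_SESSIONS`, `IDLE_TTL_MS`, the `lastUsedAt` field, and both helper names are assumptions for illustration; the 5-minute value simply mirrors the Tetra default mentioned earlier in the thread.

```typescript
// Assumed limits; tune to whatever the service can afford.
const MAX_SESSIONS = 5;
const IDLE_TTL_MS = 5 * 60 * 1000;

// Hypothetical tracked entry: last activity timestamp plus a close hook.
interface TrackedSession {
  lastUsedAt: number;
  close(): Promise<void>;
}

const sessions = new Map<string, TrackedSession>();

// Call before creating a session to enforce the cap.
function assertCapacity(): void {
  if (sessions.size >= MAX_SESSIONS) {
    throw new Error(`session limit reached (${MAX_SESSIONS}); close one first`);
  }
}

// Close and drop every session idle for at least IDLE_TTL_MS.
// Returns the ids that were reaped.
async function reapIdle(now: number = Date.now()): Promise<string[]> {
  const reaped: string[] = [];
  for (const [id, session] of Array.from(sessions)) {
    if (now - session.lastUsedAt >= IDLE_TTL_MS) {
      sessions.delete(id);
      await session.close().catch(() => undefined);
      reaped.push(id);
    }
  }
  return reaped;
}

// In the server this would run on a timer, for example:
//   setInterval(() => void reapIdle(), 60 * 1000);
```

For the TTL to track real inactivity, each query tool would need to refresh `lastUsedAt` on use.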
"agentql": "^1.17.0",
"node-fetch": "^3.3.2",
"playwright": "^1.58.2"
},
if (params.profile !== "stealth") {
  sessionOpts.uaPreset = uaPreset;
}
if (params.proxy === "tetra") {
proxy: "custom" without proxy_url is silently ignored: the tool description says it is required, but the code doesn't enforce it, so the call falls through to an unproxied session.
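A small up-front check would turn the silent fallthrough into an explicit error. In the sketch below, the parameter shape is reduced to the two fields discussed here, and `validateProxyParams` is a hypothetical helper name.

```typescript
// Reduced parameter shape; only the proxy-related fields are modeled.
interface ProxyParams {
  proxy?: "tetra" | "custom";
  proxy_url?: string;
}

// Reject proxy: "custom" when no usable proxy_url is supplied, instead of
// quietly creating an unproxied session.
function validateProxyParams(params: ProxyParams): void {
  if (params.proxy === "custom") {
    const url = params.proxy_url?.trim();
    if (!url) {
      throw new Error('proxy_url is required when proxy is "custom"');
    }
  }
}
```

Calling this before building the session options keeps the behavior consistent with the tool description.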
// Connect Playwright to hold the connection alive
const browser = await chromium.connectOverCDP(cdpUrl);

const sessionId = `sess_${++idCounter}_${Date.now()}`;
Sequential counter + Date.now(): why not use crypto.randomUUID()?
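For reference, the swap is small. The sketch below uses `randomUUID` from Node's built-in `node:crypto` module; the `sess_` prefix is kept from the existing ID format, and `newSessionId` is a hypothetical helper name.

```typescript
import { randomUUID } from "node:crypto";

// Generate an unguessable session id. randomUUID() returns a version 4 UUID
// from a cryptographically secure source, so ids cannot be predicted from
// creation order or timestamps.
function newSessionId(): string {
  return `sess_${randomUUID()}`;
}
```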
londondavila left a comment
Left a bunch of suggestions, @paveldudka. Approving with notes + nits.
Overview
Adds browser session management and semantic query tools to the MCP server, powered by AgentQL SDK + Tetra cloud browsers. The existing `extract-web-data` tool is preserved; this PR extends the server from a stateless REST wrapper into a full browser automation platform.
Version bump: 1.0.1 → 2.0.0 (new capabilities, API migration)
What Changed
New Tools
- `create_session`
- `close_session`
- `query_data`
- `query_elements`
Existing Tool (preserved)
- `extract-web-data`
Migration
- `Server` class → `McpServer` with `registerTool()` (the old `server.tool()` and manual `setRequestHandler` are deprecated)
- Adds `agentql` and `playwright` as dependencies
Architecture
Before: Stateless REST Wrapper
One tool, one call, done. No browser control.
After: Stateless REST + Full Browser Sessions
Session Lifecycle
When to Use What
Key Design Decisions
- Dual CDP connections: MCP server holds one CDP connection (keeps browser alive, powers semantic queries). Agent holds another (for direct browsing). Either alone keeps the browser alive.
- Stateful, in-memory: Session state lives in a `Map`, with no database and no persistence. The MCP server process lifecycle = session lifecycle. When the process dies, Tetra cleans up.
- Actionable selectors from `query_elements`: Returns `[tf623_id="..."]` CSS selectors that agents use directly over CDP. No Playwright locators cross the MCP boundary.
- Clear tool descriptions: Each tool explains when to use it vs alternatives, with AgentQL query syntax guide and examples baked into the description.
Testing
Verified end-to-end:
- `extract-web-data` → HN top stories ✅
- `create_session` → Tetra browser provisioned ✅
- `query_data` → pricing data extracted ✅
- `query_elements` → selectors returned, used to click buttons ✅
- `close_session` → cleanup ✅