feat: Add browser session tools (Tetra + CDP + semantic queries)#35

Open
paveldudka wants to merge 3 commits into main from feat/browser-sessions

@paveldudka
Contributor

Overview

Adds browser session management and semantic query tools to the MCP server, powered by AgentQL SDK + Tetra cloud browsers. The existing extract-web-data tool is preserved — this PR extends the server from a stateless REST wrapper into a full browser automation platform.

Version bump: 1.0.1 → 2.0.0 (new capabilities, API migration)

What Changed

New Tools

| Tool | Purpose |
| --- | --- |
| create_session | Provision a Tetra cloud browser, return CDP URL |
| close_session | Tear down a browser session |
| query_data | Extract structured data from a live page (AgentQL semantic queries) |
| query_elements | Find interactive elements with actionable CSS selectors |

Existing Tool (preserved)

| Tool | Purpose |
| --- | --- |
| extract-web-data | One-shot stateless extraction via REST API (unchanged behavior) |

Migration

  • Server class → McpServer with registerTool() (the old server.tool() and manual setRequestHandler are deprecated)
  • Added agentql and playwright as dependencies

Architecture

Before: Stateless REST Wrapper

Agent ──MCP──▶ MCP Server ──HTTP──▶ AgentQL REST API
                                         │
                                    navigates, extracts,
                                    returns JSON

One tool, one call, done. No browser control.

After: Stateless REST + Full Browser Sessions

┌─────────────────────────────────────────────────────────────┐
│                          Agent                              │
│                                                             │
│  ┌──────────────┐         ┌─────────────────────────────┐   │
│  │  MCP Client  │         │ Native Browser (Playwright) │   │
│  └──────┬───────┘         └─────────────┬───────────────┘   │
└─────────┼───────────────────────────────┼───────────────────┘
          │ stdio                         │ CDP WebSocket
          ▼                               ▼
┌─────────────────────────┐    ┌─────────────────────────┐
│     MCP Server          │    │  Tetra Cloud Browser    │
│                         │    │  (Chromium instance)    │
│  extract-web-data ──REST──▶ AgentQL API               │
│                         │    │                         │
│  create_session ────────┼───▶│  wss://cdp-url          │
│  close_session          │    │                         │
│  query_data ────────────┼───▶│  (semantic queries)     │
│  query_elements ────────┼───▶│                         │
└─────────────────────────┘    └─────────────────────────┘

Session Lifecycle

create_session
     │
     ▼
Tetra provisions cloud browser
     │
     ▼
MCP server connects via CDP (holds connection alive)
     │
     ▼
Returns CDP URL to agent
     │
     ▼
Agent connects to same CDP URL (dual connection)
     │
     ├──▶ Agent browses directly (navigate, click, type)
     ├──▶ query_data extracts structured info from current page
     └──▶ query_elements finds elements with CSS selectors
               │
               ▼
          Agent clicks using selector:
          page.click('[tf623_id="42"]')
               │
               ▼
          close_session → MCP disconnects → browser dies

When to Use What

                ┌─────────────────────────┐
                │  "I need web data"      │
                └────────────┬────────────┘
                             │
                   ┌─────────▼─────────┐
                   │  Single URL,      │
                   │  no interaction?  │
                   └────┬─────────┬────┘
                    YES │         │ NO
                        ▼         ▼
              ┌──────────────┐  ┌──────────────────┐
              │ extract-web- │  │  create_session   │
              │ data         │  │  + browse via CDP │
              │              │  │  + query_data     │
              │ One call,    │  │  + query_elements │
              │ done.        │  │  + close_session  │
              └──────────────┘  └──────────────────┘
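
The decision flow above can be sketched as a small helper. The `Task` shape and the `planTools` name are hypothetical and exist only for illustration; the tool names are the ones this PR registers.

```typescript
// Hypothetical description of what the agent needs to do.
interface Task {
  singleUrl: boolean;
  needsInteraction: boolean;
}

// Returns the tool names, in call order, that the flow above suggests.
function planTools(task: Task): string[] {
  if (task.singleUrl && !task.needsInteraction) {
    // One call, done: the stateless REST path.
    return ["extract-web-data"];
  }
  // Session path: the agent browses over CDP between these calls.
  return ["create_session", "query_data", "query_elements", "close_session"];
}
```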

Key Design Decisions

  1. Dual CDP connections: MCP server holds one CDP connection (keeps browser alive, powers semantic queries). Agent holds another (for direct browsing). Either alone keeps the browser alive.

  2. Stateful, in-memory: Session state lives in a Map — no database, no persistence. The MCP server process lifecycle = session lifecycle. When the process dies, Tetra cleans up.

  3. Actionable selectors from query_elements: Returns [tf623_id="..."] CSS selectors that agents use directly over CDP. No Playwright locators cross the MCP boundary.

  4. Clear tool descriptions: Each tool explains when to use it vs alternatives, with AgentQL query syntax guide and examples baked into the description.
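
To illustrate decision 3, here is a minimal sketch of how an agent might turn an element id into the `[tf623_id="..."]` selector form and click it. `PageLike`, `tfSelector`, and `clickByTfId` are hypothetical names; `PageLike` only mirrors the `click(selector)` call shown in the lifecycle diagram, and the actual shape of the `query_elements` response is not assumed here.

```typescript
// Hypothetical stand-in for the agent's own page handle (e.g. a Playwright Page).
interface PageLike {
  click(selector: string): Promise<void>;
}

// Builds the attribute selector form used in this PR, e.g. [tf623_id="42"].
// Backslashes and double quotes are escaped so the value stays inside the quotes.
function tfSelector(id: string | number): string {
  const value = String(id).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `[tf623_id="${value}"]`;
}

// Clicks the element and returns the selector that was used.
async function clickByTfId(page: PageLike, id: string | number): Promise<string> {
  const selector = tfSelector(id);
  await page.click(selector);
  return selector;
}
```

Only plain selector strings cross the MCP boundary here, which is the point of the design decision.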

Testing

Verified end-to-end:

  • extract-web-data → HN top stories ✅
  • create_session → Tetra browser provisioned ✅
  • Agent CDP connection → navigate, screenshot ✅
  • query_data → pricing data extracted ✅
  • query_elements → selectors returned, used to click buttons ✅
  • close_session → cleanup ✅

…ery_elements)

- Add Tetra cloud browser integration via AgentQL SDK + Playwright
- New tools: create_session, close_session, query_data, query_elements
- Existing extract-web-data tool preserved (stateless REST API)
- Migrate from deprecated Server class to McpServer.registerTool()
- Stateful in-memory session management with CDP connection holding
- query_elements returns actionable CSS selectors for CDP interaction
- Clear tool descriptions guide agents on when to use each tool
- Bump version to 2.0.0
@coderabbitai

coderabbitai Bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

This pull request introduces browser automation and web data extraction capabilities to the MCP Server. It adds a new session management module that enables persistent browser sessions via Playwright and AgentQL, supporting both stateless REST-based extraction and interactive session-based queries. The entry point is refactored to register five tools: the existing extract-web-data plus the new create_session, close_session, query_data, and query_elements. Dependencies are updated to include agentql and playwright, and the package version is bumped to 2.0.0. Documentation is expanded with installation instructions for multiple IDEs, architecture details, and development guidance.

Sequence Diagram(s)

sequenceDiagram
    participant Client as MCP Client
    participant Server as MCP Server
    participant SessionMgr as Session Manager
    participant Browser as Playwright Browser
    participant AgentQL as AgentQL Service

    rect rgba(100, 200, 150, 0.5)
    Note over Client,AgentQL: Create Browser Session Flow
    Client->>Server: create_session(options)
    Server->>SessionMgr: createSession(params)
    SessionMgr->>AgentQL: Create remote session
    AgentQL-->>SessionMgr: CDP URL + Streaming URL
    SessionMgr->>Browser: Connect via CDP
    Browser-->>SessionMgr: Browser instance ready
    SessionMgr-->>Server: {session_id, cdp_url}
    Server-->>Client: Session created
    end

    rect rgba(100, 150, 200, 0.5)
    Note over Client,AgentQL: Query Data from Session
    Client->>Server: query_data(session_id, query)
    Server->>SessionMgr: queryData(sessionId, query)
    SessionMgr->>Browser: Get page from context
    Browser-->>SessionMgr: Page instance
    SessionMgr->>AgentQL: Execute semantic query
    AgentQL-->>SessionMgr: Query results
    SessionMgr-->>Server: {data: result}
    Server-->>Client: Query results
    end

    rect rgba(200, 150, 100, 0.5)
    Note over Client,AgentQL: Close Session
    Client->>Server: close_session(session_id)
    Server->>SessionMgr: closeSession(sessionId)
    SessionMgr->>Browser: Close browser
    Browser-->>SessionMgr: Closed
    SessionMgr->>AgentQL: Cleanup session
    SessionMgr-->>Server: {ok: true}
    Server-->>Client: Session closed
    end
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change (adding browser session tools with Tetra, CDP, and semantic queries), matching the substantial code additions in src/index.ts and src/sessions.ts. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing new tools (create_session, close_session, query_data, query_elements), architecture diagrams, design decisions, and test verification that align with the code changes. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into main |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@paveldudka paveldudka requested a review from colriot February 17, 2026 02:43

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
README.md (2)

19-29: Add language specifier to fenced code block.

This ASCII diagram code block is missing a language specifier. Use text or plaintext to satisfy markdown linting rules.

📝 Proposed fix
-```
+```text
 "I need web data"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 19 - 29, The fenced ASCII diagram starting with the
line "I need web data" is missing a language tag; open the code fence that
currently begins with ``` and add a language specifier such as text or plaintext
(e.g., change ``` to ```text) so the diagram satisfies Markdown linting rules
and renders as plain text—update the code block that contains the diagram lines
including "Single URL, no interaction needed?" and the branches to
extract-web-data / create_session accordingly.

180-199: Add language specifier to architecture diagram code block.

Same issue as the decision flow diagram — add text or plaintext for markdown lint compliance.

📝 Proposed fix
-```
+```text
 ┌──────────────────────────────────────────────────────────┐
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 180 - 199, The fenced ASCII architecture diagram
block in README.md lacks a language specifier; update the opening
triple-backtick for that diagram (the block containing the Agent / MCP Server /
Tetra Cloud Browser ASCII art) to include a language tag such as text or
plaintext (e.g., ```text) so markdown linters accept it and rendering stays
unchanged.

Contributor

@colriot colriot left a comment

A few thoughts and questions to discuss:

  • found elements can be used for clicks/interactions — who's gonna interact with them? Job for another MCP? Then we need to mention that and suggest options
  • Did you observe cases when session is not closed after interaction? (minor thing, user can close themselves, but still)
  • Naming — extract-web-data but query-data. I think we agreed to use extract-* (and get_elements) in all integrations, and even in general it just makes sense to have aligned naming within the same thing (plus underscores vs dashes)
  • We have prompt and AQL inputs for our APIs. For the old one we used prompt only, for new ones you are using AQL only. Wdyt about aligning these?
  • Also, how reliable is AQL query generation from simple examples in this PR?

Comment thread README.md
┌────────────────────────┐ ┌─────────────────────────┐
│ MCP Server │ │ Tetra Cloud Browser │
│ │ │ (Chromium instance) │
│ extract-web-data ─REST─▶ AgentQL API │

Smth is off with this line. What should it be pointing at?

Comment thread README.md
Single URL, no interaction needed?
YES │ NO

NO branch looks disconnected

@paveldudka
Contributor Author

paveldudka commented Feb 20, 2026

@colriot

  • found elements can be used for clicks/interactions — who's gonna interact with them? Job for another MCP? Then we need to mention that and suggest options

Found elements are returned as JSON that contains enough breadcrumbs for the AI to dynamically build a Playwright locator. That path is for cases when the AI is directly connected to the browser over CDP. So instead of analyzing HTML, the AI triggers the query_data tool. Tested locally, works.

  • Did you observe cases when session is not closed after interaction? (minor thing, user can close themselves, but still)

I haven't, but Tetra currently has a guardrail to auto-terminate on inactivity (5 mins by default). Maybe worth adding a guardrail in the MCP tool itself too, so we also clean up the browser from the in-memory map on timeout. I'll update.

  • Naming — extract-web-data but query-data. I think we agreed to use extract-* (and get_elements) in all integrations, and even in general it just makes sense to have aligned naming within the same thing (plus underscores vs dashes)

Underscores vs. dashes: legit. I'll update.
But for the others, what are your thoughts? extract-web-data for the REST API path and extract-data for the CDP path? I find it worse.
The AgentQL docs describe those pieces as query-data and query-elements, so I'm trying to be consistent. All integrations so far have focused only on the REST API. But in this case we tap into the fundamental AgentQL APIs, and those are query-data and query-elements. We agreed to keep AgentQL API naming intact.

  • We have prompt and AQL inputs for our APIs. For the old one we used prompt only, for new ones you are using AQL only. Wdyt about aligning these?

That sounds like an extra MCP tool, which we don't need at this point. I also don't want to switch to prompt-only: that would introduce an extra AI roundtrip (which we would be paying for) to generate a query that eventually calls exactly the same query-data API. This is already an AI path, so why not generate the query on the user side? It will be much faster. The consistency argument is overrated in this context, IMO.
The argument for a prompt-based approach is that we can probably generate queries a bit more reliably, since otherwise query-generation quality depends on the user's choice of model, and on our side it's more stable. But I would optimize for the happy path here and use our query generation only as a fallback, rather than forcing all users to pay a latency penalty.

@ayc1

ayc1 commented Apr 7, 2026

i see closeAll is called on shutdown - that covers the happy path. the risk is ungraceful exits (SIGKILL, OOM, MCP client crash) where closeAll never runs and sessions leak.

if Tetra has idle timeouts on sessions this is probably fine in practice. if sessions are billed or have no TTL it's worth adding a SIGTERM/SIGINT handler or a heartbeat-based cleanup.
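
A minimal sketch of the signal-handler idea, under stated assumptions: `installShutdownHooks` and `SignalSource` are hypothetical names, and `closeAll` stands in for the server's existing shutdown routine. Note that SIGKILL and OOM kills cannot be intercepted by any handler, so this only narrows the gap; a server-side idle timeout is still needed for those cases.

```typescript
// Minimal shape of an event source such as Node's `process` (assumption:
// only `once` is needed here).
interface SignalSource {
  once(signal: string, handler: () => void): unknown;
}

function installShutdownHooks(
  source: SignalSource,
  closeAll: () => Promise<void>,
  exit: (code: number) => void,
  signals: string[] = ["SIGINT", "SIGTERM"],
): void {
  let shuttingDown = false;
  for (const signal of signals) {
    source.once(signal, () => {
      // Ignore a second signal while cleanup is already running.
      if (shuttingDown) return;
      shuttingDown = true;
      // Exit 0 on clean teardown, 1 if cleanup itself failed.
      closeAll().then(
        () => exit(0),
        () => exit(1),
      );
    });
  }
}
```

In a Node process this would presumably be wired up as `installShutdownHooks(process, closeAll, (code) => process.exit(code))`.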

Comment thread src/sessions.ts
Comment on lines +62 to +84
const tetraSession = await createBrowserSession(sessionOpts);
const cdpUrl = tetraSession.cdpUrl;

// Connect Playwright to hold the connection alive
const browser = await chromium.connectOverCDP(cdpUrl);

const sessionId = `sess_${++idCounter}_${Date.now()}`;
const streamingUrl = tetraSession.getPageStreamingUrl(0);

sessions.set(sessionId, {
  sessionId,
  cdpUrl,
  streamingUrl,
  browser,
  tetraSession,
  createdAt: new Date(),
});

return {
  session_id: sessionId,
  cdp_url: cdpUrl,
  streaming_url: streamingUrl,
};

If connectOverCDP() or getPageStreamingUrl() throws after createBrowserSession() succeeds, I believe the cloud browser is leaked

we can wrap these lines 62-84 in try/catch, call await tetraSession.close() in the catch
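
A minimal sketch of that try/catch, not the PR's actual code. `RemoteSession`, `openSession`, `provision`, and `connect` are hypothetical stand-ins for the Tetra session object, the session-creation function, `createBrowserSession()`, and `chromium.connectOverCDP()`; it assumes the remote session exposes an async `close()`.

```typescript
// Hypothetical minimal shape; the real AgentQL session type is richer.
interface RemoteSession {
  cdpUrl: string;
  getPageStreamingUrl(pageIndex: number): string;
  close(): Promise<void>;
}

interface SessionHandle<B> {
  cdpUrl: string;
  streamingUrl: string;
  browser: B;
  remote: RemoteSession;
}

async function openSession<B>(
  provision: () => Promise<RemoteSession>,
  connect: (cdpUrl: string) => Promise<B>,
): Promise<SessionHandle<B>> {
  const remote = await provision();
  try {
    const browser = await connect(remote.cdpUrl);
    const streamingUrl = remote.getPageStreamingUrl(0);
    return { cdpUrl: remote.cdpUrl, streamingUrl, browser, remote };
  } catch (err) {
    // Best-effort teardown so the cloud browser is not left running.
    // A failure during teardown must not mask the original error.
    await remote.close().catch(() => undefined);
    throw err;
  }
}
```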

Comment thread src/sessions.ts
const entry = sessions.get(sessionId);
if (entry) {
  try {
    await entry.browser.close();

browser.close() disconnects the local CDP client but never calls tetraSession.close(). Remote browsers may keep running and billing.

we should add await entry.tetraSession.close() before or after browser.close().
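
One way this could look, as a sketch only: `Closable` and `Entry` are simplified hypothetical shapes, and the function takes the map as a parameter purely to keep the example self-contained. The `finally` block makes sure the remote session is released even when the local disconnect throws.

```typescript
interface Closable {
  close(): Promise<void>;
}

// Simplified stand-in for the PR's SessionEntry.
interface Entry {
  browser: Closable;
  tetraSession: Closable;
}

async function closeSession(
  sessions: Map<string, Entry>,
  sessionId: string,
): Promise<boolean> {
  const entry = sessions.get(sessionId);
  if (!entry) return false;
  // Drop the entry first so a failed close cannot leave a stale handle behind.
  sessions.delete(sessionId);
  try {
    await entry.browser.close();
  } finally {
    // Runs whether or not the local CDP disconnect succeeded.
    await entry.tetraSession.close();
  }
  return true;
}
```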

Comment thread src/sessions.ts
Comment on lines +198 to +201
await wrappedPage.queryElements(query, {
  includeHidden: options.include_hidden ?? false,
  mode: (options.mode ?? "fast") as "standard" | "fast",
});

getLastResponse() race in queryElements here; two concurrent calls on the same page can return each other's selectors. Since selectors are used for clicks/typing, this is silent data corruption.

let's serialize queries per page, or use an API that returns results directly
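
A minimal sketch of per-page serialization using a promise chain per key. `runExclusive` and the `tails` map are hypothetical names; the key could be the session id or a page identifier, whichever the server uses to look up a page.

```typescript
// Tail of the pending work for each key. Tails never reject, so chaining
// with .then() is always safe.
const tails = new Map<string, Promise<unknown>>();

function runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
  const prev = tails.get(key) ?? Promise.resolve();
  // Start this task only after the previous one for the same key has settled.
  const run = prev.then(task);
  // Swallow errors on the tail only; callers still see them via `run`.
  const tail = run.catch(() => undefined);
  tails.set(key, tail);
  // Drop the key once the queue for it has drained.
  void tail.then(() => {
    if (tails.get(key) === tail) tails.delete(key);
  });
  return run;
}
```

Wrapping the query call and the subsequent response read in a single `runExclusive(sessionId, ...)` task would keep two concurrent callers from reading each other's results.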

Comment thread package.json
- "node-fetch": "^3.3.2"
+ "agentql": "^1.17.0",
+ "node-fetch": "^3.3.2",
+ "playwright": "^1.58.2"

i think we can get away with playwright-core here. as far as i can tell the MCP server only uses connectOverCDP, the whole playwright lib downloads ~300MB of unused browser binaries

Comment thread src/sessions.ts
proxy_url?: string;
}

const sessions = new Map<string, SessionEntry>();

Unbounded Map, no cap, no reaper. A buggy agent loop can exhaust memory and Tetra resources - i think we want session limits or TTL
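
A rough sketch of a capped registry with an idle-TTL reaper. `SessionRegistry` and `Expiring` are hypothetical; the limit and TTL values are left to the caller, and `close()` stands in for whatever teardown a session entry needs.

```typescript
interface Expiring {
  lastUsedAt: number; // epoch milliseconds
  close(): Promise<void>;
}

class SessionRegistry<T extends Expiring> {
  private readonly entries = new Map<string, T>();
  private readonly maxSessions: number;
  private readonly idleTtlMs: number;

  constructor(maxSessions: number, idleTtlMs: number) {
    this.maxSessions = maxSessions;
    this.idleTtlMs = idleTtlMs;
  }

  get size(): number {
    return this.entries.size;
  }

  add(id: string, entry: T): void {
    if (this.entries.size >= this.maxSessions) {
      throw new Error(`Session limit reached (${this.maxSessions}); close a session first.`);
    }
    this.entries.set(id, entry);
  }

  // Looking a session up counts as activity and refreshes its idle timer.
  get(id: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(id);
    if (entry) entry.lastUsedAt = now;
    return entry;
  }

  // Closes and removes sessions idle for at least idleTtlMs; returns their ids.
  async reap(now: number = Date.now()): Promise<string[]> {
    const expired: string[] = [];
    this.entries.forEach((entry, id) => {
      if (now - entry.lastUsedAt >= this.idleTtlMs) expired.push(id);
    });
    for (const id of expired) {
      const entry = this.entries.get(id);
      this.entries.delete(id);
      await entry?.close().catch(() => undefined);
    }
    return expired;
  }
}
```

The server could call `reap()` from a periodic timer; choosing a TTL no longer than Tetra's own inactivity timeout would keep the local map from outliving the remote browsers.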

Comment thread package.json
"agentql": "^1.17.0",
"node-fetch": "^3.3.2",
"playwright": "^1.58.2"
},

Also think we need to add zod

Comment thread src/sessions.ts
if (params.profile !== "stealth") {
  sessionOpts.uaPreset = uaPreset;
}
if (params.proxy === "tetra") {

proxy: "custom" without proxy_url silently ignored: Tool description says required, code doesn't enforce it. Falls through to unproxied
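
A sketch of enforcing that requirement up front. Assumptions: the `"none"` mode, the `ProxyChoice` result shape, and the `resolveProxy` name are hypothetical; only `"tetra"`, `"custom"`, and `proxy_url` come from the code under review.

```typescript
// Assumed parameter shape; "none" is a guess at the unproxied default.
interface ProxyParams {
  proxy?: "none" | "tetra" | "custom";
  proxy_url?: string;
}

type ProxyChoice =
  | { kind: "none" }
  | { kind: "tetra" }
  | { kind: "custom"; url: string };

function resolveProxy(params: ProxyParams): ProxyChoice {
  switch (params.proxy) {
    case "tetra":
      return { kind: "tetra" };
    case "custom": {
      const url = params.proxy_url?.trim();
      // Fail loudly instead of silently falling through to an unproxied session.
      if (!url) {
        throw new Error('proxy_url is required when proxy is "custom"');
      }
      return { kind: "custom", url };
    }
    default:
      return { kind: "none" };
  }
}
```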

Comment thread src/sessions.ts
// Connect Playwright to hold the connection alive
const browser = await chromium.connectOverCDP(cdpUrl);

const sessionId = `sess_${++idCounter}_${Date.now()}`;

Sequential counter + Date.now() - why not use crypto.randomUUID()?
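
For reference, a minimal sketch of what that could look like in Node. `newSessionId` is a hypothetical helper name; the `sess_` prefix is kept from the current code.

```typescript
import { randomUUID } from "node:crypto";

// Random (version 4) UUIDs are not guessable from earlier ids, unlike a
// counter combined with a timestamp.
function newSessionId(): string {
  return `sess_${randomUUID()}`;
}
```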


@londondavila londondavila left a comment

left a bunch of suggestions @paveldudka . approving with notes + nits

4 participants