> This project is **early access** and subject to breaking changes until v1.0.

-## Vision Agent MCP Server v0.1 - Overview
+## VisionAgent MCP Server v0.1 - Overview

-Modern LLM “agents” call external tools through the **Model Context Protocol (MCP)**.
-**Vision Agent MCP** is a lightweight, side-car MCP server that runs locally on STDIN/STDOUT, translating each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s Vision Agent REST APIs. The response JSON, plus any images or masks, is streamed back to the model so that you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
+Modern LLM “agents” call external tools through the **[Model Context Protocol (MCP)](https://modelcontextprotocol.io/)**. **VisionAgent MCP** is a lightweight, side-car MCP server that runs locally on STDIN/STDOUT, translating each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs. The response JSON, plus any images or masks, is streamed back to the model so that you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
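MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, and a client invokes a tool with a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below shows how a side-car server might reassemble such messages from stdin chunks and extract the tool call before forwarding it. It is an illustration under stated assumptions, not this project's code (which presumably relies on an MCP SDK transport); `splitMessages`, `toToolCall` and the `depth-pro`/`imagePath` pairing in the example are hypothetical.

```typescript
// Sketch only: MCP's stdio transport carries one JSON-RPC message per line.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Split buffered stdin text into complete messages plus a trailing partial line.
function splitMessages(buffer: string): { messages: JsonRpcRequest[]; rest: string } {
  const lines = buffer.split("\n");
  // The last element is "" if the buffer ended with "\n", otherwise an incomplete message.
  const rest = lines.pop() ?? "";
  const messages = lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as JsonRpcRequest);
  return { messages, rest };
}

// Pull the tool name and arguments out of a `tools/call` request; other methods yield null.
function toToolCall(msg: JsonRpcRequest): ToolCall | null {
  if (msg.method !== "tools/call" || msg.params === undefined) return null;
  const name = msg.params["name"];
  if (typeof name !== "string") return null;
  const args = (msg.params["arguments"] ?? {}) as Record<string, unknown>;
  return { name, args };
}

// Example: one request split across two stdin chunks (tool and argument are hypothetical).
let pending = "";
const received: JsonRpcRequest[] = [];
const chunks = [
  '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"depth-pro",',
  '"arguments":{"imagePath":"photo.jpg"}}}\n',
];
for (const chunk of chunks) {
  const { messages, rest } = splitMessages(pending + chunk);
  received.push(...messages);
  pending = rest;
}
const call = toToolCall(received[0]);
console.log(call);
```

Keeping the unfinished tail in `pending` matters because a single stdin read can end mid-message; only text up to a newline is ever parsed.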
## 📸 Demo
@@ -31,7 +27,7 @@ Modern LLM “agents” call external tools through the **Model Context Protocol
|**`activity-recognition`**| Recognise multiple activities in video with start/end timestamps. |
|**`depth-pro`**| High-resolution monocular depth estimation for single images. |

-> Run **`npm run generate-tools`** whenever Vision Agent releases new endpoints. The script fetches the latest OpenAPI spec and regenerates the local tool map automatically.
+> Run **`npm run generate-tools`** whenever VisionAgent releases new endpoints. The script fetches the latest OpenAPI spec and regenerates the local tool map automatically.

## 🗺 Table of Contents
@@ -54,13 +50,13 @@ If you do not have a VisionAgent API key, [create an account](https://va.landing
# 1 Install
npm install -g vision-tools-mcp

-# 2 Set your Vision Agent API key
+# 2 Set your VisionAgent API key
export VISION_AGENT_API_KEY="<YOUR_API_KEY>"

# 3 Configure your MCP client with the following settings:
{
  "mcpServers": {
-    "Vision Agent": {
+    "VisionAgent": {
      "command": "npx",
      "args": ["vision-tools-mcp"],
      "env": {
@@ -89,7 +85,7 @@ If your client supports inline resources, you’ll see bounding-box overlays; ot
1. **Prompt → tool-call** The client converts your natural-language prompt into a structured MCP call.
2. **Validation** The server validates args with Zod schemas derived from the live OpenAPI spec.
-3. **Forward** An authenticated Axios request hits the Vision Agent endpoint.
+3. **Forward** An authenticated Axios request hits the VisionAgent endpoint.
4. **Response** JSON + any base64 media are returned.
5. **Visualization** If enabled, masks / boxes / depth maps are rendered to files.
6. **Return to chat** The MCP client receives data + file paths (or inline previews).
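Steps 2 to 6 can be sketched as one async handler. This is an illustration only: plain functions stand in for the project's Zod validation, Axios request and file writer, and the names `handleToolCall`, `PipelineDeps` and `saveMedia` are hypothetical rather than taken from the codebase.

```typescript
// Illustrative stand-ins: plain functions replace the project's Zod schemas, Axios client and file writer.
type Args = Record<string, unknown>;

interface ApiResponse {
  data: unknown;   // JSON body returned by the endpoint
  media: string[]; // base64-encoded images, masks or depth maps (possibly empty)
}

interface ToolResult {
  data: unknown;
  files: string[]; // paths of rendered outputs; empty when visualization is off
}

interface PipelineDeps {
  validate: (args: Args) => Args;                              // step 2: throws on bad input
  forward: (tool: string, args: Args) => Promise<ApiResponse>; // steps 3-4: authenticated request
  saveMedia: (base64: string, index: number) => string;        // step 5: write a file, return its path
  visualize: boolean;
}

// Steps 2-6 for one call; step 1 (prompt -> tool call) happens inside the MCP client.
async function handleToolCall(tool: string, rawArgs: Args, deps: PipelineDeps): Promise<ToolResult> {
  const args = deps.validate(rawArgs);
  const response = await deps.forward(tool, args);
  const files = deps.visualize ? response.media.map((b64, i) => deps.saveMedia(b64, i)) : [];
  return { data: response.data, files }; // step 6: handed back to the client
}

// Example wiring with fakes instead of real network and disk I/O.
const fakeDeps: PipelineDeps = {
  validate: (a) => {
    if (typeof a["imagePath"] !== "string") throw new Error("imagePath is required");
    return a;
  },
  forward: async (tool) => ({ data: { tool, ok: true }, media: ["aGVsbG8="] }),
  saveMedia: (_b64, i) => `/tmp/vision-out-${i}.png`,
  visualize: true,
};

handleToolCall("depth-pro", { imagePath: "photo.jpg" }, fakeDeps).then((r) => console.log(r.files));
```

Injecting the dependencies keeps the control flow testable without an API key; a validation failure rejects the promise before any request is made.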
@@ -189,7 +185,7 @@ Here’s how to dive into the code, add new endpoints, or troubleshoot issues.
### Environment Variables

-- `VISION_AGENT_API_KEY` - **Required** API key for Vision Agent authentication
+- `VISION_AGENT_API_KEY` - **Required** API key for VisionAgent authentication
- `OUTPUT_DIRECTORY` - Optional directory for saving processed outputs (supports relative and absolute paths)
- `IMAGE_DISPLAY_ENABLED` - Set to `"true"` to enable image visualization features
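A minimal sketch of reading these variables. It assumes that a relative `OUTPUT_DIRECTORY` is resolved against the working directory and that only the exact string `"true"` enables `IMAGE_DISPLAY_ENABLED`; the list above does not pin either detail down, and `loadConfig` is a hypothetical name.

```typescript
import * as path from "node:path";

interface ServerConfig {
  apiKey: string;
  outputDirectory?: string; // absolute path, when OUTPUT_DIRECTORY is set
  imageDisplayEnabled: boolean;
}

// Hypothetical loader for the documented variables; env and cwd are parameters so it is easy to test.
function loadConfig(env: Record<string, string | undefined>, cwd: string): ServerConfig {
  const apiKey = env["VISION_AGENT_API_KEY"];
  if (!apiKey) {
    throw new Error("VISION_AGENT_API_KEY is required");
  }
  const outDir = env["OUTPUT_DIRECTORY"];
  return {
    apiKey,
    // path.resolve keeps absolute paths as-is and resolves relative ones against cwd (assumed base).
    outputDirectory: outDir ? path.resolve(cwd, outDir) : undefined,
    // Assumption: only the exact string "true" enables visualization.
    imageDisplayEnabled: env["IMAGE_DISPLAY_ENABLED"] === "true",
  };
}

// In a server this would be called as loadConfig(process.env, process.cwd()).
const example = loadConfig(
  { VISION_AGENT_API_KEY: "<YOUR_API_KEY>", OUTPUT_DIRECTORY: "outputs", IMAGE_DISPLAY_ENABLED: "true" },
  "/home/user/project",
);
console.log(example.outputDirectory);
```

Failing fast on a missing key surfaces the misconfiguration at startup instead of on the first tool call.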
@@ -200,7 +196,7 @@ After building, configure your MCP client with the following settings:
```json
{
  "mcpServers": {
-    "Vision Agent": {
+    "VisionAgent": {
      "command": "node",
      "args": [
        "/path/to/build/index.js"
```
@@ -227,7 +223,7 @@ After building, configure your MCP client with the following settings:
|`npm run generate-tools`| Fetch latest OpenAPI and regenerate `toolDefinitionMap.ts`. |
|`npm run build:all`| Convenience: `npm run build` + `npm run generate-tools`. |

-> **Pro Tip**: If you modify any files under `src/` or want to pick up new endpoints from Vision Agent, run `npm run build:all` to recompile + regenerate tool definitions.
+> **Pro Tip**: If you modify any files under `src/` or want to pick up new endpoints from VisionAgent, run `npm run build:all` to recompile + regenerate tool definitions.

### 📂 Project Layout
@@ -279,7 +275,7 @@ vision-agent-mcp/
1. **`src/generateTools.ts`**

-* Fetches `https://api.va.landing.ai/openapi.json` (Vision Agent’s public OpenAPI).
+* Fetches `https://api.va.landing.ai/openapi.json` (VisionAgent’s public OpenAPI).
* Filters endpoints via a whitelist (or you can disable filtering to include all).
* Converts JSON Schema → Zod schemas, writes `toolDefinitionMap.ts` with a `Map<string, McpToolDefinition>`.
* Run: `npm run generate-tools`.
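The generation step can be approximated as follows. This hypothetical `buildToolMap` works on an already-fetched spec, treats each POST operation as a tool named after its last path segment, and keeps the raw JSON Schema where the real script emits Zod schemas. The naming rule, the POST-only filter and the example paths are assumptions, not the project's actual conventions.

```typescript
// Trimmed-down OpenAPI shapes; real specs carry many more fields.
interface JsonSchema {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
}

interface Operation {
  summary?: string;
  requestBody?: { content: Record<string, { schema: JsonSchema }> };
}

interface OpenApiSpec {
  paths: Record<string, Record<string, Operation>>;
}

interface ToolDefinition {
  name: string;
  description: string;
  path: string;
  inputSchema: JsonSchema; // the real generator emits a Zod schema here instead
}

// Assumption: a tool is named after the last path segment, e.g. "/v1/tools/depth-pro" -> "depth-pro".
function toolNameFor(apiPath: string): string {
  const segments = apiPath.split("/").filter((s) => s.length > 0);
  return segments.length > 0 ? segments[segments.length - 1] : apiPath;
}

// Build the name -> definition map; with no whitelist every POST operation is included.
function buildToolMap(spec: OpenApiSpec, whitelist?: Set<string>): Map<string, ToolDefinition> {
  const tools = new Map<string, ToolDefinition>();
  for (const [apiPath, operations] of Object.entries(spec.paths)) {
    const op = operations["post"]; // assumption: only POST endpoints become tools
    if (op === undefined) continue;
    const name = toolNameFor(apiPath);
    if (whitelist !== undefined && !whitelist.has(name)) continue;
    // Use the first declared request-body schema, whatever its media type.
    const first = Object.values(op.requestBody?.content ?? {})[0];
    tools.set(name, {
      name,
      description: op.summary ?? "",
      path: apiPath,
      inputSchema: first?.schema ?? { type: "object", properties: {} },
    });
  }
  return tools;
}

// Render the map as TypeScript source, loosely mirroring a generated toolDefinitionMap.ts.
function renderModule(tools: Map<string, ToolDefinition>): string {
  const entries = Array.from(tools, ([name, def]) => `  [${JSON.stringify(name)}, ${JSON.stringify(def)}],`);
  return `export const toolDefinitionMap = new Map([\n${entries.join("\n")}\n]);\n`;
}

// Hypothetical spec fragment standing in for the fetched openapi.json.
const spec: OpenApiSpec = {
  paths: {
    "/v1/tools/depth-pro": {
      post: {
        summary: "Monocular depth estimation",
        requestBody: {
          content: {
            "multipart/form-data": {
              schema: { type: "object", properties: { image: { type: "string" } }, required: ["image"] },
            },
          },
        },
      },
    },
    "/v1/tools/activity-recognition": { post: { summary: "Activity recognition in video" } },
    "/v1/health": { get: { summary: "Health check" } },
  },
};

console.log(renderModule(buildToolMap(spec, new Set(["depth-pro"]))));
```

Passing no whitelist corresponds to the "disable filtering" option described above.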
@@ -297,7 +293,7 @@ vision-agent-mcp/
* Validates incoming `arguments` with Zod.
* For file-based args (e.g., `imagePath`, `pdfPath`), reads the file and base64-encodes it via `src/utils/file.ts`.
* Builds a multipart/form-data or JSON payload for Axios.
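The file-encoding and payload steps above might look roughly like this (Zod validation omitted). Every convention here is a hypothetical assumption: any argument ending in `Path` is treated as a local file, the `Path` suffix is dropped from the field name after encoding, and the caller is assumed to already know whether an endpoint wants multipart or JSON. The sketch relies on the global `FormData` available in Node.js 18 and later.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Args = Record<string, unknown>;

// Assumption: an argument whose key ends in "Path" (imagePath, pdfPath, ...) names a local file.
const FILE_ARG = /Path$/;

// Replace each file-path argument with the file's base64 content.
// Hypothetical convention: the "Path" suffix is dropped, so imagePath becomes image.
function encodeFileArgs(args: Args): Args {
  const out: Args = {};
  for (const [key, value] of Object.entries(args)) {
    if (FILE_ARG.test(key) && typeof value === "string") {
      out[key.replace(FILE_ARG, "")] = readFileSync(value).toString("base64");
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Build either a JSON string or a multipart body; which one an endpoint expects is assumed known.
function buildPayload(args: Args, multipart: boolean): string | FormData {
  const encoded = encodeFileArgs(args);
  if (!multipart) return JSON.stringify(encoded);
  const form = new FormData(); // global in Node.js 18+
  for (const [key, value] of Object.entries(encoded)) {
    form.append(key, typeof value === "string" ? value : JSON.stringify(value));
  }
  return form;
}

// Usage: encode a tiny temporary file.
const dir = mkdtempSync(join(tmpdir(), "vision-"));
const imagePath = join(dir, "sample.bin");
writeFileSync(imagePath, "hi");
const json = buildPayload({ imagePath, threshold: 0.5 }, false);
console.log(json); // {"image":"aGk=","threshold":0.5}
```

Either body type can be handed to an HTTP client such as Axios; non-string values are JSON-encoded in the multipart case because form fields are strings or blobs.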