Commit 2831edc

Update README.md
VA gaps removed, added MCP link.
1 parent 0fd81da commit 2831edc

1 file changed: +18 -22 lines changed

README.md

Lines changed: 18 additions & 22 deletions
@@ -1,7 +1,4 @@
-# Vision Agent MCP Server
-
-<!-- ───────────────────────────── Badges ───────────────────────────── -->
-<!-- Replace all TODOs with real links once available -->
+# VisionAgent MCP Server

 [![npm](https://img.shields.io/npm/v/vision-tools-mcp?label=npm)](https://www.npmjs.com/package/vision-tools-mcp)
 ![build](https://github.com/landing-ai/vision-agent-mcp/actions/workflows/ci.yml/badge.svg)
@@ -10,10 +7,9 @@
 > This project is **early access** and subject to breaking changes until v1.0.


-## Vision Agent MCP Server v0.1 - Overview
+## VisionAgent MCP Server v0.1 - Overview

-Modern LLM “agents” call external tools through the **Model Context Protocol (MCP)**.
-**Vision Agent MCP** is a lightweight, side-car MCP server that runs locally on STDIN/STDOUT, translating each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s Vision Agent REST APIs. The response JSON, plus any images or masks, is streamed back to the model so that you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
+Modern LLM “agents” call external tools through the **[Model Context Protocol (MCP)](https://modelcontextprotocol.io/).** **VisionAgent MCP** is a lightweight, side-car MCP server that runs locally on STDIN/STDOUT, translating each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs. The response JSON, plus any images or masks, is streamed back to the model so that you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.


 ## 📸 Demo
@@ -31,7 +27,7 @@ Modern LLM “agents” call external tools through the **Model Context Protocol
 | **`activity-recognition`** | Recognise multiple activities in video with start/end timestamps. |
 | **`depth-pro`** | High-resolution monocular depth estimation for single images. |

-> Run **`npm run generate-tools`** whenever Vision Agent releases new endpoints. The script fetches the latest OpenAPI spec and regenerates the local tool map automatically.
+> Run **`npm run generate-tools`** whenever VisionAgent releases new endpoints. The script fetches the latest OpenAPI spec and regenerates the local tool map automatically.


 ## 🗺 Table of Contents
@@ -54,13 +50,13 @@ If you do not have a VisionAgent API key, [create an account](https://va.landing
 # 1 Install
 npm install -g vision-tools-mcp

-# 2 Set your Vision Agent API key
+# 2 Set your VisionAgent API key
 export VISION_AGENT_API_KEY="<YOUR_API_KEY>"

 # 3 Configure your MCP client with the following settings:
 {
   "mcpServers": {
-    "Vision Agent": {
+    "VisionAgent": {
       "command": "npx",
       "args": ["vision-tools-mcp"],
       "env": {
@@ -89,7 +85,7 @@ If your client supports inline resources, you’ll see bounding-box overlays; ot
 | Software | Minimum Version |
 | ------------------------ | ---------------------------------------- |
 | **Node.js** | 20 (LTS) |
-| **Vision Agent account** | Any paid or free tier (needs API key) |
+| **VisionAgent account** | Any paid or free tier (needs API key) |
 | **MCP client** | Claude Desktop / Cursor / Cline / *etc.* |


@@ -106,7 +102,7 @@ If your client supports inline resources, you’ll see bounding-box overlays; ot
 ```jsonc
 {
   "mcpServers": {
-    "Vision Agent": {
+    "VisionAgent": {
       "command": "npx",
       "args": ["vision-tools-mcp"],
       "env": {
@@ -137,21 +133,21 @@ For MCP clients without image display capabilities, like Cursor, set IMAGE_DISPL

 ```text
 ┌────────────────────┐ 1. human prompt            ┌───────────────────┐
-│ MCP-capable client │───────────────────────────▶│ Vision Agent MCP  │
+│ MCP-capable client │───────────────────────────▶│  VisionAgent MCP  │
 │  (Cursor, Claude)  │                            │    (this repo)    │
 └────────────────────┘                            └─────────▲─────────┘
           ▲ 6. rendered PNG / JSON                          │ 2. JSON tool call
           │                                                 │
           │ 5. preview path / data                 3. HTTPS │
           │                                                 ▼
-     local disk ◀──────────┐               Landing AI Vision Agent
+     local disk ◀──────────┐               Landing AI VisionAgent
                            └────────────── Cloud APIs
                                            4. JSON / media blob
 ```

 1. **Prompt → tool-call** The client converts your natural-language prompt into a structured MCP call.
 2. **Validation** The server validates args with Zod schemas derived from the live OpenAPI spec.
-3. **Forward** An authenticated Axios request hits the Vision Agent endpoint.
+3. **Forward** An authenticated Axios request hits the VisionAgent endpoint.
 4. **Response** JSON + any base64 media are returned.
 5. **Visualization** If enabled, masks / boxes / depth maps are rendered to files.
 6. **Return to chat** The MCP client receives data + file paths (or inline previews).
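
Reviewer note: steps 2 and 3 of the flow in this hunk can be sketched in a few lines of TypeScript. This is a minimal sketch under stated assumptions, not the repository's code: `validateArgs` is a hand-rolled stand-in for the generated Zod schemas, `buildRequest` and the `/example/depth-pro` path are hypothetical, and only the base URL, the Bearer header, the `depth-pro` tool name, and the `imagePath` argument name come from the README.

```typescript
// Hypothetical shapes for illustration only; the real server uses generated Zod schemas.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface OutboundRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Base URL as stated in the README.
const BASE_URL = "https://api.va.landing.ai";

// Step 2 (Validation): return the names of required string arguments that are
// missing or empty. A stand-in for a Zod schema check.
function validateArgs(call: ToolCall, required: string[]): string[] {
  return required.filter((key) => {
    const value = call.arguments[key];
    return typeof value !== "string" || value.length === 0;
  });
}

// Step 3 (Forward): describe the authenticated HTTPS request without sending it.
function buildRequest(call: ToolCall, apiKey: string, path: string): OutboundRequest {
  return {
    url: `${BASE_URL}${path}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(call.arguments),
  };
}

const exampleCall: ToolCall = { name: "depth-pro", arguments: { imagePath: "cat.png" } };
const missingArgs = validateArgs(exampleCall, ["imagePath"]);
// The path below is made up; real paths come from the OpenAPI spec.
const exampleRequest = buildRequest(exampleCall, "<YOUR_API_KEY>", "/example/depth-pro");
```

Keeping request construction separate from sending makes the validation and header logic testable without network access; the actual server hands the equivalent data to Axios.
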
@@ -189,7 +185,7 @@ Here’s how to dive into the code, add new endpoints, or troubleshoot issues.

 ### Environment Variables

-- `VISION_AGENT_API_KEY` - **Required** API key for Vision Agent authentication
+- `VISION_AGENT_API_KEY` - **Required** API key for VisionAgent authentication
 - `OUTPUT_DIRECTORY` - Optional directory for saving processed outputs (supports relative and absolute paths)
 - `IMAGE_DISPLAY_ENABLED` - Set to `"true"` to enable image visualization features
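
Reviewer note: the three variables listed in this hunk could be read into a typed config roughly as follows. This is a sketch, not the project's implementation; `ServerConfig` and `loadConfig` are hypothetical names, and the error message wording is an assumption.

```typescript
// Hypothetical config shape built from the three documented variables.
interface ServerConfig {
  apiKey: string;
  outputDirectory: string | undefined;
  imageDisplayEnabled: boolean;
}

// Accepts any env-like record so the function can be tested without touching process.env.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env["VISION_AGENT_API_KEY"];
  if (!apiKey) {
    // The README marks this variable as required.
    throw new Error("VISION_AGENT_API_KEY is required");
  }
  return {
    apiKey,
    outputDirectory: env["OUTPUT_DIRECTORY"],
    // Only the exact string "true" enables visualization, per the README.
    imageDisplayEnabled: env["IMAGE_DISPLAY_ENABLED"] === "true",
  };
}

const exampleConfig = loadConfig({
  VISION_AGENT_API_KEY: "<YOUR_API_KEY>",
  IMAGE_DISPLAY_ENABLED: "true",
});
```

In a Node.js process the same function would be called as `loadConfig(process.env)`.
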

@@ -200,7 +196,7 @@ After building, configure your MCP client with the following settings:
 ```json
 {
   "mcpServers": {
-    "Vision Agent": {
+    "VisionAgent": {
       "command": "node",
       "args": [
         "/path/to/build/index.js"
@@ -227,7 +223,7 @@ After building, configure your MCP client with the following settings:
 | `npm run generate-tools` | Fetch latest OpenAPI and regenerate `toolDefinitionMap.ts`. |
 | `npm run build:all` | Convenience: `npm run build` + `npm run generate-tools`. |

-> **Pro Tip**: If you modify any files under `src/` or want to pick up new endpoints from Vision Agent, run `npm run build:all` to recompile + regenerate tool definitions.
+> **Pro Tip**: If you modify any files under `src/` or want to pick up new endpoints from VisionAgent, run `npm run build:all` to recompile + regenerate tool definitions.


 ### 📂 Project Layout
@@ -279,7 +275,7 @@ vision-agent-mcp/

 1. **`src/generateTools.ts`**

-   * Fetches `https://api.va.landing.ai/openapi.json` (Vision Agent’s public OpenAPI).
+   * Fetches `https://api.va.landing.ai/openapi.json` (VisionAgent’s public OpenAPI).
    * Filters endpoints via a whitelist (or you can disable filtering to include all).
    * Converts JSON Schema → Zod schemas, writes `toolDefinitionMap.ts` with a `Map<string, McpToolDefinition>`.
    * Run: `npm run generate-tools`.
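
Reviewer note: the whitelist filtering described in this hunk can be illustrated with a small sketch. It assumes tool names are taken from each operation's `operationId`, which the README does not state; the `McpToolDefinition` fields shown here, the `buildToolMap` helper, and the sample paths are all hypothetical, and the JSON Schema to Zod conversion is left out.

```typescript
// Minimal slice of an OpenAPI document: paths -> HTTP methods -> operations.
interface OpenApiOperation {
  operationId?: string;
  summary?: string;
}

interface OpenApiSpec {
  paths: Record<string, Record<string, OpenApiOperation>>;
}

// Hypothetical tool record; the real McpToolDefinition likely carries more fields.
interface McpToolDefinition {
  name: string;
  method: string;
  path: string;
  description: string;
}

// Build the tool map, keeping only whitelisted operations.
// Passing no whitelist disables filtering and includes every named operation.
function buildToolMap(spec: OpenApiSpec, whitelist?: Set<string>): Map<string, McpToolDefinition> {
  const tools = new Map<string, McpToolDefinition>();
  for (const [path, methods] of Object.entries(spec.paths)) {
    for (const [method, operation] of Object.entries(methods)) {
      const name = operation.operationId;
      if (!name) continue; // skip operations that cannot be named
      if (whitelist && !whitelist.has(name)) continue;
      tools.set(name, {
        name,
        method: method.toUpperCase(),
        path,
        description: operation.summary ?? "",
      });
    }
  }
  return tools;
}

// Sample spec with made-up paths, for illustration only.
const sampleSpec: OpenApiSpec = {
  paths: {
    "/example/depth-pro": { post: { operationId: "depth-pro", summary: "Depth estimation" } },
    "/example/activity-recognition": { post: { operationId: "activity-recognition" } },
  },
};

const filteredTools = buildToolMap(sampleSpec, new Set(["depth-pro"]));
const allTools = buildToolMap(sampleSpec);
```
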
@@ -297,7 +293,7 @@ vision-agent-mcp/
    * Validates incoming `arguments` with Zod.
    * If file-based args (e.g., `imagePath`, `pdfPath`), reads & base64-encodes via `src/utils/file.ts`.
    * Builds a multipart/form-data or JSON payload for Axios.
-   * Calls Vision Agent endpoint, catches errors, returns MCP-compliant JSON response.
+   * Calls VisionAgent endpoint, catches errors, returns MCP-compliant JSON response.
    * If `IMAGE_DISPLAY_ENABLED=true`, calls `src/server/visualization.ts` to save PNGs/JSON.

 4. **`src/server/visualization.ts`**
@@ -315,7 +311,7 @@ vision-agent-mcp/

    * Configures Axios with base URL `https://api.va.landing.ai`.
    * Adds `Authorization: Bearer ${VISION_AGENT_API_KEY}` header.
-   * Wraps calls to Vision Agent endpoints, handles 4xx/5xx, formats errors into MCP error objects.
+   * Wraps calls to VisionAgent endpoints, handles 4xx/5xx, formats errors into MCP error objects.

 7. **`src/validation/schema.ts`**

@@ -359,7 +355,7 @@ vision-agent-mcp/
   "id": 4,
   "error": {
     "code": -32000,
-    "message": "Vision Agent API error: 502 Bad Gateway"
+    "message": "VisionAgent API error: 502 Bad Gateway"
   }
 }
 ```
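
Reviewer note: the error object in this hunk follows the JSON-RPC 2.0 error shape, where `-32000` sits in the range reserved for implementation-defined server errors. A sketch of how an upstream HTTP failure might be mapped to it is below; `toMcpError` is a hypothetical helper, not a function confirmed to exist in the repository.

```typescript
// JSON-RPC 2.0 error response shape, as used in the README example.
interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  error: {
    code: number;
    message: string;
  };
}

// Map an upstream HTTP failure to the error format shown in the README.
// The helper name and signature are assumptions made for illustration.
function toMcpError(id: number | string | null, status: number, statusText: string): JsonRpcErrorResponse {
  return {
    jsonrpc: "2.0",
    id,
    error: {
      code: -32000, // implementation-defined server error
      message: `VisionAgent API error: ${status} ${statusText}`,
    },
  };
}

const badGateway = toMcpError(4, 502, "Bad Gateway");
```
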
