feat(mcp): hugegraph-mcp#57
Conversation
This workflow will be triggered when a pull request is opened. It will then post a comment "@codecov-ai-reviewer review" to help with automated AI code reviews. It will use the `peter-evans/create-or-update-comment` action to create the comment. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
|
@codecov-ai-reviewer review |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough新增 hugegraph-mcp 子项目(FastMCP 服务、包、入口、打包配置与测试),实现实时 schema 查询、Gremlin 读/写与批量 schema 操作;改进 pyhugegraph 日志初始化错误处理;调整 Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client as Client/助手
participant MCP as FastMCP 服务
participant Schema as schema_tools
participant Gremlin as gremlin_tools
Client->>MCP: 调用 get_live_schema_tool()
MCP->>Schema: get_live_schema()
Schema-->>MCP: 返回 { schema, simple_schema, graphspace?, readonly }
MCP-->>Client: 返回 schema 结果
alt 读查询流程
Client->>MCP: execute_gremlin_read_tool(gremlin_query)
MCP->>Gremlin: execute_gremlin_read(query)
Gremlin-->>MCP: 返回 { data, total, duration_ms, is_read }
MCP-->>Client: 返回读结果
else 写查询(服务允许写)
Client->>MCP: execute_gremlin_write_tool(gremlin_query)
MCP->>Gremlin: execute_gremlin_write(query)
Gremlin-->>MCP: 返回 { success, affected, duration_ms, is_write }
MCP-->>Client: 返回写结果
else 写查询(只读模式)
Client->>MCP: execute_gremlin_write_tool(gremlin_query)
MCP-->>Client: PermissionError / 拒绝执行
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 需要额外关注的区域:
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
On it! We are reviewing the PR and will provide feedback shortly. |
Summary of ChangesHello @fantasy-lotus, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces significant structural and documentation updates by renaming the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
| # before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
| *.manifest | ||
| *.spec | ||
| # Exception: allow .spec/ directory to be tracked |
There was a problem hiding this comment.
The gitignore patterns for .spec/ directory are functionally correct, but the comment '# Exception: allow .spec/ directory to be tracked' may be misleading to future maintainers. The original *.spec pattern ignores files like 'application.spec', not '.spec' directories. The negation patterns !.spec/ and !.spec/** correctly whitelist the hidden .spec/ directory. However, clarify the intent: Are you intentionally moving specification docs into a hidden directory? This may reduce discoverability. Consider adding a comment explaining why specs are placed in a hidden directory and updating the project's CONTRIBUTING.md or README.md to guide developers to find documentation in .spec/ instead of spec/.
Severity: MEDIUM
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: .gitignore#L37-L40
Potential issue: The gitignore patterns for .spec/ directory are functionally correct,
but the comment '# Exception: allow .spec/ directory to be tracked' may be misleading to
future maintainers. The original `*.spec` pattern ignores files like 'application.spec',
not '.spec' directories. The negation patterns `!.spec/` and `!.spec/**` correctly
whitelist the hidden `.spec/` directory. However, clarify the intent: Are you
intentionally moving specification docs into a hidden directory? This may reduce
discoverability. Consider adding a comment explaining why specs are placed in a hidden
directory and updating the project's CONTRIBUTING.md or README.md to guide developers to
find documentation in `.spec/` instead of `spec/`.
Did we get this right? 👍 / 👎 to inform future reviews.
| # HugeGraph MCP Server 需求文档 | ||
|
|
||
| ## 简介 | ||
|
|
||
| HugeGraph MCP Server 是一个基于 Model Context Protocol (MCP) 的服务实现,旨在为大语言模型(LLM)提供与 HugeGraph 图数据库的标准化交互接口。该服务使 Claude Desktop 等 MCP 客户端能够通过自然语言操作 HugeGraph 数据库,执行图数据的 CRUD 操作、Schema 管理、图算法调用等功能。 | ||
|
|
||
| 本实现将参考 Neo4j MCP 和 NebulaGraph MCP 的设计,采用 Python 语言开发,集成到现有的 hugegraph-llm 项目中,复用现有的 PyHugeClient 和相关工具链。 | ||
|
|
||
| ## 需求列表 | ||
|
|
||
| ### 需求 1: 项目基础设施 | ||
|
|
||
| **用户故事:** 作为开发者,我希望能够快速安装和配置 HugeGraph MCP Server,以便在不同环境中使用。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN 用户通过 `uvx hugegraph-mcp-server` 命令启动 THEN 系统应成功加载配置并启动 MCP 服务 | ||
| 2. WHEN 用户提供环境变量(HUGEGRAPH_URL, HUGEGRAPH_GRAPH, HUGEGRAPH_USER, HUGEGRAPH_PASSWORD) THEN 系统应使用这些配置连接到 HugeGraph | ||
| 3. WHEN 用户提供 .env 文件配置 THEN 系统应优先从 .env 文件读取配置信息 | ||
| 4. WHEN 用户未提供配置 THEN 系统应从现有的 hugegraph_llm.config.huge_settings 中读取默认配置 | ||
| 5. WHEN 用户指定 `--transport stdio` 参数 THEN 系统应以 STDIO 模式启动 | ||
| 6. WHEN 用户指定 `--transport http` 参数 THEN 系统应以 HTTP 模式启动,并支持 --host 和 --port 参数 | ||
| 7. WHEN 启动失败(如连接不上 HugeGraph) THEN 系统应返回清晰的错误信息并退出 | ||
|
|
||
| ### 需求 2: 顶点(Vertex)的 CRUD 操作 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够通过自然语言创建、查询、更新和删除图中的顶点,以便管理图数据。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 调用 `get_vertex` 工具并提供顶点 ID THEN 系统应返回该顶点的完整信息(包括 label 和所有 properties) | ||
| 2. WHEN LLM 调用 `get_vertices` 工具并提供多个顶点 ID THEN 系统应批量返回这些顶点的信息 | ||
| 3. WHEN LLM 调用 `query_vertices` 工具并提供查询条件(label, property filters) THEN 系统应返回符合条件的顶点列表 | ||
| 4. WHEN 查询顶点的结果超过默认限制(如 100 个) THEN 系统应自动分页并返回前 N 个结果及提示信息 | ||
| 5. WHEN LLM 调用 `create_vertex` 工具并提供 label 和 properties THEN 系统应创建新顶点并返回顶点 ID | ||
| 6. WHEN LLM 调用 `create_vertices` 工具并提供顶点列表 THEN 系统应批量创建顶点并返回成功/失败的详细结果 | ||
| 7. WHEN 创建顶点时 label 不存在于 schema 中 THEN 系统应返回错误信息,提示 label 不存在 | ||
| 8. WHEN 创建顶点时 properties 类型与 schema 定义不匹配 THEN 系统应返回验证错误并说明期望的类型 | ||
| 9. WHEN LLM 调用 `update_vertex` 工具并提供顶点 ID 和新的 properties THEN 系统应更新顶点属性并返回更新后的顶点信息 | ||
| 10. WHEN 更新顶点时提供的 property key 不存在于 schema THEN 系统应返回错误信息 | ||
| 11. WHEN LLM 调用 `delete_vertex` 工具并提供顶点 ID THEN 系统应删除该顶点及其所有关联的边,并返回删除确认 | ||
| 12. WHEN 删除的顶点不存在 THEN 系统应返回 404 错误信息 | ||
|
|
||
| ### 需求 3: 边(Edge)的 CRUD 操作 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够通过自然语言创建、查询和更新图中的边,以便建立和管理实体之间的关系。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 调用 `get_edge` 工具并提供边 ID THEN 系统应返回该边的完整信息(包括 label, source, target 和 properties) | ||
| 2. WHEN LLM 调用 `get_edges` 工具并提供多个边 ID THEN 系统应批量返回这些边的信息 | ||
| 3. WHEN LLM 调用 `query_edges` 工具并提供查询条件(label, source_id, target_id, property filters) THEN 系统应返回符合条件的边列表 | ||
| 4. WHEN 查询边时仅提供 source_id THEN 系统应返回该顶点的所有出边 | ||
| 5. WHEN 查询边时仅提供 target_id THEN 系统应返回该顶点的所有入边 | ||
| 6. WHEN LLM 调用 `create_edge` 工具并提供 label, source_id, target_id 和 properties THEN 系统应创建新边并返回边 ID | ||
| 7. WHEN LLM 调用 `create_edges` 工具并提供边列表 THEN 系统应批量创建边并返回成功/失败的详细结果 | ||
| 8. WHEN 创建边时 source 或 target 顶点不存在 THEN 系统应返回错误信息,提示顶点不存在 | ||
| 9. WHEN 创建边时 label 与 source/target 顶点的 label 组合不符合 schema 定义 THEN 系统应返回错误信息 | ||
| 10. WHEN LLM 调用 `update_edge` 工具并提供边 ID 和新的 properties THEN 系统应更新边属性并返回更新后的边信息 | ||
| 11. WHEN 批量创建边时部分成功部分失败 THEN 系统应返回成功的边 ID 列表和失败的错误详情 | ||
|
|
||
| ### 需求 4: Schema 管理 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够查询和理解图数据库的 Schema 结构,以便正确地创建和查询数据。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 调用 `get_schema` 工具 THEN 系统应返回完整的 schema 信息(包括 vertexlabels, edgelabels, propertykeys) | ||
| 2. WHEN LLM 通过 MCP Resource 访问 `hugegraph://schema` THEN 系统应返回简化的 schema 信息(仅包含必要字段) | ||
| 3. WHEN LLM 调用 `get_vertex_labels` 工具 THEN 系统应返回所有顶点 label 及其属性定义 | ||
| 4. WHEN LLM 调用 `get_edge_labels` 工具 THEN 系统应返回所有边 label 及其属性定义和约束(source_label, target_label) | ||
| 5. WHEN schema 为空(新图) THEN 系统应返回空的 schema 结构并提示用户需要先定义 schema | ||
| 6. WHEN LLM 请求 schema 的统计信息 THEN 系统应返回 vertex label 数量、edge label 数量和 property key 数量 | ||
|
|
||
| ### 需求 5: 图统计信息 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够获取图数据库的统计信息,以便了解数据规模和概况。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 通过 MCP Resource 访问 `hugegraph://statistics` THEN 系统应返回图的统计信息 | ||
| 2. WHEN 获取统计信息 THEN 系统应返回顶点总数、边总数 | ||
| 3. WHEN 获取统计信息 THEN 系统应返回最多 10000 个顶点 ID 的样本列表 | ||
| 4. WHEN 获取统计信息 THEN 系统应返回最多 200 个边 ID 的样本列表 | ||
| 5. WHEN 返回样本数据时数据量超过限制 THEN 系统应在返回结果中包含说明信息,告知这只是部分数据 | ||
|
|
||
| ### 需求 6: 自然语言转 Gremlin 查询 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够使用自然语言描述查询需求,系统自动生成对应的 Gremlin 查询,以便更便捷地操作图数据库。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 调用 `text_to_gremlin` 工具并提供自然语言查询 THEN 系统应调用现有的 text2gremlin flow 生成 Gremlin 查询 | ||
| 2. WHEN 生成 Gremlin 查询 THEN 系统应同时返回模板化的查询和原始查询 | ||
| 3. WHEN LLM 通过 MCP Resource 访问 `hugegraph://gremlin-examples` THEN 系统应返回预定义的 Gremlin 查询示例 | ||
| 4. WHEN 自然语言查询模糊或无法转换 THEN 系统应返回可能的查询建议或要求用户澄清 | ||
| 5. WHEN 生成的 Gremlin 查询可能修改数据 THEN 系统应在返回结果中添加警告信息 | ||
|
|
||
| ### 需求 7: 图算法支持 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够快速调用常用的图算法,以便分析图结构和关系。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 调用 `shortest_path` 工具并提供起始顶点 ID 和目标顶点 ID THEN 系统应返回最短路径(包括路径上的顶点和边) | ||
| 2. WHEN 两个顶点之间不存在路径 THEN 系统应返回空路径并说明原因 | ||
| 3. WHEN LLM 调用 `k_neighbor` 工具并提供顶点 ID 和深度 k THEN 系统应返回 k 度邻居的所有顶点 | ||
| 4. WHEN k 值过大(如 >5) THEN 系统应限制最大深度并返回警告信息 | ||
| 5. WHEN k 度邻居查询结果数量过多 THEN 系统应限制返回数量(如最多 1000 个)并提示 | ||
| 6. WHEN 调用图算法时提供的顶点 ID 不存在 THEN 系统应返回错误信息 | ||
|
|
||
| ### 需求 8: 错误处理和验证 | ||
|
|
||
| **用户故事:** 作为开发者和用户,我希望系统能够提供清晰的错误信息和输入验证,以便快速定位和解决问题。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN 任何操作失败 THEN 系统应返回详细的错误信息(包括错误类型、原因、建议) | ||
| 2. WHEN 连接 HugeGraph 失败 THEN 系统应返回连接错误并提示检查配置 | ||
| 3. WHEN 输入参数缺失或类型错误 THEN 系统应在执行前验证并返回参数验证错误 | ||
| 4. WHEN 批量操作部分成功 THEN 系统应返回成功列表和失败列表,每个失败项包含详细错误信息 | ||
| 5. WHEN 创建或更新操作违反 schema 约束 THEN 系统应返回约束违反错误并说明具体的约束要求 | ||
| 6. WHEN 顶点或边 ID 格式错误 THEN 系统应返回格式错误信息 | ||
| 7. WHEN 操作超时 THEN 系统应返回超时错误并建议优化查询或增加超时时间 | ||
|
|
||
| ### 需求 9: MCP 协议规范 | ||
|
|
||
| **用户故事:** 作为 MCP 客户端,我希望 HugeGraph MCP Server 完全遵循 MCP 协议规范,以便能够无缝集成。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN MCP 客户端发送 `initialize` 请求 THEN 服务器应返回服务器信息和支持的协议版本 | ||
| 2. WHEN MCP 客户端请求 `tools/list` THEN 服务器应返回所有可用的工具列表及其描述 | ||
| 3. WHEN MCP 客户端请求 `resources/list` THEN 服务器应返回所有可用的资源列表(schema, statistics, gremlin-examples) | ||
| 4. WHEN MCP 客户端调用 `tools/call` 并提供工具名称和参数 THEN 服务器应执行工具并返回结构化结果 | ||
| 5. WHEN MCP 客户端请求 `resources/read` 并提供资源 URI THEN 服务器应返回对应的资源内容 | ||
| 6. WHEN 工具调用失败 THEN 服务器应返回符合 MCP 规范的错误响应 | ||
| 7. WHEN 以 STDIO 模式运行 THEN 服务器应通过标准输入输出进行 JSON-RPC 通信 | ||
| 8. WHEN 以 HTTP 模式运行 THEN 服务器应支持 HTTP POST 请求和 Server-Sent Events (SSE) | ||
|
|
||
| ### 需求 10: 配置和部署 | ||
|
|
||
| **用户故事:** 作为系统管理员,我希望能够灵活配置和部署 HugeGraph MCP Server,以便适应不同的环境需求。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN 用户设置环境变量 HUGEGRAPH_URL THEN 系统应使用该 URL 连接 HugeGraph | ||
| 2. WHEN 用户设置环境变量 HUGEGRAPH_GRAPH THEN 系统应连接到指定的图实例 | ||
| 3. WHEN 用户设置环境变量 HUGEGRAPH_USER 和 HUGEGRAPH_PASSWORD THEN 系统应使用这些凭据进行认证 | ||
| 4. WHEN 用户设置环境变量 HUGEGRAPH_GRAPHSPACE THEN 系统应在指定的 graphspace 中操作 | ||
| 5. WHEN .env 文件存在 THEN 系统应自动加载 .env 文件中的配置 | ||
| 6. WHEN 环境变量和 .env 文件都存在 THEN 环境变量应优先级更高 | ||
| 7. WHEN 未提供任何配置 THEN 系统应尝试从 hugegraph_llm.config.huge_settings 读取默认配置 | ||
| 8. WHEN 用户指定 --log-level 参数 THEN 系统应设置对应的日志级别(DEBUG, INFO, WARNING, ERROR) | ||
|
|
||
| ### 需求 11: 测试和质量保证 | ||
|
|
||
| **用户故事:** 作为开发者,我希望项目包含完善的测试,以便保证代码质量和功能正确性。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN 运行单元测试 THEN 所有 MCP 工具的核心逻辑应通过单元测试(使用 mock) | ||
| 2. WHEN 运行集成测试 THEN 系统应能够连接真实的 HugeGraph 实例并执行端到端测试 | ||
| 3. WHEN 使用 @modelcontextprotocol/inspector 工具 THEN 应能够成功连接并测试所有工具 | ||
| 4. WHEN 执行测试时 THEN 测试覆盖率应达到核心功能代码的 80% 以上 | ||
| 5. WHEN 测试失败 THEN 应输出清晰的失败原因和堆栈信息 | ||
|
|
||
| ### 需求 12: 文档和示例 | ||
|
|
||
| **用户故事:** 作为新用户,我希望能够通过清晰的文档快速了解和使用 HugeGraph MCP Server。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN 用户查看 README.md THEN 应包含项目简介、安装方法、配置说明和快速开始示例 | ||
| 2. WHEN 用户查看 README.md THEN 应包含每个 MCP 工具的简要说明和使用示例 | ||
| 3. WHEN 用户查看 API 文档 THEN 应包含每个工具的详细参数说明、返回值格式和错误码 | ||
| 4. WHEN 用户查看集成指南 THEN 应包含如何在 Claude Desktop 中配置 HugeGraph MCP Server 的步骤 | ||
| 5. WHEN 用户查看集成指南 THEN 应包含在其他 MCP 客户端(如 Cursor, VS Code)中的配置示例 | ||
| 6. WHEN 用户查看示例代码 THEN 应包含常见使用场景的完整示例(创建知识图谱、查询关系等) |
There was a problem hiding this comment.
The requirements document is comprehensive, but several acceptance criteria lack quantifiable metrics or thresholds. For example, Requirement 3 states 'WHEN 查询边时仅提供 source_id THEN 系统应返回该顶点的所有出边' without specifying performance expectations or result limits. Similarly, Requirement 5 mentions '最多 10000 个顶点 ID' and '最多 200 个边 ID' but doesn't explain why these specific limits were chosen or how they should be enforced. Recommendation: Add a section defining performance SLAs, result limits, and timeout values to make acceptance testing unambiguous.
Severity: MEDIUM
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: .spec/hugegraph-mcp/graph_mcp/requirements.md#L1-L179
Potential issue: The requirements document is comprehensive, but several acceptance
criteria lack quantifiable metrics or thresholds. For example, Requirement 3 states
'WHEN 查询边时仅提供 source_id THEN 系统应返回该顶点的所有出边' without specifying performance expectations
or result limits. Similarly, Requirement 5 mentions '最多 10000 个顶点 ID' and '最多 200 个边 ID'
but doesn't explain why these specific limits were chosen or how they should be
enforced. Recommendation: Add a section defining performance SLAs, result limits, and
timeout values to make acceptance testing unambiguous.
Did we get this right? 👍 / 👎 to inform future reviews.
|
|
||
| 1. WHEN LLM 调用 `get_vertex` 工具并提供顶点 ID THEN 系统应返回该顶点的完整信息(包括 label 和所有 properties) | ||
| 2. WHEN LLM 调用 `get_vertices` 工具并提供多个顶点 ID THEN 系统应批量返回这些顶点的信息 | ||
| 3. WHEN LLM 调用 `query_vertices` 工具并提供查询条件(label, property filters) THEN 系统应返回符合条件的顶点列表 | ||
| 4. WHEN 查询顶点的结果超过默认限制(如 100 个) THEN 系统应自动分页并返回前 N 个结果及提示信息 | ||
| 5. WHEN LLM 调用 `create_vertex` 工具并提供 label 和 properties THEN 系统应创建新顶点并返回顶点 ID | ||
| 6. WHEN LLM 调用 `create_vertices` 工具并提供顶点列表 THEN 系统应批量创建顶点并返回成功/失败的详细结果 | ||
| 7. WHEN 创建顶点时 label 不存在于 schema 中 THEN 系统应返回错误信息,提示 label 不存在 | ||
| 8. WHEN 创建顶点时 properties 类型与 schema 定义不匹配 THEN 系统应返回验证错误并说明期望的类型 | ||
| 9. WHEN LLM 调用 `update_vertex` 工具并提供顶点 ID 和新的 properties THEN 系统应更新顶点属性并返回更新后的顶点信息 | ||
| 10. WHEN 更新顶点时提供的 property key 不存在于 schema THEN 系统应返回错误信息 | ||
| 11. WHEN LLM 调用 `delete_vertex` 工具并提供顶点 ID THEN 系统应删除该顶点及其所有关联的边,并返回删除确认 | ||
| 12. WHEN 删除的顶点不存在 THEN 系统应返回 404 错误信息 | ||
|
|
||
| ### 需求 3: 边(Edge)的 CRUD 操作 | ||
|
|
There was a problem hiding this comment.
Requirement 2 (Vertex CRUD) Acceptance Criterion 4 states 'WHEN 查询顶点的结果超过默认限制(如 100 个)' without clearly defining when and how pagination should be triggered. Is this limit hard-enforced, or advisory? Who decides the 'default limit'? Clarify this to ensure consistent implementation and testing.
Severity: MEDIUM
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: .spec/hugegraph-mcp/graph_mcp/requirements.md#L30-L45
Potential issue: Requirement 2 (Vertex CRUD) Acceptance Criterion 4 states 'WHEN
查询顶点的结果超过默认限制(如 100 个)' without clearly defining when and how pagination should be
triggered. Is this limit hard-enforced, or advisory? Who decides the 'default limit'?
Clarify this to ensure consistent implementation and testing.
Did we get this right? 👍 / 👎 to inform future reviews.
| 5. WHEN 返回样本数据时数据量超过限制 THEN 系统应在返回结果中包含说明信息,告知这只是部分数据 | ||
|
|
||
| ### 需求 6: 自然语言转 Gremlin 查询 | ||
|
|
||
| **用户故事:** 作为 LLM 用户,我希望能够使用自然语言描述查询需求,系统自动生成对应的 Gremlin 查询,以便更便捷地操作图数据库。 | ||
|
|
||
| #### 验收标准 | ||
|
|
||
| 1. WHEN LLM 调用 `text_to_gremlin` 工具并提供自然语言查询 THEN 系统应调用现有的 text2gremlin flow 生成 Gremlin 查询 | ||
| 2. WHEN 生成 Gremlin 查询 THEN 系统应同时返回模板化的查询和原始查询 | ||
| 3. WHEN LLM 通过 MCP Resource 访问 `hugegraph://gremlin-examples` THEN 系统应返回预定义的 Gremlin 查询示例 | ||
| 4. WHEN 自然语言查询模糊或无法转换 THEN 系统应返回可能的查询建议或要求用户澄清 | ||
| 5. WHEN 生成的 Gremlin 查询可能修改数据 THEN 系统应在返回结果中添加警告信息 | ||
|
|
There was a problem hiding this comment.
Requirement 6 (Natural Language to Gremlin) Acceptance Criterion 4 mentions 'WHEN 自然语言查询模糊或无法转换 THEN 系统应返回可能的查询建议或要求用户澄清' but doesn't specify how the system should rank or present suggestions. Without clear criteria, this acceptance standard is difficult to test. Recommend defining: (1) how suggestions are generated and ranked, (2) how many suggestions should be returned, and (3) confidence thresholds.
Severity: MEDIUM
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: .spec/hugegraph-mcp/graph_mcp/requirements.md#L85-L98
Potential issue: Requirement 6 (Natural Language to Gremlin) Acceptance Criterion 4
mentions 'WHEN 自然语言查询模糊或无法转换 THEN 系统应返回可能的查询建议或要求用户澄清' but doesn't specify how the
system should rank or present suggestions. Without clear criteria, this acceptance
standard is difficult to test. Recommend defining: (1) how suggestions are generated and
ranked, (2) how many suggestions should be returned, and (3) confidence thresholds.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Code Review
This pull request renames the spec directory to .spec and adds two new documents: one for MCP server requirements and another defining rules for an AI agent. The changes to .gitignore correctly accommodate the directory rename. The new requirements document for the HugeGraph MCP Server is very detailed and well-structured. I've added a few suggestions to further refine the requirements, particularly around pagination and configurability, to ensure the resulting implementation is robust and flexible. The AI agent rules document is also comprehensive.
| 3. WHEN LLM 调用 `query_vertices` 工具并提供查询条件(label, property filters) THEN 系统应返回符合条件的顶点列表 | ||
| 4. WHEN 查询顶点的结果超过默认限制(如 100 个) THEN 系统应自动分页并返回前 N 个结果及提示信息 |
There was a problem hiding this comment.
The acceptance criterion for query_vertices mentions pagination but doesn't specify the mechanism for retrieving subsequent pages (e.g., using page numbers, offsets, or cursors). To make the requirement clearer for implementation, it would be beneficial to define how a client can request the next page of results.
|
|
||
| 1. WHEN LLM 调用 `get_edge` 工具并提供边 ID THEN 系统应返回该边的完整信息(包括 label, source, target 和 properties) | ||
| 2. WHEN LLM 调用 `get_edges` 工具并提供多个边 ID THEN 系统应批量返回这些边的信息 | ||
| 3. WHEN LLM 调用 `query_edges` 工具并提供查询条件(label, source_id, target_id, property filters) THEN 系统应返回符合条件的边列表 |
There was a problem hiding this comment.
| 1. WHEN LLM 调用 `shortest_path` 工具并提供起始顶点 ID 和目标顶点 ID THEN 系统应返回最短路径(包括路径上的顶点和边) | ||
| 2. WHEN 两个顶点之间不存在路径 THEN 系统应返回空路径并说明原因 | ||
| 3. WHEN LLM 调用 `k_neighbor` 工具并提供顶点 ID 和深度 k THEN 系统应返回 k 度邻居的所有顶点 | ||
| 4. WHEN k 值过大(如 >5) THEN 系统应限制最大深度并返回警告信息 |
There was a problem hiding this comment.
Hardcoding the maximum depth k to 5 for the k_neighbor tool might be too restrictive for some graph analysis use cases. It would be more flexible to make this limit a configurable parameter with a sensible default, rather than a fixed value. This would allow users to adjust it based on their specific needs and graph structure.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
.gitignore (1)
40-42: 改进隐藏目录的可发现性注释。虽然
.gitignore规则在功能上是正确的,但将需求文档放在隐藏目录(.spec/)中可能会降低新开发者对这些文件的发现性。建议在此处添加更详细的注释,说明为什么选择隐藏目录,或者在项目的 README 或 CONTRIBUTING 指南中添加指向.spec/目录的指针。.spec/hugegraph-mcp/graph_mcp/requirements.md (1)
1-179: 添加可量化的指标和 SLA 以澄清验收标准。需求文档框架完善,但许多验收标准缺乏可量化的指标、阈值或性能预期。例如:
- 需求 3(边查询)未指定结果限制或超时值
- 需求 5(统计信息)提及"最多 10000 个顶点 ID"和"最多 200 个边 ID",但未说明这些限制的理由或强制方式
- 需求 6(自然语言转 Gremlin)未指定如何生成、排名和返回建议
建议在文档中添加一个专门的性能 SLA、结果限制和超时值章节,以使验收测试更加明确和可追踪。
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.gitignore(1 hunks).spec/hugegraph-mcp/graph_mcp/requirements.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
.spec/hugegraph-mcp/graph_mcp/requirements.md
[uncategorized] ~166-~166: 动词的修饰一般为‘形容词(副词)+地+动词’。您的意思是否是:清晰"地"失败
Context: ...达到核心功能代码的 80% 以上 5. WHEN 测试失败 THEN 应输出清晰的失败原因和堆栈信息 ### 需求 12: 文档和示例 用户故事: 作...
(wb4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
🔇 Additional comments (1)
.spec/hugegraph-mcp/graph_mcp/requirements.md (1)
126-140: 验证 MCP 协议规范版本与实现的一致性。需求文档参考了 MCP 协议的各个方面(工具、资源、STDIO、HTTP 等)。最新的 MCP 规范版本发布于 2025 年 6 月 18 日,重点是结构化工具输出、基于 OAuth 的授权、用户交互相关的引发机制和改进的安全最佳实践。建议确保实现符合最新的 MCP 规范,特别是在安全和授权方面。如果项目使用较旧的 MCP 版本,请在需求文档中明确指定版本号,以避免与新客户端的兼容性问题。
| 2. WHEN 运行集成测试 THEN 系统应能够连接真实的 HugeGraph 实例并执行端到端测试 | ||
| 3. WHEN 使用 @modelcontextprotocol/inspector 工具 THEN 应能够成功连接并测试所有工具 | ||
| 4. WHEN 执行测试时 THEN 测试覆盖率应达到核心功能代码的 80% 以上 | ||
| 5. WHEN 测试失败 THEN 应输出清晰的失败原因和堆栈信息 |
There was a problem hiding this comment.
修正语法错误。
第 166 行:"应输出清晰的失败原因" 应改为"应清晰地输出失败原因"或"应输出失败原因"。根据中文语法,动词的修饰需要使用副词加"地"的形式,或简化表述。
建议修改为:
5. WHEN 测试失败 THEN 应清晰地输出失败原因和堆栈信息
或
5. WHEN 测试失败 THEN 应输出失败原因和堆栈信息(必须清晰)
以下是建议的修复:
-5. WHEN 测试失败 THEN 应输出清晰的失败原因和堆栈信息
+5. WHEN 测试失败 THEN 应清晰地输出失败原因和堆栈信息🧰 Tools
🪛 LanguageTool
[uncategorized] ~166-~166: 动词的修饰一般为‘形容词(副词)+地+动词’。您的意思是否是:清晰"地"失败
Context: ...达到核心功能代码的 80% 以上 5. WHEN 测试失败 THEN 应输出清晰的失败原因和堆栈信息 ### 需求 12: 文档和示例 用户故事: 作...
(wb4)
🤖 Prompt for AI Agents
In .spec/hugegraph-mcp/graph_mcp/requirements.md around line 166, the Chinese
phrasing "应输出清晰的失败原因" is grammatically incorrect; update the sentence to use an
adverbial form or simplify it — for example change it to "5. WHEN 测试失败 THEN
应清晰地输出失败原因和堆栈信息" or "5. WHEN 测试失败 THEN 应输出失败原因和堆栈信息(必须清晰)" so the verb is
properly modified.
735db9e to
99b6922
Compare
Updated configuration instructions and file paths for MCP.
7caa2c9 to
f8d2c5c
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
hugegraph-python-client/src/pyhugegraph/utils/log.py (1)
120-132: 建议补充目录创建失败时的警告日志。虽然之前的审查评论标记为"已解决",但当前代码在日志目录创建失败时仍然没有记录任何警告信息。这会导致以下问题:
- 用户期望文件日志但不知道实际失败了
- 降低了可观测性和调试体验
- 在受限环境(如只读文件系统、容器化部署)中,静默回退使问题难以诊断
建议在捕获 OSError 后记录警告:
try: os.makedirs(os.path.dirname(log_filename), exist_ok=True) except OSError: # If we can't create the log directory (e.g., read-only filesystem), # fall back to console-only logging + warning_msg = f"Failed to create log directory for {log_filename}. Falling back to console-only logging." if stdout_logging: # Already have console handler, just skip file logging - pass + log_instance.warning(warning_msg) else: # No stdout logging and can't create file - create a null handler + import warnings + warnings.warn(warning_msg, RuntimeWarning) null_handler = logging.NullHandler() log_instance.addHandler(null_handler) return log_instancehugegraph-mcp/hugegraph_mcp/server.py (1)
65-70: 全局logging.disable(logging.CRITICAL)过于激进,建议仅抑制特定 logger这里直接调用
logging.disable(logging.CRITICAL)会让进程中所有 logger 完全静默,包括未来可能在同一进程中运行的其他代码或依赖,调试体验会比较差。这一点与之前的 review 反馈一致。可以只针对 FastMCP/HugeGraph 相关 logger 提升级别,同时确保日志输出走 stderr 而不是 stdout,例如:
-logging.disable(logging.CRITICAL) +# Suppress noisy info-level logs from FastMCP / HugeGraph MCP so stdout stays JSON-only +for name in ("fastmcp", "hugegraph_mcp"): + logger = logging.getLogger(name) + logger.setLevel(logging.ERROR) + logger.propagate = False这样既避免干扰 MCP JSON 协议的 stdout,又保留了必要的错误级别日志用于排查问题。
🧹 Nitpick comments (2)
hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (1)
23-293: Gremlin 执行与错误处理整体设计合理,仅有少量可选优化点整体结构清晰:集中在
_execute_gremlin_with_error_handling中统一封装异常到结构化结果,execute_gremlin_read/execute_gremlin_write再做轻量转换,行为和 tests 描述一致,看不出逻辑问题或异常分支遗漏 👍可以考虑的后续优化(非阻塞,仅建议):
_WRITE_KEYWORDS目前用子串匹配(例如"addV")来拦截写操作,在极端情况下如果关键字出现在常量字符串或注释中,可能产生误报;如果后续确实遇到这种情况,可以再演进为更精细的正则或 token 级检测。GremlinExecutor.get_read_client/get_write_client每次都会构造新的PyHugeClient,在当前 MCP 使用场景下没问题;如果之后要在长生命周期、高 QPS 场景下复用同一进程,可以考虑在 executor 内缓存 client 实例,并在出错时重建,以减少连接开销。hugegraph-mcp/README.md (1)
14-16: 为架构示意代码块补充语言标识以消除 markdownlint 报警第 14 行的 fenced code block 只有 ``` 没有指定语言,触发了 markdownlint MD040。可以改成类似:
```text IDE/Claude Desktop → MCP Protocol → HugeGraph MCP Server → HugeGraph Database这样既不影响渲染,又能通过 markdownlint 检查。 </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: CodeRabbit UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 788f3634302ded071c04566d7524d0cba5e4a91d and d385ceed702b9d10515576d549e2633b2cbfe836. </details> <details> <summary>📒 Files selected for processing (12)</summary> * `.gitignore` (1 hunks) * `hugegraph-mcp/README.md` (1 hunks) * `hugegraph-mcp/hugegraph_mcp/gremlin_tools.py` (1 hunks) * `hugegraph-mcp/hugegraph_mcp/schema_tools.py` (1 hunks) * `hugegraph-mcp/hugegraph_mcp/server.py` (1 hunks) * `hugegraph-mcp/pyproject.toml` (1 hunks) * `hugegraph-mcp/tests/test_error_handling.py` (1 hunks) * `hugegraph-mcp/tests/test_execute_gremlin_write.py` (1 hunks) * `hugegraph-mcp/tests/test_execute_schema_operations.py` (1 hunks) * `hugegraph-mcp/tests/test_get_live_schema.py` (1 hunks) * `hugegraph-mcp/tests/test_readonly_mode.py` (1 hunks) * `hugegraph-python-client/src/pyhugegraph/utils/log.py` (1 hunks) </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (5)</summary> * hugegraph-mcp/tests/test_readonly_mode.py * hugegraph-mcp/tests/test_get_live_schema.py * hugegraph-mcp/tests/test_execute_gremlin_write.py * .gitignore * hugegraph-mcp/pyproject.toml </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (4)</summary> <details> <summary>📚 Learning: 2025-09-16T06:40:44.968Z</summary>Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/utils/**/*.py : Place utilities, logging, and decorators under src/hugegraph_llm/utils/**Applied to files:** - `hugegraph-python-client/src/pyhugegraph/utils/log.py` </details> <details> <summary>📚 Learning: 2025-09-16T06:40:44.968Z</summary>Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/tests/**/*.py : Place unit tests under src/tests/ and ensure they are discoverable by unittest/pytest**Applied to files:** - `hugegraph-mcp/tests/test_error_handling.py` </details> <details> <summary>📚 Learning: 2025-09-16T06:40:44.968Z</summary>Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py : Maintain the Text2Gremlin pipeline in src/hugegraph_llm/operators/gremlin_generate_task.py**Applied to files:** - `hugegraph-mcp/hugegraph_mcp/gremlin_tools.py` </details> <details> <summary>📚 Learning: 2025-09-16T06:40:44.968Z</summary>Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/**/*.py : Type-check Python code with mypy**Applied to files:** - `hugegraph-mcp/hugegraph_mcp/gremlin_tools.py` </details> </details><details> <summary>🧬 Code graph analysis (4)</summary> <details> <summary>hugegraph-mcp/hugegraph_mcp/server.py (2)</summary><blockquote> <details> <summary>hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (2)</summary> * `execute_gremlin_read` (235-261) * `execute_gremlin_write` (264-293) </details> <details> <summary>hugegraph-mcp/hugegraph_mcp/schema_tools.py (2)</summary> * `execute_schema_operations` (245-261) * `get_live_schema` (108-134) </details> </blockquote></details> <details> <summary>hugegraph-mcp/tests/test_execute_schema_operations.py (2)</summary><blockquote> <details> <summary>hugegraph-python-client/src/pyhugegraph/structure/response_data.py (1)</summary> * `result` (34-35) </details> <details> <summary>hugegraph-mcp/hugegraph_mcp/schema_tools.py (1)</summary> * `execute_schema_operations` (245-261) </details> </blockquote></details> <details> <summary>hugegraph-mcp/tests/test_error_handling.py (3)</summary><blockquote> <details> <summary>hugegraph-mcp/tests/test_execute_gremlin_write.py (1)</summary> * `exec` (29-31) </details> <details> <summary>hugegraph-mcp/tests/test_execute_gremlin_read.py (1)</summary> * `exec` (29-32) </details> <details> <summary>hugegraph-python-client/src/pyhugegraph/structure/response_data.py (1)</summary> * `result` (34-35) </details> </blockquote></details> <details> <summary>hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (2)</summary><blockquote> <details> <summary>hugegraph-python-client/src/pyhugegraph/client.py (1)</summary> * `PyHugeClient` (49-102) </details> <details> <summary>hugegraph-mcp/hugegraph_mcp/schema_tools.py (2)</summary> * `from_env` (30-48) * `_build_client` (54-73) </details> </blockquote></details> </details><details> <summary>🪛 GitHub Actions: Ruff Code Quality</summary> <details> <summary>hugegraph-mcp/tests/test_execute_schema_operations.py</summary> [error] 1-1: Ruff format check would reformat this file. Run 'ruff format' to fix. (Command: uv run ruff format --check .) </details> <details> <summary>hugegraph-mcp/tests/test_error_handling.py</summary> [error] 1-1: Ruff format check would reformat this file. Run 'ruff format' to fix. (Command: uv run ruff format --check .) </details> </details> <details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> <details> <summary>hugegraph-mcp/README.md</summary> 14-14: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>🔇 Additional comments (2)</summary><blockquote> <details> <summary>hugegraph-mcp/tests/test_error_handling.py (1)</summary><blockquote> `14-139`: **Fix import sorting and line length violations** The test file has solid coverage of error handling scenarios (connection failures, HTTP errors, authentication, readonly mode, validation, syntax, and successful cases). However, two Ruff issues need addressing: 1. **Import block (line 16)**: Organize imports with standard library first, then third-party, then local imports 2. **Line breaks (lines 27, 47, 77)**: Three lines exceed the configured limit and need splitting: - Line 27: `requests.exceptions.ConnectionError("Connection refused")` - Line 47: `requests.exceptions.HTTPError("Internal Server Error", response=mock_response)` - Line 77: `assert any("Check HUGEGRAPH_USER" in suggestion for suggestion in result["suggestions"])` Run `ruff check --fix hugegraph-mcp/tests/test_error_handling.py` to fix the import sorting automatically, then manually break the three long lines. <details> <summary>⛔ Skipped due to learnings</summary>Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/**/*.py : Adhere to ruff code style for Python code</details> </blockquote></details> <details> <summary>hugegraph-mcp/tests/test_execute_schema_operations.py (1)</summary><blockquote> `22-83`: **Test coverage is sound; format the file to resolve CI checks** Both tests effectively verify: - `execute_schema_operations` correctly delegates to `_run_schema_operations` - Full success and partial failure scenarios properly preserve `success`, `errors`, and per-operation `status` fields The test logic matches the implementation exactly. To fix the CI formatting failure, run: ```shell ruff format hugegraph-mcp/tests/test_execute_schema_operations.py
| def execute_schema_operations(operations: list[dict[str, Any]]) -> dict[str, Any]: | ||
| """Execute a sequence of idempotent schema operations. | ||
|
|
||
| Behaviour covered by tests: | ||
| - Delegates actual execution to `_run_schema_operations`. | ||
| - Respects HUGEGRAPH_MCP_READONLY environment variable. | ||
| """ | ||
|
|
||
| result = _run_schema_operations(operations) | ||
|
|
||
| # Normalise result keys a bit so callers always get predictable fields. | ||
| if "errors" not in result: | ||
| result["errors"] = [] | ||
| if "success" not in result: | ||
| result["success"] = not bool(result["errors"]) | ||
|
|
||
| return result |
There was a problem hiding this comment.
execute_schema_operations 未检查 HUGEGRAPH_MCP_READONLY,可能绕过只读保护
注释里写明了该函数会“Respects HUGEGRAPH_MCP_READONLY environment variable”,但当前实现只是直接调用 _run_schema_operations 并规范化 errors/success 字段,没有任何只读检查:
result = _run_schema_operations(operations)虽然在 server.py 中只在非只读模式下注册 execute_schema_operations_tool,但:
- 该函数本身是公开 API(tests 也直接调用),
- 一旦有调用方绕过 MCP server 直接在只读环境里调用
execute_schema_operations,仍然会对 HugeGraph 发送实际的 schema 变更请求。
为保证库层也具备只读防线,建议仿照 gremlin_tools.execute_gremlin_write 在函数开头增加环境变量检查,例如:
def execute_schema_operations(operations: list[dict[str, Any]]) -> dict[str, Any]:
@@
- result = _run_schema_operations(operations)
+ readonly_env = os.getenv("HUGEGRAPH_MCP_READONLY", "").lower()
+ if readonly_env in {"1", "true", "yes"}:
+ raise PermissionError(
+ "HugeGraph MCP server is in read-only mode; schema operations are disabled"
+ )
+
+ result = _run_schema_operations(operations)这样无论是通过 MCP 工具还是直接调用函数,只要设置了只读环境变量,都不会执行任何写操作,安全语义更加一致。
🤖 Prompt for AI Agents
In hugegraph-mcp/hugegraph_mcp/schema_tools.py around lines 245 to 261, the
function execute_schema_operations currently calls _run_schema_operations
unconditionally and therefore ignores the HUGEGRAPH_MCP_READONLY env var; add an
early readonly check at the top of the function that reads
os.environ.get("HUGEGRAPH_MCP_READONLY") (or uses the existing config helper if
present), and if readonly is truthy return a normalized result dict (ensure
"errors" is a list containing a clear readonly error message and "success" is
False) without calling _run_schema_operations so no write schema changes are
performed when the env var is set.
| # 1. Patch os.makedirs to silently skip 'logs' directory creation | ||
| _original_makedirs = os.makedirs | ||
|
|
||
|
|
||
| def _safe_makedirs(name, mode=0o777, exist_ok=False): | ||
| # Silently succeed for 'logs' directory (don't actually create it) | ||
| if isinstance(name, str) and ("logs" in name or name == "logs"): | ||
| return None # Pretend success | ||
| return _original_makedirs(name, mode, exist_ok) | ||
|
|
||
|
|
||
| os.makedirs = _safe_makedirs | ||
|
|
||
| # 2. Patch RotatingFileHandler to return NullHandler for 'logs/' files | ||
| _OriginalRotatingFileHandler = RotatingFileHandler | ||
|
|
||
|
|
||
| class _NoOpFileHandler(logging.NullHandler): | ||
| """A no-op handler that silently ignores all log records (used to disable file logging in MCP).""" | ||
|
|
||
| def __init__(self, *args, **kwargs): | ||
| # Ignore all arguments (filename, maxBytes, etc.) and just create a NullHandler | ||
| super().__init__() | ||
|
|
||
|
|
||
| def _patched_rotating_handler(filename, *args, **kwargs): | ||
| # If the filename contains 'logs', disable file logging by returning a no-op handler | ||
| if "logs" in str(filename): | ||
| return _NoOpFileHandler() | ||
| return _OriginalRotatingFileHandler(filename, *args, **kwargs) | ||
|
|
||
|
|
||
| logging.handlers.RotatingFileHandler = _patched_rotating_handler |
There was a problem hiding this comment.
os.makedirs / RotatingFileHandler 的 monkey-patch 匹配过宽,可能误伤其他目录
目前 _safe_makedirs 使用:
if isinstance(name, str) and ("logs" in name or name == "logs"):
return None只要路径字符串里包含 "logs" 就会直接“假成功”,例如 "/var/www/blogs"、"/tmp/user-logs" 等路径也会被跳过,这在未来如果有其他依赖尝试创建这些目录时会比较隐蔽且难以排查。同样地,_patched_rotating_handler 里对 filename 的 "logs" 子串匹配也有类似问题。
建议将匹配收窄到真正的 logs 目录段,例如只检查最后一段路径名或 path segments,而不是任意子串。示例改法:
-from logging.handlers import RotatingFileHandler
+from logging.handlers import RotatingFileHandler
+from pathlib import Path
@@
-def _safe_makedirs(name, mode=0o777, exist_ok=False):
- # Silently succeed for 'logs' directory (don't actually create it)
- if isinstance(name, str) and ("logs" in name or name == "logs"):
- return None # Pretend success
- return _original_makedirs(name, mode, exist_ok)
+def _safe_makedirs(name, mode=0o777, exist_ok=False):
+ # Silently succeed when the last path component is exactly 'logs'
+ if isinstance(name, str) and Path(name).name == "logs":
+ return None # Pretend success
+ return _original_makedirs(name, mode, exist_ok)
@@
-def _patched_rotating_handler(filename, *args, **kwargs):
- # If the filename contains 'logs', disable file logging by returning a no-op handler
- if "logs" in str(filename):
- return _NoOpFileHandler()
+def _patched_rotating_handler(filename, *args, **kwargs):
+ # If the path contains a 'logs' directory segment, disable file logging by returning a no-op handler
+ path = Path(str(filename))
+ if "logs" in path.parts:
+ return _NoOpFileHandler()
return _OriginalRotatingFileHandler(filename, *args, **kwargs)这样既能拦截 HugeGraph 这类写入 logs/*.log 的场景,又大幅降低对无关目录的误伤概率。
🤖 Prompt for AI Agents
In hugegraph-mcp/hugegraph_mcp/server.py around lines 25 to 57, the monkey-patch
matches any path that contains the substring "logs" which can incorrectly skip
unrelated directories or handlers; change the checks to match path segments
instead: use pathlib.Path to normalize the path and check if any path part
equals "logs" (or check parent.name == "logs" / last directory segment) before
short-circuiting, and for os.makedirs ensure you forward mode and exist_ok to
the original call when not matching; apply the same segment-based check for the
RotatingFileHandler filename and only return the no-op handler when the filename
lives under a path segment named "logs".
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
hugegraph-mcp/hugegraph_mcp/schema_tools.py (1)
245-261: 缺少只读模式检查,与文档描述不符。函数注释声称"Respects HUGEGRAPH_MCP_READONLY environment variable"(line 250),但实际实现中直接调用
_run_schema_operations而没有进行任何只读检查。这与gremlin_tools.py中execute_gremlin_write的行为不一致(该函数在 lines 274-278 检查了只读模式)。虽然
server.py只在非只读模式下注册工具,但如果有调用方直接使用此函数(例如测试代码),仍可能在只读环境中执行 schema 变更操作。建议在函数开头添加只读检查:
def execute_schema_operations(operations: list[dict[str, Any]]) -> dict[str, Any]: """Execute a sequence of idempotent schema operations. Behaviour covered by tests: - Delegates actual execution to `_run_schema_operations`. - Respects HUGEGRAPH_MCP_READONLY environment variable. """ + # Enforce read-only mode at the library level + readonly_env = os.getenv("HUGEGRAPH_MCP_READONLY", "").lower() + if readonly_env in {"1", "true", "yes"}: + return { + "success": False, + "errors": [{ + "message": "HugeGraph MCP server is in read-only mode; schema operations are disabled" + }], + "results": [] + } + result = _run_schema_operations(operations)hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (1)
23-50: 配置类字段命名与 schema_tools 不一致。
HugeGraphGremlinConfig使用password和graph字段,而schema_tools.py中的HugeGraphMCPConfig使用graph_pwd和graph_name。虽然两者功能相同,但字段名不一致可能导致使用混淆。建议统一命名约定,例如都使用
graph_name和graph_password(或graph_pwd),以提高代码一致性。
🧹 Nitpick comments (2)
hugegraph-mcp/hugegraph_mcp/schema_tools.py (1)
54-73: 建议提取重复的客户端构建逻辑。此函数与
gremlin_tools.py中的GremlinExecutor._build_client()(lines 66-83)逻辑几乎完全相同。考虑提取到共享工具模块(例如client_utils.py)以减少重复代码,便于统一维护 graphspace 处理逻辑。可以创建一个共享的客户端工厂函数:
# hugegraph_mcp/client_utils.py def build_hugegraph_client( url: str, graph: str, user: str, pwd: str, graphspace: str | None = None ) -> PyHugeClient: if graphspace and graphspace.strip(): return PyHugeClient( url=url, graph=graph, user=user, pwd=pwd, graphspace=graphspace.strip() ) return PyHugeClient(url=url, graph=graph, user=user, pwd=pwd)然后在两个模块中调用此函数。
hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (1)
103-103: 写操作关键字检测可以增强。当前的
_WRITE_KEYWORDS使用简单的子字符串匹配来识别写操作。这种方法可能存在以下局限:
- 可能遗漏:如
drop().iterate()、sideEffect()等写操作- 潜在误判:如果关键字出现在字符串字面量或注释中
可以考虑:
- 扩充关键字列表,包含更多写操作(如 "sideEffect", "aggregate", "store")
- 使用更精确的解析(如 AST 分析),但这会增加复杂度
当前实现对于大多数场景已足够,可以根据实际需求逐步完善。
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
hugegraph-mcp/hugegraph_mcp/gremlin_tools.py(1 hunks)hugegraph-mcp/hugegraph_mcp/schema_tools.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py : Maintain the Text2Gremlin pipeline in src/hugegraph_llm/operators/gremlin_generate_task.py
Applied to files:
hugegraph-mcp/hugegraph_mcp/gremlin_tools.py
🧬 Code graph analysis (2)
hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (2)
hugegraph-python-client/src/pyhugegraph/client.py (1)
PyHugeClient(49-102)hugegraph-mcp/tests/test_execute_gremlin_read.py (1)
exec(29-32)
hugegraph-mcp/hugegraph_mcp/schema_tools.py (6)
hugegraph-python-client/src/pyhugegraph/client.py (1)
PyHugeClient(49-102)hugegraph-python-client/src/pyhugegraph/structure/response_data.py (1)
result(34-35)hugegraph-python-client/src/pyhugegraph/api/schema_manage/property_key.py (3)
asInt(45-47)asDouble(60-62)asText(65-67)hugegraph-python-client/src/pyhugegraph/structure/edge_label_data.py (1)
frequency(50-51)hugegraph-python-client/src/pyhugegraph/structure/index_label_data.py (1)
fields(45-46)hugegraph-python-client/src/pyhugegraph/api/schema_manage/index_label.py (4)
onV(30-33)onE(36-39)by(42-48)secondary(51-53)
🔇 Additional comments (3)
hugegraph-mcp/hugegraph_mcp/schema_tools.py (1)
21-48: 配置类实现合理,已修复了之前的默认值不一致问题。
HugeGraphMCPConfig的默认值现在与from_env()和文档保持一致(http://127.0.0.1:8080),之前评审中指出的内网 IP 问题已解决。graphspace 的处理逻辑也很清晰。hugegraph-mcp/hugegraph_mcp/gremlin_tools.py (2)
85-89: 读写客户端分离设计合理,便于未来扩展。虽然当前
get_read_client()和get_write_client()返回相同的客户端实例,但这种分离为将来差异化配置(如不同的超时设置、连接池大小、甚至不同的端点)提供了良好的扩展点。
235-262: 读写操作分离清晰,安全检查到位。两个公共 API 实现良好:
execute_gremlin_read:在执行前验证查询不包含写操作关键字,防止误用execute_gremlin_write:强制执行只读模式检查,确保在受保护环境中不会执行写操作返回格式统一,错误处理完善,符合预期的安全语义。
Also applies to: 264-293
| except requests.exceptions.ConnectionError: | ||
| return { | ||
| "success": False, | ||
| "error_type": "connection_error", | ||
| "message": f"Cannot connect to HugeGraph server at " | ||
| f"{client._url if hasattr(client, '_url') else 'unknown address'}", | ||
| "suggestions": [ | ||
| "Check if HugeGraph server is running", | ||
| "Verify the HUGEGRAPH_URL environment variable", | ||
| "Check network connectivity to the server", | ||
| ], | ||
| "duration_ms": (time.time() - start) * 1000.0, | ||
| "operation_type": operation_type, | ||
| } |
There was a problem hiding this comment.
避免访问客户端的私有属性。
Line 144 中通过 client._url 访问客户端的内部属性来获取 URL 信息。这种做法依赖于 PyHugeClient 或 GremlinManager 的内部实现细节,可能导致以下问题:
- 如果上游库修改内部结构,代码会失效
- 降低了代码的健壮性
建议使用更安全的方式获取 URL:
- f"{client._url if hasattr(client, '_url') else 'unknown address'}",
+ f"{_cfg.url}/gremlin",直接使用配置中的 URL 更加可靠且不依赖内部实现。
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In hugegraph-mcp/hugegraph_mcp/gremlin_tools.py around lines 139 to 152, avoid
accessing the client's private attribute client._url; instead obtain the server
URL from a public source (e.g., a public client attribute if available, the
configuration object/parameter used to create the client, or the HUGEGRAPH_URL
environment variable) and use that value in the error message, falling back to
'unknown address' if none is available; update the code to read the URL from
those safe sources and, if you read the environment, ensure os is imported at
the top.
Reformat 4 test files to pass Ruff Code Quality CI: - tests/test_error_handling.py - tests/test_execute_gremlin_read.py - tests/test_execute_gremlin_write.py - tests/test_execute_schema_operations.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hugegraph-mcp/tests/test_execute_schema_operations.py (1)
14-19: sys.path 操作方式可接受,但建议后续考虑改进。当前使用
sys.path.insert来添加项目根目录是一种常见的测试配置方式。如果项目配置了pyproject.toml或setup.py的可编辑安装(pip install -e .),则可以通过 pytest 直接发现模块,从而移除这段路径操作代码。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hugegraph-mcp/tests/test_execute_schema_operations.py` around lines 14 - 19, Remove the ad-hoc sys.path manipulation by relying on proper package discovery; specifically replace the block that defines PROJECT_ROOT and calls sys.path.insert (the variables PROJECT_ROOT and the call sys.path.insert(0, str(PROJECT_ROOT))) by ensuring the project is installed in editable mode (pip install -e .) or configure pytest to use the package (pyproject.toml/setup.py), then delete the import-time path injection from tests/test_execute_schema_operations.py so imports resolve via normal package discovery.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hugegraph-mcp/tests/test_execute_schema_operations.py`:
- Around line 14-19: Remove the ad-hoc sys.path manipulation by relying on
proper package discovery; specifically replace the block that defines
PROJECT_ROOT and calls sys.path.insert (the variables PROJECT_ROOT and the call
sys.path.insert(0, str(PROJECT_ROOT))) by ensuring the project is installed in
editable mode (pip install -e .) or configure pytest to use the package
(pyproject.toml/setup.py), then delete the import-time path injection from
tests/test_execute_schema_operations.py so imports resolve via normal package
discovery.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 57d45e6b-b9f0-4a7a-8c6e-2db5957f09d4
📒 Files selected for processing (4)
hugegraph-mcp/tests/test_error_handling.pyhugegraph-mcp/tests/test_execute_gremlin_read.pyhugegraph-mcp/tests/test_execute_gremlin_write.pyhugegraph-mcp/tests/test_execute_schema_operations.py
🚧 Files skipped from review as they are similar to previous changes (3)
- hugegraph-mcp/tests/test_execute_gremlin_write.py
- hugegraph-mcp/tests/test_execute_gremlin_read.py
- hugegraph-mcp/tests/test_error_handling.py
- Add design_schema() function in schema_tools.py with best practices documentation - Add design_schema_tool() MCP tool wrapper in server.py - Update README.md with new feature description - Include usage guidelines: when to use, when not to use, and workflow
rename spec folder & add mcp requirements
Summary by CodeRabbit
新功能
文档
测试
修复
杂项