fix(graph): improve property graph JSON parsing robustness for LLM outputs by linmengmeng-1314 · Pull Request #332 · apache/hugegraph-ai

linmengmeng-1314 · 2026-05-18T11:57:36Z

Summary

- Improve _extract_and_filter_label to handle varying LLM output formats
- Strip markdown code blocks before JSON extraction
- Support both {"vertices":[...], "edges":[...]} (object) and flat array formats
- Auto-convert flat arrays to the expected object structure

Problem

When using reasoning models (e.g., DeepSeek V4) for graph extraction, the LLM may return:

- JSON wrapped in markdown code blocks (```json ... ```), which breaks the greedy regex ({.*})
- A flat array [vertex, edge, ...] instead of the expected object {"vertices": [...], "edges": [...]}

Both cases cause json.JSONDecodeError and result in empty extraction output even though the LLM correctly identified entities and relationships.
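A minimal reproduction of the DeepSeek case from the test plan (a flat array inside a ```json fence), assuming the pre-fix step runs the greedy pattern {.*} with re.DOTALL and passes the match straight to json.loads. The PR names the regex but not the surrounding call, so legacy_parse and the vertex/edge field names below are hypothetical.

```python
import json
import re

# Assumption: the pre-fix extractor takes everything from the first "{" to the
# last "}" (greedy, DOTALL) and passes that substring to json.loads.
LEGACY_PATTERN = re.compile(r"\{.*\}", re.DOTALL)

# Hypothetical DeepSeek-style response: a flat array inside a json code fence.
llm_output = (
    "```json\n"
    '[{"type": "vertex", "label": "person", "properties": {"name": "Alice"}},\n'
    ' {"type": "edge", "label": "knows", "source": "Alice", "target": "Bob"}]\n'
    "```"
)


def legacy_parse(text):
    """Hypothetical stand-in for the pre-fix parsing step."""
    match = LEGACY_PATTERN.search(text)
    if match is None:
        return None
    # For the array above, the match is "{...vertex...},\n {...edge...}":
    # two objects joined by a comma, which is not a single JSON value.
    return json.loads(match.group(0))


try:
    legacy_parse(llm_output)
except json.JSONDecodeError as exc:
    print(f"legacy parse failed: {exc.msg} (char {exc.pos})")
```

The vertex and edge were both identified correctly; the failure comes entirely from the greedy match spanning two array elements instead of one JSON value.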

Solution

- Strip markdown code fences (```json / ```) before regex matching
- Update the regex to match both objects ({...}) and arrays ([...])
- When a flat array is detected, partition its items into vertices and edges by their type field
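The three steps above could look roughly like the sketch below. It is not the code from this PR: parse_graph_output, strip_code_fences and normalize_graph are hypothetical names, the object-or-array regex is one plausible form of the updated pattern, and the flat-array split assumes each item carries "type": "vertex" or "type": "edge".

```python
import json
import re

# Hypothetical names: the PR changes _extract_and_filter_label, but its
# signature is not shown, so this sketch uses standalone helpers.
FENCE_PATTERN = re.compile(r"```(?:json)?", re.IGNORECASE)
# Greedy object-or-array match. re.search returns the leftmost match, so a
# payload starting with "{" is captured as an object and one starting with
# "[" as an array.
JSON_PATTERN = re.compile(r"\{.*\}|\[.*\]", re.DOTALL)


def strip_code_fences(text):
    """Drop the json fence markers and keep the payload between them."""
    return FENCE_PATTERN.sub("", text).strip()


def normalize_graph(data):
    """Return a dict with "vertices" and "edges" keys for either JSON shape."""
    if isinstance(data, dict):
        # Object format: keep existing keys, default any that are missing.
        return {"vertices": [], "edges": [], **data}
    if isinstance(data, list):
        # Flat array: assumes each item has type "vertex" or "edge";
        # items with any other type (or non-dict items) are dropped.
        items = [item for item in data if isinstance(item, dict)]
        return {
            "vertices": [item for item in items if item.get("type") == "vertex"],
            "edges": [item for item in items if item.get("type") == "edge"],
        }
    raise ValueError(f"unsupported JSON root type: {type(data).__name__}")


def parse_graph_output(text):
    """Strip fences, extract the JSON payload, and normalize its shape."""
    match = JSON_PATTERN.search(strip_code_fences(text))
    if match is None:
        return {"vertices": [], "edges": []}
    # json.JSONDecodeError still propagates if the captured text is invalid.
    return normalize_graph(json.loads(match.group(0)))


if __name__ == "__main__":
    fenced_array = (
        "```json\n"
        '[{"type": "vertex", "label": "person"}, {"type": "edge", "label": "knows"}]\n'
        "```"
    )
    print(parse_graph_output(fenced_array))
```

The dict branch leaves object-format responses unchanged apart from defaulting missing keys, which matches the "existing behavior should be preserved" item in the test plan. Both alternatives stay greedy, so trailing prose that contains a } or ] would still be swallowed; json.JSONDecoder.raw_decode is one alternative for that case, not something this PR proposes.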

Test plan

- Test with OpenAI models (existing behavior should be preserved)
- Test with DeepSeek models (markdown-wrapped array format)
- Test with Ollama models
- Verify both object and array formats are handled correctly
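For the "existing behavior should be preserved" item, a small regression check could compare what the old and widened patterns capture. This is a sketch assuming both patterns below (the PR does not quote the new one): on object-shaped responses the captures agree, on a flat array only the widened pattern returns the whole array, and a "[" in prose before the JSON is one input where the two diverge and may be worth adding to the plan.

```python
import re

# Both patterns are assumptions: the PR names the legacy regex ({.*}) and says
# the new one matches objects and arrays, but does not show its exact form.
LEGACY_PATTERN = re.compile(r"\{.*\}", re.DOTALL)
WIDENED_PATTERN = re.compile(r"\{.*\}|\[.*\]", re.DOTALL)

# Object-shaped responses (the OpenAI case): both patterns should agree.
OBJECT_CASES = [
    '{"vertices": [], "edges": []}',
    'Here is the graph:\n{"vertices": [{"label": "person"}], "edges": []}',
]
# Flat-array response: only the widened pattern captures the whole array.
ARRAY_CASE = '[{"type": "vertex"}, {"type": "edge"}]'
# A "[" in prose before the object moves the leftmost match into the prose.
CAVEAT_CASE = '[note] {"vertices": [], "edges": []}'


def captured(pattern, text):
    """Return the substring a pattern would hand to json.loads, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


for case in OBJECT_CASES:
    assert captured(WIDENED_PATTERN, case) == captured(LEGACY_PATTERN, case)

print("legacy on array:  ", captured(LEGACY_PATTERN, ARRAY_CASE))
print("widened on array: ", captured(WIDENED_PATTERN, ARRAY_CASE))
print("widened on caveat:", captured(WIDENED_PATTERN, CAVEAT_CASE))
```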

🤖 Generated with Claude Code

…tputs

Different LLMs return graph extraction results in varying formats:
- Some wrap JSON in markdown code blocks (```json ... ```)
- Some return a flat array of vertices/edges instead of a structured object

This causes json.JSONDecodeError when the greedy regex ({.*}) captures invalid content from markdown-wrapped or array-formatted responses.

Changes:
- Strip markdown code blocks before JSON extraction
- Support both object ({...}) and array ([...]) JSON formats
- Auto-convert flat arrays to {"vertices": [...], "edges": [...]} format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

dosubot Bot added the size:S (This PR changes 10-29 lines, ignoring generated files.) and bug (Something isn't working) labels May 18, 2026

linmengmeng-1314 mentioned this pull request May 18, 2026 in #333, "LLM output format compatibility issues with reasoning models (DeepSeek, etc.)" (Open)

github-actions Bot added the llm label May 18, 2026

imbajin requested a review from Copilot May 18, 2026 13:31

Copilot started reviewing on behalf of imbajin May 18, 2026 13:31

Copilot AI reviewed May 18, 2026


linmengmeng-1314 wants to merge 1 commit into apache:main from linmengmeng-1314:fix/graph-extract-json-parsing


3 participants
