LLM output format compatibility issues with reasoning models (DeepSeek, etc.) #333

@linmengmeng-1314

Description

When a reasoning model such as DeepSeek V4 is used in the HugeGraph-LLM RAG pipeline, several components fail because the LLM output format is incompatible with what they expect. Reasoning models tend to wrap their output in markdown code blocks or to use JSON structures that differ from what the parsers expect.

Affected Components

1. Uvicorn reload=True in Docker deployment

  • File: hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py:205
  • reload=True enables file watching, which conflicts with Docker bind mounts and causes the service to hang
  • Suggestion: Make reload configurable via an environment variable (disabled by default in production)
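
One way the suggestion above could look is sketched here. The variable name `UVICORN_RELOAD`, the `"app:app"` import string, and the port are placeholders chosen for illustration and are not taken from the project; only `uvicorn.run(..., reload=...)` is the real Uvicorn call.

```python
import os

# Hypothetical variable name; the project may prefer a different one.
RELOAD_ENV_VAR = "UVICORN_RELOAD"


def reload_enabled(environ=None) -> bool:
    """Return True only when the variable holds an explicit truthy value.

    An unset or unrecognised value means "no reload", so a Docker
    deployment stays safe unless someone opts in.
    """
    env = os.environ if environ is None else environ
    value = env.get(RELOAD_ENV_VAR, "").strip().lower()
    return value in {"1", "true", "yes", "on"}


def main() -> None:
    # Imported lazily so the helper above has no third-party dependency.
    import uvicorn

    uvicorn.run(
        "app:app",          # placeholder import string, not the real module path
        host="0.0.0.0",
        port=8001,          # placeholder port
        reload=reload_enabled(),
    )


if __name__ == "__main__":
    main()
```

With this shape, local development would opt in with `UVICORN_RELOAD=true`, while a container that sets nothing runs without the file watcher.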

2. Keyword extraction parser incompatible with reasoning model output

  • File: hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract.py:146-180
  • _extract_keywords_from_response expects the format KEYWORDS: word:score, but reasoning models may wrap the output in a markdown code block or return a different structure, so no keywords are parsed
  • Workaround: Set KEYWORD_EXTRACT_TYPE=textrank instead of llm
  • Suggestion: Make the parser more robust, similar to the graph-extraction fix in #332 (fix(llm): improve graph JSON parsing robustness for LLM outputs)
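
A more tolerant parser could look roughly like the sketch below. It is not the existing `_extract_keywords_from_response` and not the approach taken in #332. The function name `parse_keywords` is hypothetical, and it assumes that pairs are comma-separated, that a model may return a JSON object mapping keyword to score, and that some deployments inline reasoning in `<think>` tags.

```python
import json
import re
from typing import Dict

# Fenced block with an optional language tag on the opening line.
FENCE_RE = re.compile(r"```[\w-]*\n(.*?)```", re.DOTALL)
# Assumption: some deployments inline reasoning inside <think> tags.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# One "word:score" pair; the word may contain spaces but no comma or colon.
PAIR_RE = re.compile(r"([^,:\n]+?)\s*:\s*([0-9]*\.?[0-9]+)")


def parse_keywords(response: str) -> Dict[str, float]:
    """Best-effort extraction of {keyword: score} from an LLM reply."""
    text = THINK_RE.sub("", response)

    # Prefer the contents of the first fenced block, if there is one.
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    text = text.strip()

    # Case 1 (assumed): a JSON object such as {"graph": 0.9}.
    try:
        data = json.loads(text)
    except ValueError:
        data = None
    if isinstance(data, dict):
        return {
            str(key).strip(): float(value)
            for key, value in data.items()
            if isinstance(value, (int, float))
        }

    # Case 2: the plain "KEYWORDS: word:score, word:score" format.
    match = re.search(r"KEYWORDS\s*:\s*(.*)", text, re.IGNORECASE | re.DOTALL)
    body = match.group(1) if match else text
    return {word.strip(): float(score) for word, score in PAIR_RE.findall(body)}
```

For example, `parse_keywords("KEYWORDS: graph:0.9, database:0.8")` and the same text wrapped in a fenced block both yield `{"graph": 0.9, "database": 0.8}`. An empty result can then be treated by the caller as a parse failure and trigger a fallback such as textrank.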

Related PRs

Environment

  • HugeGraph Server: 1.7.0
  • LLM: DeepSeek V4 Pro (via OpenAI-compatible API)
  • Embedding: SiliconFlow BGE-M3

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)