
Conversation

@tzulingk tzulingk commented Dec 6, 2025

Overview:

Fixes backend error handling so clients receive actual TensorRT-LLM error messages instead of cryptic deserialization errors. When backends encounter errors (e.g., prompt exceeds max_seq_len), clients now see the real error message with proper HTTP status codes.

Details:

Problem:

  • Clients received "invalid type: unit variant, expected newtype variant" instead of actual backend error messages
  • Root cause: Python was sending {"finish_reason": "error"} (unit variant), but Rust expects {"finish_reason": {"error": "msg"}} (newtype variant) for FinishReason::Error(String); see the sketch after this list
  • Streaming requests had no error detection at all
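
For context, here is a minimal serde sketch of the mismatch. It is illustrative only: the variant set and attributes are not the actual FinishReason definition in the codebase, and it assumes serde/serde_json as dependencies.

use serde::Deserialize;

// Illustrative only; the real FinishReason lives in the dynamo protocol types.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "snake_case")]
enum FinishReason {
    Stop,
    Length,
    Error(String), // newtype variant: JSON form is {"error": "<message>"}
}

fn main() {
    // Old Python payload: a bare string, i.e. a unit-variant encoding.
    let old: Result<FinishReason, _> = serde_json::from_str(r#""error""#);
    assert!(old.is_err()); // "invalid type: unit variant, expected newtype variant"

    // New Python payload: an externally tagged map carrying the message.
    let new: FinishReason =
        serde_json::from_str(r#"{"error": "prompt exceeds max_seq_len"}"#).unwrap();
    println!("{:?}", new); // prints: Error("prompt exceeds max_seq_len")
}

With the map form, serde can deserialize the backend message into the newtype variant, so the frontend has an actual error string to return to the client.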

Changes:

  1. Python Handler (components/src/dynamo/trtllm/request_handlers/handler_base.py):

    • Changed error responses from {"finish_reason": "error"} to {"finish_reason": {"error": error_msg}}
    • Properly serializes into Rust's FinishReason::Error(String) enum variant
    • Applied to both RequestError and general Exception catch blocks
  2. Rust Error Detection (lib/llm/src/http/service/openai.rs):

    • Added extract_backend_error_if_present() function that detects errors via multiple methods (see the sketch after this list):
      • event: "error" field (from postprocessor)
      • code >= 400 in data/comment payloads
      • Comments with no data (legacy)
    • Added check_for_backend_error() call for streaming requests (previously only non-streaming)
    • Extracts HTTP status codes (400, 404, 500, etc.) from error payloads
    • Added 5 unit tests covering different error scenarios
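
To make the detection strategies concrete, here is a rough, self-contained sketch. The AnnotatedEvent struct and its fields are hypothetical stand-ins; the real function operates on Annotated<NvCreateChatCompletionStreamResponse> events and maps the extracted code onto the service's HTTP error types.

use serde_json::Value;

// Hypothetical stand-in for the real Annotated event wrapper.
struct AnnotatedEvent {
    event: Option<String>,  // SSE event name, e.g. "error"
    data: Option<Value>,    // data payload, if any
    comments: Vec<String>,  // SSE comment lines
}

// Returns (http_status, message) if the event looks like a backend error.
fn extract_backend_error_if_present(ev: &AnnotatedEvent) -> Option<(u16, String)> {
    // Strategy 1: an explicit `event: "error"` emitted by the postprocessor.
    if ev.event.as_deref() == Some("error") {
        let msg = ev
            .data
            .as_ref()
            .and_then(|d| d.get("message").and_then(Value::as_str))
            .unwrap_or("backend error")
            .to_string();
        let code = ev
            .data
            .as_ref()
            .and_then(|d| d.get("code").and_then(Value::as_u64))
            .unwrap_or(500) as u16;
        return Some((code, msg));
    }
    // Strategy 2: a `code >= 400` embedded in the data or comment payloads.
    let payloads = ev
        .data
        .iter()
        .cloned()
        .chain(ev.comments.iter().filter_map(|c| serde_json::from_str(c.as_str()).ok()));
    for payload in payloads {
        if let Some(code) = payload.get("code").and_then(Value::as_u64) {
            if code >= 400 {
                let msg = payload
                    .get("message")
                    .and_then(Value::as_str)
                    .unwrap_or("backend error")
                    .to_string();
                return Some((code as u16, msg));
            }
        }
    }
    // Strategy 3 (legacy): comments present but no data at all.
    if ev.data.is_none() && !ev.comments.is_empty() {
        return Some((500, ev.comments.join(" ")));
    }
    None
}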

Testing:

# All tests pass
cargo test --package dynamo-llm --lib http::service::openai::tests::test_check_for_backend_error

Run the frontend and worker

# run frontend
python3 -m dynamo.frontend --router-mode kv --http-port 8080 >& tzuling_frontend.log &
# run worker
python3 -m dynamo.trtllm \
  --model-path "$MODEL_PATH" \
  --served-model-name "$SERVED_MODEL_NAME" \
  --extra-engine-args "$AGG_ENGINE_ARGS" \
  --max-seq-len 20 \
  --max-num-tokens 100 \
  --publish-events-and-metrics >& tzuling_backend.log &

Send a request whose prompt exceeds max_seq_len:
curl -s localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [
    {
      "role": "developer",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello! It’s a wonderful day. How are you today? Have you had breakfast?"
    }
  ],
  "stream": false,
  "max_tokens": 300
}'

Before: {"message":"invalid type: unit variant...","code":500}
After: {"message":"The sum of prompt length (79.0)... should not exceed max_num_tokens (23)","code":500}

Where should the reviewer start?

  1. handler_base.py lines 371-378, 390-395: Python error response format changes - verify the {"finish_reason": {"error": error_msg}} format is correct
  2. openai.rs lines 723-797: The extract_backend_error_if_present() function - verify the error detection logic covers all cases
  3. openai.rs lines 919-923: Error checking added for streaming requests - ensure this doesn't impact performance
  4. openai.rs lines 2139-2221: Unit tests - verify they cover the main error scenarios

Related Issues:

DIS-967

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced error response structure to include detailed error messages instead of generic error labels.
    • Improved backend error detection in streaming chat completions, enabling early error identification and response handling.
    • Strengthened error handling for OpenAI chat completion streams to prevent incomplete or malformed responses.


@tzulingk tzulingk requested review from a team as code owners December 6, 2025 07:05
@github-actions github-actions bot added the fix label Dec 6, 2025
@tzulingk tzulingk enabled auto-merge (squash) December 6, 2025 07:07

coderabbitai bot commented Dec 6, 2025

Walkthrough

Changes error handling behavior in TRT-LLM request handler to structure error messages as dictionaries with an "error" key, and adds backend error detection to the OpenAI service's streaming chat completion handler to catch and propagate errors early from upstream services.

Changes

  • TRT-LLM Error Response Restructuring (components/src/dynamo/trtllm/request_handlers/handler_base.py): Error response payloads now emit finish_reason as an {"error": <error_msg>} dictionary instead of the plain string "error". The exception message is captured as a string in an error_msg variable and propagated through both the per-request and final error response paths.
  • OpenAI Service Backend Error Detection (lib/llm/src/http/service/openai.rs): Adds an extract_backend_error_if_present() function to detect errors from Annotated events, and a check_for_backend_error() async function that inspects the stream's first event and reconstructs the stream if it is valid (see the sketch below). The pre-check is integrated into the chat completions handler before streaming, the NvCreateChatCompletionStreamResponse type is imported, and a test suite covers the error detection variants.
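
The stream handling described above follows a peek-then-rechain pattern. The sketch below is a simplified, generic illustration; the actual function works on Annotated<NvCreateChatCompletionStreamResponse> items and returns an HTTP status with an ErrorMessage body on failure.

use futures::{stream, Stream, StreamExt};

// Hypothetical, simplified signature; the extractor closure stands in for
// extract_backend_error_if_present().
async fn check_for_backend_error<T, E>(
    mut upstream: impl Stream<Item = T> + Unpin,
    extract_error: impl Fn(&T) -> Option<E>,
) -> Result<impl Stream<Item = T>, E> {
    // Peek at the first event only; later events stream through untouched.
    if let Some(first) = upstream.next().await {
        if let Some(err) = extract_error(&first) {
            return Err(err);
        }
        // Re-attach the consumed event so downstream sees the full stream.
        Ok(stream::iter(vec![first]).chain(upstream))
    } else {
        // Empty stream: passed through unchanged, mirroring the PR's
        // "handle gracefully" behavior.
        Ok(stream::iter(vec![]).chain(upstream))
    }
}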

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Stream reconstruction logic — Verify that check_for_backend_error() correctly preserves all stream events while enabling early error detection
  • Error extraction across event types — Confirm extract_backend_error_if_present() handles Annotated event wrappers, JSON-encoded payloads, and status code mappings consistently
  • Test coverage — Ensure tests adequately exercise error event detection, JSON parsing, empty stream edge cases, and the integrated flow in chat completions handler
  • Cross-language impact — Error format change in Python handler may affect downstream consumers expecting the original string format

Poem

🐰 Errors once hidden in plain sight,
Now wrapped in dictionaries, structured tight!
Streams flow safer, faults detected fast,
Backend whispers caught before they pass.
Hop, hop! We've caught them at the gate! ✨

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately describes the main changes: fixing backend error serialization in Python handlers and adding HTTP status code detection in Rust error handling.
  • Description check: ✅ Passed. The PR description is comprehensive and well-structured, covering the overview, detailed problem statement, changes made to both Python and Rust components, testing instructions, and specific file references for reviewers.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/llm/src/http/service/openai.rs (3)

811-811: Remove redundant import.

StreamExt is already imported at file level (line 28: use futures::{StreamExt, stream};).

-    use futures::stream::StreamExt;
-
     // Peek at the first event

830-833: Consider returning an error for empty streams instead of silently succeeding.

An empty stream from the backend likely indicates a problem (e.g., backend crashed before sending any data). Currently this returns Ok with an empty stream, which may mask issues. The downstream code will eventually fail when trying to fold/aggregate an empty response, but the error message won't indicate the backend returned nothing.

     } else {
-        // Empty stream - this shouldn't happen but handle gracefully
-        Ok(futures::stream::iter(vec![]).chain(stream))
+        // Empty stream - this indicates a backend issue
+        Err((
+            StatusCode::BAD_GATEWAY,
+            Json(ErrorMessage {
+                message: "Backend returned empty response stream".to_string(),
+                error_type: map_error_code_to_error_type(StatusCode::BAD_GATEWAY),
+                code: StatusCode::BAD_GATEWAY.as_u16(),
+            }),
+        ))
     }

2183-2312: Good test coverage for core error detection scenarios.

The tests cover the main paths through check_for_backend_error and extract_backend_error_if_present.

Consider adding tests for:

  • Data payload containing {"code": 400, "message": "..."} (strategy 2 detection)
  • Comment containing valid JSON with code < 400 (should NOT be treated as error)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 046229f and 4d52397.

📒 Files selected for processing (2)
  • components/src/dynamo/trtllm/request_handlers/handler_base.py (2 hunks)
  • lib/llm/src/http/service/openai.rs (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.

Applied to files:

  • lib/llm/src/http/service/openai.rs
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

  • lib/llm/src/http/service/openai.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/src/http/service/openai.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: vllm (arm64)
🔇 Additional comments (6)
components/src/dynamo/trtllm/request_handlers/handler_base.py (2)

371-377: LGTM! Error serialization format correctly updated for Rust compatibility.

The change from "finish_reason": "error" to "finish_reason": {"error": error_msg} properly serializes into Rust's FinishReason::Error(String) newtype variant, resolving the deserialization error described in the PR objectives.


388-395: LGTM! Consistent error format applied to general exception handler.

The error structure matches the RequestError path, ensuring uniform error serialization across all error scenarios.

lib/llm/src/http/service/openai.rs (4)

44-47: LGTM! Import added for stream type used in error checking.

The NvCreateChatCompletionStreamResponse import is necessary for the new check_for_backend_error function signature.


940-944: LGTM! Backend error check properly integrated before streaming begins.

The early error detection ensures clients receive meaningful error messages rather than deserialization failures. Logging includes request_id for traceability.


978-993: LGTM! Error message updated for clarity.

The change from "fold" to "parse" better describes the operation and aligns with the function name from_annotated_stream.


791-794: Remove the concern about overly aggressive comment-only fallback.

The fallback at lines 791-794 is well-guarded and intentional. It only triggers after confirming: (1) comments exist and are non-empty, (2) the attempt to parse comments as error JSON failed, and (3) event.data is None. This combination—comments without data on the first event—legitimately indicates an error condition rather than a keep-alive comment. Legitimate SSE implementations would send data alongside keep-alive or metadata comments.
