fix: backend error serialization and add HTTP status code detection #4786
base: main
Conversation
…r error responses Signed-off-by: [email protected] <[email protected]>
Walkthrough
Changes error handling behavior in the TRT-LLM request handler to structure error messages as dictionaries with an "error" key, and adds backend error detection to the OpenAI service's streaming chat completion handler to catch and propagate errors early from upstream services.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/llm/src/http/service/openai.rs (3)
811-811: Remove redundant import.
`StreamExt` is already imported at file level (line 28: `use futures::{StreamExt, stream};`).

```diff
- use futures::stream::StreamExt;
  // Peek at the first event
```
830-833: Consider returning an error for empty streams instead of silently succeeding.
An empty stream from the backend likely indicates a problem (e.g., the backend crashed before sending any data). Currently this returns `Ok` with an empty stream, which may mask issues. The downstream code will eventually fail when trying to fold/aggregate an empty response, but the error message won't indicate that the backend returned nothing.

```diff
 } else {
-    // Empty stream - this shouldn't happen but handle gracefully
-    Ok(futures::stream::iter(vec![]).chain(stream))
+    // Empty stream - this indicates a backend issue
+    Err((
+        StatusCode::BAD_GATEWAY,
+        Json(ErrorMessage {
+            message: "Backend returned empty response stream".to_string(),
+            error_type: map_error_code_to_error_type(StatusCode::BAD_GATEWAY),
+            code: StatusCode::BAD_GATEWAY.as_u16(),
+        }),
+    ))
 }
```
2183-2312: Good test coverage for core error detection scenarios.
The tests cover the main paths through `check_for_backend_error` and `extract_backend_error_if_present`.

Consider adding tests for:
- A data payload containing `{"code": 400, "message": "..."}` (strategy 2 detection)
- A comment containing valid JSON with `code < 400` (should NOT be treated as an error)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `components/src/dynamo/trtllm/request_handlers/handler_base.py` (2 hunks)
- `lib/llm/src/http/service/openai.rs` (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.
Applied to files:
lib/llm/src/http/service/openai.rs
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/http/service/openai.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/src/http/service/openai.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (arm64)
🔇 Additional comments (6)
components/src/dynamo/trtllm/request_handlers/handler_base.py (2)
371-377: LGTM! Error serialization format correctly updated for Rust compatibility.
The change from `"finish_reason": "error"` to `"finish_reason": {"error": error_msg}` properly serializes into Rust's `FinishReason::Error(String)` newtype variant, resolving the deserialization error described in the PR objectives.
388-395: LGTM! Consistent error format applied to general exception handler.
The error structure matches the `RequestError` path, ensuring uniform error serialization across all error scenarios.

lib/llm/src/http/service/openai.rs (4)
44-47: LGTM! Import added for stream type used in error checking.
The `NvCreateChatCompletionStreamResponse` import is necessary for the new `check_for_backend_error` function signature.
940-944: LGTM! Backend error check properly integrated before streaming begins.
The early error detection ensures clients receive meaningful error messages rather than deserialization failures. Logging includes `request_id` for traceability.
978-993: LGTM! Error message updated for clarity.
The change from "fold" to "parse" better describes the operation and aligns with the function name `from_annotated_stream`.
791-794: Remove the concern about overly aggressive comment-only fallback.
The fallback at lines 791-794 is well-guarded and intentional. It only triggers after confirming: (1) comments exist and are non-empty, (2) the attempt to parse comments as error JSON failed, and (3) `event.data` is `None`. This combination (comments without data on the first event) legitimately indicates an error condition rather than a keep-alive comment. Legitimate SSE implementations would send data alongside keep-alive or metadata comments.
Overview:
Fixes backend error handling so clients receive actual TensorRT-LLM error messages instead of cryptic deserialization errors. When backends encounter errors (e.g., prompt exceeds max_seq_len), clients now see the real error message with proper HTTP status codes.
Details:
Problem:
- Clients see `"invalid type: unit variant, expected newtype variant"` instead of actual backend error messages.
- The backend sends `{"finish_reason": "error"}` (unit variant), but Rust expects `{"finish_reason": {"error": "msg"}}` (newtype variant) for `FinishReason::Error(String)`.
Changes:

Python Handler (`components/src/dynamo/trtllm/request_handlers/handler_base.py`):
- Changed `{"finish_reason": "error"}` to `{"finish_reason": {"error": error_msg}}` so it deserializes into the `FinishReason::Error(String)` enum variant.
- Applied to both the `RequestError` and general `Exception` catch blocks.

Rust Error Detection (`lib/llm/src/http/service/openai.rs`):
- Added an `extract_backend_error_if_present()` function that detects errors via multiple methods: an `event: "error"` field (from the postprocessor), and `code >= 400` in data/comment payloads (see the sketch below).
- Added a `check_for_backend_error()` call for streaming requests (previously only non-streaming).
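The `code >= 400` strategy can be illustrated in isolation. This is a hedged sketch, not the actual `openai.rs` implementation; the helper name and the exact parsing logic are assumptions, while the `code`/`message` fields match the payloads shown below under Testing.

```rust
use serde_json::Value;

// Returns Some((status, message)) only when the payload parses as JSON and carries
// an HTTP-style error code of 400 or higher.
fn payload_is_backend_error(payload: &str) -> Option<(u16, String)> {
    let value: Value = serde_json::from_str(payload).ok()?;
    let code = value.get("code")?.as_u64()?;
    if code >= 400 {
        let message = value
            .get("message")
            .and_then(Value::as_str)
            .unwrap_or("backend error")
            .to_string();
        Some((code as u16, message))
    } else {
        None
    }
}

fn main() {
    // A payload with an error code is surfaced to the client...
    assert!(payload_is_backend_error(r#"{"code": 500, "message": "prompt too long"}"#).is_some());
    // ...while sub-400 codes or ordinary completion chunks are passed through untouched.
    assert!(payload_is_backend_error(r#"{"code": 200, "message": "ok"}"#).is_none());
    assert!(payload_is_backend_error(r#"{"choices": []}"#).is_none());
}
```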
Testing:
Run

Before:
`{"message":"invalid type: unit variant...","code":500}`

After:
`{"message":"The sum of prompt length (79.0)... should not exceed max_num_tokens (23)","code":500}`

Where should the reviewer start?
- `handler_base.py` lines 371-378, 390-395: Python error response format changes - verify the `{"finish_reason": {"error": error_msg}}` format is correct
- `openai.rs` lines 723-797: The `extract_backend_error_if_present()` function - verify the error detection logic covers all cases
- `openai.rs` lines 919-923: Error checking added for streaming requests - ensure this doesn't impact performance
- `openai.rs` lines 2139-2221: Unit tests - verify they cover the main error scenarios

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
DIS-967