
Conversation

Contributor

@tedzhouhk tedzhouhk commented Dec 10, 2025

  1. Consistent handling of in-flight requests across the 3 transport implementations
  2. Adhere to the 3-phase graceful shutdown process introduced in "chore: many bug fixes and improvements when testing planner" (#2776); the phases are sketched in code below:
  • Phase 1: Cancel endpoint_shutdown_token. The cleanup task wakes up and calls server.unregister_endpoint(), which waits for in-flight requests; then tracker.unregister_endpoint() is called.
  • Phase 2: tracker.get_count() waits while endpoints are still processing.
  • Phase 3: Cancel the main token (now safe; all requests have completed).
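
A minimal sketch of the three phases, assuming tokio_util's CancellationToken; Server, Tracker, and their method names are hypothetical stand-ins for the components referenced above:

```rust
use std::sync::Arc;
use tokio_util::sync::CancellationToken;

// Hypothetical stand-ins for the components named in the phases above.
struct Server;
impl Server {
    // Waits for this endpoint's in-flight requests before returning.
    async fn unregister_endpoint(&self) {}
}
struct Tracker;
impl Tracker {
    async fn unregister_endpoint(&self) {}
    // Blocks until get_count() reports no active requests.
    async fn wait_until_idle(&self) {}
}

async fn graceful_shutdown(
    endpoint_shutdown_token: CancellationToken,
    main_token: CancellationToken,
    server: Arc<Server>,
    tracker: Arc<Tracker>,
) {
    // Phase 1: signal endpoint shutdown; cleanup drains in-flight requests.
    endpoint_shutdown_token.cancel();
    server.unregister_endpoint().await;
    tracker.unregister_endpoint().await;

    // Phase 2: wait until no endpoint is still processing.
    tracker.wait_until_idle().await;

    // Phase 3: all requests completed, so cancelling the main token is safe.
    main_token.cancel();
}
```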

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved graceful shutdown across network endpoints (HTTP, NATS, TCP, and push) to ensure in-flight requests complete before termination.
    • Enhanced LLM mocker shutdown sequencing for more controlled service termination.
  • Chores

    • Added structured logging with endpoint metadata during shutdown operations for better observability.


Signed-off-by: hongkuanz <[email protected]>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 10, 2025
@tedzhouhk tedzhouhk marked this pull request as ready for review December 10, 2025 19:50
@tedzhouhk tedzhouhk requested a review from a team as a code owner December 10, 2025 19:50
Contributor

coderabbitai bot commented Dec 10, 2025

Walkthrough

Changes introduce graceful shutdown mechanisms across the network ingress pipeline and mocker engine by tracking in-flight requests, awaiting their completion before finalization, and adding structured logging throughout shutdown paths. This affects cancellation semantics, join handle management, and request draining behavior.

Changes

Cohort / File(s): Summary

Mocker Engine
lib/llm/src/mocker/engine.rs
Replaced the cancellation token source from runtime.child_token() to runtime.primary_token(), delaying shutdown until Phase 3. Added a log entry and clearing of the active_requests map in the scheduler background task on cancellation, to unblock waiting handlers.

Ingress Endpoints
lib/runtime/src/pipeline/network/ingress/http_endpoint.rs, lib/runtime/src/pipeline/network/ingress/nats_server.rs, lib/runtime/src/pipeline/network/ingress/push_endpoint.rs, lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
Unified graceful shutdown pattern: structured logging with endpoint metadata, conditional waiting for in-flight request completion via Notify, and JoinHandle tracking (in nats_server.rs). Log entries now include endpoint_name, inflight_count, and completion status. (A sketch of the drain pattern follows below.)
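
As referenced above, a minimal sketch of the Notify-based drain pattern; the Handler struct and its field names are assumptions for illustration, not the actual types in these files:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::Notify;

// Illustrative handler state: an in-flight counter plus a Notify that
// wakes the shutdown path when a request finishes.
#[derive(Default)]
struct Handler {
    inflight: AtomicUsize,
    notify: Notify,
}

impl Handler {
    // Request start: bump the in-flight counter.
    fn begin(&self) {
        self.inflight.fetch_add(1, Ordering::SeqCst);
    }

    // Request end: decrement and wake any shutdown waiter.
    fn end(&self) {
        self.inflight.fetch_sub(1, Ordering::SeqCst);
        self.notify.notify_one();
    }

    // Shutdown path: wait until every in-flight request has drained.
    async fn drain(&self) {
        loop {
            // Register interest before checking the counter, so a wakeup
            // that fires between the check and the await is not lost.
            let notified = self.notify.notified();
            if self.inflight.load(Ordering::SeqCst) == 0 {
                break;
            }
            notified.await;
        }
    }
}
```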

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Shutdown semantics alteration in mocker engine: verify cancellation token phase transition and active_requests cleanup impact on in-flight request handling
  • Synchronization primitives: ensure Notify usage and join_handle awaits are correctly integrated across endpoint types without deadlock or race conditions
  • Structured logging consistency: validate that endpoint metadata and inflight counts are correctly propagated in all shutdown paths

Poem

🐰 Shutdown's symphony, now sung with grace,
Join handles join the race,
Notify whispers "wait for me,"
Till inflight requests run free,
Logs tell the tale of clean goodbye,
Phase by phase, we wave goodbye. 🌙

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 40.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title clearly and specifically describes the main change: implementing graceful shutdown that waits for in-flight requests to complete before terminating.
  • Description check ✅ Passed: The description provides context for the changes, explains the three-phase shutdown process, and references the related PR, but lacks specific details about which files were modified and where reviewers should focus.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
lib/runtime/src/pipeline/network/ingress/http_endpoint.rs (1)

108-128: Solid implementation of inflight request draining with one minor consideration.

The structured logging and wait-for-inflight logic is well-implemented. However, there is a subtle race condition: between loading inflight_count at line 114 and entering the wait loop at lines 121-123, new notifications may be missed.

Consider this scenario:

  1. inflight_count loads as 1
  2. The last in-flight request completes, calls notify_one(), decrements to 0
  3. The wait loop starts, but no more notifications will come

This is mitigated because the while condition re-checks handler.inflight.load(), but a wakeup can still be missed if the notification fires between the condition check and notified().await. The current pattern should work in practice due to the re-check, but for stricter correctness, register Notify::notified() before the condition check:

```diff
 if inflight_count > 0 {
     tracing::info!(
         endpoint_name = %endpoint_name,
         inflight_count = inflight_count,
         "Waiting for inflight HTTP requests to complete"
     );
-    while handler.inflight.load(Ordering::SeqCst) > 0 {
-        handler.notify.notified().await;
-    }
+    loop {
+        let notified = handler.notify.notified();
+        if handler.inflight.load(Ordering::SeqCst) == 0 {
+            break;
+        }
+        notified.await;
+    }
 }
```
lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (1)

138-152: Good conditional waiting logic with consistent logging.

The implementation correctly waits for inflight requests only when inflight_count > 0, avoiding unnecessary waits. The same Notify race consideration applies here; consider registering notified() before checking the condition for stricter correctness (see the comment on http_endpoint.rs).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9e445a and 6f82d16.

📒 Files selected for processing (5)
  • lib/llm/src/mocker/engine.rs (2 hunks)
  • lib/runtime/src/pipeline/network/ingress/http_endpoint.rs (1 hunks)
  • lib/runtime/src/pipeline/network/ingress/nats_server.rs (4 hunks)
  • lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (1 hunks)
  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
📚 Learning: 2025-08-27T17:56:14.690Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 2500
File: lib/llm/src/migration.rs:58-77
Timestamp: 2025-08-27T17:56:14.690Z
Learning: In lib/llm/src/migration.rs, the cancellation visibility in the Migration operator is intentionally one-way - it checks engine_ctx.is_stopped()/is_killed() to stop pulling from streams but doesn't link newly created streams as child contexts to the parent. This is a conscious architectural decision with plans for future enhancement.

Applied to files:

  • lib/llm/src/mocker/engine.rs
📚 Learning: 2025-07-14T21:25:56.930Z
Learnt from: ryanolson
Repo: ai-dynamo/dynamo PR: 1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Applied to files:

  • lib/llm/src/mocker/engine.rs
📚 Learning: 2025-05-29T06:20:12.901Z
Learnt from: ryanolson
Repo: ai-dynamo/dynamo PR: 1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

Applied to files:

  • lib/llm/src/mocker/engine.rs
  • lib/runtime/src/pipeline/network/ingress/nats_server.rs
📚 Learning: 2025-06-02T19:37:27.666Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.

Applied to files:

  • lib/llm/src/mocker/engine.rs
📚 Learning: 2025-06-13T22:07:24.843Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/nats_server.rs
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/nats_server.rs
  • lib/runtime/src/pipeline/network/ingress/http_endpoint.rs
  • lib/runtime/src/pipeline/network/ingress/push_endpoint.rs
  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/http_endpoint.rs
  • lib/runtime/src/pipeline/network/ingress/push_endpoint.rs
🧬 Code graph analysis (2)
lib/llm/src/mocker/engine.rs (1)
lib/runtime/src/utils/tasks/tracker.rs (2)
  • cancel_token (1951-1954)
  • active (1429-1431)
lib/runtime/src/pipeline/network/ingress/http_endpoint.rs (2)
lib/runtime/src/transports/nats.rs (1)
  • subject (848-850)
lib/llm/src/kv/reserved.rs (1)
  • inflight_count (20-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
lib/llm/src/mocker/engine.rs (2)

63-67: LGTM on the token change for graceful shutdown semantics.

The switch from child_token() to primary_token() correctly implements Phase 3 shutdown behavior, ensuring the mocker continues serving in-flight requests during Phases 1-2. The inline comments clearly document the rationale.
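
As a rough illustration of why the token source matters, a sketch assuming the runtime holds two independent tokens, one cancelled at Phase 1 and one only at Phase 3; the Runtime struct and its fields are hypothetical, not the project's actual runtime handle:

```rust
use tokio_util::sync::CancellationToken;

// Hypothetical runtime handle with two independent shutdown tokens.
struct Runtime {
    endpoint_shutdown: CancellationToken, // cancelled in Phase 1
    primary: CancellationToken,           // cancelled in Phase 3
}

impl Runtime {
    // Tokens derived here fire as soon as endpoint shutdown begins.
    fn child_token(&self) -> CancellationToken {
        self.endpoint_shutdown.child_token()
    }

    // This token fires only after all in-flight requests have drained,
    // which is why the mocker engine now listens on it: it keeps serving
    // through Phases 1-2 instead of stopping at the first shutdown signal.
    fn primary_token(&self) -> CancellationToken {
        self.primary.clone()
    }
}
```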


149-155: Good approach to unblock waiting request handlers.

Clearing active_requests on cancellation correctly causes pending request_rx.recv() calls to return None, allowing request handler tasks to exit gracefully. The subsequent active.remove(&request_uuid) at line 349 is safe since removing a non-existent key is a no-op.
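
A sketch of the unblocking mechanism, assuming active_requests maps request IDs to mpsc senders; the u64 key and String payload are simplifications (the real map is keyed by request UUID):

```rust
use std::collections::HashMap;
use tokio::sync::mpsc;

// Illustrative scheduler state: each active request holds the sender half
// of its response channel.
type ActiveRequests = HashMap<u64, mpsc::Sender<String>>;

fn on_cancel(active_requests: &mut ActiveRequests) {
    // Dropping every Sender closes its channel, so the matching
    // `request_rx.recv().await` in each handler yields `None`.
    active_requests.clear();
}

async fn handler_task(mut request_rx: mpsc::Receiver<String>) {
    while let Some(msg) = request_rx.recv().await {
        // ... process the message ...
        let _ = msg;
    }
    // recv() returned None: the channel was closed during shutdown,
    // so the handler task exits gracefully.
}
```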

lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (1)

153-158: LGTM on the non-graceful shutdown path.

Adding endpoint_name to the skip log improves observability and maintains consistency with the graceful path.

lib/runtime/src/pipeline/network/ingress/nats_server.rs (3)

35-35: LGTM on adding JoinHandle for lifecycle tracking.

Storing the JoinHandle enables proper task lifecycle management during shutdown.


149-172: Good task lifecycle management.

Capturing the JoinHandle from tokio::spawn and storing it in EndpointTask enables proper coordination during unregistration.

Also applies to: 184-184


198-216: Well-structured shutdown coordination with proper panic handling.

The implementation correctly:

  1. Cancels the endpoint-specific token to trigger graceful shutdown
  2. Waits for the task to complete (which internally waits for inflight requests via PushEndpoint)
  3. Handles panics gracefully with a warning log
  4. Logs completion for observability

This properly implements the layered shutdown approach where NatsMultiplexedServer delegates inflight draining to PushEndpoint.
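
A condensed sketch of this sequence, assuming an EndpointTask that holds the endpoint-specific CancellationToken and the spawned task's JoinHandle; the types and function here are illustrative, not the actual nats_server.rs code:

```rust
use tokio::task::JoinHandle;
use tokio_util::sync::CancellationToken;

// Illustrative per-endpoint record: the token that triggers graceful
// shutdown and the handle of the spawned endpoint task.
struct EndpointTask {
    cancel_token: CancellationToken,
    join_handle: JoinHandle<()>,
}

async fn unregister_endpoint(task: EndpointTask, endpoint_name: &str) {
    // 1. Cancel the endpoint-specific token to begin graceful shutdown.
    task.cancel_token.cancel();

    // 2. Await the task; internally it drains in-flight requests
    //    (via PushEndpoint) before returning.
    if let Err(err) = task.join_handle.await {
        // 3. A JoinError here means the task panicked; warn and continue.
        tracing::warn!(endpoint_name, ?err, "endpoint task panicked during shutdown");
    }

    // 4. Completion log for observability.
    tracing::info!(endpoint_name, "endpoint unregistered");
}
```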

@rmccorm4 rmccorm4 requested a review from kthui December 10, 2025 21:04
@biswapanda biswapanda disabled auto-merge December 10, 2025 21:24
@biswapanda biswapanda enabled auto-merge (squash) December 11, 2025 00:20
@biswapanda biswapanda merged commit c8845b4 into main Dec 11, 2025
38 of 39 checks passed
@biswapanda biswapanda deleted the hzhou/graceful-fix branch December 11, 2025 00:25
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025