Skip to content

Conversation

@biswapanda
Copy link
Contributor

@biswapanda biswapanda commented Dec 11, 2025

Overview:

  • New Features

    • TCP ports now default to automatic frees port assignment (OS assigned) instead of requiring manual configuration, simplifying deployment.
    • Reduced likelihood of port conflicts through dynamic binding support.
  • Documentation

    • Clarified TCP configuration documentation to reflect optional port settings and provide guidance on when manual port configuration is beneficial.

closes: DYN-1583

Summary by CodeRabbit

  • New Features

    • OS-assigned TCP ports when DYN_TCP_RPC_PORT is unset; actual bound port retrievable at runtime.
    • Added new TCP configuration options (DYN_TCP_CONNECT_TIMEOUT, DYN_TCP_CHANNEL_BUFFER).
  • Documentation

    • Updated TCP configuration guidance emphasizing optional port settings and automatic port selection behavior with expanded examples.
  • Bug Fixes

    • Reduced TCP port conflicts through OS-assigned port support.

✏️ Tip: You can customize this high-level summary in your review settings.

@biswapanda biswapanda requested a review from a team as a code owner December 11, 2025 09:28
@github-actions github-actions bot added the feat label Dec 11, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

Walkthrough

This PR makes DYN_TCP_RPC_PORT optional and enables OS-assigned TCP ports. It refactors TCP server startup to bind and record the actual address/port, exposes a global accessor for the bound port, updates endpoint transport construction to be async and endpoint-aware, and updates docs and tests to reflect the changes.

Changes

Cohort / File(s) Summary
Documentation
docs/guides/request_plane.md
Rework TCP port docs: DYN_TCP_RPC_PORT is optional, OS-assigned ports used when unset, expanded TCP config options (e.g., DYN_TCP_CONNECT_TIMEOUT, DYN_TCP_CHANNEL_BUFFER, DYN_TCP_RPC_HOST), updated migration/troubleshooting and guidance.
Endpoint / Transport refactor
lib/runtime/src/component/endpoint.rs
Make transport construction async and endpoint-aware: build_transport_type signature changed to async pub async fn build_transport_type(endpoint: &Endpoint, endpoint_id: &EndpointId, connection_id: u64) -> Result<TransportType>; added build_transport_type_inner; health-check registration now awaits transport build and uses async transport builder.
Shared TCP server: binding & lifecycle
lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
Track actual bound address: add actual_addr: RwLock<Option<SocketAddr>>, add bind_and_start() (returns actual SocketAddr), actual_address() accessor, extract accept_loop(), refactor start() to delegate to bind_and_start(), prefer actual address for registration/logging, add tests for graceful shutdown/unregister.
Network manager & global port access
lib/runtime/src/pipeline/network/manager.rs
Make config port optional (tcp_port: Option<u16>), load DYN_TCP_RPC_PORT optionally, bind with port unwrap_or(0) for OS assignment, capture actual bound port and store in OnceLock<u16>, add pub fn get_actual_tcp_rpc_port() -> anyhow::Result<u16>, adjust logs and synchronous startup to capture actual port.
Call-site updates
lib/llm/src/discovery/model_manager.rs
Updated transport construction call to await and pass endpoint to new async build_transport_type(endpoint, ...) instead of using a local request_plane_mode.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Check async signature changes and all call-site await updates (especially endpoint.rs and model_manager.rs).
  • Verify OnceLock usage and race conditions between binding and get_actual_tcp_rpc_port().
  • Validate bind_and_start() lifecycle and cancellation/accept loop correctness.
  • Ensure address resolution (actual vs configured) is used consistently for logs/registration.
  • Review new tests for correctness and coverage.

Poem

🐇 I hopped to bind where kernels choose,

A port unknown — no need to lose.
The OS gave a number bright,
I stored it safe and logged it right.
Hooray — dynamic ports take flight! 🎉

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive Description covers overview and issues closed, but lacks detailed change explanation and reviewer guidance on specific files to examine. Expand Details section to explain implementation changes (async transport building, TCP server binding flow, port tracking) and add Where should the reviewer start section identifying critical files.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately captures the main feature: enabling OS-assigned free port defaults for TCP RPC instead of manual configuration.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b83e955 and e6e9532.

📒 Files selected for processing (4)
  • lib/llm/src/discovery/model_manager.rs (1 hunks)
  • lib/runtime/src/component/endpoint.rs (4 hunks)
  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (6 hunks)
  • lib/runtime/src/pipeline/network/manager.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
  • lib/runtime/src/component/endpoint.rs
📚 Learning: 2025-07-14T21:25:56.930Z
Learnt from: ryanolson
Repo: ai-dynamo/dynamo PR: 1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
📚 Learning: 2025-07-16T12:41:12.543Z
Learnt from: grahamking
Repo: ai-dynamo/dynamo PR: 1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
📚 Learning: 2025-06-17T00:50:44.845Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/llm/src/discovery/model_manager.rs
🧬 Code graph analysis (2)
lib/runtime/src/pipeline/network/manager.rs (2)
lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (2)
  • new (58-67)
  • new (474-481)
lib/runtime/src/utils/ip_resolver.rs (1)
  • get_tcp_rpc_host_from_env (63-65)
lib/llm/src/discovery/model_manager.rs (4)
lib/runtime/src/component/endpoint.rs (1)
  • build_transport_type (287-308)
lib/runtime/src/component.rs (1)
  • endpoint (238-245)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • endpoint (131-135)
lib/llm/src/kv_router.rs (1)
  • router_endpoint_id (85-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: vllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (launch/dynamo-run)
🔇 Additional comments (17)
lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (8)

14-14: LGTM: RwLock for actual address tracking is appropriate.

The use of RwLock<Option<SocketAddr>> is well-suited for this use case where the address is set once during binding and read many times thereafter.

Also applies to: 39-42, 61-66


69-98: LGTM: Bind and start flow is correct.

The method properly:

  1. Binds to the configured address
  2. Captures the actual bound address (critical for port 0)
  3. Stores it before spawning the accept loop
  4. Returns the actual address for caller use

The spawned task running the accept loop in the background is appropriate for a long-running server.


100-106: LGTM: actual_address accessor is straightforward.

The method correctly returns the bound address after bind_and_start has been called.


107-136: LGTM: Accept loop with proper cancellation support.

The loop correctly:

  • Uses tokio::select! for cancellation
  • Spawns independent tasks per connection
  • Handles errors gracefully
  • Returns cleanly on shutdown

168-172: LGTM: Register endpoint logging uses actual address.

The fallback to bind_addr ensures logging works even if called before binding (though that shouldn't happen in practice).


207-217: LGTM: Legacy start method refactored cleanly.

The method now delegates to bind_and_start() while maintaining backward compatibility. The cancellation wait ensures the server keeps running until shutdown.


434-439: LGTM: Address method returns actual bound address.

The fallback logic ensures a valid address is always returned, with preference for the actual bound address after server startup.


452-665: LGTM: Comprehensive graceful shutdown test.

The test thoroughly validates that unregister_endpoint waits for inflight requests to complete before returning. The manual inflight counter management is appropriate for this unit test.

lib/runtime/src/component/endpoint.rs (4)

116-141: LGTM: Health check registration updated for async transport building.

The code correctly awaits the new async build_transport_type function. The race condition protection is handled within build_transport_type itself (see lines 287-308).


201-201: LGTM: Discovery transport construction updated consistently.


235-280: LGTM: Transport builder properly errors if TCP port uninitialized.

The function correctly:

  1. Uses explicit port from DYN_TCP_RPC_PORT if set
  2. Falls back to get_actual_tcp_rpc_port()? which errors if not initialized
  3. Returns Result<TransportType> to propagate errors

282-308: LGTM: Race condition fix is correctly implemented.

This is the critical fix mentioned in the PR comments. When TCP mode uses an OS-assigned port, the function ensures the server is initialized (which sets the global port) before building the transport. The logic is:

  1. Only force server init when no fixed port is configured
  2. Server initialization calls bind_and_start() which sets ACTUAL_TCP_RPC_PORT
  3. Then build_transport_type_inner can safely retrieve the port

This prevents the race where get_actual_tcp_rpc_port() could be called before the server binds.

lib/llm/src/discovery/model_manager.rs (1)

337-337: LGTM: Call site updated to match new async API.

The change correctly passes endpoint as the first parameter and awaits the result, matching the new build_transport_type signature.

lib/runtime/src/pipeline/network/manager.rs (4)

22-50: LGTM: Global port storage using OnceLock is appropriate.

The use of OnceLock<u16> is well-suited for this use case where the port is set once during server initialization and never changes. The error handling in get_actual_tcp_rpc_port() provides clear diagnostics if called before initialization.


62-63: LGTM: TCP port is now optional to support OS assignment.

The change from u16 to Option<u16> correctly enables OS-assigned ports when DYN_TCP_RPC_PORT is not set.

Also applies to: 92-97


172-200: LGTM: Enhanced logging shows port source.

The logging clearly indicates whether the TCP port comes from configuration or will be OS-assigned, improving observability during initialization.


302-332: LGTM: TCP server creation properly captures and stores actual port.

The flow correctly:

  1. Uses port 0 when no explicit port is configured (enables OS assignment)
  2. Calls bind_and_start() to bind and get actual address
  3. Stores the actual port globally via set_actual_tcp_rpc_port()
  4. Logs comprehensive information

The synchronous flow after bind_and_start().await ensures the port is set before this function returns, which is critical for preventing the race condition.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/runtime/src/pipeline/network/manager.rs (2)

25-39: Consider logging a warning if set_actual_tcp_rpc_port is called multiple times.

The current implementation silently ignores subsequent calls to set_actual_tcp_rpc_port. While the comment notes this "shouldn't happen in normal operation," silently ignoring could mask configuration issues or unexpected behavior in edge cases (e.g., tests, service restarts).

 /// Set the actual TCP RPC port (called internally after server binds).
 fn set_actual_tcp_rpc_port(port: u16) {
-    // Ignore if already set (shouldn't happen in normal operation)
-    let _ = ACTUAL_TCP_RPC_PORT.set(port);
+    if let Err(existing) = ACTUAL_TCP_RPC_PORT.set(port) {
+        tracing::warn!(
+            existing_port = existing,
+            new_port = port,
+            "TCP RPC port already set, ignoring new value"
+        );
+    }
 }

29-33: Returning 0 as a sentinel value could cause subtle bugs.

get_actual_tcp_rpc_port() returns 0 if the TCP server hasn't started yet. Port 0 is a valid value (meaning "OS assigns port"), but returning it here could lead to invalid transport addresses being constructed (e.g., host:0/endpoint) if called prematurely.

Consider documenting this behavior clearly or returning an Option<u16> to make the "not yet set" state explicit.

-/// Get the actual TCP RPC port that the server is listening on.
-/// Returns 0 if the TCP server hasn't been started yet.
-pub fn get_actual_tcp_rpc_port() -> u16 {
-    ACTUAL_TCP_RPC_PORT.get().copied().unwrap_or(0)
-}
+/// Get the actual TCP RPC port that the server is listening on.
+///
+/// # Returns
+/// - `Some(port)` if the TCP server has bound and the port is known
+/// - `None` if the TCP server hasn't been started yet
+///
+/// # Panics
+/// Callers should ensure the TCP server is started before calling this in TCP mode.
+pub fn get_actual_tcp_rpc_port() -> Option<u16> {
+    ACTUAL_TCP_RPC_PORT.get().copied()
+}

Note: This would require updating endpoint.rs to handle None appropriately, potentially with a clearer error message.

lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (1)

86-88: Consider handling potential panic from unwrap() on RwLock.

While unwrap() on a std::sync::RwLock write guard only panics if the lock is poisoned (another thread panicked while holding it), this is a single-writer scenario and poisoning is extremely unlikely. However, for robustness in a server context:

-        *self.actual_addr.write().unwrap() = Some(actual_addr);
+        match self.actual_addr.write() {
+            Ok(mut guard) => *guard = Some(actual_addr),
+            Err(poisoned) => {
+                tracing::warn!("RwLock poisoned, recovering");
+                *poisoned.into_inner() = Some(actual_addr);
+            }
+        }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6b440e and b83e955.

📒 Files selected for processing (4)
  • docs/guides/request_plane.md (4 hunks)
  • lib/runtime/src/component/endpoint.rs (1 hunks)
  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (5 hunks)
  • lib/runtime/src/pipeline/network/manager.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.

Applied to files:

  • lib/runtime/src/component/endpoint.rs
  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
📚 Learning: 2025-06-02T19:37:27.666Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.

Applied to files:

  • lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs
🧬 Code graph analysis (2)
lib/runtime/src/pipeline/network/manager.rs (1)
lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (2)
  • new (58-65)
  • new (472-479)
lib/runtime/src/component/endpoint.rs (1)
lib/runtime/src/pipeline/network/manager.rs (1)
  • get_actual_tcp_rpc_port (31-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: clippy (.)
🔇 Additional comments (8)
docs/guides/request_plane.md (2)

111-119: Documentation accurately reflects the new TCP port behavior.

The documentation clearly explains that DYN_TCP_RPC_PORT is now optional and that OS-assigned ports are recommended for most deployments. The guidance on when to set a fixed port (e.g., firewall rules) is helpful.


129-134: Good addition of new TCP configuration options.

The documentation now includes DYN_TCP_CONNECT_TIMEOUT and DYN_TCP_CHANNEL_BUFFER which provides comprehensive coverage of TCP configuration options.

lib/runtime/src/pipeline/network/manager.rs (1)

268-297: TCP server creation flow looks correct.

The implementation properly:

  1. Binds with port 0 when tcp_port is None
  2. Awaits bind_and_start() to get the actual address
  3. Stores the actual port globally before returning
  4. Logs both the source and actual values

This ensures the port is available before any endpoint registration occurs.

lib/runtime/src/pipeline/network/ingress/shared_tcp_endpoint.rs (4)

41-43: Using std::sync::RwLock in async context is acceptable here.

While std::sync::RwLock can block the async runtime, in this case it's used for very brief read/write operations on a single Option<SocketAddr>. The lock is never held across await points, so this is safe and avoids the overhead of tokio::sync::RwLock.


74-96: Clean implementation of bind_and_start.

The method properly:

  1. Binds first, capturing the actual address
  2. Stores the address before spawning the accept loop
  3. Returns the actual address for the caller to use

This ensures the address is available immediately after the method returns.


205-215: Legacy start() method properly delegates to new implementation.

The refactored start() method maintains backward compatibility while using the new bind_and_start() internally. It correctly waits on the cancellation token after starting.


512-662: Comprehensive test for graceful shutdown behavior.

The test validates that unregister_endpoint properly waits for inflight requests to complete. A few observations:

  1. The test manually increments/decrements the inflight counter (lines 573, 629-630) to simulate tracking, which mirrors the real code flow but could diverge if implementation changes.

  2. The test uses reasonable timeouts and assertions to verify the waiting behavior.

Overall, this is a valuable addition for ensuring graceful shutdown correctness.

lib/runtime/src/component/endpoint.rs (1)

257-268: Proper ordering confirmed: TCP port is set before endpoint registration.

The implementation correctly retrieves the actual bound port via get_actual_tcp_rpc_port() after the TCP server has completed binding. The call ordering is guaranteed because request_plane_server().await (line 141) is awaited before any build_transport_type() calls. The server initialization uses get_or_try_init() to ensure single, synchronous binding, and create_tcp_server() calls set_actual_tcp_rpc_port() before returning control to the endpoint registration code.

@biswapanda biswapanda marked this pull request as draft December 11, 2025 18:36
@biswapanda biswapanda assigned biswapanda and unassigned biswapanda Dec 11, 2025
Copy link
Contributor

@kthui kthui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding config parity between TCP and NATS quickly!

I think there could be a race condition that will cause endpoints to incorrectly advertise port 0:

In EndpointConfigBuilder::start (line 95), we call build_transport_type(...) which internally calls manager::get_actual_tcp_rpc_port(). At this exact point in execution, the request plane server has not been started yet, so the global ACTUAL_TCP_RPC_PORT is unset, and the function returns default 0.

The server is only initialized later at line 141 (endpoint.drt().request_plane_server().await?). While that call eventually sets the global port variable, it's too late that the transport variable has already been created with port 0.

(summarized by AI)

@biswapanda
Copy link
Contributor Author

Thanks for adding config parity between TCP and NATS quickly!

I think there could be a race condition that will cause endpoints to incorrectly advertise port 0:

In EndpointConfigBuilder::start (line 95), we call build_transport_type(...) which internally calls manager::get_actual_tcp_rpc_port(). At this exact point in execution, the request plane server has not been started yet, so the global ACTUAL_TCP_RPC_PORT is unset, and the function returns default 0.
The server is only initialized later at line 141 (endpoint.drt().request_plane_server().await?). While that call eventually sets the global port variable, it's too late that the transport variable has already been created with port 0.

(summarized by AI)

Fixed issue was handled in e02bb11

Latest flow ~

  1. Check if we're in TCP mode without a fixed port
  2. If so, calls request_plane_server().await before building the transport
  3. Only then calls build_transport_type_inner which reads the port

@biswapanda biswapanda marked this pull request as ready for review December 11, 2025 23:13
@biswapanda
Copy link
Contributor Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@biswapanda biswapanda requested a review from kthui December 11, 2025 23:20
Copy link
Contributor

@kthui kthui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@biswapanda biswapanda merged commit 64a30d3 into main Dec 11, 2025
43 of 44 checks passed
@biswapanda biswapanda deleted the bis/default-tcp-rpc-port branch December 11, 2025 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants