Skip to content

Conversation

@biswapanda
Copy link
Contributor

@biswapanda biswapanda commented Dec 11, 2025

Overview:

fix test fixtures for tcp req plane: request cancellation and migration tests

  • set different DYN_TCP_RPC_PORT for prefill and decode workers when they are running on same machine

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • Tests
    • Extended cancellation and migration test suites to run across multiple request plane configurations (NATS and TCP).
    • Added parameterized testing support for different KV store backends.
    • Enhanced test infrastructure to allow dynamic configuration of request planes and storage options across worker and router processes.
    • Marked known unstable TCP multi-worker configurations as expected to fail.

✏️ Tip: You can customize this high-level summary in your review settings.

@biswapanda biswapanda requested review from a team as code owners December 11, 2025 02:55
@github-actions github-actions bot added the test label Dec 11, 2025
@biswapanda biswapanda self-assigned this Dec 11, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

Walkthrough

Introduces multi-plane test parametrization by adding request_plane fixture support across test infrastructure and test files, enabling tests to execute with both NATS and TCP transport planes. Infrastructure changes propagate request plane configuration through conftest fixtures and environment variables in worker/service initialization code.

Changes

Cohort / File(s) Summary
Test Infrastructure Fixtures
lib/bindings/python/tests/conftest.py, tests/conftest.py
Adds new store_kv and request_plane fixtures with defaults ("file"/"etcd" and "nats" respectively). Updates runtime and runtime_services signatures to accept these fixtures. Introduces runtime_services_session for shared service instances via session-scoped NATS and Etcd servers. Implements conditional service startup logic based on request_plane and store_kv values.
Cancellation Tests
lib/bindings/python/tests/cancellation/test_cancellation.py, tests/fault_tolerance/cancellation/test_*.py, tests/fault_tolerance/cancellation/utils.py
Adds request_plane parametrization ([nats, tcp]) to multiple async test functions. Updates DynamoFrontendProcess and DynamoWorkerProcess to set DYN_REQUEST_PLANE from fixture and conditionally set DYN_TCP_RPC_PORT for TCP plane. Marks TCP variants as xfail where instability is known.
Migration Tests
tests/fault_tolerance/migration/test_*.py, tests/fault_tolerance/migration/utils.py
Applies same request_plane parametrization and environment variable setup (DYN_REQUEST_PLANE, DYN_TCP_RPC_PORT) to worker initialization. Marks TCP as xfail in multi-worker scenarios. Adds get_pid() method to DynamoWorkerProcess class in sglang and trtllm test files.
Router Tests
tests/router/common.py, tests/router/test_router_e2e_with_mockers.py
Adds request_plane parameter to KVRouterProcess, MockerProcess, and port generation functions. Implements plane_offset (0 for "nats", 25 for "tcp") in port calculation. Sets DYN_REQUEST_PLANE environment variable in mocker processes. Parametrizes test_busy_threshold_endpoint over request_plane.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • tests/conftest.py: Conditional service startup logic with multiple branches (nats+etcd, nats-only, etcd-only, none); requires tracing logic paths and session-scoped fixture behavior.
  • tests/router/test_router_e2e_with_mockers.py: Port offset calculations and parametrization propagation through multiple layers.
  • Parametrization consistency: Verify that all indirect fixture parametrizations correctly map to fixture definitions across multiple conftest files.
  • Environment variable propagation: Ensure DYN_REQUEST_PLANE and DYN_TCP_RPC_PORT are correctly threaded through initialization code in cancellation and migration test utilities.

Poem

🐰 Hop along, both planes in sight,
NATS and TCP, shining bright!
Parametrized tests now dance and play,
Multi-plane testing saves the day! 🚀
With fixtures stacked and logic lean,
The most robust suite ever seen!

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ⚠️ Warning The PR description is incomplete with placeholder sections. While it mentions the issue (DYN-1563) and provides an overview, the 'Details' and 'Where should the reviewer start?' sections are empty. Expand the 'Details' section to describe the specific changes (DYN_TCP_RPC_PORT handling across test files). Add guidance in 'Where should the reviewer start?' pointing to key files like tests/fault_tolerance/cancellation/, tests/fault_tolerance/migration/, and tests/router/.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'test: fix tcp based test port clash' directly describes the main objective of the changeset: addressing TCP port conflicts in test infrastructure by introducing parameterized request_plane fixtures and dynamic port allocation.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@kthui kthui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for quickly finding root cause and compiling a fix! I'll review in details tomorrow.

@biswapanda biswapanda enabled auto-merge (squash) December 11, 2025 03:31
@biswapanda biswapanda changed the title test: fix tcp ports for tcp based tests test: fix tcp based test's port clash Dec 11, 2025
@biswapanda biswapanda changed the title test: fix tcp based test's port clash test: fix tcp based test port clash Dec 11, 2025
Copy link
Contributor

@kthui kthui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding the quick fix!

Would be nice if we can pick a random port if DYN_TCP_RPC_PORT is unset, then we will have alignment between NATS and TCP, suggested by @nnshah1 offline!

@biswapanda
Copy link
Contributor Author

LGTM! Thanks for adding the quick fix!

Would be nice if we can pick a random port if DYN_TCP_RPC_PORT is unset, then we will have alignment between NATS and TCP, suggested by @nnshah1 offline!

sounds good. I'm addressing it in
#4891

@biswapanda
Copy link
Contributor Author

@biswapanda biswapanda closed this Dec 12, 2025
auto-merge was automatically disabled December 12, 2025 03:00

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants