Skip to content

Conversation

@kthui
Copy link
Contributor

@kthui kthui commented Dec 8, 2025

Overview:

Pre-download models before the first test case is ran, avoid model download time from be counted against the first test case.

Details:

Added pytest_runtestloop that checks for model marker and download those models before any test case is started.

The new pre-download is verified on CI, over pre_merge:

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /opt/dynamo/venv/bin/python3
cachedir: /tmp/.pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace
plugins: anyio-4.12.0, forked-1.6.0, asyncio-1.3.0, pytest_httpserver-1.1.3, md-report-0.7.0, mock-3.15.1, timeout-2.4.0, pytest_codeblocks-0.17.0, cov-7.0.0, xdist-3.8.0, benchmark-5.2.3, ai-dynamo-0.7.0, mypy-1.0.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function

collecting ... collected 450 items / 427 deselected / 23 selected
Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]
Fetching 10 files:  10%|█         | 1/10 [00:00<00:05,  1.56it/s]
Fetching 10 files:  70%|███████   | 7/10 [00:02<00:01,  2.64it/s]
Fetching 10 files: 100%|██████████| 10/10 [00:02<00:00,  3.67it/s]

Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]
Fetching 10 files:  10%|█         | 1/10 [00:00<00:01,  7.92it/s]
Fetching 10 files:  60%|██████    | 6/10 [00:04<00:03,  1.16it/s]
Fetching 10 files: 100%|██████████| 10/10 [00:04<00:00,  2.03it/s]

components/src/dynamo/vllm/tests/test_vllm_prometheus_utils.py::TestGetPrometheusExpfmt::test_vllm_use_case PASSED [  4%]
components/src/dynamo/vllm/tests/test_vllm_prometheus_utils.py::TestGetPrometheusExpfmt::test_error_handling PASSED [  8%]
components/src/dynamo/vllm/tests/test_vllm_unit.py::test_custom_jinja_template_invalid_path PASSED [ 13%]
components/src/dynamo/vllm/tests/test_vllm_unit.py::test_custom_jinja_template_valid_path PASSED [ 17%]
components/src/dynamo/vllm/tests/test_vllm_unit.py::test_custom_jinja_template_env_var_expansion PASSED [ 21%]
tests/dependencies/test_kvbm_imports.py::test_kvbm_wheel_exists_vllm PASSED [ 26%]
tests/dependencies/test_kvbm_imports.py::test_kvbm_imports_vllm PASSED   [ 30%]
tests/fault_tolerance/cancellation/test_vllm.py::test_request_cancellation_vllm_aggregated PASSED [ 34%]
tests/fault_tolerance/cancellation/test_vllm.py::test_request_cancellation_vllm_decode_cancel PASSED [ 39%]
tests/fault_tolerance/cancellation/test_vllm.py::test_request_cancellation_vllm_prefill_cancel PASSED [ 43%]
tests/fault_tolerance/migration/test_vllm.py::test_request_migration_vllm_worker_failure PASSED [ 47%]
tests/fault_tolerance/migration/test_vllm.py::test_request_migration_vllm_graceful_shutdown PASSED [ 52%]
tests/fault_tolerance/migration/test_vllm.py::test_no_request_migration_vllm_worker_failure PASSED [ 56%]
tests/fault_tolerance/migration/test_vllm.py::test_no_request_migration_vllm_graceful_shutdown PASSED [ 60%]
tests/profiler/test_profile_sla_dryrun.py::TestProfileSLADryRun::test_vllm_dryrun PASSED [ 65%]
tests/profiler/test_profile_sla_dryrun.py::TestProfileSLADryRun::test_profile_with_autogen_search_space_h100 PASSED [ 69%]
tests/router/test_router_e2e_with_vllm.py::test_vllm_kv_router_basic PASSED [ 73%]
tests/router/test_router_e2e_with_vllm.py::test_router_decisions_vllm_multiple_workers PASSED [ 78%]
tests/router/test_router_e2e_with_vllm.py::test_vllm_indexers_sync PASSED [ 82%]
tests/serve/test_vllm.py::test_serve_deployment[aggregated] PASSED       [ 86%]
tests/serve/test_vllm.py::test_serve_deployment[aggregated_lmcache] PASSED [ 91%]
tests/serve/test_vllm.py::test_serve_deployment[agg-request-plane-tcp] PASSED [ 95%]
tests/serve/test_vllm.py::test_serve_deployment[agg-request-plane-http] PASSED [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------ generated xml file: /workspace/test-results/pytest_test_report.xml ------
============================= slowest 10 durations =============================
182.64s call     tests/fault_tolerance/migration/test_vllm.py::test_request_migration_vllm_graceful_shutdown
180.68s call     tests/fault_tolerance/migration/test_vllm.py::test_request_migration_vllm_worker_failure
95.23s call     tests/serve/test_vllm.py::test_serve_deployment[aggregated]
95.12s call     tests/serve/test_vllm.py::test_serve_deployment[agg-request-plane-http]
94.10s call     tests/serve/test_vllm.py::test_serve_deployment[agg-request-plane-tcp]
91.60s call     tests/serve/test_vllm.py::test_serve_deployment[aggregated_lmcache]
85.69s call     tests/fault_tolerance/migration/test_vllm.py::test_no_request_migration_vllm_graceful_shutdown
75.50s call     tests/fault_tolerance/cancellation/test_vllm.py::test_request_cancellation_vllm_decode_cancel
75.21s call     tests/fault_tolerance/cancellation/test_vllm.py::test_request_cancellation_vllm_prefill_cancel
68.23s call     tests/fault_tolerance/cancellation/test_vllm.py::test_request_cancellation_vllm_aggregated
========= 23 passed, 427 deselected, 2 warnings in 1425.10s (0:23:45) ==========
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
🧪 Tests completed with exit code: 0

https://github.com/ai-dynamo/dynamo/actions/runs/20046561895/job/57493248332?pr=4811#step:8:108

Where should the reviewer start?

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

  • Tests
    • Refactored test infrastructure to optimize model download timing and eliminate redundant setup operations across test suites.

✏️ Tip: You can customize this high-level summary in your review settings.

@kthui kthui self-assigned this Dec 8, 2025
@github-actions github-actions bot added the test label Dec 8, 2025
@kthui kthui marked this pull request as ready for review December 9, 2025 00:39
@kthui kthui requested review from a team as code owners December 9, 2025 00:39
@kthui kthui changed the title test: Pre-download models before test cases are ran test: Pre-download models before tests are ran Dec 9, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

A new pytest hook is added to conftest.py to download models before test execution with timing measurements. Simultaneously, the predownload_models fixture parameter is removed from test function signatures across fault tolerance test suites, as the hook now handles downloads globally.

Changes

Cohort / File(s) Summary
Pytest Configuration
tests/conftest.py
Added pytest_runtestloop(session) hook that runs after test collection to download models with timing and logging. Imported time module for measurement.
Fault Tolerance Cancellation Tests
tests/fault_tolerance/cancellation/test_sglang.py, test_trtllm.py, test_vllm.py
Removed predownload_models fixture parameter from test function signatures across 9 test functions (2 in sglang, 4 in trtllm, 3 in vllm). Test bodies unchanged.
Fault Tolerance Migration Tests
tests/fault_tolerance/migration/test_sglang.py, test_trtllm.py, test_vllm.py
Removed predownload_models fixture parameter from test function signatures across 12 test functions (4 per file). Test bodies unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • pytest hook logic: Verify that pytest_runtestloop(session) correctly retrieves models_to_download from config, executes before tests, handles errors gracefully, and timing measurements are accurate
  • Parameter removals: Confirm the predownload_models dependency is truly obsolete across all modified tests and no test logic inadvertently relied on this fixture
  • Integration consistency: Ensure the global hook covers all models required by the removed fixture across different test backends (sglang, trtllm, vllm)

Poem

A hook hops in before the tests take flight,
Downloads all the models, sets things right,
The fixture fades, no longer needed here,
Pre-downloads run once, crystal clear! 🐰📥

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding pre-download of models before test execution to avoid counting download time against test timeouts.
Description check ✅ Passed The description follows the template structure with Overview, Details, Where to start, and Related Issues sections. It explains the change, provides CI verification, and summarizes the implementation.
Docstring Coverage ✅ Passed Docstring coverage is 95.83% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80dfb82 and 6144d78.

📒 Files selected for processing (7)
  • tests/conftest.py (2 hunks)
  • tests/fault_tolerance/cancellation/test_sglang.py (2 hunks)
  • tests/fault_tolerance/cancellation/test_trtllm.py (4 hunks)
  • tests/fault_tolerance/cancellation/test_vllm.py (3 hunks)
  • tests/fault_tolerance/migration/test_sglang.py (4 hunks)
  • tests/fault_tolerance/migration/test_trtllm.py (4 hunks)
  • tests/fault_tolerance/migration/test_vllm.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
🧬 Code graph analysis (3)
tests/fault_tolerance/cancellation/test_sglang.py (4)
tests/conftest.py (1)
  • runtime_services (416-419)
tests/frontend/grpc/test_tensor_mocker_engine.py (1)
  • runtime_services (105-109)
tests/frontend/test_completion_mocker_engine.py (1)
  • runtime_services (137-141)
tests/frontend/test_vllm.py (1)
  • runtime_services (173-177)
tests/fault_tolerance/cancellation/test_vllm.py (4)
tests/conftest.py (2)
  • runtime_services (416-419)
  • set_ucx_tls_no_mm (82-93)
tests/frontend/grpc/test_tensor_mocker_engine.py (1)
  • runtime_services (105-109)
tests/frontend/test_completion_mocker_engine.py (1)
  • runtime_services (137-141)
tests/frontend/test_vllm.py (1)
  • runtime_services (173-177)
tests/fault_tolerance/migration/test_trtllm.py (1)
tests/conftest.py (2)
  • runtime_services (416-419)
  • set_ucx_tls_no_mm (82-93)
🪛 Ruff (0.14.8)
tests/fault_tolerance/cancellation/test_sglang.py

164-164: Unused function argument: runtime_services

(ARG001)


248-248: Unused function argument: runtime_services

(ARG001)

tests/fault_tolerance/migration/test_vllm.py

118-118: Unused function argument: runtime_services

(ARG001)


118-118: Unused function argument: set_ucx_tls_no_mm

(ARG001)


162-162: Unused function argument: runtime_services

(ARG001)


162-162: Unused function argument: set_ucx_tls_no_mm

(ARG001)


210-210: Unused function argument: runtime_services

(ARG001)


210-210: Unused function argument: set_ucx_tls_no_mm

(ARG001)


270-270: Unused function argument: runtime_services

(ARG001)


270-270: Unused function argument: set_ucx_tls_no_mm

(ARG001)

tests/fault_tolerance/cancellation/test_vllm.py

137-137: Unused function argument: runtime_services

(ARG001)


210-210: Unused function argument: runtime_services

(ARG001)


210-210: Unused function argument: set_ucx_tls_no_mm

(ARG001)


280-280: Unused function argument: runtime_services

(ARG001)


280-280: Unused function argument: set_ucx_tls_no_mm

(ARG001)

tests/fault_tolerance/migration/test_sglang.py

118-118: Unused function argument: runtime_services

(ARG001)


118-118: Unused function argument: set_ucx_tls_no_mm

(ARG001)


162-162: Unused function argument: runtime_services

(ARG001)


162-162: Unused function argument: set_ucx_tls_no_mm

(ARG001)


210-210: Unused function argument: runtime_services

(ARG001)


210-210: Unused function argument: set_ucx_tls_no_mm

(ARG001)


270-270: Unused function argument: runtime_services

(ARG001)


270-270: Unused function argument: set_ucx_tls_no_mm

(ARG001)

tests/fault_tolerance/migration/test_trtllm.py

114-114: Unused function argument: runtime_services

(ARG001)


114-114: Unused function argument: set_ucx_tls_no_mm

(ARG001)


158-158: Unused function argument: runtime_services

(ARG001)


158-158: Unused function argument: set_ucx_tls_no_mm

(ARG001)


206-206: Unused function argument: runtime_services

(ARG001)


206-206: Unused function argument: set_ucx_tls_no_mm

(ARG001)


266-266: Unused function argument: runtime_services

(ARG001)


266-266: Unused function argument: set_ucx_tls_no_mm

(ARG001)

tests/fault_tolerance/cancellation/test_trtllm.py

144-144: Unused function argument: runtime_services

(ARG001)


216-216: Unused function argument: runtime_services

(ARG001)


287-287: Unused function argument: runtime_services

(ARG001)


372-372: Unused function argument: runtime_services

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (9)
tests/conftest.py (2)

20-20: LGTM: Import supports timing measurements.

The time import is correctly added to support the model download timing measurements in the new pytest_runtestloop hook.


230-248: LGTM: Hook correctly pre-downloads models before test execution.

The pytest_runtestloop hook is properly implemented to download models after collection but before any tests run. This effectively decouples model download time from individual test timeouts, addressing the PR objective.

Key aspects verified:

  • Hook retrieves models collected by pytest_collection_modifyitems (lines 208-228)
  • Timing measurements are logged appropriately
  • Hook implicitly returns None, allowing default test loop execution to proceed
tests/fault_tolerance/cancellation/test_sglang.py (2)

164-164: LGTM: Fixture removal aligns with new pre-download hook.

The predownload_models fixture has been correctly removed from the test signature. Model downloads are now handled by the pytest_runtestloop hook in conftest.py, which downloads models before any tests execute.

Note: The static analysis hint about runtime_services being unused is a false positive. This fixture provides essential side effects (starting NATS and etcd servers) via its context manager, even though the yielded values aren't directly referenced in the test body.


248-248: LGTM: Fixture removal aligns with new pre-download hook.

The predownload_models fixture has been correctly removed from the test signature, consistent with the new model download strategy.

Note: The static analysis hint about runtime_services being unused is a false positive - this fixture is used for its side effects.

tests/fault_tolerance/cancellation/test_trtllm.py (1)

144-144: LGTM: Fixture removals align with new pre-download hook.

The predownload_models fixture has been correctly removed from all four test function signatures. Model downloads are now centrally handled by the pytest_runtestloop hook in conftest.py, which pre-downloads models before test execution begins.

Note: The static analysis hints about runtime_services being unused are false positives. This fixture provides essential side effects (starting NATS and etcd servers) through its context manager setup/teardown, even though the yielded values aren't explicitly referenced in the test bodies.

Also applies to: 216-216, 287-287, 372-372

tests/fault_tolerance/cancellation/test_vllm.py (1)

137-137: LGTM: Fixture removals align with new pre-download hook.

The predownload_models fixture has been correctly removed from all three test function signatures. Models are now pre-downloaded via the pytest_runtestloop hook before test execution.

Note: The static analysis hints about runtime_services and set_ucx_tls_no_mm being unused are false positives. Both fixtures provide essential side effects:

  • runtime_services: Starts NATS and etcd servers via context manager
  • set_ucx_tls_no_mm: Sets the UCX_TLS environment variable to mitigate UCX/NIXL mm transport issues (documented in conftest.py lines 81-93)

Also applies to: 210-210, 280-280

tests/fault_tolerance/migration/test_sglang.py (1)

118-118: LGTM: Fixture removals align with new pre-download hook.

The predownload_models fixture has been correctly removed from all four test function signatures, consistent with the new centralized model download approach via the pytest_runtestloop hook.

Note: The static analysis hints about runtime_services and set_ucx_tls_no_mm being unused are false positives. Both fixtures provide essential side effects through their setup/teardown logic, even though their yielded values aren't explicitly used in the test bodies.

Also applies to: 162-162, 210-210, 270-270

tests/fault_tolerance/migration/test_trtllm.py (1)

114-114: LGTM: Fixture removals align with new pre-download hook.

The predownload_models fixture has been correctly removed from all four test function signatures. Model pre-downloading is now handled centrally by the pytest_runtestloop hook before test execution begins.

Note: The static analysis hints about runtime_services and set_ucx_tls_no_mm being unused are false positives. Both fixtures are used for their side effects (service orchestration and environment configuration), not their return values.

Also applies to: 158-158, 206-206, 266-266

tests/fault_tolerance/migration/test_vllm.py (1)

118-118: LGTM: Fixture removals align with new pre-download hook.

The predownload_models fixture has been correctly removed from all four test function signatures. The new pytest_runtestloop hook in conftest.py now handles model pre-downloading before any tests execute, ensuring download time doesn't count against individual test timeouts.

Note: The static analysis hints about runtime_services and set_ucx_tls_no_mm being unused are false positives. Both fixtures provide essential side effects through their setup/teardown mechanisms, even though their yielded values aren't directly referenced in the test bodies.

Also applies to: 162-162, 210-210, 270-270


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NICE

@kthui kthui merged commit 4ca1679 into main Dec 9, 2025
30 of 31 checks passed
@kthui kthui deleted the jacky-ft-test-timeout-download-time branch December 9, 2025 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants