Skip to content

Conversation

@biswapanda
Copy link
Contributor

@biswapanda biswapanda commented Dec 9, 2025

Overview:

Add CI tests for the lora shell scripts (agg and agg router) -

  • examples/backends/vllm/launch/lora/agg_lora.sh
  • examples/backends/vllm/launch/lora/agg_lora_router.sh

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • Tests

    • Added comprehensive testing infrastructure for LoRA model integration
    • Introduced end-to-end test coverage for LoRA inference across aggregated and router-based deployments
  • Chores

    • Enhanced test utilities for MinIO-based model service management

✏️ Tip: You can customize this high-level summary in your review settings.

@biswapanda biswapanda requested review from a team as code owners December 9, 2025 02:33
@biswapanda biswapanda changed the title tests: add ci tests for lora agg and router test: add ci tests for lora agg and router Dec 9, 2025
@github-actions github-actions bot added the test label Dec 9, 2025
@biswapanda biswapanda self-assigned this Dec 9, 2025
@biswapanda biswapanda changed the title test: add ci tests for lora agg and router test: add ci tests for lora (agg and router) Dec 9, 2025
@biswapanda
Copy link
Contributor Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

The pull request introduces a comprehensive MinIO/LoRA testing infrastructure in conftest.py with a data class for configuration, a service class to manage Docker-based MinIO, and a pytest fixture that orchestrates LoRA download from Hugging Face and upload to MinIO. Additionally, test_vllm.py receives LoRA testing support through new test payload classes and two integration test cases for single and multi-worker LoRA inference scenarios.

Changes

Cohort / File(s) Summary
MinIO Configuration & Service Infrastructure
tests/serve/conftest.py
Introduces MinioLoraConfig dataclass with S3 URI generation and environment variable helpers. Adds MinioService class managing Docker lifecycle (start/stop), HTTP readiness probing with retry timeout, bucket creation via AWS S3 CLI, LoRA download from Hugging Face, and upload to MinIO with cleanup routines. Includes constants for MinIO endpoint, credentials, bucket, and LoRA repository details.
MinIO LoRA Fixture & Helper
tests/serve/conftest.py
Introduces load_lora_adapter() function to load LoRA via system API endpoint with polling for model availability. Adds minio_lora_service() pytest fixture that orchestrates MinIO startup, bucket creation, LoRA download/upload, and cleanup.
LoRA Test Payload & Factories
tests/serve/test_vllm.py
Adds LoraTestChatPayload class with state tracking and _ensure_lora_loaded() method that polls /v1/models endpoint until LoRA availability. Introduces lora_chat_payload() factory function for creating test payload instances with LoRA defaults.
LoRA Integration Tests
tests/serve/test_vllm.py
Adds test_lora_aggregated() for single-deployment LoRA inference with MinIO integration. Adds test_lora_aggregated_router() for multi-worker LoRA inference with KV router and deterministic hash seeding. Both tests configure environment variables and exercise LoRA loading and inference workflows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Docker lifecycle management: MinioService.start() and stop() methods with container orchestration and cleanup—verify proper error handling and resource cleanup on failures
  • HTTP readiness probing: MinioService._wait_for_ready() retry logic with timeout—check for off-by-one errors and appropriate exception handling
  • AWS S3 CLI interactions: Bucket creation and LoRA sync operations with endpoint override—verify correct argument passing and error scenarios
  • API polling: load_lora_adapter() function polling /v1/models endpoint—validate timeout logic and polling interval
  • State management in LoraTestChatPayload: The _ensure_lora_loaded() logic and URL override mechanism—ensure thread-safety and proper state transitions
  • Integration test orchestration: Both test_lora_aggregated and test_lora_aggregated_router fixture dependencies and environment variable setup—verify fixture ordering and isolation

Poem

🐰 A MinIO tale springs forth with glee,
LoRAs now dock in S3,
Docker containers, AWS CLI calls,
Through API polls and bucket halls,
Our tests now hop with fresh delight,
Each inference adapter–loaded just right!

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change: adding CI tests for LoRA functionality in both aggregated and router modes.
Description check ✅ Passed The description covers the overview and related issues but lacks the 'Details' and 'Where should the reviewer start?' sections from the template.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/serve/test_vllm.py (3)

543-564: Use explicit None union in type hints.

The default mutable argument pattern with None is correct, but the type hints should explicitly indicate the parameter accepts None per PEP 484.

     def __init__(
         self,
         body: dict,
         lora_name: str,
         s3_uri: str,
         system_port: int = 8081,
         repeat_count: int = 1,
-        expected_response: list = None,
-        expected_log: list = None,
+        expected_response: list | None = None,
+        expected_log: list | None = None,
         timeout: int = 60,
     ):

566-614: Consider moving imports to module level.

The time and requests imports are placed inside the method. While this works, moving them to the module level improves readability and avoids repeated import resolution.

 import base64
 import logging
 import os
 import random
+import time
 from dataclasses import dataclass, field

 import pytest
+import requests

Then remove lines 569-571 from the method body.


622-653: Same type hint issue as __init__.

 def lora_chat_payload(
     lora_name: str,
     s3_uri: str,
     system_port: int = 8081,
     repeat_count: int = 2,
-    expected_response: list = None,
-    expected_log: list = None,
+    expected_response: list | None = None,
+    expected_log: list | None = None,
     max_tokens: int = 100,
     temperature: float = 0.0,
 ) -> LoraTestChatPayload:
tests/serve/conftest.py (3)

92-109: Consider pinning the MinIO Docker image version.

Using quay.io/minio/minio without a version tag means tests could behave differently across runs if the image is updated. Consider pinning to a specific version for reproducibility.

-            "quay.io/minio/minio",
+            "quay.io/minio/minio:RELEASE.2024-01-01T00-00-00Z",  # Pin to specific version

98-101: Hardcoded ports may cause conflicts in parallel test runs.

Ports 9000 and 9001 are hardcoded. If tests run in parallel or another MinIO instance is running, this will cause port conflicts. Consider using dynamic port allocation or documenting this limitation.


267-282: Consider accepting other 2xx status codes.

The check response.status_code != 200 rejects 201 (Created) and other successful status codes. If the API could return 201 for newly created resources, this would incorrectly raise an error.

-    if response.status_code != 200:
+    if not response.ok:  # Accepts any 2xx status code
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27a1b4e and 42c043a.

📒 Files selected for processing (2)
  • tests/serve/conftest.py (2 hunks)
  • tests/serve/test_vllm.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/serve/test_vllm.py (3)
tests/serve/conftest.py (5)
  • MinioLoraConfig (39-64)
  • load_lora_adapter (267-282)
  • minio_lora_service (321-361)
  • get_s3_uri (50-52)
  • get_env_vars (54-64)
tests/utils/payload_builder.py (1)
  • chat_payload (129-156)
tests/utils/payloads.py (2)
  • ChatPayload (102-155)
  • url (44-46)
tests/serve/conftest.py (2)
tests/conftest.py (1)
  • logger (193-204)
tests/serve/test_vllm.py (1)
  • url (616-619)
🪛 Ruff (0.14.8)
tests/serve/test_vllm.py

550-550: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


551-551: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


612-614: Avoid specifying long messages outside the exception class

(TRY003)


627-627: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


628-628: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


662-662: Unused function argument: runtime_services

(ARG001)


662-662: Unused function argument: predownload_models

(ARG001)


704-704: Unused function argument: runtime_services

(ARG001)


704-704: Unused function argument: predownload_models

(ARG001)

tests/serve/conftest.py

32-32: Possible hardcoded password assigned to: "MINIO_SECRET_KEY"

(S105)


63-63: Probable insecure usage of temporary file or directory: "/tmp/dynamo_loras_minio_test"

(S108)


111-111: subprocess call: check for execution of untrusted input

(S603)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Avoid specifying long messages outside the exception class

(TRY003)


140-140: subprocess call: check for execution of untrusted input

(S603)


141-141: Starting a process with a partial executable path

(S607)


146-146: subprocess call: check for execution of untrusted input

(S603)


147-147: Starting a process with a partial executable path

(S607)


162-162: subprocess call: check for execution of untrusted input

(S603)


163-170: Starting a process with a partial executable path

(S607)


178-178: subprocess call: check for execution of untrusted input

(S603)


179-186: Starting a process with a partial executable path

(S607)


191-191: Avoid specifying long messages outside the exception class

(TRY003)


200-200: subprocess call: check for execution of untrusted input

(S603)


201-209: Starting a process with a partial executable path

(S607)


215-215: Avoid specifying long messages outside the exception class

(TRY003)


238-238: subprocess call: check for execution of untrusted input

(S603)


239-249: Starting a process with a partial executable path

(S607)


255-255: Avoid specifying long messages outside the exception class

(TRY003)


278-280: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (10)
tests/serve/test_vllm.py (4)

17-30: LGTM!

Import additions are appropriate for the new LoRA testing functionality.


616-619: Side effect in url() is intentional but worth documenting.

The url() method triggers LoRA loading as a side effect. The docstring mentions this, which is good. This design ensures the LoRA is loaded lazily before the first request.


656-695: LGTM!

The test structure is well-organized. The runtime_services and predownload_models parameters are pytest fixtures that set up required infrastructure even though they're not directly referenced in the function body.


698-763: Test structure for multi-worker LoRA is well-designed.

The approach of loading LoRA on both workers separately (ports 8081 and 8082) before running inference ensures proper multi-worker setup. Setting PYTHONHASHSEED=0 for deterministic KV event IDs is a good practice for reproducible testing.

tests/serve/conftest.py (6)

29-36: Constants are appropriate for local testing.

The hardcoded credentials (flagged by static analysis) are acceptable here as these are default MinIO credentials for a local test container, not production secrets.


38-64: LGTM!

The dataclass provides clean configuration management with sensible defaults for test environments.


161-191: Bucket existence check conflates errors.

The s3 ls command returning non-zero could mean the bucket doesn't exist, but it could also indicate authentication failure or network issues. For test infrastructure this is acceptable, but be aware that bucket creation might fail with a confusing error if the real issue is auth/network.


193-222: LGTM!

The LoRA download logic properly handles errors and cleans up the cache directory.


224-255: LGTM!

The upload logic correctly syncs to MinIO while excluding git-related files.


320-361: Fixture cleanup logic is correct.

The dual cleanup_temp() calls serve different purposes:

  1. Before yield: cleans downloaded LoRA files to save space while MinIO retains the uploaded data
  2. In finally: cleans everything including MinIO's data directory

This is working as intended, though adding a brief comment would improve clarity.

@biswapanda biswapanda requested a review from a team as a code owner December 9, 2025 04:56
@biswapanda biswapanda force-pushed the bis/dep-546-lora-aware-kv-routing-support branch from 27a1b4e to 657702a Compare December 9, 2025 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants