Conversation
Code Review
This pull request enables expert parallel (EP) inference support by adding a new CI test, updating the vLLM CLI argument builder to include the enable_expert_parallel flag, and refactoring EP inference tests to use the InferenceEngineState utility. The test suite now uses a smaller MoE model and ensures environment variables are correctly propagated during Ray initialization. A critical concern was raised regarding the commented-out connector cleanup logic in the remote inference client, which could lead to file descriptor leaks if the event loop changes.
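For context, wiring a boolean engine flag like `enable_expert_parallel` into a CLI argument builder is typically a small conditional append. A minimal sketch (the builder function name and config shape here are hypothetical illustrations, not SkyRL's actual code; `--enable-expert-parallel` is a real vLLM serve flag):

```python
def build_vllm_cli_args(config: dict) -> list[str]:
    """Build vLLM server CLI args from a config dict (hypothetical shape)."""
    args = ["--model", config["model"]]
    # Boolean flags are appended only when enabled; vLLM treats
    # the flag's presence as True.
    if config.get("enable_expert_parallel", False):
        args.append("--enable-expert-parallel")
    if config.get("tensor_parallel_size"):
        args += ["--tensor-parallel-size", str(config["tensor_parallel_size"])]
    return args
```

For example, `build_vllm_cli_args({"model": "Qwen/Qwen1.5-MoE-A2.7B", "enable_expert_parallel": True})` would include `--enable-expert-parallel` in the resulting argument list.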
# TODO (aaron): fix cleanup
# _force_close_connector(self._session.connector)
Commenting out _force_close_connector could lead to file descriptor leaks if the event loop changes. The TODO indicates you're aware of this, but it's a potentially serious issue for a long-running service. Could you please address the cleanup now or explain why disabling this is safe? If it's causing issues in tests, perhaps there's a better way to handle session cleanup.
This can be reverted; I'm addressing the cleanup issues in #1387.
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_lora.py
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_expert_parallel_inference.py
Have you timed how long this takes on L4s?
No, but I don't remember it taking more than 10 minutes; the model is pretty small.
Modify EP tests to run with the new inference path, and add them to the SkyRL train CI.