Conversation
Code Review
This pull request enables expert parallel (EP) inference support by adding a new CI test, updating the vLLM CLI argument builder to include the enable_expert_parallel flag, and refactoring EP inference tests to use the InferenceEngineState utility. The test suite now uses a smaller MoE model and ensures environment variables are correctly propagated during Ray initialization. A critical concern was raised regarding the commented-out connector cleanup logic in the remote inference client, which could lead to file descriptor leaks if the event loop changes.
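For context, wiring a boolean engine flag like `enable_expert_parallel` into a CLI argument builder is typically a small conditional append. A minimal sketch (the builder function name and config shape here are hypothetical illustrations, not SkyRL's actual code; `--enable-expert-parallel` is a real vLLM serve flag):

```python
def build_vllm_cli_args(config: dict) -> list[str]:
    """Build vLLM server CLI args from a config dict (hypothetical shape)."""
    args = ["--model", config["model"]]
    # Boolean flags are appended only when enabled; vLLM treats
    # the flag's presence as True.
    if config.get("enable_expert_parallel", False):
        args.append("--enable-expert-parallel")
    if config.get("tensor_parallel_size"):
        args += ["--tensor-parallel-size", str(config["tensor_parallel_size"])]
    return args
```

For example, `build_vllm_cli_args({"model": "Qwen/Qwen1.5-MoE-A2.7B", "enable_expert_parallel": True})` would include `--enable-expert-parallel` in the resulting argument list.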
# TODO (aaron): fix cleanup
# _force_close_connector(self._session.connector)
Commenting out _force_close_connector could lead to file descriptor leaks if the event loop changes. The TODO indicates you're aware of this, but it's a potentially serious issue for a long-running service. Could you please address the cleanup now or explain why disabling this is safe? If it's causing issues in tests, perhaps there's a better way to handle session cleanup.
This can be reverted; I'm addressing the cleanup issues in #1387.
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_lora.py
_SKYRL_USE_NEW_INFERENCE=1 uv run --isolated --extra dev --extra fsdp pytest -s tests/backends/skyrl_train/gpu/gpu_ci/test_expert_parallel_inference.py
Have you timed how long this takes on L4s?
No, but I don't remember it taking more than 10 minutes; the model is pretty small.
Modify EP tests to run with the new inference path, and add them to the SkyRL train CI.