fix(inspect): Enable training on XPU using SingleXPUStrategy #3210
base: feature/geti-inspect
Pull request overview
This PR enables XPU (Intel GPU) training support by configuring the appropriate strategy and multiprocessing settings to resolve device mismatch and subprocess initialization errors.
- Sets `SingleXPUStrategy` when training on XPU devices to fix device type mismatches
- Configures multiprocessing to use the 'spawn' start method required for XPU subprocess compatibility
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| application/backend/src/services/training_service.py | Adds conditional SingleXPUStrategy configuration for XPU accelerator |
| application/backend/src/main.py | Sets multiprocessing start method to 'spawn' for XPU compatibility |
application/backend/src/main.py
Outdated
    if mp.get_start_method(allow_none=True) != 'spawn':
        mp.set_start_method('spawn', force=True)
Copilot AI, Dec 11, 2025
Setting the multiprocessing start method globally at module import time can cause issues in environments where other code has already initialized multiprocessing with a different method, or where different start methods are needed for different operations. Consider documenting why this global configuration is necessary and whether it could impact other parts of the application that use multiprocessing.
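One alternative to a process-wide setting can be sketched with a local spawn context. This assumes the application creates its own worker processes, which may not hold here if a library starts the subprocesses internally. The helper names `spawn_context` and `start_worker` are hypothetical and do not come from the PR.

```python
import multiprocessing as mp
from typing import Any, Callable


def spawn_context() -> mp.context.BaseContext:
    # get_context returns a context bound to "spawn" without changing
    # the global default start method for the rest of the program.
    return mp.get_context("spawn")


def start_worker(target: Callable[..., Any], *args: Any) -> mp.process.BaseProcess:
    # Processes created from the context use "spawn" regardless of the
    # global default. The target must be importable (module-level) so the
    # spawned child can unpickle it.
    process = spawn_context().Process(target=target, args=args)
    process.start()
    return process
```

With this shape, only code that goes through `start_worker` is affected, and other parts of the application keep whatever start method they already rely on.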
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
    max_epochs=max_epochs,
    callbacks=[GetiInspectProgressCallback(synchronization_parameters)],
    accelerator=training_device,
    **({"strategy": SingleXPUStrategy()} if training_device == "xpu" else {}),
Copilot AI, Dec 11, 2025
Using dictionary unpacking with a conditional expression makes this line harder to read. Consider extracting this logic into a separate variable or using a more explicit conditional structure for better clarity.
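A minimal sketch of the suggested extraction follows. The `SingleXPUStrategy` class below is only a stand-in so the snippet runs without the real dependency, and `build_strategy_kwargs` is a hypothetical helper name, not part of the PR.

```python
from typing import Any


class SingleXPUStrategy:
    """Stand-in for the real strategy class so this sketch runs anywhere."""


def build_strategy_kwargs(training_device: str) -> dict[str, Any]:
    # Only XPU needs an explicit strategy; other devices keep the default.
    if training_device == "xpu":
        return {"strategy": SingleXPUStrategy()}
    return {}


# The call site then unpacks a named result instead of an inline conditional.
trainer_kwargs = {
    "accelerator": "xpu",
    **build_strategy_kwargs("xpu"),
}
```

The behavior is the same as the inline expression; the condition simply gets a name and a place for a comment.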
aadf26c to 11fe3d9
Setting the SingleXPUStrategy fixes the error message,
```
Failed to train pending training job: Device should be xpu, got cpu instead
```
After this another error would show up,
```
Cannot re-initialize XPU in forked subprocess. To use XPU with multiprocessing you must use the `spawn` method
```
This requires us to set the start method in our `main.py` file.

Signed-off-by: Mark Redeman <[email protected]>
11fe3d9 to 8635475
    if mp.get_start_method(allow_none=True) != "spawn":
        mp.set_start_method("spawn", force=True)
According to the docs, `set_start_method` should be called inside the `__main__` guard: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method
If called from a subprocess, it might result in a runtime error.
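The guarded form could look roughly like the sketch below. It assumes `main.py` is executed directly as the entry point; if the module is instead imported by a server runner, the guarded branch would not run and the call would need to live elsewhere. `configure_start_method` is a hypothetical helper name.

```python
import multiprocessing as mp


def configure_start_method() -> None:
    # Switch to "spawn" only if it is not already the configured method.
    if mp.get_start_method(allow_none=True) != "spawn":
        # force=True replaces a previously set method instead of raising
        # RuntimeError because the context has already been set.
        mp.set_start_method("spawn", force=True)


if __name__ == "__main__":
    # Runs only in the entry-point process. Spawned children re-import the
    # main module under a different __name__ and skip this branch.
    configure_start_method()
```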
Setting the SingleXPUStrategy fixes the `Device should be xpu, got cpu instead` error. After this, another error shows up about re-initializing XPU in a forked subprocess, which requires us to set the start method in our `main.py` file.

I've only verified these fixes on a Lunar Lake device, on which I could train with both its CPU and iGPU.