
@MarkRedeman
Contributor

Setting `SingleXPUStrategy` fixes the following error:

```
Failed to train pending training job: Device should be xpu, got cpu
instead
```

After this, another error shows up:

```
Cannot re-initialize XPU in forked subprocess. To use XPU with
multiprocessing you must use the `spawn` method
```

Fixing this requires setting the multiprocessing start method to `spawn` in our `main.py` file.


I've only verified these fixes on a Lunar Lake device, on which I could train with both its CPU and its iGPU.

@MarkRedeman MarkRedeman requested a review from maxxgx as a code owner December 11, 2025 07:42
Copilot AI review requested due to automatic review settings December 11, 2025 07:42
Contributor

Copilot AI left a comment

Pull request overview

This PR enables XPU (Intel GPU) training support by configuring the appropriate strategy and multiprocessing settings to resolve device mismatch and subprocess initialization errors.

  • Sets SingleXPUStrategy when training on XPU devices to fix device type mismatches
  • Configures multiprocessing to use 'spawn' start method required for XPU subprocess compatibility

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `application/backend/src/services/training_service.py` | Adds conditional `SingleXPUStrategy` configuration for the XPU accelerator |
| `application/backend/src/main.py` | Sets the multiprocessing start method to `'spawn'` for XPU compatibility |

Comment on lines 6 to 7
```python
if mp.get_start_method(allow_none=True) != 'spawn':
    mp.set_start_method('spawn', force=True)
```
Copilot AI Dec 11, 2025

Setting the multiprocessing start method globally at module import time can cause issues in environments where other code has already initialized multiprocessing with a different method, or where different start methods are needed for different operations. Consider documenting why this global configuration is necessary and whether it could impact other parts of the application that use multiprocessing.
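One way to address this concern, sketched here as an alternative rather than what the PR does: request a `spawn` context with `multiprocessing.get_context` and create worker processes through it, leaving the process-wide default untouched. `get_training_context` and `run_training_job` are hypothetical names; the PR does not show how the training subprocess is launched. This only helps if the application creates that process itself, since code that calls `multiprocessing.Process` directly still uses the global default.

```python
import multiprocessing as mp
from multiprocessing.context import BaseContext


def get_training_context() -> BaseContext:
    """Return a 'spawn' context without changing the process-wide default.

    Only processes created through this context use 'spawn'; other code that
    calls multiprocessing.Process directly keeps the platform default.
    """
    return mp.get_context("spawn")


def run_training_job(job_id: str) -> None:
    # Hypothetical stand-in for the real training entry point.
    print(f"running training job {job_id}")


if __name__ == "__main__":
    ctx = get_training_context()
    # The target must be defined at module level so the spawned child can import it.
    worker = ctx.Process(target=run_training_job, args=("example-job",))
    worker.start()
    worker.join()
```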

@ashwinvaidya17 ashwinvaidya17 changed the title Geti Inspect: Enable training on XPU using SingleXPUStrategy fix(inspect): Enable training on XPU using SingleXPUStrategy Dec 11, 2025
Copilot AI review requested due to automatic review settings December 11, 2025 09:09
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


```python
max_epochs=max_epochs,
callbacks=[GetiInspectProgressCallback(synchronization_parameters)],
accelerator=training_device,
**({"strategy": SingleXPUStrategy()} if training_device == "xpu" else {}),
```
Copilot AI Dec 11, 2025

Using dictionary unpacking with a conditional expression makes this line harder to read. Consider extracting this logic into a separate variable or using a more explicit conditional structure for better clarity.
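A minimal sketch of the more explicit structure the comment suggests: build the keyword arguments in a plain dict and add `strategy` only inside an `if` block. `build_trainer_kwargs` is a hypothetical helper, and the strategy is passed in as a factory only so the sketch runs without the training library installed; in `training_service.py` that factory would be `SingleXPUStrategy`.

```python
from typing import Any, Callable


def build_trainer_kwargs(
    training_device: str,
    max_epochs: int,
    xpu_strategy_factory: Callable[[], Any],
) -> dict[str, Any]:
    """Collect trainer keyword arguments, adding a strategy only for XPU."""
    kwargs: dict[str, Any] = {
        "max_epochs": max_epochs,
        "accelerator": training_device,
    }
    # Only XPU needs an explicit strategy; other devices keep the trainer default.
    if training_device == "xpu":
        kwargs["strategy"] = xpu_strategy_factory()
    return kwargs
```

The call site would then pass `callbacks=[...]` explicitly and unpack the returned dict, so the device-specific branch is visible as an ordinary `if` statement rather than inside an expression.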

@MarkRedeman MarkRedeman force-pushed the mark/fix-training-on-xpu branch from aadf26c to 11fe3d9 Compare December 12, 2025 08:23
Setting the SingleXPUStrategy fixes the error message,
```
Failed to train pending training job: Device should be xpu, got cpu
instead
```

After this another error would show up,
```
Cannot re-initialize XPU in forked subprocess. To use XPU with
multiprocessing you must use the `spawn` method
```

This requires us to set the start method of our `main.py` file.

Signed-off-by: Mark Redeman <[email protected]>
@MarkRedeman MarkRedeman force-pushed the mark/fix-training-on-xpu branch from 11fe3d9 to 8635475 Compare December 12, 2025 08:23
Comment on lines +6 to +7
```python
if mp.get_start_method(allow_none=True) != "spawn":
    mp.set_start_method("spawn", force=True)
```
Contributor

@maxxgx maxxgx Dec 13, 2025

According to the docs, `set_start_method` should be called inside the `__main__` guard: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method

If it is called from a subprocess, it might raise a runtime error.
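A minimal sketch of the suggestion, assuming `main.py` is executed as a script (for example `python main.py`): the start method is set inside the `if __name__ == "__main__":` guard instead of at import time, so spawned children, which re-import the main module under a different name, do not run it. `configure_start_method` and `main` are hypothetical names. If the backend is started by a launcher that imports the module rather than running it as `__main__`, the guarded code would not execute, and the call would need to move to whichever entry point actually runs first.

```python
import multiprocessing as mp


def configure_start_method(method: str = "spawn") -> str:
    """Fix the global start method; meant to be called once from the main process."""
    if mp.get_start_method(allow_none=True) != method:
        # force=True overrides a start method that was already fixed elsewhere.
        mp.set_start_method(method, force=True)
    return mp.get_start_method()


def main() -> None:
    method = configure_start_method("spawn")
    print(f"multiprocessing start method: {method}")
    # Hypothetical: start the backend application here.


if __name__ == "__main__":
    main()
```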
