Running Container Lost GPU Permission #50

@yukirora

Description

How to reproduce the issue:

  1. Create a job on the platform and run nvidia-smi in an infinite loop; it works as expected.
  2. Run sudo systemctl daemon-reload on the VM where the job is running.
  3. The job then fails with 'Failed to initialize NVML: Unknown Error'.
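
To make step 1 easier to observe, here is a minimal sketch of a polling loop that could run inside the job and report when the NVML error first appears. It assumes nvidia-smi is on the PATH inside the container; the helper names (nvml_failed, run_nvidia_smi, watch) and the 5-second interval are illustrative and not part of the platform.

```python
import subprocess
import time

# Error text quoted in this issue; only this prefix is matched.
NVML_ERROR = "Failed to initialize NVML"


def nvml_failed(output):
    """Return True if the output contains the NVML initialization error."""
    return NVML_ERROR in output


def run_nvidia_smi():
    """Run nvidia-smi once and return its combined stdout and stderr.

    Assumes nvidia-smi is available on PATH inside the container.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return "nvidia-smi could not be run: {}".format(exc)
    return result.stdout + result.stderr


def watch(interval_s=5.0, max_checks=None, run=run_nvidia_smi):
    """Poll until the NVML error appears or max_checks is reached.

    Returns the number of checks that completed without the NVML error.
    The `run` parameter is a hypothetical hook so the loop can be
    exercised without a GPU.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        output = run()
        if nvml_failed(output):
            print("NVML error after {} clean checks: {}".format(checks, output.strip()))
            return checks
        checks += 1
        time.sleep(interval_s)
    return checks
```

Calling watch() inside the job and then running the daemon-reload from step 2 on the VM should make the loop stop and print the error, if the behaviour matches this report.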

Related Issues:
NVIDIA/gpu-operator#485
NVIDIA/nvidia-container-toolkit#48
