How to reproduce the issue:
- Create a job on the platform and run nvidia-smi in an infinite loop; at this point it works as expected
- Run sudo systemctl daemon-reload on the VM that hosts the job
- nvidia-smi in the job then fails with 'Failed to initialize NVML: Unknown Error'
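
The steps above can be sketched as a small polling script to run inside the job. This is only an illustration: the function names `nvml_failed` and `watch_gpu`, the `--watch` argument, and the 5-second interval are assumptions made for this example, not part of the platform or of any NVIDIA tooling.

```shell
#!/usr/bin/env bash
# Sketch: poll nvidia-smi inside the job and report when GPU access is lost.
# All names here (nvml_failed, watch_gpu, --watch) are hypothetical helpers.

nvml_failed() {
  # Succeeds (returns 0) if the given text contains the NVML init error.
  case "$1" in
    *"Failed to initialize NVML"*) return 0 ;;
    *) return 1 ;;
  esac
}

watch_gpu() {
  # Poll nvidia-smi every $1 seconds (default 5) until it fails.
  local interval="${1:-5}"
  local out
  while true; do
    # Treat a non-zero exit status or the NVML error text as a failure.
    if ! out="$(nvidia-smi 2>&1)" || nvml_failed "$out"; then
      echo "$(date -u +%FT%TZ) GPU access lost: $out"
      return 1
    fi
    echo "$(date -u +%FT%TZ) nvidia-smi OK"
    sleep "$interval"
  done
}

# Start polling only when invoked with --watch, so the file can also be sourced.
if [ "${1:-}" = "--watch" ]; then
  watch_gpu 5
fi
```

Start it in the job with `--watch`, then run `sudo systemctl daemon-reload` on the VM from a separate shell. If the issue reproduces, the loop should stop and print the NVML error shortly afterwards.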
Related Issues:
NVIDIA/gpu-operator#485
NVIDIA/nvidia-container-toolkit#48