
Replace nvgpu with nvidia-ml-py#2160

Open
matthewfeickert wants to merge 2 commits into PlasmaControl:master from matthewfeickert:feat/drop-nvgpu-for-nvidia-ml-py

Conversation

@matthewfeickert

@matthewfeickert matthewfeickert commented Apr 12, 2026

Resolves #2159

  • As nvgpu is no longer maintained and uses pynvml, which at import time directly tells the user to use nvidia-ml-py instead:

    "The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you."

    drop nvgpu and replace its nvgpu.gpu_info() call with a single function using nvidia-ml-py (which uses the pynvml namespace).

  • Place a lower bound on nvidia-ml-py of 12.535.77, the first release to support nvmlMemory_v2, which properly accounts for system-reserved memory.
  • Remove all mentions of nvgpu in other areas of the codebase and replace them with nvidia-ml-py, except under publications/, as this is historical information.
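For illustration, the lower bound might appear in packaging metadata like this hypothetical pyproject.toml fragment (the actual layout of DESC's metadata may differ):

```toml
# Hypothetical fragment; DESC's real metadata may be organized differently.
[project]
dependencies = [
    # 12.535.77 is the first nvidia-ml-py release supporting nvmlMemory_v2,
    # which accounts for system-reserved memory.
    "nvidia-ml-py>=12.535.77",
]
```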

Example:

  • Current master (using nvgpu):

$ nvidia-smi --version
NVIDIA-SMI version  : 590.48.01
NVML version        : 590.48
DRIVER version      : 590.48.01
CUDA Version        : 13.1
$ uv venv main
$ . main/bin/activate
$ uv pip install .
$ python
Python 3.13.7 (main, Sep 18 2025, 19:47:49) [Clang 20.1.4 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import desc
>>> desc.set_device("gpu")
/tmp/DESC/main/lib/python3.13/site-packages/nvgpu/__init__.py:8: SyntaxWarning: invalid escape sequence '\('
  gpu_infos = [re.match('GPU ([0-9]+): (.+?) \(UUID: ([^)]+)\)', gpu) for gpu in gpus]
>>> import os
>>> os.environ["CUDA_VISIBLE_DEVICES"]
'0'
>>> 
$ python -c 'import nvgpu; print(nvgpu.gpu_info())'
[{'index': '0', 'type': 'NVIDIA GeForce RTX 4060 Laptop GPU', 'uuid': 'GPU-7fef9454-d8d1-86cf-c4b3-e2fd5e35e862', 'mem_used': 8, 'mem_total': 8188, 'mem_used_percent': 0.09770395701025891}]
  • This PR:
$ uv venv
$ . .venv/bin/activate
$ uv pip install .
$ python
Python 3.13.7 (main, Sep 18 2025, 19:47:49) [Clang 20.1.4 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import desc
>>> desc.set_device("gpu")
>>> import os
>>> os.environ["CUDA_VISIBLE_DEVICES"]
'0'
>>> 

Since I made it a guarded import, I can't directly import it from desc, but the following is the same code:

# _implementation.py
from pynvml import (
    nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetMemoryInfo,
    nvmlDeviceGetName,
    nvmlDeviceGetUUID,
    nvmlInit,
    nvmlMemory_v2,
    nvmlShutdown,
)


def _gpu_info():
    """Equivalent to nvgpu.gpu_info() using nvidia-ml-py."""
    nvmlInit()
    try:
        info = []
        for device_idx in range(nvmlDeviceGetCount()):
            handle = nvmlDeviceGetHandleByIndex(device_idx)
            # nvmlMemory_v2 accounts for system-reserved memory
            # (requires nvidia-ml-py>=12.535.77)
            mem = nvmlDeviceGetMemoryInfo(handle, version=nvmlMemory_v2)
            _bytes_to_mib = 1024 * 1024
            mem_used = mem.used // _bytes_to_mib
            mem_total = mem.total // _bytes_to_mib
            info.append(
                {
                    "index": str(device_idx),
                    "type": nvmlDeviceGetName(handle),
                    "uuid": nvmlDeviceGetUUID(handle),
                    "mem_used": mem_used,
                    "mem_total": mem_total,
                    "mem_used_percent": 100.0 * mem_used / mem_total,
                }
            )
        return info
    finally:
        nvmlShutdown()


if __name__ == "__main__":
    print(_gpu_info())

so, running it:

$ python ./_implementation.py
[{'index': '0', 'type': 'NVIDIA GeForce RTX 4060 Laptop GPU', 'uuid': 'GPU-7fef9454-d8d1-86cf-c4b3-e2fd5e35e862', 'mem_used': 7, 'mem_total': 8188, 'mem_used_percent': 0.08549096238397655}]
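As a quick sanity check (my own arithmetic, not part of the PR), the mem_used_percent in this output follows directly from the truncated MiB integers, using the same expression as _gpu_info above:

```python
# Reproduce mem_used_percent from the sample output above with the same
# arithmetic as _gpu_info: integer MiB values, then a float percentage.
mem_used = 7      # MiB, from the sample output
mem_total = 8188  # MiB

mem_used_percent = 100.0 * mem_used / mem_total
print(mem_used_percent)  # matches the mem_used_percent in the output above
```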

So the memory consumption is effectively the same (good), and an unmaintained dependency can be replaced with a maintained one.

@matthewfeickert matthewfeickert marked this pull request as ready for review April 12, 2026 07:15
@matthewfeickert matthewfeickert force-pushed the feat/drop-nvgpu-for-nvidia-ml-py branch from 3671f39 to f63fb54 Compare April 12, 2026 07:21
@github-actions
Contributor

github-actions bot commented Apr 12, 2026

Memory benchmark result

| Test Name | %Δ | Master (MB) | PR (MB) | Δ (MB) | Time PR (s) | Time Master (s) |
| --- | --- | --- | --- | --- | --- | --- |
| test_objective_jac_w7x | 1.12 % | 4.147e+03 | 4.194e+03 | 46.34 | 42.08 | 39.87 |
| test_proximal_jac_w7x_with_eq_update | -0.90 % | 6.617e+03 | 6.557e+03 | -59.54 | 164.59 | 164.78 |
| test_proximal_freeb_jac | -0.26 % | 1.343e+04 | 1.340e+04 | -34.41 | 89.57 | 89.58 |
| test_proximal_freeb_jac_blocked | -0.10 % | 7.728e+03 | 7.720e+03 | -7.67 | 79.54 | 78.70 |
| test_proximal_freeb_jac_batched | 0.15 % | 7.713e+03 | 7.724e+03 | 11.63 | 77.81 | 78.62 |
| test_proximal_jac_ripple | -1.13 % | 3.648e+03 | 3.606e+03 | -41.31 | 62.49 | 64.32 |
| test_proximal_jac_ripple_bounce1d | -1.72 % | 3.853e+03 | 3.787e+03 | -66.38 | 76.13 | 76.65 |
| test_eq_solve | 2.62 % | 2.209e+03 | 2.267e+03 | 57.96 | 99.69 | 99.39 |

For the memory plots, go to the summary of Memory Benchmarks workflow and download the artifact.
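For reference, the %Δ column appears to be the relative memory change, 100 × Δ / Master. A quick check against the first row (my own arithmetic, not from the bot):

```python
# Check the %Δ arithmetic for test_objective_jac_w7x from the table above:
# master = 4.147e+03 MB, delta = 46.34 MB.
master_mb = 4147.0
delta_mb = 46.34

pct_change = 100.0 * delta_mb / master_mb
print(round(pct_change, 2))  # → 1.12, matching the table
```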

@matthewfeickert matthewfeickert force-pushed the feat/drop-nvgpu-for-nvidia-ml-py branch from f63fb54 to 11c073b Compare April 12, 2026 07:40
@matthewfeickert
Author

This is ready for review, but needs a maintainer to approve the CI runs. Let me know if you have any questions. 👍

@matthewfeickert matthewfeickert force-pushed the feat/drop-nvgpu-for-nvidia-ml-py branch 2 times, most recently from a2a0a9c to aba1019 Compare April 12, 2026 07:58
* As nvgpu is no longer maintained and uses pynvml, which directly tells the user
  to use nvidia-ml-py instead at import

  "The pynvml package is deprecated. Please install nvidia-ml-py instead. If you
   did not install pynvml directly, please report this to the maintainers of the
   package that installed pynvml for you."

  drop nvgpu and replace its nvgpu.gpu_info() call with a single function using
  nvidia-ml-py (which uses the pynvml namespace).
* Place a lower bound on nvidia-ml-py of 12.535.77, which was the first release
  to support nvmlMemory_v2 which properly accounts for system-reserved memory.
* Remove all mentions of nvgpu in other areas of the codebase and replace them
  with nvidia-ml-py, except for publications/ as this is historical information.
   - Do NOT add nvidia-ml-py to dependabot.yml as pinning this tightly is an
     anti-pattern in library design that will cause installation issues,
     especially with NVIDIA libraries.
* Note that nvgpu has been replaced with nvidia-ml-py.
@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.41%. Comparing base (2471d55) to head (37170df).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| desc/__init__.py | 0.00% | 15 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2160      +/-   ##
==========================================
- Coverage   94.45%   94.41%   -0.04%     
==========================================
  Files         101      101              
  Lines       28593    28606      +13     
==========================================
+ Hits        27008    27009       +1     
- Misses       1585     1597      +12     
| Files with missing lines | Coverage Δ |
| --- | --- |
| desc/__init__.py | 36.48% <0.00%> (-7.78%) ⬇️ |

... and 3 files with indirect coverage changes


@matthewfeickert
Author

> ❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Given that the changes in this PR should be covered by the calls to `desc.set_device("gpu")`, I assume the lack of coverage reported is that the benchmarks haven't finished running / need additional approval to run.

@dpanici
Collaborator

dpanici commented Apr 12, 2026

> Given that the changes in this PR should be covered by the calls to `desc.set_device("gpu")`, I assume the lack of coverage reported is that the benchmarks haven't finished running / need additional approval to run.

Benchmarks don't increase coverage, and our CI cannot run GPU things, so this will just have 0 coverage which is fine with the devs

@matthewfeickert
Author

> Benchmarks don't increase coverage, and our CI cannot run GPU things, so this will just have 0 coverage which is fine with the devs

Great. Thanks for the very fast follow up!

@ddudt ddudt requested review from a team, YigitElma, ddudt, dpanici, f0uriest, rahulgaur104 and unalmis and removed request for a team April 13, 2026 16:21


Development

Successfully merging this pull request may close these issues.

Is nvgpu a critical dependency or can it be replaced with nvidia-ml-py?
