
Replace nvgpu with nvidia-ml-py#2160

Open
matthewfeickert wants to merge 2 commits into PlasmaControl:master from matthewfeickert:feat/drop-nvgpu-for-nvidia-ml-py

Conversation

@matthewfeickert

@matthewfeickert matthewfeickert commented Apr 12, 2026

Resolves #2159

  • As nvgpu is no longer maintained and uses pynvml, which at import time directly tells the user to use nvidia-ml-py instead:

    "The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you."

    drop nvgpu and replace its nvgpu.gpu_info() call with a single function using nvidia-ml-py (which uses the pynvml namespace).

  • Place a lower bound on nvidia-ml-py of 12.535.77, the first release to support nvmlMemory_v2, which properly accounts for system-reserved memory.
  • Remove all mentions of nvgpu in other areas of the codebase and replace them with nvidia-ml-py, except under publications/, as this is historical information.
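For illustration, the lower bound might appear in packaging metadata like this hypothetical pyproject.toml fragment (the actual layout of DESC's metadata may differ):

```toml
# Hypothetical fragment; DESC's real metadata may be organized differently.
[project]
dependencies = [
    # 12.535.77 is the first nvidia-ml-py release supporting nvmlMemory_v2,
    # which accounts for system-reserved memory.
    "nvidia-ml-py>=12.535.77",
]
```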

Example:

  • Current master (using nvgpu):

$ nvidia-smi --version
NVIDIA-SMI version  : 590.48.01
NVML version        : 590.48
DRIVER version      : 590.48.01
CUDA Version        : 13.1
$ uv venv main
$ . main/bin/activate
$ uv pip install .
$ python
Python 3.13.7 (main, Sep 18 2025, 19:47:49) [Clang 20.1.4 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import desc
>>> desc.set_device("gpu")
/tmp/DESC/main/lib/python3.13/site-packages/nvgpu/__init__.py:8: SyntaxWarning: invalid escape sequence '\('
  gpu_infos = [re.match('GPU ([0-9]+): (.+?) \(UUID: ([^)]+)\)', gpu) for gpu in gpus]
>>> import os
>>> os.environ["CUDA_VISIBLE_DEVICES"]
'0'
>>> 
$ python -c 'import nvgpu; print(nvgpu.gpu_info())'
[{'index': '0', 'type': 'NVIDIA GeForce RTX 4060 Laptop GPU', 'uuid': 'GPU-7fef9454-d8d1-86cf-c4b3-e2fd5e35e862', 'mem_used': 8, 'mem_total': 8188, 'mem_used_percent': 0.09770395701025891}]
  • This PR:
$ uv venv
$ . .venv/bin/activate
$ uv pip install .
$ python
Python 3.13.7 (main, Sep 18 2025, 19:47:49) [Clang 20.1.4 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import desc
>>> desc.set_device("gpu")
>>> import os
>>> os.environ["CUDA_VISIBLE_DEVICES"]
'0'
>>> 

Since I made it a guarded import, I can't directly import it from desc, but the following is the same code:

# _implementation.py
from pynvml import (
    nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetMemoryInfo,
    nvmlDeviceGetName,
    nvmlDeviceGetUUID,
    nvmlInit,
    nvmlMemory_v2,
    nvmlShutdown,
)


def _gpu_info():
    """Equivalent to nvgpu.gpu_info() using nvidia-ml-py."""
    nvmlInit()
    try:
        info = []
        for device_idx in range(nvmlDeviceGetCount()):
            handle = nvmlDeviceGetHandleByIndex(device_idx)
            # nvmlMemory_v2 accounts for system-reserved memory
            # (requires nvidia-ml-py>=12.535.77)
            mem = nvmlDeviceGetMemoryInfo(handle, version=nvmlMemory_v2)
            _bytes_to_mib = 1024 * 1024
            mem_used = mem.used // _bytes_to_mib
            mem_total = mem.total // _bytes_to_mib
            info.append(
                {
                    "index": str(device_idx),
                    "type": nvmlDeviceGetName(handle),
                    "uuid": nvmlDeviceGetUUID(handle),
                    "mem_used": mem_used,
                    "mem_total": mem_total,
                    "mem_used_percent": 100.0 * mem_used / mem_total,
                }
            )
        return info
    finally:
        nvmlShutdown()


if __name__ == "__main__":
    print(_gpu_info())

so, running it:

$ python ./_implementation.py
[{'index': '0', 'type': 'NVIDIA GeForce RTX 4060 Laptop GPU', 'uuid': 'GPU-7fef9454-d8d1-86cf-c4b3-e2fd5e35e862', 'mem_used': 7, 'mem_total': 8188, 'mem_used_percent': 0.08549096238397655}]
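As a quick sanity check (my own arithmetic, not part of the PR), the mem_used_percent in this output follows directly from the truncated MiB integers, using the same expression as _gpu_info above:

```python
# Reproduce mem_used_percent from the sample output above with the same
# arithmetic as _gpu_info: integer MiB values, then a float percentage.
mem_used = 7      # MiB, from the sample output
mem_total = 8188  # MiB

mem_used_percent = 100.0 * mem_used / mem_total
print(mem_used_percent)  # matches the mem_used_percent in the output above
```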

So the memory consumption is effectively the same (good), and an unmaintained dependency can be replaced with a maintained one.

@matthewfeickert matthewfeickert marked this pull request as ready for review April 12, 2026 07:15
@matthewfeickert matthewfeickert force-pushed the feat/drop-nvgpu-for-nvidia-ml-py branch from 3671f39 to f63fb54 Compare April 12, 2026 07:21
@github-actions
Contributor

github-actions bot commented Apr 12, 2026

Memory benchmark result

| Test Name | %Δ | Master (MB) | PR (MB) | Δ (MB) | Time PR (s) | Time Master (s) |
| --- | --- | --- | --- | --- | --- | --- |
| test_objective_jac_w7x | 1.12 % | 4.147e+03 | 4.194e+03 | 46.34 | 42.08 | 39.87 |
| test_proximal_jac_w7x_with_eq_update | -0.90 % | 6.617e+03 | 6.557e+03 | -59.54 | 164.59 | 164.78 |
| test_proximal_freeb_jac | -0.26 % | 1.343e+04 | 1.340e+04 | -34.41 | 89.57 | 89.58 |
| test_proximal_freeb_jac_blocked | -0.10 % | 7.728e+03 | 7.720e+03 | -7.67 | 79.54 | 78.70 |
| test_proximal_freeb_jac_batched | 0.15 % | 7.713e+03 | 7.724e+03 | 11.63 | 77.81 | 78.62 |
| test_proximal_jac_ripple | -1.13 % | 3.648e+03 | 3.606e+03 | -41.31 | 62.49 | 64.32 |
| test_proximal_jac_ripple_bounce1d | -1.72 % | 3.853e+03 | 3.787e+03 | -66.38 | 76.13 | 76.65 |
| test_eq_solve | 2.62 % | 2.209e+03 | 2.267e+03 | 57.96 | 99.69 | 99.39 |

For the memory plots, go to the summary of Memory Benchmarks workflow and download the artifact.
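For reference, the %Δ column appears to be the relative memory change, 100 × Δ / Master. A quick check against the first row (my own arithmetic, not from the bot):

```python
# Check the %Δ arithmetic for test_objective_jac_w7x from the table above:
# master = 4.147e+03 MB, delta = 46.34 MB.
master_mb = 4147.0
delta_mb = 46.34

pct_change = 100.0 * delta_mb / master_mb
print(round(pct_change, 2))  # → 1.12, matching the table
```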

@matthewfeickert matthewfeickert force-pushed the feat/drop-nvgpu-for-nvidia-ml-py branch from f63fb54 to 11c073b Compare April 12, 2026 07:40
@matthewfeickert
Author

This is ready for review, but needs a maintainer to approve the CI runs. Let me know if you have any questions. 👍

@matthewfeickert matthewfeickert force-pushed the feat/drop-nvgpu-for-nvidia-ml-py branch 2 times, most recently from a2a0a9c to aba1019 Compare April 12, 2026 07:58
* As nvgpu is no longer maintained and uses pynvml, which directly tells the user
  to use nvidia-ml-py instead at import

  "The pynvml package is deprecated. Please install nvidia-ml-py instead. If you
   did not install pynvml directly, please report this to the maintainers of the
   package that installed pynvml for you."

  drop nvgpu and replace its nvgpu.gpu_info() call with a single function using
  nvidia-ml-py (which uses the pynvml namespace).
* Place a lower bound on nvidia-ml-py of 12.535.77, which was the first release
  to support nvmlMemory_v2 which properly accounts for system-reserved memory.
* Remove all mentions of nvgpu in other areas of the codebase and replace them
  with nvidia-ml-py, except for publications/ as this is historical information.
   - Do NOT add nvidia-ml-py to dependabot.yml as pinning this tightly is an
     anti-pattern in library design that will cause installation issues,
     especially with NVIDIA libraries.
* Note that nvgpu has been replaced with nvidia-ml-py.
@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.41%. Comparing base (2471d55) to head (37170df).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| desc/__init__.py | 0.00% | 15 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2160      +/-   ##
==========================================
- Coverage   94.45%   94.41%   -0.04%     
==========================================
  Files         101      101              
  Lines       28593    28606      +13     
==========================================
+ Hits        27008    27009       +1     
- Misses       1585     1597      +12     
| Files with missing lines | Coverage Δ |
| --- | --- |
| desc/__init__.py | 36.48% <0.00%> (-7.78%) ⬇️ |

... and 3 files with indirect coverage changes


@matthewfeickert
Author

> ❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Given that the changes in this PR should be covered by the calls to `desc.set_device("gpu")`, I assume the lack of coverage reported is that the benchmarks haven't finished running / need additional approval to run.

@dpanici
Collaborator

dpanici commented Apr 12, 2026

> Given that the changes in this PR should be covered by the calls to `desc.set_device("gpu")`, I assume the lack of coverage reported is that the benchmarks haven't finished running / need additional approval to run.

Benchmarks don't increase coverage, and our CI cannot run GPU things, so this will just have 0 coverage which is fine with the devs

@matthewfeickert
Author

> Benchmarks don't increase coverage, and our CI cannot run GPU things, so this will just have 0 coverage which is fine with the devs

Great. Thanks for the very fast follow up!

@ddudt ddudt requested review from a team, YigitElma, ddudt, dpanici, f0uriest, rahulgaur104 and unalmis and removed request for a team April 13, 2026 16:21


Development

Successfully merging this pull request may close these issues.

Is nvgpu a critical dependency or can it be replaced with nvidia-ml-py?
