Conda has grown several virtual packages over the years, which have proved very useful for distinguishing system capabilities and relating them to the specific requirements of a given package (e.g. `__cuda`, `__archspec`, etc.).
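For context, these virtual packages are injected into the solver's view of the system through conda's plugin mechanism. A minimal sketch, assuming a recent conda with the plugin system (the `example` name and hard-coded version are placeholders, not a real virtual package):

```python
# Minimal sketch of conda's virtual package plugin hook.
from conda import plugins
from conda.plugins.types import CondaVirtualPackage


@plugins.hookimpl
def conda_virtual_packages():
    # Virtual packages carry no files; they only inject a (name, version,
    # build) triple into the solver's view of the system. Conda prepends
    # the double underscore, so this shows up as `__example`.
    yield CondaVirtualPackage(name="example", version="1.0", build=None)
```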
However, there's currently no way to reflect the underlying GPU architecture requirements of a given package, which led to a very painful situation in conda-forge/cudnn-feedstock#124.
Conda-forge assumed that cudnn would remain binary-compatible across the 9.x line, and thus created run-exports with `{{ pin_subpackage("cudnn", max_pin="x") }}`; and indeed, the API/ABI didn't change.
However, cudnn 9.11+ dropped support for various architectures that had previously been deprecated, even for the CUDA 12.x line, which still supports these architectures. This started causing segfaults (clearly something we need to fix), but only for users on affected cards.
By far the best solution would have been to imbue those packages with metadata indicating that the required minimum architecture had changed, and to let the solver discard those newer packages on systems with unsupported architectures.
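To illustrate what that could look like: a hypothetical `__cuda_arch` virtual package (an assumption for illustration, not an existing conda feature) could expose the lowest compute capability among the visible GPUs, so that affected cudnn builds could declare something like `__cuda_arch >=7.5` and simply become uninstallable on older cards:

```python
# Hypothetical sketch: expose the device's compute capability as a
# virtual package. The `__cuda_arch` name and the NVML-based detection
# are assumptions, not an existing conda feature.
from conda import plugins
from conda.plugins.types import CondaVirtualPackage


def _detect_compute_capability():
    """Return the lowest compute capability among visible GPUs as 'major.minor'."""
    try:
        import pynvml  # provided by the nvidia-ml-py package

        pynvml.nvmlInit()
        caps = [
            pynvml.nvmlDeviceGetCudaComputeCapability(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            )
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
        pynvml.nvmlShutdown()
        return "%d.%d" % min(caps) if caps else None
    except Exception:
        # No driver / no GPU: don't expose the virtual package at all.
        return None


@plugins.hookimpl
def conda_virtual_packages():
    cc = _detect_compute_capability()
    if cc is not None:
        # Affected builds could then depend on e.g. `__cuda_arch >=7.5`,
        # and the solver would skip them on older cards.
        yield CondaVirtualPackage(name="cuda_arch", version=cc, build=None)
```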
Since this wasn't possible, the only option was to permanently mark all affected newer cudnn builds as broken (a very disruptive operation, given that many cudnn-dependent packages already have builds requiring a newer cudnn through its run-export) and to make it impossible for cudnn to be updated on the CUDA 12.x line. This leaves many improvements on the table for users who do have a new-enough architecture. (CUDA 13.x is different, because those older architectures were never supported there in the first place.)
Overall, this has been a very painful experience, not helped by the fact that NVIDIA simply refused to reinstate support for the dropped architectures (even as a one-off) in a newer cudnn release. Hopefully, the lessons learned by everyone involved will reduce the probability of a similar episode in the near future; however, it's clear that better support from conda would be the most elegant solution, as it would rule out such issues from ever recurring in the first place.
PS. The wheelnext initiative over in PyPI-land (which has a bunch of problems that conda already solved) is considering this aspect of the problem space as well (see here); perhaps there's some useful cross-pollination to be had.
PPS. Originally I thought we could just use a linear ordering of the various architectures (`sm_50`, `sm_60`, `sm_70`, etc.), but CUDA 12.8 introduced the notion of architecture families, which will need to be accounted for. Also, builds against `sm_53` should still be compatible with `sm_50`, AFAIU.
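In other words, installability would hinge on a compatibility predicate rather than a pure linear order. A rough sketch under simplified assumptions (plain `sm_XY` builds run on same-major devices with an equal or higher minor; `a` variants are exact-match only; family `f` variants are approximated as same-major here; real CUDA binary-compatibility rules have more special cases, including the Tegra parts and the family semantics mentioned above):

```python
# Rough sketch of an architecture compatibility predicate. The rules
# below are simplified assumptions, not a spec.

def parse_arch(arch: str) -> tuple[int, int, str]:
    """Split e.g. 'sm_90a' into (major, minor, suffix)."""
    digits = arch.removeprefix("sm_")
    suffix = digits[-1] if digits[-1].isalpha() else ""
    digits = digits.rstrip("af")
    return int(digits[:-1]), int(digits[-1]), suffix


def binary_compatible(built_for: str, device: str) -> bool:
    b_major, b_minor, b_suffix = parse_arch(built_for)
    d_major, d_minor, _ = parse_arch(device)
    if b_suffix == "a":  # architecture-specific builds: exact match only
        return (b_major, b_minor) == (d_major, d_minor)
    # plain and (approximately) family builds: same major version,
    # device minor at or above the build's minor
    return b_major == d_major and d_minor >= b_minor


assert binary_compatible("sm_50", "sm_53")      # lower-minor build, higher-minor device
assert not binary_compatible("sm_60", "sm_70")  # different major version
assert not binary_compatible("sm_90a", "sm_100")  # "a" builds don't carry forward
```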