Conda has grown several virtual packages over the years, which have proved very useful for distinguishing system capabilities and relating them to the specific requirements of a given package (e.g. `__cuda`, `__archspec`, etc.).
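For context, these virtual packages are injected into the solver's view of the system through conda's plugin mechanism. A minimal sketch, assuming a recent conda with the plugin system (the `example` name and hard-coded version are placeholders, not a real virtual package):

```python
# Minimal sketch of conda's virtual package plugin hook.
from conda import plugins
from conda.plugins.types import CondaVirtualPackage


@plugins.hookimpl
def conda_virtual_packages():
    # Virtual packages carry no files; they only inject a (name, version,
    # build) triple into the solver's view of the system. Conda prepends
    # the double underscore, so this shows up as `__example`.
    yield CondaVirtualPackage(name="example", version="1.0", build=None)
```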
However, there's currently no way to reflect the underlying GPU architecture requirements of a given package, which led to a very painful situation in conda-forge/cudnn-feedstock#124.
Conda-forge assumed that cudnn would remain binary-compatible across the 9.x line, and thus created run-exports with `{{ pin_subpackage("cudnn", max_pin="x") }}`; and indeed, the API/ABI didn't change.
However, cudnn 9.11+ dropped support for various architectures that had previously been deprecated, even for the CUDA 12.x line, which still supports these architectures. This started causing segfaults (clearly something we need to fix), but only for users on affected cards.
By far the best solution would have been to imbue those packages with metadata indicating that the required minimum architecture had changed, and to let the solver discard those newer packages on systems with unsupported architectures.
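To illustrate what that could look like: a hypothetical `__cuda_arch` virtual package (an assumption for illustration, not an existing conda feature) could expose the lowest compute capability among the visible GPUs, so that affected cudnn builds could declare something like `__cuda_arch >=7.5` and simply become uninstallable on older cards:

```python
# Hypothetical sketch: expose the device's compute capability as a
# virtual package. The `__cuda_arch` name and the NVML-based detection
# are assumptions, not an existing conda feature.
from conda import plugins
from conda.plugins.types import CondaVirtualPackage


def _detect_compute_capability():
    """Return the lowest compute capability among visible GPUs as 'major.minor'."""
    try:
        import pynvml  # provided by the nvidia-ml-py package

        pynvml.nvmlInit()
        caps = [
            pynvml.nvmlDeviceGetCudaComputeCapability(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            )
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
        pynvml.nvmlShutdown()
        return "%d.%d" % min(caps) if caps else None
    except Exception:
        # No driver / no GPU: don't expose the virtual package at all.
        return None


@plugins.hookimpl
def conda_virtual_packages():
    cc = _detect_compute_capability()
    if cc is not None:
        # Affected builds could then depend on e.g. `__cuda_arch >=7.5`,
        # and the solver would skip them on older cards.
        yield CondaVirtualPackage(name="cuda_arch", version=cc, build=None)
```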
Since this wasn't possible, the only option was to permanently mark all affected newer cudnn builds as broken (a very disruptive operation, given that many cudnn-dependent packages already have builds requiring a newer cudnn through its run-export) and to make it impossible for cudnn to be updated on the CUDA 12.x line. This leaves many improvements on the table for users who do have a new-enough architecture. (CUDA 13.x is different, because those older architectures were never supported there in the first place.)
Overall, this has been a very painful experience, not helped by the fact that NVIDIA simply refused to reinstate support for the dropped architectures (even as a one-off) in a newer cudnn release. Hopefully, the lessons learned by everyone involved will reduce the probability of a similar episode in the near future; however, it's clear that better support from conda would be the most elegant solution, as it would rule out such issues from ever recurring in the first place.
PS. The wheelnext initiative over in PyPI-land (which has a bunch of problems that conda already solved) is considering this aspect of the problem space as well (see here); perhaps there's some useful cross-pollination to be had.
PPS. Originally I thought we could just use a linear ordering of the various architectures (`sm_50`, `sm_60`, `sm_70`, etc.), but CUDA 12.8 introduced the notion of architecture families, which will need to be accounted for. Also, builds against `sm_53` should still be compatible with `sm_50`, AFAIU.
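In other words, installability would hinge on a compatibility predicate rather than a pure linear order. A rough sketch under simplified assumptions (plain `sm_XY` builds run on same-major devices with an equal or higher minor; `a` variants are exact-match only; family `f` variants are approximated as same-major here; real CUDA binary-compatibility rules have more special cases, including the Tegra parts and the family semantics mentioned above):

```python
# Rough sketch of an architecture compatibility predicate. The rules
# below are simplified assumptions, not a spec.

def parse_arch(arch: str) -> tuple[int, int, str]:
    """Split e.g. 'sm_90a' into (major, minor, suffix)."""
    digits = arch.removeprefix("sm_")
    suffix = digits[-1] if digits[-1].isalpha() else ""
    digits = digits.rstrip("af")
    return int(digits[:-1]), int(digits[-1]), suffix


def binary_compatible(built_for: str, device: str) -> bool:
    b_major, b_minor, b_suffix = parse_arch(built_for)
    d_major, d_minor, _ = parse_arch(device)
    if b_suffix == "a":  # architecture-specific builds: exact match only
        return (b_major, b_minor) == (d_major, d_minor)
    # plain and (approximately) family builds: same major version,
    # device minor at or above the build's minor
    return b_major == d_major and d_minor >= b_minor


assert binary_compatible("sm_50", "sm_53")      # lower-minor build, higher-minor device
assert not binary_compatible("sm_60", "sm_70")  # different major version
assert not binary_compatible("sm_90a", "sm_100")  # "a" builds don't carry forward
```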