119 changes: 119 additions & 0 deletions cep-repodata-next.md
@@ -0,0 +1,119 @@
# CEP XX: New repodata and matchspec features

| Title | A short title of the proposal |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Status | Draft |
| Author(s) | Wolf Vollprecht <[email protected]> |
| Created | 2025-02-05T12:10:51Z |
| Updated | 2025-02-05T12:10:58Z |
| Discussion | link to the PR where the CEP is being discussed, NA is circulated initially |
| Implementation | Flags: https://github.com/conda/rattler/pull/1040, Optional: https://github.com/conda/rattler/pull/1019, Conditional: https://github.com/prefix-dev/resolvo/pull/101 |


The conda ecosystem finds and resolves packages based on "repodata" - the main index metadata for all artifacts in the condaverse.

Repodata has been a stable format for a long time now. It generally consists of at least the following fields:

```yaml
name: name of the package
version: version of the package
build_string: build string of the package
build_number: build number of the package
depends: [MatchSpec] dependencies of the package, expressed as "triplet" matchspec
constrains: [MatchSpec] constraints that the package adds to the resolution aka optional dependencies
```

All other fields are mostly for metadata purposes and not listed.

With this CEP we would like to add 3 new fields to a proposed "repodata.v2" format.

The fields serve three different purposes:

- `extras`: optional dependency sets as known from the PyPI world. For example, `sqlalchemy` might be a small base package that defines a number of extras such as `mysql`, `postgres`, `sqlite` that would pull in dependency sets as needed.
- `conditional` dependencies, also widely known from the Python world. These are activated only when the condition is true. For example, certain dependencies such as `pywin32` are only relevant on Windows and not on macOS or Linux.
- `flags` are used to make it easier to select variants. Compiled packages can often be compiled with different options which results in different variants (for example, Debug vs. Release builds). With `flags` it will be trivial to select the preferred build with a syntax such as `foobar[flags=['release']]`. Flags are free-form and can be used by distributions such as conda-forge to differentiate between gpu and non-gpu builds as well.

## Extra dependency sets

We want to define a new `extras` key in `RepoData`. The key will be a dictionary mapping from String to list of MatchSpecs:

```yaml
name: sqlalchemy
version: 1.0.0
depends:
  - python >=3.8
extras:
  sqlite:
    - sqlite >=1.5
    - py-sqlite-adapter 1.0
  postgres:
    - postgres >=3.5
    - pyxpgres >=8
```

When a user, or a dependency, selects an extra through a MatchSpec, the extra and its dependencies are "activated". This is conceptually the same as having three packages with "exact" dependencies from the "extra" to the base package: `sqlalchemy`, `sqlalchemy-sqlite` and `sqlalchemy-postgres` – which is the workaround currently employed by a number of packages on conda-forge.
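
As a rough illustration of what "activating" an extra could mean, the following Python sketch merges the dependencies of the requested extras into the base dependency list. The `Record` class and the `activated_dependencies` helper are hypothetical names used only for this example, and raising an error for an unknown extra is an assumption: the text above does not define that case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a repodata record."""

    name: str
    version: str
    depends: list[str] = field(default_factory=list)
    extras: dict[str, list[str]] = field(default_factory=dict)


def activated_dependencies(record: Record, requested_extras: list[str]) -> list[str]:
    """Return the base dependencies plus those of every requested extra."""
    deps = list(record.depends)
    for extra in requested_extras:
        if extra not in record.extras:
            # Assumption: fail loudly. The proposal does not settle this case.
            raise ValueError(f"{record.name} has no extra named {extra!r}")
        deps.extend(record.extras[extra])
    return deps


sqlalchemy = Record(
    name="sqlalchemy",
    version="1.0.0",
    depends=["python >=3.8"],
    extras={
        "sqlite": ["sqlite >=1.5", "py-sqlite-adapter 1.0"],
        "postgres": ["postgres >=3.5", "pyxpgres >=8"],
    },
)

# Selecting the "sqlite" extra pulls in its two dependencies on top of python.
print(activated_dependencies(sqlalchemy, ["sqlite"]))
```

Without any extras the function returns only the base `depends` list, which mirrors installing the plain `sqlalchemy` package.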
Contributor

I think we should clarify what happens if an extra is requested for a package but the selected variant doesn't provide that extra. E.g. what happens if I depend on `foobar[extras=["doesntexist"]]`?

Contributor

Maybe I missed it, but it's also not defined how to depend on an extra?

Contributor Author

Yes, we discussed this today some more and there are a few options:

  • An extra implicitly always exists (as empty) and so we would always choose the latest "base" package (whether it has the extra or not). We would then warn if it does not have the extra. I believe this is the behavior that pip implements, and makes it easier to be forward compatible even if other dependencies might ask for an extra that does not exist anymore.
  • We could strictly require that the extra is there and otherwise fail (or select an older version of the package that has the extra).
  • We could forward propagate extras (e.g. the first time we see an extra, we implicitly add it to all later versions, even if empty), which would prevent the solver from choosing a version from before the extra existed, but it might still choose a later one from after the extra was removed.

I think we should decide on a behavior, most likely choosing the one from pip. We should also read up on the idea of adding a "default" extra (default dependencies that can be deselected with `no-default-extra` or something along those lines), which I recently saw being proposed somewhere in the PyPI world, as part of wheel-next I think.

Contributor Author

PEP for default extras: https://peps.python.org/pep-0771/

Member

Do we have a proposal for the matchspec syntax for these extras? Something like this?

conda install sqlalchemy[sqlite]

Contributor Author

What currently works is `sqlalchemy[extras=sqlite]`.

Please note that `[ ... ]` is a syntax that already exists in conda to specify key-value pairs, e.g. `foobar[sha256="1234..."]`.


## Conditional dependencies

Conditional dependencies are activated when the condition is true. The most straightforward conditions are `__unix`, `__win` and other platform specifiers. However, we would also like to support matchspecs in conditions such as `python >=3`.

I very much like this idea. An implementation note: while `if __unix` is reasonably easy to implement because it is "static", `if python <3.8` is conceptually much harder, as it is not something that can be decided ahead of solving. It requires being able to adapt the dependencies of a package as partial candidates are investigated during the solve.

Contributor

Yep, this is a form of boolean dependencies in rpm (already supported by libsolv I believe)

Contributor

This is the PR that implements this for resolvo: prefix-dev/resolvo#136


The proposed syntax is:

```yaml
name: sqlalchemy
version: 1.0.0
depends:
- python >=3.8
- pywin32; if __win
- six; if python <3.8
```

The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec.
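
A minimal Python sketch of how a parser might separate the condition from the MatchSpec under this `; if <CONDITION>` form. The function name `split_conditional` is hypothetical, and the sketch assumes that a plain MatchSpec never contains a `;` itself.

```python
from __future__ import annotations


def split_conditional(spec: str) -> tuple[str, str | None]:
    """Split 'SPEC; if CONDITION' into (SPEC, CONDITION), or (SPEC, None)."""
    base, separator, rest = spec.partition(";")
    if not separator:
        # No ';' present: an ordinary, unconditional MatchSpec.
        return spec.strip(), None
    rest = rest.strip()
    if not rest.startswith("if "):
        raise ValueError(f"expected 'if <CONDITION>' after ';' in {spec!r}")
    return base.strip(), rest[len("if "):].strip()


for dependency in ["python >=3.8", "pywin32; if __win", "six; if python <3.8"]:
    print(split_conditional(dependency))
```

The part before the `;` can then be handed to the existing MatchSpec parser unchanged, which is the main appeal of an appended suffix.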
Contributor

It would be nice if we didn't need the `;` because somehow conda will happily ignore the `if` parts while parsing MatchSpecs.

>>> from conda.models.match_spec import MatchSpec as M
>>> M("python 3.8 * if python")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 'if' if  __win")  # quote 'if' to force parse it as a build string
MatchSpec("python==3.8='if'")

Contributor

It will also ignore parenthesised blocks:

>>> M("python 3.8 * (__win)")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 (__win)")
MatchSpec("python==3.8")
>>> M("python 3.8 (__win and __osx)")
MatchSpec("python==3.8")
>>> M("python 3.8 (if __win and __osx)")
MatchSpec("python==3.8")

Contributor @jaimergp, Mar 30, 2025

libmamba also ignores parentheses:

>>> from libmambapy.specs import MatchSpec as LibmambaMatchSpec
>>> print(LibmambaMatchSpec.parse("python 3.8 * (__win and __osx)"))
python==3.8
>>> print(LibmambaMatchSpec.parse("python 3.8 * (if __win and __osx)"))
python==3.8

Contributor @jaimergp, Mar 30, 2025

My suggestion would be to design the syntax like this:

name [version [build]] ('if' condition)

The `if` literal could be omitted, or replaced with `with`, if folks feel it's clearer that way. See the discussion in #conda-maintainers > Conditional dependencies syntax in v2 environments & recipes.

Another idea (not sure if a good one) could be: rather than make MatchSpecs more complex than they already are, build on the idea of selectors from the new recipe format and allow depends to contain objects like so:

depends:
  - python >=3.8
  - if: python <3.8
    then: six

In the recipe format, this is expressed via a variation on the selector to avoid being processed at build time (`if(run):` for instance).

This basically disallows conditionals outside of recipes (or formats with selectors), but makes for a more consistent narrative around conditionals.

Contributor

The idea was indeed also posted on Zulip. One issue is that it then becomes harder for CLI tools to adopt.

Contributor

I propose we use the `when` syntax without a semicolon, to force an error on older versions of conda that don't support it and to distinguish it from the already established `if` syntax in recipe v1. We can use the same keyword in a more expanded form as the "build spec".

foobar when python >=3.8

E.g. in a recipe:

- if: unix
  then: foobar
  when: python >=3.8
  
# OR
  
- when: python >=3.8
  then: foobar
 
# OR 
  
- "foobar when python >=3.8"

Contributor

I like this `when` idea a lot, and it combines well with the `if`/`then` syntax, but I'd like to discuss the serialized MatchSpec. `foobar when python>=3.8` is unparsable because it could be understood as `version=when`, `build=python>=3.8`. Of course we could force parsing rules to "split on `when` first" and so on, but let me suggest something that wouldn't imply so many changes in the MatchSpec parsers: use a square bracket keyword.

foobar[when="python>=3.8"]
foobar[when="__win"]
foobar[when="python>=3.8 and __unix"]


We would like to also allow for AND and OR with the following syntax:
Contributor

You probably need NOT and parentheses for precedence overrides, right?

In regular MatchSpec, we have `,` and `|` used in versions for "and" and "or". I think we should keep things similar, even if it means supporting `and` and `or` in versions.

Contributor

I find this hard to distinguish when parsing the version. E.g.

python <3.8|>3.9 | numpy >=2.0
python <3.8|>3.9 or numpy >=2.0

I find the version with `or` easier to read than the one with pipes.

Contributor

Environment markers have been doing fine with `and` and `or`; I don't think we need to introduce more character-based operators.


```
...; if python <3.8 and numpy >=2.0
...; if python >=3.8 or pypy
```

Note: the proposed functionality is already achieved in less elegant ways in the conda-forge distribution, by creating multiple noarch packages with `__unix` or `__win` dependencies. Likewise, this behavior is conceptually similar to building multiple variants of a given package.
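
To make the `and` / `or` combination concrete, here is a small Python sketch that evaluates such a condition. The function `evaluate_condition` and the `holds` callback are hypothetical. The sketch assumes that `and` binds tighter than `or` (as in Python) and that there are no parentheses or negation; the text above does not specify precedence. The callback hides the hard part: for a condition like `python <3.8` a real solver can only answer during the solve, not ahead of it.

```python
from typing import Callable


def evaluate_condition(condition: str, holds: Callable[[str], bool]) -> bool:
    """Evaluate 'A and B or C', where each atom is a MatchSpec-like string.

    Assumption: `and` binds tighter than `or`; no parentheses, no `not`.
    """
    return any(
        all(holds(atom.strip()) for atom in conjunction.split(" and "))
        for conjunction in condition.split(" or ")
    )


# Hypothetical environment: the set of atoms that are currently satisfied.
satisfied = {"__unix", "python >=3.8"}


def holds(atom: str) -> bool:
    return atom in satisfied


print(evaluate_condition("python <3.8 and numpy >=2.0", holds))
print(evaluate_condition("python >=3.8 or pypy", holds))
```

In this environment the first condition is false because `python <3.8` is not satisfied, and the second is true because one side of the `or` holds.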

## Flags for the repodata

It's very natural to build different variants for a given package in the conda ecosystem with different properties: blas implementation, gpu / cuda version, and other variables make up the variant matrix for certain recipes.
Contributor Author

Mental note that flags should be added to the variant hash to ensure that packages have unique names.


However, it is not easy to specify which variant a user really wants in conda today. Most of the time, some string-matching on the build string is used to select one of the options, such as `pytorch 2.5.* *cuda`.

Another workaround is to use `mutex` packages and constrain them, for example `blas_impl * mkl`, which selects only packages that also depend on the MKL build.

However, it would be nice if we could have a flexible, powerful and simple syntax to enable or disable "flags" on packages in order to select a variant.

A RepodataRecord should get a new field "flags" that is a list of strings, such as:

How do flags and extra mix and overlap? Wouldn't conditional dependencies on a flag be enough to generate the extra category?

Contributor

Flags are part of a variant, so there is no variation of flags within a single variant. E.g. flags could be used to say that a particular variant uses CUDA while another targets the CPU. Extras are a way to select additional dependencies for a particular variant. If a variant also adds a CLI tool, it provides the extra "cli". Those particular dependencies are only requested if that extra is requested by another package.

In technical terms, extras can indeed be implemented as conditional dependencies. E.g. for a package `my_package` we could express it as `typer when my_package[cli]`. If there is a package that depends on `my_package[cli]`, `typer` would also be required.


```yaml
name: pytorch
version: "2.5.0"
# note these flags are free-form, and distributions are free to come up
# with their own set of flags
flags: ["gpu:cuda", "blas:mkl", "archspec:4", "release"]
```

Flags can then be matched using the following little syntax:

- `release`: only packages with `release` flag are used
- `~release`: disallow packages with `release` flag
- `?release`: if release flag available, filter on it, otherwise use any other
- `gpu:*`: any flag starting with `gpu:` will be matched

can you add an example for the string matching for say blas:mkl

Contributor Author

Would an exact match not work fine?

It would, we should just state how to do an exact match

Contributor Author

You could just ask for `flags = ["blas:mkl"]` for an exact match :)

- `archspec:>2`: any flag starting with `archspec:` will be matched with everything trailing interpreted as a number and matched against the comparison operator
- `?archspec:>2`: if a flag starting with `archspec:` is found, match against this, otherwise ignore
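
The matching rules listed above can be sketched in Python as follows. This is one possible reading, not a specification: `flag_matches` and `select` are hypothetical names, the build identifiers are made up, and the treatment of `?` ("keep only the matching builds if there are any, otherwise keep all") is an assumption about what the optional prefix means. The set of comparison operators beyond `>` is also assumed.

```python
from __future__ import annotations

import operator
import re

_COMPARATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    ">": operator.gt, "<": operator.lt,
}
# Matches patterns such as "archspec:>2": key, comparison operator, number.
_COMPARISON = re.compile(r"^(?P<key>[^:]+):(?P<op>>=|<=|==|>|<)(?P<num>\d+(?:\.\d+)?)$")


def flag_matches(pattern: str, flags: list[str]) -> bool:
    """Check one plain pattern (no `~` or `?` prefix) against a build's flags."""
    comparison = _COMPARISON.match(pattern)
    if comparison:
        prefix = comparison["key"] + ":"
        compare = _COMPARATORS[comparison["op"]]
        wanted = float(comparison["num"])
        for flag in flags:
            if not flag.startswith(prefix):
                continue
            try:
                value = float(flag[len(prefix):])
            except ValueError:
                continue  # the trailing part is not a number
            if compare(value, wanted):
                return True
        return False
    if pattern.endswith(":*"):
        # "gpu:*" matches any flag starting with "gpu:".
        return any(flag.startswith(pattern[:-1]) for flag in flags)
    return pattern in flags  # exact match, e.g. "release" or "blas:mkl"


def select(builds: dict[str, list[str]], patterns: list[str]) -> dict[str, list[str]]:
    """Filter builds (name -> flags) by a list of flag patterns."""
    selected = dict(builds)
    for pattern in patterns:
        if pattern.startswith("~"):
            selected = {b: f for b, f in selected.items() if not flag_matches(pattern[1:], f)}
        elif pattern.startswith("?"):
            # Assumption: prefer matching builds, fall back to all of them.
            preferred = {b: f for b, f in selected.items() if flag_matches(pattern[1:], f)}
            selected = preferred or selected
        else:
            selected = {b: f for b, f in selected.items() if flag_matches(pattern, f)}
    return selected


# Hypothetical builds of one package version.
builds = {
    "cuda_release": ["gpu:cuda", "blas:mkl", "archspec:4", "release"],
    "cuda_debug": ["gpu:cuda", "blas:mkl", "archspec:4"],
    "cpu_release": ["blas:openblas", "archspec:2", "release"],
}

print(list(select(builds, ["gpu:*", "?release"])))
print(list(select(builds, ["blas:mkl"])))
```

An exact flag such as `blas:mkl` needs no special syntax here: it falls through to the plain membership test.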

In practice, this would look like the following from a user perspective:

```shell
conda install 'pytorch[version=">=3.1", flags=["gpu:*", "?release"]]'
```

## Backwards Compatibility

The new `repodata.v2` will be served alongside the current format under `repodata.v2.json`. Older conda clients will continue using the v1 format. Packages using any of the new features will not be added to v1 of `repodata.json`.
Contributor

Too bad about having six copies of repodata (.json, .json.zst, sharded, v2.*)
