Skip to content

Conversation

@wolfv
Copy link
Contributor

@wolfv wolfv commented Feb 5, 2025

This defines new repodata with two additional fields and a modfied matchspec syntax:

  • extra (optional) dependency sets that can be activated by the user or other packages
  • conditional dependencies to be activated on a specific platform, or when other package versions are also pulled in (e.g. require six; if python <3.8)
  • flags to make it easier to select variants confidently (instead of globbing the build string)

Some initial discussion happened over the past days on:

TODO:

  • specify that we need to understand full-blown matchspecs in the repodata.v2 (e.g. with bracket syntax), because we want other packages to be able to request extras or flags).
  • Decide whether flag negation should be ~foo, !foo, or -foo.
  • Decide if we need to add a default flags entry to the repodata record, or another way to mark the default variant (can also still use down-priorization via track records)
  • Investigate ordering of flags and whether we should have some weighing e.g. for numerical flags (higher == better)? Or wether this should be decided by another order integer that the users can compute.

@wolfv
Copy link
Contributor Author

wolfv commented Feb 5, 2025

Here are comments from the HackMD:

Should we consider a system like Spack has (https://spack.readthedocs.io/en/latest/basic_usage.html#specs-dependencies) or maybe Conan (https://docs.conan.io/2/reference/conanfile/attributes.html#options)?
Additionally, how would a package declare which flag is the default? (Edited)

For default flags I think we might want to deprecate the track_features field and instead add a priority: integer field that can hold any value. The user could mark the default this way. We could also use the inverse of priority, so that higher values are less desirable.

Or, maybe a better thought: we could also make it so that if priority: -1 we never enable this package, except if we request its flags directly.

Regarding Spack and Conan: I would be super hyped if you can distill this on how it could work for the conda ecosystem. I haven't had the time yet to read into it deeply. I am not a fan of super arbitrary code execution though.

- `release`: only packages with `release` flag are used
- `~release`: disallow packages with `release` flag
- `?release`: if release flag available, filter on it, otherwise use any other
- `gpu:*`: any flag starting with `gpu:` will be matched

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an example for the string matching for say blas:mkl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an exact match not work fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an exact match not work fine?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would, we should just state how to do an exact match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You coudl just ask for flags = ["blas:mkl"] for an exact match :)


The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec.

We would like to also allow for AND and OR with the following syntax:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need NOT and parentheses for precedence overrides, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regular MatchSepc, we have , and | used for versions for and and or. I think we should keep thing similar, even if it means supporting and and or in versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If find this hard to distinguish when parsing the version. E.g.

python <3.8|>3.9 | numpy >=2.0
python <3.8|>3.9 or numpy >=2.0

I find the version with or easier to read than the one with pipes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment markers have been doing fine with and and or, I don't think we need to introduce more character-based operators.

- six; if python <3.8
```

The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we didn't need the ; because somehow conda will happily ignore the if parts while parsing MatchSpecs.

>>> from conda.models.match_spec import MatchSpec as M
>>> M("python 3.8 * if python")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 'if' if  __win")  # quote 'if' to force parse it as a build string
MatchSpec("python==3.8='if'")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also ignore parenthesised blocks:

>>> M("python 3.8 * (__win)")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 (__win)")
MatchSpec("python==3.8")
>>> M("python 3.8 (__win and __osx)")
MatchSpec("python==3.8")
>>> M("python 3.8 (if __win and __osx)")
MatchSpec("python==3.8")

Copy link
Contributor

@jaimergp jaimergp Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libmamba also ignores parentheses:

>>> from libmambapy.specs import MatchSpec as LibmambaMatchSpec
>>>print(LibmambaMatchSpec.parse("python 3.8 * (__win and __osx)"))
python==3.8
>>> print(LibmambaMatchSpec.parse("python 3.8 * (if __win and __osx)"))
python==3.8

Copy link
Contributor

@jaimergp jaimergp Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to design the syntax like this:

name [version [build]] ('if' condition)

The if literal could be omitted, or replaced with with, if folks feel it's clearer that way. See discussion in #conda-maintainers > Conditional dependencies syntax in v2 environments & recipes @ 💬.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea (not sure if a good one) could be: rather than make MatchSpecs more complex than they already are, build on the idea of selectors from the new recipe format and allow depends to contain objects like so:

depends:
 - python >=3.8
 - if: python <3.8
    then: six

In the recipe format, this is expressed via a variation on the selector to avoid being process at build time (if(run): for instance).

This disallow basically disallow conditionals outside of recipes (or format with selector), but makes a more consistent narrative around conditionals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was indeed also posted on zulip. One issue is that it then becomes harder for cli tools to adopt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we use the when syntax without a semicolon to force an error on older versions of conda that dont support it and the distinguish between the already established if syntax in recipe v1. We can use the same keyword in a more expanded form as the "build spec"

foobar when python >=3.8

E.g. in a recipe:

- if: unix
  then: foobar
  when: python >=3.8
  
# OR
  
- when: python >=3.8
  then: foobar
 
# OR 
  
- "foobar when python >=3.8"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this when idea a lot, and combines well with the if/then syntax, but I'd like to discuss the serialized MatchSpec. foobar when python>=3.8 is unparsable because it could be understood as version=when, build=python>=3.8. Of course we could force parsing rules to "split on when first" and so on, but let me suggest something that wouldn't imply so many changes in the MatchSpec parsers: use a square bracket keyword.

foobar[when="python>=3.8"]
foobar[when="__win"]
foobar[when="python>=3.8 and __unix"]


## Conditional dependencies

Conditional dependencies are activated when the condition is true. The most straight-forward conditions are `__unix`, `__win` and other platform specifiers. However, we would also like to support matchspecs in conditions such as `python >=3`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like this idea. An implementation note: while if __unix is reasonably easy to implement because it is "static", if python <3.8 is conceptually much harder as it is not something that can be decided ahead of solving. I requires to be able to adapt the dependencies of a package as partial candidates are investigated during solve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is a form of boolean dependencies in rpm (already supported by libsolv I believe)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the PR that implements this for resolvo: prefix-dev/resolvo#136


However, it would be nice if we could have a flexible, powerful and simple syntax to enable or disable "flags" on packages in order to select a variant.

A RepodataRecord should get a new field "flags" that is a list of strings, such as:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do flags and extra mix and overlap? Wouldn't conditional dependencies on a flag be enough to generate the extra category?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flags are part of a variant. So there is no variation of flags for a single variant. E.g. flags could be used to say a particular variant is using cuda and another is used to target cpu. Extras are a way to select additional dependencies for a particular variant. If a variant also adds a CLI tool it provides the extra "cli". Only if that extra is requested by another package are particular dependencies also requested.

In technical terms extras can indeed be implemented as conditional dependencies. E.g. for a package my_package we could express it as typer when my_package[cli]. If there is a package that depends on my_package[cli] typer would also be required.


The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec.

We would like to also allow for AND and OR with the following syntax:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If find this hard to distinguish when parsing the version. E.g.

python <3.8|>3.9 | numpy >=2.0
python <3.8|>3.9 or numpy >=2.0

I find the version with or easier to read than the one with pipes.

- six; if python <3.8
```

The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we use the when syntax without a semicolon to force an error on older versions of conda that dont support it and the distinguish between the already established if syntax in recipe v1. We can use the same keyword in a more expanded form as the "build spec"

foobar when python >=3.8

E.g. in a recipe:

- if: unix
  then: foobar
  when: python >=3.8
  
# OR
  
- when: python >=3.8
  then: foobar
 
# OR 
  
- "foobar when python >=3.8"


However, it would be nice if we could have a flexible, powerful and simple syntax to enable or disable "flags" on packages in order to select a variant.

A RepodataRecord should get a new field "flags" that is a list of strings, such as:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flags are part of a variant. So there is no variation of flags for a single variant. E.g. flags could be used to say a particular variant is using cuda and another is used to target cpu. Extras are a way to select additional dependencies for a particular variant. If a variant also adds a CLI tool it provides the extra "cli". Only if that extra is requested by another package are particular dependencies also requested.

In technical terms extras can indeed be implemented as conditional dependencies. E.g. for a package my_package we could express it as typer when my_package[cli]. If there is a package that depends on my_package[cli] typer would also be required.


## Conditional dependencies

Conditional dependencies are activated when the condition is true. The most straight-forward conditions are `__unix`, `__win` and other platform specifiers. However, we would also like to support matchspecs in conditions such as `python >=3`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the PR that implements this for resolvo: prefix-dev/resolvo#136

- pyxpgres >=8
```

When a user, or a dependency, selects an extra through a MatchSpec, the extra and it's dependencies are "activated". This is conceptually the same as having three packages with "exact" dependencies from the "extra" to the base package: `sqlalchemy`, `sqlalchemy-sqlite` and `sqlalchemy-postgres` – which is the workaround currently employed by a number of packages on conda-forge.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should clarify what happens if an extra is requested for a package but the selected variant doesnt provide that extra. E.g. what happens if I depend on a foobar[extras=["doesntexist"]]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed it but its also not defined how to depend on an extra?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we discussed this today some more and there are a few options:

  • An extra implicitly always exists (as empty) and so we would always choose the latest "base" package (whether it has the extra or not). We would then warn if it does not have the extra. I believe this is the behavior that pip implements, and makes it easier to be forward compatible even if other dependencies might ask for an extra that does not exist anymore.
  • We could strictly require that the extra is there and otherwise fail (or select an older version of the package that has the extra).
  • We could forward propagate extras (e.g. the first time we see an extra, we implicitly add it to all later versions, even if empty) which would prevent the solver from choosing a version before the extra existed, but might choose a later one after the extra was removed).

I think we should decide on a behavior, most likely choosing the one from pip. We should also read up on the idea of adding a "default" extra (default dependencies that can be deselected with no-default-extra or something along the lines which I recently saw being proposed somewhere in PyPI world as part of wheel-next I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP for default extras: https://peps.python.org/pep-0771/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a proposal for the matchspec syntax for these extras? Something like this?

conda install sqlalchemy[sqlite]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What currently works is sqlalchemy[extras=sqlite].

Please note that [ ... ] is a syntax already in conda to specify key-value pairs, e.g. foobar[sha256="1234..."].


## Flags for the repodata

It's very natural to build different variants for a given package in the conda ecosystem with different properties: blas implementation, gpu / cuda version, and other variables make up the variant matrix for certain recipes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mental note that flags should be added to the variant hash to ensure that packages have unique names.

@wolfv
Copy link
Contributor Author

wolfv commented Jul 18, 2025

This is what a recipe looks like - we have a branch where the tests are passing.

package:
  name: conditional
  version: 0.1.0

build:
  flags:
    - cuda
    - openmp:mpi

requirements:
  run:
    - rattler-build
    - bzip2 *; if rattler-build >=0.44
    - zstd *; if rattler-build <0.44

  extras:
    science:
      - numpy
      - matplotlib
    web:
      - flask
    cli:
      - click

tests:
  - script:
    - echo "Hello, world!"
    requirements:
      run:
        - rattler-build >=0.44

  - script:
    - echo "This is a test for Rattler < 0.44"
    requirements:
      run:
        - rattler-build <0.44

  - script:
    - echo "This is a test for Rattler < 0.44"
    requirements:
      run:
        - conditional [extras=[science, web]]

@wolfv
Copy link
Contributor Author

wolfv commented Jul 22, 2025

Another comment to remind ourselves, to simplify the initial implementation:

  • we will not allow conditions in run_constraints (for now)
  • We will not allow extra groups to contain anything other than run dependencies (no run constraints or run exports)

@baszalmstra
Copy link
Contributor

We just merged

which adds support for all features mentioned in this CEP. We are working on an preliminary version of this CEP in rattler based on this.

@jezdez
Copy link
Member

jezdez commented Jul 30, 2025

Let's split this CEP up between the various proposals here, each have sufficiently complex to implement constraints.

@baszalmstra
Copy link
Contributor

Let's split this CEP up between the various proposals here, each have sufficiently complex to implement constraints.

The reason we combined these is they are require changes to the repodata format. We aimed to bundle these together in a single change.

But I do understand the desire to split the different features. Do you have a suggestion how we can avoid having to bump the repodata version in each CEP?

@jaimergp
Copy link
Contributor

I think nothing prevents us from saying something like "This CEP is co-submitted with X, Y, Z. All of them propose to bump to repodata version v2, the implementation of which MUST happen simultaneously when all CEPs have been reviewed and accepted or rejected".

@wolfv
Copy link
Contributor Author

wolfv commented Jul 31, 2025

I think I would be pro-splitting if we would see lots of engagement on the topics presented here. But so far it is not an unwieldy discussion so I don't exactly see the point of splitting :)

@jezdez
Copy link
Member

jezdez commented Aug 19, 2025

The point is that repodata and matchspec are two different technical specifications and shouldn't be bunched into one CEP since that makes referring and evolving them in the future much harder. Imagine reading a future CEP saying "This refers to the matchspec portion of CEP 45, but not the repodata part."

@h-vetinari
Copy link

"This CEP is co-submitted with X, Y, Z. All of them propose to bump to repodata version v2, the implementation of which MUST happen simultaneously when all CEPs have been reviewed and accepted or rejected".

One of that CEPs could be about this proposed design for a better run-export infrastructure, which will likely (probably? certainly?) require repodata changes. So v2 would be the moment to pull that off, because I'm not assuming we'll be doing this very often.

@wolfv
Copy link
Contributor Author

wolfv commented Sep 4, 2025

When implementing support for conditional dependencies, we noted that using if foobar * and or bizbar might be confusing if and is used as a build string.

I think it might be sensible to change and and or to use && and || instead (C/C++ syntax) which are characters that are not allowed anywhere in a matchspec.

Thoughts? Looks less Pythonic for sure.

We could also allow and and or as "inputs" and change to && and || in the indexer.

Or we disallow build strings like exact match and/or and require the user to type foobar[build_string="and"] instead.

@h-vetinari
Copy link

h-vetinari commented Sep 5, 2025

if foobar * and or bizbar

I hate space-separated matchspecs with a burning passion, and this would make it 10 times worse. Let's aim to give people a grammar that they can resolve in their heads, without decrypting an ambiguous order of precedence, or having to look up the definition of the spec. IMO && and || are ubiquitous enough that they would be very good choices for this.

In the same vein, if we're doing MatchSpec v2, I would so like to get rid of semantically-relevant spaces everywhere, including between versions and build strings. We could just make foobar[build_string="..."] the default (or even required) representation.

@danyeaw
Copy link
Member

danyeaw commented Oct 28, 2025

I noticed that in CEP15 we already using the repodata v2 nominclature, and for implementations like for mamba, libsolv is already checking for repodata_version == 2. Should we refer to this change as v3 instead? Do we want to use repodata_version for this change as well or create a new repodata.v2.json like the current draft says?

@jaimergp
Copy link
Contributor

if foobar * and or bizbar

I hate space-separated matchspecs with a burning passion, and this would make it 10 times worse. Let's aim to give people a grammar that they can resolve in their heads, without decrypting an ambiguous order of precedence, or having to look up the definition of the spec. IMO && and || are ubiquitous enough that they would be very good choices for this.

In the same vein, if we're doing MatchSpec v2, I would so like to get rid of semantically-relevant spaces everywhere, including between versions and build strings. We could just make foobar[build_string="..."] the default (or even required) representation.

Agreed, let's get rid of the ambiguity of space-separated fields. There's no need to keep it, and we have alternatives. We could require that conditions are only be expressed with bracket syntax.

@jaimergp
Copy link
Contributor

jaimergp commented Nov 5, 2025

As mentioned in #55, the feedback there is very valuable and should be referenced here when applicable (e.g. in the Rationale section that discusses why the proposed syntax was chosen).


## Backwards Compatibility

The new `repodata.v2` will be served alongside the current format under `repodata.v2.json`. Older conda clients will continue using the v1 format. Packages using any of the new features will not be added to v1 of `repodata.json`. No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad about having six copies of repodata (.json, .json.zst, sharded, v2.*)

Copy link
Contributor

@jaimergp jaimergp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new features proposed in this CEP are really exciting and I want to see this happen, so count with my support! Before I delve into the technical details, I'd like to mention a few editorial changes that would be required (and I'm happy to assist in fixing those). Right now, it feels that this is more of a "pre-CEP" writeup. Namely:

  • The template is not being followed. None of the proposed sections are used. It's particular important to have: Abstract, Rationale, Specification, Rejected Ideas, and References. The Copyright notice is mandatory.
  • There are a number of related PRs on-flight that can help with the definitions and context. For example, the repodata schema can be simply referred to as "the repodata schema as defined in CEP PR #135", and then marked as part of the "Requires" frontmatter metadata.
  • There are some missing or outdated references to implementation PRs (e.g. conda/rattler#1040 was closed in favor of conda/rattler#1550; conda/rattler#1816 is missing).
  • As mentioned elsewhere, this PR should be better split in three parts; one per new field. All of them can "require" each other so they are all part of the v3 repodata update.
  • I recommend resorting to MUST, SHOULD and MAY in the Specification section to clarify what's allowed and what's not.

Now onto the actual contents:

  • The repodata version 2 is taken. This needs to be version 3.
  • The conditionals syntax needs to be better specified. Right now it's unclear what happens with space-separated matchspecs, especially if they happen to have a build string and. I recommend constraining this so we don't end up with nested conditionals and other complications (unless intended!).
  • The strings allowed in "flags" are not well described. Maximum length, allowed characters and other specificities would be welcome.
  • The Backwards Compatibility section should cover what happens with the v1 repodata when a package carries dependencies with the new features. Would the two repodatas get out of sync? Are we expecting some packages to be only available in the new repodata? Are both repodata versions supposed to carry the same content? What's the expected behavior here?
  • Can packages depend on other packages with their extras? If so, how is this specified? Recipe formats have only allowed name [version [build]] syntax so far. Is this the moment where we require repodata v3 to only use the canonical form pre-standardized in #82?

There's maybe some more nuance to this, but this is what I got after reading it again today! Thanks for listening 🤓 And I reiterate: I'm happy to help write this up in the format that we need!

@baszalmstra
Copy link
Contributor

baszalmstra commented Nov 16, 2025

How do we deal with quotes in quotes if we want to express: foo[if="bar[build=\"*gpu\"]"]. Escape them like the above? Is that already something that is defined in other CEPs?

@jaimergp
Copy link
Contributor

How do we deal with quotes in quotes if we want to express: foo[if="bar[build=\"*gpu\"]"]. Escape them like the above? Is that already something that is defined in other CEPs?

I think we support both single and double quotes, so they can be mixed, but also escaped. That should be part of #82.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants