-
Notifications
You must be signed in to change notification settings - Fork 30
CEP for the next evolution of Repodata (v2) and MatchSpec #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Here are comments from the HackMD:
For default flags I think we might want to deprecate the Or, maybe a better thought: we could also make it so that if Regarding Spack and Conan: I would be super hyped if you can distill this on how it could work for the conda ecosystem. I haven't had the time yet to read into it deeply. I am not a fan of super arbitrary code execution though. |
| - `release`: only packages with `release` flag are used | ||
| - `~release`: disallow packages with `release` flag | ||
| - `?release`: if release flag available, filter on it, otherwise use any other | ||
| - `gpu:*`: any flag starting with `gpu:` will be matched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an example for the string matching for say blas:mkl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would an exact match not work fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would an exact match not work fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would, we should just state how to do an exact match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You coudl just ask for flags = ["blas:mkl"] for an exact match :)
|
|
||
| The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec. | ||
|
|
||
| We would like to also allow for AND and OR with the following syntax: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably need NOT and parentheses for precedence overrides, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In regular MatchSepc, we have , and | used for versions for and and or. I think we should keep thing similar, even if it means supporting and and or in versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If find this hard to distinguish when parsing the version. E.g.
python <3.8|>3.9 | numpy >=2.0
python <3.8|>3.9 or numpy >=2.0
I find the version with or easier to read than the one with pipes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment markers have been doing fine with and and or, I don't think we need to introduce more character-based operators.
| - six; if python <3.8 | ||
| ``` | ||
|
|
||
| The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we didn't need the ; because somehow conda will happily ignore the if parts while parsing MatchSpecs.
>>> from conda.models.match_spec import MatchSpec as M
>>> M("python 3.8 * if python")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 'if' if __win") # quote 'if' to force parse it as a build string
MatchSpec("python==3.8='if'")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will also ignore parenthesised blocks:
>>> M("python 3.8 * (__win)")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 (__win)")
MatchSpec("python==3.8")
>>> M("python 3.8 (__win and __osx)")
MatchSpec("python==3.8")
>>> M("python 3.8 (if __win and __osx)")
MatchSpec("python==3.8")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libmamba also ignores parentheses:
>>> from libmambapy.specs import MatchSpec as LibmambaMatchSpec
>>>print(LibmambaMatchSpec.parse("python 3.8 * (__win and __osx)"))
python==3.8
>>> print(LibmambaMatchSpec.parse("python 3.8 * (if __win and __osx)"))
python==3.8There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be to design the syntax like this:
name [version [build]] ('if' condition)The if literal could be omitted, or replaced with with, if folks feel it's clearer that way. See discussion in #conda-maintainers > Conditional dependencies syntax in v2 environments & recipes @ 💬.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea (not sure if a good one) could be: rather than make MatchSpecs more complex than they already are, build on the idea of selectors from the new recipe format and allow depends to contain objects like so:
depends:
- python >=3.8
- if: python <3.8
then: sixIn the recipe format, this is expressed via a variation on the selector to avoid being process at build time (if(run): for instance).
This disallow basically disallow conditionals outside of recipes (or format with selector), but makes a more consistent narrative around conditionals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was indeed also posted on zulip. One issue is that it then becomes harder for cli tools to adopt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we use the when syntax without a semicolon to force an error on older versions of conda that dont support it and the distinguish between the already established if syntax in recipe v1. We can use the same keyword in a more expanded form as the "build spec"
foobar when python >=3.8
E.g. in a recipe:
- if: unix
then: foobar
when: python >=3.8
# OR
- when: python >=3.8
then: foobar
# OR
- "foobar when python >=3.8"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this when idea a lot, and combines well with the if/then syntax, but I'd like to discuss the serialized MatchSpec. foobar when python>=3.8 is unparsable because it could be understood as version=when, build=python>=3.8. Of course we could force parsing rules to "split on when first" and so on, but let me suggest something that wouldn't imply so many changes in the MatchSpec parsers: use a square bracket keyword.
foobar[when="python>=3.8"]
foobar[when="__win"]
foobar[when="python>=3.8 and __unix"]|
|
||
| ## Conditional dependencies | ||
|
|
||
| Conditional dependencies are activated when the condition is true. The most straight-forward conditions are `__unix`, `__win` and other platform specifiers. However, we would also like to support matchspecs in conditions such as `python >=3`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much like this idea. An implementation note: while if __unix is reasonably easy to implement because it is "static", if python <3.8 is conceptually much harder as it is not something that can be decided ahead of solving. I requires to be able to adapt the dependencies of a package as partial candidates are investigated during solve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is a form of boolean dependencies in rpm (already supported by libsolv I believe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the PR that implements this for resolvo: prefix-dev/resolvo#136
|
|
||
| However, it would be nice if we could have a flexible, powerful and simple syntax to enable or disable "flags" on packages in order to select a variant. | ||
|
|
||
| A RepodataRecord should get a new field "flags" that is a list of strings, such as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do flags and extra mix and overlap? Wouldn't conditional dependencies on a flag be enough to generate the extra category?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flags are part of a variant. So there is no variation of flags for a single variant. E.g. flags could be used to say a particular variant is using cuda and another is used to target cpu. Extras are a way to select additional dependencies for a particular variant. If a variant also adds a CLI tool it provides the extra "cli". Only if that extra is requested by another package are particular dependencies also requested.
In technical terms extras can indeed be implemented as conditional dependencies. E.g. for a package my_package we could express it as typer when my_package[cli]. If there is a package that depends on my_package[cli] typer would also be required.
|
|
||
| The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec. | ||
|
|
||
| We would like to also allow for AND and OR with the following syntax: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If find this hard to distinguish when parsing the version. E.g.
python <3.8|>3.9 | numpy >=2.0
python <3.8|>3.9 or numpy >=2.0
I find the version with or easier to read than the one with pipes.
| - six; if python <3.8 | ||
| ``` | ||
|
|
||
| The proposed syntax is to extend the `MatchSpec` syntax by appending `; if <CONDITION>` after the current MatchSpec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we use the when syntax without a semicolon to force an error on older versions of conda that dont support it and the distinguish between the already established if syntax in recipe v1. We can use the same keyword in a more expanded form as the "build spec"
foobar when python >=3.8
E.g. in a recipe:
- if: unix
then: foobar
when: python >=3.8
# OR
- when: python >=3.8
then: foobar
# OR
- "foobar when python >=3.8"|
|
||
| However, it would be nice if we could have a flexible, powerful and simple syntax to enable or disable "flags" on packages in order to select a variant. | ||
|
|
||
| A RepodataRecord should get a new field "flags" that is a list of strings, such as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flags are part of a variant. So there is no variation of flags for a single variant. E.g. flags could be used to say a particular variant is using cuda and another is used to target cpu. Extras are a way to select additional dependencies for a particular variant. If a variant also adds a CLI tool it provides the extra "cli". Only if that extra is requested by another package are particular dependencies also requested.
In technical terms extras can indeed be implemented as conditional dependencies. E.g. for a package my_package we could express it as typer when my_package[cli]. If there is a package that depends on my_package[cli] typer would also be required.
|
|
||
| ## Conditional dependencies | ||
|
|
||
| Conditional dependencies are activated when the condition is true. The most straight-forward conditions are `__unix`, `__win` and other platform specifiers. However, we would also like to support matchspecs in conditions such as `python >=3`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the PR that implements this for resolvo: prefix-dev/resolvo#136
| - pyxpgres >=8 | ||
| ``` | ||
|
|
||
| When a user, or a dependency, selects an extra through a MatchSpec, the extra and it's dependencies are "activated". This is conceptually the same as having three packages with "exact" dependencies from the "extra" to the base package: `sqlalchemy`, `sqlalchemy-sqlite` and `sqlalchemy-postgres` – which is the workaround currently employed by a number of packages on conda-forge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should clarify what happens if an extra is requested for a package but the selected variant doesnt provide that extra. E.g. what happens if I depend on a foobar[extras=["doesntexist"]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed it but its also not defined how to depend on an extra?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we discussed this today some more and there are a few options:
- An extra implicitly always exists (as empty) and so we would always choose the latest "base" package (whether it has the extra or not). We would then warn if it does not have the
extra. I believe this is the behavior thatpipimplements, and makes it easier to be forward compatible even if other dependencies might ask for an extra that does not exist anymore. - We could strictly require that the extra is there and otherwise fail (or select an older version of the package that has the extra).
- We could forward propagate extras (e.g. the first time we see an extra, we implicitly add it to all later versions, even if empty) which would prevent the solver from choosing a version before the extra existed, but might choose a later one after the extra was removed).
I think we should decide on a behavior, most likely choosing the one from pip. We should also read up on the idea of adding a "default" extra (default dependencies that can be deselected with no-default-extra or something along the lines which I recently saw being proposed somewhere in PyPI world as part of wheel-next I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP for default extras: https://peps.python.org/pep-0771/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a proposal for the matchspec syntax for these extras? Something like this?
conda install sqlalchemy[sqlite]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What currently works is sqlalchemy[extras=sqlite].
Please note that [ ... ] is a syntax already in conda to specify key-value pairs, e.g. foobar[sha256="1234..."].
|
|
||
| ## Flags for the repodata | ||
|
|
||
| It's very natural to build different variants for a given package in the conda ecosystem with different properties: blas implementation, gpu / cuda version, and other variables make up the variant matrix for certain recipes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mental note that flags should be added to the variant hash to ensure that packages have unique names.
|
This is what a recipe looks like - we have a branch where the tests are passing. |
|
Another comment to remind ourselves, to simplify the initial implementation:
|
|
We just merged which adds support for all features mentioned in this CEP. We are working on an preliminary version of this CEP in rattler based on this. |
|
Let's split this CEP up between the various proposals here, each have sufficiently complex to implement constraints. |
The reason we combined these is they are require changes to the repodata format. We aimed to bundle these together in a single change. But I do understand the desire to split the different features. Do you have a suggestion how we can avoid having to bump the repodata version in each CEP? |
|
I think nothing prevents us from saying something like "This CEP is co-submitted with X, Y, Z. All of them propose to bump to repodata version v2, the implementation of which MUST happen simultaneously when all CEPs have been reviewed and accepted or rejected". |
|
I think I would be pro-splitting if we would see lots of engagement on the topics presented here. But so far it is not an unwieldy discussion so I don't exactly see the point of splitting :) |
|
The point is that repodata and matchspec are two different technical specifications and shouldn't be bunched into one CEP since that makes referring and evolving them in the future much harder. Imagine reading a future CEP saying "This refers to the matchspec portion of CEP 45, but not the repodata part." |
One of that CEPs could be about this proposed design for a better run-export infrastructure, which will likely (probably? certainly?) require repodata changes. So v2 would be the moment to pull that off, because I'm not assuming we'll be doing this very often. |
|
When implementing support for conditional dependencies, we noted that using I think it might be sensible to change Thoughts? Looks less Pythonic for sure. We could also allow Or we disallow build strings like exact match |
I hate space-separated matchspecs with a burning passion, and this would make it 10 times worse. Let's aim to give people a grammar that they can resolve in their heads, without decrypting an ambiguous order of precedence, or having to look up the definition of the spec. IMO In the same vein, if we're doing MatchSpec v2, I would so like to get rid of semantically-relevant spaces everywhere, including between versions and build strings. We could just make |
|
I noticed that in CEP15 we already using the repodata v2 nominclature, and for implementations like for mamba, libsolv is already checking for |
Agreed, let's get rid of the ambiguity of space-separated fields. There's no need to keep it, and we have alternatives. We could require that conditions are only be expressed with bracket syntax. |
|
As mentioned in #55, the feedback there is very valuable and should be referenced here when applicable (e.g. in the Rationale section that discusses why the proposed syntax was chosen). |
|
|
||
| ## Backwards Compatibility | ||
|
|
||
| The new `repodata.v2` will be served alongside the current format under `repodata.v2.json`. Older conda clients will continue using the v1 format. Packages using any of the new features will not be added to v1 of `repodata.json`. No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too bad about having six copies of repodata (.json, .json.zst, sharded, v2.*)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new features proposed in this CEP are really exciting and I want to see this happen, so count with my support! Before I delve into the technical details, I'd like to mention a few editorial changes that would be required (and I'm happy to assist in fixing those). Right now, it feels that this is more of a "pre-CEP" writeup. Namely:
- The template is not being followed. None of the proposed sections are used. It's particular important to have: Abstract, Rationale, Specification, Rejected Ideas, and References. The Copyright notice is mandatory.
- There are a number of related PRs on-flight that can help with the definitions and context. For example, the
repodataschema can be simply referred to as "the repodata schema as defined in CEP PR #135", and then marked as part of the "Requires" frontmatter metadata. - There are some missing or outdated references to implementation PRs (e.g. conda/rattler#1040 was closed in favor of conda/rattler#1550; conda/rattler#1816 is missing).
- As mentioned elsewhere, this PR should be better split in three parts; one per new field. All of them can "require" each other so they are all part of the v3 repodata update.
- I recommend resorting to MUST, SHOULD and MAY in the Specification section to clarify what's allowed and what's not.
Now onto the actual contents:
- The repodata version 2 is taken. This needs to be version 3.
- The conditionals syntax needs to be better specified. Right now it's unclear what happens with space-separated matchspecs, especially if they happen to have a build string
and. I recommend constraining this so we don't end up with nested conditionals and other complications (unless intended!). - The strings allowed in "flags" are not well described. Maximum length, allowed characters and other specificities would be welcome.
- The Backwards Compatibility section should cover what happens with the v1 repodata when a package carries dependencies with the new features. Would the two repodatas get out of sync? Are we expecting some packages to be only available in the new repodata? Are both repodata versions supposed to carry the same content? What's the expected behavior here?
- See this
conda-whl-channelreview for some tricks that may be useful here.
- See this
- Can packages depend on other packages with their extras? If so, how is this specified? Recipe formats have only allowed
name [version [build]]syntax so far. Is this the moment where we require repodata v3 to only use the canonical form pre-standardized in #82?
There's maybe some more nuance to this, but this is what I got after reading it again today! Thanks for listening 🤓 And I reiterate: I'm happy to help write this up in the format that we need!
|
How do we deal with quotes in quotes if we want to express: |
I think we support both single and double quotes, so they can be mixed, but also escaped. That should be part of #82. |
This defines new repodata with two additional fields and a modfied matchspec syntax:
six; if python <3.8)Some initial discussion happened over the past days on:
TODO:
~foo,!foo, or-foo.