Draft another CEP for PURL annotation #114
    about:
      # ...
      purls:
        - pkg:pypi/django@{{ version }}
Should the CEP specification mandate that PURLs have a version, or do we allow versionless PURLs? I'd also be curious to know your opinion on the cases where we'll need to use the generic type.
I initially thought of using versionless PURLs, but the examples in #63 (comment) made me consider adding the version because well, it has to be there anyway. That said, I don't think the version field should be required.
About generic, yes and no. Some packages won't have a canonical location or ecosystem (pretty common in C/C++), so the most convenient way is to annotate them as generic. PEP 725 (draft) goes into more detail about this. Check also https://github.com/jaimergp/external-metadata-mappings, where I am dabbling with the concept of cross-ecosystem mappings. There's a Streamlit app too. I do see the value in things like generic/libtiff, as well as PEP-725-proposed virtual:interface/* and virtual:compiler/*. Hopefully this can be integrated in PURL like pkg:abstract/* though.
> I initially thought of using versionless PURLs, but the examples in #63 (comment) made me consider adding the version because well, it has to be there anyway.
I think it depends on how you want to use it. If you just want a standard way to map package names, then I guess versionless is fine. But if you want to map to a specific version, then you need the upstream version.
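To make the two use cases concrete, here is a minimal sketch of how a consumer might treat versioned and versionless PURLs. It is a simplified string split, not a full implementation of the PURL specification, and the helper names `split_purl` and `same_package` are hypothetical.

```python
def split_purl(purl: str) -> dict:
    """Split a PURL into type, name path, and optional version (simplified)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the subpath (#...) and qualifiers (?...); this sketch ignores them.
    body = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    type_, _, rest = body.partition("/")
    if "@" in rest:
        # The version follows the last '@'.
        name, _, version = rest.rpartition("@")
    else:
        name, version = rest, None
    return {"type": type_.lower(), "name": name, "version": version}


def same_package(a: str, b: str) -> bool:
    """Name-level mapping: compare two PURLs while ignoring their versions."""
    pa, pb = split_purl(a), split_purl(b)
    return (pa["type"], pa["name"]) == (pb["type"], pb["name"])
```

With this split, a versionless PURL is enough for name mapping (`same_package`), while mapping to a specific upstream release needs the `version` component to be present.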
> About generic, yes and no. Some packages won't have a canonical location or ecosystem (pretty common in C/C++), so the most convenient way is to annotate them as generic. PEP 725 (draft) goes into more detail about this. Check also https://github.com/jaimergp/external-metadata-mappings, where I am dabbing with the concept of cross-ecosystem mappings. There's a Streamlit app too. I do see the value in things like generic/libtiff, as well as PEP-725-proposed virtual:interface/* and virtual:compiler/*. Hopefully this can be integrated in PURL like pkg:abstract/* (package-url/purl-spec#222) though.
I have been following your work in https://github.com/jaimergp/external-metadata-mappings and the other repos, and that's great!
I'm asking about generics because we already started using PURLs to map our package names to upstream packages but decided to wait before adding any kind of support for generics. It gets messy pretty quickly. But more importantly, you need to know what you want to use the PURL for. Is it to map to an exact upstream location or is it to map something to something else and you don't really care about the exact source?
I don't know if we should define this in the CEP though.
> I'm asking about generics because we already started using PURLs to map our package names to upstream packages but decided to wait before adding any kind of support for generics. It gets messy pretty quickly.
Do you have some examples of the messy parts? Really interested to see other use cases.
I have to say that we didn't look very deeply into generics. But I feel like where it gets tricky is that the spec says that download_url and checksum can be provided. The thing is that the download_url could differ based on the subdir or other things. There are also potential cases where very different things could be named the same way. Since generic doesn't have a concept of namespace, I'm not too sure how we would handle these (though it's mostly hypothetical).
I might also be overthinking it (which is very possible).
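A small sketch of the concern: if `download_url` and `checksum` are carried as qualifiers, the same generic name and version can yield a different PURL per subdir. The `generic_purl` helper, the `libfoo` package, the `example.org` URLs, and the checksum values are all hypothetical, and the encoding is simplified relative to the PURL specification.

```python
from urllib.parse import quote


def generic_purl(name: str, version: str, download_url: str, checksum: str) -> str:
    """Build a pkg:generic PURL with download_url and checksum qualifiers."""
    qualifiers = {"download_url": download_url, "checksum": checksum}
    # Qualifiers are emitted sorted by key; ':' and '/' are left unencoded here.
    query = "&".join(
        f"{key}={quote(value, safe=':/')}" for key, value in sorted(qualifiers.items())
    )
    return f"pkg:generic/{name}@{version}?{query}"


# Hypothetical per-subdir sources for the same upstream name and version.
sources = {
    "linux-64": ("https://example.org/linux-64/libfoo-1.0.tar.gz", "sha256:aaaa"),
    "osx-arm64": ("https://example.org/osx-arm64/libfoo-1.0.tar.gz", "sha256:bbbb"),
}
purls = {
    subdir: generic_purl("libfoo", "1.0", url, checksum)
    for subdir, (url, checksum) in sources.items()
}
```

Both entries share the `pkg:generic/libfoo@1.0` prefix but differ in their qualifiers, so a single recipe-level annotation would not cover every subdir without templating or dropping the qualifiers.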
Co-authored-by: Jean-Christophe Morin <[email protected]>
    purls:
      - pkg:pypi/django@{{ version }}
Should we consider making the purls a mapping, rather than a simple list? This would make the field more easily interpretable and would give us more flexibility to further extend it.
E.g.,
    purls:
      upstream:
        - pkg:pypi/foo
      vendors:
        - pkg:github/bar
        - pkg:github/baz
A free-form mapping or a mapping with well-defined keys? If the latter, which ones? I don't want to end up in SBOM-like territory.
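For illustration, a build tool could accept both shapes and reject unknown keys, which is one way to keep a mapping from growing free-form. This is only a sketch: the key set `upstream`/`vendors` is taken from the example above and is not agreed on, and treating a plain list as `upstream` is an assumption, as is the `normalize_purls` name.

```python
ALLOWED_KEYS = {"upstream", "vendors"}  # hypothetical key set, not part of any CEP


def normalize_purls(field):
    """Normalize a `purls` value (list or mapping) into a mapping of lists."""
    if isinstance(field, list):
        # Assumption: a plain list is shorthand for the upstream PURLs.
        return {"upstream": list(field)}
    if isinstance(field, dict):
        unknown = sorted(set(field) - ALLOWED_KEYS)
        if unknown:
            raise ValueError(f"unknown purls keys: {unknown}")
        return {key: list(values) for key, values in field.items()}
    raise TypeError("purls must be a list or a mapping")
```

A closed key set keeps the field interpretable, at the cost of needing a spec change for every new key.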
contain a `purls` field that takes a list of strings as specified
by the [PURL standard](https://github.com/package-url/purl-spec/blob/main/PURL-SPECIFICATION.rst).

If the build tool does not support this new field in `about`, the free-form `extra`
Note that extra can't be passed to an output. conda-build will simply ignore it and use the top-level extra section instead of the output-specific one.
Could that be changed? Or is that not feasible?
I think we could change that. I think we would need to change https://github.com/conda/conda-build/blob/d27c97c11e2316905cefe24deebd3e445e853ece/conda_build/metadata.py#L2619-L2626, or something like that.
One comment here is that in the context of the GitHub conda support rollout and then deprecation, I think we discovered that the mappings between ecosystems, if constructed without oversight, are pathways for potential security issues. I wonder how that kind of consideration might affect the design of this CEP. Certainly for conda-forge, we'd need knobs to control that mapping so that people cannot point a numpy to some other conda-forge package besides numpy.
The design of repodata patches, while it has warts, might help us here. We could designate a specially named conda package that holds the authoritative set of PURLs for that channel. A given channel could upload that package if they choose to do so and that would be the source of truth.
One simple path forward is to add purls to the repodata, source it from about, and also support patching it. Then conda-forge can override everything as needed using its patching code, but other channels have a built-in mechanism. This would result in duplication, so we may want to treat it more like run exports.
Rehashing discussion in #63, more focused on `about` rather than `index`.