143 changes: 143 additions & 0 deletions cep-XXXX.md
@@ -0,0 +1,143 @@
# CEP XXXX - Metadata files served by conda channels

<table>
<tr><td> Title </td><td> CEP XXXX - Metadata files served by conda channels </td></tr>
<tr><td> Status </td><td> Draft </td></tr>
<tr><td> Author(s) </td><td> Jaime Rodríguez-Guerra &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> Sep 30, 2025 </td></tr>
<tr><td> Updated </td><td> Sep 30, 2025 </td></tr>
<tr><td> Discussion </td><td> N/A </td></tr>
<tr><td> Implementation </td><td> N/A </td></tr>
<tr><td> Requires </td><td> https://github.com/conda/ceps/pull/133 </td></tr>
</table>

## Abstract

This CEP standardizes the schema for the metadata files served in conda channels, namely `repodata.json` (and its variants) and `channeldata.json`.

Member

I'm wondering if specifying channeldata.json in a separate CEP might make it easier to deprecate it?

Contributor Author

Good point. Moved to #140.


> Channels may also serve `run_exports.json`, which is described in [CEP 12](./cep-0012.md).

## Motivation

The motivation of this CEP is merely informative. It describes the schema of existing metadata files already in wide use.

## Specification

As per [CEP 26](./cep-0026.md), a conda channel is defined as a location that MUST serve a `noarch/repodata.json` path. It MAY also serve additional, platform-specific `repodata.json` paths under other subdirectories of the same depth, which MUST follow the `subdir` naming conventions described in [CEP 26](./cep-0026.md).

A conda channel MAY also serve a `channeldata.json` path at its root level.

Note that there are no requirements for these paths to be backed by a proper filesystem; the contents of these locations can also be provided by API endpoints.

The contents of the `repodata.json` and `channeldata.json` documents MUST follow the schemas described below.

### `repodata.json`

`repodata.json` documents are subdir-specific JSON dictionaries that aggregate the `index.json` metadata of the included conda artifacts (see [CEP PR#133](https://github.com/conda/ceps/pull/133)), and extend them with details only known when the compressed artifact has been generated (such as size, timestamp, or checksums).

Each `repodata.json` MUST represent a dictionary with the keys listed below. All of them are optional. Additional top-level keys MUST be allowed but they MUST be ignored if not recognized.

- `info: dict[str, dict]`. Metadata about the `repodata.json` itself. See [info metadata](#info-metadata).
- `packages: dict[str, dict]`. This entry maps `*.tar.bz2` filenames to their [package record metadata](#package-record-metadata).
- `packages.conda: dict[str, dict]`. This entry maps `*.conda` filenames to [package record metadata](#package-record-metadata).
- `removed: list[str]`. List of filenames that were once included in either `packages` or `packages.conda`, but are now removed. The corresponding artifacts SHOULD still be accessible via their direct URL.

Additional keys SHOULD NOT be present and SHOULD be ignored.
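
As a minimal sketch of how a consumer might read these top-level keys, the following Python snippet parses a `repodata.json` document, keeps the four keys listed above, and ignores anything else. The function name `load_repodata`, the `x-custom` key, and the sample document are hypothetical and only serve as illustration; they are not part of this CEP.

```python
import json

# Top-level keys recognized by this CEP; everything else is ignored.
KNOWN_KEYS = ("info", "packages", "packages.conda", "removed")


def load_repodata(text: str) -> dict:
    """Parse a repodata.json document, keeping only the recognized keys."""
    document = json.loads(text)
    if not isinstance(document, dict):
        raise ValueError("repodata.json must represent a dictionary")
    # All keys are optional, so fall back to empty containers when absent.
    defaults = {"info": {}, "packages": {}, "packages.conda": {}, "removed": []}
    return {key: document.get(key, defaults[key]) for key in KNOWN_KEYS}


# Hypothetical document with one unrecognized top-level key.
sample = '{"info": {"subdir": "noarch"}, "packages": {}, "x-custom": 1}'
repodata = load_repodata(sample)
print(sorted(repodata))
```

Falling back to empty containers is a design choice of this sketch; a stricter tool could instead distinguish between absent and empty keys.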

#### `info` metadata

This dictionary stores information about the repodata file. It MUST follow this schema:

- `arch: str`. Deprecated. Same meaning as in [CEP PR#133](https://github.com/conda/ceps/pull/133)'s `index.json` key.
- `base_url: str`. Optional. See [CEP 15](./cep-0015.md).
- `platform: str`. Deprecated. Same meaning as in [CEP PR#133](https://github.com/conda/ceps/pull/133)'s `index.json` key.
- `repodata_version: int`. Optional. Version of the `repodata.json` schema. In its absence, tools MUST assume its value is `1`. See [CEP 15](./cep-0015.md) for `repodata_version = 2`.
- `subdir: str`. Recommended. The channel subdirectory this `repodata.json` belongs to. In its absence, its value MAY be inferred from the parent component of the `repodata.json` path.

Additional keys SHOULD NOT be present and SHOULD be ignored.
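
The two fallback rules above (a missing `repodata_version` means `1`, and a missing `subdir` may be inferred from the path) could be sketched as follows. The helper names `repodata_version` and `resolve_subdir` are assumptions made for this example, not an API defined by this CEP or by any existing tool.

```python
from pathlib import PurePosixPath


def repodata_version(info: dict) -> int:
    """Return the schema version, assuming 1 when the key is absent."""
    return int(info.get("repodata_version", 1))


def resolve_subdir(info: dict, repodata_path: str) -> str:
    """Prefer info['subdir']; otherwise use the parent path component."""
    subdir = info.get("subdir")
    if subdir:
        return subdir
    # e.g. "linux-64/repodata.json" -> "linux-64"
    return PurePosixPath(repodata_path).parent.name


print(repodata_version({}))
print(resolve_subdir({}, "linux-64/repodata.json"))
```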

#### Package record metadata

Each entry in `packages` and `packages.conda`:

- MUST follow the `index.json` schema (see [CEP PR#133](https://github.com/conda/ceps/pull/133)).
- SHOULD report the same values as the artifact's `info/index.json` metadata. Small modifications MAY be introduced to apply metadata fixes (e.g. correct the constraints of a requirement in the `depends` field) without needing to rebuild the artifact.

Is there a rule, or recommendation for how consumers should resolve differences between the artifact's info/index.json and the repodata info for the package?

Contributor Author

This clause is introduced mostly to support repodata patching. repodata.json has precedence at install time. See https://github.com/jaimergp/ceps/blob/conda-meta-json/cep-XXXX.md ("Management of a conda environment").

Contributor

Small, medium or large modifications perhaps.

- MUST additionally include the following keys:
  - `md5: str | None`. Hexadecimal string of the MD5 checksum of the compressed artifact.
  - `sha256: str | None`. Hexadecimal string of the SHA256 checksum of the compressed artifact.
  - `size: int`. Size, in bytes, of the compressed artifact.
- If the entry corresponds to a `.tar.bz2` package that was transmuted to `.conda`, it SHOULD include these keys:
  - `legacy_bz2_md5: str`. Hexadecimal string of the MD5 checksum of the original `.tar.bz2` artifact.
  - `legacy_bz2_size: int`. Size, in bytes, of the original `.tar.bz2` artifact.

Additional keys SHOULD NOT be present and SHOULD be ignored.
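
To illustrate how the `size`, `sha256` and `md5` keys relate to the compressed artifact, here is a small sketch that checks downloaded bytes against a package record. The function name `matches_record` and the placeholder payload are hypothetical; real tools may verify artifacts differently (for example, by streaming the file instead of holding it in memory).

```python
import hashlib


def matches_record(artifact: bytes, record: dict) -> bool:
    """Check the compressed artifact bytes against size, sha256 and md5."""
    if len(artifact) != record["size"]:
        return False
    for key in ("sha256", "md5"):
        expected = record.get(key)
        # Checksum values may be None, in which case they are skipped.
        if expected is None:
            continue
        if hashlib.new(key, artifact).hexdigest() != expected.lower():
            return False
    return True


# Placeholder bytes for illustration only; not a real conda artifact.
payload = b"not a real conda artifact"
record = {
    "size": len(payload),
    "sha256": hashlib.sha256(payload).hexdigest(),
    "md5": None,
}
print(matches_record(payload, record))
```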

#### Repodata variants

A conda channel MAY serve additional `repodata.json` paths in each subdir. Their name SHOULD match the glob `*repodata*.json`, and their contents MUST follow the `repodata.json` schema.
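
The naming recommendation can be checked with a case-sensitive glob match. This is only a sketch: the helper name `is_repodata_variant` is an assumption, and note that the glob also matches `repodata.json` itself.

```python
from fnmatch import fnmatchcase


def is_repodata_variant(filename: str) -> bool:
    """Return True if the filename matches the glob `*repodata*.json`."""
    return fnmatchcase(filename, "*repodata*.json")


for name in ("repodata.json", "current_repodata.json", "channeldata.json"):
    print(name, is_repodata_variant(name))
```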

### `channeldata.json`

Deprecated.

Is this really deprecated? I couldn't find any other notes that says that. And it looks like conda-build and conda-index still fully support features that rely on (or build) channeldata.

Contributor Author

AFAIK this is the first public attempt to deprecate it, but it has come up in meetings and other conversations. For example, repo.prefix.dev does not generate it and seems to be perfectly functional as a channel. I'd need to check with conda-build, but AFAICR, it was only needed for run_exports and that's why we added CEP 12 (among other reasons). conda-index may still generate it for backwards compatibility.

Member

I would prefer we document first (or as I said above move the channeldata.json definition into a separate CEP), and then deprecate in a separate step.

Contributor

No non-CDN anaconda.org channel generates channeldata. There are lots of reasons why it should go away, including that it can be nondeterministic which subdir's package is representative in channeldata.


This JSON document MAY be served at the root of the conda channel. It aggregates some packaging metadata across all the channel subdirectories. It MUST follow this schema:

- `channeldata_version: int`. Version of the `channeldata` schema. Currently `1`.
- `subdirs: list[str]`. List of subdirectories supported by the channel.
- `packages: dict[str, dict]`. Mapping of package names to a dictionary with the following metadata:
  - `activate.d: bool`. Whether the packages feature activation scripts.
  - `binary_prefix: bool`. Whether the package files contain a prefix placeholder that must be replaced in binary mode.
  - `deactivate.d: bool`. Whether the packages feature deactivation scripts.
  - `dev_url: str`. URL to the development website of the project (e.g. its source repository).
  - `doc_url: str`. URL to the documentation website of the project.
  - `home: str`. URL to the main website of the project.
  - `license: str`. License of the project, preferably a SPDX expression.
  - `post_link: bool`. Whether the packages feature post-link scripts.
  - `pre_link: bool`. Whether the packages feature pre-link scripts.
  - `pre_unlink: bool`. Whether the packages feature pre-unlink scripts.
  - `run_exports: dict[str, dict]`. Mapping of versions to their `run_exports` metadata. See [CEP 12](./cep-0012.md) for the valid keys.
  - `source_url: str | list[str]`. URL (or URLs) of the sources that were fetched to build the package.
  - `subdirs: list[str]`. Channel subdirectories under which this package is available.
  - `summary: str`. Short description of the project.
  - `text_prefix: bool`. Whether the package files contain a prefix placeholder that must be replaced in text mode.
  - `timestamp: int`. Upload date of the most recently published artifact, as a POSIX timestamp in milliseconds.
  - `version: str`. Most recent version published in the channel.
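
As an illustration of the shape of this document, the sketch below builds a small `channeldata.json` payload and lists the packages available for a given subdir. The package names, versions, and the helper `packages_in_subdir` are invented for this example and do not describe any real channel.

```python
import json

# Hypothetical channeldata.json contents, trimmed to a few keys.
channeldata = json.loads("""
{
  "channeldata_version": 1,
  "subdirs": ["linux-64", "noarch"],
  "packages": {
    "example-pkg": {"subdirs": ["linux-64"], "version": "1.2.0"},
    "other-pkg": {"subdirs": ["noarch"], "version": "0.3.1"}
  }
}
""")


def packages_in_subdir(data: dict, subdir: str) -> list[str]:
    """Return the sorted package names that list the given subdir."""
    return sorted(
        name
        for name, meta in data.get("packages", {}).items()
        if subdir in meta.get("subdirs", [])
    )


print(packages_in_subdir(channeldata, "linux-64"))
```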

## Examples

A minimal conda channel only needs a single file, which can contain just an empty JSON dictionary (`{}`):

```text
./noarch/repodata.json
```

A conda channel with a Linux x64 specific subdirectory:

```text
./noarch/repodata.json
./linux-64/repodata.json
```

Optionally serving `channeldata.json`:

```text
./noarch/repodata.json
./linux-64/repodata.json
./channeldata.json
```
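
The layouts above can be reproduced on a local filesystem with a short script. This is a sketch under stated assumptions: the helper `create_channel` is hypothetical, and it writes `{}` to each `repodata.json` so that the contents parse as a JSON dictionary, which satisfies the schema because all top-level keys are optional.

```python
import json
import tempfile
from pathlib import Path


def create_channel(root: Path, subdirs=("noarch",)) -> list[str]:
    """Write an empty-dictionary repodata.json per subdir; return relative paths."""
    written = []
    for subdir in subdirs:
        path = root / subdir / "repodata.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({}))
        written.append(path.relative_to(root).as_posix())
    return written


with tempfile.TemporaryDirectory() as tmp:
    layout = create_channel(Path(tmp), subdirs=("noarch", "linux-64"))
    print(layout)
```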

## Rationale

The `channeldata.json` file is considered deprecated because the listed metadata may be unreliable. It assumes that all the artifacts for a given package name will always have a homogeneous composition, but this is not necessarily true. Some examples:

- Some artifacts may contain activation scripts on some platforms, but not on others.
- Prefix replacement may only be needed from a certain point in the lifetime of the project (e.g. the maintainers add a compiled extension for performance).
- The website or license may change during the project lifetime.

Member

Valid reasons, but out of scope of the CEP


## References

- <https://docs.conda.io/projects/conda-build/en/stable/concepts/generating-index.html>

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).