127 changes: 127 additions & 0 deletions cep-XXXX.md
@@ -0,0 +1,127 @@
# CEP XXXX - Package metadata files served by conda channels

<table>
<tr><td> Title </td><td> CEP XXXX - Package metadata files served by conda channels </td></tr>
<tr><td> Status </td><td> Draft </td></tr>
<tr><td> Author(s) </td><td> Jaime Rodríguez-Guerra &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> Sep 30, 2025 </td></tr>
<tr><td> Updated </td><td> Sep 30, 2025 </td></tr>
<tr><td> Discussion </td><td> N/A </td></tr>
<tr><td> Implementation </td><td> N/A </td></tr>
<tr><td> Requires </td><td> https://github.com/conda/ceps/pull/133 </td></tr>
</table>

## Abstract

This CEP standardizes the schema for the package metadata (repodata) files served in conda channels, namely `repodata.json` and its variants.

## Motivation

This CEP is purely informative: it documents the schema of existing metadata files that are already in wide use.

## Specification

As per [CEP 26](./cep-0026.md), a conda channel is defined as a location that MUST serve a `noarch/repodata.json` path. It MAY also serve additional, platform-specific `repodata.json` paths under other subdirectories of the same depth, which MUST follow the `subdir` naming conventions described in [CEP 26](./cep-0026.md).

> Note that there are no requirements for these paths to be backed by a proper filesystem; the contents of these locations can also be provided by API endpoints.

`repodata.json` documents are subdir-specific JSON dictionaries that aggregate the `index.json` metadata of the included conda artifacts (see [CEP PR#133](https://github.com/conda/ceps/pull/133)), and extend them with details only known when the compressed artifact has been generated (such as size, timestamp, or checksums).
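As a minimal sketch of the path layout described above, the location of a subdir's `repodata.json` can be derived from the channel location. The helper name `repodata_url` and the `example.invalid` URL are hypothetical and only serve as an illustration:

```python
def repodata_url(channel_url: str, subdir: str = "noarch",
                 filename: str = "repodata.json") -> str:
    """Build the location of a repodata document inside a channel.

    Hypothetical helper: every channel serves `noarch/repodata.json`;
    other subdirs sit at the same depth under the channel root.
    """
    return f"{channel_url.rstrip('/')}/{subdir}/{filename}"


print(repodata_url("https://example.invalid/my-channel"))
# https://example.invalid/my-channel/noarch/repodata.json
print(repodata_url("https://example.invalid/my-channel/", "linux-64"))
# https://example.invalid/my-channel/linux-64/repodata.json
```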

### Schema

Each `repodata.json` MUST represent a dictionary with the keys listed below. All of them are optional. Additional top-level keys MUST be allowed but they MUST be ignored if not recognized.

- `info: dict[str, dict]`. Metadata about the `repodata.json` itself. See [info metadata](#info-metadata).
- `packages: dict[str, dict]`. This entry maps `*.tar.bz2` filenames to their [package record metadata](#package-record-metadata).
- `packages.conda: dict[str, dict]`. This entry maps `*.conda` filenames to [package record metadata](#package-record-metadata).
- `removed: list[str]`. List of filenames that were once included in either `packages` or `packages.conda`, but are now removed. The corresponding artifacts SHOULD still be accessible via their direct URL.

A `signatures: dict[str, dict]` key MAY be present, but SHOULD be ignored. This key was introduced as a proprietary extension by Anaconda, but it is not part of the repodata v1 specification.
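The following sketch shows one way a consumer might apply these rules: every recognized key is optional, and unrecognized top-level keys (including `signatures`) are set aside rather than rejected. The function name `split_repodata` and the `x-custom` key are hypothetical, and the empty defaults are an assumption of this sketch rather than something mandated by the text above.

```python
import json


def split_repodata(document: dict) -> tuple[dict, dict]:
    """Return (recognized, ignored) top-level entries of a repodata dictionary."""
    if not isinstance(document, dict):
        raise TypeError("a repodata.json document must be a JSON dictionary")
    # Assumption of this sketch: absent keys behave like empty containers.
    defaults = {"info": {}, "packages": {}, "packages.conda": {}, "removed": []}
    recognized = {key: document.get(key, default) for key, default in defaults.items()}
    ignored = {key: value for key, value in document.items() if key not in defaults}
    return recognized, ignored


raw = json.loads('{"packages": {}, "signatures": {}, "x-custom": 1}')
recognized, ignored = split_repodata(raw)
print(sorted(ignored))  # ['signatures', 'x-custom']
```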

#### `info` metadata

This dictionary stores information about the repodata file. It MUST follow this schema:

- `arch: str`. Deprecated. Same meaning as in [CEP PR#133](https://github.com/conda/ceps/pull/133)'s `index.json` key.
- `base_url: str`. Optional. See [CEP 15](./cep-0015.md).
- `platform: str`. Deprecated. Same meaning as in [CEP PR#133](https://github.com/conda/ceps/pull/133)'s `index.json` key.
- `repodata_version: int`. Optional. Version of the `repodata.json` schema. In its absence, tools MUST assume its value is `1`. See [CEP 15](./cep-0015.md) for `repodata_version = 2`.
- `subdir: str`. Recommended. The channel subdirectory this `repodata.json` belongs to. In its absence, its value MAY be inferred from the parent component of the `repodata.json` path.

Additional keys SHOULD NOT be present and SHOULD be ignored.
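The two fallbacks in this schema (a missing `repodata_version` means `1`, and a missing `subdir` may be inferred from the path) could be sketched as follows. The helper name `effective_info` and the example path are hypothetical:

```python
from pathlib import PurePosixPath


def effective_info(info: dict, repodata_path: str) -> dict:
    """Resolve the `info` fields that have a documented fallback."""
    # A missing `repodata_version` is treated as version 1.
    version = info.get("repodata_version", 1)
    # A missing `subdir` is inferred from the parent component of the path.
    subdir = info.get("subdir") or PurePosixPath(repodata_path).parent.name
    return {"repodata_version": version, "subdir": subdir}


print(effective_info({}, "my-channel/linux-64/repodata.json"))
# {'repodata_version': 1, 'subdir': 'linux-64'}
```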

#### Package record metadata

Each entry in `packages` and `packages.conda`:

- MUST follow the `index.json` schema (see [CEP PR#133](https://github.com/conda/ceps/pull/133)).
- SHOULD report the same values as the artifact's `info/index.json` metadata. Small modifications MAY be introduced to apply metadata fixes (e.g. correct the constraints of a requirement in the `depends` field) without needing to rebuild the artifact.

Is there a rule, or recommendation for how consumers should resolve differences between the artifact's info/index.json and the repodata info for the package?

Contributor Author

This clause is introduced mostly to support repodata patching. repodata.json has precedence at install time. See https://github.com/jaimergp/ceps/blob/conda-meta-json/cep-XXXX.md ("Management of a conda environment").

Contributor

Small, medium or large modifications perhaps.

- MUST additionally include the following keys:
  - `md5: str | None`. Hexadecimal string of the MD5 checksum of the compressed artifact.
  - `sha256: str | None`. Hexadecimal string of the SHA256 checksum of the compressed artifact.
  - `size: int`. Size, in bytes, of the compressed artifact.
- If the entry corresponds to a `.tar.bz2` package that was transmuted to `.conda`, it SHOULD include these keys:
  - `legacy_bz2_md5: str`. Hexadecimal string of the MD5 checksum of the original `.tar.bz2` artifact.
  - `legacy_bz2_size: int`. Size, in bytes, of the original `.tar.bz2` artifact.

Additional keys SHOULD NOT be present and SHOULD be ignored.
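The `size`, `md5` and `sha256` keys describe the compressed artifact, so a client can use them to check a download. Below is a minimal sketch of such a check; the function name `matches_record` is hypothetical, and skipping a checksum whose value is `None` is an assumption of this sketch rather than a rule stated above.

```python
import hashlib


def matches_record(artifact: bytes, record: dict) -> bool:
    """Compare a downloaded artifact against its package record."""
    if len(artifact) != record["size"]:
        return False
    checks = (("md5", hashlib.md5), ("sha256", hashlib.sha256))
    for key, hasher in checks:
        expected = record.get(key)
        # Assumption: a `None` checksum means there is nothing to compare.
        if expected is not None and hasher(artifact).hexdigest() != expected.lower():
            return False
    return True


payload = b"stand-in bytes, not a real conda artifact"
record = {
    "size": len(payload),
    "md5": hashlib.md5(payload).hexdigest(),
    "sha256": hashlib.sha256(payload).hexdigest(),
}
print(matches_record(payload, record))         # True
print(matches_record(payload + b"x", record))  # False
```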

### Repodata variants

A conda channel MAY serve additional `repodata.json` documents in each subdir. Their name SHOULD match the glob `*repodata*.json`, and their contents MUST follow the `repodata.json` schema.
Common variants include `current_repodata.json`, which aggregates a subset of the full repodata document, focusing on the latest versions of each package plus their necessary dependencies.
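The naming recommendation can be checked with the standard library's glob matcher. This is only an illustration; the function name `looks_like_repodata_variant` is hypothetical.

```python
from fnmatch import fnmatchcase


def looks_like_repodata_variant(filename: str) -> bool:
    """Check a filename against the recommended `*repodata*.json` glob."""
    return fnmatchcase(filename, "*repodata*.json")


for name in ("repodata.json", "current_repodata.json", "channeldata.json"):
    print(name, looks_like_repodata_variant(name))
# repodata.json True
# current_repodata.json True
# channeldata.json False
```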

Channels SHOULD serve compressed versions of every repodata file. The following compression schemes are recognized:
Member

I'm not sure if that's a SHOULD tbh; I think it should be a MUST, since practically speaking, conda-index does produce both. I'd think, for the sake of running compatible channels, we should clearly dictate what the minimum operationally reasonable set of files is.


I think if you are creating a local channel on your machine you don't need the compressed repodata for any practical reason. And, I think that should still be a valid channel.

Does it make sense to have something like an implementation recommendations section? It could contain information like:

- It's practical to serve compressed versions of repodata
- The channeldata.json file is not required
- etc.

> we should clearly dictate what the minimum operationally reasonable set of files is.

+1!

Member

Right, the compression is an optimization of distribution-sized channels, which I guess sharding solves differently. I guess the source of truth is just the repodata.json, and having a "deployment" or "best practices" section is a good idea.

Member

rattler-index allows skipping the zstd file: https://github.com/conda/rattler/blob/55014fa15730502bfea1f185c0c2c539672b9d55/crates/rattler_index/src/lib.rs#L999

i think it should be optional (but recommended) as i can see use cases for personal small channels where it's not needed.

Contributor Author

> and having a "deployment" or "best practices" section is a good idea.

I'd rather leave that to the aggregated conda.org documentation, because those best practices may change over time and there's no need to "vote" on them on CEP.

Contributor

@dholth, Nov 5, 2025

One reason to allow missing repodata.json.zst is to support old, frozen channels. Backwards compatibility is necessary.
How long until we have sharded-repodata-only channels that provide no repodata.json?


- BZ2: MUST append the `.bz2` extension; e.g. `repodata.json.bz2`.
- ZSTD: MUST append the `.zst` extension; e.g. `repodata.json.zst`. Recommended.
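A client that supports these schemes might derive the compressed filenames by appending the extension and fall back to the plain document when none is served. The sketch below assumes a preference order (ZSTD, then BZ2, then uncompressed) that is not mandated by the text, and it only decodes BZ2 because that codec ships with every Python standard library. The helper names are hypothetical.

```python
import bz2
import json

# Assumed preference order: ZSTD is marked as recommended above.
COMPRESSED_SUFFIXES = (".zst", ".bz2")


def candidate_filenames(name: str = "repodata.json") -> list[str]:
    """List the filenames to try, compressed variants first."""
    return [name + suffix for suffix in COMPRESSED_SUFFIXES] + [name]


def load_bz2_repodata(payload: bytes) -> dict:
    """Decode the contents of a `*.json.bz2` repodata document."""
    return json.loads(bz2.decompress(payload))


print(candidate_filenames())
# ['repodata.json.zst', 'repodata.json.bz2', 'repodata.json']

compressed = bz2.compress(json.dumps({"packages": {}}).encode("utf-8"))
print(load_bz2_repodata(compressed))
# {'packages': {}}
```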

## Examples

A minimal conda channel only needs a single, empty file:

```text
./noarch/repodata.json
```

A conda channel with a Linux x64 specific subdirectory:

```text
./noarch/repodata.json
./linux-64/repodata.json
```

## References

- <https://docs.conda.io/projects/conda-build/en/stable/concepts/generating-index.html>

## Appendices

### Appendix A: `signatures` section

This dictionary maps conda package filenames (with extension) to a signature metadata dictionary. Each subdictionary then maps the signing key identifier to the signature value. This value is expressed as a dictionary with a key `signature` that maps to the actual signature of the corresponding package record. See example:

```js
"packages": {
  ...
},
"packages.conda": {
  ...
},
"signatures": {
  "_anaconda_depends-2018.12-py27_0.tar.bz2": {
    "4a044c3445b9d8bc5429a2b1d7d42bdb4d8404285b76322e8eacdfdae8b0e4cd": { // signing key id
      "signature": "a0ffab3f954c3dc64373ba16bee5e9ba9683a625fa3e4a6c4263d9de550bcafd233c2522789c9b31b40c35a87775d6f8fa2498a3bec3647c36c0a2f5cd2eb10c" // signature value
    }
  },
  "zstd-1.3.7-h0b5b093_0.conda": {
    "4a044c3445b9d8bc5429a2b1d7d42bdb4d8404285b76322e8eacdfdae8b0e4cd": {
      "signature": "ea1f11a74c081298fe243c6982f676d9838bfee81e74a24bef6474f3be1243b4624f6d12dc8196f8db909cf049e9e344151e44c5b950cbab8583641c7b661a0d"
    }
  }
}
```
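
For readers who want to inspect this structure, the nesting (filename, then signing key identifier, then `signature`) can be walked as in the sketch below. It only illustrates the shape of the data and performs no verification; the function name `iter_signatures` and the placeholder values are hypothetical.

```python
from typing import Iterator


def iter_signatures(repodata: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (filename, signing key id, signature) triples, if any are present."""
    for filename, by_key in repodata.get("signatures", {}).items():
        for key_id, entry in by_key.items():
            yield filename, key_id, entry["signature"]


# Placeholder values; real key ids and signatures are long hexadecimal strings.
example = {
    "signatures": {
        "pkg-1.0-0.conda": {"key-id-placeholder": {"signature": "sig-placeholder"}}
    }
}
print(list(iter_signatures(example)))
# [('pkg-1.0-0.conda', 'key-id-placeholder', 'sig-placeholder')]
```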

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).