247 changes: 218 additions & 29 deletions cep-sigstore-predicate.md
@@ -1,7 +1,7 @@
# CEP - Standardizing the v1 predicate for sigstore attestations
# CEP - Standardizing a publish attestation for the conda ecosystem

<table>
<tr><td> Title </td><td> Standardizing the v1 predicate for sigstore attestations </td>
<tr><td> Title </td><td> Standardizing a publish attestation for the conda ecosystem </td>
<tr><td> Status </td><td> Proposed </td></tr>
<tr><td> Author(s) </td><td> Wolf Vollprecht &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> Feb 18, 2025 </td></tr>
@@ -12,61 +12,250 @@

## Abstract

We want to standardize attestations for the conda ecosystem.
This CEP proposes a standard attestation layout for the conda ecosystem.
This attestation layout is based on the [in-toto] framework
and will enable further integration with signing schemes like
[Sigstore].

### Sigstore Attestations
## Definitions and Concepts

Sigstore attestations are cryptographic statements about software artifacts that provide:
- An **attestation** is a machine-readable cryptographically signed statement.
When an attestation's signature is verified against a trusted key, that
verification provides integrity and authenticity guarantees about the
attestation's subject. For example:

- Authenticity: Proof of who created/signed the artifact
- Integrity: Verification that the artifact hasn't been tampered with
- Transparency: Public record of signatures in a tamper-evident log
- Alice is the maintainer of the `widgets` package.
- Alice signs a machine-readable statement equivalent to the following
English sentence, producing her attestation:

### Key Components
> Alice published the `widgets` package at version v1.2.3 with
> hash `sha256:abcd...` to the `conda-forge` channel.

- Predicates: JSON documents containing metadata about the signing event, using the `in-toto` format
- Signatures: Cryptographic proofs made using ephemeral keys
- Rekor: A tamper-evident log that stores attestations
- Fulcio: A certificate authority that issues short-lived certificates
- Bob establishes trust in Alice's public key.
- Bob can verify the attestation's signature against Alice's public key,
giving him confidence that the statement is true.
- Correspondingly, Bob can reject any statement for `widgets` that is not
signed by Alice's public key.

In this document, we want to standardize the sigstore predicate for conda packages. The bundle format to be used for sigstore attestations is the `v0.3` bundle format.
- [in-toto] is a framework and standard for defining attestations.

- Within in-toto, an attestation's statement is composed of a
**subject** and a **predicate**. The subject is the resource
(or resources) being attested to, and the predicate is a
arbitrary collection of metadata about the subject.
The predicate is identified by a **predicate type**,
which defines the predicate's expected schema.

- [Sigstore] is a project that enables misuse-resistant software signing
and verification via short-lived certificates and a tamper-evident log.
Sigstore composes with attestation frameworks like in-toto to provide
transparency and misuse-resistance properties on top of the integrity
and authenticity properties of attestations.

One of Sigstore's major misuse-resistance contributions is
the use of *ephemeral keys* for signing. Modifying the example above:

- Instead of maintaining a long-lived signing key, Alice generates an
*ephemeral key* and binds it to her *identity*
("`[email protected]`").

This binding is done via a certificate issued by [Fulcio], which verifies a
*proof of possession* (such as from [OpenID Connect]) from Alice for her
claimant identity. The certificate issued by Fulcio is, in turn, auditable
via [RFC 6962] Certificate Transparency (CT) logs.

- Alice signs her attestation with her ephemeral key, and distributes a
"bundle" containing both her attestation and her signing certificate.

- Instead of establishing trust in a long-lived key from Alice, Bob establishes
trust in Alice's identity.

- Bob can verify the attestation's signature against Alice's ephemeral key,
which in turn can be verified as authentically Alice's via the
Fulcio-issued certificate.

With this flow, neither Alice nor Bob needs to maintain long-lived signing
or verifying keyrings, in turn reducing the attack surface for key
compromise.

Another key misuse-resistance contribution within Sigstore is *machine
identities*. A machine identity behaves similarly to a human identity
(Alice or Bob), but identifies a machine instead of a human. For example,
`github.com/example/example/.github/workflows/release.yml@refs/tags/v1.2.3`
could be the machine identity of a GitHub Actions workflow that ran from
`release.yml` within `example/example` against the `v1.2.3` tag.

## Motivation

The conda ecosystem contains metadata that answers the following questions,
in part or in full:

* _Who_ (or _what_) published this package?
* _What_ is the package's hash?
* _Where_ was this package _published from_, and where _to_?
* _When_ was this package published?

However, this metadata is not currently **cryptographically verifiable**:
the consuming party must either trust it as presented, or verify it manually
against independent sources of truth (such as a project's release history).

Attestations that present this metadata in a cryptographically
verifiable manner are desirable for a number of reasons:

* Package maintainers wish to demonstrate the integrity and authenticity
of their package uploads;
* Individual downstream users wish to verify the integrity and authenticity of
packages they consume, without placing additional trust in the
channel or channel's hosting server;
* Attestations change the sophistication and risk profile for attackers in
defenders' favor: the attacker must be sufficiently sophisticated
to access private key material, *and* have a risk tolerance profile that
accepts exposure via auditable transparency logs.

More broadly, attestation schemes like the one proposed in this CEP have
seen adoption in similar and related ecosystems:

* Python (PyPA/PyPI): [PEP 740] and [PyPI - Attestations]
* Node.js (npm): [npm - Generating provenance statements]
* Ruby (RubyGems): [rubygems/release-gem]

## Specification

The in-toto predicate should contain the following fields:
### Attestation format

This CEP proposes the following attestation statement layout, using the
[in-toto Statement schema]:

- `predicateType` **MUST** be `https://schemas.conda.org/attestations/publish/v1`
- `subject` **MUST** be a single [`ResourceDescriptor`], with the following
constraints:
- `subject[0].name` **MUST** be the full filename of the conda package
that will be part of the `repodata.json` and under which it will appear on
the server.
- `subject[0].digest` **MUST** be a [`DigestSet`], and it **MUST** contain
a single `sha256` entry with the SHA256 hash of the conda package.
- `predicate` **MAY** be present. If present and not `null`, it **MUST** be a
JSON object with the following fields:
- `targetChannel` **MUST** be a string, indicating where the package
is being uploaded to. This field **MUST** be a valid URL with no
trailing slashes.
Comment on lines +139 to +141
Author

Flagging: the original language said "canonical URL" for the targetChannel, but I couldn't find a clear resource on what canonicalization of channel URLs looks like in the conda ecosystem. Does conda have a definition for a "canonical" URL, or by "canonical" did you mean something like "official"?

Owner

Yeah, this is becoming a bit of a problem for the conda ecosystem these days. Currently, when you say `-c conda-forge`, the URL is resolved to https://conda.anaconda.org/conda-forge, which is the "default" place for the open source channels. But I think we want to be more precise: you should indicate the URL that matches exactly the channel that you would also download the package from later, which can reside on different hosts (for example, for prefix.dev it would look like https://prefix.dev/my-channel).

A channel / organization would have to decide for themselves where the primary channel lives (and which URLs are merely mirrors). For example, for conda-forge right now this would be https://conda.anaconda.org/conda-forge. At some point the organization might decide to switch to https://prefix.dev/conda-forge, at which point we'd have to configure clients to trust either of those two, or make a decision based on a timestamp about which one is the primary source during which periods.

I also have a proposal open for a community-governed channel registry that would live e.g. as a JSON file on GitHub: conda#91


An example of a compliant statement is provided below:

```json
{
"_type": "https://in-toto.io/Statement/v0.1",
"subject": [{
"name": "file-name-0.0.1-h123456_5.conda",
"digest": {"sha256": "..."}, ...
"digest": {"sha256": "01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b"},
}],
// Schema URL
"predicateType": "https://schemas.conda.org/predicate-v1.json",
"predicateType": "https://schemas.conda.org/attestations/publish/v1",
"predicate": {
// Canonical URL of the target channel
"targetChannel": "https://prefix.dev/conda-forge",
}
}
```
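
The layout rules above can also be sketched programmatically. The following is a minimal, non-authoritative Python sketch: the function name `build_publish_statement` is hypothetical, and the `_type` value is an assumption taken from the in-toto v1 Statement specification that this CEP links to (the JSON example above shows a `v0.1` URI instead). The URL check is deliberately shallow and only enforces the "valid URL, no trailing slashes" constraint.

```python
import hashlib
import json
from urllib.parse import urlparse

PREDICATE_TYPE = "https://schemas.conda.org/attestations/publish/v1"
# Assumption: `_type` URI from the in-toto v1 Statement spec, not from this CEP.
STATEMENT_TYPE = "https://in-toto.io/Statement/v1"


def build_publish_statement(filename: str, package_bytes: bytes, target_channel: str) -> dict:
    """Build an in-toto statement (as a dict) for a single conda package."""
    parsed = urlparse(target_channel)
    if not (parsed.scheme and parsed.netloc):
        raise ValueError(f"targetChannel is not a valid URL: {target_channel!r}")
    if target_channel.endswith("/"):
        raise ValueError("targetChannel must not have trailing slashes")

    return {
        "_type": STATEMENT_TYPE,
        # Exactly one subject: the package filename and its SHA256 digest.
        "subject": [
            {
                "name": filename,
                "digest": {"sha256": hashlib.sha256(package_bytes).hexdigest()},
            }
        ],
        "predicateType": PREDICATE_TYPE,
        "predicate": {"targetChannel": target_channel},
    }


# Placeholder bytes stand in for the real package file contents.
statement = build_publish_statement(
    "file-name-0.0.1-h123456_5.conda",
    b"placeholder package contents",
    "https://prefix.dev/conda-forge",
)
print(json.dumps(statement, indent=2))
```

In practice the bytes would be read from the built `.conda` file, and the resulting statement would be handed to a Sigstore client for signing.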

The `subject` field is already defined in the in-toto specification and contains the name of the package and its digest.
For conda packages a SHA256 hash MUST be used.
The subject MUST be the full filename of the conda package that will be part of the repodata.json and under which it will appear on the server.
### Signing and distributing

This CEP recommends the following signing process:

1. The signer (i.e. Alice or Alice's trusted machine identity) uses a
[Sigstore]-compatible client to generate an ephemeral keypair and bind it to
their identity via a public certificate.
2. The signer generates an in-toto statement as described above, and
produces an attestation by signing that statement with their ephemeral
private key.
3. The signer uploads their attestation to the Sigstore transparency log
as a [DSSE] envelope.
4. The signer produces a [Sigstore bundle] containing their certificate,
attestation, and transparency log inclusion proof.

Each of these steps is performed transparently by a Sigstore client like
[sigstore-python], except for step (2) as it concerns the specific
layout of the signed-over statement.

The result of this process is a single Sigstore bundle, which can be
distributed alongside the conda package or otherwise made discoverable.
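
To give a rough idea of what the [DSSE] envelope in step 3 looks like, here is a minimal Python sketch of the DSSE pre-authentication encoding (PAE) and envelope layout. It is illustrative only: a Sigstore client performs this internally, the `sign` callback and `placeholder_sign` are hypothetical stand-ins for signing with the ephemeral private key, and the payload type is the one in-toto conventionally uses for statements.

```python
import base64
import json

# Payload type conventionally used for in-toto statements inside DSSE envelopes.
IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def make_envelope(statement: dict, sign) -> dict:
    """Wrap a statement in a DSSE envelope; `sign` maps bytes to signature bytes."""
    payload = json.dumps(statement, separators=(",", ":")).encode("utf-8")
    signature = sign(pae(IN_TOTO_PAYLOAD_TYPE, payload))
    return {
        "payloadType": IN_TOTO_PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"sig": base64.b64encode(signature).decode("ascii")}],
    }


def placeholder_sign(data: bytes) -> bytes:
    # Hypothetical stand-in: NOT a real signature. A real signer would use
    # the ephemeral private key bound to the Fulcio-issued certificate.
    return b"not-a-real-signature"


envelope = make_envelope(
    {"predicateType": "https://schemas.conda.org/attestations/publish/v1"},
    placeholder_sign,
)
print(json.dumps(envelope, indent=2))
```

Note that the signature covers the PAE bytes rather than the raw JSON, which is why the statement is serialized once and carried verbatim (base64-encoded) in the envelope.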

This CEP does not prescribe a distribution mechanism. Prior art for distribution
mechanisms can be found in the PyPI and RubyGems ecosystems, e.g.
[PyPI's Integrity API].

The `predicateType` field is used to specify the schema of the predicate. The `predicate` field contains the actual predicate data.
We propose to publish a schema to validate the `predicate` field. The schema will be available at `https://schemas.conda.org/predicate-v1.json`.
### Verifying

The predicate MUST contain the `targetChannel` field, to indicate where the package is being uploaded to. This field MUST be validated by the receiving server. The channel MUST be in canonical form (full URL, no trailing slashes).
This CEP recommends the following verification process:

1. The verifier retrieves Alice's conda package and associated
Sigstore bundle.
1. The verifier performs a standard Sigstore verification process against
the bundle, using Alice's identity (or machine identity) as the
signing identity. This process produces a verified in-toto statement.

This step requires the verifier to establish trust in the identity
being verified against.

Exact mechanisms for establishing this trust are
outside the scope of this CEP. However, one option is a TOFU (trust on first
use) scheme with an attestation-aware conda channel, where package names
are "locked" to attesting identities on first use, with subsequent updates
being verified against that identity.

1. The verifier checks the in-toto statement for consistency against their
ground truth:

- The `predicateType` field **MUST** be `https://schemas.conda.org/attestations/publish/v1`.
- The `subject[0].name` field **MUST** match the filename of the conda package.
- The `subject[0].digest` field **MUST** match the SHA256 hash of the conda
package.
- The `predicate.targetChannel` field **SHOULD** match the channel that
the package was retrieved from, if `predicate` is present. However, the
verifier **MAY** choose to allow a channel mismatch, e.g. if the known
context is a mirroring context (where the conda package was originally
published to a different channel, but is now being consumed from
a mirror).
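
The consistency checks in the last step could be sketched as follows. This is a minimal illustration, assuming the statement has already been produced by a successful Sigstore verification and parsed into a Python dict; `check_statement`, `StatementMismatch`, and the `allow_channel_mismatch` flag are hypothetical names, not part of any existing library.

```python
import hashlib

PREDICATE_TYPE = "https://schemas.conda.org/attestations/publish/v1"


class StatementMismatch(Exception):
    """Raised when a verified statement disagrees with the verifier's ground truth."""


def check_statement(statement, filename, package_bytes, retrieved_channel,
                    allow_channel_mismatch=False):
    """Check an already-verified in-toto statement against the retrieved package."""
    if statement.get("predicateType") != PREDICATE_TYPE:
        raise StatementMismatch("unexpected predicateType")

    subjects = statement.get("subject") or []
    if len(subjects) != 1:
        raise StatementMismatch("expected exactly one subject")
    subject = subjects[0]

    if subject.get("name") != filename:
        raise StatementMismatch("subject name does not match package filename")

    # The digest set must contain exactly one `sha256` entry for the package.
    expected_digest = {"sha256": hashlib.sha256(package_bytes).hexdigest()}
    if subject.get("digest") != expected_digest:
        raise StatementMismatch("subject digest does not match package hash")

    # The channel check is a SHOULD: it is skipped when no predicate is
    # present, or when the caller opts in to mismatches (e.g. mirrors).
    predicate = statement.get("predicate")
    if predicate is not None and not allow_channel_mismatch:
        if predicate.get("targetChannel") != retrieved_channel:
            raise StatementMismatch("targetChannel does not match retrieval channel")
```

A verifier consuming from a known mirror would pass `allow_channel_mismatch=True`; all other checks remain mandatory.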

At the end of this process, the verifier is confident in the following facts:

- The package was published by the signer (Alice or Alice's machine identity).
- If the publisher is a machine identity, this further establishes source
provenance via the machine identity's claims. See [Sigstore OID information]
for additional information on these claims.
- The package is authentic and integral modulo trust in the signer.
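
Since these guarantees hold only modulo trust in the signer, it may help to illustrate the TOFU option mentioned in the verification steps. The sketch below is one possible in-memory shape for it, not a prescribed design: the `TofuStore` class is hypothetical, and a real channel would need persistent storage and a process for legitimately rotating identities.

```python
class TofuStore:
    """Hypothetical trust-on-first-use mapping from package name to identity."""

    def __init__(self):
        self._locked = {}

    def check(self, package_name: str, identity: str) -> bool:
        """Lock the package to `identity` on first use; afterwards require a match."""
        pinned = self._locked.setdefault(package_name, identity)
        return pinned == identity


store = TofuStore()
# Machine identity string taken from the example earlier in this document.
release_identity = (
    "github.com/example/example/.github/workflows/release.yml@refs/tags/v1.2.3"
)
print(store.check("widgets", release_identity))        # first use locks the name
print(store.check("widgets", "someone-else@example"))  # later mismatch is rejected
```

Note that pinning the full identity string, tag included, would reject the next release from the same workflow, so a real scheme would likely pin a coarser identity such as the repository and workflow path.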

## Discussion

This predicate adds basic verifiable facts about the package. It will tie the producer of the package to the target channel.
This is similar to what PyPI has implemented with the [PyPI publish attestation](https://docs.pypi.org/attestations/publish/v1/). Since there is no single authoritative index in the Conda world, we add the `targetChannel` field to reach parity.
This predicate adds basic verifiable facts about the package. It will tie the
producer of the package to the target channel. This is similar to what PyPI has
implemented with the [PyPI publish
attestation](https://docs.pypi.org/attestations/publish/v1/). Since there is no
single authoritative index in the Conda world, we add the `targetChannel` field
to reach parity.

On the server, the certificate should be tested against the Trusted Publisher used to upload the certificate to establish a chain of trust.
On the server, the certificate should be tested against the Trusted Publisher
used to upload the certificate to establish a chain of trust.

## Future work

Once sigstore attestations are established and more research has been done, we might want to use the [SLSA (Supply-chain Levels for Software Artifacts)](https://slsa.dev) spec as base for predicates in the conda ecosystem.
Once sigstore attestations are established and more research has been done, we
might want to use the [SLSA (Supply-chain Levels for Software
Artifacts)](https://slsa.dev) spec as base for predicates in the conda
ecosystem.

[in-toto]: https://in-toto.io
[Sigstore]: https://sigstore.dev
[Fulcio]: https://github.com/sigstore/fulcio
[RFC 6962]: https://datatracker.ietf.org/doc/html/rfc6962
[OpenID Connect]: https://openid.net/connect/
[PEP 740]: https://peps.python.org/pep-0740/
[PyPI - Attestations]: https://docs.pypi.org/attestations/
[npm - Generating provenance statements]: https://docs.npmjs.com/generating-provenance-statements
[rubygems/release-gem]: https://github.com/rubygems/release-gem
[in-toto Statement schema]: https://github.com/in-toto/attestation/blob/main/spec/v1/statement.md
[`ResourceDescriptor`]: https://github.com/in-toto/attestation/blob/main/spec/v1/resource_descriptor.md
[`DigestSet`]: https://github.com/in-toto/attestation/blob/main/spec/v1/digest_set.md
[DSSE]: https://github.com/secure-systems-lab/dsse/blob/master/envelope.md
[Sigstore bundle]: https://docs.sigstore.dev/about/bundle/
[sigstore-python]: https://github.com/sigstore/sigstore-python
[Sigstore OID information]: https://github.com/sigstore/fulcio/blob/main/docs/oid-info.md
[PyPI's Integrity API]: https://docs.pypi.org/api/integrity/