
build: Migrate to "StageX 2.0" #376

Merged: str4d merged 4 commits into zcash:main from antonleviathan:fix/update-zaino on Mar 18, 2026

@antonleviathan (Contributor)

The StageX tree underwent a major overhaul to switch from being GNU-native to LLVM-native, for much better cross-compiling support among many other advantages. It is essentially StageX 2.0, and it introduces some minor breaking changes. We should have done a better job with release notes, but now that this work (which was ongoing for most of last year) has landed, we are shifting our efforts to stability, major compile-time and run-time speed improvements, hardening, timely and largely automated reproductions and updates, and better release notes and docs. StageX adoption is growing fast, as it is the only thing that solves the supply-chain problem well, but we certainly have some catching up to do with other distros in terms of docs and examples.

In addition to documentation, we hope our upcoming tool "sxctl" will be able to automate migrating and updating StageX-based Containerfiles.

I've done the following to fix your build:

  • Bumped protobuf and abseil-cpp to their latest releases
  • Replaced the now-deprecated core-user-runtime with core-filesystem, and added USER 1000:1000, since that user setup will likely be removed from core-filesystem
  • Added Rust flags that explicitly select clang and lld during compilation, and removed the rocksdb env var ROCKSDB_PKG_CONFIG=0, since the flags override it
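For illustration, a runtime stage along the lines of the second bullet might look like the fragment below. It is a sketch, not this PR's Dockerfile: it assumes the StageX convention of layering an image's root filesystem with COPY --from=<image> . /, and the FILESYSTEM_IMAGE build argument, the "build" stage name, and the binary path are hypothetical placeholders. The digest-pinned image reference has to be supplied at build time.

```dockerfile
# Hypothetical: digest-pinned stagex/core-filesystem image, supplied via
# --build-arg FILESYSTEM_IMAGE=stagex/core-filesystem@sha256:<digest>.
ARG FILESYSTEM_IMAGE
FROM ${FILESYSTEM_IMAGE} AS filesystem

# ... a stage named "build" that produces the zallet binary goes here ...

FROM scratch AS runtime
# Base filesystem layout from core-filesystem (replacing core-user-runtime).
COPY --from=filesystem . /
# Hypothetical location of the binary produced by the build stage.
COPY --from=build /usr/local/bin/zallet /usr/local/bin/zallet
# Explicit numeric UID:GID. Docker accepts a numeric user even without an
# /etc/passwd entry, so the image no longer depends on a user definition
# that core-filesystem may stop shipping.
USER 1000:1000
ENTRYPOINT ["/usr/local/bin/zallet"]
```

Pinning the user numerically keeps the runtime identity stable regardless of what the base filesystem image ships.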

Going forward, it should be sufficient to bump these packages to the latest release.

For now, some of us include scripts like https://git.distrust.co/public/airgap/src/branch/main/src/update.py to simplify discovering the latest set of hashes that were built together.

Thanks for being patient with this and don't hesitate to poke me if you have any issues, I'm happy to help unblock you and make the distro work better for you.

@str4d (Collaborator) commented Feb 16, 2026

Looks like CI is failing at the "Run command inside Docker container" step:

Run docker run --rm zallet -h
  docker run --rm zallet -h
  shell: /usr/bin/bash -e {0}
Error: Process completed with exit code 139.

Example of a successful run: https://github.com/zcash/wallet/actions/runs/21872823726/job/63132877649

@antonleviathan (Contributor, Author)

Yes, the binary is currently segfaulting because it isn't properly statically linked. I'm trying to fix it.

@antonleviathan (Contributor, Author)

We've encountered more issues, but we're continuing to work on resolving them. Thanks for your patience; I'll keep you posted. Are there any big deadlines or blockers if this PR is delayed?

@str4d force-pushed the update-zaino branch 3 times, most recently from 1121b2f to 16f77f6 on February 23, 2026 21:06
@antonleviathan (Contributor, Author)

There are still some issues with building under network=none, so we're debugging that. It looks like a rocksdb issue.

@RyanSquared (Contributor)

It's been mentioned elsewhere that network=none isn't required for now, so we're moving on to reproduction issues. There currently appear to be two: age has a heisenbyte in its assembly that changes every build, and zaino-state embeds a datestamp. The latter can be overridden using SOURCE_DATE_EPOCH, so we're setting that for the build and will test reproduction with it. The former is harder to tackle, since the only indication of the issue is a single value that differs in the generated assembly.
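SOURCE_DATE_EPOCH is the reproducible-builds.org convention: tools that would otherwise embed the current time read a fixed Unix timestamp from this variable instead. A minimal sketch of pinning it for a Cargo build stage follows; the default value of 1, the RUST_IMAGE argument, and the paths are assumptions, not necessarily what this PR uses.

```dockerfile
# Hypothetical: digest-pinned stagex/pallet-rust image, supplied via --build-arg.
ARG RUST_IMAGE
FROM ${RUST_IMAGE} AS build

# Fixed timestamp for any build step that honours SOURCE_DATE_EPOCH, such as
# a build script that stamps a build date into the output. Assumed value:
# any constant works as long as every reproduction uses the same one.
ARG SOURCE_DATE_EPOCH=1
ENV SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH}

WORKDIR /usr/src/app
COPY . .
RUN cargo build --release --locked
```

Exposing it as a build argument lets independent reproducers pass the same value explicitly with --build-arg SOURCE_DATE_EPOCH=<value>.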

@str4d (Collaborator) commented Mar 11, 2026

> age has a heisenbyte in its assembly that changes every build [..] the only indication of the issue is a single value in generated assembly being different.

I can't think of anything in age itself (as opposed to its dependencies) that might contribute to this. If you can isolate the assembly to a specific (even monomorphised) function, I can try and provide more guidance.

@RyanSquared (Contributor)

> age has a heisenbyte in its assembly that changes every build [..] the only indication of the issue is a single value in generated assembly being different.
>
> I can't think of anything in age itself (as opposed to its dependencies) that might contribute to this. If you can isolate the assembly to a specific (even monomorphised) function, I can try and provide more guidance.

I'll definitely pass more info along once I've resolved the datestamp. Everything else in the rlibs (besides the datestamp and rage) appears to be rmeta fields, which don't make it into the final binary.

@RyanSquared (Contributor)

And as a testament to checking all the rlibs before chasing a single potential issue to its extreme: I've got three bit-for-bit reproductions just by changing SOURCE_DATE_EPOCH.

I need to investigate the caching: I saw a 1.8x speedup from removing all caching, so I'll try to cleanly reapply my fixes on the branch and reproduce the caching mismatches.

@RyanSquared (Contributor)

It looks like there are two caching attempts going on, and they appear to conflict with each other. One builds the dependencies in an earlier stage, while the other mounts a fresh cache over the copied-over artifacts, requiring them to be built again. I'd recommend skipping the build-deps layer entirely and just using the mounted cache (which is actually supposed to be at /usr/src/app/zallet/src/bin/zallet/target). I'd like confirmation that this change is acceptable, since it's not strictly related to reproducibility.

@str4d (Collaborator) commented Mar 12, 2026

The intention of the Docker layering is that changes to the Zallet codebase that don't affect dependencies only require rebuilding the second layer, reusing the binary artifacts from the first layer. If that isn't working correctly, then yes, it should be changed to something that does work.

I don't actually know much myself about how --mount=type=cache works in Dockerfiles - is the cache actually persistently cached in a similar way to how the Dockerfile layers are cached? If not, how is it handled?

@RyanSquared (Contributor)

> is the cache actually persistently cached in a similar way to how the Dockerfile layers are cached?

First, to clarify how Dockerfile layers are cached: layer caching assumes that if all inputs remain the same, all outputs remain the same, much like a pure function. This means that a change to any input in an earlier layer makes the cache unusable for every subsequent layer, because a changed input in a prior step is assumed to invalidate the results of every later step. It is less of a "cache" and more of a "snapshot".

This used to be the only way to do caching in Docker, and it resulted in patterns like those used by projects such as cargo-hakari, which use a workspace-hack crate to prebuild all the dependencies. The resulting target directory is then usable in later stages, where you ADD your runtime crate and do another build using the same dependencies as the workspace-hack.

However, if you change any dependencies, or if you had multiple crates and you changed any of the other crates in your workspace, the layer still becomes invalidated and all of the existing cache is no longer applicable.
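For comparison, the snapshot-style approach described above is often written as the "blank main.rs" pattern sketched below: one layer compiles only the dependencies against a stub crate, and the real sources are copied in afterwards. This is a generic illustration assuming a single-binary crate; the RUST_IMAGE argument and paths are hypothetical, and Zallet's actual Dockerfile differs.

```dockerfile
# Hypothetical: digest-pinned stagex/pallet-rust image, supplied via --build-arg.
ARG RUST_IMAGE
FROM ${RUST_IMAGE} AS build
WORKDIR /usr/src/app

# Snapshot 1: manifests plus a stub main.rs, so this RUN compiles just the
# dependencies. The layer is reused only while Cargo.toml and Cargo.lock are
# unchanged; any dependency change rebuilds everything from here on.
COPY Cargo.toml Cargo.lock ./
RUN mkdir src \
    && echo 'fn main() {}' > src/main.rs \
    && cargo build --release --locked \
    && rm -rf src

# Snapshot 2: the real sources. touch makes main.rs newer than the stub's
# build outputs so Cargo's mtime check recompiles the local crate, while the
# dependency artifacts from snapshot 1 are reused.
COPY src ./src
RUN touch src/main.rs && cargo build --release --locked
```

The weakness is visible in the first COPY: adding or bumping a single dependency changes Cargo.toml or Cargo.lock and discards the whole dependency snapshot.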


A --mount=type=cache mount acts more like a "true" cache. Even if earlier inputs are invalidated (dependencies added, code changed, etc.), the cache still exists and is still provided. This means you can add or remove dependencies, or change anything else, without affecting the cache. This is functionally equivalent to how the target directory works: it is not cleared between cargo invocations even if your project undergoes massive refactors, and only a cargo clean (or, in the case of Docker, docker build --no-cache) will actually delete the cache.

For optimizing CI, I'd strongly recommend using --mount=type=cache caching for both the fetch and build steps, and simply splitting the build into cargo fetch followed by cargo build (or cargo install), without any cargo-hakari or "blank main.rs" style caching of the Cargo target directory. (I always prefer cargo build, because cargo install ignores the lockfile unless --locked is passed.) It is the most aggressive form of caching and should give the fastest build in every case except when nothing has changed at all, which Docker's layer cache already handles.
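A minimal sketch of that fetch-then-build split with BuildKit cache mounts follows. It assumes the build runs as root with CARGO_HOME at /root/.cargo, and the RUST_IMAGE argument, target path, and binary name are hypothetical; Zallet's real layout (for example the target path mentioned earlier in this thread) may differ.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical: digest-pinned stagex/pallet-rust image, supplied via --build-arg.
ARG RUST_IMAGE
FROM ${RUST_IMAGE} AS build
WORKDIR /usr/src/app
COPY . .

# Fetch step: download crates into cache mounts (assumes CARGO_HOME is
# /root/.cargo). When Cargo.lock is unchanged this is close to a no-op.
RUN --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    cargo fetch --locked

# Build step: the same crate caches plus a cache mount over target/, which
# persists across builds even when earlier layers are invalidated.
# Cache mounts are not part of the resulting image, so the binary is copied
# out within the same RUN.
RUN --mount=type=cache,target=/root/.cargo/registry \
    --mount=type=cache,target=/root/.cargo/git \
    --mount=type=cache,target=/usr/src/app/target \
    cargo build --release --locked --offline \
    && cp target/release/zallet /usr/local/bin/zallet
```

Because COPY . . precedes the fetch, any source change re-runs both steps, but with hot caches the fetch is near-instant and cargo build only recompiles what changed.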

@str4d (Collaborator) commented Mar 13, 2026

Sounds good, let's do that then

@RyanSquared (Contributor)

My latest commit uses --mount=type=cache and a cargo fetch layer. Two builds have reproduced each other, but they won't reproduce the output of the previous commit, because I'm now sharing the Cargo root between layers (rather than stages) and have dropped --cargo-root. (Cargo tracks system directories when generating code, which I deeply despise, but StageX- and container-based builds work around it.)

@RyanSquared (Contributor)

Modifying source code in a way that doesn't require a new fetch takes about 1.5 seconds in the fetch step. Every modification that isn't "Docker pure" triggers a full rebuild of the zallet crate, because Cargo's unit of compilation is the crate, but with a hot cache this takes about 3 minutes, versus 5 minutes with no cache.

- bump pallet-rust, user-protobuf, user-abseil-cpp
- add pallet-clang
- replace core-user-runtime with core-filesystem
- add USER 1000:1000 to runtime stage
- add flags for clang and lld to be used
- remove ROCKSDB_USE_PKG_CONFIG=0 as it's unnecessary due to the added flags
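The exact flags from this commit are not reproduced here; as a hedged sketch, one common way to make a Rust build use clang and lld is shown below. -C linker and -C link-arg are standard rustc codegen options, and CC/CXX are the variables the cc crate reads when compiling native dependencies such as rocksdb. It assumes clang and lld are already on PATH in the build image (the commit adds pallet-clang for that, but how it is layered in is not shown), and RUST_IMAGE is a hypothetical build argument.

```dockerfile
# Hypothetical: a digest-pinned build image with clang and lld on PATH.
ARG RUST_IMAGE
FROM ${RUST_IMAGE} AS build

# C/C++ compilers read by the cc crate when building native dependencies
# such as rocksdb.
ENV CC=clang CXX=clang++

# Use clang as the linker driver and ask it to link with lld.
ENV RUSTFLAGS="-C linker=clang -C link-arg=-fuse-ld=lld"
```

Setting these once at the stage level keeps every subsequent cargo invocation in that stage on the same toolchain, which also helps reproducibility.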
@str4d changed the base branch from update-zaino to main March 18, 2026 00:09
@str4d force-pushed the fix/update-zaino branch from b160f7e to 5efa376 March 18, 2026 00:09
@str4d (Collaborator) commented Mar 18, 2026

Rebased on main to decouple these fixes from #358.

@str4d self-requested a review March 18, 2026 00:31
@str4d changed the title from "fix: repair stagex build" to "build: Migrate to "StageX 2.0"" Mar 18, 2026
@str4d added the A-build (Area: Build system) label Mar 18, 2026
@str4d force-pushed the fix/update-zaino branch from 5efa376 to f9761e6 March 18, 2026 00:34
@str4d (Collaborator) commented Mar 18, 2026

Force-pushed to undo the deletion of some comments (which made sense for one of the intermediate states, but does not with the current state of the PR) and fix an indentation bug.

@str4d force-pushed the fix/update-zaino branch from f9761e6 to 5616ec9 March 18, 2026 00:49
@str4d (Collaborator) commented Mar 18, 2026

Force-pushed to make the diff a bit smaller.

@str4d (Collaborator) left a comment

utACK 5616ec9

Comment thread Dockerfile
FROM stagex/user-protobuf@sha256:5e67b3d3a7e7e9db9aa8ab516ffa13e54acde5f0b3d4e8638f79880ab16da72c AS protobuf
FROM stagex/user-abseil-cpp@sha256:3dca99adfda0cb631bd3a948a99c2d5f89fab517bda034ce417f222721115aa2 AS abseil-cpp
FROM stagex/core-user-runtime@sha256:055ae534e1e01259449fb4e0226f035a7474674c7371a136298e8bdac65d90bb AS user-runtime
FROM stagex/pallet-rust:1.91.1@sha256:4062550919db682ebaeea07661551b5b89b3921e3f3a2b0bc665ddea7f6af1ca AS pallet-rust
Collaborator commented:

Given that StageX only provides images for a small set of specific Rust versions, going forward I intend for us to decouple the Zallet MSRV from these as much as possible. That means:

  • Normal development uses rustup like usual with whatever our MSRV is.
  • The stagex/pallet-rust version pinned here in this Dockerfile should be the most recent stable Rust release provided by StageX.
  • CI confirms via the test-docker-container.yml workflow that MSRV is not higher than this pinned Rust version.

@str4d enabled auto-merge March 18, 2026 00:53
@str4d merged commit 6da1075 into zcash:main Mar 18, 2026
20 of 22 checks passed
Labels

A-build Area: Build system

4 participants