feat: add 2-pass release build for Dynamic Dispatch table#119
feat: add 2-pass release build for Dynamic Dispatch table#119bdero wants to merge 22 commits intoshorebird/devfrom
Conversation
bcc42c8 to
5572ff2
Compare
064ab50 to
d13559f
Compare
760f8cf to
0eb78de
Compare
| } | ||
|
|
||
| // Run analyze_snapshot to compute DD table + caller links. | ||
| final analyzeSnapshotPath = _genSnapshot.getAnalyzeSnapshotPath(snapshotType, darwinArch); |
There was a problem hiding this comment.
So historically, all of the analyzed snapshot usage was done down in AOT tools. Are you sure you want to pull this up into the Flutter tool?
There was a problem hiding this comment.
I mean, they're kind of all one and the same because they're version locked between them, so I'm not sure that it matters a ton. Maybe this is the right place to put it?
There was a problem hiding this comment.
I think part of what made me suspect is that there's clearly a gen snapshot invoker at this level, but there isn't an analyze snapshot invoker. down at AOT tools there's both.
There was a problem hiding this comment.
bdero's review bot: Going with "leave it here for now". The call site needs to feed flags directly into the very next gen_snapshot invocation in this same build() method, so a hand-off through aot_tools would mean shipping the args back up anyway. Worth a follow-up to mirror GenSnapshot with a proper AnalyzeSnapshot invoker class — the current getAnalyzeSnapshotPath getter on GenSnapshot is a code smell and the natural home for that class — but I'd prefer that as a focused refactor rather than expanding this PR.
When building for arm64 Apple platforms with the linker enabled, run gen_snapshot twice: first in ELF mode to produce a temporary snapshot for analyze_snapshot to compute the DD table manifest, caller links, and slot mapping, then in assembly mode with --dd_slot_mapping to produce the final snapshot with indirect calls wired up. The DD table files (App.dd.link, App.dd_callers.link) are copied into the shorebird supplement directory alongside the existing link files so they can be bundled with releases and used during patch builds.
The 2-pass DD table build runs gen_snapshot in ELF mode before the main assembly pass. Update tests to expect this additional command.
The arm64 DD analysis pass delays the arm64 assembly, so x86_64 (which skips DD) completes its build first when both run concurrently via Future.wait.
Move the analyze_snapshot existence check before the gen_snapshot ELF pass so the entire DD computation is a no-op on standard Flutter SDKs that don't ship analyze_snapshot. This fixes iOS smoke test failures where gen_snapshot was being invoked unnecessarily in ELF mode. Also reverts test changes that are no longer needed since DD commands won't appear when analyze_snapshot doesn't exist in the test filesystem.
The arm64 build's async _computeDDTable() check (even a no-op when analyze_snapshot is absent) introduces an await that lets x86_64 reach gen_snapshot first in Future.wait. Reorder test expectations to match the actual interleaving: x86_64 before arm64 at each step.
The DD slot mapping now uses kernel_offset-based function matching (instead of function names). The base build must export an identity side file during gen_snapshot pass 1 and pass it to analyze_snapshot --compute_dd_slot_mapping. Without this, the DDSlotMapping has empty kernel_offset_to_slot and FinalizeIndirectStaticCallTable can't assign any DD slots, resulting in an empty DD table in the base snapshot.
Read SHOREBIRD_DD_MAX_BYTES from the environment to allow overriding the cascade limiter threshold. Defaults to 10000 if not set. An environment variable is used (rather than a command-line flag) so that older Flutter builds without DD table support silently ignore it.
Updates dart_sdk_revision to include SIMARM64 simulator fixes (DoRedirectedCall, ClobberVolatileRegisters, Execute reason param) needed for ios_debug engine builds.
Snapshot of work in progress on the base build's DD pipeline: - debug print in AOTSnapshotter.build for usesDDTable diagnosis - BUILD.gn change to place analyze_snapshot in universal/ next to gen_snapshot so flutter_tools can resolve it by path substitution Committing to a clean base before restructuring the pipeline to support the pre-DD optimized pass added in the dart-sdk patch flow.
When shorebird_flutter ships with the DD table 2-pass release build
enabled but the underlying engine's gen_snapshot binary predates the
DD table work (e.g. a user has downgraded their engine cache, or is
running against a Shorebird release from before DD landed),
gen_snapshot hard-errors on the ELF pass with "Setting VM flags
failed: Unrecognized flags: print_dd_function_identity_to" and the
entire release build fails. Older engines simply don't know about
`--print_dd_function_identity_to`, `--dd_slot_mapping`, or any of the
DD table flag family.
Add a capability probe in _computeDDTable that runs gen_snapshot once
with the DD flag plus a bogus kernel input. Two possible failure
modes distinguish support:
- Flag recognized → flag parsing passes, kernel load fails:
"Can't load Kernel binary: File size is too small to be a valid
kernel file."
- Flag not recognized → VM init fails at flag parsing:
"Setting VM flags failed: Unrecognized flags:
print_dd_function_identity_to"
If stderr contains the "Unrecognized flags" token, skip the entire
DD pipeline and fall back to a plain single-pass Shorebird release
build. The cascade-limiter linker (in aot_tools) independently
handles the no-DD case by falling back to the CT pass's op.link for
final pass OP alignment (see dart-sdk commit 139dd8c864d), so the
patch side of the pipeline keeps working too.
Probe strategies that don't work:
- `gen_snapshot --help` doesn't list individual flags.
- `gen_snapshot --print_flags` exits early on "At least one input
is required" before dumping any flag info.
- Passing the flag without a snapshot kind and kernel silently
ignores it regardless of support.
The flag+bogus-kernel+snapshot-kind combination is the only
invocation that reliably distinguishes recognized-but-failed-later
from outright-rejected, on both DD-aware and pre-DD engines.
Result is cached per gen_snapshot path so multi-arch release builds
don't pay the probe cost more than once.
When SHOREBIRD_DD_MAX_BYTES is set, AOTSnapshotter now performs a 2-pass build: 1. Pass 1: gen_snapshot produces an ELF for analysis + DD identity file 2. analyze_snapshot computes DD table + caller links + slot mapping 3. Pass 2: gen_snapshot rebuilds with --dd_slot_mapping for DD-enabled code This produces DD-aware release snapshots where high-fanout cascade functions are routed through the indirect static call table, enabling the cascade limiter's link percentage benefit. Also adds DD supplement files (App.dd.link, App.dd_callers.link, App.dd_identity.link, App.dd_slots.link) to the LinkSupplement copy list so they're propagated to the Shorebird CLI's supplement directory.
- build_test.dart: GenSnapshot now requires a `fileSystem` parameter (added by this branch for the analyze_snapshot probe path); pass `MemoryFileSystem.test()` to the test constructor. - macos_test.dart: the DD pass is gated on SHOREBIRD_DD_MAX_BYTES, which isn't set in the test environment, so neither arm64 nor x86_64 has an extra async hop before gen_snapshot. With concurrent Future.wait the archs now reach gen_snapshot in iteration order (arm64 first, x86_64 second). Update the expected command sequence and the explanatory comment to match.
…raction) Three changes from Eric's review: 1. Drop the unused `reportTimings` field on AOTSnapshotter — added in 81304a9 but never read anywhere. 2. Replace the analyze_snapshot fallback iteration with explicit arch-based naming, mirroring how GenSnapshot.run picks its binary. 3. Extract the inlined DD 2-pass logic out of build() into _runDdAnalysisPass and _readDdMaxBytes, and collapse the five identical path joins into a local linkPath() helper.
Reverts the snapshot BUILD.gn change that moved the lipo'd
analyze_snapshot binary into ${root_out_dir}/universal/. That move
broke create_ios_framework.py:96, which still reads analyze_snapshot
from the build-dir root, and was only needed to make the
flutter_tools getAnalyzeSnapshotPath helper find the binary in
local-engine mode (where gen_snapshot resolves via .../universal/...).
Instead, getAnalyzeSnapshotPath now probes both dirname(genSnapshot)
and dirname(genSnapshot)/.., which finds:
* cached SDK layout: analyze_snapshot alongside gen_snapshot (probe 1)
* local-engine layout: analyze_snapshot one level up at the build-dir
root (probe 2)
Keeps engine packaging untouched and avoids coupling this PR to the
Shorebird CLI's view of the published artifact layout.
5ba9994 to
6f4f270
Compare
Upstream commit be89e40 (#138) split the updater C API into separate engine and dart cbindgen configs (cbindgen_engine.toml / cbindgen_dart.toml) and removed the combined cbindgen.toml, but the companion shell/common/shorebird/BUILD.gn change to update the declared inputs was not landed. Without this fix, the iOS engine build fails with: ninja: error: '../../flutter/third_party/updater/library/cbindgen.toml', needed by 'gen/flutter/shell/common/shorebird/rust_updater_<...>.stamp', missing and no known rule to make it Worth landing upstream as a follow-up on #138.
…oudly Two related hardenings to prevent silent DD-disable regressions like the one we just chased on Wonderous (link percentage cratered from 90% to 22.6% because a stale universal/analyze_snapshot binary from a prior BUILD.gn revision was selected, analyze_snapshot rejected gen_snapshot's output with "Wrong full snapshot version", and the DD pass swallowed the failure): 1. getAnalyzeSnapshotPath now verifies the candidate analyze_snapshot's --sdk_version output matches gen_snapshot's --version output. The strings include the build timestamp, so two binaries from the same build agree exactly. A stale binary left behind by a previous build configuration prints a different version line and is skipped. If the version probe itself fails (e.g. binary won't run), fall back to the exists-only check rather than fail-closed. 2. _runDdAnalysisPass now checks exit codes from both analyze_snapshot --compute_dd_table and --compute_dd_slot_mapping invocations, and from the post-conditions on the produced files, surfacing each failure with a concrete printError + non-zero return. Previously the calls used await without an exit-code check, so any failure (snapshot version skew, missing input, OOM, etc.) silently left App.dd.link unwritten and the build proceeded to gen_snapshot pass 2 with no --dd_slot_mapping. The release would publish without DD activation; subsequent patches would fall back to on-the-fly DD computation against a no-DD base and produce structurally divergent patches with devastating link percentages. Also prefer aborting early when getAnalyzeSnapshotPath returns null (versus silently shipping no DD), with a message that explains the patching consequence.
Summary
analyze_snapshotto compute DD table manifest, caller links, and slot mapping--print_dd_function_identity_to) mapping each Code object's InstructionsTable index to its Function'skernel_offset, which is passed toanalyze_snapshot --compute_dd_slot_mappingvia--dd_function_identityfor kernel_offset-based function matching--dd_slot_mappingApp.dd.link,App.dd_callers.link) are copied into the supplement directory alongside existing link filesTest plan