Avoid `inst.body()` duplicate call which reallocates `AllocId`s by dkcumming · Pull Request #120 · runtimeverification/stable-mir-json

dkcumming · 2026-02-19T15:21:44Z

This PR corrects a bug in generating the Stable MIR JSON artefacts. Prior to this fix allocations were being collected twice (calling Instance::body twice), which resulted in certain allocs (slices) having multiple allocation ids. The fix is to refer to previously collected items so that the collector does not assign allocation ids again.

Problem

There has been a disconnection between Ty and AllocId in the .smir.json artefacts (for slices). A constant should be able to be traced to an allocation by the Ty that will yield an AllocId. However attempting to do this lead to a case where the AllocId from the Ty was not in the Allocs map, and allocations in the map had AllocIds unseen in the rest of the .smir.json.

Cause

The disconnection betwen the Tys and AllocIds listed above came from there being a second collector pass calling Instance::body again that assigned new AllocIds for slices that were not connected to the Tys (still referencing the originals). The code that initiates the second collector pass is from stable-mir-json, while the code that re-allocates fresh ids is in rustc itself.

A rough trace of the pipeline goes like this:

Details

First collection initiated in collect_smir by let items = collect_items(tcx);
Each MonoItem is mapped to an Item by mk_item(tcx, item.clone(), name)
In mk_item if the MonoItem is MonoItemFn(inst: Instance) call Instance::body(&self) on inst by inst.body()
Instance::Body calls the rustc_smir Context::instance_body (for TablesWrapper that impl Context)
Context::instance_body calls BodyBuilder::new(tables.tcx, instance) to create a new BodyBuilder which then calls the BodyBuilder::build method
BodyBuilder::build creates mono_body which is of type rustc_middle::mir::Body (not stable_mir yet), which is then is converted to a stable_mir::mir::Body by mono_body.stable(tables)
Converting to a stable_mir::mir::Body through stable will convert all the BasicBlocks, Statements, Terminators, LocalDecls, and importantly rustc_middle::mir::Const into a stable_mir::ty::{TyConst,MirConst}
If the Const to covert is a rustc_middle::mir::Const::Val then it is converted with a new stable_mir::ty::Allocation as argument to the stable_mir::ty::ConstantKind::Allocated by ConstantKind::Allocated(alloc::new_allocation(ty, val, tables));
The creation of the new Allocation will match the ConstValue, and if it is ConstValue::Slice then the AllocId produced will be from let alloc_id = tables.tcx.reserve_and_set_memory_alloc(data); which will call TyCtxt::reserve_alloc_id and which will create a new allocation id by calling AllocMap::reserve <- IMPORTANT
Given 9. explains how the AllocId was created from calling inst.body(), let's skip over the remaining details in executing let items = collect_items(tcx); to collect_interned_values(tcx, items.iter().map(|i| &i.mono_item).collect::<Vec<_>>());
collect_interned_values iterates through items and if it matches a MonoItem::Fn(inst) then inst.body() is called again - PROBLEM. This repeats steps 4. - 9. again and at step 9. will allocate another AllocId for the InternedValueCollector that does not match what was originally allocated.

Solution

Once the Items are collected we can just thread a slice through and use the Body that is already collected which means inst.body() does not need to be called again and will not reallocate ids.

cds-amal · 2026-02-19T22:00:45Z

@dkcumming , your PR investigation/summary hints at a deeper issue: the pipeline is doing three conceptually related things as separate steps, each destroying information that the next step needs. I'll have to give this a think 🤔 🧐

dkcumming · 2026-02-20T05:36:17Z

@cds-amal I will circle back to #121 after some downstream work this fix opens up

@dkcumming

## What's this about? So, #120 fixed the immediate alloc-id mismatch by carrying collected items forward instead of re-fetching them. That was the right call. But it left the underlying structure intact: three separate phases (mk_item, collect_unevaluated_constant_items, collect_interned_values), each with full access to `TyCtxt`, each free to call `inst.body()` or any other side-effecting rustc query whenever it felt like it. Nothing in the types prevented that, and the bug was a direct consequence: one phase called `inst.body()` a second time, rustc minted fresh AllocIds, and suddenly the alloc map had ids that didn't correspond to anything in the stored bodies. The question is: how do we make that class of bug structurally impossible, rather than just fixed for the one case we caught? The full decision record is in [`ADR-002`](https://github.com/cds-rs/stable-mir-json/blob/dc/declarative-spike/docs/adr/002-declarative-pipeline-with-allocmap-coherence.md). ### The restructuring The fix is to split the pipeline into phases with type signatures that enforce the boundary: ```rust collect_and_analyze_items(HashMap<String, Item>) -> (CollectedCrate, DerivedInfo) assemble_smir(CollectedCrate, DerivedInfo) -> SmirJson ``` `CollectedCrate` holds items and unevaluated consts (the output of talking to rustc). `DerivedInfo` holds calls, allocs, types, and spans (the output of walking bodies). `assemble_smir` takes both by value and does pure data transformation; it structurally cannot call `inst.body()` because it has no `MonoItem` or `Instance` to call it on. That's the whole point: if you can't reach the query, you can't accidentally call it. The two body-walking visitors (`InternedValueCollector` and `UnevaluatedConstCollector`) are merged into a single `BodyAnalyzer` that walks each body exactly once. The fixpoint loop for transitive unevaluated constant discovery is integrated: when `BodyAnalyzer` finds an unevaluated const, it records it; the outer loop creates the new `Item` (the one place `inst.body()` is called) and enqueues it. ### But what about catching regressions? Turns out the existing integration tests normalize away `alloc_id`s (via the jq filter), so they literally cannot catch this class of bug. The golden files don't contain alloc ids at all; you could scramble every id in the output and the tests would still pass. `AllocMap` replaces the bare `HashMap<AllocId, ...>` with a newtype that, under `#[cfg(debug_assertions)]`, tracks every insertion and flags duplicates. After the collect/analyze phase completes, `verify_coherence` walks every stored `Item` body with an `AllocIdCollector` visitor and asserts that each referenced `AllocId` exists in the map. This catches both "walked a stale body" (missing ids) and "walked the same body twice" (duplicate insertions) at dev time; zero cost in release builds. ### Other things that fell out of this - Static items now store their body in `MonoItemKind::MonoItemStatic` (collected once in `mk_item`), so the analysis phase never goes back to rustc for static bodies - `get_item_details` takes the pre-collected body as a parameter instead of calling `inst.body()` independently - The `items_clone` full HashMap clone is replaced by a `HashSet` of original item names (which is all the static fixup actually needed) - [we uncovered and fixed a very old bug](#121 (comment)) ### What's deleted `InternedValueCollector`, `UnevaluatedConstCollector`, `collect_interned_values`, `collect_unevaluated_constant_items`, the `InternedValues` type alias, and `items_clone`. Good riddance. ### Downstream impact The tighter allocs representation has already shown positive downstream effects in KMIR: the proof engine can now decode allocations inline (resolving to concrete values like `StringVal("123")`) instead of deferring them as opaque thunks. @dkcumming 's `offset-u8` test went from thunking through `#decodeConstant(constantKindAllo...)` to directly producing `toAlloc(allocId(0)), StringVal("123")`. The test's expected output needed updating, but the new failure mode is semantically grounded in actual data rather than deferred interpretation. ### Test plan - [x] `cargo build` compiles - [x] `cargo clippy` clean - [x] `cargo fmt` clean - [x] `make integration-test` passes (all 28 tests, identical output) - [x] KMIR downstream: `test_prove_rs[offset-u8-fail]` expected output updated

Avoid inst.body() duplicate call by carrying collected items around

f903f21

dkcumming requested review from jberthold, mariaKt, palinatolmach and tothtamas28 February 19, 2026 15:21

dkcumming self-assigned this Feb 19, 2026

palinatolmach approved these changes Feb 20, 2026

View reviewed changes

dkcumming added the automerge label Feb 20, 2026

automergerpr-permission-manager bot merged commit 9a78109 into master Feb 20, 2026
5 checks passed

automergerpr-permission-manager bot deleted the dc/alloc-id-fix branch February 20, 2026 05:40

cds-amal mentioned this pull request Feb 20, 2026

Single-pass body analysis with AllocMap coherence checks #121

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid `inst.body()` duplicate call which reallocates `AllocId`s#120

Avoid `inst.body()` duplicate call which reallocates `AllocId`s#120
automergerpr-permission-manager[bot] merged 1 commit intomasterfrom
dc/alloc-id-fix

dkcumming commented Feb 19, 2026

Uh oh!

cds-amal commented Feb 19, 2026

Uh oh!

dkcumming commented Feb 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dkcumming commented Feb 19, 2026

Problem

Cause

Solution

Uh oh!

cds-amal commented Feb 19, 2026

Uh oh!

dkcumming commented Feb 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants