Context
Surfaced during the design re-read of PR #304. Not a blocker — the safety story is functionally fine — but the backup contract claims something the code doesn't quite uphold.
What happens
internal/dedup/dedup.go:243-263 documents that Scan(non-dry-run) writes rfc822_message_id to disk before the user confirms the merge:
Side effect (non-dry-run only): if the scoped sources contain messages with no rfc822_message_id but with stored MIME, Scan calls store.BackfillRFC822IDs to derive the column from the stored headers before grouping. The backfill is idempotent metadata derivation — it fills a previously-NULL column from data already on the row, never overwrites a non-NULL value, and changes no message content.
The author calls it "idempotent metadata derivation," and commit e9182d20 added a confirm-prompt note about it. The semantics are defensible on their own.
The gap is in the backup story. The doc at dedup.go:241 explicitly says:
The dedup-batch backup-before-merge contract still holds: the backup is sized for the merge (which soft-deletes losers and transfers labels), not for this metadata catch-up.
So if the user runs deduplicate --account alice, sees the report, and aborts at the confirm prompt, the rfc822 writes have already happened and are not covered by any backup. Probably harmless because the writes are derived from already-stored MIME, but the safety claim "every dedup operation is reversible via --undo or covered by a pre-merge backup" is one rung weaker than it reads.
There's also a UX angle: most CLIs train users to expect a non---dry-run invocation to be a "preview before deciding" up to the confirmation prompt. Writing during the preview phase violates that intuition.
Why it matters
Low blast radius today (the writes are derived data, no message content changes). Worth fixing if/when:
- Backfill semantics ever get extended to write something less "purely derived"
- A future user-facing safety doc claims "no writes before the merge confirmation"
Proposed approach
Move the backfill into Execute, inside the same transaction as the merge, after the user confirms. The reason the author cited (the report needs the backfilled IDs to surface duplicate groups) is real but solvable:
- Run a read-only "would backfill N messages" probe during Scan.
- Surface the count in the report as "would backfill N then find M groups (estimate; final group count after backfill may differ)."
- Realize the writes during Execute, after confirmation, inside the merge transaction. The pre-merge backup then genuinely covers everything the merge transaction touches.
Context
Surfaced during the design re-read of PR #304. Not a blocker — the safety story is functionally fine — but the backup contract claims something the code doesn't quite uphold.
What happens
internal/dedup/dedup.go:243-263documents thatScan(non-dry-run)writesrfc822_message_idto disk before the user confirms the merge:The author calls it "idempotent metadata derivation," and commit
e9182d20added a confirm-prompt note about it. The semantics are defensible on their own.The gap is in the backup story. The doc at
dedup.go:241explicitly says:So if the user runs
deduplicate --account alice, sees the report, and aborts at the confirm prompt, the rfc822 writes have already happened and are not covered by any backup. Probably harmless because the writes are derived from already-stored MIME, but the safety claim "every dedup operation is reversible via --undo or covered by a pre-merge backup" is one rung weaker than it reads.There's also a UX angle: most CLIs train users to expect a non-
--dry-runinvocation to be a "preview before deciding" up to the confirmation prompt. Writing during the preview phase violates that intuition.Why it matters
Low blast radius today (the writes are derived data, no message content changes). Worth fixing if/when:
Proposed approach
Move the backfill into
Execute, inside the same transaction as the merge, after the user confirms. The reason the author cited (the report needs the backfilled IDs to surface duplicate groups) is real but solvable: