You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Surfaced during the design re-read of PR #304. Wes and Claude discussed and agreed this is a real invariant mismatch but not a blocker for #304 because SQLite's single-writer model prevents in-process divergence today. Filing as a follow-up because the gap matters more once Postgres or cross-process writers are part of the supported deployment.
The mismatch
The schema declares account_identities with raw-bytes equality:
The code (internal/store/identifier_match.go) treats email-shaped identifiers case-insensitively via LOWER(address) = LOWER(?) while preserving original case on storage. Non-email identifiers (Matrix MXIDs, phone E.164, chat handles) are intentionally case-sensitive.
The PK enforces raw-bytes uniqueness but the application logic enforces case-folded uniqueness for emails. Two paths can produce divergent rows for the same logical identity:
Row exists for Alice@Example.com. Concurrent process inserts alice@example.com. Both AddAccountIdentity and MigrateLegacyIdentityConfig do SELECT … WHERE LOWER(address) = LOWER(?), find no row, then INSERT. PK is on raw bytes — both inserts succeed.
Result: two account_identities rows for the same logical identity. mergeSignalSet semantics break (signals attached to one row are invisible from the other). RemoveAccountIdentity finds both via the case-fold predicate, but only by accident.
The single-writer comment at account_identities.go:37-39 acknowledges this is only safe in-process. serve mode plus a parallel CLI command — the documented multi-process deployment — is exactly where divergence becomes possible.
The cleanup commit 28853fe (case-insensitive remove for email-shaped identifiers) handles the symptom on remove but doesn't prevent re-accumulation.
Long-running serve deployments running alongside CLI dedup jobs are exactly the workflow most likely to trip this.
Proposed approach
Two reasonable shapes; both are bigger than a one-line patch:
Normalized comparison key column. Add address_key TEXT NOT NULL storing NormalizeIdentifierForCompare(address), make the PK (source_id, address_key), keep address as the display value. Backfill migration on schema upgrade. Cleanest invariant.
Partial expression index. SQLite supports CREATE UNIQUE INDEX … WHERE looksLikeEmail(address) but looksLikeEmail is Go, not SQL. Either reproduce the predicate in SQL (address LIKE '%@%.%' is close but not identical) or split rows into two tables (email vs. other). Less invasive than (1) but the SQL predicate divergence is its own future trap.
Recommend (1). Do not slap UNIQUE(source_id, lower(address)) on all rows — non-email identifiers must keep case sensitivity.
Context
Surfaced during the design re-read of PR #304. Wes and Claude discussed and agreed this is a real invariant mismatch but not a blocker for #304 because SQLite's single-writer model prevents in-process divergence today. Filing as a follow-up because the gap matters more once Postgres or cross-process writers are part of the supported deployment.
The mismatch
The schema declares
account_identitieswith raw-bytes equality:The code (
internal/store/identifier_match.go) treats email-shaped identifiers case-insensitively viaLOWER(address) = LOWER(?)while preserving original case on storage. Non-email identifiers (Matrix MXIDs, phone E.164, chat handles) are intentionally case-sensitive.The PK enforces raw-bytes uniqueness but the application logic enforces case-folded uniqueness for emails. Two paths can produce divergent rows for the same logical identity:
Alice@Example.com. Concurrent process insertsalice@example.com. BothAddAccountIdentityandMigrateLegacyIdentityConfigdoSELECT … WHERE LOWER(address) = LOWER(?), find no row, thenINSERT. PK is on raw bytes — both inserts succeed.account_identitiesrows for the same logical identity.mergeSignalSetsemantics break (signals attached to one row are invisible from the other).RemoveAccountIdentityfinds both via the case-fold predicate, but only by accident.The single-writer comment at
account_identities.go:37-39acknowledges this is only safe in-process.servemode plus a parallel CLI command — the documented multi-process deployment — is exactly where divergence becomes possible.Why it matters
28853fe(case-insensitive remove for email-shaped identifiers) handles the symptom on remove but doesn't prevent re-accumulation.servedeployments running alongside CLI dedup jobs are exactly the workflow most likely to trip this.Proposed approach
Two reasonable shapes; both are bigger than a one-line patch:
address_key TEXT NOT NULLstoringNormalizeIdentifierForCompare(address), make the PK(source_id, address_key), keepaddressas the display value. Backfill migration on schema upgrade. Cleanest invariant.CREATE UNIQUE INDEX … WHERE looksLikeEmail(address)butlooksLikeEmailis Go, not SQL. Either reproduce the predicate in SQL (address LIKE '%@%.%'is close but not identical) or split rows into two tables (email vs. other). Less invasive than (1) but the SQL predicate divergence is its own future trap.Recommend (1). Do not slap
UNIQUE(source_id, lower(address))on all rows — non-email identifiers must keep case sensitivity.