Skip to content

Implement negative structural evidence#24

Merged
monneyboi merged 2 commits intomainfrom
implement-negative-evidence
Mar 12, 2026
Merged

Implement negative structural evidence#24
monneyboi merged 2 commits intomainfrom
implement-negative-evidence

Conversation

@claude
Copy link

@claude claude bot commented Mar 7, 2026

Summary

  • Add dampened negative factor to similarity propagation that penalizes entity pairs whose functional neighbors have no matching counterpart, preventing false merges between identically-named entities in different structural contexts
  • Key design: weighted average mismatch with binary relation threshold, max(neg_a, neg_b) bilateral factor, name-similarity seed for inner match check, and separate positive_base tracking for monotone structural propagation
  • Removes xfail markers from test_identical_names_different_contexts_no_merge and test_similar_names_disjoint_neighborhoods_no_match; all 54 tests pass

Closes #23

Test plan

  • All 54 existing tests pass (uv run pytest)
  • test_identical_names_different_contexts_no_merge — two different "Dr. James Chen" in unrelated labs no longer merge
  • test_similar_names_disjoint_neighborhoods_no_match — "Elena Vasquez" / "Lena Vasquez" with disjoint neighborhoods no longer match
  • Positive-evidence monotonicity preserved (renamed test with neg disabled)
  • Existing merge scenarios (Sharma name variations, shared entities across clusters, synonym relations) still pass

🤖 Generated with Claude Code

Add dampened negative factor that penalizes entity pairs whose functional
neighbors have no matching counterpart, preventing false merges between
entities with identical names but different structural contexts (e.g. two
different "Dr. James Chen" in unrelated research labs).

Key design decisions:
- Weighted average mismatch (not per-path product) to distinguish partial
  vs full neighborhood mismatch
- Binary relation similarity threshold (0.8) to prevent synonym cascade
- max(neg_a, neg_b) bilateral factor (charitable for incomplete coverage)
- Name-similarity seed for inner match check (prevents circular reinforcement)
- Separate positive_base tracking for monotone structural propagation

Removes xfail markers from test_identical_names_different_contexts_no_merge
and test_similar_names_disjoint_neighborhoods_no_match.

Closes #23

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Author

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full diff and source files. No issues found — approving.

What I checked:

  • Negative factor correctness: name_seed is fixed (prevents circular reinforcement), positive_base is monotone non-decreasing, and since neg is constant per pair (computed from fixed name_seed), combined = base * neg is monotone after the initial drop. The abs(combined - old) > epsilon change correctly handles that initial drop.
  • Forward vs inverse functionality swap between _build_weighted_adjacency and _build_forward_adjacency is correct per the PARIS design in negative_evidence.md.
  • max(neg_a, neg_b) bilateral strategy is well-motivated for asymmetric news coverage.
  • Test changes: xfail removals correct, NovaTech assertion relaxed with clear justification, monotonicity test correctly isolates positive propagation via neg_alpha=0.0, dead anchor variable and unused import pytest cleaned up.
  • No dead code, no shims, no half-finished pieces. Reads as if negative evidence was always part of the design.

LGTM.

Copy link
Contributor

@monneyboi monneyboi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The negative evidence implementation reintroduces a binary relation similarity threshold (rel_threshold=0.8 in _one_sided_negative), but positive propagation still uses continuous rel_sim as a multiplier. This creates an inconsistent mental model: functionality pooling and negative evidence treat relations as "same or not" via a threshold, while positive propagation treats similarity as a continuous weight.

Requested change: Unify to a single rel_threshold parameter on match_graphs that flows to all three consumers:

  1. compute_functionality — pool relations above threshold (already does this)
  2. propagate_similarity (positive) — gate relation pairs: if rel_sim >= threshold, treat as same relation and propagate (without multiplying by rel_sim); below threshold, skip entirely. This reverts the "initial nongated" change from 695408b.
  3. compute_negative_factor — check relation match using the same threshold (already does this, but hardcoded)

The mental model becomes: the threshold defines equivalence classes over free-text relations — essentially reducing them to a discrete enum. Above threshold = same relation, below = different. One threshold, one concept, used everywhere.

This means:

  • Remove the separate rel_threshold param from compute_negative_factor / _one_sided_negative
  • Add a rel_threshold param to propagate_similarity
  • In the propagation loop, change strength_sum += rs * weight * neighbor_confidence to just strength_sum += weight * neighbor_confidence (only reached when rs >= rel_threshold)
  • match_graphs passes the single rel_threshold (currently rel_cluster_threshold) to all three functions

Replace the inconsistent mix of continuous rel_sim multiplier (positive
propagation) and binary threshold (functionality pooling, negative
evidence) with a single rel_threshold that defines relation equivalence
classes throughout: above threshold = same relation, below = different.

- Add rel_threshold param to propagate_similarity, gate relation pairs
  (skip if below threshold, no continuous multiplier)
- Remove hardcoded default from compute_negative_factor/_one_sided_negative,
  receive threshold from caller via keyword-only param
- Wire rel_cluster_threshold from match_graphs to all three consumers
- Strengthen three-source name variation test: add 2 more shared neighbors
  to compensate for reduced structural signal from gating (3 name variants
  reduce functionality weights to 1/3, need 5 neighbors for sufficient
  evidence)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Author

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full diff (both commits) against match.py, the design doc, and the prior review. The threshold unification from monneyboi's feedback is fully addressed. No issues found.

What I checked:

  • Threshold unification complete: rel_cluster_threshold flows from match_graphspropagate_similaritycompute_negative_factor / _one_sided_negative. Positive propagation gates with rs < rel_threshold and drops the continuous rs * multiplier. One concept of relation equivalence throughout — functionality pooling, positive gating, negative evidence all use the same threshold.
  • Convergence is guaranteed despite abs() change: positive_base is monotone non-decreasing, and the negative factor is computed from name_seed (fixed), not propagated confidence. So combined = base * neg converges once base does — no oscillation possible. The abs(combined - old) > epsilon correctly detects the initial drop when negative evidence first activates.
  • Forward vs inverse functionality swap between _build_weighted_adjacency and _build_forward_adjacency is correct per PARIS semantics: positive evidence uses inverse functionality (target match → source match), negative evidence uses forward functionality (source should have specific target).
  • max(neg_a, neg_b) bilateral strategy is well-motivated — asymmetric news coverage means missing neighbors on one side shouldn't compound.
  • NovaTech assertion relaxation is correct: with one neighbor (founder) that doesn't match and forward functionality ~1.0, score = 1.0 × 0.7 = 0.7 < 0.8 is the right behavior — identical names with contradicting functional neighbors should be suppressed. The test's primary assertion (sharma ≠ vasquez) still holds.
  • No dead code, no shims, no half-finished pieces. Clean integration.

LGTM — no issues to flag.

@monneyboi monneyboi merged commit f309e3f into main Mar 12, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement negative structural evidence

1 participant