Implement negative structural evidence by claude[bot] · Pull Request #24 · resolveworks/worldgraph

claude · 2026-03-07T16:19:20Z

Summary

Add dampened negative factor to similarity propagation that penalizes entity pairs whose functional neighbors have no matching counterpart, preventing false merges between identically-named entities in different structural contexts
Key design: weighted average mismatch with binary relation threshold, max(neg_a, neg_b) bilateral factor, name-similarity seed for inner match check, and separate positive_base tracking for monotone structural propagation
Removes xfail markers from test_identical_names_different_contexts_no_merge and test_similar_names_disjoint_neighborhoods_no_match; all 54 tests pass

Closes #23

Test plan

All 54 existing tests pass (uv run pytest)
test_identical_names_different_contexts_no_merge — two different "Dr. James Chen" in unrelated labs no longer merge
test_similar_names_disjoint_neighborhoods_no_match — "Elena Vasquez" / "Lena Vasquez" with disjoint neighborhoods no longer match
Positive-evidence monotonicity preserved (renamed test with neg disabled)
Existing merge scenarios (Sharma name variations, shared entities across clusters, synonym relations) still pass

🤖 Generated with Claude Code

Add dampened negative factor that penalizes entity pairs whose functional neighbors have no matching counterpart, preventing false merges between entities with identical names but different structural contexts (e.g. two different "Dr. James Chen" in unrelated research labs). Key design decisions: - Weighted average mismatch (not per-path product) to distinguish partial vs full neighborhood mismatch - Binary relation similarity threshold (0.8) to prevent synonym cascade - max(neg_a, neg_b) bilateral factor (charitable for incomplete coverage) - Name-similarity seed for inner match check (prevents circular reinforcement) - Separate positive_base tracking for monotone structural propagation Removes xfail markers from test_identical_names_different_contexts_no_merge and test_similar_names_disjoint_neighborhoods_no_match. Closes #23 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude

Reviewed the full diff and source files. No issues found — approving.

What I checked:

Negative factor correctness: name_seed is fixed (prevents circular reinforcement), positive_base is monotone non-decreasing, and since neg is constant per pair (computed from fixed name_seed), combined = base * neg is monotone after the initial drop. The abs(combined - old) > epsilon change correctly handles that initial drop.
Forward vs inverse functionality swap between _build_weighted_adjacency and _build_forward_adjacency is correct per the PARIS design in negative_evidence.md.
max(neg_a, neg_b) bilateral strategy is well-motivated for asymmetric news coverage.
Test changes: xfail removals correct, NovaTech assertion relaxed with clear justification, monotonicity test correctly isolates positive propagation via neg_alpha=0.0, dead anchor variable and unused import pytest cleaned up.
No dead code, no shims, no half-finished pieces. Reads as if negative evidence was always part of the design.

LGTM.

monneyboi

The negative evidence implementation reintroduces a binary relation similarity threshold (rel_threshold=0.8 in _one_sided_negative), but positive propagation still uses continuous rel_sim as a multiplier. This creates an inconsistent mental model: functionality pooling and negative evidence treat relations as "same or not" via a threshold, while positive propagation treats similarity as a continuous weight.

Requested change: Unify to a single rel_threshold parameter on match_graphs that flows to all three consumers:

compute_functionality — pool relations above threshold (already does this)
propagate_similarity (positive) — gate relation pairs: if rel_sim >= threshold, treat as same relation and propagate (without multiplying by rel_sim); below threshold, skip entirely. This reverts the "initial nongated" change from 695408b.
compute_negative_factor — check relation match using the same threshold (already does this, but hardcoded)

The mental model becomes: the threshold defines equivalence classes over free-text relations — essentially reducing them to a discrete enum. Above threshold = same relation, below = different. One threshold, one concept, used everywhere.

This means:

Remove the separate rel_threshold param from compute_negative_factor / _one_sided_negative
Add a rel_threshold param to propagate_similarity
In the propagation loop, change strength_sum += rs * weight * neighbor_confidence to just strength_sum += weight * neighbor_confidence (only reached when rs >= rel_threshold)
match_graphs passes the single rel_threshold (currently rel_cluster_threshold) to all three functions

Replace the inconsistent mix of continuous rel_sim multiplier (positive propagation) and binary threshold (functionality pooling, negative evidence) with a single rel_threshold that defines relation equivalence classes throughout: above threshold = same relation, below = different. - Add rel_threshold param to propagate_similarity, gate relation pairs (skip if below threshold, no continuous multiplier) - Remove hardcoded default from compute_negative_factor/_one_sided_negative, receive threshold from caller via keyword-only param - Wire rel_cluster_threshold from match_graphs to all three consumers - Strengthen three-source name variation test: add 2 more shared neighbors to compensate for reduced structural signal from gating (3 name variants reduce functionality weights to 1/3, need 5 neighbors for sufficient evidence) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude

Reviewed the full diff (both commits) against match.py, the design doc, and the prior review. The threshold unification from monneyboi's feedback is fully addressed. No issues found.

What I checked:

Threshold unification complete: rel_cluster_threshold flows from match_graphs → propagate_similarity → compute_negative_factor / _one_sided_negative. Positive propagation gates with rs < rel_threshold and drops the continuous rs * multiplier. One concept of relation equivalence throughout — functionality pooling, positive gating, negative evidence all use the same threshold.
Convergence is guaranteed despite abs() change: positive_base is monotone non-decreasing, and the negative factor is computed from name_seed (fixed), not propagated confidence. So combined = base * neg converges once base does — no oscillation possible. The abs(combined - old) > epsilon correctly detects the initial drop when negative evidence first activates.
Forward vs inverse functionality swap between _build_weighted_adjacency and _build_forward_adjacency is correct per PARIS semantics: positive evidence uses inverse functionality (target match → source match), negative evidence uses forward functionality (source should have specific target).
max(neg_a, neg_b) bilateral strategy is well-motivated — asymmetric news coverage means missing neighbors on one side shouldn't compound.
NovaTech assertion relaxation is correct: with one neighbor (founder) that doesn't match and forward functionality ~1.0, score = 1.0 × 0.7 = 0.7 < 0.8 is the right behavior — identical names with contradicting functional neighbors should be suppressed. The test's primary assertion (sharma ≠ vasquez) still holds.
No dead code, no shims, no half-finished pieces. Clean integration.

LGTM — no issues to flag.

claude bot commented Mar 7, 2026

View reviewed changes

monneyboi requested changes Mar 12, 2026

View reviewed changes

claude bot commented Mar 12, 2026

View reviewed changes

monneyboi merged commit f309e3f into main Mar 12, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement negative structural evidence#24

Implement negative structural evidence#24
monneyboi merged 2 commits intomainfrom
implement-negative-evidence

claude bot commented Mar 7, 2026

Uh oh!

claude bot left a comment

Uh oh!

monneyboi left a comment

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

claude bot commented Mar 7, 2026

Summary

Test plan

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Uh oh!

monneyboi left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant