Multi-label entities: use max similarity across all names during propagation

## Problem

After a merge, an entity can have several names in its occurrence pool (e.g. "Meridian Technologies" and "Meridian Tech"), but only the most frequent one, the canonical name, is stored on `Entity.name` and embedded. The other names are silently dropped.

In `prepare_embeddings`, only the canonical name is embedded:
```python
all_names = sorted({e.name for g in graphs for e in g.entities.values()})
```

In `propagate`, similarity is initialised from that single name:
```python
name_a = graph_a.entities[eid_a].name
emb_a = name_embeddings.get(name_a)
```

As a result, when an already-merged entity's canonical name is not its closest match to a name in another graph, the initial similarity is underestimated.

## Correct behaviour

For entities with multiple known names, the initial similarity between two entities should be:

```
max(cosine_sim(emb_a_i, emb_b_j) for emb_a_i in all_embs_a for emb_b_j in all_embs_b)
```
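
A minimal, dependency-free sketch of that expression, for illustration only. The helper names `cosine_sim` and `max_name_similarity` are hypothetical, as are the zero-norm guard and the `0.0` fallback for an entity with no embeddings; the real code may use a vectorised implementation instead.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_sim(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as "no similarity"
    return dot / (norm_u * norm_v)


def max_name_similarity(
    all_embs_a: Sequence[Vector], all_embs_b: Sequence[Vector]
) -> float:
    """Best similarity over every (name of A, name of B) embedding pair."""
    return max(
        (cosine_sim(a, b) for a in all_embs_a for b in all_embs_b),
        default=0.0,  # assumption: no embeddings on either side means 0.0
    )
```

With a single embedding on each side this reduces to the current single-name behaviour, so unmerged entities would be unaffected.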

## Fix

1. Add a `names: set[str]` field to `Entity` (alongside the canonical `name`).
2. Populate it from all occurrence names at load/merge time.
3. In `prepare_embeddings`, embed all names (not just canonical ones).
4. In `propagate`, compute initial sigma as max pairwise similarity across all name embeddings for each entity pair.
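
Steps 1 to 3 could look roughly like the sketch below. It is not the project's actual code: the `Entity` and `Graph` classes here are stripped-down stand-ins, and `collect_names` and `entity_embeddings` are hypothetical helpers. Only `name`, `names`, `entities` and the `name_embeddings` lookup come from this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str  # canonical (most frequent) name
    names: set[str] = field(default_factory=set)  # every known name

    def __post_init__(self) -> None:
        # The canonical name is always one of the known names.
        self.names.add(self.name)


@dataclass
class Graph:
    # Stand-in for the real graph type: entity id -> Entity.
    entities: dict[str, Entity] = field(default_factory=dict)


def collect_names(graphs: list[Graph]) -> list[str]:
    """All names to embed, not just the canonical ones (step 3)."""
    return sorted(
        {n for g in graphs for e in g.entities.values() for n in e.names}
    )


def entity_embeddings(
    entity: Entity, name_embeddings: dict[str, list[float]]
) -> list[list[float]]:
    """Embeddings for every name of an entity, skipping names never embedded."""
    return [
        name_embeddings[n] for n in sorted(entity.names) if n in name_embeddings
    ]
```

In `propagate`, the two lists returned by `entity_embeddings` for an entity pair would then feed the max pairwise similarity from step 4.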

## When this matters

The current pipeline runs one pass of matching on the original per-article graphs (which are always single-source, so single-name). The bug only bites if merged graphs are fed back into a second matching pass (iterative refinement). It's latent today but will silently degrade quality if iterative matching is added.
