feat(cll): client-side CLL filtering with full map API by danyelf · Pull Request #1244 · DataRecce/recce

danyelf · 2026-03-25T06:57:29Z

Problem

Every time a user clicks a node or column in the CLL (Column-Level Lineage) view, the frontend fires a separate API call to the backend. Each call computes CLL for that specific node — involving SQL parsing, dependency resolution, and manifest traversal. On large projects this creates noticeable latency on every click, and the UX feels sluggish as users explore their lineage graph.

Additionally, we discovered that get_cll_cached() was calling as_manifest() on every invocation, which converts a WritableManifest to a Manifest by deserializing every node in the manifest through mashumaro. On a 1200-model project, this made CLL computation take 11+ minutes for a full map request.

Solution

One API call, instant navigation

A new full_map parameter on the CLL API tells the backend to compute CLL for all manifest nodes in a single request. The frontend caches this full map and uses a pure client-side function (sliceCllMap) to extract exactly the nodes/columns/parent_map needed for any given view — impact overview, node-level, or column-level.

After the initial load, every subsequent click (changing selected node, drilling into a column, toggling upstream/downstream) is instant — zero API calls, pure JavaScript object slicing.

sliceCllMap faithfully replicates the backend's anchor + BFS logic:

Impact overview: finds changed nodes, builds anchors per change category, BFS to reachable set
Node-level with change_analysis: anchors on changed columns (partial_breaking), node itself (breaking/unknown), or just the node_id
Column-level: BFS from anchor column through parent/child maps
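The anchor + BFS idea above can be sketched in a few lines. This is a minimal illustration in Python (the backend's language), not the actual sliceCllMap or backend code: the function names `invert`, `bfs`, and `reachable` are hypothetical, and the only assumption taken from the PR is that `parent_map` maps each node or column key to the set of its parents. Each direction is walked with its own visited set, so an upstream walk cannot leak into the downstream result.

```python
from collections import deque


def invert(parent_map):
    """Build a child map (parent -> children) from a parent map (child -> parents)."""
    child_map = {key: set() for key in parent_map}
    for child, parents in parent_map.items():
        for parent in parents:
            child_map.setdefault(parent, set()).add(child)
    return child_map


def bfs(anchors, edges):
    """Return every key reachable from the anchors by following `edges`."""
    visited = set(anchors)
    queue = deque(anchors)
    while queue:
        current = queue.popleft()
        for neighbour in edges.get(current, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited


def reachable(anchors, parent_map, upstream=True, downstream=True):
    """Union of an upstream walk and a downstream walk, each with its own visited set."""
    result = set(anchors)
    if upstream:
        result |= bfs(anchors, parent_map)
    if downstream:
        result |= bfs(anchors, invert(parent_map))
    return result


# Toy lineage (hypothetical keys): a -> b -> c, and x -> b.
parent_map = {"a": set(), "x": set(), "b": {"a", "x"}, "c": {"b"}}
from_a = reachable({"a"}, parent_map)
```

Starting from `a`, the result contains `a`, `b`, and `c` but not `x`: `x` is only reachable by going downstream to `b` and then back upstream, which the per-direction walks do not allow.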

385x performance fix

Replaced as_manifest(self.get_manifest(base)) inside get_cll_cached() with the already-deserialized self.manifest. The old code converted the entire WritableManifest to a Manifest (triggering mashumaro __mashumaro_from_dict__ for every node in the manifest) once for each node in the CLL loop. For a 1200-model project with ~1800 nodes, that is about 1800 full-manifest deserializations for a single full_map request.
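The shape of this fix, doing an expensive conversion once instead of once per loop iteration, can be shown with a toy model. The `Adapter` class, `_convert`, and both `column_counts_*` methods below are hypothetical stand-ins, not Recce or dbt code; `_convert` merely plays the role of `as_manifest()` and counts how often it runs.

```python
class Adapter:
    """Toy stand-in for the dbt adapter; every name here is hypothetical."""

    def __init__(self, raw_nodes):
        self.raw_nodes = raw_nodes
        self.conversions = 0
        # Convert once up front, like reusing the already-deserialized manifest.
        self.manifest = self._convert()

    def _convert(self):
        """Stand-in for an expensive whole-manifest conversion."""
        self.conversions += 1
        return {name: dict(node) for name, node in self.raw_nodes.items()}

    def column_counts_slow(self):
        # Old shape: the whole manifest is converted again for every node.
        return {name: len(self._convert()[name]["columns"]) for name in self.raw_nodes}

    def column_counts_fast(self):
        # New shape: every node reads from the single converted manifest.
        manifest = self.manifest
        return {name: len(manifest[name]["columns"]) for name in self.raw_nodes}


adapter = Adapter({
    "orders": {"columns": ["id", "amount"]},
    "customers": {"columns": ["id"]},
    "payments": {"columns": ["id", "order_id", "method"]},
})
slow = adapter.column_counts_slow()
after_slow = adapter.conversions
fast = adapter.column_counts_fast()
after_fast = adapter.conversions
```

Both methods return the same counts, but the slow path runs one conversion per node while the fast path runs none beyond the initial one. With N nodes and a conversion that itself touches all N nodes, the slow path does work proportional to N squared.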

Also replaced deepcopy() of the entire CLL result with targeted shallow copy() of only the node/column objects whose change_status fields get mutated, and bumped lru_cache from 128 → 4096 to avoid cache thrashing on large projects.
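As a rough illustration of the targeted-copy idea, the sketch below uses plain dicts in place of the real CLL node and column objects. The function name `with_change_status` and the data layout are assumptions made for the example; the only detail taken from the PR is that `change_status` is the field being mutated.

```python
import copy


def with_change_status(cached, changed):
    """Return a result with change_status applied, leaving the cached value untouched."""
    # New outer dict, but the node objects inside are still shared with the cache.
    result = {"nodes": dict(cached["nodes"])}
    for node_id, status in changed.items():
        # Shallow-copy only the node that is about to be mutated.
        node = copy.copy(result["nodes"][node_id])
        node["change_status"] = status
        result["nodes"][node_id] = node
    return result


cached = {
    "nodes": {
        "model.a": {"name": "a", "change_status": None},
        "model.b": {"name": "b", "change_status": None},
    }
}
result = with_change_status(cached, {"model.a": "modified"})
```

The cached entry for `model.a` keeps `change_status` set to `None`, while `model.b` is the same object in both structures, so nothing is copied for it. The trade-off is that any field mutated later must also be covered by a copy, otherwise the mutation would leak into the cache.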

Perf results (1207-model anonymized project)

Call	Before	After	Speedup
full_map (cold)	694s (11.5 min)	1.8s	385x
full_map (warm)	713s (11.9 min)	0.23s	3100x
impact overview	233s (3.9 min)	0.43s	542x

Results verified identical (0 diffs across 1796 nodes, 15520 columns, 17316 parent_map entries).
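The PR does not show how the comparison was run. One way such a check could be written, assuming each CLL result is a dict with `nodes`, `columns`, and `parent_map` sections whose values compare with `==`, is the hypothetical `count_diffs` helper below.

```python
_MISSING = object()  # sentinel so a missing key never equals a present None value


def count_diffs(expected, actual):
    """Count keys whose values differ, or that exist on only one side, per section."""
    diffs = 0
    for section in ("nodes", "columns", "parent_map"):
        left = expected.get(section, {})
        right = actual.get(section, {})
        for key in left.keys() | right.keys():
            if left.get(key, _MISSING) != right.get(key, _MISSING):
                diffs += 1
    return diffs


before = {
    "nodes": {"model.a": {"name": "a"}},
    "columns": {"model.a_id": {"type": "int"}},
    "parent_map": {"model.a_id": ["source.raw_id"]},
}
same = {
    "nodes": {"model.a": {"name": "a"}},
    "columns": {"model.a_id": {"type": "int"}},
    "parent_map": {"model.a_id": ["source.raw_id"]},
}
changed = {
    "nodes": {"model.a": {"name": "a"}},
    "columns": {"model.a_id": {"type": "text"}},
    "parent_map": {},
}
```

Here `count_diffs(before, same)` is 0, and `count_diffs(before, changed)` is 2: one changed column and one missing parent_map entry.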

Changes

Backend

server.py: full_map param on CllIn, passed as no_filter to get_cll()
dbt_adapter/__init__.py:
- When no_filter=True and node_id=None, computes CLL for ALL manifest nodes
- as_manifest() → self.manifest / self.previous_state.manifest
- deepcopy → targeted copy() on mutated fields only
- lru_cache(128) → lru_cache(4096)
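The effect of the cache size can be reproduced with the standard library alone. The sketch below is not Recce code: `count_misses` and `compute` are hypothetical, and the 1800 keys simply mirror the approximate node count quoted above. It scans the same keys twice and reports how many calls missed the cache.

```python
from functools import lru_cache


def count_misses(maxsize, keys, passes=2):
    """Scan `keys` in order `passes` times and return the number of cache misses."""

    @lru_cache(maxsize=maxsize)
    def compute(key):
        return key * 2  # stand-in for per-node CLL computation

    for _ in range(passes):
        for key in keys:
            compute(key)
    return compute.cache_info().misses


keys = range(1800)
small = count_misses(128, keys)   # cache smaller than the working set
large = count_misses(4096, keys)  # cache larger than the working set
```

With a 128-entry cache every entry is evicted before the scan comes back to it, so both passes miss on all 1800 keys (3600 misses). With 4096 entries only the first pass misses (1800 misses) and the second pass is served entirely from the cache.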

Frontend

sliceCllMap.ts: Pure function — BFS from anchors, buildSlice for shallow-clone + filtering
LineageViewOss.tsx: fullCllMapRef caches full map, fetchAndCacheFullMap helper, cache invalidation guard
cll.ts: full_map field on CllInput

Tests

76 equivalence tests comparing sliceCllMap(fullMap, params) against real server responses
- 19 no-change tests, 1 impact overview, 7 node-level, 33 column-level, 16 column-level diff
- Covers non_breaking, partial_breaking, breaking changes + downstream/impacted nodes
Updated _set_compiled_code test helper to patch both WritableManifest and Manifest

Test plan

Backend CLL tests: 23 passed
Frontend tests: 3607 passed
Type check: clean
Perf verified on anonymized 1207-model project
Manual testing in browser against jaffle_shop

Checklist

Signed-off commits (DCO)
Tests added/updated

🤖 Generated with Claude Code

Pure function that replicates the backend's anchor + BFS filtering on a full CLL map, enabling client-side filtering without per-click API calls.

Handles three modes:
- Impact overview (no params): returns full map unchanged
- Node-level: anchors based on change_analysis (changed columns as BFS seeds for partial_breaking, node itself for breaking/unknown)
- Column-level: BFS from anchor column through parent/child maps

Includes 45 tests: 15 unit tests + 19 no-change equivalence tests + 11 diff equivalence tests covering non_breaking (added column), partial_breaking (modified column def), and breaking (WHERE clause).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>

One API call fetches the complete CLL map (all nodes with column lineage and change analysis). All subsequent navigation (impact overview, node clicks, column clicks) slices client-side via sliceCllMap() with zero additional API calls.

Backend changes:
- Add full_map param to CllIn, passed as no_filter to get_cll()
- When full_map=true and no node_id, compute CLL for ALL manifest nodes (not just changed nodes) so the frontend can slice any path

Frontend changes:
- LineageViewOss: cache full map in ref, slice client-side on clicks, invalidate only on external lineageGraph changes (not our own patch)
- sliceCllMap: add impact overview filtering (changed node anchors + BFS), fix upstream/downstream BFS cross-contamination by using separate visited sets per direction (bfsFromAnchors)
- Full map request uses only {change_analysis, full_map}, with no node_id or directional params

76 equivalence tests (impact + 7 nodes × 5 cols each) confirm client-side slicing matches server responses exactly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>

- buildSlice: shallow-clone nodes, set impacted flag (true for BFS-reachable, false for extra nodes), filter node.columns to only include columns in the reachable set
- assertEquivalent: now checks impacted and node.columns per node
- Extract fetchAndCacheFullMap helper in LineageViewOss (was duplicated in useLayoutEffect and refreshLayout)
- Remove 8 orphaned diff fixtures (no-upstream/no-downstream variants and unreferenced impacted node fixtures)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>

as_manifest() was called inside get_cll_cached() for every node, converting WritableManifest→Manifest and triggering mashumaro deserialization of ALL manifest nodes each time. On a 1200-model project this made full_map CLL take 11+ minutes.

Fix: use the already-deserialized self.manifest / self.previous_state.manifest instead of re-converting per node. Also replace deepcopy with shallow copy of only the nodes/columns that get mutated (change_status fields), and increase lru_cache from 128 to 4096 to cover large projects.

Result on 1207-model anonymized project:
- full_map cold: 694s → 1.8s (385x faster)
- full_map warm: 713s → 0.23s (3100x faster)
- impact overview: 233s → 0.43s (542x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>

danyelf and others added 4 commits March 24, 2026 14:45

danyelf wants to merge 4 commits into main from worktree-unified-cll-frontend
