
feat(cll): client-side CLL filtering with full map API #1244

Draft
danyelf wants to merge 4 commits into main from worktree-unified-cll-frontend

danyelf commented Mar 25, 2026

Problem

Every time a user clicks a node or column in the CLL (Column-Level Lineage) view, the frontend fires a separate API call to the backend. Each call computes CLL for that specific node — involving SQL parsing, dependency resolution, and manifest traversal. On large projects this creates noticeable latency on every click, and the UX feels sluggish as users explore their lineage graph.

Additionally, we discovered that get_cll_cached() was calling as_manifest() on every invocation, which converts a WritableManifest to a Manifest by deserializing every node in the manifest through mashumaro. On a 1200-model project, this made CLL computation take 11+ minutes for a full map request.

Solution

One API call, instant navigation

A new full_map parameter on the CLL API tells the backend to compute CLL for all manifest nodes in a single request. The frontend caches this full map and uses a pure client-side function (sliceCllMap) to extract exactly the nodes/columns/parent_map needed for any given view — impact overview, node-level, or column-level.

After the initial load, every subsequent click (changing selected node, drilling into a column, toggling upstream/downstream) is instant — zero API calls, pure JavaScript object slicing.

sliceCllMap faithfully replicates the backend's anchor + BFS logic:

  • Impact overview: finds changed nodes, builds anchors per change category, BFS to reachable set
  • Node-level with change_analysis: anchors on changed columns (partial_breaking), node itself (breaking/unknown), or just the node_id
  • Column-level: BFS from anchor column through parent/child maps
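For reviewers who want the shape of the anchor + BFS walk without reading the TypeScript, here is a minimal Python sketch of the column-level case. It is not the code in sliceCllMap.ts: the function names, the column ids, and the `parent_map` layout (child id mapped to a set of parent ids) are assumptions made for illustration. The one detail taken from this PR is that upstream and downstream walks each keep their own visited set.

```python
from collections import deque


def bfs(anchors, edges):
    """Return every id reachable from the anchors by following edges (anchors included)."""
    visited = set(anchors)
    queue = deque(anchors)
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


def invert(parent_map):
    """Build a child map (parent -> children) from a parent map (child -> parents)."""
    child_map = {}
    for child, parents in parent_map.items():
        for parent in parents:
            child_map.setdefault(parent, set()).add(child)
    return child_map


def reachable_from(anchors, parent_map, upstream=True, downstream=True):
    """Union of an upstream walk and a downstream walk, each with its own visited set."""
    result = set(anchors)
    if upstream:
        result |= bfs(anchors, parent_map)
    if downstream:
        result |= bfs(anchors, invert(parent_map))
    return result


# Hypothetical column-level parent_map: column id -> set of parent column ids.
parent_map = {
    "stg_orders_id": {"raw_orders_id"},
    "stg_audit_id": {"raw_orders_id"},  # sibling branch, not lineage of orders_id
    "orders_id": {"stg_orders_id"},
    "report_id": {"orders_id"},
}

column_slice = reachable_from({"orders_id"}, parent_map)
```

Because the two directions never share a visited set, the walk cannot climb to `raw_orders_id` and then descend into the unrelated `stg_audit_id` branch.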

385x performance fix

Replaced as_manifest(self.get_manifest(base)) inside get_cll_cached() with the already-deserialized self.manifest. The old code converted the entire WritableManifest → Manifest (triggering mashumaro __mashumaro_from_dict__ for every node in the manifest) once for each node visited in the CLL loop. For a 1200-model project with ~1800 nodes, that is roughly 1800 full-manifest deserializations per request.
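The cost pattern is easier to see in a toy version. The sketch below is not the adapter code: `convert_manifest`, `cll_old`, and `cll_new` are hypothetical stand-ins, and the "conversion" is a plain dict copy rather than a mashumaro deserialization. It only illustrates why converting inside the per-node loop scales with the square of the node count.

```python
conversion_calls = 0


def convert_manifest(raw_manifest):
    """Stand-in for an expensive whole-manifest conversion; it touches every node."""
    global conversion_calls
    conversion_calls += 1
    return {node_id: dict(node) for node_id, node in raw_manifest.items()}


def cll_old(raw_manifest):
    # Old shape: convert the whole manifest once per node visited.
    return {
        node_id: sorted(convert_manifest(raw_manifest)[node_id]["columns"])
        for node_id in raw_manifest
    }


def cll_new(raw_manifest, manifest):
    # New shape: reuse a manifest that was converted once, outside the loop.
    return {node_id: sorted(manifest[node_id]["columns"]) for node_id in raw_manifest}


raw = {f"model.{i}": {"columns": ["id", "amount"]} for i in range(50)}

conversion_calls = 0
old_result = cll_old(raw)
old_calls = conversion_calls

conversion_calls = 0
manifest = convert_manifest(raw)
new_result = cll_new(raw, manifest)
new_calls = conversion_calls
```

With 50 nodes the old shape performs 50 whole-manifest conversions and the new shape performs one, while both return the same result.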

Also replaced deepcopy() of the entire CLL result with a targeted shallow copy() of only the node/column objects whose change_status fields get mutated, and raised the lru_cache size from 128 to 4096 to avoid cache thrashing on large projects.
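A rough sketch of the copy-on-mutate idea, using plain dicts in place of the real CLL node objects. The helper name `with_change_status` and the data layout are assumptions for illustration, not the adapter's code.

```python
from copy import copy


def with_change_status(cached_nodes, statuses):
    """Return a result where only the nodes that need mutation are copied."""
    result = dict(cached_nodes)  # new outer dict, same node objects inside
    for node_id, status in statuses.items():
        node = copy(result[node_id])  # shallow copy of just this node
        node["change_status"] = status
        result[node_id] = node
    return result


# Hypothetical cached CLL result shared across requests.
cached = {
    "model.a": {"name": "a", "change_status": None, "columns": {"id": {"type": "int"}}},
    "model.b": {"name": "b", "change_status": None, "columns": {"id": {"type": "int"}}},
}

patched = with_change_status(cached, {"model.a": "modified"})
```

The cached entry stays untouched and unmodified nodes are shared by reference. The trade-off is that nested objects (here `columns`) are still shared after a shallow copy, so any nested field that gets mutated needs its own copy as well.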

Perf results (1207-model anonymized project)

| Call | Before | After | Speedup |
| --- | --- | --- | --- |
| full_map (cold) | 694s (11.5 min) | 1.8s | 385x |
| full_map (warm) | 713s (11.9 min) | 0.23s | 3100x |
| impact overview | 233s (3.9 min) | 0.43s | 542x |

Results verified identical (0 diffs across 1796 nodes, 15520 columns, 17316 parent_map entries).
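A check of this kind can be as simple as counting differing entries per section. The helper below is a hypothetical sketch of such a comparison, not the script used for this PR. It assumes both responses are dicts with `nodes`, `columns`, and `parent_map` sections.

```python
def count_diffs(expected, actual, sections=("nodes", "columns", "parent_map")):
    """Count keys that are missing on one side or whose values differ."""
    diffs = 0
    for section in sections:
        left = expected.get(section, {})
        right = actual.get(section, {})
        for key in left.keys() | right.keys():
            if key not in left or key not in right or left[key] != right[key]:
                diffs += 1
    return diffs


# Tiny hypothetical responses from the old and new code paths.
before = {
    "nodes": {"model.a": {"name": "a"}},
    "columns": {"model.a_id": {"name": "id"}},
    "parent_map": {"model.a_id": ["source.raw_id"]},
}
after_same = {
    "nodes": {"model.a": {"name": "a"}},
    "columns": {"model.a_id": {"name": "id"}},
    "parent_map": {"model.a_id": ["source.raw_id"]},
}
after_changed = {
    "nodes": {"model.a": {"name": "a"}},
    "columns": {},
    "parent_map": {"model.a_id": ["source.other_id"]},
}

same_count = count_diffs(before, after_same)
changed_count = count_diffs(before, after_changed)
```

If parent lists are not guaranteed to come back in a stable order, they would need to be normalized (for example sorted) before comparing.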

Changes

Backend

  • server.py: full_map param on CllIn, passed as no_filter to get_cll()
  • dbt_adapter/__init__.py:
    • When no_filter=True and node_id=None, computes CLL for ALL manifest nodes
    • as_manifest() → self.manifest / self.previous_state.manifest
    • deepcopy → targeted copy() on mutated fields only
    • lru_cache(128) → lru_cache(4096)
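To illustrate why the cache size matters, the sketch below uses `functools.lru_cache` with a placeholder function in place of the real per-node CLL computation. `count_hits` is a hypothetical helper, and 1800 is taken from the approximate node count mentioned above. When a loop cycles through more keys than the cache can hold, every entry is evicted before it is requested again.

```python
from functools import lru_cache


def count_hits(maxsize, n_nodes, passes=2):
    """Cycle through n_nodes keys `passes` times and report the cache hits."""

    @lru_cache(maxsize=maxsize)
    def compute(node_index):
        return node_index * 2  # placeholder for per-node CLL work

    for _ in range(passes):
        for i in range(n_nodes):
            compute(i)
    return compute.cache_info().hits


small_cache_hits = count_hits(maxsize=128, n_nodes=1800)
large_cache_hits = count_hits(maxsize=4096, n_nodes=1800)
```

With a capacity of 128, the second pass gets no hits at all. With 4096, every lookup in the second pass is a hit.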

Frontend

  • sliceCllMap.ts: Pure function — BFS from anchors, buildSlice for shallow-clone + filtering
  • LineageViewOss.tsx: fullCllMapRef caches full map, fetchAndCacheFullMap helper, cache invalidation guard
  • cll.ts: full_map field on CllInput
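For orientation, here is a Python sketch of what a buildSlice-style step does once the BFS has produced a reachable set. The real function is TypeScript and may differ in detail. The `node_id + "_" + column_name` id format, the `extra_nodes` parameter, and the way `parent_map` is pruned are all assumptions made for this example.

```python
def build_slice(full_map, reachable_nodes, reachable_columns, extra_nodes=frozenset()):
    """Shallow-clone the kept nodes, flag them, and drop unreachable columns."""
    nodes = {}
    for node_id in set(reachable_nodes) | set(extra_nodes):
        node = dict(full_map["nodes"][node_id])  # shallow clone; cached map untouched
        node["impacted"] = node_id in reachable_nodes
        node["columns"] = {
            name: col
            for name, col in node.get("columns", {}).items()
            if f"{node_id}_{name}" in reachable_columns  # assumed column id format
        }
        nodes[node_id] = node

    columns = {
        col_id: col
        for col_id, col in full_map["columns"].items()
        if col_id in reachable_columns
    }

    keep = set(nodes) | set(columns)
    parent_map = {
        key: {parent for parent in parents if parent in keep}
        for key, parents in full_map["parent_map"].items()
        if key in keep
    }
    return {"nodes": nodes, "columns": columns, "parent_map": parent_map}


# Hypothetical cached full map.
full_map = {
    "nodes": {
        "model.orders": {"name": "orders", "columns": {"id": {}, "total": {}}},
        "model.report": {"name": "report", "columns": {"id": {}}},
        "model.customers": {"name": "customers", "columns": {"id": {}}},
    },
    "columns": {
        "model.orders_id": {"name": "id"},
        "model.orders_total": {"name": "total"},
        "model.report_id": {"name": "id"},
        "model.customers_id": {"name": "id"},
    },
    "parent_map": {
        "model.orders": set(),
        "model.report": {"model.orders"},
        "model.customers": set(),
        "model.report_id": {"model.orders_id"},
    },
}

sliced = build_slice(
    full_map,
    reachable_nodes={"model.orders", "model.report"},
    reachable_columns={"model.orders_id", "model.report_id"},
    extra_nodes={"model.customers"},
)
```

Because each node is cloned before `impacted` and `columns` are set, repeated slicing never modifies the cached full map.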

Tests

  • 76 equivalence tests comparing sliceCllMap(fullMap, params) against real server responses
    • 19 no-change tests, 1 impact overview, 7 node-level, 33 column-level, 16 column-level diff
    • Covers non_breaking, partial_breaking, breaking changes + downstream/impacted nodes
  • Updated _set_compiled_code test helper to patch both WritableManifest and Manifest

Test plan

  • Backend CLL tests: 23 passed
  • Frontend tests: 3607 passed
  • Type check: clean
  • Perf verified on anonymized 1207-model project
  • Manual testing in browser against jaffle_shop

Checklist

  • Signed-off commits (DCO)
  • Tests added/updated

🤖 Generated with Claude Code

danyelf and others added 4 commits March 24, 2026 14:45
Pure function that replicates the backend's anchor + BFS filtering
on a full CLL map, enabling client-side filtering without per-click
API calls. Handles three modes:

- Impact overview (no params): returns full map unchanged
- Node-level: anchors based on change_analysis (changed columns as
  BFS seeds for partial_breaking, node itself for breaking/unknown)
- Column-level: BFS from anchor column through parent/child maps

Includes 45 tests: 15 unit tests + 19 no-change equivalence tests +
11 diff equivalence tests covering non_breaking (added column),
partial_breaking (modified column def), and breaking (WHERE clause).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
One API call fetches the complete CLL map (all nodes with column
lineage and change analysis). All subsequent navigation — impact
overview, node clicks, column clicks — slices client-side via
sliceCllMap() with zero additional API calls.

Backend changes:
- Add full_map param to CllIn, passed as no_filter to get_cll()
- When full_map=true and no node_id, compute CLL for ALL manifest
  nodes (not just changed nodes) so the frontend can slice any path

Frontend changes:
- LineageViewOss: cache full map in ref, slice client-side on clicks,
  invalidate only on external lineageGraph changes (not our own patch)
- sliceCllMap: add impact overview filtering (changed node anchors +
  BFS), fix upstream/downstream BFS cross-contamination by using
  separate visited sets per direction (bfsFromAnchors)
- Full map request uses only {change_analysis, full_map} — no node_id
  or directional params

76 equivalence tests (impact + 7 nodes × 5 cols each) confirm
client-side slicing matches server responses exactly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
- buildSlice: shallow-clone nodes, set impacted flag (true for
  BFS-reachable, false for extra nodes), filter node.columns to
  only include columns in the reachable set
- assertEquivalent: now checks impacted and node.columns per node
- Extract fetchAndCacheFullMap helper in LineageViewOss (was duplicated
  in useLayoutEffect and refreshLayout)
- Remove 8 orphaned diff fixtures (no-upstream/no-downstream variants
  and unreferenced impacted node fixtures)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
as_manifest() was called inside get_cll_cached() for every node,
converting WritableManifest→Manifest and triggering mashumaro
deserialization of ALL manifest nodes each time. On a 1200-model
project this made full_map CLL take 11+ minutes.

Fix: use the already-deserialized self.manifest / self.previous_state.manifest
instead of re-converting per node. Also replace deepcopy with shallow copy
of only the nodes/columns that get mutated (change_status fields), and
increase lru_cache from 128 to 4096 to cover large projects.

Result on 1207-model anonymized project:
- full_map cold: 694s → 1.8s (385x faster)
- full_map warm: 713s → 0.23s (3100x faster)
- impact overview: 233s → 0.43s (542x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>