perf: ⚡️ Speed up method `MetadataMapping.resolve_selection` by 72% #3490
Draft
codeflash-ai[bot] wants to merge 1 commit into main
The optimized code achieves a **71% speedup** (3.10ms → 1.81ms) by introducing **memoization** to eliminate redundant recursive computations in the `resolve_selection()` method.
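As a quick sanity check on the headline figure, the sketch below recomputes the speedup from the rounded timings quoted above. It assumes the percentage is defined as `(original / optimized - 1) * 100`; the helper name `speedup_percent` is hypothetical and not part of the SDK or the benchmark tooling.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent, assuming (original / optimized - 1) * 100."""
    return (original / optimized - 1) * 100


# Headline runtimes in milliseconds, as quoted in this description.
overall = speedup_percent(3.10, 1.81)
print(f"{overall:.1f}%")  # prints "71.3%"
```

The rounded timings give roughly 71%, which matches the figure in the text. The 72% in the title presumably comes from the unrounded measurements, though that is an assumption.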
**What changed:**
The optimization replaces the recursive `_breadcrumb_is_selected()` method with an inline `compute()` function that uses a cache dictionary. When resolving selection for deeply nested breadcrumb hierarchies, the original code recomputed the selection of the same parent breadcrumbs many times.
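A minimal sketch of the memoization pattern described above, not the actual diff. It assumes metadata is a plain dict from breadcrumb tuples to an optional `selected` flag, that the parent of a breadcrumb is `breadcrumb[:-2]`, and that a deselected parent deselects its children. The real selection rules in `singer_sdk` are richer, and every name other than `compute` is illustrative.

```python
from typing import Optional

Breadcrumb = tuple[str, ...]


def resolve_selection(
    metadata: dict[Breadcrumb, Optional[bool]],
) -> dict[Breadcrumb, bool]:
    """Resolve each breadcrumb once, caching results in a local dict."""
    cache: dict[Breadcrumb, bool] = {}

    def compute(breadcrumb: Breadcrumb) -> bool:
        if breadcrumb in cache:
            return cache[breadcrumb]
        # Simplified rule (an assumption): a deselected parent wins.
        if breadcrumb and not compute(breadcrumb[:-2]):
            result = False
        else:
            flag = metadata.get(breadcrumb)
            # Simplified default (an assumption): no flag means selected.
            result = True if flag is None else flag
        cache[breadcrumb] = result
        return result

    return {breadcrumb: compute(breadcrumb) for breadcrumb in metadata}


metadata = {
    (): True,
    ("properties", "parent"): False,
    ("properties", "parent", "properties", "child"): True,
    ("properties", "other"): None,
}
mask = resolve_selection(metadata)
print(mask[("properties", "parent", "properties", "child")])  # prints "False"
```

The cache lives only for the duration of one `resolve_selection()` call in this sketch, so later changes to the metadata cannot be served stale results.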
**Why it's faster:**
In the original implementation, each breadcrumb resolution required traversing up the parent chain recursively. For nested structures like `('properties', 'parent', 'properties', 'child')`, the parent breadcrumb `('properties', 'parent')` would be evaluated multiple times if multiple children existed. The line profiler shows `_breadcrumb_is_selected()` was called 15,492 times with 38.6ms total time in the original version, but only once (1μs) in the optimized version.
With memoization, each breadcrumb's selection is computed exactly once and cached. Subsequent lookups are constant-time dictionary retrievals. This is particularly effective for:
- **Deep nesting**: Tests with 10-level depth show 360% speedup (156μs → 34.0μs)
- **Large flat structures**: 1000 properties show 74.3% speedup (401μs → 230μs)
- **Complex parent-child relationships**: 50 parents with 10 children each show 118% speedup (310μs → 142μs)
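To illustrate where the redundant work comes from, the sketch below counts evaluations for a synthetic schema with 50 parents and 10 children each. It uses the same simplified assumptions as the description (parent is `breadcrumb[:-2]`, a deselected parent deselects its children). The counting helpers and the schema layout are hypothetical and are not the benchmark used for the numbers above.

```python
from typing import Optional

Breadcrumb = tuple[str, ...]
Metadata = dict[Breadcrumb, Optional[bool]]


def own_flag(metadata: Metadata, breadcrumb: Breadcrumb) -> bool:
    """Simplified default (an assumption): no flag means selected."""
    flag = metadata.get(breadcrumb)
    return True if flag is None else flag


def count_naive(metadata: Metadata) -> int:
    """Count calls when every breadcrumb walks its full parent chain."""
    calls = 0

    def is_selected(breadcrumb: Breadcrumb) -> bool:
        nonlocal calls
        calls += 1
        if breadcrumb and not is_selected(breadcrumb[:-2]):
            return False
        return own_flag(metadata, breadcrumb)

    for breadcrumb in metadata:
        is_selected(breadcrumb)
    return calls


def count_memoized(metadata: Metadata) -> int:
    """Count real evaluations (cache misses) when results are cached."""
    evaluations = 0
    cache: dict[Breadcrumb, bool] = {}

    def compute(breadcrumb: Breadcrumb) -> bool:
        nonlocal evaluations
        if breadcrumb in cache:
            return cache[breadcrumb]
        evaluations += 1
        if breadcrumb and not compute(breadcrumb[:-2]):
            result = False
        else:
            result = own_flag(metadata, breadcrumb)
        cache[breadcrumb] = result
        return result

    for breadcrumb in metadata:
        compute(breadcrumb)
    return evaluations


# Synthetic schema: a root entry, 50 parents, 10 children per parent.
metadata: Metadata = {(): True}
for p in range(50):
    parent = ("properties", f"parent_{p}")
    metadata[parent] = True
    for c in range(10):
        metadata[parent + ("properties", f"child_{c}")] = None

naive_calls = count_naive(metadata)
memo_evals = count_memoized(metadata)
print(naive_calls, memo_evals)  # prints "1601 551"
```

In this layout the naive version makes one call per level for every breadcrumb (1 + 50 × 2 + 500 × 3 = 1601), while the cached version evaluates each of the 551 distinct breadcrumbs once. The gap widens as nesting gets deeper.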
**How it impacts workloads:**
Schema resolution with nested properties (common in JSON schemas) benefits most. The test results show consistent speedups across all scale tests, with the largest gains in deeply nested structures. The optimization maintains identical correctness (all tests pass with same results) while dramatically reducing redundant work.
**Trade-offs:**
There is a small overhead (~100-200ns) for trivial cases with 1-2 breadcrumbs due to cache initialization, but it is negligible compared to the gains for realistic workloads with dozens or hundreds of properties.
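One rough way to get a feel for this fixed cost is a `timeit` micro-benchmark that compares a trivial function with one that also creates and fills a small dict. This is only a sketch: both functions are hypothetical stand-ins, it isolates the dict setup rather than the real method, and the measured value varies by machine and Python version.

```python
import timeit


def without_cache() -> bool:
    """Stand-in for resolving a single breadcrumb with no cache."""
    return True


def with_cache() -> bool:
    """Same result, plus creating and filling a per-call cache dict."""
    cache: dict[tuple[str, ...], bool] = {}
    cache[()] = True
    return cache[()]


n = 200_000
base = timeit.timeit(without_cache, number=n)
cached = timeit.timeit(with_cache, number=n)

# Average extra time per call, in nanoseconds (noisy on a busy machine).
overhead_ns = (cached - base) / n * 1e9
print(f"approximate per-call overhead: {overhead_ns:.0f} ns")
```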
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3490      +/-   ##
==========================================
- Coverage   94.39%   94.28%   -0.12%
==========================================
  Files          71       71
  Lines        5867     5911      +44
  Branches      727      738      +11
==========================================
+ Hits         5538     5573      +35
- Misses        231      238       +7
- Partials       98      100       +2
📄 72% (0.72x) speedup for `MetadataMapping.resolve_selection` in `singer_sdk/singerlib/catalog.py`
⏱️ Runtime: 3.10 milliseconds → 1.81 milliseconds (best of 250 runs)
✅ Correctness verification report:
⚙️ Existing Unit Tests
`singerlib/test_catalog.py::test_metadata_mapping`
🌀 Generated Regression Tests
To edit these changes, `git checkout codeflash/optimize-MetadataMapping.resolve_selection-mlr5oifg` and push.