perf: ⚡️ Speed up method MetadataMapping.resolve_selection by 72% #3490

Draft
codeflash-ai[bot] wants to merge 1 commit into main from
codeflash/optimize-MetadataMapping.resolve_selection-mlr5oifg

@codeflash-ai codeflash-ai bot commented Feb 17, 2026

📄 72% (0.72x) speedup for MetadataMapping.resolve_selection in singer_sdk/singerlib/catalog.py

⏱️ Runtime : 3.10 milliseconds → 1.81 milliseconds (best of 250 runs)
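
As a quick check on the headline figure, the sketch below recomputes the speedup from the two rounded runtimes. It assumes the percentage is defined as original / optimized − 1, which is inferred from the numbers in this report rather than from Codeflash documentation; the function names are hypothetical.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Speedup as a percentage, assuming the original/optimized - 1 convention."""
    return (original / optimized - 1.0) * 100.0


def runtime_reduction_percent(original: float, optimized: float) -> float:
    """Share of the original runtime that was removed."""
    return (1.0 - optimized / original) * 100.0


# Rounded runtimes from this report, in milliseconds.
speedup = speedup_percent(3.10, 1.81)
reduction = runtime_reduction_percent(3.10, 1.81)
print(f"speedup: {speedup:.1f}%, runtime reduction: {reduction:.1f}%")
```

With the rounded inputs this gives roughly 71%, so the 72% in the title presumably reflects unrounded measurements. Under the same assumption, the run takes a bit over 40% less time.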

📝 Explanation and details

The optimized code runs about 71% faster (3.10ms → 1.81ms) by memoizing results, which eliminates redundant recursive computations in the resolve_selection() method.

What changed:
The optimization replaces the recursive _breadcrumb_is_selected() method with an inline compute() function backed by a cache dictionary. When resolving selection for deeply nested breadcrumb hierarchies, the original code recomputed the same parent breadcrumb selections many times.
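
The diff itself is not shown in this conversation, so the following is only a minimal sketch of the memoization pattern described above, not the code in the PR. It assumes a heavily simplified rule set (an explicit selected flag, a deselected parent blocking its children, and a fallback to the parent value) and leaves out inclusion types and selected_by_default. The resolve_all name and the plain-dict input are hypothetical.

```python
from __future__ import annotations

Breadcrumb = tuple[str, ...]


def resolve_all(selected: dict[Breadcrumb, bool | None]) -> dict[Breadcrumb, bool]:
    """Resolve every breadcrumb, computing each ancestor at most once."""
    cache: dict[Breadcrumb, bool] = {}

    def compute(breadcrumb: Breadcrumb) -> bool:
        if breadcrumb in cache:  # already resolved: reuse the stored result
            return cache[breadcrumb]
        # The parent drops the trailing ("properties", name) pair; () has none.
        parent = compute(breadcrumb[:-2]) if breadcrumb else None
        if parent is False:
            result = False  # a deselected parent blocks all descendants
        else:
            explicit = selected.get(breadcrumb)
            result = explicit if explicit is not None else bool(parent)
        cache[breadcrumb] = result
        return result

    return {breadcrumb: compute(breadcrumb) for breadcrumb in selected}


mask = resolve_all(
    {
        (): True,
        ("properties", "user"): False,
        ("properties", "user", "properties", "name"): True,
        ("properties", "id"): None,
    }
)
print(mask)
```

The cache lives only for the duration of one resolve_all call, so later changes to the mapping cannot be shadowed by stale entries.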

Why it's faster:
In the original implementation, resolving each breadcrumb meant recursing up its entire parent chain. For nested structures like ('properties', 'parent', 'properties', 'child'), the parent breadcrumb ('properties', 'parent') was re-evaluated once for every child. The line profiler shows _breadcrumb_is_selected() was called 15,492 times with 38.6ms total time in the original version, but only once (1μs) in the optimized version.

With memoization, each breadcrumb's selection is computed exactly once and cached; later lookups are plain dictionary retrievals. This is particularly effective for:

  • Deep nesting: Tests with 10-level depth show 360% speedup (156μs → 34.0μs)
  • Large flat structures: 1000 properties show 74.3% speedup (401μs → 230μs)
  • Complex parent-child relationships: 50 parents with 10 children each show 118% speedup (310μs → 142μs)
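
To see why these three shapes benefit to different degrees, the sketch below counts how many breadcrumb evaluations each workload needs with and without a cache. It assumes an uncached resolver evaluates a breadcrumb plus every ancestor on each lookup, while a cached one evaluates each distinct breadcrumb once. The helper names are hypothetical, and the breadcrumb layouts mirror the generated tests in this report.

```python
from __future__ import annotations

Breadcrumb = tuple[str, ...]


def ancestry(breadcrumb: Breadcrumb) -> list[Breadcrumb]:
    """Return the breadcrumb followed by each ancestor, ending at ()."""
    chain = [breadcrumb]
    while breadcrumb:
        breadcrumb = breadcrumb[:-2]  # drop the trailing two path parts
        chain.append(breadcrumb)
    return chain


def evaluation_counts(breadcrumbs: list[Breadcrumb]) -> tuple[int, int]:
    """Count evaluations (uncached, cached) for one pass over the mapping."""
    uncached = sum(len(ancestry(b)) for b in breadcrumbs)
    cached = len({a for b in breadcrumbs for a in ancestry(b)})
    return uncached, cached


# Deep nesting: 50 leaves under one shared 10-level path.
prefix = ("properties",) + tuple(
    part for level in range(10) for part in (f"level_{level}", "properties")
)
deep = [()] + [prefix + (f"field_{i}",) for i in range(50)]

# Large flat structure: 1000 top-level properties.
flat = [()] + [("properties", f"field_{i:04d}") for i in range(1000)]

# Parent-child: 50 parents with 10 children each.
nested: list[Breadcrumb] = [()]
for p in range(50):
    parent = ("properties", f"parent_{p}")
    nested.append(parent)
    nested.extend(parent + ("properties", f"child_{c}") for c in range(10))

for name, workload in [("deep", deep), ("flat", flat), ("nested", nested)]:
    print(name, evaluation_counts(workload))
```

Under these assumptions the deep workload drops from 601 evaluations to 61, the flat one from 2001 to 1001, and the parent-child one from 1601 to 551. That ordering matches the measured speedups, although wall-clock gains are smaller than the evaluation counts alone would suggest.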

How it impacts workloads:
Schema resolution with nested properties (common in JSON schemas) benefits most. Every scale test is faster, with the largest gains in deeply nested structures. All tests pass with the same results, so behavior is unchanged while redundant work is removed.

Trade-offs:
Trivial cases with 1-2 breadcrumbs pay a small overhead (~100-200ns) for cache initialization, which is negligible next to the gains for realistic workloads with dozens or hundreds of properties.
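
The overhead can be read off the two empty-mapping tests listed later in this report (583ns → 791ns and 583ns → 708ns). The sketch below reproduces their "slower" annotations, assuming those are computed as 1 − original / optimized; that convention is inferred from the reported numbers, and the function name is hypothetical.

```python
def percent_slower(original_ns: float, optimized_ns: float) -> float:
    """Slowdown as a percentage, assuming the 1 - original/optimized convention."""
    return (1.0 - original_ns / optimized_ns) * 100.0


# (original, optimized) timings in nanoseconds for the two empty-mapping tests.
empty_mapping_runs = [(583, 791), (583, 708)]

overheads_ns = [optimized - original for original, optimized in empty_mapping_runs]
slowdowns = [percent_slower(original, optimized) for original, optimized in empty_mapping_runs]

for overhead, slowdown in zip(overheads_ns, slowdowns):
    print(f"+{overhead}ns ({slowdown:.1f}% slower)")
```

The absolute overheads of 208ns and 125ns are in line with the ~100-200ns estimate above.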

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | ✅ 16 Passed |
| 🌀 Generated Regression Tests | ✅ 125 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| singerlib/test_catalog.py::test_metadata_mapping | 7.83μs | 6.83μs | 14.6% ✅ |

🌀 Generated Regression Tests

# Import the real classes from the module under test.
from __future__ import annotations

from singer_sdk.singerlib.catalog import Metadata, MetadataMapping, StreamMetadata


def test_resolve_selection_on_empty_mapping_returns_empty_selectionmask():
    # Create an empty MetadataMapping instance (no breadcrumbs added).
    mm = MetadataMapping()
    # Call resolve_selection which should return a SelectionMask instance.
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 583ns -> 791ns (26.3% slower)


def test_stream_level_selected_and_child_inherits_selected_by_default():
    # Stream-level metadata set to selected_by_default=True (using StreamMetadata)
    mm = MetadataMapping()
    stream_md = StreamMetadata()  # construct a real StreamMetadata
    stream_md.selected_by_default = True  # set selected_by_default on the stream
    mm[()] = stream_md  # assign to the empty breadcrumb (stream-level)
    # Add an explicit property breadcrumb with no explicit selected value.
    child_breadcrumb = ("properties", "child")
    mm[child_breadcrumb] = Metadata()  # child metadata with defaults (None)
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 3.92μs -> 3.62μs (8.06% faster)
    sel_dict = dict(selection)


def test_parent_false_blocks_child_even_if_child_selected_true():
    # Parent (stream) explicitly deselected
    mm = MetadataMapping()
    mm[()] = StreamMetadata()
    mm[()].selected = False  # parent deselected
    # Child explicitly selected=True but should be blocked by parent=False
    child = ("properties", "a")
    child_md = Metadata()
    child_md.selected = True
    mm[child] = child_md
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 1.79μs -> 1.83μs (2.24% slower)
    sel = dict(selection)


def test_unsupported_inclusion_ignores_selected_true_and_returns_false():
    # Unsupported inclusion should always return False regardless of selected=True
    mm = MetadataMapping()
    b = ("properties", "unsupported_field")
    md = Metadata()
    # Use the real InclusionType enum on the real Metadata class
    md.inclusion = Metadata.InclusionType.UNSUPPORTED
    md.selected = True  # input tries to select it
    mm[b] = md
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 2.75μs -> 2.71μs (1.55% faster)
    sel = dict(selection)


def test_automatic_inclusion_ignores_selected_false_and_returns_true():
    # AUTOMATIC inclusion should return True even if selected is False
    mm = MetadataMapping()
    b = ("properties", "auto_field")
    md = Metadata()
    md.inclusion = Metadata.InclusionType.AUTOMATIC
    md.selected = False  # input tries to deselect
    mm[b] = md
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 2.54μs -> 2.58μs (1.63% slower)
    sel = dict(selection)


def test_selected_by_default_used_when_selected_is_none_and_parent_true():
    # Parent is selected True, child has selected_by_default=True but selected=None
    mm = MetadataMapping()
    mm[()] = StreamMetadata()
    mm[()].selected = True  # parent selected True
    child = ("properties", "child_default")
    md = Metadata()
    md.selected = None
    md.selected_by_default = True
    mm[child] = md
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 1.92μs -> 1.92μs (0.000% faster)
    sel = dict(selection)


def test_fallback_to_parent_or_false_when_metadata_omitted():
    # No stream-level entry exists; the mapping holds only property metadata.
    # The property omits both selected and selected_by_default, so its result
    # falls back to the parent value. The blanket default of True applies only
    # to a completely empty mapping; here the parent breadcrumb () is resolved
    # from default metadata, and the property inherits that result.
    mm = MetadataMapping()
    # Add a property breadcrumb with no selection info
    prop = ("properties", "p")
    mm[prop] = Metadata()  # no selected / selected_by_default set
    # Also add an unrelated breadcrumb to ensure "self" is not empty (so default True is not used)
    mm["properties", "dummy"] = Metadata()
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 3.25μs -> 2.92μs (11.4% faster)
    sel = dict(selection)


def test_nested_parent_chain_blocks_descendants():
    # Test deeper breadcrumb chains: ('properties','parent','properties','child')
    mm = MetadataMapping()
    parent = ("properties", "parent")
    child = ("properties", "parent", "properties", "child")
    # Parent explicitly deselected
    parent_md = Metadata()
    parent_md.selected = False
    mm[parent] = parent_md
    # Child explicitly selected True but should be blocked by parent
    child_md = Metadata()
    child_md.selected = True
    mm[child] = child_md
    codeflash_output = mm.resolve_selection()
    selection = codeflash_output  # 3.67μs -> 3.04μs (20.5% faster)
    sel = dict(selection)


def test_large_scale_selection_mapping_and_counts():
    # Build a mapping with 1000 property breadcrumbs plus a stream-level breadcrumb
    mm = MetadataMapping()
    stream = StreamMetadata()
    stream.selected = (
        True  # choose stream selected True so children can be evaluated independently
    )
    mm[()] = stream
    total = 1000
    # Alternate selection True/False for children to validate correctness at scale
    for i in range(total):
        b = ("properties", f"f{i}")
        md = Metadata()
        # Make even-indexed fields selected True, odd False
        md.selected = i % 2 == 0
        mm[b] = md
    # Resolve selection multiple times to ensure stability and to catch caching / stateful bugs
    for _ in range(5):
        codeflash_output = mm.resolve_selection()
        selection = codeflash_output  # 1.94ms -> 1.19ms (63.2% faster)
        sel = dict(selection)
        # Count how many are True among the property entries
        true_count = sum(1 for i in range(total) if sel["properties", f"f{i}"])
        # Evens are selected -> should be ceil(total/2) when starting at 0
        expected_true = (total + 1) // 2


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

# Second generated test module
from __future__ import annotations

from singer_sdk.singerlib.catalog import Metadata, MetadataMapping, StreamMetadata


def test_resolve_selection_empty_metadata():
    """Test resolve_selection with empty MetadataMapping returns SelectionMask with no entries."""
    mapping = MetadataMapping()
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 583ns -> 708ns (17.7% slower)


def test_resolve_selection_stream_level_selected():
    """Test resolve_selection with stream-level metadata marked as selected."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.42μs -> 1.54μs (8.11% slower)


def test_resolve_selection_stream_level_deselected():
    """Test resolve_selection with stream-level metadata marked as deselected."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=False)
    mapping[()] = stream_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.29μs -> 1.38μs (6.04% slower)


def test_resolve_selection_single_property_selected():
    """Test resolve_selection with a single property marked as selected."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    prop_md = Metadata(selected=True)
    mapping["properties", "id"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 2.04μs -> 2.00μs (2.05% faster)


def test_resolve_selection_single_property_deselected():
    """Test resolve_selection with a single property marked as deselected."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    prop_md = Metadata(selected=False)
    mapping["properties", "name"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.96μs -> 1.92μs (2.19% faster)


def test_resolve_selection_multiple_properties():
    """Test resolve_selection with multiple properties with different selections."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    mapping["properties", "id"] = Metadata(selected=True)
    mapping["properties", "name"] = Metadata(selected=False)
    mapping["properties", "email"] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 2.79μs -> 2.29μs (21.8% faster)


def test_resolve_selection_returns_selection_mask():
    """Test that resolve_selection returns a SelectionMask instance."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.25μs -> 1.29μs (3.25% slower)


def test_resolve_selection_with_selected_by_default():
    """Test resolve_selection respects selected_by_default when selected is not set."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    prop_md = Metadata(selected_by_default=True)
    mapping["properties", "age"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.96μs -> 2.00μs (2.05% slower)


def test_resolve_selection_with_automatic_inclusion():
    """Test resolve_selection with automatic inclusion type returns True."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    prop_md = Metadata(inclusion=Metadata.InclusionType.AUTOMATIC)
    mapping["properties", "id"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.88μs -> 1.83μs (2.24% faster)


def test_resolve_selection_with_unsupported_inclusion():
    """Test resolve_selection with unsupported inclusion type returns False."""
    mapping = MetadataMapping()
    stream_md = StreamMetadata(selected=True)
    mapping[()] = stream_md
    prop_md = Metadata(inclusion=Metadata.InclusionType.UNSUPPORTED)
    mapping["properties", "internal"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.79μs -> 1.88μs (4.48% slower)


def test_resolve_selection_nested_properties():
    """Test resolve_selection with nested property paths."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    mapping["properties", "user"] = Metadata(selected=True)
    mapping["properties", "user", "properties", "name"] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 2.92μs -> 2.38μs (22.8% faster)


def test_resolve_selection_parent_deselected_child_ignored():
    """Test that child properties are deselected when parent is deselected."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    mapping["properties", "user"] = Metadata(selected=False)
    mapping["properties", "user", "properties", "name"] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 2.58μs -> 2.25μs (14.8% faster)


def test_resolve_selection_unsupported_overrides_selected():
    """Test that unsupported inclusion overrides selected=True."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    prop_md = Metadata(selected=True, inclusion=Metadata.InclusionType.UNSUPPORTED)
    mapping["properties", "deprecated"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 3.21μs -> 3.25μs (1.29% slower)


def test_resolve_selection_automatic_overrides_deselected():
    """Test that automatic inclusion overrides selected=False."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    prop_md = Metadata(selected=False, inclusion=Metadata.InclusionType.AUTOMATIC)
    mapping["properties", "id"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 3.17μs -> 3.04μs (4.11% faster)


def test_resolve_selection_no_metadata_defaults_to_true():
    """Test that breadcrumbs with no explicit metadata default based on parent."""
    mapping = MetadataMapping()
    # Only define stream-level metadata
    mapping[()] = StreamMetadata(selected=True)
    # Do not define metadata for the property itself
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.29μs -> 1.29μs (0.000% faster)


def test_resolve_selection_stream_deselected_all_false():
    """Test that all properties are deselected when stream is deselected."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=False)
    mapping["properties", "id"] = Metadata(selected=True)
    mapping["properties", "name"] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 2.25μs -> 1.92μs (17.4% faster)


def test_resolve_selection_with_none_selected_value():
    """Test resolve_selection when selected is None (not set)."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    # Create metadata with selected=None explicitly
    prop_md = Metadata(selected=None, selected_by_default=None)
    mapping["properties", "field"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 3.12μs -> 3.00μs (4.17% faster)


def test_resolve_selection_selected_by_default_false():
    """Test resolve_selection with selected_by_default=False."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    prop_md = Metadata(selected=None, selected_by_default=False)
    mapping["properties", "optional"] = prop_md
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.96μs -> 1.88μs (4.48% faster)


def test_resolve_selection_very_deep_nesting():
    """Test resolve_selection with very deeply nested properties."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    # Create a deeply nested path
    breadcrumb = (
        "properties",
        "a",
        "properties",
        "b",
        "properties",
        "c",
        "properties",
        "d",
    )
    mapping[breadcrumb] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 4.46μs -> 4.17μs (6.98% faster)


def test_resolve_selection_empty_string_in_breadcrumb():
    """Test resolve_selection with empty strings in breadcrumb path."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    breadcrumb = ("properties", "")
    mapping[breadcrumb] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.92μs -> 1.83μs (4.47% faster)


def test_resolve_selection_special_characters_in_property_names():
    """Test resolve_selection with special characters in property names."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    breadcrumb = ("properties", "user-id")
    mapping[breadcrumb] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.96μs -> 1.83μs (6.82% faster)


def test_resolve_selection_unicode_property_names():
    """Test resolve_selection with unicode characters in property names."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    breadcrumb = ("properties", "name_ü")
    mapping[breadcrumb] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 1.96μs -> 1.88μs (4.43% faster)


def test_resolve_selection_order_of_entries():
    """Test that resolve_selection preserves the order of entries."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)
    mapping["properties", "z"] = Metadata(selected=True)
    mapping["properties", "a"] = Metadata(selected=True)
    mapping["properties", "m"] = Metadata(selected=True)
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 3.42μs -> 2.88μs (18.8% faster)


def test_resolve_selection_large_number_of_properties():
    """Test resolve_selection with a large number of properties."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)

    # Add 100 properties with alternating selection
    for i in range(100):
        breadcrumb = ("properties", f"field_{i}")
        selected = i % 2 == 0
        mapping[breadcrumb] = Metadata(selected=selected)

    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 42.3μs -> 27.6μs (53.3% faster)


def test_resolve_selection_deeply_nested_large_structure():
    """Test resolve_selection with a large structure of deeply nested properties."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)

    # Create a structure with depth of 10 levels
    depth = 10
    for i in range(50):
        breadcrumb = ("properties",)
        for level in range(depth):
            breadcrumb += (f"level_{level}", "properties")
        breadcrumb += (f"field_{i}",)
        mapping[breadcrumb] = Metadata(selected=(i % 2 == 0))

    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 156μs -> 34.0μs (360% faster)


def test_resolve_selection_mixed_metadata_types_large():
    """Test resolve_selection with mixed metadata types across many properties."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)

    # Create 200 properties with different metadata configurations
    for i in range(200):
        breadcrumb = ("properties", f"field_{i}")
        mod = i % 4

        if mod == 0:
            mapping[breadcrumb] = Metadata(selected=True)
        elif mod == 1:
            mapping[breadcrumb] = Metadata(selected=False)
        elif mod == 2:
            mapping[breadcrumb] = Metadata(inclusion=Metadata.InclusionType.AUTOMATIC)
        else:
            mapping[breadcrumb] = Metadata(inclusion=Metadata.InclusionType.UNSUPPORTED)

    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 79.8μs -> 52.4μs (52.3% faster)


def test_resolve_selection_complex_parent_child_relationships():
    """Test resolve_selection with complex parent-child relationships at scale."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)

    # Create 50 parent objects, each with 10 child properties
    for parent in range(50):
        parent_breadcrumb = ("properties", f"parent_{parent}")
        mapping[parent_breadcrumb] = Metadata(selected=(parent % 2 == 0))

        for child in range(10):
            child_breadcrumb = (
                "properties",
                f"parent_{parent}",
                "properties",
                f"child_{child}",
            )
            # Child explicitly selected even if parent isn't
            mapping[child_breadcrumb] = Metadata(selected=True)

    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 310μs -> 142μs (118% faster)


def test_resolve_selection_performance_with_1000_breadcrumbs():
    """Test resolve_selection performance with 1000 breadcrumbs."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)

    # Add 1000 breadcrumbs
    for i in range(1000):
        breadcrumb = ("properties", f"field_{i:04d}")
        mapping[breadcrumb] = Metadata(selected=(i % 3 != 0))

    # Execute resolve_selection
    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 401μs -> 230μs (74.3% faster)


def test_resolve_selection_all_deselected_large():
    """Test resolve_selection when stream is deselected with many properties."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=False)

    # Add 100 properties, all explicitly selected
    for i in range(100):
        breadcrumb = ("properties", f"field_{i}")
        mapping[breadcrumb] = Metadata(selected=True)

    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 33.3μs -> 20.2μs (65.3% faster)
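    for i in range(100):
        # Stream-level selected=False overrides each property's selected=True.
        assert result["properties", f"field_{i}"] is False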


def test_resolve_selection_all_selected_by_default_large():
    """Test resolve_selection with many properties using selected_by_default."""
    mapping = MetadataMapping()
    mapping[()] = StreamMetadata(selected=True)

    # Add 150 properties with selected_by_default
    for i in range(150):
        breadcrumb = ("properties", f"field_{i}")
        mapping[breadcrumb] = Metadata(selected=None, selected_by_default=(i % 2 == 0))

    codeflash_output = mapping.resolve_selection()
    result = codeflash_output  # 59.0μs -> 37.3μs (58.0% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-MetadataMapping.resolve_selection-mlr5oifg, commit your edits, and push.

@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Feb 17, 2026
codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 84.44444% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.28%. Comparing base (0c613aa) to head (3444510).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| singer_sdk/singerlib/catalog.py | 84.44% | 7 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3490      +/-   ##
==========================================
- Coverage   94.39%   94.28%   -0.12%     
==========================================
  Files          71       71              
  Lines        5867     5911      +44     
  Branches      727      738      +11     
==========================================
+ Hits         5538     5573      +35     
- Misses        231      238       +7     
- Partials       98      100       +2     

| Flag | Coverage Δ |
| --- | --- |
| core | 82.42% <84.44%> (-0.03%) ⬇️ |
| end-to-end | 75.87% <62.22%> (-0.34%) ⬇️ |
| optional-components | 43.22% <8.88%> (-0.26%) ⬇️ |
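
The project and patch percentages above can be cross-checked with a little arithmetic. The sketch assumes project coverage is simply hits divided by lines, and that the patch size can be recovered from the 7 missing lines and the 84.44444% patch coverage. Both are assumptions about how Codecov derives these figures, and the function name is hypothetical.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as hits over total lines, in percent."""
    return 100.0 * hits / lines


# Hits and lines taken from the coverage diff in this report.
base = coverage_percent(5538, 5867)
head = coverage_percent(5573, 5911)

# Recover the patch size from the reported patch coverage and missing lines.
patch_missing = 7
patch_coverage = 84.44444
patch_lines = round(patch_missing / (1 - patch_coverage / 100))
patch_covered = patch_lines - patch_missing

print(f"base {base:.2f}%, head {head:.2f}%")
print(f"patch: {patch_covered} of {patch_lines} lines covered")
```

This reproduces the reported 94.39% and 94.28%, and suggests the patch spans 45 measured lines of which 38 are covered.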

codspeed-hq bot commented Feb 17, 2026

Merging this PR will not alter performance

✅ 8 untouched benchmarks


Comparing codeflash/optimize-MetadataMapping.resolve_selection-mlr5oifg (3444510) with main (f9b6651) [1]

Footnotes

  1. No successful run was found on main (0c613aa) during the generation of this report, so f9b6651 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@edgarrmondragon edgarrmondragon changed the title from "⚡️ Speed up method MetadataMapping.resolve_selection by 72%" to "perf: ⚡️ Speed up method MetadataMapping.resolve_selection by 72%" Feb 17, 2026
@edgarrmondragon edgarrmondragon marked this pull request as draft February 17, 2026 22:18

Labels

⚡️ codeflash (Optimization PR opened by Codeflash AI), performance, 🎯 Quality: High (Optimization Quality according to Codeflash)
