Skip to content

feat(taps): Introduced method in Tap class to convert input catalog to Stream objects, without discovery#3324

Draft
edgarrmondragon wants to merge 3 commits intomainfrom
feat/streams-from-catalog-2
Draft

feat(taps): Introduced method in Tap class to convert input catalog to Stream objects, without discovery#3324
edgarrmondragon wants to merge 3 commits intomainfrom
feat/streams-from-catalog-2

Conversation

@edgarrmondragon
Copy link
Collaborator

@edgarrmondragon edgarrmondragon commented Oct 21, 2025

This clarifies whether the tap should perform discovery or use the input catalog, the two being mutually exclusive.

Supersedes #3084

Summary by Sourcery

Provide a clear separation between discovery and catalog-driven stream loading by adding load_streams_from_catalog and updating streams property, and enhance Catalog with a has_stream_selected helper

New Features:

  • Introduce load_streams_from_catalog method in Tap to recreate streams from a provided catalog without discovery
  • Update Tap.streams property to choose between discovery and input catalog processing
  • Add has_stream_selected method to Catalog for checking if a stream is selected

Enhancements:

  • Add Iterable import under TYPE_CHECKING in Tap and template for consistent typing

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 21, 2025

Reviewer's Guide

This PR refactors stream loading in Tap to distinguish between discovery and catalog-driven workflows by introducing a new load_streams_from_catalog method and branching logic in streams(), adds a helper to check stream selection in Catalog, and updates TYPE_CHECKING imports for Iterable.

Sequence diagram for Tap.streams() branching logic

sequenceDiagram
    participant Tap
    participant Catalog
    participant Stream
    Tap->>Tap: streams()
    alt input_catalog is None
        Tap->>Tap: load_streams()
        Tap->>Stream: create Stream objects
        Tap->>Tap: store streams by name
    else input_catalog is provided
        Tap->>Tap: load_streams_from_catalog(input_catalog)
        Tap->>Stream: create Stream objects from catalog
        Tap->>Stream: apply_catalog(input_catalog)
        Tap->>Tap: store streams by name
    end
Loading

Class diagram for updated Tap class and Catalog

classDiagram
    class Tap {
        +streams: dict[str, Stream]
        +load_streams() list[Stream]
        +load_streams_from_catalog(catalog: Catalog) Iterable[Stream]
    }
    class Catalog {
        +get_stream(stream_id: str) CatalogEntry | None
        +has_stream_selected(stream_id: str) bool
    }
    Tap --> Catalog : uses
    Catalog --> CatalogEntry : returns
Loading

Class diagram for new has_stream_selected method in Catalog

classDiagram
    class Catalog {
        +has_stream_selected(stream_id: str) bool
    }
    class CatalogEntry {
        +metadata
    }
    Catalog --> CatalogEntry : get_stream(stream_id)
    CatalogEntry --> Metadata : metadata
    class Metadata {
        +resolve_selection() mask
    }
Loading

File-Level Changes

Change Details Files
Conditional stream loading based on presence of input_catalog
  • Replaced unconditional load_streams() loop with conditional logic
  • When no catalog, build streams via load_streams()
  • When catalog provided, use load_streams_from_catalog and apply_catalog on each stream
singer_sdk/tap_base.py
Added load_streams_from_catalog method
  • Defined method signature returning Iterable[Stream]
  • Provided docstring explaining backward compatibility
  • Delegates to existing load_streams()
singer_sdk/tap_base.py
Added has_stream_selected utility in Catalog
  • Introduced has_stream_selected() returning metadata selection mask
  • Retrieves stream via get_stream and resolves its selection
singer_sdk/singerlib/catalog.py
Updated TYPE_CHECKING imports for Iterable
  • Imported Iterable from collections.abc in tap_base.py
  • Mirrored the import in the cookiecutter template
singer_sdk/tap_base.py
cookiecutter/tap-template/{{cookiecutter.tap_id}}/{{cookiecutter.library_name}}/non-sql-tap.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@edgarrmondragon edgarrmondragon changed the title feat(tap): Introduced method in Tap class to convert input catalog to Stream objects, without discovery feat(taps): Introduced method in Tap class to convert input catalog to Stream objects, without discovery Oct 21, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `singer_sdk/tap_base.py:414-425` </location>
<code_context>
             reverse=False,
         )

+    def load_streams_from_catalog(self, catalog: Catalog) -> Iterable[Stream]:  # noqa: ARG002
+        """Recreate streams from the input catalog.
+
+        Returns:
+            Iterable[Stream]: An iterable of Stream objects recreated from the input
+                catalog.
+        For backwards compatibility calls `self.load_streams()`
+
+        Returns:
+            A mapping of names to streams, using provided catalog.
+        """
+        return self.load_streams()
+
     # Bookmarks and state management
</code_context>

<issue_to_address>
**suggestion:** The new method load_streams_from_catalog currently ignores the catalog argument.

Consider updating the implementation to use the catalog argument, or clarify in the docstring that it is currently unused.

```suggestion
    def load_streams_from_catalog(self, catalog: Catalog) -> Iterable[Stream]:  # noqa: ARG002
        """Recreate streams from the input catalog.

        Note:
            The `catalog` argument is currently unused. For backwards compatibility,
            this method calls `self.load_streams()`.

        Returns:
            Iterable[Stream]: An iterable of Stream objects recreated from the input
                catalog.
        """
        return self.load_streams()
```
</issue_to_address>

### Comment 2
<location> `singer_sdk/tap_base.py:168-172` </location>
<code_context>
-
-            for stream in self.load_streams():
-                if input_catalog is not None:
+            if input_catalog is None:
+                self._streams = {stream.name: stream for stream in self.load_streams()}
+            else:
+                self._streams = {}
+                for stream in self.load_streams_from_catalog(input_catalog):
                     stream.apply_catalog(input_catalog)
-                self._streams[stream.name] = stream
</code_context>

<issue_to_address>
**suggestion:** The logic for stream loading now depends on input_catalog, but load_streams_from_catalog currently delegates to load_streams.

If catalog-based filtering is intended, implement it in load_streams_from_catalog or clarify the expected override behavior in documentation.
</issue_to_address>

### Comment 3
<location> `singer_sdk/singerlib/catalog.py:436-445` </location>
<code_context>
         """
         return self.get(stream_id)
+
+    def has_stream_selected(self, stream_id: str) -> bool:
+        """Returns true if the stream is selected.
+
+        Args:
+            stream_id: The ID/name of the stream.
+        """
+        if stream := self.get_stream(stream_id):
+            mask = stream.metadata.resolve_selection()
+            return mask[()]
+
+        return False
</code_context>

<issue_to_address>
**issue (bug_risk):** has_stream_selected may raise if resolve_selection returns a non-dict or mask is missing the expected key.

Add checks to confirm resolve_selection returns a dict and mask contains the expected key to prevent runtime errors from malformed input.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +436 to +445
def has_stream_selected(self, stream_id: str) -> bool:
"""Returns true if the stream is selected.

Args:
stream_id: The ID/name of the stream.
"""
if stream := self.get_stream(stream_id):
mask = stream.metadata.resolve_selection()
return mask[()]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): has_stream_selected may raise if resolve_selection returns a non-dict or mask is missing the expected key.

Add checks to confirm resolve_selection returns a dict and mask contains the expected key to prevent runtime errors from malformed input.

@read-the-docs-community
Copy link

read-the-docs-community bot commented Oct 21, 2025

Documentation build overview

📚 Meltano SDK | 🛠️ Build #30315116 | 📁 Comparing c471a51 against latest (3f03963)


🔍 Preview build

Show files changed (3 files in total): 📝 3 modified | ➕ 0 added | ➖ 0 deleted
File Status
genindex.html 📝 modified
classes/singer_sdk.Tap.html 📝 modified
classes/singer_sdk.sql.SQLTap.html 📝 modified

@edgarrmondragon edgarrmondragon force-pushed the feat/streams-from-catalog-2 branch from 2f61b57 to 923687d Compare October 21, 2025 22:21
@edgarrmondragon edgarrmondragon marked this pull request as draft October 21, 2025 22:21
@codspeed-hq
Copy link

codspeed-hq bot commented Oct 21, 2025

CodSpeed Performance Report

Merging #3324 will not alter performance

Comparing feat/streams-from-catalog-2 (c471a51) with main (d052311)1

Summary

✅ 8 untouched

Footnotes

  1. No successful run was found on main (3f03963) during the generation of this report, so d052311 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@codecov
Copy link

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.72%. Comparing base (3f03963) to head (c471a51).

Files with missing lines Patch % Lines
singer_sdk/singerlib/catalog.py 20.00% 4 Missing ⚠️

❌ Your patch check has failed because the patch coverage (66.66%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3324      +/-   ##
==========================================
- Coverage   93.78%   93.72%   -0.07%     
==========================================
  Files          69       69              
  Lines        5758     5766       +8     
  Branches      713      714       +1     
==========================================
+ Hits         5400     5404       +4     
- Misses        254      258       +4     
  Partials      104      104              
Flag Coverage Δ
core 80.74% <66.66%> (-0.05%) ⬇️
end-to-end 76.48% <66.66%> (-0.04%) ⬇️
optional-components 43.56% <16.66%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@edgarrmondragon edgarrmondragon force-pushed the feat/streams-from-catalog-2 branch from ead4d7b to 3f07ba1 Compare October 24, 2025 16:18
@edgarrmondragon edgarrmondragon force-pushed the feat/streams-from-catalog-2 branch from 3f07ba1 to 4d6bcf6 Compare November 6, 2025 21:02
@edgarrmondragon edgarrmondragon added the Type/Tap Singer taps label Nov 6, 2025
Rafał Krupiński and others added 2 commits November 7, 2025 16:05
…to `Stream` objects, without discovery

This clarifies whether the tap should perform discovery or use the input catalog, the two being mutually exclusive.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: Rafal Krupinski <low.map7277@fastmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>

Co-authored-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@edgarrmondragon edgarrmondragon force-pushed the feat/streams-from-catalog-2 branch from 4d6bcf6 to be2d51f Compare November 7, 2025 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type/Tap Singer taps

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant