feat(taps): New RESTStream.get_http_request method to construct the HTTP request for a stream#3483
feat(taps): New RESTStream.get_http_request method to construct the HTTP request for a stream#3483edgarrmondragon wants to merge 34 commits intomainfrom
RESTStream.get_http_request method to construct the HTTP request for a stream#3483Conversation
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Reviewer's GuideIntroduce an HTTPRequest/HTTPRequestContext abstraction for REST streams, refactor RESTStream request preparation to build PreparedRequest objects from HTTPRequest instances, migrate pagination/request examples and taps to override get_http_request instead of get_url_params, and add tests and docs for the new behavior and deprecation of prepare_request overrides. Sequence diagram for RESTStream.request_records with HTTPRequestsequenceDiagram
actor TapRunner
participant RESTStream
participant Paginator as BaseAPIPaginator
participant Ctx as HTTPRequestContext
participant Req as HTTPRequest
participant Session as RequestsSession
participant API
TapRunner->>RESTStream: request_records(context)
RESTStream->>RESTStream: get_new_paginator()
RESTStream-->>RESTStream: Paginator instance
loop until Paginator.finished
alt prepare_request overridden (deprecated path)
RESTStream->>RESTStream: prepare_request(context,next_page_token)
RESTStream-->>Session: PreparedRequest
else default path using HTTPRequest
RESTStream->>RESTStream: _prepare_request(context,paginator)
RESTStream->>Ctx: construct HTTPRequestContext(stream_context,paginator)
RESTStream->>RESTStream: get_http_request(context:HTTPRequestContext)
RESTStream-->>Req: HTTPRequest
Req->>Req: encode_params()
RESTStream->>Session: build_prepared_request(method,url,encoded_params,headers,data or json)
Session-->>RESTStream: PreparedRequest
end
RESTStream->>API: decorated_request(PreparedRequest,context)
API-->>RESTStream: Response
RESTStream->>Paginator: advance(response)
end
RESTStream-->>TapRunner: records iterator
Updated class diagram for RESTStream HTTPRequest abstractionclassDiagram
class HTTPRequest {
+str url
+str method
+dict~str,str~ headers
+dict~str,any or list~any~ or None~ params
+JSONPayload data
+str safe_query_chars
+list_tuple_str_any_ _get_url_params()
+str encode_params()
}
class HTTPRequestContext~TPag~ {
+Context or None stream_context
+TPag paginator
}
class JSONPayload {
}
class _HTTPStream~TToken~ {
<<abstract>>
+requests.Session requests_session
+any authenticator
+PreparedRequest build_prepared_request(method,url,params,headers,json,data)
}
class RESTStream~TToken~ {
+str http_method
+dict~str,str~ http_headers
+bool payload_as_json
+BaseAPIPaginator~TToken~ or None get_new_paginator()
+HTTPRequest get_http_request(context:HTTPRequestContext)
+PreparedRequest _prepare_request(context:Context or None,paginator:BaseAPIPaginator)
+PreparedRequest prepare_request(context:Context or None,next_page_token:TToken or None)
+JSONPayload prepare_request_payload(context:Context or None,next_page_token:TToken or None)
+Iterable~dict~ request_records(context:Context or None)
}
class BaseAPIPaginator~TToken~ {
<<abstract>>
+TToken or None current_value
+bool finished
+void advance(response)
}
class LegacyStreamPaginator~TToken~ {
}
class JSONPathPaginator {
}
class SimpleHeaderPaginator {
}
class OffsetPaginator {
+int page_size
}
HTTPRequestContext~TPag~ --> HTTPRequest : used_by_get_http_request
RESTStream~TToken~ --> HTTPRequest : returns
RESTStream~TToken~ --> HTTPRequestContext~BaseAPIPaginator~ : constructs
_HTTPStream~TToken~ <|-- RESTStream~TToken~
BaseAPIPaginator~TToken~ <|-- LegacyStreamPaginator~TToken~
BaseAPIPaginator~TToken~ <|-- JSONPathPaginator
BaseAPIPaginator~TToken~ <|-- SimpleHeaderPaginator
BaseAPIPaginator~TToken~ <|-- OffsetPaginator
RESTStream~TToken~ --> BaseAPIPaginator~TToken~ : get_new_paginator
HTTPRequestContext~TPag~ --> BaseAPIPaginator~TToken~ : paginator
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
tap-dummyjsonRESTStream.get_http_request
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3483 +/- ##
=======================================
Coverage 94.43% 94.43%
=======================================
Files 72 72
Lines 5890 5926 +36
Branches 727 730 +3
=======================================
+ Hits 5562 5596 +34
- Misses 231 232 +1
- Partials 97 98 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Documentation build overview
Show files changed (10 files in total): 📝 8 modified | ➕ 2 added | ➖ 0 deleted
|
RESTStream.get_http_requestRESTStream.get_http_request method to construct the HTTP request for a stream
…pagination Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
…headers Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
… in case developer is overridin `build_prepared_request()` Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
4e65c28 to
0bfa839
Compare
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
a1b8929 to
5f35372
Compare
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
5f35372 to
4f380b1
Compare
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
…eam.get_http_request` (#3509) SSIA ## Summary by Sourcery Deprecate RESTStream.prepare_request in favor of RESTStream.get_http_request and update pagination handling to operate on paginator instances instead of raw page tokens. New Features: - Allow HTTPRequestContext to carry a paginator instance, enabling richer pagination-aware request construction. - Expose the page_size attribute on BaseOffsetPaginator for use in request parameter building. Enhancements: - Adjust REST stream request construction to use paginator.current_value via the new HTTPRequestContext paginator field. - Introduce an internal helper to build PreparedRequest objects from HTTPRequestContext while keeping backward compatibility with overridden prepare_request implementations. - Tighten typing on get_new_paginator and related HTTPRequestContext generics to bind them to concrete paginator types across the SDK and example taps. - Update example taps and tests to construct HTTP requests using paginator instances rather than raw next_page tokens. Tests: - Adapt REST pagination tests to the new HTTPRequestContext paginator API and ensure backward compatibility paths are covered. Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
…n favor of `PageNumberPaginator` and `OffsetPaginator`, respectively (#3510) These were made "base" classes with the expectation that users would override them, but #1918 made that redundant. ## Related - #1918 - #2989 - #3483 ## Summary by Sourcery Deprecate the base paginator classes in favor of concrete PageNumberPaginator and OffsetPaginator and update usages, tests, and docs accordingly. New Features: - Introduce PageNumberPaginator and OffsetPaginator as the primary paginator implementations for page-number and offset-based pagination. - Add a shared singer_sdk_deprecated decorator helper preconfigured with the SDK's deprecation warning type. Bug Fixes: - Tighten paginator-related tests by asserting concrete types and cleaning up header handling in header-based pagination tests. Enhancements: - Mark BasePageNumberPaginator and BaseOffsetPaginator as deprecated wrappers around the new paginator classes, exposing page_size as a property on OffsetPaginator. - Update REST stream examples, cookiecutter templates, and dummyjson tap client to use the new paginator classes and type hints, including explicit override annotations and typed Context usage. Documentation: - Revise pagination documentation and cookiecutter guidance to recommend PageNumberPaginator and OffsetPaginator, and add reference stubs for their API documentation. Tests: - Extend and adjust pagination tests to cover deprecation warnings for the base paginator classes and to use the new PageNumberPaginator and OffsetPaginator implementations. ## Summary by Sourcery Deprecate legacy base pagination classes in favor of concrete page-number and offset paginator implementations and update usages, tests, and docs accordingly. New Features: - Introduce PageNumberPaginator and OffsetPaginator as the primary page-number and offset-based paginator implementations. - Add a shared singer_sdk_deprecated helper preconfigured with the SDK's deprecation warning type for consistent deprecation messaging. Bug Fixes: - Tighten header-based and link-based pagination tests by asserting concrete types and correcting header handling edge cases. Enhancements: - Mark BasePageNumberPaginator and BaseOffsetPaginator as deprecated wrappers around the new paginator classes, exposing page_size as a property on OffsetPaginator. - Update REST stream examples, cookiecutter templates, and the dummyjson tap client to use the new paginator classes, override annotations, and typed Context parameters. Documentation: - Revise pagination guides and cookiecutter documentation to recommend PageNumberPaginator and OffsetPaginator and add API reference stubs for their classes. Tests: - Extend pagination tests to cover deprecation warnings for the base paginator classes and to exercise the new concrete paginator implementations. --------- Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- The new
HTTPRequest.encode_params()assumesself.paramsis a mapping, which breaks the legacyget_url_paramscontract that allowed returning a pre-encodedstr; consider either preserving the oldprepare_requestbehavior for taps that only overrideget_url_params, or teachingHTTPRequest/get_http_requestto handle a stringparamsvalue for backward compatibility. HTTPRequestContextrequires a non-optional paginator type parameter (_TPagbounded toBaseAPIPaginator), but some usages conceptually allowNone(e.g. unpaginated streams/tests); you may want to makepaginatoroptional in the context or relax the generic bound so type hints accurately reflect valid runtime states.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `HTTPRequest.encode_params()` assumes `self.params` is a mapping, which breaks the legacy `get_url_params` contract that allowed returning a pre-encoded `str`; consider either preserving the old `prepare_request` behavior for taps that only override `get_url_params`, or teaching `HTTPRequest`/`get_http_request` to handle a string `params` value for backward compatibility.
- `HTTPRequestContext` requires a non-optional paginator type parameter (`_TPag` bounded to `BaseAPIPaginator`), but some usages conceptually allow `None` (e.g. unpaginated streams/tests); you may want to make `paginator` optional in the context or relax the generic bound so type hints accurately reflect valid runtime states.
## Individual Comments
### Comment 1
<location path="singer_sdk/streams/rest.py" line_range="102-110" />
<code_context>
+ #: ASCII characters that should not be quoted when encoding URL parameters.
+ safe_query_chars: str = ""
+
+ def _get_url_params(self) -> list[tuple[str, t.Any]]:
+ """Encode parameters as a list of tuples.
+
+ Returns:
+ A list of tuples of encoded parameters.
+ """
+ result: list[tuple[str, t.Any]] = []
+ for k, v in self.params.items():
+ values = [v] if not isinstance(v, list) else v
+ result.extend((k, v) for v in values if v is not None)
+ return result
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Broaden handling of multi-valued parameters beyond just `list`.
`_get_url_params` currently special-cases only `list` for multi-valued params. If a caller uses another sequence type like a `tuple`, it will be treated as a scalar and encoded via `str()`, leading to different behavior from other parts of the SDK that already accept tuples (e.g. `JSONPayload`). Consider treating any non-string `Sequence` (or at least adding `tuple`) as multi-valued to keep encoding consistent and avoid subtle issues for consumers passing tuples.
Suggested implementation:
```python
def _get_url_params(self) -> list[tuple[str, t.Any]]:
"""Encode parameters as a list of tuples.
Returns:
A list of tuples of encoded parameters.
"""
result: list[tuple[str, t.Any]] = []
for k, v in self.params.items():
if v is None:
continue
# Treat any non-string sequence as multi-valued (e.g. list, tuple).
try:
from collections.abc import Sequence # type: ignore[import]
except Exception: # pragma: no cover - fallback if import location changes
Sequence = tuple # type: ignore[assignment]
if isinstance(v, Sequence) and not isinstance(v, (str, bytes, bytearray)):
values = v
else:
values = [v]
result.extend((k, item) for item in values if item is not None)
return result
```
To keep imports clean and avoid the fallback import inside the loop, you may want to:
1. Add `from collections.abc import Sequence` near the top of `rest.py` with the other imports.
2. After that, simplify the function by removing the `try/except` and using `Sequence` directly:
- Remove the `try/except` block and `Sequence = tuple` fallback.
- Use just: `if isinstance(v, Sequence) and not isinstance(v, (str, bytes, bytearray)):`.
If the codebase has a shared typing/import convention (e.g. already importing `Sequence` elsewhere), align this change with that convention instead of the inline import.
</issue_to_address>
### Comment 2
<location path="packages/meltano-tap-dummyjson/tap_dummyjson/client.py" line_range="63-74" />
<code_context>
@override
- def get_url_params(
+ def get_http_request(
self,
- context: Context | None,
- next_page_token: int | None,
- ) -> dict[str, int | None]:
- return {
- "skip": next_page_token,
- "limit": PAGE_SIZE,
+ *,
+ context: HTTPRequestContext[OffsetPaginator],
+ ) -> HTTPRequest:
+ request = super().get_http_request(context=context)
+ request.params = {
+ "skip": context.paginator.current_value,
+ "limit": context.paginator.page_size,
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid clobbering base query params when customizing pagination.
This override replaces `request.params` instead of extending it, so any defaults from the base class (or future additions) are lost. To keep base behavior while adjusting pagination, update the existing params instead, e.g. `request.params.update({"skip": ..., "limit": ...})`.
```suggestion
@override
def get_http_request(
self,
*,
context: HTTPRequestContext[OffsetPaginator],
) -> HTTPRequest:
request = super().get_http_request(context=context)
# Ensure params is a dict so we can safely update it
if request.params is None:
request.params = {}
request.params.update(
{
"skip": context.paginator.current_value,
"limit": context.paginator.page_size,
}
)
return request
```
</issue_to_address>
### Comment 3
<location path="packages/meltano-tap-dummyjson/tap_dummyjson/client.py" line_range="77-74" />
<code_context>
@override
- def post_process(
+ def prepare_request(
self,
- row: Record,
- context: Context | None = None,
- ) -> Record | None:
- return row
+ context: Context | None,
+ next_page_token: int | None,
+ ) -> requests.PreparedRequest:
+ request = super().prepare_request(context, next_page_token)
+ if next_page_token is not None:
+ request.headers["X-Custom-Header"] = "value"
+ return request
</code_context>
<issue_to_address>
**issue:** Overriding `prepare_request` defeats the new `get_http_request` flow and triggers deprecation warnings.
With `request_records`, overriding `prepare_request` keeps you on the deprecated path and bypasses `_prepare_request`/`get_http_request`, so your `get_http_request` override is never used. To follow the new API and avoid deprecation warnings, move the header logic into `get_http_request` (e.g. `request.headers["X-Custom-Header"] = "value"`) and drop the `prepare_request` override.
</issue_to_address>
### Comment 4
<location path="tests/core/rest/test_request.py" line_range="6-15" />
<code_context>
+from singer_sdk.streams.rest import HTTPRequest
+
+
+def test_custom_safe_chars():
+ request = HTTPRequest(
+ method="GET",
+ url="https://example.com/api",
+ headers={"Authorization": "Bearer token"},
+ params={"key": "(abc)"},
+ safe_query_chars="()",
+ )
+
+ assert request.encode_params() == "key=(abc)"
</code_context>
<issue_to_address>
**suggestion (testing):** Broaden coverage of `HTTPRequest.encode_params` to include lists, `None` values, and default quoting behavior.
Consider expanding the tests to cover:
- Multiple params including some with `None` values, asserting `None` entries are omitted and ordering is preserved.
- A param whose value is a list (e.g., `{"key": ["a", "b"]}`) to verify `doseq=True` behavior.
- A call without `safe_query_chars`, asserting characters like `(` and `)` are percent-encoded.
This will exercise `_get_url_params`/`encode_params` across key edge cases.
</issue_to_address>
### Comment 5
<location path="docs/guides/pagination-classes.md" line_range="43-45" />
<code_context>
+ def get_http_request(self, *, context):
+ request = super().get_http_request(context=context)
+ if context.paginator.current_value:
+ # Next page token is a URL, so we can to set the request URL directly
+ request.url = context.paginator.current_value.geturl()
+ return request
</code_context>
<issue_to_address>
**issue (typo):** Fix small grammar issue in this comment text.
"so we can to set" is ungrammatical; please change it to "so we can set the request URL directly".
```suggestion
if context.paginator.current_value:
# Next page token is a URL, so we can set the request URL directly
request.url = context.paginator.current_value.geturl()
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
HTTPRequestContextgeneric is bounded toBaseAPIPaginator, but several usages (e.g.HTTPRequestContext[None]in tests) violate that bound and will upset static type checkers; consider either loosening the bound or updating those call sites to use a paginator-compatible type parameter. - In
EpicIssuesStream.get_http_request, you now readcontext.stream_contextdirectly; it might be worth adding a short comment or helper to make it clearer thatstream_contextcan beNoneand that the method assumes anepic_id-bearing context is always provided for this stream.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `HTTPRequestContext` generic is bounded to `BaseAPIPaginator`, but several usages (e.g. `HTTPRequestContext[None]` in tests) violate that bound and will upset static type checkers; consider either loosening the bound or updating those call sites to use a paginator-compatible type parameter.
- In `EpicIssuesStream.get_http_request`, you now read `context.stream_context` directly; it might be worth adding a short comment or helper to make it clearer that `stream_context` can be `None` and that the method assumes an `epic_id`-bearing context is always provided for this stream.
## Individual Comments
### Comment 1
<location path="docs/guides/pagination-classes.md" line_range="44" />
<code_context>
- return params
+ def get_http_request(self, *, context):
+ request = super().get_http_request(context=context)
+ if context.paginator.current_value:
+ # Next page token is a URL, so we can to set the request URL directly
+ request.url = context.paginator.current_value.geturl()
+ return request
</code_context>
<issue_to_address>
**issue (typo):** Fix the grammatical error "can to" in this comment (appears twice in this file).
The phrase is currently "so we can to set the request URL directly"; update it to "so we can set the request URL directly" in both occurrences in this file.
```suggestion
# Next page token is a URL, so we can set the request URL directly
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
request_records, the deprecation check forprepare_requestis performed on every pagination loop iteration, which will emit a warning per page; consider moving thetype(self).prepare_request is not _HTTPStream.prepare_requestcheck (and warning) outside the loop so each stream only warns once per sync.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `request_records`, the deprecation check for `prepare_request` is performed on every pagination loop iteration, which will emit a warning per page; consider moving the `type(self).prepare_request is not _HTTPStream.prepare_request` check (and warning) outside the loop so each stream only warns once per sync.
## Individual Comments
### Comment 1
<location path="tests/core/rest/test_pagination.py" line_range="578-587" />
<code_context>
+def test_prepare_request_override_emits_deprecation(tap: Tap):
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert on the returned records to ensure the deprecated path still behaves correctly functionally.
You can mirror the pattern from `test_prepare_request_override_in_intermediate_class_emits_deprecation`, e.g.:
```python
with pytest.warns(SingerSDKDeprecationWarning, match="prepare_request"):
records = list(stream.request_records(context=None))
assert records == [{"id": 1}]
```
This ensures the deprecated path not only warns but still issues the request and parses the response correctly.
Suggested implementation:
```python
def test_prepare_request_override_emits_deprecation(tap: Tap):
"""Validate that overriding prepare_request emits a deprecation warning."""
class MyAPIStream(RESTStream):
name = "my-api-stream"
url_base = "https://my.api.test"
path = "/path/to/resource"
schema: t.ClassVar[dict] = {
"type": "object",
"properties": {"id": {"type": "integer"}},
}
stream = MyAPIStream(tap=tap)
with pytest.warns(SingerSDKDeprecationWarning, match="prepare_request"):
records = list(stream.request_records(context=None))
assert records == [{"id": 1}]
```
This change assumes:
1. `pytest` and `SingerSDKDeprecationWarning` are already imported in `tests/core/rest/test_pagination.py`. If not, add them to the imports at the top of the file.
2. The test or shared fixtures already configure the HTTP response for `MyAPIStream` so that `request_records` yields a single record `{"id": 1}`. If not, mirror the response mocking setup used in `test_prepare_request_override_in_intermediate_class_emits_deprecation` so that the functional assertion on `records` passes.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
4b77e37 to
1ca4ea0
Compare
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
|
@dependabot rebase |
Related
RESTStream.get_http_requestreservoir-data/tap-bitly#494Summary by Sourcery
Introduce a dedicated HTTPRequest abstraction in the REST stream layer and update tap-dummyjson to use it for request construction and parameter handling.
New Features:
Enhancements:
Tests:
Summary by Sourcery
Introduce an HTTPRequest abstraction and context for RESTStream to construct and customize HTTP requests, and migrate pagination and client patterns to use it instead of get_url_params while deprecating prepare_request overrides.
New Features:
Enhancements:
Documentation:
Tests: