Description
I would like to report an issue I found while testing the /search endpoint.
Whatever parameters I pass, I never get more than 100 records in total (100 appears to be the page size). I tested this both by calling the /search endpoint directly and through the pystac-client library.
For example, I tested a collection containing 2606 items.
When I use the /collections/{cid}/items endpoint, I get the full set (2606 items).
With /search, however, I only ever get 100 items. One of my tests is the following:
from typing import Any, Dict, Iterable

import requests


def iter_search_items(
    session: requests.Session,
    base_url: str,
    search_body: Dict[str, Any],
    limit: int = 100,
) -> Iterable[Dict[str, Any]]:
    """Iterate over all items returned by POST /search, following rel=next links.

    This version assumes the backend returns a STAC ItemCollection with:
    - "features": [...]
    - "links": [{"rel": "next", "href": "https://.../collections/metadata:main/items?...&offset=100"}, ...]
    """
    base_url = base_url.rstrip("/")
    url = f"{base_url}/search"
    body = dict(search_body)
    body.setdefault("limit", limit)

    # First page: POST /search
    print(f"POST {url} with body={body}")
    resp = session.post(url, json=body, timeout=60)
    resp.raise_for_status()

    page_idx = 0
    while True:
        data = resp.json()
        page_idx += 1
        features = data.get("features") or []
        links = data.get("links") or []
        # Build the list first: nested double quotes inside an f-string
        # are a syntax error before Python 3.12.
        rels = [link.get("rel") for link in links]
        print(f"Page: {page_idx}, features: {len(features)}, link rels: {rels}")
        for feat in features:
            yield feat

        # Find the next link, if any
        next_link = next((link for link in links if link.get("rel") == "next"), None)
        if not next_link:
            print(f"No 'next' link on page {page_idx} - stopping")
            break
        next_href = next_link.get("href")
        if not next_href:
            print(f"Next link on page {page_idx} has no 'href' - stopping")
            break

        # Subsequent pages: GET the href of the next link
        print(f"GET next page: {next_href}")
        resp = session.get(next_href, timeout=60)
        resp.raise_for_status()
And I call it like this:
session = get_auth_session(client_id)
search_body = {"collections": [collection_id]} # plus bbox, datetime, etc
item_ids = [feat["id"] for feat in iter_search_items(session, base_url, search_body, limit=2000)]
print(f"Total items: {len(item_ids)}")
The result is:
POST https://stac-catalogue.xxxxxxx/search with body={'collections': ['ARD_S2'], 'limit': 2000}
Page: 1, features: 100, link rels: ['self', 'collection', 'first', 'next', 'root']
GET next page: https://stac-catalogue.xxxxxx/collections/metadata:main/items?limit=2000&collections=%5B%27ARD_S2%27%5D&offset=100
Page: 2, features: 0, link rels: ['self', 'collection', 'root']
No 'next' link on page 2 - stopping
Total items: 100
So:
The first /search call returns 100 items and includes a rel="next" link.
Following that rel="next" link returns 0 features and no further next link.
The iteration therefore stops at 100 items total, even though /collections/ARD_S2/items exposes 2606 items.
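For reference, here is a small sketch that decodes the rel="next" link from the output above, using only the standard library. It shows that the link points at /collections/metadata:main/items rather than /search, that `collections` is sent as the Python repr of a list (`['ARD_S2']`), and that `limit=2000` is echoed back even though the first page held 100 features. I do not know whether any of these explains the empty second page; the snippet only makes the link contents easier to read.

```python
from urllib.parse import parse_qs, urlsplit

# The rel="next" href exactly as printed in the output above.
next_href = (
    "https://stac-catalogue.xxxxxx/collections/metadata:main/items"
    "?limit=2000&collections=%5B%27ARD_S2%27%5D&offset=100"
)

parts = urlsplit(next_href)
# parse_qs percent-decodes the values; each key appears once here,
# so keep only the first value per key.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)  # /collections/metadata:main/items
print(params)      # {'limit': '2000', 'collections': "['ARD_S2']", 'offset': '100'}
```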
Am I missing something about how /search is supposed to work, or is the endpoint not behaving as expected?
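One thing I have not ruled out on my side: my loop always follows the next link with a plain GET. As far as I understand the STAC API Item Search specification, a next link returned for a POST search may carry `method`, `body`, and `merge` fields, in which case the client is expected to repeat the POST. My test only printed the link rels, so I cannot say whether those fields were present here. Below is a minimal sketch of how a client could honor them; `next_request` is a hypothetical helper name of mine, not part of any library.

```python
from typing import Any, Dict, Optional, Tuple


def next_request(
    next_link: Dict[str, Any],
    previous_body: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str, Optional[Dict[str, Any]]]:
    """Return (method, url, json_body) for the page a rel="next" link describes.

    Assumption: the link object may carry the optional "method", "body"
    and "merge" fields; when "method" is absent, GET is used.
    """
    method = str(next_link.get("method") or "GET").upper()
    href = next_link["href"]
    if method != "POST":
        # GET pagination: everything needed is encoded in the href.
        return method, href, None

    body = dict(next_link.get("body") or {})
    if next_link.get("merge"):
        # merge=true: the link body only holds the fields that change,
        # so overlay it on the body of the previous request.
        body = {**(previous_body or {}), **body}
    return method, href, body
```

In the loop above, this would replace the hard-coded GET with something like `method, url, body = next_request(next_link, body)` followed by `resp = session.request(method, url, json=body, timeout=60)`.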