Page size issue in /search endpoint #1167

@EleniLouvari

I would like to report an issue I found while testing the /search endpoint.
No matter what I do, I always get 100 records back, which seems to be the page size. I tested this both by calling the /search endpoint directly and through the pystac-client library.
For example, I tested a collection that has 2606 items.
When I use the /collections/{cid}/items endpoint, I get the full set (2606 items).
With /search, however, I only ever get 100 items. One of my tests is the following:

from typing import Any, Dict, Iterable

import requests


def iter_search_items(
    session: requests.Session,
    base_url: str,
    search_body: Dict[str, Any],
    limit: int = 100
) -> Iterable[Dict[str, Any]]:
    """Iterate over all items returned by POST /search, following rel=next links.

    This version assumes the backend returns a STAC ItemCollection with:
    - "features": [...]
    - "links": [{"rel": "next", "href": "https://.../collections/metadata:main/items?...&offset=100"}, ...]
    """

    base_url = base_url.rstrip("/")
    url = f"{base_url}/search"

    body = dict(search_body)
    body.setdefault("limit", limit)

    # First page: POST /search
    print(f"POST {url} with body={body}")
    resp = session.post(url, json=body, timeout=60)
    resp.raise_for_status()

    page_idx = 0

    while True:
        data = resp.json()
        page_idx += 1

        features = data.get("features") or []
        links = data.get("links") or []

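        # Build the list first: nesting double quotes inside an f-string is a
        # syntax error before Python 3.12.
        rels = [l.get("rel") for l in links]
        print(f"Page: {page_idx}, features: {len(features)}, link rels: {rels}")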

        for feat in features:
            yield feat

        # Find next link
        next_link = next((l for l in links if l.get("rel") == "next"), None)
        if not next_link:
            print(f"No 'next' link on page {page_idx} - stopping")
            break

        next_href = next_link.get("href")
        if not next_href:
            print(f"Next link on page {page_idx} has no 'href' - stopping")
            break

        print(f"GET next page: {next_href}")
        resp = session.get(next_href, timeout=60)
        resp.raise_for_status()

And I call it like this:

session = get_auth_session(client_id)
search_body = {"collections": [collection_id]}  # plus bbox, datetime, etc
item_ids = [feat["id"] for feat in iter_search_items(session, base_url, search_body, limit=2000)]
print(f"Total items: {len(item_ids)}")

The result is:

POST https://stac-catalogue.xxxxxxx/search with body={'collections': ['ARD_S2'], 'limit': 2000}
Page: 1, features: 100, link rels: ['self', 'collection', 'first', 'next', 'root']
GET next page: https://stac-catalogue.xxxxxx/collections/metadata:main/items?limit=2000&collections=%5B%27ARD_S2%27%5D&offset=100
Page: 2, features: 0, link rels: ['self', 'collection', 'root']
No 'next' link on page 2 - stopping
Total items: 100

So:
The first /search call returns 100 items, even though the request body sets limit to 2000, and includes a rel="next" link.
Following that rel="next" link returns 0 features and no further next link.
The iteration therefore stops at 100 items in total, even though /collections/ARD_S2/items exposes 2606 items.
Am I missing something about how /search is supposed to work, or is the endpoint not behaving as expected?
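
In case it helps with triage, here is a minimal sketch that decodes the rel="next" href from the output above using only the standard library. The href is copied verbatim from my log (including the redacted host). The variable names are just for illustration, and I have not confirmed on the server side which of these details actually causes the empty second page.

```python
from urllib.parse import parse_qs, urlsplit

# The rel="next" href, copied verbatim from the log output above.
next_href = (
    "https://stac-catalogue.xxxxxx/collections/metadata:main/items"
    "?limit=2000&collections=%5B%27ARD_S2%27%5D&offset=100"
)

parts = urlsplit(next_href)
# parse_qs returns a list per key; each key appears once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)                          # /collections/metadata:main/items
print(params["collections"])               # ['ARD_S2']
print(params["limit"], params["offset"])   # 2000 100
```

Two things stand out to me. The next link points at /collections/metadata:main/items rather than at /search, and the collections parameter decodes to the literal string ['ARD_S2'] (a Python list repr) rather than the plain collection id ARD_S2. Either of these might explain why the second page comes back empty, but that is a guess on my part.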
