fix: add fix for slow full count in postgresql #2174

mdearos · 2025-12-03T15:33:45Z

Overview

This commit adds a fix for the slow performance of full counts in PostgreSQL (Issue #1969).

To achieve this, two new PostgreSQL provider settings have been added (postgresql_pseudo_count_enabled and postgresql_pseudo_count_start), allowing the use of pseudo counts to be configured individually on each use of the PostgreSQL provider.

The fix uses the PostgreSQL EXPLAIN command to "guess" (estimate) the number of rows that will be returned by a given request. This does not apply to all queries, however, because pseudo counts cannot be used for the following types of request:

- Requests with a Result Type of Hits.
- Requests with a CQL filter.
- Requests with a BBOX filter.
- Requests with a Temporal filter.
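
As a rough illustration of the approach described above (not the code in this PR), the sketch below checks whether a request is eligible for a pseudo count and reads the planner's row estimate from the output of EXPLAIN (FORMAT JSON). The function and parameter names are hypothetical. Only the "Plan" and "Plan Rows" keys are taken from PostgreSQL's JSON EXPLAIN output.

```python
import json


def can_use_pseudo_count(resulttype="results", cql_filter=None,
                         bbox=None, datetime_=None):
    """Return True only for requests without the exclusions listed above.

    Hypothetical helper: the parameter names are assumptions.
    """
    if resulttype == "hits":
        return False
    return cql_filter is None and not bbox and datetime_ is None


def estimated_rows(explain_output):
    """Extract the planner's row estimate from EXPLAIN (FORMAT JSON) output.

    Accepts either the raw JSON text or the already-decoded list.
    """
    if isinstance(explain_output, str):
        plans = json.loads(explain_output)
    else:
        plans = explain_output
    # The top-level plan node carries the estimate for the whole query.
    return int(plans[0]["Plan"]["Plan Rows"])
```

The estimate comes from planner statistics, so it can differ from the true row count, especially when the table has not been analyzed recently.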

You can also use the postgresql_pseudo_count_start setting to tell the system to run a full count when the row estimate is too small, meaning that a full count can be run quickly enough.

This commit also adds the required documentation and PostgreSQL provider test changes, including adding building_type and datetime columns to the dummy_data.sql file.
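
A minimal sketch of that threshold decision follows. It assumes the estimate is compared with postgresql_pseudo_count_start using a strict less-than, which the description does not specify. The function name and the full_count callback are hypothetical.

```python
def choose_count(estimate, pseudo_count_start, full_count):
    """Pick between the planner estimate and an exact count.

    estimate: row estimate taken from EXPLAIN.
    pseudo_count_start: threshold below which an exact count is run
        (assumed strict comparison).
    full_count: zero-argument callable that runs the exact COUNT query.
    Returns (count, is_exact).
    """
    if estimate < pseudo_count_start:
        # Small result set: an exact count should be cheap enough to run.
        return full_count(), True
    # Large result set: report the estimate and skip the expensive count.
    return estimate, False
```

For example, with a threshold of 1000, an estimate of 500 triggers the exact count, while an estimate of 2,000,000 is returned as is.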

Additional information

Dependency policy (RFC2)

I have ensured that this PR meets RFC2 requirements

Updates to public demo

I have ensured that breaking changes to the pygeoapi master demo server have been addressed
https://github.com/geopython/demo.pygeoapi.io/blob/master/services/pygeoapi_master/local.config.yml

Contributions and licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

I'd like to contribute the bugfix "Slow Query Performance in Postgres Provider Due to Full count on Large Tables" to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution.
I have already previously agreed to the pygeoapi Contributions and Licensing


webb-ben · 2025-12-06T01:34:57Z

I wonder if a Postgres-specific addition to the provider block is the correct approach if we want to maintain a rigid pygeoapi config schema. This appears to port some of the logic implemented in a pseudocount-specific pygeoapi Postgres provider.

I wonder if there is a configuration option / solution that could be used across all pygeoapi providers, given that numberMatched is not required by the specification. Is it better to have no count, or an incorrect one?

mikemahoney218-usgs · 2025-12-08T14:29:15Z

Sharing my two cents: for our deployment, we decided that no count was the better option, so none of our postgresql-backed endpoints have counts. We evaluated the same pseudocount implementation and decided that it didn't give enough benefits (both in performance and in allowing users to predict result sizes -- because the count isn't accurate, so you'll likely still need to incrementally grow your result set) versus removing numberMatched in line with the specification. So our deployment has a flag in the providers block of each resource, number_matched_for_results_enabled, controlling if /items queries return numberMatched -- as well as a matching number_matched_for_hits_enabled controlling if resulttype=hits is allowed at all.

Edit to add: Though unlike Ben's comment above, I do prefer controlling this on a resource level, rather than at the server level; we have other data providers that don't have the same trouble counting records (particularly for small tables) and it's nice to enable numberMatched there. We haven't had anyone comment on the inconsistency yet. We also enable resulttype=hits for most resources, and only disable it for particularly large tables.

tomkralidis requested review from justb4, tomkralidis and webb-ben December 4, 2025 20:04
