Description
The default healthcheck for standalone database resources uses psql -U postgres -d postgres -c 'SELECT 1' with a 5-second interval. Under I/O stall conditions (e.g., cloud provider maintenance), this causes a cascading failure: each healthcheck opens a full PostgreSQL connection that never completes, piling up connections at 12 per minute. Within ~8 minutes the default max_connections = 100 is exhausted, locking out all services. In our incident the server load spiked to 208, rendering the entire server unresponsive, including services with their own separate databases not connected to the affected instance. Neither docker restart, docker kill, nor kill -9 could recover the container; only a full server reboot resolved it.
The healthcheck — which should detect problems — becomes the primary cause of the outage.
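The ~8 minute figure follows directly from the defaults (a back-of-the-envelope sketch, assuming every probe hangs and holds its slot):

```shell
# One stuck healthcheck connection every 5 s against a pool of 100 slots:
interval_s=5
max_conn=100
echo "$(( interval_s * max_conn / 60 )) minutes to exhaustion"
```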
Error Message and Logs
PostgreSQL log (from dependent services and direct connection attempts):
FATAL: sorry, too many clients already
Healthcheck processes visible on the host (ps -efa | grep 2077, where 2077 is the postgres main PID):
root 2011149 2077 0 08:47 ? 00:00:00 /usr/lib/postgresql/17/bin/psql -U postgres -d postgres -c SELECT 1
999 2011150 2077 0 08:47 ? 00:00:00 postgres: postgres postgres [local] initializing
root 2011455 2077 0 08:47 ? 00:00:00 /usr/lib/postgresql/17/bin/psql -U postgres -d postgres -c SELECT 1
999 2011456 2077 0 08:47 ? 00:00:00 postgres: postgres postgres [local] initializing
[... 144 such pairs accumulated over ~12 minutes ...]
Each psql healthcheck spawns a postgres backend in "initializing" state that never completes. These are not cleaned up because the 5-second timeout does not apply when the underlying I/O is blocked (process enters uninterruptible D-state).
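To confirm on the host that the stuck backends are in uninterruptible sleep (and therefore immune to Docker's timeout, docker kill, and kill -9), a generic check like the following can be used; this is a sketch, and the exact ps column formatting may vary by distribution:

```shell
# List processes whose STAT field begins with "D" (uninterruptible
# sleep). Such processes ignore SIGKILL until the blocked I/O returns.
ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'
```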
Steps to Reproduce
- Deploy a standalone PostgreSQL database via Coolify (default healthcheck settings apply automatically)
- Simulate an I/O stall on the database volume (or wait for cloud provider maintenance that blocks volume I/O)
- Observe: psql SELECT 1 healthcheck processes accumulate every 5 seconds, each holding a connection in "initializing" state
- After ~8 minutes (with default max_connections = 100), all connection slots are consumed by healthcheck processes
- All services depending on the database receive FATAL: sorry, too many clients already
- Server load climbs into the hundreds; the container becomes unrecoverable via docker restart/kill
The hardcoded healthcheck configuration from the generated docker-compose.yml:
healthcheck:
  test: ["CMD-SHELL", "psql -U postgres -d postgres -c 'SELECT 1' || exit 1"]
  interval: 5s
  timeout: 5s
  retries: 10
  start_period: 5s
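A more defensive configuration (a sketch of one possible fix, not Coolify's actual generated output) would probe with pg_isready, which checks server responsiveness without running a query or completing authentication, and would use a longer interval so that hanging probes accumulate connections far more slowly:

```yaml
healthcheck:
  # pg_isready reports server status without executing SQL, so a
  # stuck probe ties up less backend state than a full psql session.
  test: ["CMD-SHELL", "pg_isready -U postgres -d postgres"]
  interval: 30s   # 2 connections/min instead of 12 if probes hang
  timeout: 5s
  retries: 3
  start_period: 10s
```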
Example Repository URL
No response
Coolify Version
v4.0.0-beta.473
Are you using Coolify Cloud?
No (self-hosted)
Operating System and Version (self-hosted)
Ubuntu 24.04
Additional Information
Workaround attempted: Editing /data/coolify/databases/<uuid>/docker-compose.yml directly does not persist — Coolify regenerates this file from its internal database on every restart/redeploy. The "Custom PostgreSQL Configuration" field in the UI applies to postgresql.conf, not to Docker healthcheck parameters.
Related: The same class of problem was fixed for Nextcloud in #9440 (fix(service): nextcloud workers exhaustion due to low interval healthcheck). The standalone database resources have the same vulnerability.