[Bug]: Database healthcheck causes connection stampede and complete server lockout under I/O stall #9939

@itsbrody

Description

The default healthcheck for standalone database resources runs psql -U postgres -d postgres -c 'SELECT 1' at a 5-second interval. Under I/O stall conditions (e.g., cloud provider maintenance), this causes a cascading failure: each healthcheck opens a full PostgreSQL connection that never completes, accumulating 12 connections per minute. Within ~8 minutes the default max_connections = 100 is exhausted, locking out all services. In our case the server load spiked to 208, rendering the entire server unresponsive, including services with their own separate databases that were not connected to the affected instance. Neither docker restart, docker kill, nor kill -9 could recover the container; only a full server reboot resolved it.

The healthcheck, which should detect problems, instead becomes the primary cause of the outage.
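As a monitoring sketch (the process-name pattern is an assumption about the default PostgreSQL image), the backend count can be watched from the host even while the server is refusing new client connections:

```shell
# Count live PostgreSQL backend processes from the host. This works
# even when "too many clients" blocks new connections, since it reads
# the process table rather than connecting. The '^postgres: ' pattern
# assumes the default process titles set by the postmaster.
ps -eo cmd | grep -c '^postgres: ' || true
```

During the incident this count climbs steadily toward max_connections instead of hovering near the normal working-set size.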

Error Message and Logs

PostgreSQL log (from dependent services and direct connection attempts):

FATAL: sorry, too many clients already

Healthcheck processes visible on the host (ps -efa | grep 2077, where 2077 is the postgres main PID):

root  2011149  2077  0 08:47 ?  00:00:00 /usr/lib/postgresql/17/bin/psql -U postgres -d postgres -c SELECT 1
999   2011150  2077  0 08:47 ?  00:00:00 postgres: postgres postgres [local] initializing
root  2011455  2077  0 08:47 ?  00:00:00 /usr/lib/postgresql/17/bin/psql -U postgres -d postgres -c SELECT 1
999   2011456  2077  0 08:47 ?  00:00:00 postgres: postgres postgres [local] initializing
[... 144 such pairs accumulated over ~12 minutes ...]

Each psql healthcheck spawns a postgres backend in "initializing" state that never completes. These are not cleaned up because the 5-second timeout cannot kill a process blocked on storage I/O: the process enters uninterruptible sleep (D state), where even SIGKILL is deferred until the I/O completes.
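To confirm that the stuck backends are in uninterruptible sleep, the process table can be filtered on the D state flag (a sketch; column formatting varies slightly between ps versions):

```shell
# Print the ps header plus any processes whose state starts with 'D'
# (uninterruptible sleep). During the stall, the hung psql healthchecks
# and their "initializing" postgres backends show up here.
ps -eo pid,stat,wchan:20,cmd | awk 'NR==1 || $2 ~ /^D/'
```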

Steps to Reproduce

  1. Deploy a standalone PostgreSQL database via Coolify (default healthcheck settings apply automatically)
  2. Simulate an I/O stall on the database volume (or wait for cloud provider maintenance that blocks volume I/O)
  3. Observe: psql SELECT 1 healthcheck processes accumulate every 5 seconds, each holding a connection in "initializing" state
  4. After ~8 minutes (with default max_connections = 100), all connection slots are consumed by healthcheck processes
  5. All services depending on the database receive FATAL: sorry, too many clients already
  6. Server load climbs into the hundreds; the container becomes unrecoverable via docker restart/kill
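The timeline in steps 3 and 4 follows from simple arithmetic:

```shell
# With a 5 s interval, hung checks accumulate at 60/5 = 12 connections
# per minute; exhausting the default max_connections = 100 therefore
# takes 100/12 ≈ 8.3 minutes, matching the observed ~8-minute lockout.
awk 'BEGIN {
  rate = 60 / 5                                 # stuck connections per minute
  printf "rate: %d/min\n", rate
  printf "time to exhaustion: %.1f min\n", 100 / rate
}'
```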

The hardcoded healthcheck configuration from the generated docker-compose.yml:

healthcheck:
  test: ["CMD-SHELL", "psql -U postgres -d postgres -c 'SELECT 1' || exit 1"]
  interval: 5s
  timeout: 5s
  retries: 10
  start_period: 5s
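As a possible mitigation (a sketch, not a tested Coolify default; all parameter values here are illustrative), pg_isready performs only the connection-startup handshake and exits without executing SQL, and a longer interval bounds how fast hung checks can accumulate:

healthcheck:
  # pg_isready exits after the startup handshake instead of running a
  # query; a 30s interval caps accumulation at 2 connections/minute
  # even if every single check hangs.
  test: ["CMD-SHELL", "pg_isready -U postgres -d postgres"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 10s

This does not eliminate the failure mode under a sustained I/O stall, since hung checks may still hold connection slots, but it slows exhaustion from ~8 minutes to roughly 50 minutes, leaving time to intervene.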

Example Repository URL

No response

Coolify Version

v4.0.0-beta.473

Are you using Coolify Cloud?

No (self-hosted)

Operating System and Version (self-hosted)

Ubuntu 24.04

Additional Information

Workaround attempted: editing /data/coolify/databases/<uuid>/docker-compose.yml directly does not persist, because Coolify regenerates this file from its internal database on every restart/redeploy. The "Custom PostgreSQL Configuration" field in the UI applies to postgresql.conf, not to the Docker healthcheck parameters.

Related: The same class of problem was fixed for Nextcloud in #9440 (fix(service): nextcloud workers exhaustion due to low interval healthcheck). The standalone database resources have the same vulnerability.

Metadata

Labels: 🔍 Triage (Issues that need assessment and prioritization)