Description
Observed behavior
When hub-to-leaf traffic exceeds link capacity, the hub's write buffer saturates. PONG responses queue behind the data and never reach the leaf. The leaf detects a stale connection, disconnects, reconnects, and hits the same backlog. The cycle repeats indefinitely - data never transfers.
- NATS Server 2.12.2 (also tested on 2.10.22)
- Bandwidth-constrained leafnode link (~50kbit/s, simulating cellular)
Hub logs:
[INF] Slow Consumer Detected: WriteDeadline of 10s exceeded with 1 chunks of 2025807 total bytes.
[ERR] Leafnode Error 'Stale Connection'
Leaf logs (note cycling lid numbers):
[DBG] hub:7422 - lid:6 - Stale Client Connection - Closing
[INF] hub:7422 - lid:6 - Leafnode connection closed: Stale Connection - Remote: hub
[DBG] hub:7422 - lid:13 - Stale Client Connection - Closing
[INF] hub:7422 - lid:14 - Leafnode connection closed: Stale Connection - Remote: hub
Root Cause
Hub's write buffer saturation blocks PONG delivery:
- Hub's write buffer fills with data (hub -> leaf direction)
- Hub receives PING from leaf, queues PONG immediately (32us later)
- PONG stuck behind data - never delivered
- Leaf detects stale (no PONG after ping_max attempts), closes connection
- Leaf reconnects, hits same backlog, cycle repeats
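The divergence is also visible from the server monitoring endpoints. A rough sketch, assuming both containers expose the HTTP monitoring port (8222 on the hub, mapped to 8223 for the leaf here - both assumptions about the compose setup) and using the /leafz field names as I understand them:
# Hub side: out_msgs/out_bytes on its leafnode connection keep climbing
curl -s http://localhost:8222/leafz | jq '.leafs[] | {name, out_msgs, out_bytes, rtt}'
# Leaf side: in_msgs lags far behind the hub's out_msgs
curl -s http://localhost:8223/leafz | jq '.leafs[] | {name, in_msgs, in_bytes, rtt}'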
Verified Evidence
| Metric | Value |
|---|---|
| LMSGs queued by hub | 1917 |
| LMSGs received by leaf | 139 (7%) |
| PONGs sent by hub | 3 |
| PONGs received by leaf | 1 (33%) |
| Stale events on leaf | 8 |
| Connection cycles | 5 |
| Buffer backlog | ~29 seconds |
With ping_interval: 10s and ping_max: 2, stale triggers in 20 seconds - but 29 seconds of data sits ahead of the PONG.
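Raising the keepalive budget only postpones the problem, but for completeness here is a sketch of the relevant knobs (standard server config options; the values and file names are illustrative, not taken from the repro repo):
# Leaf: tolerate a longer gap before declaring the connection stale
cat >> leaf.conf << 'EOF'
ping_interval: "30s"
ping_max: 5
EOF
# Hub: allow more time to flush before "Slow Consumer Detected"
cat >> hub.conf << 'EOF'
write_deadline: "60s"
EOF
With a 29-second backlog this buys some headroom for this particular run, but any larger burst re-triggers the same cycle.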
Affected Scenarios
Any high-volume hub-to-leaf traffic over a slow link:
- JetStream mirrors/sources (bulk historical replication)
- Fan-out (hub publishes to leaf subscribers; see the sketch after this list)
- Large request/reply responses
- KV replication
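The fan-out case, for example, can be hit without JetStream at all - a sketch reusing the ports and credentials from the repro steps below (subject test.fanout is arbitrary):
# Subscriber on the leaf (4223), publisher on the hub (4222)
nats -s nats://app:[email protected]:4223 sub 'test.fanout' &
nats -s nats://app:[email protected]:4222 pub test.fanout \
  "$(head -c 1024 < /dev/zero | tr '\0' 'x')" --count=50000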
Expected behavior
Leafnode connection should remain stable during high-volume replication. PING/PONG keepalive should function regardless of data backlog, allowing slow links to eventually drain.
Server and client version
Server v2.12.2
Leafnode (client side) also v2.12.2
Host environment
Bandwidth-constrained leafnode link (~50kbit/s, simulating cellular)
Reproduced in Linux Docker containers on macOS (ARM)
Steps to reproduce
Will link repro repo in a comment.
docker compose up -d && sleep 5
# Create stream with 50MB test data
nats -s nats://app:[email protected]:4222 stream add TEST \
--subjects="test.>" --storage=file --replicas=1 --defaults
nats -s nats://app:[email protected]:4222 pub test.load \
"$(head -c 1024 < /dev/zero | tr '\0' 'x')" --count=50000
# Apply traffic shaping (50kbit/s)
docker exec nats-hub tc qdisc add dev eth0 root tbf rate 50kbit burst 16kbit latency 100ms
docker exec nats-leaf tc qdisc add dev eth0 root tbf rate 50kbit burst 16kbit latency 100ms
# Create mirror (triggers replication)
cat > mirror.json << 'EOF'
{"name":"TEST_MIRROR","storage":"file","mirror":{"name":"TEST","external":{"api":"$JS.hub.API"}}}
EOF
nats -s nats://app:[email protected]:4223 stream add --config=mirror.json
# Watch for stale events (~30-60 seconds)
watch -n5 'docker logs nats-hub 2>&1 | grep -E "Stale|Slow" | tail -5'
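To confirm the stall, a couple of checks (a sketch; container and stream names as above):
# Mirror on the leaf should stay (near) empty while the cycle repeats
nats -s nats://app:[email protected]:4223 stream info TEST_MIRROR
# Count stale events on the leaf
docker logs nats-leaf 2>&1 | grep -c "Stale Connection"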