Skip to content

Fix data loss when POLLHUP races with unread kernel data#23

Open
SoloJacobs wants to merge 2 commits into
mtrojnar:masterfrom
SoloJacobs:master
Open

Fix data loss when POLLHUP races with unread kernel data#23
SoloJacobs wants to merge 2 commits into
mtrojnar:masterfrom
SoloJacobs:master

Conversation

@SoloJacobs
Copy link
Copy Markdown

Some of our customers reported that responses of our UNIX socket based back-ends were truncated. After some investigation, I found that this issue could be fairly easily reproduced with the below AI generated python script.

I found some related bug reports:

However, the issue can still be reproduced in stunnel 5.78.

For the implementation, I simply copied this implementation from src/client.c:

        if(sock_open_rd && s_poll_rdhup(c->fds, c->sock_rfd->fd) &&
                (ioctlsocket(c->sock_rfd->fd, FIONREAD, &bytes) || !bytes)) {
            s_log(LOG_INFO, "Read socket closed (read hangup)");
            sock_open_rd=0;
        }

Now my knowledge of c is somewhat limited, I only have a surface level understanding of read loop in src/client.c . The patch should thus be reviewed with caution.

"""
Regression test: stunnel must not truncate backend responses when
POLLHUP races with POLLIN on the backend unix socket.
"""

import contextlib
import os
import socket
import ssl
import subprocess
import threading
import time
from collections.abc import Generator

import pytest

_RUNFILES = os.environ.get("RUNFILES_DIR", "")
_WS = "_main"

STUNNEL_BIN = os.path.join(_RUNFILES, _WS, "omd/packages/stunnel/bin_runpath/stunnel")
OPENSSL_LIBS = os.path.join(_RUNFILES, _WS, "omd/packages/openssl/lib")

_NUM_LINES = 10_000
_LINE = "host_{n:05d};plugin_output_padding_to_stress_the_pollhup_race_xxxx\n"
_PAYLOAD = b"".join(_LINE.format(n=i).encode() for i in range(_NUM_LINES))


def _backend(sock_path: str, ready: threading.Event, stop: threading.Event) -> None:
    """Mock CMC: for each connection, discard the query then send _PAYLOAD and close."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(sock_path)
    srv.listen(64)
    srv.settimeout(0.2)
    ready.set()
    while not stop.is_set():
        try:
            conn, _ = srv.accept()
        except socket.timeout:
            continue
        with conn:
            conn.settimeout(0.2)
            try:
                conn.recv(4096)
            except (socket.timeout, OSError):
                pass
            conn.sendall(_PAYLOAD)
            # close immediately — this triggers the POLLHUP/POLLIN race in stunnel
    srv.close()


def _make_cert(path: str) -> None:
    subprocess.check_call(
        [
            "openssl",
            "req",
            "-x509",
            "-newkey",
            "rsa:2048",
            "-keyout",
            path,
            "-out",
            path,
            "-days",
            "1",
            "-nodes",
            "-subj",
            "/CN=test",
        ],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


@contextlib.contextmanager
def _stunnel(tmp: str, cert: str, tls_sock: str, backend_sock: str) -> Generator[None, None, None]:
    pid_path = os.path.join(tmp, "stunnel.pid")
    conf_path = os.path.join(tmp, "stunnel.conf")
    with open(conf_path, "w") as f:
        f.write(
            f"cert           = {cert}\n"
            f"pid            = {pid_path}\n"
            f"output         = {tmp}/stunnel.log\n"
            "syslog         = no\n"
            "debug          = 5\n"
            "foreground     = no\n"
            "sslVersionMin  = TLSv1.2\n"
            "socket         = l:TCP_NODELAY=1\n"
            "socket         = r:TCP_NODELAY=1\n"
            "client         = no\n"
            "[live-tls]\n"
            f"accept  = {tls_sock}\n"
            f"connect = {backend_sock}\n"
        )
    env = {**os.environ, "LD_LIBRARY_PATH": OPENSSL_LIBS}
    subprocess.check_call([STUNNEL_BIN, conf_path], env=env)
    for _ in range(50):
        if os.path.exists(tls_sock):
            break
        time.sleep(0.05)
    else:
        raise RuntimeError("stunnel TLS socket never appeared")
    try:
        yield
    finally:
        try:
            with open(pid_path) as f:
                os.kill(int(f.read().strip()), 15)
        except (OSError, ValueError):
            pass


def _query(tls_sock: str) -> bytes:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    raw = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    raw.connect(tls_sock)
    with ctx.wrap_socket(raw) as s:
        s.sendall(b"GET services\nColumns: host_name plugin_output\n\n")
        buf = bytearray()
        while chunk := s.read(65536):
            buf += chunk
    return bytes(buf)


def test_no_truncation(tmp_path: pytest.TempPathFactory) -> None:
    tls_sock = str(tmp_path / "live-tls.sock")
    backend_sock = str(tmp_path / "live.sock")
    cert = str(tmp_path / "cert.pem")

    _make_cert(cert)

    ready, stop = threading.Event(), threading.Event()
    t = threading.Thread(target=_backend, args=(backend_sock, ready, stop), daemon=True)
    t.start()
    assert ready.wait(timeout=5), "backend socket never appeared"

    failures: list[tuple[int, int]] = []
    try:
        with _stunnel(str(tmp_path), cert, tls_sock, backend_sock):
            for run in range(1, 31):
                data = _query(tls_sock)
                if data != _PAYLOAD:
                    failures.append((run, len(data)))
    finally:
        stop.set()
        t.join(timeout=2)

    assert not failures, "{}/{} queries were truncated (want {} bytes each):\n{}".format(
        len(failures),
        30,
        len(_PAYLOAD),
        "\n".join(f"  run {r}: got {n} bytes" for r, n in failures),
    )
    ```

SoloJacobs added 2 commits May 5, 2026 20:57
When a backend (unix-socket) peer closes its connection, Linux can
report POLLHUP without simultaneously reporting POLLIN, even though
there is still data waiting in the kernel receive buffer.

The remaining data in the buffer should be transmitted,
quoting from https://man7.org/linux/man-pages/man2/poll.2.html

> POLLHUP
>
> Hang up (only returned in revents; ignored in events). Note that when
> reading from a channel such as a pipe or a stream socket, this event
> merely indicates that the peer closed its end of the channel. Subsequent
> reads from the channel will return 0 (end of file) only after all
> outstanding data in the channel has been consumed.

The already-correct "Read socket closed (read hangup)" path guards the
same close decision with an ioctlsocket(FIONREAD) check. This patch
simply applies the guard to s_poll_hup().
Previously, `stunnel` would simply abandon the data in the read buffer,
once a `POLLHUP` signal is received. With
`0001-fix-hup-race-data-loss.patch` we attempt to send the remaining
data. In particular, we require for the `transfer()` loop to make
progress, even if the socket is closed.

The following issue could occur:

If the backend has closed the connection while both the read and TLS
send buffers are full, `stunnel` would still register the backend socket
with `poll()`. `poll()` returns immediately on every iteration due to
`POLLHUP`. This would exhausts the max iteration number in the
`transfer()` loop. Subsequently, the data in the read buffer would be
abandoned.

`s_poll_add` opts into `POLLRDHUP` by explicitly setting it in `events`
when the buffer has room to receive data. This way TCP sockets avoid the
busy loop and the associated data loss. `POLLHUP` has no equivalent
mechanism: https://man7.org/linux/man-pages/man2/poll.2.html

> The bits returned in revents can include any of those specified in
> events, or one of the values POLLERR, POLLHUP, or POLLNVAL.

The fix: We omit the call to `s_poll_add` completely if the read buffer
is full. This way the `transfer()` loop will block until the TLS send
buffer has room, allowing data to drain before polling the backend
again.

Any socket errors on the backend fd are detected on the next iteration
once sock_ptr has drained below BUFFSIZE and the fd is polled again.
@SoloJacobs
Copy link
Copy Markdown
Author

So, I found the previous patch did not fix all the instances of truncation. Thus, I have added another patch. With this one our live instance no longer looses any data. Of course, there might be more issues, which are just not able to reproduce.

Kind regards

CheckmkCI pushed a commit to Checkmk/checkmk that referenced this pull request May 12, 2026
…esponses

When a unix-socket backend (CMC) closes its connection immediately after
writing the last chunk, the Linux kernel can report POLLHUP on the
stunnel side before reporting POLLIN, even though data is still sitting
in the socket receive buffer. stunnel then silently discarded the
buffered data.

The POLLRDHUP path already has a guard in place:

```c
/* http://marc.info/?l=linux-man&m=128002066306087 */
/* readsocket() must be the last sock_rfd operation before FIONREAD */
if(sock_open_rd && s_poll_rdhup(c->fds, c->sock_rfd->fd) &&
        (ioctlsocket(c->sock_rfd->fd, FIONREAD, &bytes) || !bytes)) {
    s_log(LOG_INFO, "Read socket closed (read hangup)");
    sock_open_rd=0;
}
```

This guard is the reason why this bug is specific to unix sockets: For a
unix socket stunnel receives a POLLHUP signal rather than a POLLRDHUP
signal. The patch simply copies that guard to the `POLLHUP` path.

An upstream PR was opened: mtrojnar/stunnel#23

stunnel was updated to the latest version 5.78.

SUP-26589

Change-Id: I5bec48e79a3bd938dd8941591bac3b64eea8e319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant