Skip to content

Harden SIGHUP reload path#55

Open
etbyrd wants to merge 2 commits into
mainfrom
safe-config-reload
Open

Harden SIGHUP reload path#55
etbyrd wants to merge 2 commits into
mainfrom
safe-config-reload

Conversation

@etbyrd
Copy link
Copy Markdown
Member

@etbyrd etbyrd commented Apr 25, 2026

Summary

Follow-up to #50, fixing four issues in the config-file + SIGHUP reload path.

  • Signal handler is now flag-only. SIGHUP sets a threading.Event; a daemon config-reloader thread waits on it and runs the reload off the signal stack. The previous handler did file I/O, JSON parsing, and logger.info directly, which could stall or deadlock against the Loki/file logging handler locks held by a worker thread mid-emit.
  • Guard rejection no longer drops unrelated edits. When a reload trips the mode-switch or missing-secret guard, the live webhook fields are preserved but the other reloadable values (storage, filtering lists, spoof protection) still apply. Previously the whole reload was rejected, so editing allowed_recipients and accidentally breaking the webhook in the same file silently kept the recipient change from taking effect.
  • storage_upload_threshold parsing is robust. Bad values from env or config no longer crash startup; _safe_int logs a warning and falls back to the default.
  • Warn on world-readable config file. The file may contain webhook_secret and storage_key; we now log a warning at load time if mode has any group/world bits set, recommending chmod 0600.

Why

These were the remaining major issues from review of #50. The signal-handler refactor closes a latent deadlock; the guard fix prevents an operational footgun where ops edits silently no-op; the int parsing and file-mode warning are small, localized hardening.

Test plan

  • Existing tests still pass (152 total, including 9 new).
  • _sighup_handler only sets the event.
  • _reload_worker consumes the event and calls reload_config.
  • Mode-switch reload preserves webhook, applies a non-webhook edit in the same file.
  • Missing-secret reload preserves webhook, applies a non-webhook edit in the same file.
  • _safe_int returns the default for None, non-numeric strings, and lists.
  • World-readable (0644) config file logs the warning; owner-only (0600) does not.
  • Manual: kill -HUP <pid> against a running milter; verify reload occurs and the signal handler returns immediately.

Notes

  • README / docker-compose.yml mount / dead RELOADABLE_KEYS cleanup are intentionally not in this PR; they are separate.
  • The deferred-reload thread coalesces rapid SIGHUPs into one reload, which matches standard expectations for SIGHUP-driven config reload.

- Move signal handler to flag-only; defer reload work to a daemon
  thread so file I/O and logger calls don't run on the signal stack
  (avoids potential deadlock with the Loki/file logging handler locks).

- When a SIGHUP reload trips the mode-switch or missing-secret guard,
  preserve the live webhook fields but still apply the other reloadable
  values. Previous behaviour silently dropped unrelated edits made in
  the same file.

- Replace int(...) on storage_upload_threshold with _safe_int helper
  that warns and falls back to default instead of crashing startup.

- Warn at config-file load time if the file is group/world-readable,
  since it may contain webhook_secret and storage_key.

- Tests: nine new cases covering the handler/worker split, the
  preserve-webhook reload path, _safe_int, and file-mode warnings.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR hardens the SIGHUP reload path with four targeted fixes: moving heavy work off the signal-handler stack into a daemon thread, converting guard-on-reject to guard-on-preserve-webhook so unrelated edits are never silently dropped, adding _safe_int to prevent startup crashes on bad storage_upload_threshold values, and warning on world-readable config files. The previous review concerns about logger.exception on reload failure and orphaned test threads are both fully addressed.

Confidence Score: 5/5

Safe to merge; all four stated fixes are correctly implemented with no P0/P1 issues found.

The signal-handler refactor is sound (set-only handler, daemon thread consuming the event), the partial-preserve logic correctly reconstructs ReloadableConfig with all fields accounted for, _safe_int handles all edge cases tested, and the file-permission warning is non-fatal. The only remaining finding is a P2 log-spam concern when world-readable permissions go unfixed over multiple SIGHUPs.

No files require special attention.

Important Files Changed

Filename Overview
milter/primitivemail_milter.py Adds flag-only SIGHUP handler, daemon reloader thread, partial-preserve webhook guard, _safe_int helper, and world-readable config warning; logic is correct and well-scoped
test_milter.py Updates tests to match new partial-preserve semantics, adds TestSighupHandler (uses _run_one_reload to avoid orphan threads), TestSafeInt, and TestConfigFilePerms; coverage is thorough

Sequence Diagram

sequenceDiagram
    participant OS as OS (SIGHUP)
    participant Main as Main Thread
    participant Event as _reload_event
    participant Worker as config-reloader thread
    participant Config as reload_config()

    OS->>Main: SIGHUP delivered
    Main->>_sighup_handler: invoke handler
    _sighup_handler->>Event: set()
    _sighup_handler-->>Main: return immediately

    Note over Worker: blocking on Event.wait()
    Event-->>Worker: unblocks
    Worker->>Event: clear()
    Worker->>Config: reload_config()
    Config->>Config: _read_config_file()
    Config->>Config: _apply_config(reloadable_only=True)
    alt webhook guard tripped
        Config->>Config: preserve old webhook fields, apply other reloadable values
    else guard OK
        Config->>Config: swap _rcfg atomically (GIL)
    end
    Config-->>Worker: return
    Worker->>Worker: loop back to wait()
Loading

Reviews (2): Last reviewed commit: "Address lint and Greptile review" | Re-trigger Greptile

Comment thread milter/primitivemail_milter.py Outdated
Comment thread test_milter.py
- Drop the redundant inline `import threading` in main(); the module-level
  import means ruff's F823 was treating the second binding as making
  threading a function-local before its first use.
- Use logger.exception in _reload_worker so failed reloads surface a
  full traceback instead of just the message.
- Extract _run_one_reload so the worker test can drive a single iteration
  on a thread that exits cleanly, rather than orphaning a daemon thread
  that might wake up on _reload_event in a later test.
@jetpham jetpham self-requested a review April 25, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant