Harden SIGHUP reload path#55
Conversation
- Move signal handler to flag-only; defer reload work to a daemon thread so file I/O and logger calls don't run on the signal stack (avoids potential deadlock with the Loki/file logging handler locks). - When a SIGHUP reload trips the mode-switch or missing-secret guard, preserve the live webhook fields but still apply the other reloadable values. Previous behaviour silently dropped unrelated edits made in the same file. - Replace int(...) on storage_upload_threshold with _safe_int helper that warns and falls back to default instead of crashing startup. - Warn at config-file load time if the file is group/world-readable, since it may contain webhook_secret and storage_key. - Tests: nine new cases covering the handler/worker split, the preserve-webhook reload path, _safe_int, and file-mode warnings.
Greptile SummaryThis PR hardens the SIGHUP reload path with four targeted fixes: moving heavy work off the signal-handler stack into a daemon thread, converting guard-on-reject to guard-on-preserve-webhook so unrelated edits are never silently dropped, adding Confidence Score: 5/5Safe to merge; all four stated fixes are correctly implemented with no P0/P1 issues found. The signal-handler refactor is sound (set-only handler, daemon thread consuming the event), the partial-preserve logic correctly reconstructs ReloadableConfig with all fields accounted for, _safe_int handles all edge cases tested, and the file-permission warning is non-fatal. The only remaining finding is a P2 log-spam concern when world-readable permissions go unfixed over multiple SIGHUPs. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant OS as OS (SIGHUP)
participant Main as Main Thread
participant Event as _reload_event
participant Worker as config-reloader thread
participant Config as reload_config()
OS->>Main: SIGHUP delivered
Main->>_sighup_handler: invoke handler
_sighup_handler->>Event: set()
_sighup_handler-->>Main: return immediately
Note over Worker: blocking on Event.wait()
Event-->>Worker: unblocks
Worker->>Event: clear()
Worker->>Config: reload_config()
Config->>Config: _read_config_file()
Config->>Config: _apply_config(reloadable_only=True)
alt webhook guard tripped
Config->>Config: preserve old webhook fields, apply other reloadable values
else guard OK
Config->>Config: swap _rcfg atomically (GIL)
end
Config-->>Worker: return
Worker->>Worker: loop back to wait()
Reviews (2): Last reviewed commit: "Address lint and Greptile review" | Re-trigger Greptile |
- Drop the redundant inline `import threading` in main(); the module-level import means ruff's F823 was treating the second binding as making threading a function-local before its first use. - Use logger.exception in _reload_worker so failed reloads surface a full traceback instead of just the message. - Extract _run_one_reload so the worker test can drive a single iteration on a thread that exits cleanly, rather than orphaning a daemon thread that might wake up on _reload_event in a later test.
Summary
Follow-up to #50, fixing four issues in the config-file + SIGHUP reload path.
threading.Event; a daemonconfig-reloaderthread waits on it and runs the reload off the signal stack. The previous handler did file I/O, JSON parsing, andlogger.infodirectly, which could stall or deadlock against the Loki/file logging handler locks held by a worker thread mid-emit.allowed_recipientsand accidentally breaking the webhook in the same file silently kept the recipient change from taking effect.storage_upload_thresholdparsing is robust. Bad values from env or config no longer crash startup;_safe_intlogs a warning and falls back to the default.webhook_secretandstorage_key; we now log a warning at load time if mode has any group/world bits set, recommendingchmod 0600.Why
These were the remaining major issues from review of #50. The signal-handler refactor closes a latent deadlock; the guard fix prevents an operational footgun where ops edits silently no-op; the int parsing and file-mode warning are small, localized hardening.
Test plan
_sighup_handleronly sets the event._reload_workerconsumes the event and callsreload_config._safe_intreturns the default forNone, non-numeric strings, and lists.kill -HUP <pid>against a running milter; verify reload occurs and the signal handler returns immediately.Notes
docker-compose.ymlmount / deadRELOADABLE_KEYScleanup are intentionally not in this PR; they are separate.