SVT-AV1 EOS-flush watchdog trip should surface as an explicit node failure #539

@staging-devin-ai-integration

Description

Follow-up from #537 (Devin Review).

The bounded EOS-flush idle watchdog added in #537 abandons a stalled SVT-AV1 codec task after 30s of idle, logs an error, and lets the node finalize whatever output was already produced. However, run_encoder still emits a normal Stopped("input_closed") state event and returns Ok(()), so a caller cannot programmatically distinguish a clean flush from a truncated one. In addition, a truly stuck spawn_blocking OS thread is detached (the encoder handle is intentionally leaked), so repeated trips could leak native resources.

For the rare-flake fix in #537 this is an acceptable, logged tradeoff (completing the request with partial output beats hanging until the 300s client timeout). But the watchdog trip should ideally be propagated out of drain_codec_results / codec_forward_loop so that callers and state events can mark the run as degraded or failed rather than as a successful encode.
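As a starting point for that discussion, here is a minimal sketch of one way the trip could be propagated. It is not the existing code: `DrainOutcome`, `EncodeError`, `drain_with_watchdog`, and `run_encoder_sketch` are hypothetical names, and a std `mpsc` channel stands in for whatever channel the real drain_codec_results / codec_forward_loop use.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical result of draining codec output after EOS.
#[derive(Debug, PartialEq, Eq)]
enum DrainOutcome {
    /// The codec task closed its channel: the flush completed.
    Clean { packets: usize },
    /// No output arrived within the idle limit: the output is truncated.
    WatchdogTripped { packets: usize },
}

/// Hypothetical error a node could return instead of Ok(()).
#[derive(Debug, PartialEq, Eq)]
enum EncodeError {
    FlushTimedOut { packets_emitted: usize },
}

/// Stand-in for the drain loop: counts packets until the sender
/// disconnects (clean flush) or the idle watchdog fires.
fn drain_with_watchdog(rx: &Receiver<Vec<u8>>, idle_limit: Duration) -> DrainOutcome {
    let mut packets = 0;
    loop {
        match rx.recv_timeout(idle_limit) {
            Ok(_packet) => packets += 1,
            Err(RecvTimeoutError::Disconnected) => return DrainOutcome::Clean { packets },
            Err(RecvTimeoutError::Timeout) => return DrainOutcome::WatchdogTripped { packets },
        }
    }
}

/// Stand-in for run_encoder: maps a watchdog trip to an explicit error
/// so the caller can tell a truncated encode from a clean one.
fn run_encoder_sketch(rx: &Receiver<Vec<u8>>, idle_limit: Duration) -> Result<usize, EncodeError> {
    match drain_with_watchdog(rx, idle_limit) {
        DrainOutcome::Clean { packets } => Ok(packets),
        DrainOutcome::WatchdogTripped { packets } => {
            Err(EncodeError::FlushTimedOut { packets_emitted: packets })
        }
    }
}
```

Keeping the emitted-packet count in the error variant would let a caller still finalize the partial output, as #537 does today, while reporting the run as degraded or failed. Whether a trip should be a hard error or a distinct stop reason on the state event is the contract question this issue raises.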

This changes the node-failure contract, so it deserves its own discussion separate from the flake fix.

Refs:

  • crates/nodes/src/codec_utils.rs (watchdog branch)
  • crates/nodes/src/video/encoder_trait.rs (run_encoder always returns Ok)
