Stuck job evaluation doesn't take into account worker-level job timeouts #1125

@alexpls

Description

Jobs are considered "stuck" based on the timeout defined at the client level only, and not the timeout that can be set by individual workers (per Worker-level job timeouts).

Repro

  1. Create a worker whose Timeout function returns a value greater than the client-level timeout
  2. Start a job on that worker that runs longer than the client-level timeout

import (
	"context"
	"time"

	"github.com/riverqueue/river"
)

// configure river client to have a job timeout of 1 minute

type StuckJobArgs struct{}

func (StuckJobArgs) Kind() string {
	return "stuck"
}

type StuckWorker struct {
	river.WorkerDefaults[StuckJobArgs]
}

func (w *StuckWorker) Timeout(job *river.Job[StuckJobArgs]) time.Duration {
	// this job has a higher timeout than the client-level default
	return 2 * time.Minute
}

func (w *StuckWorker) Work(ctx context.Context, job *river.Job[StuckJobArgs]) error {
	// run past the 1 minute client-level timeout
	time.Sleep(2 * time.Minute)
	return nil
}

In the logs, note that the reported timeout is the client-level 1m0s, not the custom 2m defined above:

WRN jobexecutor.JobExecutor: Job appears to be stuck source=river job_id=577 kind=stuck timeout=1m0s
INF producer: Producer job counts source=river num_completed_jobs=0 num_jobs_running=1 num_jobs_stuck=1 queue=default

Expected

When a worker-level timeout is defined, the job should be considered stuck once it has exceeded the worker-level timeout.

Actual

The worker-level timeout is not taken into account, and the job is considered stuck as soon as it exceeds the client-level timeout.


Thanks for your work on River!
