Jobs are considered "stuck" based only on the timeout defined at the client level, and not on the timeout that individual workers can set (per Worker-level job timeouts).
Repro
- Create a worker that defines a timeout function greater than the client-level timeout
- Start a job on that worker that runs for a time greater than the client-level timeout
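For reference, the client in this repro is configured with a 1-minute job timeout. Below is a minimal sketch of what that setup might look like, assuming `river.Config.JobTimeout` is the client-level setting in question; the connection string, queue config, and variable names are illustrative, and the worker types are the ones defined in the repro code that follows:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
	"github.com/riverqueue/river/riverdriver/riverpgxv5"
)

func main() {
	ctx := context.Background()

	dbPool, err := pgxpool.New(ctx, "postgres:///river_test")
	if err != nil {
		log.Fatal(err)
	}
	defer dbPool.Close()

	workers := river.NewWorkers()
	river.AddWorker(workers, &StuckWorker{})

	riverClient, err := river.NewClient(riverpgxv5.New(dbPool), &river.Config{
		// Client-level timeout of 1 minute; the worker below returns
		// 2 minutes from its Timeout() method.
		JobTimeout: time.Minute,
		Queues: map[string]river.QueueConfig{
			river.QueueDefault: {MaxWorkers: 10},
		},
		Workers: workers,
	})
	if err != nil {
		log.Fatal(err)
	}

	if err := riverClient.Start(ctx); err != nil {
		log.Fatal(err)
	}

	// Enqueue the job that will run past the client-level timeout.
	if _, err := riverClient.Insert(ctx, StuckJobArgs{}, nil); err != nil {
		log.Fatal(err)
	}
}
```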
// configure river client to have a job timeout of 1 minute

type StuckJobArgs struct{}

func (StuckJobArgs) Kind() string {
	return "stuck"
}

type StuckWorker struct {
	river.WorkerDefaults[StuckJobArgs]
}

func (w *StuckWorker) Timeout(job *river.Job[StuckJobArgs]) time.Duration {
	// this job has a higher timeout than the client-level default
	return 2 * time.Minute
}

func (w *StuckWorker) Work(ctx context.Context, job *river.Job[StuckJobArgs]) error {
	// sleep past the client-level timeout
	time.Sleep(2 * time.Minute)
	return nil
}

In the logs, note that the reported timeout is not the custom one defined above:
WRN jobexecutor.JobExecutor: Job appears to be stuck source=river job_id=577 kind=stuck timeout=1m0s
INF producer: Producer job counts source=river num_completed_jobs=0 num_jobs_running=1 num_jobs_stuck=1 queue=default
Expected
When a worker-level timeout is defined, the job should only be considered stuck once it has exceeded the worker-level timeout.
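A rough sketch of the resolution rule this implies, where `resolveStuckTimeout` is a hypothetical helper for illustration only and not part of River's API or internals:

```go
// resolveStuckTimeout (hypothetical) picks the threshold the stuck-job check
// would be expected to use: the worker-level timeout when one is defined,
// otherwise the client-level default.
func resolveStuckTimeout(clientTimeout, workerTimeout time.Duration) time.Duration {
	if workerTimeout > 0 {
		return workerTimeout
	}
	return clientTimeout
}
```

For the repro above that would resolve to 2 minutes, so the job would not be reported as stuck at the 1-minute mark.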
Actual
The worker-level timeout is not taken into account, and the job is considered stuck once it exceeds the client-level timeout.
Thanks for your work on River!