Hi there, I have run into a situation where, when using resque-scheduler, downtime is unavoidable.
To give more context:
- resque-scheduler-version: 4.7.0 (don't judge me please)
resque.rake:
# boilerplate
Resque.schedule = YAML.load_file('config/dummy_schedule.yml')
# boilerplate
dummy_schedule.yml:
random_job:
  cron: "* * * * *"
  queue: generic
  custom_job_class: "ResqueRandomJob"
  description: "Runs every minute"
In a multi-scheduler setup, I reach a situation where I have:
- 1 master
- 1 child
If the master somehow dies, there are a few seconds (depending on the lock-acquiring time) during which no jobs are enqueued at all. I tested this by killing the master process with kill -9 pid at XX:XX:58, with the next run due at XX:XX+1:00. The child must then become master, but that can take a variable amount of time, depending on where in its interval the child is.
Therefore, in that one specific minute, the job is never enqueued.
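To make the timing concrete, here is a toy model of the gap in plain Ruby. It is only a sketch of the arithmetic, not resque-scheduler's internals: `takeover_time`, `missed_ticks`, and all of their parameters are hypothetical names, and the numbers (lock freed at second 61, child polling every 5 seconds starting at second 59) are made up for illustration.

```ruby
# Toy model of the failover gap; NOT resque-scheduler's actual logic.
# All times are whole seconds, counted from the start of the minute
# in which the master is killed. All names here are hypothetical.

# First moment at or after `lock_free_at` at which the child polls,
# assuming it polls every `poll_interval` seconds from `first_poll_at`.
def takeover_time(lock_free_at:, first_poll_at:, poll_interval:)
  return first_poll_at if first_poll_at >= lock_free_at

  polls_needed = ((lock_free_at - first_poll_at) / poll_interval.to_f).ceil
  first_poll_at + polls_needed * poll_interval
end

# Minute boundaries (cron "* * * * *") that fall after the kill and
# strictly before the child has taken over, i.e. ticks nobody serves.
def missed_ticks(killed_at:, takeover_at:)
  first_tick = (killed_at / 60 + 1) * 60 # next minute boundary after the kill
  (first_tick...takeover_at).step(60).to_a
end

# Master killed at :58, lock assumed free at second 61,
# child assumed to poll at seconds 59, 64, 69, ...
takeover = takeover_time(lock_free_at: 61, first_poll_at: 59, poll_interval: 5)
puts takeover                                               # 64
p missed_ticks(killed_at: 58, takeover_at: takeover)        # [60]
```

Under these assumed numbers the child takes over at second 64, so the tick at second 60 falls into the gap. Killing the master earlier in the minute (say at :30 with the same delays) leaves enough time for the takeover and no tick is missed in this model.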
I was wondering if I'm doing anything wrong in this setup.
I also have an idea for a solution, but first I would like to know whether I'm making any mistakes here.
Thank you!