@Fruchix Fruchix commented Oct 30, 2024

Description

I've added the ability to "inject" two different functions into _job_pool_worker: one runs before each job begins, the other runs after it finishes. Both functions are optional (they default to ""), but if provided they are called for every job.

Motivation

This improves job management, because job_pool_wait waits for ALL queued jobs to finish.
With injected functions we can react dynamically, without waiting for every queued job to finish: add new jobs depending on how (or which) previous jobs finished, kill all jobs if one fails, add our own logging, etc.
I think there is a wide range of possibilities.

Implementation

I added two new parameters to job_pool_init:

  • job_pool_function_pre_job (optional) a function that will be called before each job
  • job_pool_function_post_job (optional) a function that will be called after each job.

These cannot be local variables passed as parameters to _job_pool_start_workers and then to _job_pool_worker, because job_pool_wait would need to pass the two functions to _job_pool_start_workers again.
Because the functions are called from inside _job_pool_worker, they can read its local variables as well as job_pool.sh's global variables, so we can use those in the function bodies (see code sample).
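
The variable visibility described above comes from bash's dynamic scoping: a function sees the local variables of whichever function called it. Here is a minimal sketch of that behaviour, not the actual job_pool.sh code. The names demo_worker, demo_pre_job, demo_post_job, report_start and report_end are hypothetical and exist only for this illustration.

```shell
#!/bin/bash

# Hypothetical globals holding the hook names ("" means no hook).
demo_pre_job=""
demo_post_job=""

# Hypothetical stand-in for _job_pool_worker; the real worker does
# much more (reading the job queue, locking, and so on).
function demo_worker()
{
    local id="$1"
    local cmd="$2"
    local result=0

    # call the pre-job hook only if one was provided
    if [[ -n "${demo_pre_job}" ]]; then "${demo_pre_job}"; fi

    # run the job and record its exit status
    if "${cmd}"; then result=0; else result=$?; fi

    # call the post-job hook only if one was provided
    if [[ -n "${demo_post_job}" ]]; then "${demo_post_job}"; fi
}

# The hooks never declare id, cmd or result, yet they can read them
# because they are called from inside demo_worker.
function report_start() { echo "worker-${id}: starting ${cmd}"; }
function report_end()   { echo "worker-${id}: ${cmd} exited with ${result}"; }

demo_pre_job=report_start
demo_post_job=report_end
demo_worker 1 true

demo_pre_job=""     # post-job hook only
demo_worker 2 false
```

Storing the hook names in globals, as in this sketch, means any function that later restarts workers does not have to receive them as arguments again.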

Further implementation?

I think job_pool_wait could accept parameters to redefine the two functions once it has waited for the previous jobs to finish, but this is just an idea and I haven't investigated it further.

Code Sample

Sample program demonstrating function injection
#!/bin/bash

. job_pool.sh

#####################################################
# Demonstration of function injection into each job #
#####################################################

echo "Demonstration of function injection into each job:"

# sleep some time ($1) then echo something ($2)
function sleep_n_echo()
{
    sleep "$1"
    echo "$2"
}

# Injected function that will be called before each job
# 
# Print which worker is starting which job
function print_starting_job()
{
    echo " # _job_pool_worker-${id}: Starting job: ${cmd} $(echo "${args[@]}" | xargs | tr '\v' ' ')"
}

# Injected function that will be called after each job
# 
# Kill all workers if the local variable "result" from _job_pool_worker
# indicates that the job failed
function kill_workers()
{
    echo " # _job_pool_worker-${id}: Finished job: ${cmd} $(echo "${args[@]}" | xargs | tr '\v' ' ')"

    # result is undefined in this script, but will be defined when
    # the function is injected in _job_pool_worker
    if [[ "${result}" != "0" ]]; then
        # get the pids of all workers:
        # - each worker's process is named after the current script (here, job_pool_sample.sh),
        #   so we use this name to get the pids
        # - we do not include the current script's pid ($$) as it is not a worker,
        #   (we do not want to kill the script itself, only the workers)
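        # (grep -x matches whole lines, so only the exact pid $$ is excluded,
        #  not every pid that merely contains it)
        local workers_pids=($(pgrep -f "$0" | grep -vx "$$"))
        kill "${workers_pids[@]}" &> /dev/null &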
    fi
}

# allow 3 parallel jobs, and kill all jobs at the first fail using "kill_workers" function
job_pool_init 3 0 print_starting_job kill_workers

# simulate 3 jobs, where one fails before the others are finished, and interrupts the others
job_pool_run sleep_n_echo 3 a   # job 1
job_pool_run /bin/false         # job 2
job_pool_run sleep_n_echo 3 b   # job 3

# job 2 will kill all other running workers, using the function "kill_workers"
# (which is run after each job is processed)

job_pool_shutdown

echo -e "\nOnly the failed job exited, the others did not because they were canceled."

The function must be passed as 3rd parameter to job_pool_init.
This function is called at the end of each job.
The function is aware of each local variable from _job_pool_worker and each global variable of job_pool.sh, meaning its definition can contain references to those variables, even though they do not exist in the current program (the one that uses job_pool.sh).
Two different functions can now be injected, one that will be run before each job, and one after.

If only a post job function is wanted, the pre job function should be "" (in order for the post job function to be the fourth parameter).
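
The "" placeholder is needed because the two functions are positional parameters. The sketch below shows only that argument handling. demo_pool_init is a hypothetical stand-in, not the real job_pool_init, and the names given to the first two parameters are assumptions.

```shell
#!/bin/bash

# Hypothetical stand-in for job_pool_init: it only reports how its
# positional parameters were interpreted.
function demo_pool_init()
{
    local pool_size="$1"
    local echo_command="$2"
    local pre_job="${3:-}"      # optional, defaults to ""
    local post_job="${4:-}"     # optional, defaults to ""
    echo "size=${pool_size} echo=${echo_command} pre='${pre_job}' post='${post_job}'"
}

demo_pool_init 3 0                      # no hooks at all
demo_pool_init 3 0 "" kill_workers      # post-job hook only
demo_pool_init 3 0 kill_workers         # kill_workers lands in the pre-job slot
```

Without the empty third argument, the intended post-job function shifts into the pre-job position, as the last call shows.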

Updated the demonstration for pre and post functions injection.