The wiki recommends a batch size of 128 for 'stable training'.
It would be helpful to have the option to accumulate gradients, so that bicleaner-ai training with a larger "effective batch size" would be possible on GPUs with a relatively small amount of RAM.
Fairseq calls this option "--update-freq"
Sockeye calls this option "--update-interval"
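For illustration, here is a minimal conceptual sketch of what such an option does (plain Python, not bicleaner-ai's actual TensorFlow code; the function names and the `update_freq` parameter are made up for this example). Gradients are computed per micro-batch and accumulated, and the optimizer step is applied only every `update_freq` micro-batches using the averaged gradient, which mimics training on one batch `update_freq` times larger:

```python
# Hypothetical sketch of gradient accumulation; not bicleaner-ai code.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error d/dw mean((w*x - y)^2) for one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def train(w, micro_batches, lr, update_freq):
    """SGD where the update is applied every `update_freq` micro-batches."""
    acc, count = 0.0, 0
    for xs, ys in micro_batches:
        acc += grad_mse(w, xs, ys)  # accumulate instead of stepping now
        count += 1
        if count == update_freq:
            w -= lr * (acc / update_freq)  # apply the averaged gradient
            acc, count = 0.0, 0
    return w

# Two micro-batches of 2 with update_freq=2 match one full batch of 4:
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w_full = train(1.0, [(xs, ys)], lr=0.01, update_freq=1)
w_accum = train(1.0, [(xs[:2], ys[:2]), (xs[2:], ys[2:])], lr=0.01, update_freq=2)
assert abs(w_full - w_accum) < 1e-9
```

With equal-sized micro-batches the averaged accumulated gradient equals the full-batch gradient, so the memory footprint per step shrinks while the effective batch size stays the same.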