Description
Currently, heavy processing can happen in two ways:
- By providing an identifier: the ID of the BED file that needs to be processed.
- By running `reprocess-all`: Bedboss queries all files that haven't been processed yet and processes them sequentially.
There are a few possible solutions for enabling parallel processing:
- Launch multiple instances of `reprocess-all`.
  To do this safely, we need to know exactly which files are in the processing queue. If two processes are running, we must ensure that one can't "steal" a job (ID) from the other. In other words, we need to prevent any concurrency issues.
- Use Looper with subsamples, where each sample represents a job and its subsamples are the IDs to be processed.
  This approach introduces downstream challenges, such as batching the subsamples. Also, Looper isn't designed to handle thousands of samples efficiently.
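
For the first option, one possible way to keep two instances from picking up the same ID is to have each worker atomically claim a job before processing it. The snippet below is only a minimal sketch of that idea and is not how Bedboss currently works: the `queue` table, its columns, and the `setup_queue` / `claim_next` helpers are all hypothetical, and SQLite is used purely so the example is self-contained. On PostgreSQL, a similar effect is commonly achieved with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3


def setup_queue(conn, bed_ids):
    """Create a hypothetical job queue with one row per BED file ID."""
    conn.execute(
        "CREATE TABLE queue ("
        "bed_id TEXT PRIMARY KEY, status TEXT NOT NULL, worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO queue (bed_id, status) VALUES (?, 'pending')",
        [(bed_id,) for bed_id in bed_ids],
    )


def claim_next(conn, worker):
    """Atomically claim one pending ID; return None if nothing is pending."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other writer
    # can claim a row between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT bed_id FROM queue WHERE status = 'pending' "
            "ORDER BY bed_id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE queue SET status = 'processing', worker = ? "
            "WHERE bed_id = ?",
            (worker, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us manage transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
all_ids = ["bed_a", "bed_b", "bed_c"]
setup_queue(conn, all_ids)

# Two simulated workers take turns claiming jobs until the queue is empty.
workers = ["worker-1", "worker-2"]
claimed = {name: [] for name in workers}
turn = 0
while True:
    name = workers[turn % len(workers)]
    bed_id = claim_next(conn, name)
    if bed_id is None:
        break
    claimed[name].append(bed_id)
    turn += 1
```

In this sketch every ID ends up assigned to exactly one worker, because a row leaves the `pending` state inside the same transaction that selected it. A real implementation would also need to decide what happens to rows stuck in `processing` when a worker crashes.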
After discussion, we’re currently satisfied with the upload time. If needed, we’ll revisit and address this issue in the future.