I think that, to ease deployment and adoption at every RI, we should add one new endpoint to each data catalogue search API (e.g. the SciCat search-API) that takes a bearer token and triggers the whole flow (let's say /compute-weights).
Namely:
1. A JSON file provided to the search-API defines the collections and fields we want to take from the (SciCat) DB.
2. /compute-weights does all the data preprocessing, i.e. for every item in each collection it extracts the relevant fields and composes the body of the subsequent POST to the scoring service.
3. After step 2, /compute-weights posts the data to the compute service using "pss_items_url".
4. After step 3, /compute-weights sends a POST to "pss_compute_url".
Steps 2, 3 and 4 could be implemented as separate endpoints triggered sequentially, if preferred.
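To make the flow concrete, here is a rough sketch in Python of what /compute-weights could do internally. Everything apart from "pss_items_url" and "pss_compute_url" is an assumption for illustration: the config shape (collection name mapped to a list of fields), the added "collection" key, the payload format and the function names are all hypothetical, and the real scoring service may expect something different. DB access and HTTP are passed in as callables so the logic stays independent of the actual search-API stack.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical config (step 1): collection name -> fields to extract.
CONFIG_JSON = """
{
  "Dataset": ["pid", "title", "description"],
  "Document": ["pid", "summary"]
}
"""

Fetch = Callable[[str], Iterable[Dict[str, Any]]]  # collection -> items
Post = Callable[[str, Any], None]                  # (url, body) -> None


def build_items(config: Dict[str, List[str]], fetch: Fetch) -> List[Dict[str, Any]]:
    """Step 2: keep only the configured fields of every item."""
    items = []
    for collection, fields in config.items():
        for doc in fetch(collection):
            item = {f: doc[f] for f in fields if f in doc}
            item["collection"] = collection  # assumed tag, not a known requirement
            items.append(item)
    return items


def compute_weights(config: Dict[str, List[str]], fetch: Fetch, post: Post,
                    pss_items_url: str, pss_compute_url: str) -> int:
    """Run steps 2-4 in order and return the number of items sent."""
    items = build_items(config, fetch)
    post(pss_items_url, items)    # step 3: send the extracted items
    post(pss_compute_url, None)   # step 4: trigger the computation
    return len(items)


if __name__ == "__main__":
    # In-memory stand-ins for the catalogue DB and the HTTP client.
    fake_db = {"Dataset": [{"pid": "1", "title": "t", "owner": "x"}]}
    compute_weights(
        json.loads(CONFIG_JSON),
        fetch=lambda name: fake_db.get(name, []),
        post=lambda url, body: print("POST", url, body),
        pss_items_url="pss_items_url",
        pss_compute_url="pss_compute_url",
    )
```

Splitting this into separate endpoints would amount to exposing build_items and the two post calls individually.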
It is left to each facility to schedule the /compute-weights run and to handle authentication and authorization against SciCat.
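On the facility side, the scheduled job could be as small as the sketch below, run from cron or any other scheduler. The base URL and the way the token is obtained are placeholders; the only parts taken from the proposal are the /compute-weights path and the use of a bearer token.

```python
import urllib.request


def build_trigger_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the POST that triggers /compute-weights with a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/compute-weights",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Hypothetical host; the token would come from the facility's own login flow.
    req = build_trigger_request("https://search-api.example.org/api", "<token>")
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
```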