@FabiPi3 (Contributor) commented Nov 17, 2025

I have a use case where I propagate a large dictionary with many sub-dictionaries and many keys to the output of a job. I don't have full control over this dictionary, as it is part of the user input. I use the additional stores feature to move large parts of my output into additional stores. It can happen by chance that one of my additional store keys matches one of the keys in the user-provided dictionary. Then that value would be put into the additional store as well, which is not what I want.

A simple example showing this would be:

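from jobflow import job, run_locally, JobStore
from maggma.stores import MemoryStore

# Main store plus one additional store registered under the name "data"
store = JobStore(MemoryStore(), additional_stores={"data": MemoryStore()})

# Send every output value stored under the key "ab" to the "data" store
@job(data=["ab"])
def partial_data():
    return {"xyz": {"ab": 1}, "cd": 4, "xml": {"ab": 3}}

partial_job = partial_data()
run_locally(partial_job, store=store)
# Inspect the output document that ended up in the main store
print(store.query_one({"uuid": partial_job.uuid})["output"])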

You will see two blob_uuids in the output, one for each "ab" key, but I only want a single one.
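To make the double match concrete, here is a minimal, standalone sketch of key-based matching on a nested dictionary. The helper `find_key_paths` is hypothetical and is not claimed to be jobflow's actual implementation; it only illustrates why a bare key name such as "ab" selects every occurrence, wherever it sits.

```python
from typing import Any


def find_key_paths(doc: Any, key: str, prefix: tuple = ()) -> list:
    """Return the path to every occurrence of `key` in a nested dict.

    Hypothetical helper for illustration only.
    """
    paths = []
    if isinstance(doc, dict):
        for k, value in doc.items():
            here = prefix + (k,)
            if k == key:
                paths.append(list(here))
            # Keep descending: the same key may also appear deeper down.
            paths.extend(find_key_paths(value, key, here))
    return paths


# Same output as in the example above, wrapped the way it is queried.
doc = {"output": {"xyz": {"ab": 1}, "cd": 4, "xml": {"ab": 3}}}
matches = find_key_paths(doc, "ab")
print(matches)
```

Both `["output", "xyz", "ab"]` and `["output", "xml", "ab"]` are found, which corresponds to the two blob_uuids.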

The option I propose is to give the full path directly as a list:

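# Proposed: a list entry is treated as the full path to a single value
# (reuses the imports and `store` from the example above)
@job(data=[["output", "xyz", "ab"]])
def partial_data():
    return {"xyz": {"ab": 1}, "cd": 4, "xml": {"ab": 3}}

partial_job = partial_data()
run_locally(partial_job, store=store)
print(store.query_one({"uuid": partial_job.uuid})["output"])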

Now only the value at the given path is put into the additional store, and the other "ab" entry is left untouched.
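The path-based behaviour can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in this PR: `extract_path` is a hypothetical helper, and the reference dictionary it writes is a simplified stand-in for whatever jobflow actually stores.

```python
import uuid
from typing import Any


def extract_path(doc: dict, path: list, store_name: str) -> Any:
    """Swap the value at exactly `path` for a reference and return the value.

    Hypothetical helper for illustration only.
    """
    parent = doc
    for key in path[:-1]:
        parent = parent[key]  # raises KeyError if the path does not exist
    value = parent[path[-1]]
    # Simplified, assumed reference format; the real one may differ.
    parent[path[-1]] = {"blob_uuid": str(uuid.uuid4()), "store": store_name}
    return value


doc = {"output": {"xyz": {"ab": 1}, "cd": 4, "xml": {"ab": 3}}}
blob = extract_path(doc, ["output", "xyz", "ab"], store_name="data")
print(blob)
print(doc["output"]["xml"])
```

Because the path is walked key by key, no other "ab" entry is ever visited, so `doc["output"]["xml"]["ab"]` keeps its original value.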

TODO

  • Should the "output" part of the list be omitted and prepended automatically?
  • One could think about doing something similar for the load procedure.
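For the first TODO item, one possible shape of the automatic prefixing is sketched below. The function name `with_output_prefix` and its `root` parameter are hypothetical and only meant to illustrate the trade-off.

```python
def with_output_prefix(path: list, root: str = "output") -> list:
    """Prepend `root` unless the path already starts with it.

    Hypothetical helper for illustration only.
    """
    if path and path[0] == root:
        return list(path)
    return [root, *path]


short = with_output_prefix(["xyz", "ab"])
full = with_output_prefix(["output", "xyz", "ab"])
print(short, full)
```

Accepting both forms is ambiguous when the job's own output has a top-level key that is also called "output". Always prepending the root, and never accepting it from the user, would avoid that ambiguity.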

@FabiPi3 (Contributor, Author) commented Dec 1, 2025

Pinging @utf @gpetretto @davidwaroquiers. Any opinion on this? Thanks.

@davidwaroquiers (Contributor) commented

Hi @FabiPi3

I think it's interesting to have such an option. One thing that comes to mind is that maybe we could use MongoDB's dot notation instead of a list? In your example, the decorator would be:

@job(data=["output.xyz.ab"])

I haven't looked at the implementation but just commented from the user's perspective at this stage.
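From the user's perspective, the two spellings could be reconciled by normalising both to a list of keys. The sketch below assumes a hypothetical helper `to_path`; it is not part of jobflow or of this PR.

```python
from typing import Union


def to_path(spec: Union[str, list]) -> list:
    """Accept a dotted string or a list and return a list of keys.

    Hypothetical helper for illustration only.
    """
    if isinstance(spec, str):
        return spec.split(".")
    return list(spec)


dotted = to_path("output.xyz.ab")
listed = to_path(["output", "xyz", "ab"])
print(dotted == listed)
```

Two caveats apply to the dotted form. A string without a dot, such as "ab", becomes a one-element path, so the caller still has to decide whether it means the existing match-anywhere behaviour. Keys that themselves contain a dot cannot be expressed this way, whereas the list form handles them.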
