-
Notifications
You must be signed in to change notification settings - Fork 640
Add download checkpoint functionality #464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a25634a to
e4dfd1a
Compare
|
This looks pretty good so far. Could you also add support for deploying to W&B Inference? That might inform changes we need to make to the API design. |
|
Added a function to upload to W&B inference. |
src/art/local/backend.py
Outdated
| if os.path.exists(original_checkpoint_dir): | ||
| if local_path is not None: | ||
| # Copy from original location to custom location | ||
| target_checkpoint_dir = os.path.join(local_path, f"{resolved_step:04d}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer just copying the checkpoint to whatever local_path is, so that the developer can just add that path to the artifact directly.
src/art/local/backend.py
Outdated
| ) | ||
|
|
||
| # Check if checkpoint exists in original location | ||
| if os.path.exists(original_checkpoint_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now a lot of logic goes into a single if statement with duplicated code:
1. Check if checkpoint exists locally
1. If it does, check if local path is provided
1. If it is, copy from original checkpoint dir to local path and return local path
2. If not, just return original checkpoint dir
2. If it does not, check if local path is provided
1. If it is, download from s3 and save to local path and return local path
2. If not, download from s3 and return original checkpoint dir
I think there would be fewer lines of code and they would be easier to read if we broke this into two if statements:
1. Check if checkpoint exists locally
1. If it does, continue
2. If not, download checkpoint from s3 to original checkpoint dir (default location)
2. Check if local path is provided
1. If it is, copy checkpoint from original checkpoint dir to local path
2. If not, return original checkpoint dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it looks better! Updated.
src/art/local/backend.py
Outdated
| pull_s3: bool = True, | ||
| wait_for_completion: bool = True, | ||
| ) -> LoRADeploymentJob: | ||
| pull_checkpoint: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's actually not let the user pull the checkpoint as part of the deployment function. They should just use _experimental_pull_model_checkpoint for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, could you deprecate _experimental_deploy on all the backends? We want users to just call deploy_model directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should remove the _experimental_deploy function entirely as its only purpose was to combine the _experimental_pull_model_checkpoint and deploy_model utils.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose removing altogether is probably fine, that's why we marked these functions as _experimental
arcticfly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for taking care of this!
Add functionality that allows downloading a checkpoint to a local path.
Add a method that deletes a model from the serverless backend.