
ci(runpod): auto-update template image after worker push#65

Merged
grossir merged 3 commits into main from 55-featci-automate-runpod-template-update-on-new-image-release
May 18, 2026

@quevon24
Member

Summary

After the Build and Push RunPod Worker workflow uploads a new image to Docker Hub, it PATCHes the RunPod template's imageName to the SHA-pinned tag via rest.runpod.io/v1/templates/<id>. Next cold-start workers pull the new image automatically, replacing the manual "edit image tag in the RunPod UI" step.
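
The update described above amounts to one authenticated PATCH request. The following is a minimal sketch, not the workflow's actual step: the function names are hypothetical, the JSON body shape (a single imageName field) is inferred from the field name in this summary, and the Bearer header mirrors the curl example used below for listing templates.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and request body shape are assumptions,
# not copied from the workflow file.

# Build the JSON body for the PATCH. Assumes the image reference contains no
# characters that need JSON escaping (true for typical repo:tag strings).
build_patch_body() {
  local image="$1"
  printf '{"imageName":"%s"}' "$image"
}

# Send the PATCH; prints the response body, then the HTTP status on its own line.
# Expects RUNPOD_API_KEY and RUNPOD_TEMPLATE_ID in the environment.
update_template_image() {
  local image="$1"
  curl -sS -X PATCH \
    -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_patch_body "$image")" \
    -w '\n%{http_code}\n' \
    "https://rest.runpod.io/v1/templates/${RUNPOD_TEMPLATE_ID}"
}
```

A call would look like `update_template_image "freelawproject/blackletter-gpu-worker:<sha>"`, with `<sha>` replaced by whatever tag the build step pushed.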

Setup before merging

RunPod

  1. Console → Settings → API Keys → Create API Key with Restricted permission. Set GraphQL: Read/Write and leave AI API: None. Copy the key.

    Why GraphQL R/W: RunPod exposes only two scopes on restricted keys, GraphQL (the management plane covering templates, endpoints, pods, secrets) and AI API (per-endpoint scope, only for invoking serverless jobs). Updating a template is a management operation, so it requires GraphQL R/W; the REST /v1/templates/<id> route used by the workflow is a thin shim over the GraphQL saveTemplate mutation internally. There is no per-template scope, so GraphQL R/W is the tightest setting that lets us do this. It does, however, grant full management of every template, endpoint, and pod on the account, so treat the key as a production credential: store only in GitHub Actions secrets, never echo to logs, and rotate if it leaks.

  2. Find the template ID. Endpoints created manually still have a backing template, but it is hidden from the default listing. Surface it with the includeEndpointBoundTemplates flag:

    curl -sS -H "Authorization: Bearer <API_KEY>" \
      "https://rest.runpod.io/v1/templates?includeEndpointBoundTemplates=true"

    Find the entry where isServerless: true and name matches your endpoint (e.g. Blackletter gpu worker). Its id is the template ID.
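
    The lookup can also be scripted instead of reading the response by eye. This is a rough sketch that assumes jq is installed and that the listing response is a JSON array of template objects; the response shape is not confirmed by this PR, and find_template_id is a hypothetical helper name.

```shell
# Hypothetical helper: read the template listing on stdin and print the id of
# the serverless template whose name matches exactly.
# Assumption: the response is a JSON array of objects with id, name, isServerless.
find_template_id() {
  local name="$1"
  jq -r --arg name "$name" \
    '.[] | select(.isServerless == true and .name == $name) | .id'
}
```

    Piping the curl command above into `find_template_id "Blackletter gpu worker"` would then print the matching id, if the assumed shape holds.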

GitHub

  3. Repo → Settings → Secrets and variables → Actions → New repository secret. Add both:

  • RUNPOD_API_KEY from step 1
  • RUNPOD_TEMPLATE_ID from step 2
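
Unset secrets reach a workflow step as empty strings, so a step can check for them before making any request. The guard below is a sketch of that idea and is not part of this PR as shown; require_secrets is a hypothetical name, and the only assumption is that both secrets are exposed to the step as environment variables with the names listed above.

```shell
# Hypothetical pre-flight check: report every missing secret with a GitHub
# Actions error annotation and return non-zero if any is unset or empty.
require_secrets() {
  local missing=0
  local name value
  for name in RUNPOD_API_KEY RUNPOD_TEMPLATE_ID; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "::error::$name is not set"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_secrets || exit 1` at the top of the step would stop it with a named error rather than relying on the API's response to a malformed request.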

Verifying after merge

Run the workflow manually (Actions → Build and Push RunPod Worker → Run workflow) and confirm the Update RunPod template image step logs HTTP 200. The template in the RunPod UI should then show the new freelawproject/blackletter-gpu-worker:<sha> tag, and the next job dispatched to the endpoint will cold-start on the new image.

Notes

  • This relies on Min Workers = 0: warm workers drain on the 5-min idle timeout and the next job picks up the new image. If we raise Min Workers > 0 later, we'll need to add an explicit roll step to force existing workers to recycle.
  • The workflow still triggers only on changes under scanning/runpod/** (plus workflow_dispatch), unchanged from before.

Closes #55

@quevon24 quevon24 linked an issue May 11, 2026 that may be closed by this pull request
@quevon24 quevon24 marked this pull request as ready for review May 15, 2026 22:46
@quevon24
Member Author

Tested the new step via workflow_dispatch from this branch:

  • Without secrets set: workflow ran end-to-end and the step failed cleanly with HTTP 400 and RunPod's "templateId is required" error, confirming the error path, status parsing, and ::error:: annotation all behave as expected.
  • With personal RUNPOD_API_KEY and RUNPOD_TEMPLATE_ID: step returned HTTP 200, the response showed imageName updated to the new SHA, and all other template fields (containerDiskInGb, volumeMountPath, startSsh, etc.) were preserved, confirming that RunPod's PATCH has partial-update semantics. GitHub Actions also masked the templateId in the logged response body.
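
The status parsing and ::error:: annotation exercised in these tests can be sketched roughly as follows. This is an illustration under assumptions, not the step's actual code: check_response is a hypothetical name, and it assumes curl was run with `-w '\n%{http_code}'` so that the HTTP status is the last line of the captured output.

```shell
# Hypothetical helper: split "body<newline>status" and annotate failures.
# Input: the combined output of curl run with -w '\n%{http_code}'.
check_response() {
  local response="$1"
  local status body
  # Last line is the status code; everything before it is the response body.
  status="$(printf '%s\n' "$response" | tail -n 1)"
  body="$(printf '%s\n' "$response" | sed '$d')"
  if [ "$status" != "200" ]; then
    # GitHub Actions renders lines starting with ::error:: as annotations.
    echo "::error::RunPod template update failed (HTTP $status): $body"
    return 1
  fi
  echo "RunPod template updated (HTTP $status)"
}
```

With this shape, the missing-secrets run above would surface as a failed step with an annotation carrying the 400 status and RunPod's error body, while a 200 lets the step pass.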

Then removed the personal secrets. Ready for the production RUNPOD_API_KEY / RUNPOD_TEMPLATE_ID to be added before merge.

@quevon24 quevon24 requested a review from grossir May 15, 2026 22:50
@quevon24 quevon24 moved this to PRs to Review in Sprint (Case Law) May 15, 2026
Contributor

@grossir grossir left a comment


LGTM!

@grossir grossir merged commit 1192c16 into main May 18, 2026
8 checks passed
@github-project-automation github-project-automation Bot moved this from PRs to Review to Done in Sprint (Case Law) May 18, 2026
@grossir
Contributor

grossir commented May 18, 2026

@quevon24 Both RUNPOD_API_KEY and RUNPOD_TEMPLATE_ID are missing from the repo secrets. Should we open a ticket on the infra repo? Or can you add the values? I have permissions to add secrets



Linked issue: feat(ci): automate RunPod template update on new image release

2 participants