feat: allow opting into caching image metadata by tag #5298
Conversation
Codecov Report

❌ Patch coverage is …

@@            Coverage Diff             @@
##             main    #5298      +/-   ##
==========================================
- Coverage    55.95%   55.93%    -0.03%
==========================================
  Files          407      409        +2
  Lines        29906    29955       +49
==========================================
+ Hits         16735    16754       +19
- Misses       12200    12221       +21
- Partials       971      980        +9
hiddeco left a comment:
In addition to what is being done in this PR, which is excellent, I wonder if this would be a good time to handle the // TODO about making the concurrency settings configurable.
We have had reports internally that adding a zero to the current settings (metadata concurrency and rate limiter) brings down the refresh time from ~15 minutes to just over one minute.
Yes, please 🙏🏻 This made a HUGE difference in our environment. For different reasons we're stuck with using …. Another potential improvement would be some connection reuse on the client (though it's interesting to see how this would work in a multi-tenant environment), as the other thing we've noticed after bumping these parameters manually is that the querying of the repositories can be very aggressive, with multiple TCP connections getting recycled and a high number of /v2/token requests for Docker, etc.
Are you referring to the rate limit (currently 20 per registry) or the number of goroutines allowed to work on pulling down metadata concurrently (currently capped at 1,000, but shared across all registries)?

Assuming you're referring to the rate limit, I've always been open to making that configurable, but I have strongly held opinions about how it's approached. #1139 is the issue capturing my thoughts about this in more detail. Someone did open a PR not too long ago that was rejected. It only addressed rate limits and not the rest of #1139, which is ok, but it did so in a way that was inadequately future-proof. We'd have been unable to build a comprehensive solution to registry configuration on top of it, which would inevitably have forced a breaking change on us. I'm in no way saying that we cannot make rate limits configurable without addressing the rest of #1139. What I am saying is that whatever is done to make rate limits configurable must not paint us into a corner in terms of how we'd approach the rest of #1139.

Separate from that, I have some worry that increasing the rate limit isn't the panacea people seem to think it is. There's a natural inclination toward thinking that if you crank that way up, things just get faster. I think there's not enough weight being given to the possibility that when the client isn't self-limiting and the rate limits are being enforced server-side instead, things might actually get worse. This isn't a reason not to do it. It's just a concern.

Between my concern and my opinions about there being a (somewhat involved) right way of approaching this, my inclination is to not bundle that improvement into this cache improvement. Why don't we see about getting @jessesuen's buy-in on the idea of prioritizing #1139 in v1.10.0?
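For readers unfamiliar with the client-side self-limiting being discussed, here is a minimal sketch of a per-registry limiter built on golang.org/x/time/rate. The names (registryLimiters, waitForRegistry), the per-second unit, and the burst value are assumptions made purely for illustration; this is not Kargo's actual limiter, which hard-codes a value of 20 per registry as mentioned above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

var (
	mu               sync.Mutex
	registryLimiters = map[string]*rate.Limiter{} // hypothetical: one limiter per registry host
)

// limiterFor returns (creating if necessary) the limiter for a registry host.
// The value 20 mirrors the current hard-coded per-registry limit mentioned in
// the discussion; the per-second unit and burst are assumptions.
func limiterFor(registry string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := registryLimiters[registry]
	if !ok {
		l = rate.NewLimiter(rate.Limit(20), 20)
		registryLimiters[registry] = l
	}
	return l
}

// waitForRegistry blocks until the per-registry limiter grants a token, so the
// client self-limits instead of relying on server-side enforcement.
func waitForRegistry(ctx context.Context, registry string) error {
	return limiterFor(registry).Wait(ctx)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := waitForRegistry(ctx, "registry.example.com"); err != nil {
		fmt.Println("rate limit wait canceled:", err)
		return
	}
	fmt.Println("ok to issue request")
}
```

The trade-off the comment raises still applies: raising the limit here only helps until the server starts enforcing its own limits, at which point retries and backoff can make things slower overall.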
Based on @nikolay-te's report, it is both.
I think it was probably mostly the per-repo rate limit that gave us the improvement. Having said that, I see now that the caching proxies I've set up (nginx, passing through tag list operations and caching only manifests) are able to hold ALL images in our environment mostly in memory (just below 2G RAM), which makes me think this caching feature has the potential to bring similar improvements even with the default rate limits, except maybe on cold starts.
@nikolay-te there's a lot of interesting insight there. Thank you. Rate limit-related improvements are coming, but they won't be bundled into this PR.
| "tag", tag, | ||
| ) | ||
|
|
||
| cacheKey := fmt.Sprintf("%s:%s", r.repoURL, tag) |
Should this take the platform constraints into account? Otherwise, I think we could get mismatches?
🤔 I think you're right. I'll look at that.
Yep. Will fix. Converting to draft in the meantime.
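As a sketch of the fix being discussed, the platform constraint could simply be folded into the key so that lookups for the same tag on different platforms never collide. Everything below (cacheKeyFor and the os/arch/variant parameters) is hypothetical and only illustrates the idea, not the eventual implementation in this PR.

```go
package main

import "fmt"

// cacheKeyFor is a hypothetical helper that namespaces cache entries by
// platform constraint, so that, e.g., a linux/amd64 lookup and a linux/arm64
// lookup for the same tag cannot return each other's metadata.
func cacheKeyFor(repoURL, tag, os, arch, variant string) string {
	if os == "" && arch == "" && variant == "" {
		// No platform constraint: the original repo:tag key suffices.
		return fmt.Sprintf("%s:%s", repoURL, tag)
	}
	return fmt.Sprintf("%s:%s@%s/%s/%s", repoURL, tag, os, arch, variant)
}

func main() {
	fmt.Println(cacheKeyFor("ghcr.io/example/app", "v1.2.3", "linux", "arm64", ""))
	// Output: ghcr.io/example/app:v1.2.3@linux/arm64/
}
```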
pkg/image/cache.go (Outdated)

	func init() {
		var err error
		imageCache, err = cache.NewInMemoryCache[Image](100000)
Should the size be configurable?
I thought it was... there might be a commit missing from this PR. Will follow up on that.
ffb1e16 was missing previously. It's configurable now.
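As a rough illustration of what a configurable size could look like, an LRU cache from github.com/hashicorp/golang-lru/v2 (the library this PR adopts) can be constructed with a size read from the environment. The IMAGE_CACHE_SIZE variable name and the Image fields below are assumptions for the sketch, not the option actually added in ffb1e16.

```go
package main

import (
	"fmt"
	"os"
	"strconv"

	lru "github.com/hashicorp/golang-lru/v2"
)

// Image is a stand-in for the cached image metadata type.
type Image struct {
	Tag    string
	Digest string
}

// newImageCache builds the shared LRU cache, defaulting to the 100,000-entry
// size mentioned in the PR description and allowing an override via a
// hypothetical IMAGE_CACHE_SIZE environment variable.
func newImageCache() (*lru.Cache[string, Image], error) {
	size := 100000
	if v := os.Getenv("IMAGE_CACHE_SIZE"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			size = n
		}
	}
	return lru.New[string, Image](size)
}

func main() {
	c, err := newImageCache()
	if err != nil {
		panic(err)
	}
	c.Add("ghcr.io/example/app:v1.0.0", Image{Tag: "v1.0.0", Digest: "sha256:..."})
	if img, ok := c.Get("ghcr.io/example/app:v1.0.0"); ok {
		fmt.Println(img.Tag)
	}
}
```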
Fixes #5076
Highlights:
- Creates a new, very simple cache abstraction with one implementation based on github.com/hashicorp/golang-lru/v2. This replaces https://github.com/patrickmn/go-cache. Being, as the name implies, an LRU cache, github.com/hashicorp/golang-lru/v2 is better suited to the rest of the changes in this PR.
- Before, each registry got its own cache. Now there's just one shared cache. Its size is configurable. By default, it holds up to 100,000 entries. Each entry is pretty small.
- Container image subscriptions can now opt into using cached tags. When choosing this, all image selection strategies, except Digest, will cache image information by tag and not by digest. If one does not opt into this, things work the same as they always have. (A rough sketch of this flow follows the list.)
- Operators can set policies that allow/disallow using cached tags or even require cached tags. This way, operators who are not confident that users are working with immutable tags can forbid tags from being cached. Those who are confident that users ARE working with immutable tags can require tag caching.
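To make the opt-in behavior above concrete, here is a rough sketch of metadata being looked up by a repo:tag key in a shared LRU cache only when the subscription has opted into cached tags, and fetched from the registry otherwise or on a miss. The names (getImageByTag, fetchFromRegistry, useCachedTags) and the wiring are assumptions for illustration, not Kargo's actual API.

```go
package main

import (
	"context"
	"fmt"

	lru "github.com/hashicorp/golang-lru/v2"
)

// Image is a stand-in for the cached image metadata type.
type Image struct {
	Tag    string
	Digest string
}

// Shared cache with the default size from the PR description; error ignored
// here only for brevity.
var imageCache, _ = lru.New[string, Image](100000)

// fetchFromRegistry stands in for a real registry call.
func fetchFromRegistry(_ context.Context, repoURL, tag string) (Image, error) {
	return Image{Tag: tag, Digest: "sha256:deadbeef"}, nil
}

// getImageByTag returns cached metadata when the subscription has opted into
// cached tags; otherwise, or on a cache miss, it hits the registry. A Digest
// selection strategy would simply call this with useCachedTags set to false.
func getImageByTag(ctx context.Context, repoURL, tag string, useCachedTags bool) (Image, error) {
	key := fmt.Sprintf("%s:%s", repoURL, tag)
	if useCachedTags {
		if img, ok := imageCache.Get(key); ok {
			return img, nil
		}
	}
	img, err := fetchFromRegistry(ctx, repoURL, tag)
	if err != nil {
		return Image{}, err
	}
	if useCachedTags {
		imageCache.Add(key, img)
	}
	return img, nil
}

func main() {
	img, _ := getImageByTag(context.Background(), "ghcr.io/example/app", "v1.0.0", true)
	fmt.Println(img.Digest)
}
```

This only pays off when tags are immutable, which is exactly why the operator policies above exist: caching by tag can serve stale metadata if a tag is ever repointed.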