runtime-tools: log container adjustments.#268
runtime-tools: log container adjustments.#268klihub wants to merge 2 commits intocontainerd:mainfrom
Conversation
a6cac4e to
c0863cc
Compare
c0863cc to
9953d06
Compare
|
@mikebrow @samuelkarp PTAL |
|
@samuelkarp I have a few questions. Related to the approach taken here, is this roughly what you had in mind ? Related to event details, how detailed events do we want to log, and do we want to log them unconditionally ? The PR now logs unconditionally and detailed events, except in a few extreme cases where details could get really verbose. But should it be configurable, or should details be logged at a different logging level ? About the logged events/messages. The main messages are now exposed consts, with the idea that someone might want to build some tooling where it can come handy to have them exported. But I don't know if this really makes sense. Any thoughts ? |
|
Thanks for jumping on this, @klihub. You raised some important questions in your comment that I think we should probably settle on before finalizing the implementation. Since o11y is a key requirement for GA, could we open a GitHub issue to agree on the specific design goals and requirements first? We can treat this PR as a PoC to inform that discussion, but I'd feel more comfortable if we aligned on the "what" and "why" in a design issue before we iterate further on the "how" here. |
I created issue #270 for that. |
95bd1fb to
ef60131
Compare
| } | ||
|
|
||
| func (f *FieldOwners) compoundOwnerMap(field int32) (map[string]string, bool) { | ||
| func (f *FieldOwners) CompoundOwnerMap(field int32) (map[string]string, bool) { |
There was a problem hiding this comment.
I don't think this method is used in the generator so it could remain unexported.
| } | ||
|
|
||
| // Fields can be used to pass extra information for logged messages. | ||
| type Fields = map[string]any |
There was a problem hiding this comment.
If this is a map, how do we ensure deterministic ordering of logged fields? Is that an exercise left for the Logger?
There was a problem hiding this comment.
Yes. This is (expected to be) ultimately passed on to slog, logrus, or some other structured logger and then they deal with it as they see fit. This not different in any way from how you would expect (or not expect) a log.G(ctx).WithFields(Fields{...}) in containerd core code to result in deterministic logging order.
There was a problem hiding this comment.
It's worth considering here whether we want a map or to have named fields. A lot of the lines in this diff use the same names (value.name, value.value, nri.plugin).
There was a problem hiding this comment.
The repetitive logging of value.name, value.value in most of the cases is due to most of the adjusted fields having scalar values.
By 'named fields' do you mean a struct that has as fields the 'union' of all possible fields (IOW now keys) that we log for various adjustments, and then set those fields in a struct instead as opposed to key/value pairs in a map ?
Wouldn't that be a bit unnatural as the provided logger function is just acting as a postman towards a structured logger ? And wouldn't it also force the logger function to have to interpret the struct, so it can pick only the fields set and pass them on to a structured logger ?
| if key, marked := m.IsMarkedForRemoval(); !marked { | ||
| g.log(AuditAddProcessEnv, | ||
| Fields{ | ||
| "value.name": m.Key, |
There was a problem hiding this comment.
It depends on the Logger implementation, but this value. prefix will produce longer lines. Is the idea here for the Logger to parse this prefix in order to potentially handle these fields a certain way?
There was a problem hiding this comment.
No, the idea is to avoid conflicting with some existing key in an outer logging context. In this particular instance, to avoid a known conflict in CRI-O, which uses the name key to put the container name in the logging context, so that every logged message has (among others) also the name of the associated container. There would certainly be other ways to try avoiding it, and it would also be possible to not log adjustments this way but as a single value formatted with %v. The approach taken here is just one possibility.
There was a problem hiding this comment.
But nri.plugin could really just be plugin everywhere. I really don't think that could conflict with anything already in the context.
There was a problem hiding this comment.
The Logger can also add prefixes or adjust as necessary, especially if we have well-known fields here. I don't think we need to think too hard about collision. The Logger could unconditionally prefix nri. on every Field key.
|
|
||
| // CreateContainer relays the corresponding CRI request to plugins. | ||
| func (r *Adaptation) CreateContainer(ctx context.Context, req *CreateContainerRequest) (*CreateContainerResponse, error) { | ||
| func (r *Adaptation) CreateContainer(ctx context.Context, req *CreateContainerRequest) (*CreateContainerResponse, *api.OwningPlugins, error) { |
There was a problem hiding this comment.
Can you provide more context why we need to return the OwningPlugins?
There was a problem hiding this comment.
The approach taken by this PR is to log adjustments to the OCI Spec as/when they are made.
Therefore we need to pass ownership information along with the actual adjustment results, so we can eventually shove it in our runtime-tools/generator wrapper which then will use this new information to pull out the responsible/source plugin when it logs OCI Spec changes as they are being made.
With everything else staying the same (so still the same approach by large), one alternative to this detail would be to add a new OwningPlugins owners = 4; to the CreateContainerResponse message in api.proto, document that it is for the runtime with anything filled in on the stub/plugin side simply getting ignored, and then instead of returning it separately here, just set that new response.Owner field to it and let it travel in the response alongside with the rest of the data.
Add an option for setting an external audit event logger and use any configured logger to emit audit events as we adjust the OCI Spec. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Expose owning plugins for adjustments returned by CreateContainer. Include plugin in errors which originate from processing a request by a plugin. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
ef60131 to
6e1b91b
Compare
| if key, marked := m.IsMarkedForRemoval(); !marked { | ||
| g.log(AuditAddProcessEnv, | ||
| Fields{ | ||
| "value.name": m.Key, |
There was a problem hiding this comment.
The Logger can also add prefixes or adjust as necessary, especially if we have well-known fields here. I don't think we need to think too hard about collision. The Logger could unconditionally prefix nri. on every Field key.
| g.log(AuditAddProcessEnv, | ||
| Fields{ | ||
| "value.name": m.Key, | ||
| "value.value": "<value omitted>", |
There was a problem hiding this comment.
| "value.value": "<value omitted>", | |
| "value.value": "<removed>", |
might be more clear.
Though <removed> (or any string we pick here) could be an actual value. Might be worth making sure the value name has a - prefix?
There was a problem hiding this comment.
The "value omitted" content is an attempt to indicate that we do not log the value of environment variables. @chrishenzie was concerned about possibly logging secrets in the environment, so I changed it to this.
But my impression is that your comment is about a different context, when we remove environment values. But those we log with a different message and only with the variable name included. But did I misinterpret your comment ?
| } | ||
|
|
||
| // Fields can be used to pass extra information for logged messages. | ||
| type Fields = map[string]any |
There was a problem hiding this comment.
It's worth considering here whether we want a map or to have named fields. A lot of the lines in this diff use the same names (value.name, value.value, nri.plugin).
This PR implements NRI audit logging for OCI Spec adjustments, which has been identified as one of the missing things we need to add (be)for(e) a v1.0. This patch
Here are updated trees for contained and CRI-O: