Hello @opalmer, I created this issue from your comment in #211.
I came to this issue from #207, which was marked as a duplicate of #189. After reading through the initial description, I'm not sure that this issue (#211) is going to address #207. I do agree that liveness probes that automatically restart the management container would be an improvement, but I don't think they will solve #207.
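For concreteness, the kind of probe being discussed would look roughly like the sketch below. This is a minimal sketch only; the `/health` path and port are my assumptions about what a management-container health endpoint might look like, not something that exists today:

```yaml
# Hypothetical liveness probe on the management (kube-mgmt) container.
# The /health endpoint and port 8080 are assumed for illustration.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 3   # ~30s of consecutive failures before a restart
```

Note that `periodSeconds × failureThreshold` means the restart only fires after tens of seconds of failures, which is exactly the window the scenarios below are concerned with.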
Some specific scenarios in which this approach might not work:
- The window between the OPA container coming back online, the management container's health check failing (multiple times), and the management container being restarted (which is what triggers the policy push) is seconds at best. During that time, the opa container will continue to process requests without any configuration present.
- The OPA container's own health check could fail and cause it to be restarted; this can eventually cause the management container's health check to fail since it can't connect to the opa container. As with the point above, though, this takes time.
Even if opa were without policies for less than a second, hundreds of requests could get through without being run through the proper policies. For something like opa, where a policy could be blocking privileged containers, ensuring images can't come from an unknown registry, ensuring pods end up on the right nodes, etc., this can have major side effects from both a security and an operational perspective.
Now, I've tried thinking of ways to work around the current state:
- Use the existing bootstrap & extra volume arguments in the chart to mount policies. We have a chart that creates ConfigMaps with regos, tagged so that the management container pushes them into opa. We could turn those into volumes too, except then we no longer have a single source of truth, and if we update the policies, which are controlled by a different chart, we also need to update opa, causing a restart of the pods. (A sketch of this option follows the list.)
- Add extra arguments to pull policy bundles from a remote source.
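To make the first option concrete, here's roughly what it could look like. The value keys and the ConfigMap name below are illustrative placeholders, not necessarily the chart's actual names:

```yaml
# Hypothetical chart values: mount a ConfigMap of .rego files into the opa
# container and have opa load them from disk at startup.
extraVolumes:
  - name: bootstrap-policies
    configMap:
      name: tagged-policies     # the ConfigMap produced by our other chart
extraVolumeMounts:
  - name: bootstrap-policies
    mountPath: /bootstrap
extraArgs:
  - /bootstrap                  # opa loads any .rego files found here
```

The downside described above applies: the same regos now exist both as ConfigMaps for the management container to push and as a mounted volume for opa to load.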
The extra-arguments path might work, except the docs state:
> By default, the OPA REST APIs will prevent you from modifying policy and data loaded via bundles. If you need to load policy and data from multiple sources, see the section below.
... so that would mean you can't really use remote bundles together with the current REST API approach? Looking around a bit more, I found #76, which talks about adding a bundle API. Then, after reading the ensuring operational readiness docs:
> On the other hand, if you use the Bundle service OPA will start up without any policies and immediately start downloading a bundle. But even before the bundle has successfully downloaded, OPA will answer policy queries if asked (which is in every case except the bootstrap case the right thing to do).
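For reference, pointing OPA at a remote bundle is just configuration passed via `--config-file`; something like the sketch below, where the service URL, bundle name, and polling intervals are all placeholders:

```yaml
# Sketch of an OPA config enabling the bundle service.
services:
  remote:
    url: https://bundles.example.com
bundles:
  authz:
    service: remote
    resource: bundles/policies.tar.gz
    polling:
      min_delay_seconds: 10
      max_delay_seconds: 20
```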
So.... that got me thinking. What if there was an init container, or an entrypoint for the opa container, that either runs a subcommand of the management container's command or reaches out to the management container (it may have to wait for it to be up), pulls down the policies, and drops them on disk before opa starts?
This way, when opa comes up, all the policies are pre-loaded and it can't serve a request without them. The management container could then continue as normal, pushing policies as they are updated in Kubernetes. This also has the advantage that you still have a single source of truth (ConfigMaps), and if something else is wrong (k8s API issues, kubelet problems, etc.) your pod will never reach a ready state. It should also tightly couple the chain of events leading up to opa starting, so no matter how the pod dies or how the chart is configured, it always loads the policies you've defined, every time.
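As a rough sketch of the pod shape I'm imagining (assuming the management container is kube-mgmt; the init container's command is entirely made up, since no such "dump policies to disk" mode exists today, which is the ask):

```yaml
# Hypothetical pod layout for the proposal: an init container fetches the
# tagged policy ConfigMaps and writes them to a shared volume, then opa
# starts with those policies already on disk.
spec:
  volumes:
    - name: policies
      emptyDir: {}
  initContainers:
    - name: fetch-policies
      image: openpolicyagent/kube-mgmt          # assumed "dump to disk" mode
      command: ["kube-mgmt", "dump-policies", "--out-dir=/policies"]
      volumeMounts:
        - name: policies
          mountPath: /policies
  containers:
    - name: opa
      image: openpolicyagent/opa
      args: ["run", "--server", "/policies"]    # policies pre-loaded from disk
      volumeMounts:
        - name: policies
          mountPath: /policies
    - name: mgmt
      image: openpolicyagent/kube-mgmt          # keeps syncing as usual
```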
... at least that's the idea. I'll admit, I'm not an OPA expert so I could be missing a glaring issue with this idea somewhere. If that's the case, happy to learn something new haha 😄