A Kubernetes controller that coordinates the rollout of multiple deployments to ensure only one deployment rolls out at a time. This prevents resource contention and ensures predictable, sequential rollouts across related deployments.
Note: This project targets a niche coordination problem in Kubernetes: ensuring sequential rollouts across multiple Deployments that should not be updated in parallel.
Most users of Kubernetes do not need this—native Deployments already support rolling updates, readiness checks, and gradual rollout. If you are not explicitly running multiple Deployments for the same app (for example, with different network or node configurations) and struggling with them being updated at the same time (usually by GitOps tools), you are likely solving the wrong problem, or employing an anti-pattern in your architecture.
If this situation fits your use-case, you'll know; otherwise, consider revisiting your deployment strategy before adopting this solution.
The kube-deployment-coordinator is a Kubernetes operator that manages the coordinated rollout of deployments using a Custom Resource Definition (CRD). It ensures that deployments matching a label selector roll out one at a time, preventing simultaneous rollouts that could cause resource contention or service disruption.
- Rolls out deployments one at a time
- Coordinates deployments using label selectors
- Waits for deployments to be ready before continuing
- Tracks rollout progress and status
- Pauses non-active deployments automatically
- Records Kubernetes events
The following diagram illustrates how kube-deployment-coordinator operates within a Kubernetes cluster:
graph TB
subgraph "Kubernetes API Server"
API[API Server]
end
subgraph "kube-deployment-coordinator-system namespace"
Controller[Controller Pod<br/>DeploymentCoordination Controller]
Webhook[Webhook Server<br/>Port 9443]
end
subgraph "User Namespace"
DC[DeploymentCoordination<br/>CRD Resource]
D1[Deployment 1<br/>Labels: component=nginx]
D2[Deployment 2<br/>Labels: component=nginx]
D3[Deployment 3<br/>Labels: component=nginx]
end
%% API interactions
API -->|"CREATE/UPDATE<br/>Deployment"| Webhook
Webhook -->|"Mutate:<br/>Set spec.paused=true<br/>(if not active)"| API
%% Controller watching
API -.->|"Watch DeploymentCoordination"| Controller
API -.->|"Watch Deployments"| Controller
Controller -->|"Update status"| API
Controller -->|"Unpause active<br/>Pause others"| API
%% Coordination resource
DC -.->|"Defines coordination<br/>via labelSelector"| Controller
%% Deployment matching
D1 -.->|"Matches selector"| DC
D2 -.->|"Matches selector"| DC
D3 -.->|"Matches selector"| DC
style Controller fill:#e1f5ff
style Webhook fill:#fff4e1
style DC fill:#e8f5e9
style D1 fill:#f3e5f5
style D2 fill:#f3e5f5
style D3 fill:#f3e5f5
style API fill:#ffebee
- Mutating Webhook: Intercepts all Deployment CREATE and UPDATE operations. If a deployment matches a DeploymentCoordination resource's label selector but is not the active deployment, it automatically sets spec.paused=true.
- Controller:
  - Watches DeploymentCoordination resources and matching Deployment resources
  - Manages the coordination state by updating the DeploymentCoordination status
  - Activates one deployment at a time by unpausing it
  - Tracks rollout progress using replica counts
  - Waits for MinReadySeconds before activating the next deployment
- DeploymentCoordination CRD: Defines which deployments should be coordinated using a label selector and optional minReadySeconds setting.
- Coordinated Deployments: Deployments with labels matching the DeploymentCoordination selector are automatically managed by the controller.
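The webhook rule described above can be sketched in a few lines of Go. This is a minimal illustration and not the project's actual code: the function names `matchesSelector` and `shouldPause` are hypothetical, only equality-based `matchLabels` are handled, and deployments are identified by a `namespace/name` key.

```go
package main

import "fmt"

// matchesSelector reports whether every key/value pair in matchLabels is
// present in labels. Assumption: only equality-based matchLabels are
// considered; an empty selector matches every deployment in this sketch.
func matchesSelector(matchLabels, labels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// shouldPause applies the rule from the component list: a deployment that
// matches the coordination selector is paused unless it is the active one.
// key and active use the "namespace/name" format; active is "" when no
// deployment is currently active.
func shouldPause(key string, labels, matchLabels map[string]string, active string) bool {
	return matchesSelector(matchLabels, labels) && key != active
}

func main() {
	selector := map[string]string{"component": "nginx"}
	nginx := map[string]string{"component": "nginx"}
	other := map[string]string{"component": "redis"}

	fmt.Println(shouldPause("default/deployment-1", nginx, selector, "default/deployment-1")) // active: not paused
	fmt.Println(shouldPause("default/deployment-2", nginx, selector, "default/deployment-1")) // matching, not active: paused
	fmt.Println(shouldPause("default/cache", other, selector, "default/deployment-1"))        // not matching: left alone
}
```

Deployments that do not match the selector are never touched, which is why the decision checks the selector before comparing against the active deployment.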
When you need to run the same application with different pod configurations (e.g., different network annotations for Multus CNI, different node selectors, or different resource requirements), you typically create separate deployments. However, during upgrades, you want to maintain high availability by ensuring only one deployment rolls out at a time.
Example Scenario:
- Deployment A: Uses static IP annotation for Pod network interface (e.g., k8s.v1.cni.cncf.io/networks with a specific static IP)
- Deployment B: Uses a different static IP annotation or network interface for its Pods
- Both deployments run the same application but require different network configurations
- During upgrades, you want to ensure at least one deployment is always available
When using GitOps tools (like FluxCD or ArgoCD) combined with automated dependency updates (like Renovate), multiple deployments can receive updates simultaneously. This can cause downtime for the application.
The kube-deployment-coordinator ensures that while Deployment A is rolling out, Deployment B remains stable, and vice versa, maintaining service availability.
Other alternatives for coordinating sequential rollouts often involve adopting more complex or heavyweight deployment platforms, such as Argo Rollouts or Spinnaker. These platforms provide advanced deployment strategies and rollout controls, but can introduce significant operational overhead and may require re-architecting existing workflows.
Adopting such a platform is a poor fit for organizations that have already established CI/CD pipelines using tools like FluxCD, ArgoCD, or native Kubernetes deployments, and that want to avoid introducing a separate toolchain solely for sequential deployment coordination. In these cases, kube-deployment-coordinator offers a lightweight, Kubernetes-native solution focused specifically on this use case, with minimal disruption to existing processes.
- Webhook: A mutating webhook automatically pauses deployments that match a DeploymentCoordination resource but are not the active deployment
- Controller: The controller manages the coordination by:
  - Identifying deployments that match the label selector
  - Activating one deployment at a time (unpausing it)
  - Tracking rollout progress using replica counts
  - Waiting for MinReadySeconds after a deployment completes before activating the next one
  - Updating status with rollout timestamps and deployment states
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: kube-deployment-coordinator-crds
  namespace: kube-system
spec:
  interval: 160m
  url: oci://ghcr.io/containerinfra/charts/kube-deployment-coordinator-crds
  ref:
    semver: ">= 0.0.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-deployment-coordinator-crds
  namespace: kube-system
spec:
  chartRef:
    kind: OCIRepository
    name: kube-deployment-coordinator-crds
    namespace: kube-system
  interval: 1h
  values: {}
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: kube-deployment-coordinator
  namespace: kube-system
spec:
  interval: 160m
  url: oci://ghcr.io/containerinfra/charts/kube-deployment-coordinator
  ref:
    semver: ">= 0.0.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-deployment-coordinator
  namespace: kube-system
spec:
  chartRef:
    kind: OCIRepository
    name: kube-deployment-coordinator
    namespace: kube-system
  interval: 1h
  install:
    crds: Skip
  values:
    enableServiceMonitor: false
    replicaCount: 1
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule

Define which deployments should be coordinated using a label selector:
apiVersion: apps.containerinfra.nl/v1
kind: DeploymentCoordination
metadata:
  name: nginx-coordination
  namespace: default
spec:
  labelSelector:
    matchLabels:
      component: nginx
  minReadySeconds: 10

Fields:
- labelSelector: Kubernetes label selector to match deployments that should be coordinated
- minReadySeconds: (Optional) Minimum number of seconds the active deployment must be ready before the next one can start. Defaults to 0.
Ensure your deployments have labels that match the selector:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-1
  labels:
    component: nginx # Matches the labelSelector above
spec:
  replicas: 1
  # ... rest of deployment spec

Once you have a DeploymentCoordination resource and matching deployments:
- Initial State: All matching deployments are automatically paused by the webhook or controller
- Activation: The controller activates the first deployment (in alphabetical order by namespace/name)
- Rollout: The active deployment rolls out while others remain paused
- Completion: Once the active deployment finishes rolling out and has been ready for MinReadySeconds, it's cleared
- Next Deployment: The next deployment with pending changes is activated
- Repeat: This continues until all deployments have finished rolling out
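The activation step in the workflow above amounts to picking the first deployment, in alphabetical order, that still has pending changes. A minimal Go sketch of that selection follows. The `deploymentState` type and the `nextActive` function are hypothetical, and sorting the combined `namespace/name` key as a plain string is an assumption about how "alphabetical order by namespace/name" is applied.

```go
package main

import (
	"fmt"
	"sort"
)

// deploymentState is a hypothetical, trimmed-down view of one entry in the
// coordination status: the "namespace/name" key and whether a rollout is needed.
type deploymentState struct {
	Name              string
	HasPendingChanges bool
}

// nextActive returns the key of the deployment to activate next, or "" when
// no deployment has pending changes. Candidates are ordered by a plain string
// sort of their "namespace/name" key (an assumption of this sketch).
func nextActive(states []deploymentState) string {
	var pending []string
	for _, s := range states {
		if s.HasPendingChanges {
			pending = append(pending, s.Name)
		}
	}
	if len(pending) == 0 {
		return ""
	}
	sort.Strings(pending)
	return pending[0]
}

func main() {
	states := []deploymentState{
		{Name: "default/deployment-3", HasPendingChanges: true},
		{Name: "default/deployment-1", HasPendingChanges: false},
		{Name: "default/deployment-2", HasPendingChanges: true},
	}
	// deployment-1 has nothing to roll out, so deployment-2 is chosen.
	fmt.Println(nextActive(states))
}
```

Calling `nextActive` again after each completed rollout, until it returns an empty string, reproduces the "repeat" step of the workflow.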
View the coordination status:
kubectl get deploymentcoordination nginx-coordination -o yaml

The status includes:
- activeDeployment: The currently active deployment (namespace/name format)
- deployments: List of all coordinated deployments
- deploymentStates: Detailed state for each deployment including:
  - hasPendingChanges: Whether the deployment needs rollout
  - lastRolloutStarted: Timestamp when the last rollout started
  - lastRolloutFinished: Timestamp when the last rollout finished
- conditions: Standard Kubernetes conditions (Ready, Progressing, Degraded)
Example status output:
status:
  activeDeployment: default/deployment-1
  deployments:
    - default/deployment-1
    - default/deployment-2
    - default/deployment-3
  deploymentStates:
    - name: default/deployment-1
      hasPendingChanges: false
      generation: 2
      lastRolloutStarted: "2025-01-20T10:00:00Z"
      lastRolloutFinished: "2025-01-20T10:05:00Z"
    - name: default/deployment-2
      hasPendingChanges: true
      generation: 1
  conditions:
    - type: Ready
      status: "False"
      reason: DeploymentsInProgress
    - type: Progressing
      status: "True"
      reason: DeploymentRollingOut

Monitor coordination events:
kubectl get events --field-selector involvedObject.name=nginx-coordination --sort-by='.lastTimestamp'

Common events:
- DeploymentActivated: A deployment was activated for rollout
- DeploymentPaused: A deployment was paused
- DeploymentUnpaused: A deployment was unpaused
- ActiveDeploymentReady: The active deployment finished rolling out
- ActiveDeploymentCleared: The active deployment was cleared
- ActiveDeploymentChanged: The active deployment changed
# 1. Create the coordination resource
apiVersion: apps.containerinfra.nl/v1
kind: DeploymentCoordination
metadata:
  name: nginx
  namespace: default
spec:
  labelSelector:
    matchLabels:
      component: nginx
  minReadySeconds: 30
---
# 2. Create multiple deployments with matching labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-frontend
  labels:
    component: nginx
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: nginx-frontend
  template:
    metadata:
      labels:
        app: nginx-frontend
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-backend
  labels:
    component: nginx
spec:
  replicas: 2
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: nginx-backend
  template:
    metadata:
      labels:
        app: nginx-backend
    spec:
      containers:
        - name: nginx
          image: nginx:1.21

When you update the image in nginx-frontend, it will roll out first. Once it completes and has been ready for 30 seconds, nginx-backend will automatically start rolling out if it has pending changes.
- Check label selector match: Ensure your deployments have labels that match the labelSelector in the DeploymentCoordination resource
- Verify webhook is working: Check if deployments are being paused automatically
- Check controller logs:
  kubectl logs -n kube-deployment-coordinator-system deployment/kube-deployment-coordinator-controller-manager
- Check if deployment is paused:
  kubectl get deployment <name> -o jsonpath='{.spec.paused}'
- Check coordination status:
  kubectl get deploymentcoordination <name> -o yaml
- Verify active deployment: Check if another deployment is currently active
- Check events: Look for error events on the DeploymentCoordination resource
Multiple coordinated deployments rolling out at the same time should not happen, but if it does:
- Check if multiple DeploymentCoordination resources match the same deployments
- Verify the webhook is enabled and functioning
- Check controller logs for errors
make build

# Run unit tests
make test
# Run e2e tests (requires Kind cluster)
make test-e2e

# Generate webhook certificates (if needed)
./hack/generate-webhook-certs.sh
# Run the controller
go run cmd/main.go --webhook-cert-dir=/tmp/k8s-webhook-server/serving-certs

Copyright 2025 ContainerInfra.
Licensed under the Apache License, Version 2.0.