Skip to content

containerinfra/kube-deployment-coordinator

Repository files navigation

kube-deployment-coordinator

A Kubernetes controller that coordinates the rollout of multiple deployments to ensure only one deployment rolls out at a time. This prevents resource contention and ensures predictable, sequential rollouts across related deployments.

Note: This project targets a niche coordination problem in Kubernetes: ensuring sequential rollouts across multiple Deployments that should not be updated in parallel.

Most users of Kubernetes do not need this—native Deployments already support rolling updates, readiness checks, and gradual rollout. If you are not explicitly running multiple Deployments for the same app (for example, with different network or node configurations) and struggling with them being updated at the same time (usually by GitOps tools), you are likely solving the wrong problem, or employing an anti-pattern in your architecture.

If this situation fits your use-case, you'll know; otherwise, consider revisiting your deployment strategy before adopting this solution.

Overview

The kube-deployment-coordinator is a Kubernetes operator that manages the coordinated rollout of deployments using a Custom Resource Definition (CRD). It ensures that deployments matching a label selector roll out one at a time, preventing simultaneous rollouts that could cause resource contention or service disruption.

Key Features

  • Rolls out deployments one at a time
  • Coordinates deployments using label selectors
  • Waits for deployments to be ready before continuing
  • Tracks rollout progress and status
  • Pauses non-active deployments automatically
  • Records Kubernetes events

Architecture

The following diagram illustrates how kube-deployment-coordinator operates within a Kubernetes cluster:

graph TB
    subgraph "Kubernetes API Server"
        API[API Server]
    end

    subgraph "kube-deployment-coordinator-system namespace"
        Controller[Controller Pod<br/>DeploymentCoordination Controller]
        Webhook[Webhook Server<br/>Port 9443]
    end

    subgraph "User Namespace"
        DC[DeploymentCoordination<br/>CRD Resource]
        D1[Deployment 1<br/>Labels: component=nginx]
        D2[Deployment 2<br/>Labels: component=nginx]
        D3[Deployment 3<br/>Labels: component=nginx]
    end

    %% API interactions
    API -->|"CREATE/UPDATE<br/>Deployment"| Webhook
    Webhook -->|"Mutate:<br/>Set spec.paused=true<br/>(if not active)"| API

    %% Controller watching
    API -.->|"Watch DeploymentCoordination"| Controller
    API -.->|"Watch Deployments"| Controller
    Controller -->|"Update status"| API
    Controller -->|"Unpause active<br/>Pause others"| API

    %% Coordination resource
    DC -.->|"Defines coordination<br/>via labelSelector"| Controller

    %% Deployment matching
    D1 -.->|"Matches selector"| DC
    D2 -.->|"Matches selector"| DC
    D3 -.->|"Matches selector"| DC

    style Controller fill:#e1f5ff
    style Webhook fill:#fff4e1
    style DC fill:#e8f5e9
    style D1 fill:#f3e5f5
    style D2 fill:#f3e5f5
    style D3 fill:#f3e5f5
    style API fill:#ffebee
Loading

Component Interactions

  1. Mutating Webhook: Intercepts all Deployment CREATE and UPDATE operations. If a deployment matches a DeploymentCoordination resource's label selector but is not the active deployment, it automatically sets spec.paused=true.

  2. Controller:

    • Watches DeploymentCoordination resources and matching Deployment resources
    • Manages the coordination state by updating the DeploymentCoordination status
    • Activates one deployment at a time by unpausing it
    • Tracks rollout progress using replica counts
    • Waits for MinReadySeconds before activating the next deployment
  3. DeploymentCoordination CRD: Defines which deployments should be coordinated using a label selector and optional minReadySeconds setting.

  4. Coordinated Deployments: Deployments with labels matching the DeploymentCoordination selector are automatically managed by the controller.

Use Cases

Multiple Deployments with Different Pod Configurations

When you need to run the same application with different pod configurations (e.g., different network annotations for Multus CNI, different node selectors, or different resource requirements), you typically create separate deployments. However, during upgrades, you want to maintain high availability by ensuring only one deployment rolls out at a time.

Example Scenario:

  • Deployment A: Uses static IP annotation for Pod network interface (e.g., k8s.v1.cni.cncf.io/networks with a specific static IP)
  • Deployment B: Uses a different static IP annotation or network interface for its Pods
  • Both deployments run the same application but require different network configurations
  • During upgrades, you want to ensure at least one deployment is always available

When using GitOps tools (like FluxCD or ArgoCD) combined with automated dependency updates (like Renovate), multiple deployments can receive updates simultaneously. This can cause downtime for the application.

The kube-deployment-coordinator safeguards that when Deployment A is rolling out, Deployment B remains stable, and vice versa, maintaining service availability.

Alternative Solutions

Other alternatives for coordinating sequential rollouts often involve adopting more complex or heavyweight deployment platforms, such as Argo Rollouts or Spinnaker. These platforms provide advanced deployment strategies and rollout controls, but can introduce significant operational overhead and may require re-architecting existing workflows.

This approach conflicts with organizations that have already established CI/CD pipelines using tools like FluxCD, ArgoCD, or native Kubernetes deployments, and want to avoid introducing a separate toolchain or controller solely for sequential deployment coordination. In these cases, kube-deployment-coordinator offers a lightweight, Kubernetes-native solution focused specifically on this use case with minimal disruption to existing processes.

How It Works

  1. Webhook: A mutating webhook automatically pauses deployments that match a DeploymentCoordination resource but are not the active deployment
  2. Controller: The controller manages the coordination by:
    • Identifying deployments that match the label selector
    • Activating one deployment at a time (unpausing it)
    • Tracking rollout progress using replica counts
    • Waiting for MinReadySeconds after a deployment completes before activating the next one
    • Updating status with rollout timestamps and deployment states

Installation

Using Helm (FluxCD)

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: kube-deployment-coordinator-crds
  namespace: kube-system
spec:
  interval: 160m
  url: oci://ghcr.io/containerinfra/charts/kube-deployment-coordinator-crds
  ref:
    semver: ">= 0.0.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-deployment-coordinator-crds
  namespace: kube-system
spec:
  chartRef:
    kind: OCIRepository
    name: kube-deployment-coordinator-crds
    namespace: kube-system
  interval: 1h
  values: {}
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: kube-deployment-coordinator
  namespace: kube-system
spec:
  interval: 160m
  url: oci://ghcr.io/containerinfra/charts/kube-deployment-coordinator
  ref:
    semver: ">= 0.0.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-deployment-coordinator
  namespace: kube-system
spec:
  chartRef:
    kind: OCIRepository
    name: kube-deployment-coordinator
    namespace: kube-system
  interval: 1h
  install:
    crds: Skip
  values:
    enableServiceMonitor: false
    replicaCount: 1
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule

Usage

1. Create a DeploymentCoordination Resource

Define which deployments should be coordinated using a label selector:

apiVersion: apps.containerinfra.nl/v1
kind: DeploymentCoordination
metadata:
  name: nginx-coordination
  namespace: default
spec:
  labelSelector:
    matchLabels:
      component: nginx
  minReadySeconds: 10

Fields:

  • labelSelector: Kubernetes label selector to match deployments that should be coordinated
  • minReadySeconds: (Optional) Minimum number of seconds the active deployment must be ready before the next one can start. Defaults to 0.

2. Label Your Deployments

Ensure your deployments have labels that match the selector:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-1
  labels:
    component: nginx  # Matches the labelSelector above
spec:
  replicas: 1
  # ... rest of deployment spec

3. How Coordination Works

Once you have a DeploymentCoordination resource and matching deployments:

  1. Initial State: All matching deployments are automatically paused by the webhook or controller
  2. Activation: The controller activates the first deployment (in alphabetical order by namespace/name)
  3. Rollout: The active deployment rolls out while others remain paused
  4. Completion: Once the active deployment finishes rolling out and has been ready for MinReadySeconds, it's cleared
  5. Next Deployment: The next deployment with pending changes is activated
  6. Repeat: This continues until all deployments have finished rolling out

4. Checking Status

View the coordination status:

kubectl get deploymentcoordination nginx-coordination -o yaml

The status includes:

  • activeDeployment: The currently active deployment (namespace/name format)
  • deployments: List of all coordinated deployments
  • deploymentStates: Detailed state for each deployment including:
    • hasPendingChanges: Whether the deployment needs rollout
    • lastRolloutStarted: Timestamp when the last rollout started
    • lastRolloutFinished: Timestamp when the last rollout finished
  • conditions: Standard Kubernetes conditions (Ready, Progressing, Degraded)

Example status output:

status:
  activeDeployment: default/deployment-1
  deployments:
    - default/deployment-1
    - default/deployment-2
    - default/deployment-3
  deploymentStates:
    - name: default/deployment-1
      hasPendingChanges: false
      generation: 2
      lastRolloutStarted: "2025-01-20T10:00:00Z"
      lastRolloutFinished: "2025-01-20T10:05:00Z"
    - name: default/deployment-2
      hasPendingChanges: true
      generation: 1
  conditions:
    - type: Ready
      status: "False"
      reason: DeploymentsInProgress
    - type: Progressing
      status: "True"
      reason: DeploymentRollingOut

5. Viewing Events

Monitor coordination events:

kubectl get events --field-selector involvedObject.name=nginx-coordination --sort-by='.lastTimestamp'

Common events:

  • DeploymentActivated: A deployment was activated for rollout
  • DeploymentPaused: A deployment was paused
  • DeploymentUnpaused: A deployment was unpaused
  • ActiveDeploymentReady: The active deployment finished rolling out
  • ActiveDeploymentCleared: The active deployment was cleared
  • ActiveDeploymentChanged: The active deployment changed

Examples

Example: Coordinating Multiple Nginx Deployments

# 1. Create the coordination resource
apiVersion: apps.containerinfra.nl/v1
kind: DeploymentCoordination
metadata:
  name: nginx
  namespace: default
spec:
  labelSelector:
    matchLabels:
      component: nginx
  minReadySeconds: 30

---
# 2. Create multiple deployments with matching labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-frontend
  labels:
    component: nginx
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx-frontend
    spec:
      containers:
      - name: nginx
        image: nginx:1.21

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-backend
  labels:
    component: nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-backend
    spec:
      containers:
      - name: nginx
        image: nginx:1.21

When you update the image in nginx-frontend, it will roll out first. Once it completes and has been ready for 30 seconds, nginx-backend will automatically start rolling out if it has pending changes.

Troubleshooting

Deployments Not Being Coordinated

  1. Check label selector match: Ensure your deployments have labels that match the labelSelector in the DeploymentCoordination resource
  2. Verify webhook is working: Check if deployments are being paused automatically
  3. Check controller logs:
    kubectl logs -n kube-deployment-coordinator-system deployment/kube-deployment-coordinator-controller-manager

Deployment Stuck

  1. Check if deployment is paused: kubectl get deployment <name> -o jsonpath='{.spec.paused}'
  2. Check coordination status: kubectl get deploymentcoordination <name> -o yaml
  3. Verify active deployment: Check if another deployment is currently active
  4. Check events: Look for error events on the DeploymentCoordination resource

Multiple Deployments Rolling Out Simultaneously

This should not happen, but if it does:

  1. Check if multiple DeploymentCoordination resources match the same deployments
  2. Verify the webhook is enabled and functioning
  3. Check controller logs for errors

Development

Building

make build

Testing

# Run unit tests
make test

# Run e2e tests (requires Kind cluster)
make test-e2e

Running Locally

# Generate webhook certificates (if needed)
./hack/generate-webhook-certs.sh

# Run the controller
go run cmd/main.go --webhook-cert-dir=/tmp/k8s-webhook-server/serving-certs

License

Copyright 2025 ContainerInfra.

Licensed under the Apache License, Version 2.0.