Bug: Operator incompatible with Kubernetes 1.21 due to StatefulSet availableReplicas field #949

@ZhicongZheng

Description

The RisingWave Operator claims to support Kubernetes 1.21+ in the compatibility matrix, but it fails on Kubernetes 1.21 due to the use of the availableReplicas field in StatefulSet status checks.

This causes the operator to never set the RisingWave cluster running status to true on Kubernetes 1.21, even when all pods are healthy and ready.

Environment

  • Kubernetes Version: v1.21.13
  • Operator Version: v0.13.0 (also affects latest)
  • RisingWave Version: nightly-20251028

Reproduction Steps

  1. Deploy RisingWave Operator on a Kubernetes 1.21 cluster
  2. Create a RisingWave cluster
  3. Verify that all pods are running and ready:
    kubectl get pods -n <namespace>
    # All pods show 1/1 Running
  4. Check the RisingWave cluster status:
    kubectl get risingwave -n <namespace>
    # RUNNING shows "False" forever

Root Cause

The status.availableReplicas field was added to StatefulSet in Kubernetes 1.22 as part of the alpha minReadySeconds feature (KEP-2599). A Kubernetes 1.21 API server never returns this field, so the Go client decodes it as the zero value, 0.

The operator's readiness check in pkg/utils/apps.go uses this field:

// Lines 76-78
if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
    return false
}

On Kubernetes 1.21:

  • availableReplicas = 0 (field doesn't exist, defaults to zero value)
  • updatedReplicas = 1 (actual replica count)
  • Check evaluates to: 0 < 1 → returns false
  • Operator logs show: Found not-ready groups, keep waiting... action=WaitBeforeMetaStatefulSetsReady

This causes the operator to perpetually wait for StatefulSets to become ready, never progressing to set the Running condition to true.
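The failing comparison can be reproduced without a cluster. The sketch below is illustrative only: statefulSetStatus is a hypothetical stand-in for the few appsv1.StatefulSetStatus fields involved, and availableCheckPasses is a made-up name that mirrors the quoted comparison, not the operator's actual function.

```go
package main

import "fmt"

// statefulSetStatus is a hypothetical stand-in for the handful of
// StatefulSet status fields that matter for this bug.
type statefulSetStatus struct {
	ReadyReplicas     int32
	UpdatedReplicas   int32
	AvailableReplicas int32 // stays 0 when the API server never sends the field
}

// availableCheckPasses mirrors the comparison quoted from pkg/utils/apps.go:
// the check fails when AvailableReplicas < UpdatedReplicas.
func availableCheckPasses(s statefulSetStatus) bool {
	return s.AvailableReplicas >= s.UpdatedReplicas
}

func main() {
	// Healthy single-replica StatefulSet as reported by K8s 1.21.
	onK8s121 := statefulSetStatus{ReadyReplicas: 1, UpdatedReplicas: 1}
	// Same workload on a cluster that populates availableReplicas.
	onNewer := statefulSetStatus{ReadyReplicas: 1, UpdatedReplicas: 1, AvailableReplicas: 1}

	fmt.Println("K8s 1.21 check passes:", availableCheckPasses(onK8s121))
	fmt.Println("newer cluster check passes:", availableCheckPasses(onNewer))
}
```

Both inputs describe the same healthy workload; only the missing field makes the first one fail.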

Evidence

1. StatefulSet Status (K8s 1.21)

{
  "status": {
    "replicas": 1,
    "readyReplicas": 1,
    "currentReplicas": 1,
    "updatedReplicas": 1
    // availableReplicas field is missing
  }
}
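To show why the missing field is indistinguishable from a real 0 in the client, here is a small sketch using only encoding/json. The types plainObject and pointerObject and the function decodeAvailable are hypothetical names for illustration; the plain int32 field matches how I understand the typed client declares availableReplicas, while the pointer variant exists only to reveal whether the key was present.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Status payload shaped like the K8s 1.21 evidence: no availableReplicas key.
const k8s121JSON = `{
  "status": {
    "replicas": 1,
    "readyReplicas": 1,
    "currentReplicas": 1,
    "updatedReplicas": 1
  }
}`

// plainObject decodes availableReplicas into a plain int32 (assumed to match
// the typed client), so a missing key silently becomes 0.
type plainObject struct {
	Status struct {
		ReadyReplicas     int32 `json:"readyReplicas"`
		UpdatedReplicas   int32 `json:"updatedReplicas"`
		AvailableReplicas int32 `json:"availableReplicas"`
	} `json:"status"`
}

// pointerObject is a hypothetical variant that keeps "missing" and "0" apart.
type pointerObject struct {
	Status struct {
		AvailableReplicas *int32 `json:"availableReplicas"`
	} `json:"status"`
}

// decodeAvailable returns the value a plain int32 field would hold and
// whether the key was actually present in the payload.
func decodeAvailable(raw string) (int32, bool, error) {
	var plain plainObject
	if err := json.Unmarshal([]byte(raw), &plain); err != nil {
		return 0, false, err
	}
	var ptr pointerObject
	if err := json.Unmarshal([]byte(raw), &ptr); err != nil {
		return 0, false, err
	}
	return plain.Status.AvailableReplicas, ptr.Status.AvailableReplicas != nil, nil
}

func main() {
	value, present, err := decodeAvailable(k8s121JSON)
	if err != nil {
		panic(err)
	}
	fmt.Printf("availableReplicas=%d present=%v\n", value, present)
}
```

Because the typed status only exposes the integer, any fix inside the operator has to infer absence from other fields such as readyReplicas.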

2. Operator Logs

INFO Found not-ready groups, keep waiting... action=WaitBeforeMetaStatefulSetsReady component=meta group=""
INFO Found not-ready groups, keep waiting... action=WaitBeforeComputeStatefulSetsReady component=compute group=""

3. RisingWave Status

status:
  conditions:
  - type: Initializing
    status: "True"
  - type: Running
    status: "False"  # Never becomes True
  componentReplicas:
    meta:
      running: 1
      target: 1
    compute:
      running: 1
      target: 1

Impact

  • Users on Kubernetes 1.21 cannot use the operator despite documentation claiming support
  • Clusters appear unhealthy even when fully functional
  • Breaks automated workflows that depend on the Running status

Expected Behavior

The operator should correctly detect StatefulSet readiness on Kubernetes 1.21 as documented in the compatibility matrix.

Proposed Solution

Fix the code to support K8s 1.21 by falling back to readyReplicas (available since K8s 1.9) when availableReplicas is not available:

// Use availableReplicas if available (K8s 1.22+), otherwise fall back to readyReplicas
readyCount := statefulSet.Status.AvailableReplicas
if readyCount == 0 && statefulSet.Status.ReadyReplicas > 0 {
    // Fall back to readyReplicas for K8s 1.21
    readyCount = statefulSet.Status.ReadyReplicas
}
if readyCount < statefulSet.Status.UpdatedReplicas {
    return false
}

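This restores compatibility with K8s 1.21 and leaves behavior unchanged on K8s 1.22+ whenever availableReplicas is non-zero. One caveat: on clusters that honor minReadySeconds, availableReplicas can legitimately be 0 while pods are ready but not yet available, and this fallback would count them as available during that window.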
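As a rough sketch of how the fallback could be factored into a helper and exercised against a few status shapes: statefulSetStatus, effectiveAvailable, and rolledOut are hypothetical names, and rolledOut covers only the availability comparison, not the other conditions the real rollout check may apply.

```go
package main

import "fmt"

// statefulSetStatus is a hypothetical stand-in for the relevant status fields.
type statefulSetStatus struct {
	ReadyReplicas     int32
	UpdatedReplicas   int32
	AvailableReplicas int32
}

// effectiveAvailable applies the proposed heuristic: trust AvailableReplicas
// unless it is 0 while ReadyReplicas is positive, then use ReadyReplicas.
func effectiveAvailable(s statefulSetStatus) int32 {
	if s.AvailableReplicas == 0 && s.ReadyReplicas > 0 {
		return s.ReadyReplicas
	}
	return s.AvailableReplicas
}

// rolledOut performs only the availability comparison from the proposal.
func rolledOut(s statefulSetStatus) bool {
	return effectiveAvailable(s) >= s.UpdatedReplicas
}

func main() {
	cases := []struct {
		name   string
		status statefulSetStatus
	}{
		{"field absent, pod ready", statefulSetStatus{ReadyReplicas: 1, UpdatedReplicas: 1}},
		{"field present, partly available", statefulSetStatus{ReadyReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 2}},
		{"field present, fully available", statefulSetStatus{ReadyReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 3}},
		{"nothing ready yet", statefulSetStatus{UpdatedReplicas: 1}},
	}
	for _, c := range cases {
		fmt.Printf("%-34s rolledOut=%v\n", c.name, rolledOut(c.status))
	}
}
```

A non-zero availableReplicas is always respected, so a partially available rollout is still reported as not ready. The first case has the same shape whether the field is absent or genuinely 0, which is the limit of what this heuristic can detect.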

Additional Context

The same issue affects similar checks for:

  • IsStatefulSetRolledOut() - line 76
  • IsAdvancedStatefulSetRolledOut() - line 146 (for OpenKruise)

Note: Deployment status has included availableReplicas since well before K8s 1.21, so only the StatefulSet checks are affected.

Related Files

  • pkg/utils/apps.go - Contains the problematic readiness checks
  • README.md - Documents K8s 1.21+ compatibility

Timeline

  • K8s 1.21: No availableReplicas field in StatefulSet
  • K8s 1.22: availableReplicas added as alpha (KEP-2599)
  • K8s 1.23: Beta
  • K8s 1.25: GA/Stable

I'm happy to submit a PR to fix this issue.
