
@ZhicongZheng (Contributor) commented Oct 30, 2025

Fix: Support Kubernetes 1.21 by handling missing availableReplicas field

Problem

This PR fixes a critical compatibility issue where the operator never sets RisingWave clusters to running=true on Kubernetes 1.21, even though the README claims support for K8s 1.21+.

Root Cause

The status.availableReplicas field was added to StatefulSet in Kubernetes 1.22 as an alpha feature (KEP-2599). The operator's readiness checks in pkg/utils/apps.go unconditionally use this field:

if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
    return false
}

On Kubernetes 1.21:

  • availableReplicas is absent from the API server's response, so the Go struct field keeps its zero value, 0
  • updatedReplicas = actual replica count (e.g., 1)
  • Check evaluates to 0 < 1 → always returns false
  • Operator logs: Found not-ready groups, keep waiting... action=WaitBeforeMetaStatefulSetsReady
  • RisingWave status remains running=false forever
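The failure mode above can be reproduced without a cluster. This is a minimal sketch, assuming a trimmed-down local struct (`statefulSetStatus`, hypothetical) whose JSON field names mirror the relevant fields of `StatefulSetStatus` in `k8s.io/api/apps/v1`. It decodes an illustrative status payload shaped like a 1.21 response (no `availableReplicas` key) and shows the zero value reaching the original comparison.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statefulSetStatus is a hypothetical, trimmed-down stand-in for the real
// StatefulSetStatus type; only the fields used by the readiness check exist.
type statefulSetStatus struct {
	Replicas          int32 `json:"replicas"`
	ReadyReplicas     int32 `json:"readyReplicas"`
	UpdatedReplicas   int32 `json:"updatedReplicas"`
	AvailableReplicas int32 `json:"availableReplicas"`
}

// decodeStatus parses a raw status payload into the struct. Keys that are
// missing from the payload leave the corresponding fields at their zero value.
func decodeStatus(raw string) (statefulSetStatus, error) {
	var s statefulSetStatus
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	// Illustrative payload: a 1.21 API server has no availableReplicas field.
	raw := `{"replicas":1,"readyReplicas":1,"updatedReplicas":1}`
	s, err := decodeStatus(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("availableReplicas:", s.AvailableReplicas)
	// The original check: 0 < 1, so the operator keeps waiting.
	fmt.Println("considered not ready:", s.AvailableReplicas < s.UpdatedReplicas)
}
```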

Solution

This PR implements backward compatibility by falling back to readyReplicas (available since K8s 1.9) when availableReplicas is not populated:

// Use availableReplicas if available (K8s 1.22+), otherwise fall back to readyReplicas
readyCount := statefulSet.Status.AvailableReplicas
if readyCount == 0 && statefulSet.Status.ReadyReplicas > 0 {
    // availableReplicas is not available (K8s 1.21), use readyReplicas
    readyCount = statefulSet.Status.ReadyReplicas
}
if readyCount < statefulSet.Status.UpdatedReplicas {
    return false
}

Detection Logic

The fallback is triggered when:

  • availableReplicas == 0 (field missing or truly zero)
  • readyReplicas > 0 (pods are actually ready)

This ensures:

  • ✅ K8s 1.22+: Uses availableReplicas whenever it is non-zero (more accurate, since it honors minReadySeconds)
  • ✅ K8s 1.21: Falls back to readyReplicas (backward compatible)
  • ✅ Edge case: When no replicas are ready, both fields are 0, so the fallback is skipped and the original comparison applies unchanged
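The scenarios above can be checked with a small table-driven sketch. The `status` type and `rolledOut` function are hypothetical simplifications; `rolledOut` covers only the availableReplicas/updatedReplicas step of the real rollout checks, reimplementing the fallback shown in the Solution snippet.

```go
package main

import "fmt"

// status is a hypothetical subset of StatefulSet status fields.
type status struct {
	Available, Ready, Updated int32
}

// rolledOut applies the fallback: use Available unless it is 0 while Ready
// is positive, then require that count to cover all updated replicas.
func rolledOut(s status) bool {
	count := s.Available
	if count == 0 && s.Ready > 0 {
		count = s.Ready
	}
	return count >= s.Updated
}

func main() {
	cases := []struct {
		name string
		s    status
	}{
		{"1.21: field missing, pods ready", status{Available: 0, Ready: 2, Updated: 2}},
		{"1.22+: field populated", status{Available: 2, Ready: 2, Updated: 2}},
		{"no replicas ready", status{Available: 0, Ready: 0, Updated: 2}},
		{"1.22+: partially available", status{Available: 1, Ready: 2, Updated: 2}},
		{"1.22+: ready, minReadySeconds not elapsed", status{Available: 0, Ready: 2, Updated: 2}},
	}
	for _, c := range cases {
		fmt.Printf("%-45s rolledOut=%v\n", c.name, rolledOut(c.s))
	}
}
```

The first three rows correspond to the bullets above (true, true, false). The last two illustrate that the fallback fires only when availableReplicas is exactly 0: a partially available StatefulSet is still reported as not rolled out, while on 1.22+ with a non-zero minReadySeconds, a StatefulSet whose pods are all ready but not yet available is counted via readyReplicas.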

Changes

Modified Files

  1. pkg/utils/apps.go
    • IsStatefulSetRolledOut(): Added fallback logic for StatefulSet (lines 76-86)
    • IsAdvancedStatefulSetRolledOut(): Added fallback logic for OpenKruise AdvancedStatefulSet (lines 146-156)
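Because the same fallback now appears in both functions, one way to keep them in sync is a shared helper. This is a sketch under assumptions: `readyCountWithFallback` and the two status structs are hypothetical stand-ins, not the operator's code; the real status types come from the Kubernetes and OpenKruise API packages.

```go
package main

import "fmt"

// Hypothetical stand-ins for the StatefulSet and Advanced StatefulSet
// status types; only the counters used by the check are modeled.
type statefulSetStatus struct {
	ReadyReplicas, UpdatedReplicas, AvailableReplicas int32
}

type advancedStatefulSetStatus struct {
	ReadyReplicas, UpdatedReplicas, AvailableReplicas int32
}

// readyCountWithFallback keeps the compatibility rule in one place: prefer
// availableReplicas, but use readyReplicas when availableReplicas is 0 and
// some pods are ready (covers K8s 1.21, where the field is never populated).
func readyCountWithFallback(available, ready int32) int32 {
	if available == 0 && ready > 0 {
		return ready
	}
	return available
}

func statefulSetUpdatedReady(s statefulSetStatus) bool {
	return readyCountWithFallback(s.AvailableReplicas, s.ReadyReplicas) >= s.UpdatedReplicas
}

func advancedStatefulSetUpdatedReady(s advancedStatefulSetStatus) bool {
	return readyCountWithFallback(s.AvailableReplicas, s.ReadyReplicas) >= s.UpdatedReplicas
}

func main() {
	sts := statefulSetStatus{ReadyReplicas: 3, UpdatedReplicas: 3}
	asts := advancedStatefulSetStatus{ReadyReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 3}
	fmt.Println(statefulSetUpdatedReady(sts), advancedStatefulSetUpdatedReady(asts))
}
```

Passing plain counters rather than the status structs keeps the helper independent of both API packages.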

Testing

Tested on:

  • ✅ Kubernetes 1.21.13 - RisingWave cluster now correctly shows running=true
  • Expected to work on K8s 1.22+ (uses preferred availableReplicas field)

Behavior Changes

Before:

  • K8s 1.21: ❌ running=false forever (broken)
  • K8s 1.22+: ✅ running=true (works)

After:

  • K8s 1.21: ✅ running=true (fixed)
  • K8s 1.22+: ✅ running=true (still works, uses better field)

Verification

On a K8s 1.21 cluster with this fix:

$ kubectl get risingwave -n olap
NAME         RUNNING   AGE
risingwave   True      5m

$ kubectl get pods -n olap
NAME                                READY   STATUS    RESTARTS   AGE
risingwave-compactor-xxx            1/1     Running   0          5m
risingwave-compute-0                1/1     Running   0          5m
risingwave-frontend-xxx             1/1     Running   0          5m
risingwave-meta-0                   1/1     Running   0          5m

Background: KEP-2599 Timeline

  • K8s 1.21: No availableReplicas field in StatefulSet
  • K8s 1.22: availableReplicas added as alpha (KEP-2599, enabled by default)
  • K8s 1.23: Beta
  • K8s 1.25: GA/Stable

Reference: KEP-2599: MinReadySeconds for StatefulSets

Related

Fixes issue #949

Checklist

  • Code follows project style guidelines
  • Backward compatibility maintained for K8s 1.21+
  • Forward compatibility maintained for K8s 1.27+
  • Comments added explaining the compatibility logic
  • Updated documentation if needed
  • Added tests if needed (no existing test file for apps.go)

Note: This fix ensures the operator works as documented in the README compatibility matrix (K8s 1.21+). The fallback approach is safe and commonly used in the Kubernetes ecosystem for handling API evolution.

@ZhicongZheng requested a review from a team as a code owner on October 30, 2025 06:07

@arkbriar (Collaborator)

Hi @ZhicongZheng, I appreciate the fix. However, I think there are some issues in the issue and PR description. First of all, availableReplicas wasn't added in KEP-3017; it was added in KEP-2599, together with minReadySeconds. Given that the field is already in alpha and turned on by default in 1.22, I think the only version that doesn't work is 1.21.

The code change LGTM in general! I would appreciate it if you could also fix the title formatting and update the PR description to reflect the correct background.

The operator claims to support Kubernetes 1.21+ but fails to set
RisingWave clusters to running=true on K8s 1.21.

Root cause: The availableReplicas field was added to StatefulSet
in Kubernetes 1.22 as an alpha feature (KEP-2599). On K8s 1.21,
this field doesn't exist and defaults to 0, causing readiness checks
to always fail.

Solution: Fall back to readyReplicas (available since K8s 1.9) when
availableReplicas is not available. This ensures backward compatibility
with K8s 1.21 while maintaining optimal behavior on K8s 1.22+.

Changes:
- pkg/utils/apps.go: Add fallback logic in IsStatefulSetRolledOut()
- pkg/utils/apps.go: Add fallback logic in IsAdvancedStatefulSetRolledOut()

Tested on Kubernetes 1.21.13 - clusters now correctly show running=true.

Fixes: risingwavelabs#949
@ZhicongZheng force-pushed the fix/k8s-1.21-compatibility branch from 6ac0ca0 to 0f1f74f on October 30, 2025 07:27
@ZhicongZheng changed the title from "fix: support Kubernetes 1.21-1.26 by handling missing availableReplicas field" to "Fix: Support Kubernetes 1.21 by handling missing availableReplicas field" on Oct 30, 2025

@ZhicongZheng (Contributor, Author)

Hi @arkbriar, thank you for the review and correction! You're absolutely right.

I've updated the PR with the correct information:

1. ✅ Corrected KEP: Changed from KEP-3017 to KEP-2599
2. ✅ Corrected affected version: Only K8s 1.21 is affected
3. ✅ Updated code comments, PR description, and commit message

The fix remains the same - falling back to `readyReplicas` when `availableReplicas` is 0.

Thanks again for catching this!

@arkbriar (Collaborator)

While I'm fine with AI coding, I hope replies are written by a human. Otherwise, there's no difference from me talking to ChatGPT myself.

@ZhicongZheng (Contributor, Author)

Oh... sorry, my English is not very good, so I used ChatGPT for this.

@arkbriar (Collaborator) left a comment

LGTM! Thanks!

@arkbriar added this pull request to the merge queue on Oct 31, 2025
Merged via the queue into risingwavelabs:main with commit fea4a6f Oct 31, 2025
11 of 13 checks passed