Rollout Doesn't Roll Back During a syncHandler/Reconciliation Error and Gets Stuck in an Infinite Progressing State #4504

@satyamsareen007

Description

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

I tried to set an argument for my analysis template using valueFrom/fieldRef. The Argo Rollouts controller could not find the field at the provided path, and the Rollout is now stuck in an infinite Progressing state.
The controller has already retried close to 5000 times.
This looks like a bug: if the controller cannot find the field because the value is missing or the path is mistyped, it should retry only up to a configurable, finite number of times before rolling back and entering a Degraded state.
Also, the Rollout does not emit any status about this; the only way to notice it is through the controller logs.
If this happens in an actual deployment, there should be a quick way to detect it, e.g. a status field that gets updated and that we can watch or get notifications on.

Also, a separate question: does fieldRef only work for labels on the Rollout resource itself? Does it not work for the labels under the template section, which get added to the pods?

  args:
  - name: rollout-version
    valueFrom:
      fieldRef:
        fieldPath: spec.template.metadata.labels['version']

To Reproduce

Use the following Rollout config:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: usage-events-ingester
  namespace: usage-events-ingester
  labels:
    app: usage-events-ingester
spec:
  replicas: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 5}
      - analysis:
          templates:
            - templateName: example-template
          args:
          - name: rollout-version
            valueFrom:
              fieldRef:
                fieldPath: spec.template.metadata.labels['version']
      - setWeight: 100
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: usage-events-ingester
  template:
    metadata:
      labels:
        app: usage-events-ingester
        version: "3"
    spec:
      containers:
      - name: rollouts-demo-analysis
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m


Expected behavior

The Rollout should roll back after a configurable number of retries and enter a "Degraded" state.
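
To make the expectation concrete, here is a minimal, language-agnostic sketch in Python of the behavior I have in mind. It is not the controller's actual code: `RolloutStatus`, `sync_with_retry_limit`, and `max_retries` are hypothetical names used only for illustration, and the "roll back" step is reduced to setting a phase and message.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RolloutStatus:
    """Hypothetical status object; not the real Rollout status schema."""
    phase: str = "Progressing"
    message: str = ""
    retries: int = 0


def sync_with_retry_limit(reconcile: Callable[[], None], max_retries: int) -> RolloutStatus:
    """Call reconcile until it succeeds or max_retries attempts have failed."""
    status = RolloutStatus()
    while True:
        try:
            reconcile()
        except ValueError as err:
            status.retries += 1
            if status.retries >= max_retries:
                # Stop retrying and surface the error instead of looping forever.
                status.phase = "Degraded"
                status.message = f"gave up after {status.retries} retries: {err}"
                return status
            continue
        status.phase = "Healthy"
        return status


def always_invalid_path() -> None:
    """Stand-in for a reconcile call that fails the same way every time."""
    raise ValueError("invalid path spec.template.metadata.labels['version'] in rollout")


result = sync_with_retry_limit(always_invalid_path, max_retries=5)
print(result.phase, result.retries, result.message)
```

The important part is that the failure ends up in a status field with a message, so it can be watched or alerted on instead of being visible only in logs.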

Screenshots

Image

Version

1.8.3

Logs

time="2025-10-22T07:28:22Z" level=info msg="rollout syncHandler queue retries: 3968 : key \"usage-events-ingester/usage-events-ingester\"" namespace=usage-events-ingester rollout=usage-events-ingester
time="2025-10-22T07:28:22Z" level=error msg="invalid path spec.template.metadata.labels['version'] in rollout" error="<nil>"
time="2025-10-22T07:28:32Z" level=info msg="Started syncing rollout" generation=18 namespace=usage-events-ingester resourceVersion=1119184795 rollout=usage-events-ingester
time="2025-10-22T07:28:32Z" level=info msg="No TrafficRouting Reconcilers found" namespace=usage-events-ingester rollout=usage-events-ingester
time="2025-10-22T07:28:32Z" level=info msg="Reconciling analysis step (stepIndex: 2)" namespace=usage-events-ingester rollout=usage-events-ingester
time="2025-10-22T07:28:32Z" level=error msg="roCtx.reconcile err invalid path spec.template.metadata.labels['version'] in rollout" generation=18 namespace=usage-events-ingester resourceVersion=1119184795 rollout=usage-events-ingester
time="2025-10-22T07:28:32Z" level=info msg="Reconciliation completed" generation=18 namespace=usage-events-ingester resourceVersion=1119184795 rollout=usage-events-ingester time_ms=3.248486
time="2025-10-22T07:28:32Z" level=error msg="rollout syncHandler error: invalid path spec.template.metadata.labels['version'] in rollout" namespace=usage-events-ingester rollout=usage-events-ingester
time="2025-10-22T07:28:32Z" level=info msg="rollout syncHandler queue retries: 3969 : key \"usage-events-ingester/usage-events-ingester\"" namespace=usage-events-ingester rollout=usage-events-ingester
time="2025-10-22T07:28:32Z" level=error msg="invalid path spec.template.metadata.labels['version'] in rollout" error="<nil>"
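
Since the logs are currently the only signal, a stopgap is to scrape the retry counter from them. The sketch below is only an illustration in Python: it assumes the `rollout syncHandler queue retries: N : key "..."` format shown above stays stable, and the function names and the threshold value are my own, not anything provided by Argo Rollouts.

```python
import re
from typing import Dict, Iterable, List

# Matches the retry counter lines shown in the logs above. The quotes around
# the key may or may not be backslash-escaped depending on the log formatter.
RETRY_LINE = re.compile(r'rollout syncHandler queue retries: (\d+) : key \\?"([^"\\]+)\\?"')


def max_retries_per_rollout(lines: Iterable[str]) -> Dict[str, int]:
    """Return the highest retry count seen for each namespace/name key."""
    highest: Dict[str, int] = {}
    for line in lines:
        match = RETRY_LINE.search(line)
        if match:
            count, key = int(match.group(1)), match.group(2)
            highest[key] = max(count, highest.get(key, 0))
    return highest


def stuck_rollouts(lines: Iterable[str], threshold: int) -> List[str]:
    """List rollout keys whose retry count reached the (hypothetical) threshold."""
    counts = max_retries_per_rollout(lines)
    return sorted(key for key, count in counts.items() if count >= threshold)


sample = [
    r'time="2025-10-22T07:28:22Z" level=info msg="rollout syncHandler queue retries: 3968 : key \"usage-events-ingester/usage-events-ingester\"" namespace=usage-events-ingester rollout=usage-events-ingester',
    r'time="2025-10-22T07:28:32Z" level=info msg="rollout syncHandler queue retries: 3969 : key \"usage-events-ingester/usage-events-ingester\"" namespace=usage-events-ingester rollout=usage-events-ingester',
]
print(stuck_rollouts(sample, threshold=100))
```

This only works around the missing status; it does not replace having the Rollout report the error itself.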

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.


Labels: bug (Something isn't working)
