@shvbsle
Contributor

@shvbsle shvbsle commented Nov 20, 2025

Issue #, if available:
The containerd config check test fails because it assumes the pause image is always stored under the key sandbox_image, which does not hold in newer config versions.

  • In containerd config version = 2 expect to find pattern sandbox_image = "registry.k8s.io/pause:3.10.1"
  • In containerd config version = 3 expect to find pattern sandbox = 'registry.k8s.io/pause:3.10.1'

For more details: https://github.com/containerd/containerd/blob/main/docs/cri/config.md

Description of changes:
Attempt to extract 'sandbox' from the config so that the check is compatible with the latest containerd config version.
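
As an illustration of the matching described above, here is a minimal sketch that accepts either key form. It is not the code changed in this PR; the names `PAUSE_IMAGE_RE` and `find_pause_image` are hypothetical, and the sketch assumes the key and its value sit on a single line, as in the two patterns listed in the issue.

```python
import re

# Accepts either key form, with single or double quotes around the value:
#   sandbox_image = "registry.k8s.io/pause:3.10.1"   (config version 2)
#   sandbox = 'registry.k8s.io/pause:3.10.1'         (config version 3)
# Hypothetical helper names; this is a sketch, not the PR's implementation.
PAUSE_IMAGE_RE = re.compile(
    r"""^[ \t]*(?:sandbox_image|sandbox)[ \t]*=[ \t]*(["'])(?P<image>[^"']+)\1[ \t]*$""",
    re.MULTILINE,
)


def find_pause_image(config_text):
    """Return the pause image reference found in the config text, or None."""
    match = PAUSE_IMAGE_RE.search(config_text)
    return match.group("image") if match else None


if __name__ == "__main__":
    print(find_pause_image('sandbox_image = "registry.k8s.io/pause:3.10.1"'))
    print(find_pause_image("sandbox = 'registry.k8s.io/pause:3.10.1'"))
```

Anchoring the key at the start of the line means a longer key that merely begins with `sandbox` is not picked up by accident.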

Testing

go test -v -tags=e2e ./test/cases/nvidia/... -timeout=0 -args \
    --test.timeout=30m \
    --test.v \
    --test.run=TestContainerdConfig/containerd-config-check \
    -efaEnabled=true \
    -nvidiaTestImage=${REPO}.dkr.ecr.us-west-2.amazonaws.com/shvbsle/${IMAGE_TAG}
2025/11/20 21:17:14 No node type specified. Using the node type p6e-gb300.36xlarge in the node groups.
=== RUN   TestContainerdConfig
=== RUN   TestContainerdConfig/containerd-config-check
2025/11/20 21:17:14 [Setup] Applying containerd-check DaemonSet manifest.
=== RUN   TestContainerdConfig/containerd-config-check/DaemonSet_becomes_ready
2025/11/20 21:17:14 [Assess] Waiting up to 1 minute for containerd-check DS to become Ready...
2025/11/20 21:17:19 [Assess] containerd-check DS is Ready.
=== NAME  TestContainerdConfig/containerd-config-check
    containerd_test.go:61: [Teardown] Removing containerd-check DS (no additional logs).
    containerd_test.go:65: [Teardown] containerd-check DS removed successfully.
--- PASS: TestContainerdConfig (5.06s)
    --- PASS: TestContainerdConfig/containerd-config-check (5.06s)
        --- PASS: TestContainerdConfig/containerd-config-check/DaemonSet_becomes_ready (5.02s)
PASS
ok      github.com/aws/aws-k8s-tester/test/cases/nvidia 22.074s

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@shvbsle shvbsle requested review from mselim00 and wwvela November 20, 2025 21:20
Contributor

@wwvela wwvela left a comment

LGTM

@shvbsle
Contributor Author

shvbsle commented Nov 20, 2025

CI is failing because the runner has no space left on the device. I will retry in a while. This should not be related to our changes.

@mselim00
Contributor

btw, the neuron build CI should pass if this is rebased on main (#719)

Contributor

@mselim00 mselim00 left a comment

lgtm.

As an aside, this could be better as a Python script, or if there were some hypothetical command-line tq that handles TOML serialization.

@shvbsle
Contributor Author

shvbsle commented Nov 21, 2025

Flaky build CI due to space running out. Trying it once again.

@shvbsle shvbsle merged commit 9423c04 into aws:main Nov 21, 2025
19 of 20 checks passed
@shvbsle shvbsle deleted the containerdtestfix branch November 21, 2025 21:35
3 participants