test(job/state): add unit tests for job controller state machine#5244
test(job/state): add unit tests for job controller state machine#5244kube-gopher wants to merge 13 commits into
Conversation
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive suite of unit tests for the job state machine, covering various states (Aborted, Finished, Terminating), the state factory, and utility functions. The feedback focuses on improving test maintainability by unifying duplicated test helpers across files and utilizing existing constants for pod phase retention checks instead of hardcoding values.
| // capturedKillJob records one KillJob invocation and preserves the updateFn | ||
| // so tests can invoke it directly to verify the state-transition logic. | ||
| type capturedKillJob struct { | ||
| job *apis.JobInfo | ||
| podRetainPhase PhaseMap | ||
| updateFn UpdateStatusFn | ||
| } | ||
|
|
||
| // captureKillJob replaces KillJob with a stub that records its arguments and | ||
| // returns returnErr. The original value is restored via t.Cleanup. | ||
| func captureKillJob(t *testing.T, returnErr error) *capturedKillJob { | ||
| t.Helper() | ||
| original := KillJob | ||
| cap := &capturedKillJob{} | ||
| KillJob = func(job *apis.JobInfo, podRetainPhase PhaseMap, fn UpdateStatusFn) error { | ||
| cap.job = job | ||
| cap.podRetainPhase = podRetainPhase | ||
| cap.updateFn = fn | ||
| return returnErr | ||
| } | ||
| t.Cleanup(func() { KillJob = original }) | ||
| return cap | ||
| } |
There was a problem hiding this comment.
The test helper struct capturedKillJob and the function captureKillJob are duplicated with slight variations (e.g., killJobCall and stubKillJob in finished_test.go) across multiple test files in the same package. Since these files share the same package namespace, it is recommended to unify these helpers into a single implementation (e.g., in a common_test.go file or by defining it once in a primary test file) to improve maintainability and reduce code duplication.
| for _, phase := range []v1.PodPhase{v1.PodSucceeded, v1.PodFailed} { | ||
| if _, ok := call.podRetainPhase[phase]; !ok { | ||
| t.Errorf("podRetainPhase missing %q; got %v", phase, call.podRetainPhase) | ||
| } | ||
| } | ||
| if len(call.podRetainPhase) != 2 { | ||
| t.Errorf("podRetainPhase should have exactly 2 entries, got %d", len(call.podRetainPhase)) | ||
| } |
There was a problem hiding this comment.
The test hardcodes the pod phases (PodSucceeded, PodFailed) and the expected count (2) when verifying the podRetainPhase map. It is better to iterate over the PodRetainPhaseSoft constant defined in factory.go to ensure consistency and avoid brittleness if the set of retained phases changes in the future. This approach is already used in aborted_test.go.
| for _, phase := range []v1.PodPhase{v1.PodSucceeded, v1.PodFailed} { | |
| if _, ok := call.podRetainPhase[phase]; !ok { | |
| t.Errorf("podRetainPhase missing %q; got %v", phase, call.podRetainPhase) | |
| } | |
| } | |
| if len(call.podRetainPhase) != 2 { | |
| t.Errorf("podRetainPhase should have exactly 2 entries, got %d", len(call.podRetainPhase)) | |
| } | |
| for phase := range PodRetainPhaseSoft { | |
| if _, ok := call.podRetainPhase[phase]; !ok { | |
| t.Errorf("podRetainPhase missing %q; got %v", phase, call.podRetainPhase) | |
| } | |
| } | |
| if len(call.podRetainPhase) != len(PodRetainPhaseSoft) { | |
| t.Errorf("podRetainPhase has %d entries, want %d", len(call.podRetainPhase), len(PodRetainPhaseSoft)) | |
| } |
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
|
/test all |
|
@kube-gopher: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Pull request overview
This PR adds unit test coverage for pkg/controllers/job/state, validating the Volcano Job lifecycle state-machine behavior (action routing and phase transitions) without needing a fake k8s client.
Changes:
- Adds table-driven unit tests for state
Execute()methods across phases (Pending/Running/Restarting/Completing/Terminating/Aborting/Aborted/Finished). - Adds unit tests for shared helpers (
TotalTasks,TotalTaskMinAvailable) and forNewStatefactory behavior / pod-retain phase maps. - Introduces lightweight stubs for package-level action functions (
KillJob,KillTarget,SyncJob) to assert call arguments and exercise update functions.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/controllers/job/state/util_test.go | Adds tests for task counting helpers (TotalTasks, TotalTaskMinAvailable). |
| pkg/controllers/job/state/factory_test.go | Adds tests for NewState phase-to-state mapping and PodRetainPhase* constants. |
| pkg/controllers/job/state/finished_test.go | Adds tests ensuring finished-state always calls KillJob with soft retention and nil updateFn. |
| pkg/controllers/job/state/pending_test.go | Adds tests for Pending state action routing and SyncJob updateFn behavior. |
| pkg/controllers/job/state/running_test.go | Adds tests for Running state action branches and default SyncJob updateFn phase decisions. |
| pkg/controllers/job/state/restarting_test.go | Adds tests for restartingUpdateStatus and Restarting state action routing. |
| pkg/controllers/job/state/completing_test.go | Adds tests for Completing state KillJob usage and drain-to-Completed transition logic. |
| pkg/controllers/job/state/terminating_test.go | Adds tests for Terminating state KillJob usage and drain-to-Terminated transition logic. |
| pkg/controllers/job/state/aborting_test.go | Adds tests for Aborting state Resume vs default behavior and Aborted transition logic. |
| pkg/controllers/job/state/aborted_test.go | Adds tests for Aborted state Resume behavior vs default (nil updateFn) behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| originalPhase := tc.status.State.Phase | ||
| changed := cap.updateFn(&tc.status) | ||
|
|
||
| if changed { | ||
| t.Error("updateFn should return false while alive pods exist") | ||
| } | ||
| if tc.status.State.Phase != originalPhase { | ||
| t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase) |
There was a problem hiding this comment.
In this “alive pods” test, tc.status.State.Phase is never initialized (defaults to ""), so asserting it remains equal to originalPhase doesn’t verify that the phase stays Terminating. Set tc.status.State.Phase to vcbatch.Terminating before calling updateFn and assert it remains Terminating when changed==false.
| originalPhase := tc.status.State.Phase | |
| changed := cap.updateFn(&tc.status) | |
| if changed { | |
| t.Error("updateFn should return false while alive pods exist") | |
| } | |
| if tc.status.State.Phase != originalPhase { | |
| t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase) | |
| tc.status.State.Phase = vcbatch.Terminating | |
| changed := cap.updateFn(&tc.status) | |
| if changed { | |
| t.Error("updateFn should return false while alive pods exist") | |
| } | |
| if tc.status.State.Phase != vcbatch.Terminating { | |
| t.Errorf("phase should remain %q: got %q", vcbatch.Terminating, tc.status.State.Phase) |
| originalPhase := tc.status.State.Phase | ||
| changed := cap.updateFn(&tc.status) | ||
|
|
||
| if changed { | ||
| t.Error("updateFn should return false while alive pods exist") | ||
| } | ||
| if tc.status.State.Phase != originalPhase { | ||
| t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase) |
There was a problem hiding this comment.
Same issue as terminatingState alive-pod tests: tc.status.State.Phase is left as the zero value, so the “phase unchanged” assertion can pass even if updateFn incorrectly clears/changes the phase. Initialize the status phase to vcbatch.Completing before invoking updateFn and assert it stays Completing when changed==false.
| originalPhase := tc.status.State.Phase | |
| changed := cap.updateFn(&tc.status) | |
| if changed { | |
| t.Error("updateFn should return false while alive pods exist") | |
| } | |
| if tc.status.State.Phase != originalPhase { | |
| t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase) | |
| tc.status.State.Phase = vcbatch.Completing | |
| changed := cap.updateFn(&tc.status) | |
| if changed { | |
| t.Error("updateFn should return false while alive pods exist") | |
| } | |
| if tc.status.State.Phase != vcbatch.Completing { | |
| t.Errorf("phase should remain %q: got %q", vcbatch.Completing, tc.status.State.Phase) |
| if err := s.Execute(Action{Action: v1alpha1.SyncJobAction}); err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
Here originalPhase is taken from tc.status.State.Phase, but the test cases don’t set an initial phase (so it’s ""). To actually verify abortingState’s updateFn keeps the phase Aborting while pods are still alive, set tc.status.State.Phase = vcbatch.Aborting before calling updateFn and assert it remains Aborting when changed==false.
| tc.status.State.Phase = vcbatch.Aborting |
| originalPhase := tc.status.State.Phase | ||
| changed := cap.updateFn(&tc.status) | ||
|
|
There was a problem hiding this comment.
These “stays pending” cases never set tc.status.State.Phase to Pending, so the “phase unchanged” assertion is comparing against the zero value ("") rather than vcbatch.Pending. Initialize each test status with State.Phase = vcbatch.Pending so the test verifies the updateFn truly preserves the Pending phase when MinAvailable is not met.
| if status.State.Phase != vcbatch.Running && status.State.Phase != "" { | ||
| t.Errorf("phase should remain unchanged, got %q", status.State.Phase) | ||
| } |
There was a problem hiding this comment.
Allowing status.State.Phase to remain "" weakens this assertion: in real usage JobStatus.State.Phase should already be Running, and the updateFn should leave it Running when no transition is needed. Consider setting the initial status.State.Phase = vcbatch.Running in these test cases and then asserting it remains Running when changed==false.
|
|
||
| // stubKillJob replaces the KillJob package variable with a stub that records | ||
| // its arguments and returns the given error. It restores the original value | ||
| // via t.Cleanup so parallel tests are not affected. |
There was a problem hiding this comment.
The comment is misleading: restoring a package-level variable in t.Cleanup prevents cross-test pollution when tests run sequentially, but it does not make the stub safe for parallel tests (t.Parallel), since other tests could observe the stub concurrently. Consider rewording to avoid implying parallel-safety, or explicitly note that tests must not run in parallel while stubbing globals.
| // via t.Cleanup so parallel tests are not affected. | |
| // via t.Cleanup to avoid leaking the stub into later sequential tests; tests | |
| // that use this helper must not run in parallel because KillJob is global. |
|
This PR appears to contain a lot of AI-generated code, including the description. Please explain your use of AI. |
|
@JesseStutler Thanks for pointing this out. I did use AI assistance to help write parts of the code and description. That said, I’ve reviewed and tested the code myself and understand what it does. |
|
@JesseStutler After using AI assistance to write code, I will carefully review every detail written by the AI to ensure code quality. |
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
702d972 to
3e402bf
Compare
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: kube-gopher <jzlyy68@gmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@JesseStutler I've carefully reviewed the quality of the relevant code, found and fixed some minor errors. Could you please take a look and let me know if there are any other areas that could be improved? |
Summary
Add unit tests for
pkg/controllers/job/state/package, which previouslyhad 0% test coverage. This package implements the Volcano Job lifecycle
state machine (Pending → Running → Completing/Terminating/Aborting/
Restarting → terminal states).
Changes
Test Coverage
Before:
0.0%After:
98.6%How to Verify
go test ./pkg/controllers/job/state/... -v -coverprofile=cover.out
go tool cover -func=cover.out