test(job/state): add unit tests for job controller state machine by kube-gopher · Pull Request #5244 · volcano-sh/volcano

kube-gopher · 2026-04-26T09:19:34Z

Summary

Add unit tests for the pkg/controllers/job/state/ package, which previously
had 0% test coverage. This package implements the Volcano Job lifecycle
state machine (Pending → Running → Completing/Terminating/Aborting/
Restarting → terminal states).

Changes

File	Test File	Coverage
util.go	util_test.go	100%
factory.go	factory_test.go	100%
finished.go	finished_test.go	100%
aborted.go	aborted_test.go	100%
terminating.go	terminating_test.go	100%
aborting.go	aborting_test.go	100%
pending.go	pending_test.go	100%
completing.go	completing_test.go	100%
restarting.go	restarting_test.go	100%
running.go	running_test.go	100%

Test Coverage

Before: 0.0%
After: 98.6%

How to Verify

go test ./pkg/controllers/job/state/... -v -coverprofile=cover.out
go tool cover -func=cover.out

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

volcano-sh-bot · 2026-04-26T09:19:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign shinytang6 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

pkg/controllers/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Code Review

This pull request introduces a comprehensive suite of unit tests for the job state machine, covering various states (Aborted, Finished, Terminating), the state factory, and utility functions. The feedback focuses on improving test maintainability by unifying duplicated test helpers across files and utilizing existing constants for pod phase retention checks instead of hardcoding values.

gemini-code-assist · 2026-04-26T09:21:59Z

+// capturedKillJob records one KillJob invocation and preserves the updateFn
+// so tests can invoke it directly to verify the state-transition logic.
+type capturedKillJob struct {
+	job            *apis.JobInfo
+	podRetainPhase PhaseMap
+	updateFn       UpdateStatusFn
+}
+
+// captureKillJob replaces KillJob with a stub that records its arguments and
+// returns returnErr. The original value is restored via t.Cleanup.
+func captureKillJob(t *testing.T, returnErr error) *capturedKillJob {
+	t.Helper()
+	original := KillJob
+	cap := &capturedKillJob{}
+	KillJob = func(job *apis.JobInfo, podRetainPhase PhaseMap, fn UpdateStatusFn) error {
+		cap.job = job
+		cap.podRetainPhase = podRetainPhase
+		cap.updateFn = fn
+		return returnErr
+	}
+	t.Cleanup(func() { KillJob = original })
+	return cap
+}


The test helper struct capturedKillJob and the function captureKillJob are duplicated with slight variations (e.g., killJobCall and stubKillJob in finished_test.go) across multiple test files in the same package. Since these files share the same package namespace, it is recommended to unify these helpers into a single implementation (e.g., in a common_test.go file or by defining it once in a primary test file) to improve maintainability and reduce code duplication.
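A minimal sketch of what a single shared helper could look like, assuming Go 1.18+ generics: one swap-and-restore function instead of per-file captureKillJob/stubKillJob variants. The one-argument `KillJob` below is a hypothetical stand-in for the real package variable (which takes a `*apis.JobInfo`, a `PhaseMap` and an `UpdateStatusFn`), and in a shared `common_test.go` the returned restore func would be passed to `t.Cleanup`.

```go
package main

import "fmt"

// KillJob is a hypothetical, simplified stand-in for the package-level
// KillJob variable in pkg/controllers/job/state.
var KillJob = func(jobName string) error { return nil }

// swap replaces *target with replacement and returns a func that restores
// the previous value. In a test file the caller would register the
// returned func with t.Cleanup. Requires Go 1.18+ (generics).
func swap[T any](target *T, replacement T) (restore func()) {
	original := *target
	*target = replacement
	return func() { *target = original }
}

func main() {
	var calls []string
	restore := swap(&KillJob, func(jobName string) error {
		calls = append(calls, jobName) // record the argument, like capturedKillJob
		return fmt.Errorf("stubbed failure for %s", jobName)
	})
	err := KillJob("job-a")
	fmt.Println(calls, err)

	restore()
	fmt.Println(KillJob("job-b")) // the original stand-in returns nil
}
```

Because the recording logic lives in each test's stub closure, the same helper could cover KillJob, KillTarget and SyncJob without a dedicated capture struct per function.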

gemini-code-assist · 2026-04-26T09:21:59Z

+	for _, phase := range []v1.PodPhase{v1.PodSucceeded, v1.PodFailed} {
+		if _, ok := call.podRetainPhase[phase]; !ok {
+			t.Errorf("podRetainPhase missing %q; got %v", phase, call.podRetainPhase)
+		}
+	}
+	if len(call.podRetainPhase) != 2 {
+		t.Errorf("podRetainPhase should have exactly 2 entries, got %d", len(call.podRetainPhase))
+	}


The test hardcodes the pod phases (PodSucceeded, PodFailed) and the expected count (2) when verifying the podRetainPhase map. It is better to iterate over the PodRetainPhaseSoft map defined in factory.go, so the test stays consistent with the source and does not break if the set of retained phases changes in the future. This approach is already used in aborted_test.go.

Suggested change

-	for _, phase := range []v1.PodPhase{v1.PodSucceeded, v1.PodFailed} {
-		if _, ok := call.podRetainPhase[phase]; !ok {
-			t.Errorf("podRetainPhase missing %q; got %v", phase, call.podRetainPhase)
-		}
-	}
-	if len(call.podRetainPhase) != 2 {
-		t.Errorf("podRetainPhase should have exactly 2 entries, got %d", len(call.podRetainPhase))
-	}
+	for phase := range PodRetainPhaseSoft {
+		if _, ok := call.podRetainPhase[phase]; !ok {
+			t.Errorf("podRetainPhase missing %q; got %v", phase, call.podRetainPhase)
+		}
+	}
+	if len(call.podRetainPhase) != len(PodRetainPhaseSoft) {
+		t.Errorf("podRetainPhase has %d entries, want %d", len(call.podRetainPhase), len(PodRetainPhaseSoft))
+	}
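The suggested membership check plus length check amounts to set equality. A sketch of a helper that reports both missing and unexpected phases, assuming simplified `PodPhase`/`PhaseMap` types in place of `v1.PodPhase` and the package's `PhaseMap` (the string values match the Kubernetes `PodSucceeded`/`PodFailed` constants); `diffPhases` is a hypothetical name:

```go
package main

import (
	"fmt"
	"sort"
)

// PodPhase and PhaseMap are simplified stand-ins for v1.PodPhase and the
// state package's PhaseMap.
type PodPhase string

type PhaseMap map[PodPhase]struct{}

// PodRetainPhaseSoft mirrors the canonical soft-retention set (assumed
// here to be Succeeded and Failed).
var PodRetainPhaseSoft = PhaseMap{"Succeeded": {}, "Failed": {}}

// diffPhases returns phases in want but not in got (missing) and phases in
// got but not in want (extra), each sorted so messages are deterministic.
func diffPhases(got, want PhaseMap) (missing, extra []PodPhase) {
	for p := range want {
		if _, ok := got[p]; !ok {
			missing = append(missing, p)
		}
	}
	for p := range got {
		if _, ok := want[p]; !ok {
			extra = append(extra, p)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	sort.Slice(extra, func(i, j int) bool { return extra[i] < extra[j] })
	return missing, extra
}

func main() {
	got := PhaseMap{"Succeeded": {}, "Running": {}}
	missing, extra := diffPhases(got, PodRetainPhaseSoft)
	fmt.Printf("missing=%v extra=%v\n", missing, extra)
}
```

A failing test can then print exactly which phase is wrong instead of only a mismatched count.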


kube-gopher · 2026-04-27T01:50:58Z

/test all

volcano-sh-bot · 2026-04-27T01:51:23Z

@kube-gopher: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.


In response to this:

/test all

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copilot

Pull request overview

This PR adds unit test coverage for pkg/controllers/job/state, validating the Volcano Job lifecycle state-machine behavior (action routing and phase transitions) without needing a fake k8s client.

Changes:

Adds table-driven unit tests for state Execute() methods across phases (Pending/Running/Restarting/Completing/Terminating/Aborting/Aborted/Finished).
Adds unit tests for shared helpers (TotalTasks, TotalTaskMinAvailable) and for NewState factory behavior / pod-retain phase maps.
Introduces lightweight stubs for package-level action functions (KillJob, KillTarget, SyncJob) to assert call arguments and exercise update functions.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.


File	Description
pkg/controllers/job/state/util_test.go	Adds tests for task counting helpers (`TotalTasks`, `TotalTaskMinAvailable`).
pkg/controllers/job/state/factory_test.go	Adds tests for `NewState` phase-to-state mapping and the `PodRetainPhase*` maps.
pkg/controllers/job/state/finished_test.go	Adds tests ensuring finished-state always calls `KillJob` with soft retention and nil updateFn.
pkg/controllers/job/state/pending_test.go	Adds tests for Pending state action routing and SyncJob updateFn behavior.
pkg/controllers/job/state/running_test.go	Adds tests for Running state action branches and default SyncJob updateFn phase decisions.
pkg/controllers/job/state/restarting_test.go	Adds tests for `restartingUpdateStatus` and Restarting state action routing.
pkg/controllers/job/state/completing_test.go	Adds tests for Completing state KillJob usage and drain-to-Completed transition logic.
pkg/controllers/job/state/terminating_test.go	Adds tests for Terminating state KillJob usage and drain-to-Terminated transition logic.
pkg/controllers/job/state/aborting_test.go	Adds tests for Aborting state Resume vs default behavior and Aborted transition logic.
pkg/controllers/job/state/aborted_test.go	Adds tests for Aborted state Resume behavior vs default (nil updateFn) behavior.


Copilot · 2026-04-27T01:53:44Z

+			originalPhase := tc.status.State.Phase
+			changed := cap.updateFn(&tc.status)
+
+			if changed {
+				t.Error("updateFn should return false while alive pods exist")
+			}
+			if tc.status.State.Phase != originalPhase {
+				t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase)


In this “alive pods” test, tc.status.State.Phase is never initialized (defaults to ""), so asserting it remains equal to originalPhase doesn’t verify that the phase stays Terminating. Set tc.status.State.Phase to vcbatch.Terminating before calling updateFn and assert it remains Terminating when changed==false.

Suggested change

-			originalPhase := tc.status.State.Phase
-			changed := cap.updateFn(&tc.status)
-
-			if changed {
-				t.Error("updateFn should return false while alive pods exist")
-			}
-			if tc.status.State.Phase != originalPhase {
-				t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase)
+			tc.status.State.Phase = vcbatch.Terminating
+			changed := cap.updateFn(&tc.status)
+
+			if changed {
+				t.Error("updateFn should return false while alive pods exist")
+			}
+			if tc.status.State.Phase != vcbatch.Terminating {
+				t.Errorf("phase should remain %q: got %q", vcbatch.Terminating, tc.status.State.Phase)
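To see why the seeded phase matters, here is a self-contained sketch with hypothetical simplified types (not the Volcano batch API): an updateFn that wrongly clears the phase while pods are alive still passes the original zero-value assertion, but fails once the status is seeded with Terminating.

```go
package main

import "fmt"

// JobPhase, JobStatus and the updateFns below are hypothetical, simplified
// stand-ins for the Volcano batch types.
type JobPhase string

const (
	Terminating JobPhase = "Terminating"
	Terminated  JobPhase = "Terminated"
)

type JobState struct{ Phase JobPhase }

type JobStatus struct {
	State JobState
	Alive int32 // pods still pending, running or terminating
}

// drainUpdate keeps the phase while pods are alive, then moves to terminal.
func drainUpdate(terminal JobPhase) func(*JobStatus) bool {
	return func(s *JobStatus) bool {
		if s.Alive > 0 {
			return false
		}
		s.State.Phase = terminal
		return true
	}
}

// buggyDrainUpdate wrongly clears the phase while pods are still alive.
func buggyDrainUpdate(s *JobStatus) bool {
	if s.Alive > 0 {
		s.State.Phase = ""
		return false
	}
	s.State.Phase = Terminated
	return true
}

// weakCheck mirrors the original test: the phase is never seeded, so it is
// compared against the zero value "".
func weakCheck(update func(*JobStatus) bool) bool {
	s := JobStatus{Alive: 2}
	original := s.State.Phase
	changed := update(&s)
	return !changed && s.State.Phase == original
}

// seededCheck follows the review: seed Terminating and require it to stay.
func seededCheck(update func(*JobStatus) bool) bool {
	s := JobStatus{State: JobState{Phase: Terminating}, Alive: 2}
	changed := update(&s)
	return !changed && s.State.Phase == Terminating
}

func main() {
	fmt.Println("weak check accepts buggy updateFn:", weakCheck(buggyDrainUpdate))
	fmt.Println("seeded check accepts buggy updateFn:", seededCheck(buggyDrainUpdate))
	fmt.Println("seeded check accepts correct updateFn:", seededCheck(drainUpdate(Terminated)))
}
```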

Copilot · 2026-04-27T01:53:44Z

+			originalPhase := tc.status.State.Phase
+			changed := cap.updateFn(&tc.status)
+
+			if changed {
+				t.Error("updateFn should return false while alive pods exist")
+			}
+			if tc.status.State.Phase != originalPhase {
+				t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase)


Same issue as terminatingState alive-pod tests: tc.status.State.Phase is left as the zero value, so the “phase unchanged” assertion can pass even if updateFn incorrectly clears/changes the phase. Initialize the status phase to vcbatch.Completing before invoking updateFn and assert it stays Completing when changed==false.

Suggested change

-			originalPhase := tc.status.State.Phase
-			changed := cap.updateFn(&tc.status)
-
-			if changed {
-				t.Error("updateFn should return false while alive pods exist")
-			}
-			if tc.status.State.Phase != originalPhase {
-				t.Errorf("phase should not change: got %q, want %q", tc.status.State.Phase, originalPhase)
+			tc.status.State.Phase = vcbatch.Completing
+			changed := cap.updateFn(&tc.status)
+
+			if changed {
+				t.Error("updateFn should return false while alive pods exist")
+			}
+			if tc.status.State.Phase != vcbatch.Completing {
+				t.Errorf("phase should remain %q: got %q", vcbatch.Completing, tc.status.State.Phase)

Copilot · 2026-04-27T01:53:45Z

+			if err := s.Execute(Action{Action: v1alpha1.SyncJobAction}); err != nil {
+				t.Fatalf("unexpected error: %v", err)
+			}
+


Here originalPhase is taken from tc.status.State.Phase, but the test cases don’t set an initial phase (so it’s ""). To actually verify abortingState’s updateFn keeps the phase Aborting while pods are still alive, set tc.status.State.Phase = vcbatch.Aborting before calling updateFn and assert it remains Aborting when changed==false.

Suggested change

+			tc.status.State.Phase = vcbatch.Aborting
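The same fix applies to the Completing, Terminating and Aborting comments, so one option is a single table that seeds each draining phase. A sketch under assumed, simplified types; the drain rule used here (stay while any pod is alive, then move to the terminal phase) is an approximation, not the Volcano implementation, and `checkDrain` is a hypothetical helper:

```go
package main

import "fmt"

// Hypothetical, simplified status type; not the Volcano batch API.
type JobPhase string

type JobStatus struct {
	Phase JobPhase
	Alive int32 // pods not yet finished
}

// drainUpdate models a draining state's updateFn (an assumption): keep the
// phase while pods are alive, then move to the terminal phase.
func drainUpdate(terminal JobPhase) func(*JobStatus) bool {
	return func(s *JobStatus) bool {
		if s.Alive > 0 {
			return false
		}
		s.Phase = terminal
		return true
	}
}

// drainCases pairs each draining phase with the terminal phase it drains to.
var drainCases = []struct{ draining, terminal JobPhase }{
	{"Completing", "Completed"},
	{"Terminating", "Terminated"},
	{"Aborting", "Aborted"},
}

// checkDrain seeds the draining phase before calling update and returns a
// description of the first failed expectation, or "" if both hold.
func checkDrain(draining, terminal JobPhase, update func(*JobStatus) bool) string {
	alive := JobStatus{Phase: draining, Alive: 1}
	if update(&alive) || alive.Phase != draining {
		return fmt.Sprintf("%s: phase became %q while pods were alive", draining, alive.Phase)
	}
	drained := JobStatus{Phase: draining}
	if !update(&drained) || drained.Phase != terminal {
		return fmt.Sprintf("%s: got %q after drain, want %q", draining, drained.Phase, terminal)
	}
	return ""
}

func main() {
	for _, tc := range drainCases {
		if msg := checkDrain(tc.draining, tc.terminal, drainUpdate(tc.terminal)); msg != "" {
			fmt.Println("FAIL", msg)
			continue
		}
		fmt.Println("ok", tc.draining)
	}
}
```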

Copilot · 2026-04-27T01:53:45Z

+			originalPhase := tc.status.State.Phase
+			changed := cap.updateFn(&tc.status)
+


These “stays pending” cases never set tc.status.State.Phase to Pending, so the “phase unchanged” assertion is comparing against the zero value ("") rather than vcbatch.Pending. Initialize each test status with State.Phase = vcbatch.Pending so the test verifies the updateFn truly preserves the Pending phase when MinAvailable is not met.
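A sketch of the Pending case with the phase seeded, assuming a simplified updateFn that moves to Running once Running+Succeeded+Failed reaches MinAvailable; the types and `pendingUpdate` are hypothetical, and the exact condition should be checked against pending.go:

```go
package main

import "fmt"

// Hypothetical, simplified types; the real updateFn lives in pending.go and
// operates on the Volcano batch JobStatus.
type JobPhase string

const (
	PhasePending JobPhase = "Pending"
	PhaseRunning JobPhase = "Running"
)

type JobStatus struct {
	Phase                      JobPhase
	Running, Succeeded, Failed int32
}

// pendingUpdate is an assumed simplification of the Pending state's default
// updateFn: leave Pending once running+succeeded+failed pods reach
// minAvailable, otherwise report no change.
func pendingUpdate(minAvailable int32) func(*JobStatus) bool {
	return func(s *JobStatus) bool {
		if s.Running+s.Succeeded+s.Failed >= minAvailable {
			s.Phase = PhaseRunning
			return true
		}
		return false
	}
}

func main() {
	update := pendingUpdate(3)

	// Seed Pending so "unchanged" means "still Pending", not "still empty".
	below := JobStatus{Phase: PhasePending, Running: 1, Succeeded: 1}
	changed := update(&below)
	fmt.Println(changed, below.Phase)

	atMin := JobStatus{Phase: PhasePending, Running: 2, Failed: 1}
	changed = update(&atMin)
	fmt.Println(changed, atMin.Phase)
}
```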

Copilot · 2026-04-27T01:53:45Z

+			if status.State.Phase != vcbatch.Running && status.State.Phase != "" {
+				t.Errorf("phase should remain unchanged, got %q", status.State.Phase)
+			}


Allowing status.State.Phase to remain "" weakens this assertion: in real usage JobStatus.State.Phase should already be Running, and the updateFn should leave it Running when no transition is needed. Consider setting the initial status.State.Phase = vcbatch.Running in these test cases and then asserting it remains Running when changed==false.

Copilot · 2026-04-27T01:53:45Z

+
+// stubKillJob replaces the KillJob package variable with a stub that records
+// its arguments and returns the given error. It restores the original value
+// via t.Cleanup so parallel tests are not affected.


The comment is misleading: restoring a package-level variable in t.Cleanup prevents cross-test pollution when tests run sequentially, but it does not make the stub safe for parallel tests (t.Parallel), since other tests could observe the stub concurrently. Consider rewording to avoid implying parallel-safety, or explicitly note that tests must not run in parallel while stubbing globals.

Suggested change

-// via t.Cleanup so parallel tests are not affected.
+// via t.Cleanup to avoid leaking the stub into later sequential tests; tests
+// that use this helper must not run in parallel because KillJob is global.

JesseStutler · 2026-04-27T02:26:40Z

This PR appears to contain a lot of AI-generated code, including the description. Please explain your use of AI.

kube-gopher · 2026-04-27T02:36:09Z

@JesseStutler Thanks for pointing this out. I did use AI assistance to help write parts of the code and description. That said, I’ve reviewed and tested the code myself and understand what it does.

kube-gopher · 2026-04-27T02:40:50Z

@JesseStutler After using AI assistance to write code, I will carefully review every detail written by the AI to ensure code quality.


Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.



Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.


kube-gopher · 2026-04-28T03:27:51Z

@JesseStutler I've carefully reviewed the relevant code and found and fixed some minor errors. Could you please take a look and let me know if there are any other areas that could be improved?

kube-gopher added 5 commits April 26, 2026 17:01

test(job/state): add unit tests for util.go

4ed6307

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for factory.go

7806b15

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for finished state

9c91b82

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for aborted state

2dc4811

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for terminating state

887e206

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

volcano-sh-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2026

volcano-sh-bot requested review from JesseStutler and wangyang0616 April 26, 2026 09:19

volcano-sh-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 26, 2026

gemini-code-assist Bot reviewed Apr 26, 2026


kube-gopher added 5 commits April 27, 2026 09:14

test(job/state): add unit tests for aborting state

af0ff32

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for pending state

09d0e7c

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for completing state

8a4629a

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for restarting state

766a041

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

test(job/state): add unit tests for running state

e3e0f3b

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

kube-gopher marked this pull request as ready for review April 27, 2026 01:50

Copilot AI review requested due to automatic review settings April 27, 2026 01:50

volcano-sh-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2026

volcano-sh-bot requested a review from kingeasternsun April 27, 2026 01:50

Copilot started reviewing on behalf of kube-gopher April 27, 2026 01:50

Copilot AI reviewed Apr 27, 2026


test(job/state): fix gofmt formatting and copyright year in state tests

3e402bf

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

kube-gopher force-pushed the test/job-state-controller-unit-tests branch from 702d972 to 3e402bf April 27, 2026 08:22

test(job/state): fix code quality issues in state unit tests

72af77e

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

kube-gopher requested a review from Copilot April 27, 2026 09:30

Copilot started reviewing on behalf of kube-gopher April 27, 2026 09:41

Copilot AI reviewed Apr 27, 2026


kube-gopher requested a review from Copilot April 27, 2026 12:24

Copilot started reviewing on behalf of kube-gopher April 27, 2026 12:24

Copilot AI reviewed Apr 27, 2026


test(job/state): fix code quality issues

75bb6e7

Signed-off-by: kube-gopher <jzlyy68@gmail.com>

kube-gopher requested a review from Copilot April 28, 2026 01:18

Copilot started reviewing on behalf of kube-gopher April 28, 2026 01:18

Copilot AI reviewed Apr 28, 2026


kube-gopher closed this May 11, 2026

kube-gopher deleted the test/job-state-controller-unit-tests branch May 11, 2026 04:17

4 participants

kube-gopher commented Apr 26, 2026 •

edited

Loading