[Bug][CI] Intermittent unit test failures across multiple modules after #18213 split install and verify steps #18220

@leocook

Search before asking

  • I have searched the issues and found no similar issues.

What happened

After #18213 ([Chore] Unit-Test performance optimize) split the CI unit test into separate install and verify steps, multiple modules fail intermittently with missing transitive dependencies. The failures are not reproducible locally.

Each CI run fails in a different module with dependency resolution errors:

| Commit | Failed module | Error |
|---|---|---|
| 756f2f248b | dolphinscheduler-spi | NoClassDefFoundError: com/fasterxml/jackson/core/type/TypeReference |
| 0055001703 | dolphinscheduler-dao | package org.apache.commons.collections4 does not exist |

Runs in between (e.g. 5f11cc2d28, dd8857b84c65, 5c85e15a8b96) pass, which indicates the failure is intermittent.

What you expected to happen

All module unit tests should pass consistently in CI.

How to reproduce

Cannot be reproduced locally. Running the same two-step command locally always succeeds:

./mvnw install -pl dolphinscheduler-spi -am -Dmaven.test.skip=true
./mvnw verify -pl dolphinscheduler-spi
# Passes locally — the transitive dependencies resolve correctly from ~/.m2

The failure only occurs in GitHub Actions, suggesting a CI-specific environmental issue.
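
One way to get closer to the CI conditions locally is to run the same two steps against an empty local repository, so nothing is served from a warm ~/.m2. The sketch below is an assumption-laden approximation, not a confirmed reproduction: `run_two_step` and the `MVNW` override are hypothetical names introduced here, and an empty repository only mimics a cold cache, not a partially restored one. `-Dmaven.repo.local` is the standard Maven property for pointing a build at a different local repository.

```shell
# Hypothetical helper: run the two CI steps against a throwaway local repository.
# MVNW can be overridden (e.g. MVNW=echo) to print the commands instead of running them.
run_two_step() {
  local module="$1"
  local repo="$2"
  local mvn="${MVNW:-./mvnw}"

  # Step 1: install the module and its upstream modules, skipping tests.
  "$mvn" install -pl "$module" -am -Dmaven.test.skip=true -Dmaven.repo.local="$repo" &&
  # Step 2: run the module's tests without -am, as the CI does after #18213.
  "$mvn" verify -pl "$module" -Dmaven.repo.local="$repo"
}

# Usage (downloads every dependency again, so it is slow):
#   run_two_step dolphinscheduler-spi "$(mktemp -d)"
```

If this passes as well, the local repository being empty is probably not enough to trigger the failure, which would point back at the restored cache contents on the runner.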

Anything else

CI change introduced by #18213:

Before #18213, the CI ran tests in a single step with -am:

./mvnw verify -pl "$module" -am

After #18213, it was split into two steps:

# Step 1: install dependencies
./mvnw install -pl "$module" -am -Dmaven.test.skip=true
# Step 2: run tests (without -am)
./mvnw verify -pl "$module"

Analysis:

  • The install -am step succeeds and installs all upstream modules (jar + pom) into ~/.m2/repository.
  • The verify step (without -am) should resolve transitive dependencies from the local repository, and on a developer machine it does.
  • In CI, actions/cache@v5 restores a Maven cache keyed by hashFiles('**/pom.xml'). When the cache is stale or partially restored (e.g. due to a cache key change from pom.xml modifications), the verify step may fail to resolve transitive dependencies that were not freshly downloaded.
  • The intermittent nature (different modules fail on different runs, with successes in between) strongly suggests the issue is related to Maven cache state inconsistency on CI runners, not a deterministic build configuration bug.
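
To test the cache-state hypothesis above, a diagnostic step could be run between install and verify to report whether the artifacts named in the errors are actually present in the local repository. The following is a minimal sketch: `check_artifact` is a hypothetical helper, and treating a leftover `*.lastUpdated` marker (which Maven writes after a failed download) or a version directory without a jar as "incomplete" is a heuristic, not a definitive check.

```shell
# check_artifact REPO GROUP_ID ARTIFACT_ID
# Prints one of: missing | incomplete | ok
check_artifact() {
  local repo="$1" group="$2" artifact="$3"
  # Maven stores artifacts under <repo>/<groupId with dots as slashes>/<artifactId>/<version>/
  local dir="$repo/${group//.//}/$artifact"

  if [ ! -d "$dir" ]; then
    echo "missing"
    return 0
  fi
  # A *.lastUpdated file marks a download attempt that did not complete.
  if [ -n "$(find "$dir" -name '*.lastUpdated' | head -n 1)" ]; then
    echo "incomplete"
    return 0
  fi
  # Expect at least one jar for the artifact in some version directory.
  if [ -n "$(find "$dir" -name "$artifact-*.jar" | head -n 1)" ]; then
    echo "ok"
  else
    echo "incomplete"
  fi
}

# Usage, for the two classes/packages named in the failures:
#   check_artifact "$HOME/.m2/repository" com.fasterxml.jackson.core jackson-core
#   check_artifact "$HOME/.m2/repository" org.apache.commons commons-collections4
```

Logging this output on failing and passing runs would show whether the failures correlate with the state of the restored cache.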

Affected CI runs:

Suggested fix:

  • Restore -am in the verify step to ensure Maven resolves the full dependency graph in reactor mode, independent of cache state, or
  • Add a cache invalidation / cleanup step before the verify phase
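
For the second option, a cleanup step could look roughly like the sketch below. It is untested against the actual failure: `clean_local_repo` is a hypothetical helper, and the `org/apache/dolphinscheduler` path assumes the project's groupId is `org.apache.dolphinscheduler`. The idea is to drop failed-download markers and any previously cached project artifacts so the install step repopulates them on every run.

```shell
# clean_local_repo [REPO]
# Removes failed-download markers and cached project artifacts from a Maven local repository.
clean_local_repo() {
  local repo="${1:-$HOME/.m2/repository}"

  # Nothing to clean if the cache was not restored at all.
  [ -d "$repo" ] || return 0

  # Remove markers left by incomplete downloads so Maven retries them.
  find "$repo" -type f -name '*.lastUpdated' -delete

  # Remove the project's own artifacts (groupId path is an assumption)
  # so the install step always writes fresh copies.
  rm -rf "$repo/org/apache/dolphinscheduler"
}

# Usage in CI, after the cache restore and before the install step:
#   clean_local_repo
```

Restoring -am in the verify step is the simpler change, but it also runs the tests of every upstream module again, which is the cost #18213 set out to remove; a cleanup step keeps the split while making the local repository state more predictable.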

Version

dev

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
