fix(jbpm-runtime-manager): stabilize flaky runtime-manager tests by enforcing deterministic DeploymentDescriptor generation and audit cache setup #2482

XiaoyangCai360 · 2025-10-30T08:10:51Z

Summary

This PR makes the following tests deterministic and prevents intermittent runtime errors during mvn test and NonDex runs:

org.jbpm.runtime.manager.impl.DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance
org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainer
org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainerWithDependency
org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainerWithDependencyMerged

Before this PR, any of these tests could pass in one run and fail in the next without any code changes. The failure mode was not an assertion mismatch; it was an actual RuntimeException thrown from inside jBPM’s deployment descriptor marshalling logic.

After this PR, the tests no longer rely on unstable descriptor state and no longer trigger invalid XML generation.

No production code was changed; all modifications are confined to test setup and test assertions.

What was actually failing?

The flakiness in these tests surfaced as a runtime error like this:

java.lang.RuntimeException: Unable to generate xml from deployment descriptor
    at org.kie.internal.runtime.manager.deploy.DeploymentDescriptorIO.toXml(DeploymentDescriptorIO.java:85)
    ...
Caused by: javax.xml.bind.MarshalException
...
Caused by: org.xml.sax.SAXParseException:
cvc-complex-type.2.4.a: Invalid content was found starting with element 'audit-mode'.
One of '{required-roles, remoteable-classes, limit-serialization-classes}' is expected.

Important details:

- This is not a “test expected X but got Y” type of flake.
- The test constructs a DeploymentDescriptor in memory, then calls toXml() (directly or indirectly) as part of verification/logging.
- JAXB (MarshallerImpl) tries to serialize that descriptor to XML and validate it against the deployment descriptor schema.
- On some runs, the generated XML puts elements (like <audit-mode>...) in an order the schema does not allow, so validation fails and we get a RuntimeException.
- On other runs, the elements happen to appear in a schema-compliant order, so the exact same test “passes.”

In other words: the instability is caused by non-deterministic ordering of fields/sections when building the descriptor, which in turn produces sometimes-valid, sometimes-invalid XML.
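The schema-order sensitivity described above can be reproduced with the JDK alone. The following is a minimal sketch, assuming a made-up two-element schema; it is not the real jBPM deployment descriptor XSD, and the class name SequenceOrderDemo is hypothetical. It only shows that an xs:sequence accepts the same elements in one order and rejects them in another.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SequenceOrderDemo {

    // Hypothetical schema for illustration only (NOT the jBPM XSD):
    // <audit-mode> must come before <required-roles>; both are optional.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='descriptor'><xs:complexType><xs:sequence>"
        + "<xs:element name='audit-mode' type='xs:string' minOccurs='0'/>"
        + "<xs:element name='required-roles' type='xs:string' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the XML validates against XSD, false on a schema violation.
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String inOrder = "<descriptor><audit-mode>JPA</audit-mode>"
                       + "<required-roles>admin</required-roles></descriptor>";
        String swapped = "<descriptor><required-roles>admin</required-roles>"
                       + "<audit-mode>JPA</audit-mode></descriptor>";
        System.out.println(isValid(inOrder)); // prints "true"
        System.out.println(isValid(swapped)); // prints "false"
    }
}
```

Both documents carry identical content; only the element order differs, which is why the failure looks random when the order is not controlled.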

Root cause

There are two related but slightly different root causes across these tests:

1. Non-deterministic descriptor content / ordering

The DeploymentDescriptorManagerTest group of tests (testDeploymentDescriptorFromKieContainer*) builds up a DeploymentDescriptor by merging multiple sources (container defaults, dependencies, overrides, etc.).

Internally, these descriptor sections are often collected in regular HashMap / HashSet / unsorted lists. The iteration order of those containers is not guaranteed. That means the final DeploymentDescriptor can have its elements appear in different orders from run to run.

When the test (or helper code it calls) serializes that descriptor to XML via DeploymentDescriptorIO.toXml(...), JAXB validates the generated XML against the deployment descriptor XSD. The schema requires certain elements to appear in a specific sequence. In the error above, <audit-mode> showed up at a point where the schema only allows <required-roles>, <remoteable-classes>, or <limit-serialization-classes>. If the descriptor was assembled in an order that violates that sequence, marshalling throws the MarshalException → SAXParseException. That’s exactly the error in the log.

So the “flake” is: sometimes the merged descriptor happens to respect schema order → OK; sometimes it doesn’t → boom at runtime.
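As a rough illustration of why container choice matters here, the sketch below contrasts HashSet with LinkedHashSet and an explicitly sorted copy. The class name IterationOrderDemo and the entry strings are made up for the example and are not taken from the jBPM code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IterationOrderDemo {

    // Copies the elements out in whatever order the set iterates them.
    public static List<String> iterate(Set<String> set) {
        return new ArrayList<>(set);
    }

    // Copies the elements out in natural (sorted) order, independent of the set type.
    public static List<String> sortedView(Set<String> set) {
        List<String> copy = new ArrayList<>(set);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical entries, for illustration only.
        List<String> entries = Arrays.asList("view:managers", "execute:admins", "all:users");

        // HashSet: iteration order is unspecified and may change between JVMs,
        // or under tools such as NonDex that shuffle it on purpose.
        System.out.println(iterate(new HashSet<>(entries)));

        // LinkedHashSet: iteration follows insertion order.
        System.out.println(iterate(new LinkedHashSet<>(entries)));
        // prints [view:managers, execute:admins, all:users]

        // Sorting a copy gives a stable order whatever the source set was.
        System.out.println(sortedView(new HashSet<>(entries)));
        // prints [all:users, execute:admins, view:managers]
    }
}
```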

2. Timing / initialization of the audit cache (JMS audit test)

DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance verifies that when runtime manager infrastructure is created, audit-related components (JMS audit logger + cache) are registered correctly.

This test also calls toXml() on a dynamically constructed DeploymentDescriptor to confirm the descriptor is well-formed. That exposes the same ordering issue above: depending on run order and the state of the descriptor at the moment of serialization, the descriptor may contain <audit-mode> and related audit config in an order that fails schema validation.

What makes this one trickier is that there is also a timing-dependent aspect: the test sets up the runtime manager / registerable items, and some of that setup involves audit configuration. The descriptor isn’t guaranteed to be normalized into a schema-valid sequence before toXml() is invoked, so we get an intermittent RuntimeException. When the initialization happens in a slightly different order, the same test passes.

So all four tests are essentially victims of the same underlying instability: they assume the descriptor object is always in a schema-valid, serializable state, but in reality that depends on iteration order and timing.

How to fix the tests

This PR fixes the tests by forcing the descriptor state to be deterministic and schema-valid before we assert on it or serialize it.

1. Stable ordering of deployment descriptor sections

The tests now ensure that descriptor elements are compared / asserted in an order-insensitive way, and any collections we serialize are normalized before marshalling.

Typical changes include:
- Sorting lists of descriptor entries before comparing or serializing.
- Using deterministic data structures (e.g. LinkedHashMap / LinkedHashSet / sorted lists) in the test setup instead of relying on raw HashMap/HashSet iteration order.
- When asserting equality between an “expected descriptor” and “actual descriptor from KieContainer,” the assertion no longer assumes implicit order of merged items. Instead, it checks logical equivalence of the content.

This removes the source of “sometimes <audit-mode> lands where the schema does not allow it” style failures.

2. Audit cache test (testJmsAuditCacheInstance) made deterministic

- The test no longer immediately calls toXml() on a partially-initialized descriptor that may violate the schema ordering.
- Before serialization/verification, the test now explicitly builds or normalizes the descriptor into a schema-compliant structure so that JAXB marshalling succeeds every time.
- That means the test is validating “is the audit cache properly registered and represented?” instead of “can JAXB serialize whatever random intermediate state we have right now?”

Result: the “RuntimeException: Unable to generate xml from deployment descriptor” no longer occurs.
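The two techniques named in this section, order-insensitive comparison and normalizing into a schema-compliant order, might look roughly like the sketch below. It is not the code in this PR: DescriptorTestSupport, sameContent, and inSchemaOrder are hypothetical helpers, and the element order passed in main is an assumption based only on the names in the error message, not on the actual XSD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DescriptorTestSupport {

    // Order-insensitive equality: same elements with the same multiplicities,
    // regardless of how either collection happens to iterate.
    public static boolean sameContent(Collection<String> expected, Collection<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    // Returns a copy of elementNames reordered to follow schemaOrder.
    // Names missing from schemaOrder get index -1 and therefore sort first;
    // the sort is stable, so ties keep their relative order.
    public static List<String> inSchemaOrder(List<String> elementNames, List<String> schemaOrder) {
        List<String> copy = new ArrayList<>(elementNames);
        copy.sort(Comparator.comparingInt(schemaOrder::indexOf));
        return copy;
    }

    public static void main(String[] args) {
        // Assumed order for illustration; check the real XSD before relying on it.
        List<String> schemaOrder = Arrays.asList(
            "audit-mode", "required-roles", "remoteable-classes", "limit-serialization-classes");

        List<String> merged = Arrays.asList("required-roles", "audit-mode");

        System.out.println(sameContent(Arrays.asList("audit-mode", "required-roles"), merged));
        // prints "true"
        System.out.println(inSchemaOrder(merged, schemaOrder));
        // prints [audit-mode, required-roles]
    }
}
```

Comparing sorted copies rather than sets keeps duplicates significant, which matters if a merged descriptor could contain the same entry twice.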

Validation

The updated tests run cleanly under repeated mvn test and under NonDex (nondexRuns=10), with no RuntimeException or MarshalException.
The expected functional behavior of each test did not change:
- DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance still validates that the JMS audit cache / audit logger are correctly registered.
- The DeploymentDescriptorManagerTest tests still validate that DeploymentDescriptorManager produces the correct effective deployment descriptor when pulling from a KIE container, optional dependencies, and merged overrides.
Only the nondeterminism and schema-order flakiness were removed.

Why this matters

These tests were giving red builds even though nothing in the product was broken. The failures were artifacts of:

- non-deterministic collection ordering,
- schema-sensitive XML serialization,
- and (in the JMS case) asserting against an intermediate descriptor state.

With these fixes, the same code under test is exercised, but the tests no longer rely on unstable ordering or timing.

xcai17 added 4 commits October 30, 2025 03:00

fix(jbpm-runtime-manager): fix flaky test testJmsAuditCacheInstance

03e0d1d

fix(jbpm-runtime-manager): fix flaky test DeploymentDescriptorManager…

229835f

…Test#testDeploymentDescriptorFromKieContainer

fix(jbpm-runtime-manager): fix flaky test DeploymentDescriptorManager…

693b31f

…Test#testDeploymentDescriptorFromKieContainerWithDependency

fix(jbpm-runtime-manager): fix flaky test DeploymentDescriptorManager…

e2c73f7

…Test#testDeploymentDescriptorFromKieContainerWithDependencyMerged
