fix(jbpm-runtime-manager): stabilize flaky runtime-manager tests by enforcing deterministic DeploymentDescriptor generation and audit cache setup #2482
+117
−32
Summary
This PR makes the following tests deterministic and prevents intermittent runtime errors during mvn test and NonDex runs:
- org.jbpm.runtime.manager.impl.DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance
- org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainer
- org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainerWithDependency
- org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainerWithDependencyMerged
Before this PR, all of these tests could pass in one run and then fail in the next run without any code changes. The failure mode was not an assertion mismatch; it was an actual RuntimeException coming from inside jBPM’s deployment descriptor marshalling logic.
After this PR, the tests no longer rely on unstable descriptor state and no longer trigger invalid XML generation.
No production code was changed; all modifications are confined to test setup and test assertions.
What was actually failing?
The flakiness in these tests surfaced as a runtime error: RuntimeException: Unable to generate xml from deployment descriptor.
Important details:
- The tests build a DeploymentDescriptor in memory, then call toXml() (directly or indirectly) as part of verification/logging.
- The marshaller (MarshallerImpl) tries to serialize that descriptor to XML and validate it against the deployment descriptor schema.
- The generated XML sometimes contains elements (for example <audit-mode>...) in an order the schema does not allow, so validation fails and we get a RuntimeException.
In other words: the instability is caused by non-deterministic ordering of fields/sections when building the descriptor, which in turn produces sometimes-valid, sometimes-invalid XML.
Root cause
There are two related but slightly different root causes across these tests:
1. Non-deterministic descriptor content / ordering
The DeploymentDescriptorManagerTest group of tests (testDeploymentDescriptorFromKieContainer*) builds up a DeploymentDescriptor by merging multiple sources (container defaults, dependencies, overrides, etc.).
Internally, these descriptor sections are often collected in regular HashMap / HashSet / unsorted lists. The iteration order of the hash-based containers is not guaranteed, and an unsorted list simply reflects whatever order its elements arrived in. That means the final DeploymentDescriptor can have its elements appear in different orders from run to run.
When the test (or helper code it calls) serializes that descriptor to XML via DeploymentDescriptorIO.toXml(...), JAXB then validates the generated XML against the deployment descriptor XSD. The schema requires certain elements to appear in a specific sequence (for example, <required-roles> must appear before <audit-mode>). If the descriptor was assembled in an order that violates that sequence, marshalling throws a MarshalException caused by a SAXParseException. That’s exactly the error in the log.
So the “flake” is: sometimes the merged descriptor happens to respect schema order → OK; sometimes it doesn’t → boom at runtime.
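To make the schema-order failure concrete, here is a minimal sketch using only the JDK's javax.xml.validation API. The XSD is a hypothetical two-element stand-in, not the real jBPM deployment descriptor schema, and the element order simply follows the example given above; the class name SequenceOrderDemo is likewise an assumption for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SequenceOrderDemo {
    // Hypothetical stand-in schema (NOT the real jBPM XSD): an xs:sequence
    // that requires <required-roles> to come before <audit-mode>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='deployment-descriptor'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='required-roles' type='xs:string'/>"
      + "<xs:element name='audit-mode' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element></xs:schema>";

    /** Returns true if the XML validates against XSD, false on any validation error. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers SAXParseException; a malformed XSD would also land here.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String inOrder = "<deployment-descriptor>"
            + "<required-roles>admin</required-roles><audit-mode>JPA</audit-mode>"
            + "</deployment-descriptor>";
        String swapped = "<deployment-descriptor>"
            + "<audit-mode>JPA</audit-mode><required-roles>admin</required-roles>"
            + "</deployment-descriptor>";
        System.out.println(isValid(inOrder));  // same content, schema order
        System.out.println(isValid(swapped));  // same content, wrong order
    }
}
```

Both documents carry identical data; only the element order differs, which is why the failure looks random when the order comes from an unordered source.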
2. Timing / initialization of the audit cache (JMS audit test)
DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance verifies that when runtime manager infrastructure is created, audit-related components (JMS audit logger + cache) are registered correctly.
This test also calls toXml() on a dynamically constructed DeploymentDescriptor to confirm the descriptor is well-formed. That exposes the same ordering issue above: depending on run order and the state of the descriptor at the moment of serialization, the descriptor may contain <audit-mode> and related audit config in an order that fails schema validation.
What makes this one trickier is that there’s also a race-y aspect: the test sets up runtime manager / registerable items, and some of that setup involves audit configuration. The descriptor isn’t guaranteed to be normalized into a schema-valid sequence before toXml() is invoked, so we get an intermittent RuntimeException. When the initialization happens in a slightly different order, the same test passes.
So all four tests are essentially victims of the same underlying instability: they assume the descriptor object is always in a schema-valid, serializable state, but in reality that depends on iteration order and timing.
How to fix the tests
This PR fixes the tests by forcing the descriptor state to be deterministic and schema-valid before we assert on it or serialize it.
Stable ordering of deployment descriptor sections
The tests now ensure that descriptor elements are compared / asserted in an order-insensitive way, and any collections we serialize are normalized before marshalling.
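As an illustration of what order-insensitive assertions and normalized collections can look like, here is a small sketch using only java.util. The class and method names (OrderInsensitiveChecks, sameElements, normalized) are hypothetical and are not taken from the PR's diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderInsensitiveChecks {
    /** Compares two collections by content only, ignoring iteration order (and duplicates). */
    static boolean sameElements(Collection<String> expected, Collection<String> actual) {
        return new HashSet<>(expected).equals(new HashSet<>(actual));
    }

    /** Returns a sorted copy, so anything serialized or printed afterwards has one stable order. */
    static List<String> normalized(Collection<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order, unlike a plain HashSet.
        Set<String> roles = new LinkedHashSet<>(Arrays.asList("manager", "admin"));
        System.out.println(new ArrayList<>(roles));   // insertion order: [manager, admin]
        System.out.println(normalized(roles));        // sorted order: [admin, manager]
        System.out.println(sameElements(Arrays.asList("admin", "manager"), roles)); // true
    }
}
```

Comparing as sets is appropriate only when order and multiplicity are not part of what the test is meant to verify; where order matters, asserting on the normalized list keeps the check strict but stable.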
Typical changes include:
- Using ordered collections (LinkedHashMap / LinkedHashSet / sorted lists) in the test setup instead of relying on raw HashMap / HashSet iteration order.
This removes the source of “sometimes <audit-mode> comes before <required-roles>” style failures.
Audit cache test (testJmsAuditCacheInstance) made deterministic
- The test no longer calls toXml() on a partially-initialized descriptor that may violate the schema ordering.
Result: the RuntimeException: Unable to generate xml from deployment descriptor no longer occurs.
Validation
The updated tests run cleanly under repeated mvn test and under NonDex (nondexRuns=10), with no RuntimeException or MarshalException.
The expected functional behavior of each test did not change:
- DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance still validates that the JMS audit cache / audit logger are correctly registered.
- The DeploymentDescriptorManagerTest tests still validate that DeploymentDescriptorManager produces the correct effective deployment descriptor when pulling from a KIE container, optional dependencies, and merged overrides.
Only the nondeterminism and schema-order flakiness were removed.
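The idea behind validating under repeated, reordered runs can be sketched in miniature: shuffle the input order with several seeds and check that the normalized result never changes. This is only an analogy for what a NonDex-style run exercises, not how NonDex itself works internally; the class ShuffledRunsSketch and the two-element SCHEMA_ORDER list are hypothetical, with the order taken from the example earlier in this description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class ShuffledRunsSketch {
    // Hypothetical schema order for this sketch only (not read from the real XSD).
    static final List<String> SCHEMA_ORDER = Arrays.asList("required-roles", "audit-mode");

    /** Returns a copy sorted by position in SCHEMA_ORDER; unknown names (index -1) sort first. */
    static List<String> inSchemaOrder(List<String> elements) {
        List<String> copy = new ArrayList<>(elements);
        copy.sort(Comparator.comparingInt((String e) -> SCHEMA_ORDER.indexOf(e)));
        return copy;
    }

    /** Shuffles the elements once per seed and checks the normalized order is always the same. */
    static boolean stableAcrossRuns(int runs) {
        for (int seed = 0; seed < runs; seed++) {
            List<String> shuffled = new ArrayList<>(SCHEMA_ORDER);
            Collections.shuffle(shuffled, new Random(seed));
            if (!inSchemaOrder(shuffled).equals(SCHEMA_ORDER)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stableAcrossRuns(10)); // true: normalization hides the input order
    }
}
```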
Why this matters
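These tests were giving red builds even though nothing in the product was broken. The failures were artifacts of non-deterministic descriptor ordering and of initialization timing in the test setup, not of product defects.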
With these fixes, the same code under test is exercised, but the tests no longer rely on unstable ordering or timing.