@XiaoyangCai360
Summary

This PR makes the following tests deterministic and prevents intermittent runtime errors during mvn test and NonDex runs:

  • org.jbpm.runtime.manager.impl.DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance
  • org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainer
  • org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainerWithDependency
  • org.jbpm.runtime.manager.impl.deploy.DeploymentDescriptorManagerTest.testDeploymentDescriptorFromKieContainerWithDependencyMerged

Before this PR, any of these tests could pass in one run and fail in the next without any code changes. The failure was not an assertion mismatch; it was a RuntimeException thrown from inside jBPM’s deployment descriptor marshalling logic.

After this PR, the tests no longer rely on unstable descriptor state and no longer trigger invalid XML generation.

No production code was changed; all modifications are confined to test setup and test assertions.


What was actually failing?

The flakiness in these tests surfaced as a runtime error like this:

java.lang.RuntimeException: Unable to generate xml from deployment descriptor
    at org.kie.internal.runtime.manager.deploy.DeploymentDescriptorIO.toXml(DeploymentDescriptorIO.java:85)
    ...
Caused by: javax.xml.bind.MarshalException
...
Caused by: org.xml.sax.SAXParseException:
cvc-complex-type.2.4.a: Invalid content was found starting with element 'audit-mode'.
One of '{required-roles, remoteable-classes, limit-serialization-classes}' is expected.

Important details:

  • This is not a “test expected X but got Y” type of flake.
  • The test is constructing a DeploymentDescriptor in memory, then calling toXml() (directly or indirectly) as part of verification/logging.
  • JAXB (MarshallerImpl) tries to serialize that descriptor to XML and validate it against the deployment descriptor schema.
  • On some runs, the generated XML puts elements (like <audit-mode>...) in an order the schema does not allow, so validation fails and we get a RuntimeException.
  • On other runs, the elements happen to appear in a schema-compliant order, so the exact same test “passes.”

In other words: the instability is caused by non-deterministic ordering of fields/sections when building the descriptor, which in turn produces sometimes-valid, sometimes-invalid XML.
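
To illustrate why element order alone can turn valid XML into invalid XML, here is a minimal sketch using the JDK's javax.xml.validation API. The schema is a toy one invented for this example (element names descriptor, first, second); it is not the real deployment descriptor XSD, and the class name SchemaOrderDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaOrderDemo {
    // Toy schema (assumption, NOT the jBPM XSD): <first> must come before <second>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='descriptor'><xs:complexType><xs:sequence>"
      + "<xs:element name='first' type='xs:string' minOccurs='0'/>"
      + "<xs:element name='second' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns true if the XML validates against the toy schema, false if validation rejects it. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Toy schema failed to compile", e);
        }
        try {
            // The default error handler throws a SAXParseException on the first validation error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same two elements, same content; only the order differs.
        System.out.println(isValid("<descriptor><first>a</first><second>b</second></descriptor>"));
        System.out.println(isValid("<descriptor><second>b</second><first>a</first></descriptor>"));
    }
}
```

The two documents carry identical data, yet only the first satisfies the xs:sequence constraint. That is the same class of failure as the cvc-complex-type error in the log.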


Root cause

There are two related but slightly different root causes across these tests:

1. Non-deterministic descriptor content / ordering

The DeploymentDescriptorManagerTest group of tests (testDeploymentDescriptorFromKieContainer*) builds up a DeploymentDescriptor by merging multiple sources (container defaults, dependencies, overrides, etc.).

Internally, these descriptor sections are often collected in plain HashMap / HashSet instances, or in lists populated from them. The iteration order of hash-based collections is unspecified (and NonDex deliberately varies it), so the final DeploymentDescriptor can have its elements appear in different orders from run to run.
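
As a rough illustration of the collection behavior involved, the sketch below contrasts HashSet with the ordered alternatives from java.util. The class name IterationOrderDemo and the sample strings are made up for this example; they are not taken from the jBPM code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

final class IterationOrderDemo {
    /** Iterates in insertion order (first occurrence of each element). */
    static List<String> insertionOrdered(List<String> names) {
        return new ArrayList<>(new LinkedHashSet<>(names));
    }

    /** Iterates in natural (lexicographic) order. */
    static List<String> sorted(List<String> names) {
        return new ArrayList<>(new TreeSet<>(names));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("required-roles", "audit-mode", "remoteable-classes");
        // HashSet makes no promise about order; tools like NonDex shuffle it on purpose.
        System.out.println("HashSet (unspecified): " + new HashSet<>(names));
        System.out.println("LinkedHashSet:         " + insertionOrdered(names));
        System.out.println("TreeSet:               " + sorted(names));
    }
}
```

Code that must produce the same sequence on every run should depend only on the last two behaviors, never on what a HashSet happens to return on a given JVM.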

When the test (or helper code it calls) serializes that descriptor to XML via DeploymentDescriptorIO.toXml(...), JAXB validates the generated XML against the deployment descriptor XSD. The schema requires elements to appear in a fixed sequence. In the log above, for example, <audit-mode> was emitted at a point where the schema only allows <required-roles>, <remoteable-classes>, or <limit-serialization-classes>. If the descriptor was assembled in an order that violates that sequence, marshalling throws a MarshalException caused by a SAXParseException, which toXml() surfaces as a RuntimeException. That is exactly the error in the log.

So the “flake” is: sometimes the merged descriptor happens to respect schema order → OK; sometimes it doesn’t → boom at runtime.

2. Timing / initialization of the audit cache (JMS audit test)

DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance verifies that when runtime manager infrastructure is created, audit-related components (JMS audit logger + cache) are registered correctly.

This test also calls toXml() on a dynamically constructed DeploymentDescriptor to confirm the descriptor is well-formed. That exposes the same ordering issue above: depending on run order and the state of the descriptor at the moment of serialization, the descriptor may contain <audit-mode> and related audit config in an order that fails schema validation.

What makes this one trickier is that there is also a timing-dependent aspect: the test sets up the runtime manager and registerable items, and some of that setup involves audit configuration. The descriptor is not guaranteed to be normalized into a schema-valid sequence before toXml() is invoked, so the test fails intermittently with a RuntimeException. When initialization happens in a slightly different order, the same test passes.

So all four tests are essentially victims of the same underlying instability: they assume the descriptor object is always in a schema-valid, serializable state, but in reality that depends on iteration order and timing.


How the tests were fixed

This PR fixes the tests by forcing the descriptor state to be deterministic and schema-valid before we assert on it or serialize it.

  1. Stable ordering of deployment descriptor sections
    The tests now ensure that descriptor elements are compared / asserted in an order-insensitive way, and any collections we serialize are normalized before marshalling.

    Typical changes include:

    • Sorting lists of descriptor entries before comparing or serializing.
    • Using deterministic data structures (e.g. LinkedHashMap / LinkedHashSet / sorted lists) in the test setup instead of relying on raw HashMap/HashSet iteration order.
    • When asserting equality between an “expected descriptor” and “actual descriptor from KieContainer,” the assertion no longer assumes implicit order of merged items. Instead, it checks logical equivalence of the content.

    This removes the source of the “<audit-mode> appears where the schema does not allow it” style of failure.

  2. Audit cache test (testJmsAuditCacheInstance) made deterministic

    • The test no longer immediately calls toXml() on a partially-initialized descriptor that may violate the schema ordering.
    • Before serialization/verification, the test now explicitly builds or normalizes the descriptor into a schema-compliant structure so that JAXB marshalling succeeds every time.
    • That means the test is validating “is the audit cache properly registered and represented?” instead of “can JAXB serialize whatever random intermediate state we have right now?”

    Result: the RuntimeException: Unable to generate xml from deployment descriptor no longer occurs.
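
The two techniques described above, order-insensitive comparison and normalizing into a fixed order before serialization, could look roughly like the sketch below. This is not the code from the PR: the class DescriptorTestHelpers, both method names, and TOY_SCHEMA_ORDER are hypothetical, and the element order is a placeholder rather than the real deployment descriptor sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;

final class DescriptorTestHelpers {
    // Placeholder order for illustration only; NOT the real schema sequence.
    static final List<String> TOY_SCHEMA_ORDER = Arrays.asList("first", "second", "third");

    /** True if both collections hold the same elements (duplicates counted), ignoring order. */
    static <T extends Comparable<? super T>> boolean sameElements(
            Collection<T> expected, Collection<T> actual) {
        List<T> e = new ArrayList<>(expected);
        List<T> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    /** Returns the element names rearranged to follow TOY_SCHEMA_ORDER. */
    static List<String> inSchemaOrder(Collection<String> elementNames) {
        List<String> result = new ArrayList<>(elementNames);
        for (String name : result) {
            if (!TOY_SCHEMA_ORDER.contains(name)) {
                throw new IllegalArgumentException("Unknown element: " + name);
            }
        }
        // Sort by each name's position in the declared order.
        result.sort(Comparator.comparingInt(TOY_SCHEMA_ORDER::indexOf));
        return result;
    }

    public static void main(String[] args) {
        // The HashSet stands in for a merged descriptor whose order is unspecified.
        Collection<String> merged = new HashSet<>(Arrays.asList("third", "first", "second"));
        System.out.println(sameElements(Arrays.asList("first", "second", "third"), merged));
        System.out.println(inSchemaOrder(merged));
    }
}
```

Comparing sorted copies keeps the assertion strict about content (including duplicates) while ignoring whatever order the merge produced.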


Validation

  • The updated tests run cleanly under repeated mvn test and under NonDex (nondexRuns=10), with no RuntimeException or MarshalException.

  • The expected functional behavior of each test did not change:

    • DefaultRegisterableItemsFactoryTest.testJmsAuditCacheInstance still validates that the JMS audit cache / audit logger are correctly registered.
    • The DeploymentDescriptorManagerTest tests still validate that DeploymentDescriptorManager produces the correct effective deployment descriptor when pulling from a KIE container, optional dependencies, and merged overrides.
  • Only the nondeterminism and schema-order flakiness were removed.


Why this matters

These tests were giving red builds even though nothing in the product was broken. The failures were artifacts of:

  • non-deterministic collection ordering,
  • schema-sensitive XML serialization,
  • and (in the JMS case) asserting against an intermediate descriptor state.

With these fixes, the same code under test is exercised, but the tests no longer rely on unstable ordering or timing.

xcai17 added 4 commits October 30, 2025 03:00