Skip to content

fix(server): keep schema ~create_time userdata as Date after reload#3026

Open
dpol1 wants to merge 5 commits into
apache:masterfrom
dpol1:fix/3013-create-time-userdata-roundtrip
Open

fix(server): keep schema ~create_time userdata as Date after reload#3026
dpol1 wants to merge 5 commits into
apache:masterfrom
dpol1:fix/3013-create-time-userdata-roundtrip

Conversation

@dpol1
Copy link
Copy Markdown
Contributor

@dpol1 dpol1 commented May 14, 2026

  • close [Bug] schema ~create_time userdata round-trips Date as String after cache reload #3013

    Userdata.CREATE_TIME ("~create_time") is written as java.util.Date
    by SchemaTransaction.setCreateTimeIfNeeded(), then persisted through backend
    serializers as JSON. On reload, userdata is deserialized through raw
    Map.class paths, so Jackson cannot restore the value type and
    schemaElement.userdata().get(Userdata.CREATE_TIME) can come back as String.

    That breaks callers and existing tests that expect ~create_time to keep the
    server-side Date contract after a schema is reloaded from backend storage.

    Main Changes

    • Added a single SchemaElement.normalizeUserdataValue(key, value) helper.

      • Converts Userdata.CREATE_TIME string values back with DateUtil.parse(...).
      • Leaves existing Date values and all other userdata keys unchanged.
      • Throws a clearer IllegalArgumentException if a ~create_time string is invalid.
    • Routed both userdata setters through the helper:

      • userdata(String, Object) covers serializer reload paths such as
        TextSerializer, BinarySerializer, MysqlSerializer, and
        CassandraSerializer.
      • userdata(Userdata) covers schema fromMap() hydration paths for
        PropertyKey, VertexLabel, EdgeLabel, and IndexLabel.
    • Added a null check to userdata(Userdata) for consistent argument validation.

    • Added regression coverage for schema userdata normalization and serializer
      round trips.

    This PR is intentionally limited to the server-side SchemaElement. The mirror
    class in hugegraph-struct has a similar userdata shape and can be handled in a
    separate follow-up PR.

    Verifying these changes

    • Trivial rework / code cleanup without any test coverage. (No Need)
    • Already covered by existing tests, such as (please modify tests here).
    • Need tests and can be verified as follows:
      • SchemaElementTest covers single-key normalization, bulk normalization,
        idempotency for Date, invalid ~create_time strings, null bulk userdata,
        and non-CREATE_TIME keys.
      • TextSerializerTest#testPropertyKeyUserdataCreateTimeRoundTripsAsDate
        verifies the text serializer schema round trip keeps CREATE_TIME as Date.
      • BinarySerializerTest#testPropertyKeyUserdataCreateTimeRoundTripsAsDate
        verifies the binary serializer schema round trip keeps CREATE_TIME as Date.
      • The new test class is registered in UnitTestSuite.

    Does this PR potentially affect the following parts?

    The public SchemaElement.userdata(...) signatures are unchanged. The behavior
    change is limited to internal userdata normalization for Userdata.CREATE_TIME.

    Documentation Status

    • Doc - TODO
    • Doc - Done
    • Doc - No Need

Notes

  • This PR fixes the server-side schema reload path for Userdata.CREATE_TIME (~create_time) only.
  • ~create_time is a server-reserved userdata key; valid string values are normalized back to java.util.Date, while malformed values now fail with IllegalArgumentException.
  • Follow-up: [Bug] Preserve typed DEFAULT_VALUE and struct userdata after JSON reload #3028 tracks preserving typed Userdata.DEFAULT_VALUE (~default_value) reloads and checking the same raw userdata deserialization gap in hugegraph-struct.

Normalize server-side schema ~create_time userdata in SchemaElement so serializer reloads and fromMap paths keep the Date contract.

Add SchemaElement, TextSerializer, and BinarySerializer coverage.

Closes apache#3013
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working tests Add or improve test cases labels May 14, 2026
@imbajin imbajin requested a review from Copilot May 14, 2026 14:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a schema reload regression where internal schema userdata Userdata.CREATE_TIME ("~create_time") can round-trip through backend JSON serializers as a String and come back as String after reload, violating the server-side contract that it should be a java.util.Date.

Changes:

  • Added SchemaElement.normalizeUserdataValue(key, value) to re-type ~create_time string values back into Date (and throw a clearer error on invalid values).
  • Routed both SchemaElement.userdata(String, Object) and SchemaElement.userdata(Userdata) through the normalization logic (and added a null-argument check for the bulk setter).
  • Added regression tests for normalization behavior and for Text/Binary serializer round-trips, and registered them in UnitTestSuite.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/schema/SchemaElement.java Normalizes ~create_time userdata on set (single and bulk), ensuring reload paths restore Date instead of String.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/core/SchemaElementTest.java Unit coverage for normalization behavior, idempotency, invalid strings, and null bulk userdata validation.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/serializer/TextSerializerTest.java Regression test verifying TextSerializer schema round-trip keeps CREATE_TIME as Date.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/serializer/BinarySerializerTest.java Regression test verifying BinarySerializer schema round-trip keeps CREATE_TIME as Date.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/UnitTestSuite.java Registers the new unit tests in the suite.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed correctness, design, and tests. No blocking issues — fix is correct and narrowly scoped, all backend reload paths (Text / Binary / Mysql / Cassandra / HBase / Postgres / Palo / Hstore + the four fromMap paths) route through the new helper, and DateUtil.parse is symmetric with the "yyyy-MM-dd HH:mm:ss.SSS" format used by JsonUtil/HugeGraphSONModule. Inline comments cover important refinements (helper placement, public setter docs, test depth). A few additional minor notes below that don't have a clean inline target:

  • 🧹 removeUserdata(Userdata) (SchemaElement.java:114-118) lacks a null check — adding E.checkArgumentNotNull(userdata, ...) keeps the bulk setters symmetric with the new check on userdata(Userdata).
  • 🧹 Behavior change worth a release note: any REST client that posts ~create_time as a JSON string in user-data now silently receives a Date (or an IllegalArgumentException if malformed). ~create_time is a server-reserved key (~ prefix, written by SchemaTransaction.setCreateTimeIfNeeded), so risk is small, but flagging it in the PR body / changelog is helpful.
  • 🧹 Follow-up tracking: Userdata.DEFAULT_VALUE has the same Jackson type-erasure problem (e.g., a property key whose data type is Date will reload its default value as String). The PR description already defers the hugegraph-struct mirror — please add DEFAULT_VALUE to the same follow-up list, since the root cause and fix shape are identical.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.71%. Comparing base (e108076) to head (11adabf).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ain/java/org/apache/hugegraph/schema/Userdata.java 77.77% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (e108076) and HEAD (11adabf). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (e108076) HEAD (11adabf)
3 1
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3026      +/-   ##
============================================
- Coverage     35.85%   29.71%   -6.14%     
+ Complexity      338      264      -74     
============================================
  Files           802      803       +1     
  Lines         67995    68052      +57     
  Branches       8902     8908       +6     
============================================
- Hits          24381    20225    -4156     
- Misses        41008    45506    +4498     
+ Partials       2606     2321     -285     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dpol1 dpol1 requested a review from imbajin May 15, 2026 11:12
Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues. The core fix (normalizing CREATE_TIME String→Date on deserialization) is correct and the main round-trip paths are tested. Four non-blocking observations below.

"was " + (value == null ? "null" : value.getClass()),
value instanceof Date);
Assert.assertEquals(created, value);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ The PR description states that MysqlSerializer and CassandraSerializer are also affected by the same CREATE_TIME deserialization gap:

"userdata(String, Object) covers serializer reload paths such as TextSerializer, BinarySerializer, MysqlSerializer, and CassandraSerializer"

Only TextSerializer and BinarySerializer have round-trip tests added. Adding equivalent testPropertyKeyUserdataCreateTimeRoundTripsAsDate tests for MysqlSerializer and CassandraSerializer would close this coverage gap for the stated fix surface.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding off, with reasoning:

MysqlSerializer.readUserdata (L167) and CassandraSerializer.readUserdata
(L222) both call the same schema.userdata(key, value) setter as
TextSerializer/BinarySerializer. That setter now routes through the
single Userdata.put normalization path, which is already exercised by:

  • TextSerializerTest / BinarySerializerTest round-trips
  • SchemaElementTest (constructor, single setter, bulk, fromMap)

There's no existing unit-test harness for the SQL serializers — adding one
means backend/mock infra for zero new code paths. The marginal coverage
doesn't justify it.

Better closure for the shared logic: a direct UserdataTest.normalizeValue
unit test, serializer-agnostic. Happy to add that as a follow-up; proposing
not to gate this PR on it. Pushback welcome if you see a path difference
I'm missing.

"was " + (value == null ? "null" : value.getClass()),
value instanceof Date);
Assert.assertEquals(created, value);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Only PropertyKey is round-tripped here. VertexLabel, EdgeLabel, and IndexLabel each have their own writeXxx/readXxx methods and reach the normalization through the bulk setter (fromMapuserdata(Userdata)), a different path from PropertyKey (single-key setter). A regression in those paths would not be caught by the existing tests.

Would it be possible to add at minimum one VertexLabel round-trip test here as the representative bulk-setter path?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imbajin not quite — readUserdata is a single shared method (L909).
readVertexLabel/readEdgeLabel/readPropertyKey/readIndexLabel all call it, and it uses the single-key setter for every type. So a VertexLabel round-trip here runs the exact same path as PropertyKey, not the bulk one.

The bulk path (fromMap → userdata(Userdata)) is the schema-builder side, already pinned by SchemaElementTest#testVertexLabelFromMapNormalizesCreateTimeString. And after this PR both setters go through Userdata.put, so it's one normalization point. IMHO a VertexLabel serializer test would just re-run the identical loop, no new path covered. I'll still add one if you want the readVertexLabel wiring guarded — just say so.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I agree that the serializer round-trip path is shared here. A VertexLabel serializer round-trip would mostly re-run the same readUserdata() loop, so I do not think we need to insist on that test.

I found a separate edge case while re-checking the current implementation: moving normalization into Userdata.put() also affects eliminate inputs, where the value is only a placeholder and removeUserdata(Userdata) deletes by key.

schema.xxx(...)
  .userdata(Userdata.CREATE_TIME, "")
  .eliminate()
      |
      v
Userdata.put("~create_time", "")
      |
      v
DateUtil.parse("") -> IllegalArgumentException
      |
      v
never reaches removeUserdata(...)

Existing eliminate tests use placeholder values like .userdata("icon", "").eliminate() / .userdata("max", "").eliminate(), so this seems like a real behavior change for deleting ~create_time via the builder or REST remove action.

Could we either skip normalization for eliminate/delete inputs, or explicitly reject removal of Userdata.CREATE_TIME with a dedicated check? Right now it fails as a date parse error before the eliminate path can apply its key-only semantics.

@dpol1 dpol1 requested a review from imbajin May 18, 2026 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files. tests Add or improve test cases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] schema ~create_time userdata round-trips Date as String after cache reload

3 participants