
@Pigsy-Monk
Contributor

What does this PR do?

This PR adds variable-length encoding support for serializing long[] arrays in Java, which makes serialization more space-efficient for arrays containing many small values.

Changes:

  • Enhance LongArraySerializer with variable-length encoding support: Added supportVarLenEncoding parameter to LongArraySerializer constructor, allowing it to optionally use variable-length encoding when enabled.

  • Add comprehensive test cases:

    • testVariableLengthLongArray(): Tests serialization/deserialization of long arrays with various value ranges (empty, small, mixed, large, negative values)
    • testVariableLengthEncodingEfficiencyForSmallValues(): Demonstrates that variable-length encoding produces significantly smaller serialized data (50%+ reduction) for arrays containing many small values

Test Details:

  • testVariableLengthLongArray:

    • Tests empty arrays
    • Tests arrays with small values (0-255)
    • Tests arrays with mixed small and large values (including Long.MAX_VALUE and Long.MIN_VALUE)
    • Tests arrays with negative values
    • Tests large arrays (1000 elements) with many small values
  • testVariableLengthEncodingEfficiencyForSmallValues:

    • Compares serialization size between fixed-length encoding (8 bytes per long element) and variable-length encoding (1-2 bytes per small element)
    • Tests with arrays containing values 0-127 (optimal for variable-length encoding)
    • Tests with arrays containing values 0-1023 (still benefits from variable-length encoding)
    • Verifies at least 50% size reduction for small values
    • Outputs detailed efficiency metrics (bytes, bytes per element, percentage reduction)

Performance Benefits:

For arrays containing many small values:

  • Fixed-length encoding: 8 bytes per long element + overhead
  • Variable-length encoding: 1-2 bytes per small long element + overhead
  • Space savings: 75% or more for arrays with values in the 0-127 range
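
To make the size arithmetic above concrete, here is a minimal, self-contained sketch of a zigzag base-128 varint for long values. It is an illustration only, not Fory's wire format or API: the class name `VarLongSketch` and all of its methods are hypothetical, and the plain 7-bits-per-byte scheme used here may differ in detail from what the real serializer writes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class VarLongSketch {
  // ZigZag maps small-magnitude signed values to small unsigned values.
  static long zigZag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  static long unZigZag(long z) {
    return (z >>> 1) ^ -(z & 1);
  }

  // Writes one value with 7 payload bits per byte; a set high bit means "more bytes follow".
  static void writeVarLong(ByteArrayOutputStream out, long value) {
    long z = zigZag(value);
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  static byte[] encode(long[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (long v : values) {
      writeVarLong(out, v);
    }
    return out.toByteArray();
  }

  // Reads back `count` values; the element count is assumed to be known by the caller.
  static long[] decode(byte[] bytes, int count) {
    long[] result = new long[count];
    int pos = 0;
    for (int i = 0; i < count; i++) {
      long z = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        z |= (long) (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      result[i] = unZigZag(z);
    }
    return result;
  }

  public static void main(String[] args) {
    long[] small = new long[128];
    for (int i = 0; i < small.length; i++) {
      small[i] = i;
    }
    byte[] encoded = encode(small);
    System.out.println("fixed-length bytes: " + small.length * Long.BYTES);
    System.out.println("varint bytes:       " + encoded.length);
    System.out.println("round trip ok:      " + Arrays.equals(small, decode(encoded, small.length)));
  }
}
```

In this sketch, zigzag doubles non-negative values before encoding, so values up to 63 take one byte and values up to 8191 take two. An element such as Long.MIN_VALUE takes ten bytes, which is the cost paid for keeping small values cheap.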

Related Issues:

  • Addresses the need for more efficient serialization of primitive arrays with small values
  • Enables space-optimized serialization for use cases like sparse arrays, indices, counters, etc.

Does this PR introduce any user-facing change?

No, this PR only adds new serializer classes and test cases. The default serializers remain unchanged. Users can opt-in to variable-length encoding by using the enhanced LongArraySerializer with supportVarLenEncoding=true.

@Pigsy-Monk Pigsy-Monk changed the title from "long array serializer support var len encoding" to "feat(java): long array serializer support var len encoding" on Jan 8, 2026
@chaokunyang
Collaborator

@Pigsy-Monk We have org.apache.fory.serializer.CompressedArraySerializers, could you use this serializer instead?

@Pigsy-Monk
Contributor Author

Pigsy-Monk commented Jan 8, 2026

Hi @chaokunyang, I have looked into the CompressedLongArraySerializer algorithm. My understanding is that it assumes all elements in the array are compressible, and that the achievable compression ratio is limited by the largest element in the array. In my use case, the majority of the long values are small, with only a small fraction being large.
In addition, CompressedLongArraySerializer is recommended for compressing large arrays (size > 512), whereas in my use case there are many small arrays, often well below that size.
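
The trade-off described in this comment can be sketched as a size comparison. The code below assumes, purely for illustration, a whole-array scheme that narrows every element to 4 bytes only when all elements fit in an int. That is a simplification of the understanding stated above, not a description of how CompressedLongArraySerializer is actually implemented. The class `EncodingComparisonSketch` and its methods are hypothetical names, and the per-element size uses a plain zigzag base-128 varint.

```java
class EncodingComparisonSketch {
  // Hypothetical whole-array scheme: 4 bytes per element only if every element fits in an int.
  static int wholeArrayNarrowedSize(long[] values) {
    for (long v : values) {
      if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
        return values.length * Long.BYTES; // one large element forces 8 bytes for all
      }
    }
    return values.length * Integer.BYTES;
  }

  // Size of one value as a zigzag base-128 varint (7 payload bits per byte).
  static int varintSize(long v) {
    long z = (v << 1) ^ (v >> 63);
    int bytes = 1;
    while ((z & ~0x7FL) != 0) {
      bytes++;
      z >>>= 7;
    }
    return bytes;
  }

  // Per-element scheme: each value pays only for its own magnitude.
  static int perElementVarintSize(long[] values) {
    int total = 0;
    for (long v : values) {
      total += varintSize(v);
    }
    return total;
  }

  public static void main(String[] args) {
    long[] mostlySmall = new long[100];
    java.util.Arrays.fill(mostlySmall, 5L);
    mostlySmall[99] = Long.MAX_VALUE; // a single large outlier
    System.out.println("whole-array narrowing: " + wholeArrayNarrowedSize(mostlySmall));
    System.out.println("per-element varint:    " + perElementVarintSize(mostlySmall));
  }
}
```

Under these assumptions, a 100-element array of small values with a single outlier costs 800 bytes with whole-array narrowing but 109 bytes with per-element varints, which matches the "mostly small, a few large" use case described above.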

@chaokunyang chaokunyang changed the title from "feat(java): long array serializer support var len encoding" to "feat(java): long array serializer support varint encoding" on Jan 8, 2026
@chaokunyang
Collaborator

@Pigsy-Monk Could you fix the code style errors?

@Pigsy-Monk
Contributor Author

Sure, I am trying to figure out where it went wrong.

this(fory, false);
}

public LongArraySerializer(Fory fory, boolean supportVarLenEncoding) {
Collaborator

ForyConfig already has compressIntArray and compressLongArray options. How about using those directly? For LongArraySerializer, we can then compress based on LongEncoding.

Contributor Author

Sounds better. I will submit a commit.

}

if(fory.getConfig().compressLongArray()){
return readVarLongs(buffer);
Collaborator

Could you switch on LongEncoding and invoke a different function for each case, so that different compression algorithms can be used?

Contributor Author

Yes, I am working on it.

int length = value.length;
buffer.writeVarUint32Small7(length);
for (int i = 0; i < length; i++) {
PrimitiveSerializers.LongSerializer.writeInt64(buffer, value[i], longEncoding);
Collaborator

We can have two functions, so that the switch on LongEncoding moves outside the loop.

Contributor Author

private void writeInt64s(MemoryBuffer buffer, long[] value, LongEncoding longEncoding) {
  int length = value.length;
  buffer.writeVarUint32Small7(length);

  // Branch on the encoding once, outside the per-element loops.
  if (longEncoding == LongEncoding.SLI) {
    for (int i = 0; i < length; i++) {
      buffer.writeSliInt64(value[i]);
    }
    return;
  }
  for (int i = 0; i < length; i++) {
    buffer.writeVarInt64(value[i]);
  }
}

Contributor Author

I am not sure I understood you correctly. Do you mean something like this?
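
For readers without the Fory sources at hand, the "two functions" idea from the review can also be sketched in a self-contained form. Everything here is an assumption made for illustration: `HoistedDispatchSketch`, its `Encoding` enum, and the helper methods are hypothetical stand-ins for LongEncoding and the MemoryBuffer calls, a fixed 4-byte length prefix replaces the real length encoding, and the two encoders are simplified versions that may not match Fory's byte layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class HoistedDispatchSketch {
  // Hypothetical stand-in for the LongEncoding option discussed in the review.
  enum Encoding { VARINT, SLI }

  // Zigzag base-128 varint: 7 payload bits per byte, high bit means "more bytes follow".
  static void writeVarLong(ByteBuffer buf, long value) {
    long z = (value << 1) ^ (value >> 63);
    while ((z & ~0x7FL) != 0) {
      buf.put((byte) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    buf.put((byte) z);
  }

  // Simplified "small long as int": 4 bytes if the value fits in 31 bits, else a flag byte plus 8 bytes.
  static void writeSliLong(ByteBuffer buf, long value) {
    if (value >= -(1L << 30) && value < (1L << 30)) {
      buf.putInt(((int) value) << 1); // low bit 0 marks the 4-byte form
    } else {
      buf.put((byte) 1); // low bit 1 marks the 9-byte form
      buf.putLong(value);
    }
  }

  // One loop per encoding, so no per-element branching on the encoding.
  static void writeVarLongs(ByteBuffer buf, long[] values) {
    for (long v : values) {
      writeVarLong(buf, v);
    }
  }

  static void writeSliLongs(ByteBuffer buf, long[] values) {
    for (long v : values) {
      writeSliLong(buf, v);
    }
  }

  // The encoding is checked once per array, then the matching loop runs.
  static void writeLongs(ByteBuffer buf, long[] values, Encoding encoding) {
    buf.putInt(values.length); // simplified fixed-width length prefix
    if (encoding == Encoding.SLI) {
      writeSliLongs(buf, values);
    } else {
      writeVarLongs(buf, values);
    }
  }

  // Returns the number of bytes written for the given array and encoding.
  static int encodedSize(long[] values, Encoding encoding) {
    ByteBuffer buf = ByteBuffer.allocate(4 + values.length * 10).order(ByteOrder.LITTLE_ENDIAN);
    writeLongs(buf, values, encoding);
    return buf.position();
  }

  public static void main(String[] args) {
    long[] values = {1L, 2L, 3L};
    System.out.println("varint: " + encodedSize(values, Encoding.VARINT));
    System.out.println("sli:    " + encodedSize(values, Encoding.SLI));
  }
}
```

The design point is the same as in the snippet above: the choice of encoding does not change between elements, so it can be decided once per array instead of once per element.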

Collaborator

@chaokunyang chaokunyang left a comment

LGTM. Would you like to open another PR to add varint compression for int[] arrays?

@Pigsy-Monk
Contributor Author

Yes, that's what I plan to do after this one is merged.

@chaokunyang
Collaborator

@Pigsy-Monk Please fix the checkstyle error.

@Pigsy-Monk
Contributor Author

Sure, I am trying to figure out where it went wrong.

@Pigsy-Monk
Contributor Author

It would have been easier if the error message indicated the line number.

@Pigsy-Monk
Contributor Author

@chaokunyang Could you please tell me which plugin you use to check the code style?

@chaokunyang
Collaborator

chaokunyang commented Jan 9, 2026

@chaokunyang Could you please tell me which plugin you use to check the code style?

You can just run ci/format.sh --java.

See the Maven pom.xml for details of the plugins we use. We use Spotless for code formatting.

@Pigsy-Monk
Contributor Author

Thanks. I ran ci/format.sh --java. Hope it works this time.

@Pigsy-Monk
Contributor Author

Hope it works this time.

@chaokunyang chaokunyang merged commit 3b3f17d into apache:main on Jan 9, 2026
53 checks passed
@Pigsy-Monk Pigsy-Monk deleted the zwsong branch on January 10, 2026 03:04