Commit b7a6058

committed
Merge branch 'master' of github.com:pmmp/ext-encoding
2 parents 7ec650a + 62b10f3 commit b7a6058

1 file changed

+53
-54
lines changed

README.md

Lines changed: 53 additions & 54 deletions
@@ -1,86 +1,59 @@
11
# ext-encoding
2-
This extension implements a `ByteBuffer` class, a high-performance alternative for [`pocketmine/binaryutils`](https://github.com/pmmp/BinaryUtils).
2+
This extension provides high-performance raw data encoding & decoding utilities for PHP.
33

4-
## :warning: This extension is EXPERIMENTAL
4+
It was designed to supersede [`pocketmine/binaryutils`](https://github.com/pmmp/BinaryUtils) and the painfully slow PHP functions [`pack()`](https://www.php.net/manual/en/function.pack.php) and [`unpack()`](https://www.php.net/manual/en/function.unpack.php).
55

6-
There is a high likelihood that the extension's classes may crash or behave incorrectly.
7-
Do not use this extension for anything you don't want to get horribly corrupted.
6+
## Real-world performance tests
7+
- [`pocketmine/nbt`](https://github.com/pmmp/NBT) was tested with release 0.2.1, and showed 1.5x read and 2x write performance with some basic synthetic tests.
88

99
## API
1010
A recent IDE stub can usually be found in our [custom stubs repository](https://github.com/pmmp/phpstorm-stubs/blob/fork/encoding/encoding.php).
1111

12-
#### :warning: The API design is not yet finalized. Everything is still subject to change prior to the 1.0.0 release.
13-
14-
### FAQs about API design
15-
#### Why are there SO MANY functions? Why not just accept something like `bool $signed, ByteOrder $order` parameters?
16-
17-
Runtime parameters would mean that these hot encoding paths would need to branch to decide how to encode everything. Branching is slow, so we want to avoid that.
18-
19-
Internally, we actually only have a handful number of functions (defined in `Serializers.h`), which use C++ templates to inject type, signedness, and byte order arguments.
20-
This allows the compiler to expand these templates into optimised branchless native functions for each `(type, signed, byte order)` combination.
21-
So, the C++ side is actually quite clean.
12+
> [!WARNING]
13+
> The API design is not yet finalized. Everything is still subject to change prior to the 1.0.0 release.
2214
23-
Now about the PHP part. Parsing arguments in PHP is slow, because we have to do dynamic type verification everywhere. This adds a significant extra amount of potential branching per argument.
24-
Since PHP doesn't have anything akin to C++ templates (or generics more generally), the only option is to generate a separate PHP function for every combination of `(type, signed, byte order)`.
25-
This way, the knowledge about signedness and byte-order is baked into the function name (basically a worse version of C++ templates).
26-
27-
The downside of this is that we can't use `.stub.php` files to generate arginfo, so the IDE stubs have to be generated from the extension using [extension-stub-generator](https://github.com/pmmp/extension-stub-generator).
28-
Also, you'll probably need eye bleach after seeing the [macros that generate the function matrix](https://github.com/pmmp/ext-encoding/blob/bfcc8243f1037d37efea53444dc17c11bd2d47df/classes/Types.cpp#L246-L365).
29-
30-
However, considering how critical binary data handling is to performance in PocketMine-MP, this is a trade absolutely worth making.
15+
> [!NOTE]
16+
> Although `ext-encoding` was built as a replacement for `pocketmine/binaryutils`, it is *not* a drop-in replacement.
17+
> Its API is completely different and incompatible.
3118
32-
#### Why static methods instead of `ByteBuffer` instance methods?
19+
The new API has been designed with the lessons learned from `pocketmine/binaryutils` in mind. Most notably:
20+
- Readers and writers have fully separated APIs - no more accidentally writing while intending to read or vice versa
21+
- Endian-reversible types are implemented in `LE::` and `BE::` static methods, which avoids accidentally using the wrong byte order
22+
- All integer-accepting and returning functions explicitly state whether they work with `Signed` or `Unsigned` integers
3323

34-
Two reasons:
35-
- As described above, the static `read`/`write` methods can't be generated using `.stub.php` files. If we put the generated functions in `ByteBuffer` itself, we'd be unable to use a `.stub.php` file to define the rest of its non-generated API.
36-
- For reversible fixed-size types in particular: Since byte order is decided by the very first character you type (and you have to import `BE` or `LE` to use those functions), you won't get caught out by IDE auto completions and accidentally mix byte order without noticing. This is a lesson hard learned from years of `BinaryStream` use, where a sneaky misplaced `L` can make or break a packet.
37-
38-
I may yet change the design again, anyway, so don't get too comfortable with it :^)
39-
40-
#### Why fully specify `Signed` or `Unsigned` in every function name? Why not just have e.g. `readInt()` and `readUint()`?
41-
42-
I still might change this yet.
43-
44-
This library's first users will be people moving from `BinaryStream`, where the API is infamous for being inconsistent about signedness when not specified (https://github.com/pmmp/BinaryUtils/issues/15). For example, `getShort()` is unsigned, and `getInt()` is signed.
45-
46-
I felt that it was better to be verbose to force developers to think about whether to use a signed or an unsigned type when migrating old code.
47-
48-
## Real-world performance tests
49-
- [`pocketmine/nbt`](https://github.com/pmmp/NBT) was tested with release 0.2.1, and showed 1.5x read and 2x write performance with some basic synthetic tests.
50-
51-
## Why is `BinaryStream` and generally `pocketmine/binaryutils` so slow?
24+
## FAQs
25+
### Why are `BinaryStream` and generally `pocketmine/binaryutils` so slow?
5226
- [VarInt encode/decode](#varint-encodedecode)
5327
- [`pack()` and `unpack()`](#pack-and-unpack)
5428
- [Linear buffer reallocations](#linear-buffer-reallocations)
55-
- [Array-of-type](#foreach-array-of-type)
29+
- [Array-of-type](#array-of-type)
5630

57-
### VarInt encode/decode
58-
VarInts are heavily used by the Bedrock protocol, the theory being to reduce the size of integer types on the wire.
59-
This format is borrowed from [protobuf](https://developers.google.com/protocol-buffers/docs/encoding).
31+
#### VarInt encode/decode
32+
VarInts are heavily used by the Bedrock protocol. This format is borrowed from [protobuf](https://developers.google.com/protocol-buffers/docs/encoding).
6033

61-
Implemented in PHP, it's abysmally slow, due to repeated calls to `chr()` and `ord()` in a loop, as well as needing workarounds for PHP's lack of logical rightshift.
34+
There's no fast way to implement them in pure PHP. They require repeated calls to `chr()` and `ord()` in a loop, plus workarounds for PHP's lack of a logical right shift operator.
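
To make the format concrete, here is a minimal C++ sketch of unsigned VarInt encoding and decoding in the protobuf style. It is not the extension's actual code: the function names `writeUnsignedVarInt` and `readUnsignedVarInt`, the use of `std::string` as the buffer, and the 32-bit width are assumptions for illustration. In native code a right shift on an unsigned type is already logical, and no per-byte `chr()`/`ord()` calls are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append `value` as a protobuf-style varint.
// Each byte carries 7 payload bits, least significant group first.
// The high bit is set on every byte except the last one.
inline void writeUnsignedVarInt(std::string& out, std::uint32_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7f) | 0x80));
        value >>= 7; // logical shift, because `value` is unsigned
    }
    out.push_back(static_cast<char>(value));
}

// Hypothetical helper: decode a varint starting at `offset`.
// On success, stores the value in `result`, advances `offset` past the
// varint and returns true. Returns false if the input is truncated or
// the varint is longer than 5 bytes (the maximum for 32 bits).
inline bool readUnsignedVarInt(const std::string& in, std::size_t& offset,
                               std::uint32_t& result) {
    std::uint32_t value = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        const std::size_t pos = offset + shift / 7;
        if (pos >= in.size()) {
            return false; // ran out of input
        }
        const std::uint8_t byte = static_cast<std::uint8_t>(in[pos]);
        value |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) { // high bit clear: last byte
            offset = pos + 1;
            result = value;
            return true;
        }
    }
    return false; // more than 5 bytes
}
```

For example, the value 300 encodes to the two bytes `0xAC 0x02`, and decoding those bytes gives 300 back.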
6235

63-
Compared to `BinaryStream`, `ByteBuffer` offers a performance improvement of 5-10x (depending on the size of the value and other conditions, YMMV) with both signed and unsigned varints.
36+
Compared to `BinaryStream`, this extension's `VarInt::` functions offer a performance improvement of 5-10x (depending on the size of the value and other conditions, YMMV) with both signed and unsigned varints.
6437

65-
This is extremely significant for PocketMine-MP due to the number of hot paths affected by such a performance gain (e.g. chunk encoding will benefit significantly).
38+
This will significantly improve performance in PocketMine-MP when integrated. For example, chunk encoding will become much faster, and encoding & decoding of almost all packets will benefit too.
6639

67-
### `pack()` and `unpack()`
68-
Under a profiler, it becomes obvious that PHP's `pack()` and `unpack()` functions are abysmally slow, due to the cost of parsing the formatting code argument.
69-
This parsing takes over 90% of the time spent in `pack()` and `unpack()`.
40+
#### `pack()` and `unpack()`
41+
PHP's [`pack()`](https://www.php.net/manual/en/function.pack.php) and [`unpack()`](https://www.php.net/manual/en/function.unpack.php) functions are abysmally slow.
42+
Parsing the formatting code argument takes over 90% of the time spent in these functions.
7043
This overhead can be easily avoided when the types of data used are known in advance.
7144

7245
This extension implements specialized functions for writing big and little endian byte/short/int/long/float/double.
7346
Depending on the type and other factors, these functions typically show a 3-4x performance improvement compared to `BinaryStream`.
7447

75-
### Linear buffer reallocations
48+
#### Linear buffer reallocations
7649
`BinaryStream` and similar PHP-land byte-buffer implementations often store their data in strings and append to them with the `.=` concatenation operator.
7750
This is problematic, because the entire string will be reallocated every time something is appended to it.
7851
While this isn't a big issue for small buffers, the performance of writing to large buffers progressively degrades.
7952

80-
`ByteBuffer` uses exponential scaling (factor of 2) to minimize buffer reallocations at the cost of potentially wasting some memory.
53+
`ByteBufferWriter` uses exponential scaling (factor of 2) to minimize buffer reallocations at the cost of potentially wasting some memory.
8154
This means that the internal buffer size is doubled when the buffer runs out of space.
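
The doubling strategy can be sketched in a few lines of C++. This is an illustration only, not `ByteBufferWriter`'s real implementation: the class name `GrowableBuffer`, the initial capacity of 16 bytes, and the `reallocations()` counter are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical append-only buffer that doubles its capacity when full.
class GrowableBuffer {
public:
    void append(const void* data, std::size_t length) {
        if (length == 0) {
            return;
        }
        reserve(used_ + length);
        std::memcpy(storage_.get() + used_, data, length);
        used_ += length;
    }

    std::size_t size() const { return used_; }
    std::size_t capacity() const { return capacity_; }
    // Number of times the storage was reallocated (for demonstration).
    std::size_t reallocations() const { return reallocations_; }
    const char* data() const { return storage_.get(); }

private:
    void reserve(std::size_t needed) {
        if (needed <= capacity_) {
            return; // enough room, no reallocation
        }
        // Assumed starting size of 16 bytes, then keep doubling.
        std::size_t newCapacity = capacity_ == 0 ? 16 : capacity_;
        while (newCapacity < needed) {
            newCapacity *= 2;
        }
        std::unique_ptr<char[]> grown(new char[newCapacity]);
        if (used_ > 0) {
            std::memcpy(grown.get(), storage_.get(), used_);
        }
        storage_ = std::move(grown);
        capacity_ = newCapacity;
        ++reallocations_;
    }

    std::unique_ptr<char[]> storage_;
    std::size_t used_ = 0;
    std::size_t capacity_ = 0;
    std::size_t reallocations_ = 0;
};
```

With these assumptions, appending 1000 single bytes triggers only 7 reallocations (16, 32, 64, 128, 256, 512, 1024), and the final capacity is 1024, so 24 bytes are unused. That is the memory trade-off mentioned above.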
8255

83-
### Array-of-type
56+
#### Array-of-type
8457
All the above problems contribute to this one, in addition to:
8558
- Extra function call overhead
8659
- Dealing with PHP `HashTable` structures is generally slow (a problem not solved by this extension currently)
@@ -90,3 +63,29 @@ The most obvious cases where this will benefit PocketMine-MP are in `LevelChunkP
9063

9164
In the future it'll probably make sense to add PHP wrappers for native array-of-type (e.g. `IntArray`, `LongArray` etc) so that we can avoid the performance and memory usage penalties
9265
of dealing with large primitive arrays at runtime.
66+
67+
### Why are there SO MANY functions? Why not just accept something like `bool $signed, ByteOrder $order` parameters?
68+
69+
Runtime parameters would mean that these hot encoding paths would need to branch to decide how to encode everything. Branching is slow, so we want to avoid that.
70+
71+
Internally, we only have a handful of functions (defined in `Serializers.h`), which use C++ templates to inject type, signedness, and byte order arguments.
72+
The compiler expands these templates into optimised branchless native functions for each `(type, signed, byte order)` combination.
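
As a rough illustration of the idea (this is not the code in `Serializers.h`; the names `ByteOrder`, `writeFixed`, `readFixed`, `writeUnsignedIntLE` and `readSignedShortBE` are made up for this sketch, and it only handles integers), type and byte order can be passed as template parameters so that each instantiation has them fixed at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

enum class ByteOrder { Little, Big };

// Type and byte order are template parameters, so each instantiation is
// a separate function in which `Order` is a compile-time constant.
template <typename T, ByteOrder Order>
void writeFixed(std::string& out, T value) {
    static_assert(std::is_integral<T>::value, "integers only in this sketch");
    using U = typename std::make_unsigned<T>::type;
    const U bits = static_cast<U>(value);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // The condition is constant per instantiation, so the compiler
        // can fold it instead of branching at runtime.
        const std::size_t shift =
            8 * (Order == ByteOrder::Little ? i : sizeof(T) - 1 - i);
        out.push_back(static_cast<char>((bits >> shift) & 0xff));
    }
}

// The caller must make sure `offset + sizeof(T)` is within `in`.
template <typename T, ByteOrder Order>
T readFixed(const std::string& in, std::size_t offset) {
    static_assert(std::is_integral<T>::value, "integers only in this sketch");
    using U = typename std::make_unsigned<T>::type;
    U bits = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        const std::size_t shift =
            8 * (Order == ByteOrder::Little ? i : sizeof(T) - 1 - i);
        const U byte = static_cast<U>(static_cast<unsigned char>(in[offset + i]));
        bits = static_cast<U>(bits | static_cast<U>(byte << shift));
    }
    return static_cast<T>(bits);
}

// Hypothetical named entry points: one thin wrapper per combination.
inline void writeUnsignedIntLE(std::string& out, std::uint32_t v) {
    writeFixed<std::uint32_t, ByteOrder::Little>(out, v);
}
inline std::int16_t readSignedShortBE(const std::string& in, std::size_t offset) {
    return readFixed<std::int16_t, ByteOrder::Big>(in, offset);
}
```

The two wrappers at the end show why the function count grows quickly: every combination gets its own name, while the template body is written once.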
73+
74+
In addition, parsing arguments in PHP is slow, and since PHP doesn't have anything akin to C++ templates (or generics more generally), the only option to get compile-time knowledge of byte order and signedness is to bake them into the function name. There is a function for every combination of `(type, signed, byte order)`.
75+
76+
The downside of this is that we can't use `.stub.php` files to generate arginfo, so the IDE stubs have to be generated from the extension using [extension-stub-generator](https://github.com/pmmp/extension-stub-generator).
77+
Also, you'll probably need eye bleach after seeing the [macros that generate the function matrix](https://github.com/pmmp/ext-encoding/blob/bfcc8243f1037d37efea53444dc17c11bd2d47df/classes/Types.cpp#L246-L365).
78+
79+
However, considering how critical binary data handling is to performance in PocketMine-MP, this is a trade absolutely worth making.
80+
81+
### Why static methods instead of `ByteBuffer(Reader|Writer)` instance methods?
82+
83+
Two reasons:
84+
- As described above, the static `read`/`write` methods can't be generated using `.stub.php` files. If we put the generated functions in `ByteBufferReader`/`ByteBufferWriter`, we'd be unable to use a `.stub.php` file to define the rest of its non-generated API.
85+
- I've made too many mistakes with byte order due to IDE auto complete. With this API design, byte order is decided by the very first character you type, so auto complete can't trip you up (and you have to import `BE` or `LE`).
86+
87+
### Why fully specify `Signed` or `Unsigned` in every function name? Why not just have e.g. `readInt()` and `readUint()`?
88+
89+
This library's first users will be people moving from `BinaryStream`, where the API is infamous for being inconsistent about signedness when not specified (https://github.com/pmmp/BinaryUtils/issues/15). For example, `getShort()` is unsigned, and `getInt()` is signed.
90+
91+
I felt that it was better to be verbose to force developers to think about whether to use a signed or an unsigned type when migrating old code.
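
The difference is easy to see with a two-byte example. The C++ sketch below decodes the same big-endian bytes once as unsigned and once as signed. The function names are hypothetical and only mirror the naming idea; they are not this extension's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read two big-endian bytes as an unsigned 16-bit value.
inline std::uint16_t readUnsignedShortBE(const unsigned char* bytes) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Hypothetical helper: same bytes, reinterpreted as two's complement signed.
inline std::int16_t readSignedShortBE(const unsigned char* bytes) {
    return static_cast<std::int16_t>(readUnsignedShortBE(bytes));
}
```

With the bytes `0xFF 0xFE`, the unsigned read gives 65534 and the signed read gives -2. A name that states the signedness leaves no room to guess which of the two the caller gets.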
