-This extension implements a `ByteBuffer` class, a high-performance alternative for [`pocketmine/binaryutils`](https://github.com/pmmp/BinaryUtils).
+This extension provides high-performance raw data encoding & decoding utilities for PHP.

-## :warning: This extension is EXPERIMENTAL
+It was designed to supersede [`pocketmine/binaryutils`](https://github.com/pmmp/BinaryUtils) and the painfully slow PHP functions [`pack()`](https://www.php.net/manual/en/function.pack.php) and [`unpack()`](https://www.php.net/manual/en/function.unpack.php).

-There is a high likelihood that the extension's classes may crash or behave incorrectly.
-Do not use this extension for anything you don't want to get horribly corrupted.
+## Real-world performance tests
+- [`pocketmine/nbt`](https://github.com/pmmp/NBT) was tested with release 0.2.1, and showed 1.5x read and 2x write performance in some basic synthetic tests.

 ## API
 A recent IDE stub can usually be found in our [custom stubs repository](https://github.com/pmmp/phpstorm-stubs/blob/fork/encoding/encoding.php).

-#### :warning: The API design is not yet finalized. Everything is still subject to change prior to the 1.0.0 release.
-
-### FAQs about API design
-#### Why are there SO MANY functions? Why not just accept something like `bool $signed, ByteOrder $order` parameters?
-
-Runtime parameters would mean that these hot encoding paths would need to branch to decide how to encode everything. Branching is slow, so we want to avoid that.
-
-Internally, we actually only have a handful number of functions (defined in `Serializers.h`), which use C++ templates to inject type, signedness, and byte order arguments.
-This allows the compiler to expand these templates into optimised branchless native functions for each `(type, signed, byte order)` combination.
-So, the C++ side is actually quite clean.
+> [!WARNING]
+> The API design is not yet finalized. Everything is still subject to change prior to the 1.0.0 release.

-Now about the PHP part. Parsing arguments in PHP is slow, because we have to do dynamic type verification everywhere. This adds a significant extra amount of potential branching per argument.
-Since PHP doesn't have anything akin to C++ templates (or generics more generally), the only option is to generate a separate PHP function for every combination of `(type, signed, byte order)`.
-This way, the knowledge about signedness and byte-order is baked into the function name (basically a worse version of C++ templates).
-
-The downside of this is that we can't use `.stub.php` files to generate arginfo, so the IDE stubs have to be generated from the extension using [extension-stub-generator](https://github.com/pmmp/extension-stub-generator).
-Also, you'll probably need eye bleach after seeing the [macros that generate the function matrix](https://github.com/pmmp/ext-encoding/blob/bfcc8243f1037d37efea53444dc17c11bd2d47df/classes/Types.cpp#L246-L365).
-
-However, considering how critical binary data handling is to performance in PocketMine-MP, this is a trade absolutely worth making.
+> [!NOTE]
+> Although `ext-encoding` was built as a replacement for `pocketmine/binaryutils`, it is *not* a drop-in replacement.
+> Its API is completely different and incompatible.

-#### Why static methods instead of `ByteBuffer` instance methods?
+The new API has been designed with the lessons learned from `pocketmine/binaryutils` in mind. Most notably:
+- Readers and writers have fully separated APIs - no more accidentally writing while intending to read or vice versa
+- Endian-reversible types are implemented in `LE::` and `BE::` static methods, which avoids accidentally using the wrong byte order
+- All integer-accepting and returning functions explicitly state whether they work with `Signed` or `Unsigned` integers

-Two reasons:
-- As described above, the static `read`/`write` methods can't be generated using `.stub.php` files. If we put the generated functions in `ByteBuffer` itself, we'd be unable to use a `.stub.php` file to define the rest of its non-generated API.
-- For reversible fixed-size types in particular: Since byte order is decided by the very first character you type (and you have to import `BE` or `LE` to use those functions), you won't get caught out by IDE auto completions and accidentally mix byte order without noticing. This is a lesson hard learned from years of `BinaryStream` use, where a sneaky misplaced `L` can make or break a packet.
-
-I may yet change the design again, anyway, so don't get too comfortable with it :^)
-
-#### Why fully specify `Signed` or `Unsigned` in every function name? Why not just have e.g. `readInt()` and `readUint()`?
-
-I still might change this yet.
-
-This library's first users will be people moving from `BinaryStream`, where the API is infamous for being inconsistent about signedness when not specified (https://github.com/pmmp/BinaryUtils/issues/15). For example, `getShort()` is unsigned, and `getInt()` is signed.
-
-I felt that it was better to be verbose to force developers to think about whether to use a signed or an unsigned type when migrating old code.
-
-## Real-world performance tests
--[`pocketmine/nbt`](https://github.com/pmmp/NBT) was tested with release 0.2.1, and showed 1.5x read and 2x write performance with some basic synthetic tests.
-
-## Why is `BinaryStream` and generally `pocketmine/binaryutils` so slow?
+## FAQs
+### Why are `BinaryStream` and generally `pocketmine/binaryutils` so slow?
-VarInts are heavily used by the Bedrock protocol, the theory being to reduce the size of integer types on the wire.
-This format is borrowed from [protobuf](https://developers.google.com/protocol-buffers/docs/encoding).
+#### VarInt encode/decode
+VarInts are heavily used by the Bedrock protocol. This format is borrowed from [protobuf](https://developers.google.com/protocol-buffers/docs/encoding).

-Implemented in PHP, it's abysmally slow, due to repeated calls to `chr()` and `ord()` in a loop, as well as needing workarounds for PHP's lack of logical rightshift.
+There's no fast way to implement them in pure PHP. They require repeated calls to `chr()` and `ord()` in a loop, as well as workarounds for PHP's lack of a logical right shift.

-Compared to `BinaryStream`, `ByteBuffer` offers a performance improvement of 5-10x (depending on the size of the value and other conditions, YMMV) with both signed and unsigned varints.
+Compared to `BinaryStream`, this extension's `VarInt::` functions offer a performance improvement of 5-10x (depending on the size of the value and other conditions, YMMV) with both signed and unsigned varints.

-This is extremely significant for PocketMine-MP due to the number of hot paths affected by such a performance gain (e.g. chunk encoding will benefit significantly).
+This will significantly improve performance in PocketMine-MP when integrated. For example, chunk encoding will become much faster, and encoding & decoding of almost all packets will benefit too.
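
To make the VarInt format concrete, here is a minimal C++ sketch of protobuf-style varint encoding and decoding. All function names are hypothetical and are not this extension's API. The zigzag mapping shown for signed values is the one protobuf uses for its `sint32` type and is an assumption here. In C++ the right shift on an unsigned value is already logical, which is the operation pure PHP has to emulate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper names for illustration; not the extension's API.

// Appends `value` as an unsigned varint: 7 payload bits per byte, least
// significant group first, high bit set on every byte except the last.
inline void writeUnsignedVarInt(std::string& out, std::uint32_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7f) | 0x80));
        value >>= 7; // logical shift, because `value` is unsigned
    }
    out.push_back(static_cast<char>(value));
}

// Reads an unsigned varint at `offset`. On success, stores it in `result`,
// moves `offset` past it and returns true. Returns false (leaving `offset`
// untouched) if the input is truncated or the varint is longer than 5 bytes.
inline bool readUnsignedVarInt(const std::string& in, std::size_t& offset,
                               std::uint32_t& result) {
    assert(offset <= in.size());
    std::uint32_t value = 0;
    std::size_t pos = offset;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (pos >= in.size()) {
            return false;
        }
        const std::uint32_t byte = static_cast<unsigned char>(in[pos++]);
        value |= (byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            result = value;
            offset = pos;
            return true;
        }
    }
    return false;
}

// Zigzag maps signed values onto unsigned ones so that small magnitudes
// stay short on the wire: 0, -1, 1, -2 become 0, 1, 2, 3.
inline std::uint32_t zigzagEncode(std::int32_t value) {
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    return (bits << 1) ^ (value < 0 ? 0xffffffffu : 0u);
}

inline std::int32_t zigzagDecode(std::uint32_t value) {
    return static_cast<std::int32_t>((value >> 1) ^ (0u - (value & 1u)));
}
```

With this sketch, the value 300 encodes to the two bytes `0xac 0x02`, and a signed value is written by passing `zigzagEncode(value)` to `writeUnsignedVarInt`.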

-### `pack()` and `unpack()`
-Under a profiler, it becomes obvious that PHP's `pack()` and `unpack()` functions are abysmally slow, due to the cost of parsing the formatting code argument.
-This parsing takes over 90% of the time spent in `pack()` and `unpack()`.
+#### `pack()` and `unpack()`
+PHP's [`pack()`](https://www.php.net/manual/en/function.pack.php) and [`unpack()`](https://www.php.net/manual/en/function.unpack.php) functions are abysmally slow.
+Parsing the formatting code argument takes over 90% of the time spent in these functions.
 This overhead can be easily avoided when the types of data used are known in advance.

 This extension implements specialized functions for writing big and little endian byte/short/int/long/float/double.
 Depending on the type and other factors, these functions typically show a 3-4x performance improvement compared to `BinaryStream`.

-### Linear buffer reallocations
+#### Linear buffer reallocations
 `BinaryStream` and similar PHP-land byte-buffer implementations often store their data in strings and append to them with the `.=` concatenation operator.
 This is problematic, because the entire string will be reallocated every time something is appended to it.
 While this isn't a big issue for small buffers, the performance of writing to large buffers progressively degrades.

-`ByteBuffer` uses exponential scaling (factor of 2) to minimize buffer reallocations at the cost of potentially wasting some memory.
+`ByteBufferWriter` uses exponential scaling (factor of 2) to minimize buffer reallocations at the cost of potentially wasting some memory.
 This means that the internal buffer size is doubled when the buffer runs out of space.
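
The doubling strategy can be sketched in C++ as follows. This is only an illustration under stated assumptions: `GrowableBuffer` is a hypothetical class, not the extension's `ByteBufferWriter`, and the initial capacity of 16 bytes is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical growable byte buffer illustrating capacity doubling.
class GrowableBuffer {
public:
    // Copies `length` bytes from `data` to the end of the buffer.
    void append(const void* data, std::size_t length) {
        if (length == 0) {
            return;
        }
        if (used_ + length > capacity_) {
            grow(used_ + length);
        }
        std::memcpy(storage_.get() + used_, data, length);
        used_ += length;
    }

    std::size_t size() const { return used_; }
    std::size_t capacity() const { return capacity_; }
    std::size_t reallocations() const { return reallocations_; }
    const unsigned char* data() const { return storage_.get(); }

private:
    // Doubles the capacity until `required` bytes fit, then moves the
    // existing contents into the new allocation.
    void grow(std::size_t required) {
        std::size_t newCapacity = capacity_ == 0 ? 16 : capacity_; // assumed start size
        while (newCapacity < required) {
            newCapacity *= 2;
        }
        std::unique_ptr<unsigned char[]> bigger(new unsigned char[newCapacity]);
        if (used_ > 0) {
            std::memcpy(bigger.get(), storage_.get(), used_);
        }
        storage_ = std::move(bigger);
        capacity_ = newCapacity;
        ++reallocations_;
        assert(capacity_ >= required);
    }

    std::unique_ptr<unsigned char[]> storage_;
    std::size_t used_ = 0;
    std::size_t capacity_ = 0;
    std::size_t reallocations_ = 0;
};
```

In this sketch, appending a single byte 1000 times causes 7 reallocations (capacities 16 through 1024) instead of one per append, at the cost of 24 unused bytes at the end.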

-### Array-of-type
+#### Array-of-type
 All the above problems contribute to this one, in addition to:
 - Extra function call overhead
 - Dealing with PHP `HashTable` structures is generally slow (a problem not solved by this extension currently)
@@ -90,3 +63,29 @@ The most obvious cases where this will benefit PocketMine-MP are in `LevelChunkP

 In the future it'll probably make sense to add PHP wrappers for native array-of-type (e.g. `IntArray`, `LongArray` etc) so that we can avoid the performance and memory usage penalties
 of dealing with large primitive arrays at runtime.
+
+### Why are there SO MANY functions? Why not just accept something like `bool $signed, ByteOrder $order` parameters?
+
+Runtime parameters would mean that these hot encoding paths would need to branch to decide how to encode everything. Branching is slow, so we want to avoid that.
+
+Internally, we only have a handful of functions (defined in `Serializers.h`), which use C++ templates to inject type, signedness, and byte order arguments.
+The compiler expands these templates into optimised branchless native functions for each `(type, signed, byte order)` combination.
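
The following C++ sketch shows the general idea of injecting type, signedness and byte order as template arguments. It is not the code from `Serializers.h`: `ByteOrder`, `shiftForByte`, `writeFixed`, `readFixed` and the two named wrappers are all hypothetical names chosen for this illustration. Because `Order` is a template parameter, the byte order test is a compile-time constant in each instantiation, so an optimising compiler can typically fold it away and unroll the fixed-length loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class ByteOrder { Little, Big };

// Bit offset of the byte at `index`. `Order` is a template parameter, so the
// comparison below is a constant within each instantiation.
template <typename T, ByteOrder Order>
constexpr std::size_t shiftForByte(std::size_t index) {
    return 8 * (Order == ByteOrder::Little ? index : sizeof(T) - 1 - index);
}

// Writes `value` into `out` in the byte order fixed at compile time.
template <typename T, ByteOrder Order>
inline void writeFixed(unsigned char* out, std::size_t capacity, T value) {
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    assert(capacity >= sizeof(T));
    using U = typename std::make_unsigned<T>::type;
    const U bits = static_cast<U>(value);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out[i] = static_cast<unsigned char>(bits >> shiftForByte<T, Order>(i));
    }
}

// Reads a value of type T from `in` in the byte order fixed at compile time.
// Signedness comes from T itself (e.g. std::int16_t vs std::uint16_t).
template <typename T, ByteOrder Order>
inline T readFixed(const unsigned char* in, std::size_t available) {
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    assert(available >= sizeof(T));
    using U = typename std::make_unsigned<T>::type;
    U bits = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        bits = static_cast<U>(bits | (static_cast<U>(in[i]) << shiftForByte<T, Order>(i)));
    }
    return static_cast<T>(bits);
}

// One thin named wrapper per combination: the choice of type, signedness and
// byte order lives in the name instead of in runtime parameters.
inline std::int32_t readSignedIntLE(const unsigned char* in, std::size_t available) {
    return readFixed<std::int32_t, ByteOrder::Little>(in, available);
}

inline std::uint32_t readUnsignedIntBE(const unsigned char* in, std::size_t available) {
    return readFixed<std::uint32_t, ByteOrder::Big>(in, available);
}
```

Two generic templates cover every combination here, while each named wrapper compiles down to code specialised for exactly one of them.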
+
+In addition, parsing arguments in PHP is slow, and since PHP doesn't have anything akin to C++ templates (or generics more generally), the only option to get compile-time knowledge of byte order and signedness is to bake them into the function name. There is a function for every combination of `(type, signed, byte order)`.
+
+The downside of this is that we can't use `.stub.php` files to generate arginfo, so the IDE stubs have to be generated from the extension using [extension-stub-generator](https://github.com/pmmp/extension-stub-generator).
+Also, you'll probably need eye bleach after seeing the [macros that generate the function matrix](https://github.com/pmmp/ext-encoding/blob/bfcc8243f1037d37efea53444dc17c11bd2d47df/classes/Types.cpp#L246-L365).
+
+However, considering how critical binary data handling is to performance in PocketMine-MP, this is a trade absolutely worth making.
+
+### Why static methods instead of `ByteBuffer(Reader|Writer)` instance methods?
+
+Two reasons:
+- As described above, the static `read`/`write` methods can't be generated using `.stub.php` files. If we put the generated functions in `ByteBufferReader`/`ByteBufferWriter`, we'd be unable to use a `.stub.php` file to define the rest of its non-generated API.
+- I've made too many mistakes with byte order due to IDE autocomplete. With this API design, byte order is decided by the very first character you type, so autocomplete can't trip you up (and you have to import `BE` or `LE`).
+
+### Why fully specify `Signed` or `Unsigned` in every function name? Why not just have e.g. `readInt()` and `readUint()`?
+
+This library's first users will be people moving from `BinaryStream`, where the API is infamous for being inconsistent about signedness when not specified (https://github.com/pmmp/BinaryUtils/issues/15). For example, `getShort()` is unsigned, and `getInt()` is signed.
+
+I felt that it was better to be verbose to force developers to think about whether to use a signed or an unsigned type when migrating old code.
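
To show why the distinction matters, the small C++ sketch below decodes the same two big-endian bytes once as unsigned and once as signed. The function names are hypothetical and only loosely modelled on the naming scheme discussed above. The signed conversion assumes a two's complement platform (guaranteed by the language since C++20).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not the extension's API.

// Interprets two big-endian bytes as an unsigned 16-bit value.
inline std::uint16_t readUnsignedShortBE(const unsigned char* in, std::size_t available) {
    assert(available >= 2);
    return static_cast<std::uint16_t>((in[0] << 8) | in[1]);
}

// Interprets the same two bytes as a two's complement signed 16-bit value.
inline std::int16_t readSignedShortBE(const unsigned char* in, std::size_t available) {
    return static_cast<std::int16_t>(readUnsignedShortBE(in, available));
}
```

For the bytes `0xff 0xfe`, the unsigned reader yields 65534 and the signed reader yields -2, so a reader whose name leaves signedness implicit can silently pick the wrong one.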