|
17 | 17 |
|
18 | 18 | # Roydl.Text |
19 | 19 |
|
20 | | -The idea was to create a comfortable way of binary-to-text encoding. |
| 20 | +Roydl.Text provides a simple, generic way to encode and decode binary data as text. Extension methods are available for `string` and `byte[]`, and a growing set of encodings is offered. Most of them are parallelized across available CPU cores and use AVX2 or AVX-512 SIMD acceleration where applicable; Base-91 is the exception, because its algorithm is inherently serial. |
21 | 21 |
|
22 | | -You can easily create instances of any type to translate `Stream`, `byte[]` or `string` data. Extension methods are also provided for all types. |
| 22 | +## Table of Contents |
23 | 23 |
|
24 | | -## Install: |
| 24 | +- [Prerequisites](#prerequisites) |
| 25 | +- [Install](#install) |
| 26 | +- [Binary-To-Text Encodings](#binary-to-text-encodings) |
| 27 | +- [Encoding Performance](#encoding-performance) |
| 28 | +- [Usage](#usage) |
| 29 | +- [Would you like to help?](#would-you-like-to-help) |
25 | 30 |
|
26 | | -```julia |
| 31 | +--- |
27 | 32 |
|
28 | | -$ dotnet add package Roydl.Text |
| 33 | +## Prerequisites |
| 34 | + |
| 35 | +- [.NET 10 LTS](https://dotnet.microsoft.com/download/dotnet/10.0) or higher |
| 36 | +- Supported platforms: Windows, Linux, macOS |
| 37 | +- Hardware acceleration (optional): AVX2 or AVX-512 capable CPU |
29 | 38 |
|
| 39 | +--- |
| 40 | + |
| 41 | +## Install |
| 42 | +``` |
| 43 | +$ dotnet add package Roydl.Text |
30 | 44 | ``` |
31 | 45 |
|
32 | | -## Binary-To-Text Encoding |
| 46 | +--- |
33 | 47 |
|
34 | | -| Type | Encoding | |
35 | | -| ---- | ---- | |
36 | | -| Base-2 | Binary character set: `0` and `1` | |
37 | | -| Base-8 | Octal character set: `0-7` | |
38 | | -| Base-10 | Decimal character set: `0-9` | |
39 | | -| Base-16 | Hexadecimal character set: `0-9` and `a-f` | |
40 | | -| Base-32 | Standard 32-character set: `A–Z` and `2–7`; `=` for padding | |
41 | | -| Base-64 | Standard 64-character set: `A–Z`, `a–z`, `0–9`, `+` and `/`; `=` for padding | |
42 | | -| Base-85 | Standard 85-character set: `!"#$%&'()*+,-./`, `0-9`, `:;<=>?@`, `A-Z`, <code>[]^_`</code> and `a-u` | |
43 | | -| Base-91 | Standard 91-character set: `A–Z`, `a–z`, `0–9`, and <code>!#$%&()*+,-.:;<=>?@[]^_`{|}~"</code> | |
| 48 | +## Binary-To-Text Encodings |
44 | 49 |
|
45 | | -### Usage: |
46 | | -```cs |
47 | | -// The `value` must be type `string` or `byte[]`, if `BinToTextEncoding` is |
48 | | -// not set, `Base64` is used by default. |
49 | | -string base85text = value.Encode(BinToTextEncoding.Base85); |
50 | | -byte[] original = value.Decode(BinToTextEncoding.Base85); // if `value` to decode is `byte[]` |
51 | | -string original = value.DecodeString(BinToTextEncoding.Base85); // if `value` to decode is `string` |
52 | | -
|
53 | | -// The `value` of type `string` can also be a file path, which is not |
54 | | -// recommended for large files, in this case you should create a |
55 | | -// `Base85` instance and use `FileStream` to read and write. |
56 | | -string base85text = value.EncodeFile(BinToTextEncoding.Base85); |
57 | | -byte[] original = value.DecodeFile(BinToTextEncoding.Base85); |
58 | | -``` |
| 50 | +| Type | Character Set | Output Ratio | Hardware Support | |
| 51 | +| :---- | :---- | ----: | :----: | |
| 52 | +| Base-2 | `0` and `1` | 8× | AVX-512BW<br>AVX2 | |
| 53 | +| Base-8 | `0–7` | 3× | AVX-512BW<br>AVX2 | |
| 54 | +| Base-10 | `0–9` | 3× | AVX-512BW<br>AVX2 | |
| 55 | +| Base-16 | `0–9` and `a–f` | 2× | AVX-512BW<br>AVX2 | |
| 56 | +| Base-32 | `A–Z` and `2–7`; `=` for padding | 1.6× | AVX2 ¹ | |
| 57 | +| Base-64 | `A–Z`, `a–z`, `0–9`, `+` and `/`; `=` for padding | 1.33× | AVX2 ² | |
| 58 | +| Base-85 | ASCII printable range `!`–`u`; `z` shortcut for null groups | 1.25× | AVX2 | |
| 59 | +| Base-91 | `A–Z`, `a–z`, `0–9` and <code>!#$%&()*+,-.:;<=>?@[]^_`{|}~"</code> | ~1.23× | None ³ | |
| 60 | + |
| 61 | +> ¹ AVX2 is used for the alphabet lookup phase only. The non-power-of-two 5-bit group width prevents full SIMD vectorization of the bit-extraction phase. |
| 62 | +> |
| 63 | +> ² Delegates to .NET's built-in `System.Buffers.Text.Base64` which is internally AVX2-accelerated. Parallelization and double-buffered I/O are layered on top. |
| 64 | +> |
| 65 | +> ³ The algorithm maintains a serial bit-accumulator state across every byte, making it fundamentally incompatible with SIMD vectorization or parallel processing. Any optimization that would break this dependency chain would also break compatibility with existing encoded data. |
| 66 | +
|
| 67 | +> For general binary-to-text encoding, Base-85 and Base-91 are more compact than Base-64. Going by the output ratios above, Base-85 output is ~6% smaller and Base-91 output ~8% smaller. Base-85 is the better practical choice of the two: in the benchmarks below it is over 7× faster than Base-91 while giving up only marginal compactness. |
59 | 68 |
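The Output Ratio column for the power-of-two encodings and Base-85 follows directly from their group sizes: a full group of input bytes maps to a fixed number of output characters. The snippet below is a minimal sketch of that arithmetic. It uses only `System`, it is not part of Roydl.Text, and the helper names `Ratio` and `SavingVs` are hypothetical. Padding and Base-85's `z` shortcut are ignored, so the figures apply to full groups only.

```cs
using System;

// Hypothetical helpers for illustration; they are not part of Roydl.Text.

// Output ratio: encoded characters produced per input byte for one full group.
static double Ratio(int inputBytes, int outputChars) => (double)outputChars / inputBytes;

// Fraction by which `ratio` is smaller than `baseline` (0.05 would mean 5% smaller).
static double SavingVs(double ratio, double baseline) => 1.0 - ratio / baseline;

double base16 = Ratio(1, 2); // 1 byte  -> 2 hex digits
double base32 = Ratio(5, 8); // 5 bytes -> 8 characters of 5 bits each
double base64 = Ratio(3, 4); // 3 bytes -> 4 characters of 6 bits each
double base85 = Ratio(4, 5); // 4 bytes -> 5 characters

Console.WriteLine($"Base-16 ratio: {base16:0.00}");
Console.WriteLine($"Base-32 ratio: {base32:0.00}");
Console.WriteLine($"Base-64 ratio: {base64:0.00}");
Console.WriteLine($"Base-85 ratio: {base85:0.00}");

// Relative size advantage of Base-85 over Base-64, as a fraction.
Console.WriteLine($"Base-85 saving vs Base-64: {SavingVs(base85, base64):0.0000}");
```

The same `SavingVs` helper can be applied to any measured ratio, for example an observed Base-91 ratio, which varies with the input because that encoding emits 13 or 14 bits per character pair.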
|
| 69 | +--- |
| 70 | + |
| 71 | +## Encoding Performance |
| 72 | + |
| 73 | +_Base-64 and Base-16 are the fastest encodings in this library. Base-91 is a known outlier — its serial design makes parallelization impossible without breaking the algorithm._ |
| 74 | + |
| 75 | +| Encoding | Throughput | |
| 76 | +| :---- | ----: | |
| 77 | +| Base-2 | **2.2 GiB/s** | |
| 78 | +| Base-8 | **1.0 GiB/s** | |
| 79 | +| Base-10 | **1.2 GiB/s** | |
| 80 | +| Base-16 | **7.5 GiB/s** | |
| 81 | +| Base-32 | **1.3 GiB/s** | |
| 82 | +| Base-64 | **9.6 GiB/s** | |
| 83 | +| Base-85 | **2.8 GiB/s** | |
| 84 | +| Base-91 | **380 MiB/s** | |
| 85 | + |
| 86 | +<details> |
| 87 | +<summary>Benchmark methodology</summary> |
| 88 | + |
| 89 | +| Component | Details | |
| 90 | +| :--- | :--- | |
| 91 | +| CPU | AMD Ryzen 5 7600 (6C/12T, 5.1 GHz boost) | |
| 92 | +| RAM | 32 GB DDR5 | |
| 93 | +| OS | Manjaro Linux (Kernel 6.19.2-1) | |
| 94 | +| Runtime | .NET 10 | |
| 95 | +| Build | Release (`dotnet run -c Release`) | |
| 96 | + |
| 97 | +Each encoding is benchmarked using stream reuse to eliminate allocation overhead. Four input patterns are tested per encoding: random bytes, all-zeros, sequential, and mixed (25% zero groups). Each pattern runs five cycles of three seconds each. The reported throughput is the median across all patterns and cycles, which avoids cache-warmup bias and reflects sustained real-world performance. You can find the benchmark test [here](https://github.com/Roydl/Text/blob/master/test/BenchmarkTests/BinaryToTextPerformanceTests.cs). |
| 98 | + |
| 99 | +</details> |
60 | 100 |
|
61 | 101 | --- |
62 | 102 |
|
| 103 | +## Usage |
| 104 | +```cs |
| 105 | +// Encode — value can be string or byte[] |
| 106 | +// BinToTextEncoding defaults to Base64 if not specified |
| 107 | +string encoded = value.Encode(BinToTextEncoding.Base85); |
| 108 | + |
| 109 | +// Decode |
| 110 | +byte[] original = encoded.Decode(BinToTextEncoding.Base85); |
| 111 | +string original = encoded.DecodeString(BinToTextEncoding.Base85); |
| 112 | + |
| 113 | +// File encoding via extension methods |
| 114 | +// For large files, use the instance-based approach below instead |
| 115 | +string encoded = path.EncodeFile(BinToTextEncoding.Base85); |
| 116 | +byte[] original = path.DecodeFile(BinToTextEncoding.Base85); |
| 117 | + |
| 118 | +// Instance-based — recommended for large files or repeated use |
| 119 | +// GetDefaultInstance() returns a cached singleton per encoding type |
| 120 | +var encoder = BinToTextEncoding.Base85.GetDefaultInstance(); |
| 121 | + |
| 122 | +// Stream-based — most efficient for large files |
| 123 | +using var input = new FileStream(srcPath, FileMode.Open, FileAccess.Read); |
| 124 | +using var output = new FileStream(destPath, FileMode.Create); |
| 125 | +encoder.EncodeStream(input, output); |
| 126 | + |
| 127 | +// Line length — inserts Environment.NewLine after every N encoded chars |
| 128 | +string encoded = value.Encode(BinToTextEncoding.Base64, lineLength: 76); |
| 129 | + |
| 130 | +// All public methods are available on every encoding instance |
| 131 | +string encoded = encoder.EncodeBytes(bytes); |
| 132 | +string encoded = encoder.EncodeString(text); |
| 133 | +string encoded = encoder.EncodeFile(path); |
| 134 | +byte[] original = encoder.DecodeBytes(encoded); |
| 135 | +string original = encoder.DecodeString(encoded); |
| 136 | +byte[] original = encoder.DecodeFile(path); |
| 137 | +``` |
| 138 | + |
| 139 | +--- |
63 | 140 |
|
64 | 141 | ## Would you like to help? |
65 | 142 |
|