Commit 05c6cc3

Update readme
1 parent 643b9f0 commit 05c6cc3

1 file changed

Lines changed: 107 additions & 30 deletions

README.md

@@ -17,49 +17,126 @@
1717

1818
# Roydl.Text
1919

20-
The idea was to create a comfortable way of binary-to-text encoding.
20+
Roydl.Text provides a simple, generic way to encode and decode binary data as text. Extension methods are available for `string` and `byte[]`, and a growing set of encodings is offered — all of which are performance-optimized and parallelized across available CPU cores, with AVX2 and AVX-512 SIMD acceleration where applicable.
2121

22-
You can easily create instances of any type to translate `Stream`, `byte[]` or `string` data. Extension methods are also provided for all types.
22+
## Table of Contents
2323

24-
## Install:
24+
- [Prerequisites](#prerequisites)
25+
- [Install](#install)
26+
- [Binary-To-Text Encodings](#binary-to-text-encodings)
27+
- [Encoding Performance](#encoding-performance)
28+
- [Usage](#usage)
29+
- [Would you like to help?](#would-you-like-to-help)
2530

26-
```julia
31+
---
2732

28-
$ dotnet add package Roydl.Text
33+
## Prerequisites
34+
35+
- [.NET 10 LTS](https://dotnet.microsoft.com/download/dotnet/10.0) or higher
36+
- Supported platforms: Windows, Linux, macOS
37+
- Hardware acceleration (optional): AVX2 or AVX-512 capable CPU
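
To see which of the optional instruction sets the current machine exposes to .NET, the runtime's own intrinsics classes can be queried. This is a minimal sketch that uses only `System.Runtime.Intrinsics` from the base class library; it does not call Roydl.Text, and how the library selects its code path internally is not shown here.

```cs
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// These properties come from the .NET base class library, not from Roydl.Text.
// On non-x86 hardware they simply report false.
bool avx2 = Avx2.IsSupported;
bool avx512bw = Avx512BW.IsSupported; // API available since .NET 8

Console.WriteLine($"AVX2:                  {avx2}");
Console.WriteLine($"AVX-512BW:             {avx512bw}");
Console.WriteLine($"Vector256 accelerated: {Vector256.IsHardwareAccelerated}");
Console.WriteLine($"Vector512 accelerated: {Vector512.IsHardwareAccelerated}");
```

If both flags print `False`, the acceleration listed above is simply unavailable on that machine; the prerequisites mark it as optional.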
2938

39+
---
40+
41+
## Install
42+
```
43+
$ dotnet add package Roydl.Text
3044
```
3145

32-
## Binary-To-Text Encoding
46+
---
3347

34-
| Type | Encoding |
35-
| ---- | ---- |
36-
| Base-2 | Binary character set: `0` and `1` |
37-
| Base-8 | Octal character set: `0-7` |
38-
| Base-10 | Decimal character set: `0-9` |
39-
| Base-16 | Hexadecimal character set: `0-9` and `a-f` |
40-
| Base-32 | Standard 32-character set: `A–Z` and `2–7`; `=` for padding |
41-
| Base-64 | Standard 64-character set: `A–Z`, `a–z`, `0–9`, `+` and `/`; `=` for padding |
42-
| Base-85 | Standard 85-character set: `!"#$%&'()*+,-./`, `0-9`, `:;<=>?@`, `A-Z`, <code>[]^_&#96;</code> and `a-u` |
43-
| Base-91 | Standard 91-character set: `A–Z`, `a–z`, `0–9`, and <code>!&#35;$%&amp;()*+,-.:;&lt;=&gt;?@[]^_&#96;{&#124;}~&quot;</code> |
48+
## Binary-To-Text Encodings
4449

45-
### Usage:
46-
```cs
47-
// The `value` must be type `string` or `byte[]`, if `BinToTextEncoding` is
48-
// not set, `Base64` is used by default.
49-
string base85text = value.Encode(BinToTextEncoding.Base85);
50-
byte[] original = value.Decode(BinToTextEncoding.Base85); // if `value` to decode is `byte[]`
51-
string original = value.DecodeString(BinToTextEncoding.Base85); // if `value` to decode is `string`
52-
53-
// The `value` of type `string` can also be a file path, which is not
54-
// recommended for large files, in this case you should create a
55-
// `Base85` instance and use `FileStream` to read and write.
56-
string base85text = value.EncodeFile(BinToTextEncoding.Base85);
57-
byte[] original = value.DecodeFile(BinToTextEncoding.Base85);
58-
```
50+
| Type | Character Set | Output Ratio | Hardware Support |
51+
| :---- | :---- | ----: | :----: |
52+
| Base-2 | `0` and `1` || AVX-512BW<br>AVX2 |
53+
| Base-8 | `0–7` || AVX-512BW<br>AVX2 |
54+
| Base-10 | `0–9` || AVX-512BW<br>AVX2 |
55+
| Base-16 | `0–9` and `a–f` || AVX-512BW<br>AVX2 |
56+
| Base-32 | `A–Z` and `2–7`; `=` for padding | 1.6× | AVX2 ¹ |
57+
| Base-64 | `A–Z`, `a–z`, `0–9`, `+` and `/`; `=` for padding | 1.33× | AVX2 ² |
58+
| Base-85 | ASCII printable range `!`–`u`; `z` shortcut for null groups | 1.25× | AVX2 |
59+
| Base-91 | `A–Z`, `a–z`, `0–9` and <code>!&#35;$%&amp;()*+,-.:;&lt;=&gt;?@[]^_&#96;{&#124;}~&quot;</code> | ~1.23× | None ³ |
60+
61+
> ¹ AVX2 is used for the alphabet lookup phase only. The non-power-of-two 5-bit group width prevents full SIMD vectorization of the bit-extraction phase.
62+
>
63+
> ² Delegates to .NET's built-in `System.Buffers.Text.Base64` which is internally AVX2-accelerated. Parallelization and double-buffered I/O are layered on top.
64+
>
65+
> ³ The algorithm maintains a serial bit-accumulator state across every byte, making it fundamentally incompatible with SIMD vectorization or parallel processing. Any optimization that would break this dependency chain would also break compatibility with existing encoded data.
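
The serial dependency described in note ³ can be illustrated with a short encoder in the style of the publicly documented basE91 scheme. This is a sketch, not the library's implementation: `EncodeSketch` is a hypothetical name, and the ordering of the 91 symbols is an assumption made for illustration, so its output will not necessarily match Roydl.Text.

```cs
using System;
using System.Text;

// Assumption: the 91 symbols from the table, in an order chosen for this sketch only.
const string Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" +
    "!#$%&()*+,-.:;<=>?@[]^_`{|}~\"";

Console.WriteLine(EncodeSketch(Encoding.UTF8.GetBytes("Hello, World!")));

string EncodeSketch(byte[] input)
{
    var sb = new StringBuilder();
    uint acc = 0; // bit accumulator carried over from one byte to the next
    int bits = 0; // number of valid bits currently held in acc

    foreach (byte b in input)
    {
        acc |= (uint)b << bits;
        bits += 8;
        if (bits <= 13) continue; // not enough bits for a group yet

        // Whether 13 or 14 bits are consumed depends on the value itself,
        // so every group boundary depends on all groups before it.
        uint v = acc & 0x1FFF;
        if (v > 88) { acc >>= 13; bits -= 13; }
        else { v = acc & 0x3FFF; acc >>= 14; bits -= 14; }

        sb.Append(Alphabet[(int)(v % 91)]).Append(Alphabet[(int)(v / 91)]);
    }

    // Flush whatever is left in the accumulator.
    if (bits > 0)
    {
        sb.Append(Alphabet[(int)(acc % 91)]);
        if (bits > 7 || acc > 90) sb.Append(Alphabet[(int)(acc / 91)]);
    }
    return sb.ToString();
}
```

Because the number of bits consumed per output pair varies with the data, the input cannot be cut into independent chunks at fixed byte offsets without changing the output.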
66+
67+
> For general binary-to-text encoding, Base-85 and Base-91 are more compact than Base-64: going by the output ratios in the table above, Base-85 produces about 6% smaller output and Base-91 about 8% smaller. Base-85 is the better practical choice of the two: in the benchmarks below it is over 7× faster than Base-91 while giving up only marginal compactness.
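
The size comparison with Base-64 can be reproduced from the group sizes of each encoding. The sketch below is plain arithmetic and calls no library code; the Base-91 figure assumes the common case of 13 input bits per two output characters, which is an approximation because the real ratio varies slightly with the data.

```cs
using System;

// Output ratio = encoded characters per input byte, derived from group sizes.
var ratios = new (string Name, double Ratio)[]
{
    ("Base-64", 4.0 / 3.0),   // 3 bytes -> 4 characters
    ("Base-85", 5.0 / 4.0),   // 4 bytes -> 5 characters
    ("Base-91", 16.0 / 13.0), // assumption: 13 bits -> 2 characters (16 output bits)
};

double base64 = ratios[0].Ratio;
foreach (var (name, ratio) in ratios)
{
    // Relative saving of this encoding's output compared with Base-64 output.
    double saving = (1.0 - ratio / base64) * 100.0;
    Console.WriteLine($"{name}: {ratio:F2}x, {saving:F1}% smaller than Base-64");
}
```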
5968
69+
---
70+
71+
## Encoding Performance
72+
73+
_Base-64 and Base-16 are the fastest encodings in this library. Base-91 is a known outlier — its serial design makes parallelization impossible without breaking the algorithm._
74+
75+
| Encoding | Throughput |
76+
| :---- | ----: |
77+
| Base-2 | **2.2 GiB/s** |
78+
| Base-8 | **1.0 GiB/s** |
79+
| Base-10 | **1.2 GiB/s** |
80+
| Base-16 | **7.5 GiB/s** |
81+
| Base-32 | **1.3 GiB/s** |
82+
| Base-64 | **9.6 GiB/s** |
83+
| Base-85 | **2.8 GiB/s** |
84+
| Base-91 | **380 MiB/s** |
85+
86+
<details>
87+
<summary>Benchmark methodology</summary>
88+
89+
| Component | Details |
90+
| :--- | :--- |
91+
| CPU | AMD Ryzen 5 7600 (6C/12T, 5.1 GHz boost) |
92+
| RAM | 32 GB DDR5 |
93+
| OS | Manjaro Linux (Kernel 6.19.2-1) |
94+
| Runtime | .NET 10 |
95+
| Build | Release (`dotnet run -c Release`) |
96+
97+
Each encoding is benchmarked using stream reuse to eliminate allocation overhead. Four input patterns are tested per encoding: random bytes, all-zeros, sequential, and mixed (25% zero groups). Each pattern runs five cycles of three seconds each. The reported throughput is the median across all patterns and cycles, which avoids cache-warmup bias and reflects sustained real-world performance. You can find the benchmark test [here](https://github.com/Roydl/Text/blob/master/test/BenchmarkTests/BinaryToTextPerformanceTests.cs).
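
The measurement loop described above can be sketched roughly as follows. This is not the linked benchmark: it uses the BCL's `System.Buffers.Text.Base64` as a stand-in workload, a much shorter time budget per cycle, and an assumed reading of "mixed" (every fourth 4-byte group zeroed). The `Median` helper is hypothetical.

```cs
using System;
using System.Buffers.Text;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int Size = 1 << 20; // 1 MiB per input buffer
const int Cycles = 5;
var budget = TimeSpan.FromMilliseconds(100); // the methodology above uses three seconds

// Four input patterns: random, all-zeros, sequential, mixed.
var random = new byte[Size];
new Random(42).NextBytes(random);
var zeros = new byte[Size];
var sequential = Enumerable.Range(0, Size).Select(i => (byte)i).ToArray();
var mixed = (byte[])random.Clone();
for (int i = 0; i + 4 <= Size; i += 16) Array.Clear(mixed, i, 4); // assumption: 25% zero groups

// One reused output buffer, so allocation cost stays out of the timing.
var dest = new byte[Base64.GetMaxEncodedToUtf8Length(Size)];
var samples = new List<double>();

foreach (var data in new[] { random, zeros, sequential, mixed })
{
    for (int c = 0; c < Cycles; c++)
    {
        var sw = Stopwatch.StartNew();
        long bytes = 0;
        while (sw.Elapsed < budget)
        {
            Base64.EncodeToUtf8(data, dest, out _, out _);
            bytes += data.Length;
        }
        samples.Add(bytes / sw.Elapsed.TotalSeconds / (1024.0 * 1024.0)); // MiB/s
    }
}

Console.WriteLine($"Median throughput: {Median(samples):F0} MiB/s");

static double Median(IReadOnlyList<double> values)
{
    var sorted = values.OrderBy(v => v).ToArray();
    int mid = sorted.Length / 2;
    return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
}
```

Taking the median over all samples keeps a single slow warm-up cycle from dragging the reported figure down.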
98+
99+
</details>
60100

61101
---
62102

103+
## Usage
104+
```cs
// Assumes value, path, srcPath, destPath, bytes and text are defined by the caller.

// Encode: value can be string or byte[]
// BinToTextEncoding defaults to Base64 if not specified
string encoded = value.Encode(BinToTextEncoding.Base85);

// Decode
byte[] decodedBytes = encoded.Decode(BinToTextEncoding.Base85);
string decodedText = encoded.DecodeString(BinToTextEncoding.Base85);

// File encoding via extension methods
// For large files, use the instance-based approach below instead
string encodedFile = path.EncodeFile(BinToTextEncoding.Base85);
byte[] decodedFile = path.DecodeFile(BinToTextEncoding.Base85);

// Instance-based: recommended for large files or repeated use
// GetDefaultInstance() returns a cached singleton per encoding type
var encoder = BinToTextEncoding.Base85.GetDefaultInstance();

// Stream-based: most efficient for large files
using var input = new FileStream(srcPath, FileMode.Open, FileAccess.Read);
using var output = new FileStream(destPath, FileMode.Create);
encoder.EncodeStream(input, output);

// Line length: inserts Environment.NewLine after every N encoded chars
string wrapped = value.Encode(BinToTextEncoding.Base64, lineLength: 76);

// All public methods are available on every encoding instance
string fromBytes = encoder.EncodeBytes(bytes);
string fromText = encoder.EncodeString(text);
string fromFile = encoder.EncodeFile(path);
byte[] bytesBack = encoder.DecodeBytes(encoded);
string textBack = encoder.DecodeString(encoded);
byte[] fileBack = encoder.DecodeFile(path);
```
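
The line-length option can be pictured as a plain chunking step applied to the encoded text. The sketch below uses a hypothetical `Wrap` helper and the BCL's `Convert.ToBase64String` as a stand-in encoder; it is not Roydl.Text code. This README does not say whether the library appends a trailing line break when the text length is an exact multiple of N; the sketch does not.

```cs
using System;
using System.Text;

byte[] payload = new byte[100];                       // 100 bytes -> 136 Base-64 characters
string wrapped = Wrap(Convert.ToBase64String(payload), 76);
Console.WriteLine(wrapped);

// Hypothetical helper: breaks text into lines of at most lineLength characters.
string Wrap(string text, int lineLength)
{
    if (lineLength <= 0 || text.Length <= lineLength) return text;

    var sb = new StringBuilder(text.Length + text.Length / lineLength * Environment.NewLine.Length);
    for (int i = 0; i < text.Length; i += lineLength)
    {
        if (i > 0) sb.Append(Environment.NewLine);    // separator between lines, none at the end
        sb.Append(text, i, Math.Min(lineLength, text.Length - i));
    }
    return sb.ToString();
}
```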
138+
139+
---
63140

64141
## Would you like to help?
65142
