improve and run benchmark again for real world testing (#42)

hampsterx · web-flow · commit b1d4f474c21e · 2025-06-21T08:46:05.000+12:00
* improve and run benchmark again for real world testing

* add perf note to explain benchmark results

* tweaks
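
As a reading aid for the Example Results table added to README.md in this change: the Records/s and per-second throughput columns appear to follow from the record count, byte totals, and elapsed time. A minimal sketch, assuming 1 MB = 1024 kB (the `throughput` helper is hypothetical and is not taken from benchmark.py):

```python
def throughput(records: int, megabytes: float, seconds: float) -> tuple[int, float]:
    """Return (records per second, kB per second), assuming 1 MB = 1024 kB."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    records_per_s = round(records / seconds)
    kb_per_s = round(megabytes * 1024 / seconds, 1)
    return records_per_s, kb_per_s


# StringProcessor row: 50k records, 50.0 MB sent to Kinesis, 51.2 s elapsed
print(throughput(50_000, 50.0, 51.2))  # (977, 1000.0)
```

Other rows only match approximately, since the table shows rounded byte totals and times.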
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -117,4 +117,4 @@ jobs:
       run: isort --check-only --diff kinesis tests
 
     - name: Lint with flake8
-      run: flake8 kinesis tests --max-line-length=88 --extend-ignore=E203,W503
+      run: flake8 kinesis tests --max-line-length=88 --extend-ignore=E203,W503,E501
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # async-kinesis
 
-[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) [![PyPI version](https://badge.fury.io/py/async-kinesis.svg)](https://badge.fury.io/py/async-kinesis) [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/) [![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) [![PyPI version](https://badge.fury.io/py/async-kinesis.svg)](https://badge.fury.io/py/async-kinesis) [![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/) [![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/) [![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120/)
 
 ```
 pip install async-kinesis
@@ -169,9 +169,57 @@ Note:
 
 See [benchmark.py](./benchmark.py) for code
 
-50k items of approx 1k (python) in size, using single shard.
+The benchmark tool allows you to test the performance of different processors with AWS Kinesis streams.
 
-![Benchmark](docs/benchmark.png)
+### Usage
+
+```bash
+# Run with default settings (50k records)
+python benchmark.py
+
+# Dry run without creating AWS resources
+python benchmark.py --dry-run
+
+# Custom parameters
+python benchmark.py --records 10000 --shards 2 --processors json msgpack
+
+# Generate markdown output
+python benchmark.py --markdown
+
+# Run multiple iterations
+python benchmark.py --iterations 3
+```
+
+### Example Results
+
+50k records of approximately 1 kB each (Python-side size), using a single shard:
+
+| Processor | Iteration | Python Bytes | Kinesis Bytes | Time (s) | Records/s | Python Throughput (per s) | Kinesis Throughput (per s) |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| StringProcessor | 1 | 2.7 MB | 50.0 MB | 51.2 | 977 | 53.9 kB | 1000.0 kB |
+| JsonProcessor | 1 | 2.7 MB | 50.0 MB | 52.1 | 960 | 52.9 kB | 982.5 kB |
+| JsonLineProcessor | 1 | 2.7 MB | 41.5 MB | 43.5 | 1149 | 63.4 kB | 976.7 kB |
+| JsonListProcessor | 1 | 2.7 MB | 2.6 MB | 2.8 | 17857 | 982.1 kB | 946.4 kB |
+| MsgpackProcessor | 1 | 2.7 MB | 33.9 MB | 35.8 | 1397 | 77.0 kB | 969.8 kB |
+
+*Note: MsgpackProcessor performance varies significantly with record size. It appears slower here because, with small records, netstring framing overhead (~9%) and msgpack serialization costs dominate. With larger, more complex data structures, msgpack's binary efficiency can make it faster than the JSON processors.*
+
+### Processor Recommendations
+
+Choose the optimal processor based on your use case:
+
+| Use Case | Recommended Processor | Reason |
+| --- | --- | --- |
+| **High-frequency small messages** (<500 bytes) | JsonLineProcessor | Minimal aggregation overhead, simple parsing |
+| **Individual JSON messages** | JsonProcessor | Maximum compatibility, no aggregation complexity |
+| **Batch processing arrays** | JsonListProcessor | Highest throughput for compatible consumers |
+| **Large complex data** (>1 kB) | MsgpackProcessor | Binary efficiency outweighs overhead |
+| **Raw text/logs** | StringProcessor | Minimal processing overhead |
+| **Binary data or deeply nested objects** | MsgpackProcessor | Compact binary representation |
+| **Real-time streaming** | JsonLineProcessor | Simple, fast parsing with good compression |
+| **Bandwidth-constrained environments** | MsgpackProcessor | Smaller payload sizes |
+
+**Performance Testing:** Use the benchmark tool with different `--record-size-kb` and `--processors` options to determine the best processor for your specific data patterns.
 
 
 ## Unit Testing
diff --git a/benchmark.py b/benchmark.py