LLMc is a language-model–powered compressor for natural language text. It encodes token ranks instead of raw token IDs and stores them in a compact binary format.
The core idea of LLMc is rank-based encoding. During inference, the LLM produces a probability distribution over possible next tokens, and in most cases the true next token ranks among the top few candidates. Instead of storing the token's identity, LLMc stores its rank within that distribution. These ranks are usually small integers and are therefore cheap to encode.
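As a rough illustration, the sketch below computes ranks with a Hugging Face causal LM. The `transformers` API and the argsort-based ranking here are illustrative assumptions, not the LLMc implementation, which runs on the project's modified vLLM backend.

```python
# Minimal sketch of rank-based encoding, assuming a Hugging Face causal LM.
# The model name matches the README examples; any causal LM would do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def encode_ranks(text: str) -> list[int]:
    """Replace each token ID with its rank under the model's next-token distribution."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]  # (seq_len, vocab)
    ranks = [ids[0].item()]  # no context for the first token: keep its raw ID
    for pos in range(1, len(ids)):
        # Candidates for position `pos`, sorted by the model's predicted probability.
        order = torch.argsort(logits[pos - 1], descending=True)
        rank = (order == ids[pos]).nonzero(as_tuple=True)[0].item()
        ranks.append(rank)  # small for predictable text, hence compressible
    return ranks
```

Decompression reverses the mapping: rerun the model on the already-decoded prefix, sort the candidates the same way, and pick the token at the stored rank. This only works if both sides see bitwise-identical logits, which is presumably why the backend relies on batch_invariant_ops.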
(Demo video: llm-compression.1.mp4)
We recommend using uv to manage the virtual environment.
```bash
uv venv -p 3.11
# We use a modified version of vLLM and batch_invariant_ops as backend.
git submodule update --init --recursive --depth 1
export VLLM_USE_PRECOMPILED=1
uv sync
```

The CLI exposes three subcommands. The examples below mirror the arguments defined in llmc/entrypoint/cli.py.
Write a .llmc binary (varint+brotli):
```bash
llmc compress input.txt output.llmc \
  --model Qwen/Qwen3-8B \
  --threshold 256 \
  --chunk-size 4096 \
  --gpu-mem 0.5
```

Notes:

- `--threshold` and `--chunk-size` are required.
- `output.llmc` is a raw byte stream.
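For intuition, here is a minimal sketch of the "varint + brotli" idea: ranks are packed as variable-length integers, and the resulting byte stream is Brotli-compressed. The exact `.llmc` layout (header, chunk boundaries, how ranks beyond the threshold are handled) is defined by LLMc and is not reproduced here.

```python
# Sketch of a varint + Brotli payload: LEB128-style varints for the ranks,
# then Brotli over the whole byte stream. Not the actual .llmc layout.
import brotli  # pip install brotli

def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def pack_ranks(ranks: list[int]) -> bytes:
    """Small ranks become single bytes, so the stream compresses well."""
    payload = b"".join(encode_varint(r) for r in ranks)
    return brotli.compress(payload)
```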
Turn a .llmc file back into text:
```bash
llmc decompress output.llmc restored.txt \
  --model Qwen/Qwen3-8B \
  --threshold 256 \
  --chunk-size 4096 \
  --gpu-mem 0.5
```

Run the FastAPI server and web frontend:
```bash
llmc serve \
  --model Qwen/Qwen3-8B \
  --max-threshold 256 \
  --max-chunk-size 4096 \
  --gpu-mem 0.5 \
  --host 0.0.0.0 --port 8000
```

The server enforces upper bounds via `--max-threshold` and `--max-chunk-size`. The web UI sends per-request `threshold` and `chunk_size` values that must be ≤ these limits.
Open http://<host>:8000 in a browser after starting the server, then enter Threshold and Chunk size for each operation.
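If you prefer to call the server without the web UI, a request along the following lines should work. The endpoint path and JSON field names below are assumptions for illustration; only the per-request `threshold` and `chunk_size` parameters are documented above, so check the FastAPI app in llmc/entrypoint for the actual routes and schema.

```python
# Hypothetical client call: the "/compress" route and field names are guesses
# for illustration; inspect the FastAPI app for the real schema.
import requests

resp = requests.post(
    "http://localhost:8000/compress",  # hypothetical endpoint
    json={
        "text": "hello world",
        "threshold": 256,    # must be <= --max-threshold
        "chunk_size": 4096,  # must be <= --max-chunk-size
    },
    timeout=300,
)
resp.raise_for_status()
```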
Compression ratio compared with various traditional compression algorithms.
- Built on a custom vLLM fork and batch_invariant_ops.
- Thanks to the open-source ecosystem for models and tooling.


