seqx

seqx is an agent-friendly CLI for FASTA/FASTQ sequence processing.

It is designed around streaming I/O, predictable command behavior, and low-memory execution for large files.
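
To make "streaming I/O" concrete, here is a std-only Rust sketch. It walks FASTA records from any `BufRead` source while holding only one record in memory at a time. It is illustrative only:

- seqx itself parses with noodles.
- `FastaRecord` and `stream_fasta` are hypothetical names.
- It assumes the record ID is the first whitespace-delimited token of the header.

```rust
use std::io::{self, BufRead};

/// One FASTA record: ID (first header token, assumed) and concatenated sequence.
#[derive(Debug, PartialEq)]
struct FastaRecord {
    id: String,
    seq: String,
}

/// Calls `on_record` for each record; only the record being built is kept in memory.
fn stream_fasta<R: BufRead, F: FnMut(FastaRecord)>(reader: R, mut on_record: F) -> io::Result<()> {
    let mut current: Option<FastaRecord> = None;
    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end(); // also drops a trailing '\r'
        if let Some(header) = line.strip_prefix('>') {
            // A new header completes the previous record.
            if let Some(rec) = current.take() {
                on_record(rec);
            }
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            current = Some(FastaRecord { id, seq: String::new() });
        } else if let Some(rec) = current.as_mut() {
            // Sequence lines are appended to the open record.
            rec.seq.push_str(line);
        }
    }
    if let Some(rec) = current {
        on_record(rec);
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut count = 0usize;
    stream_fasta(stdin.lock(), |rec| {
        count += 1;
        println!("{}\t{}", rec.id, rec.seq.len());
    })?;
    eprintln!("{count} records");
    Ok(())
}
```

Piping a FASTA file into the sketch prints each ID with its sequence length. Memory use is bounded by the longest record, not by the file size.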

Installation

PyPI

pip install seqx

cargo

cargo install seqx

prebuilt binaries

Prebuilt binaries for Linux and macOS are available on the releases page.

Quick Start

# Show help
seqx --help

# Show guide (agent-friendly help)
seqx guide
seqx guide filter

# Basic stats
seqx stats -i input.fa

# Convert FASTA -> FASTQ
seqx convert -i input.fa -T fastq -o output.fq

# Filter short sequences
seqx filter -i input.fa --min-len 100 -o filtered.fa

Commands

`stats`

seqx stats -i input.fa
seqx stats -i input.fa --gc
seqx stats -i input.fq --qual --min-len 50
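
The README does not define exactly which statistics stats reports or how --gc is computed. As a sketch only, the Rust below summarizes record count, total/min/max length and GC content. It assumes GC is (G+C)/(A+C+G+T), with N, gaps and other ambiguity codes left out of the denominator. `Stats` and `summarize` are hypothetical names, not seqx's output format, and --qual/--min-len are not modeled.

```rust
/// Hypothetical summary shape; seqx's real report may differ.
#[derive(Debug, PartialEq)]
struct Stats {
    records: usize,
    total_len: usize,
    min_len: usize,
    max_len: usize,
    gc_fraction: f64,
}

/// Returns (G+C count, A+C+G+T count), case-insensitive.
fn gc_counts(seq: &[u8]) -> (usize, usize) {
    let mut gc = 0;
    let mut acgt = 0;
    for &b in seq {
        match b.to_ascii_uppercase() {
            b'G' | b'C' => {
                gc += 1;
                acgt += 1;
            }
            b'A' | b'T' => acgt += 1,
            _ => {} // N, gaps and IUPAC codes are excluded (assumption)
        }
    }
    (gc, acgt)
}

fn summarize(seqs: &[&[u8]]) -> Option<Stats> {
    if seqs.is_empty() {
        return None;
    }
    let mut total_len = 0;
    let mut min_len = usize::MAX;
    let mut max_len = 0;
    let (mut gc_total, mut acgt_total) = (0, 0);
    for seq in seqs {
        let (gc, acgt) = gc_counts(seq);
        gc_total += gc;
        acgt_total += acgt;
        total_len += seq.len();
        min_len = min_len.min(seq.len());
        max_len = max_len.max(seq.len());
    }
    let gc_fraction = if acgt_total == 0 { 0.0 } else { gc_total as f64 / acgt_total as f64 };
    Some(Stats { records: seqs.len(), total_len, min_len, max_len, gc_fraction })
}

fn main() {
    let seqs: Vec<&[u8]> = vec![&b"ACGTNN"[..], &b"GGCC"[..]];
    if let Some(s) = summarize(&seqs) {
        println!("{s:?}");
    }
}
```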

`convert`

seqx convert -i input.fa -T fastq -Q 30 -o output.fq
seqx convert -i input.fq -T fasta -o output.fa
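
In the first example, -Q 30 appears to set the single quality score given to every base when FASTA (which carries no qualities) is converted to FASTQ. That reading is an assumption.

With the standard Phred+33 (Sanger) encoding, score Q is stored as the ASCII character with code Q + 33, so Q30 becomes `?`. A hypothetical sketch (`phred33` and `fasta_to_fastq` are illustrative names):

```rust
/// Encode a Phred score as a Phred+33 (Sanger) ASCII character.
fn phred33(q: u8) -> char {
    assert!(q <= 93, "Phred+33 supports scores 0..=93");
    (q + 33) as char
}

/// Render one FASTA record as a four-line FASTQ record with a constant quality.
fn fasta_to_fastq(id: &str, seq: &str, q: u8) -> String {
    let qual: String = std::iter::repeat(phred33(q)).take(seq.len()).collect();
    format!("@{id}\n{seq}\n+\n{qual}\n")
}

fn main() {
    print!("{}", fasta_to_fastq("read1", "ACGTACGT", 30));
}
```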

`filter`

seqx filter -i input.fa --min-len 100 --max-len 2000
seqx filter -i input.fa --pattern "ATG.*TAA"
seqx filter -i input.fa --exclude-pattern "N{10,}"
seqx filter -i input.fa --id-file ids.txt
seqx filter -i input.fq --min-qual 30
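
The README does not say whether --min-qual compares the mean, median or minimum per-base quality. The sketch below assumes the mean Phred+33 score, combined with an inclusive --min-len/--max-len window, just to show the shape of such a filter. `mean_quality` and `passes_filter` are hypothetical names.

```rust
/// Mean Phred score of a Phred+33 quality string (None for an empty string).
fn mean_quality(qual: &[u8]) -> Option<f64> {
    if qual.is_empty() {
        return None;
    }
    let sum: u64 = qual.iter().map(|&c| u64::from(c.saturating_sub(33))).sum();
    Some(sum as f64 / qual.len() as f64)
}

/// Keep a read if its length is within [min_len, max_len] and its mean quality >= min_qual.
fn passes_filter(seq: &[u8], qual: &[u8], min_len: usize, max_len: usize, min_qual: f64) -> bool {
    let len_ok = (min_len..=max_len).contains(&seq.len());
    let qual_ok = mean_quality(qual).map_or(false, |q| q >= min_qual);
    len_ok && qual_ok
}

fn main() {
    // 'I' encodes Q40 and '!' encodes Q0 in Phred+33.
    println!("{}", passes_filter(b"ACGT", b"IIII", 1, 100, 30.0));
    println!("{}", passes_filter(b"ACGT", b"!!!!", 1, 100, 30.0));
}
```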

`extract`

seqx extract -i input.fa --id seq1
seqx extract -i input.fa --id-file ids.txt
seqx extract -i input.fa --range 1:100
seqx extract -i input.fa --bed regions.bed -F 20
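
Coordinate conventions matter for extract:

- BED intervals are 0-based and half-open by definition.
- The `--range 1:100` form is assumed here (not confirmed by the README) to be 1-based and inclusive, i.e. the first 100 residues.

Under those assumptions both map onto the same Rust slice range, as in this sketch. The meaning of -F is not documented here and is not modeled. The function names are hypothetical.

```rust
use std::ops::Range;

/// Parse a 1-based inclusive "start:end" range into a 0-based half-open range (assumed convention).
fn parse_range(spec: &str) -> Option<Range<usize>> {
    let (s, e) = spec.split_once(':')?;
    let start: usize = s.trim().parse().ok()?;
    let end: usize = e.trim().parse().ok()?;
    if start == 0 || end < start {
        return None;
    }
    Some(start - 1..end)
}

/// Parse the first three columns of a BED line; BED coordinates are already 0-based, half-open.
fn parse_bed(line: &str) -> Option<(String, Range<usize>)> {
    let mut cols = line.trim_end().split('\t');
    let chrom = cols.next()?.to_string();
    let start: usize = cols.next()?.parse().ok()?;
    let end: usize = cols.next()?.parse().ok()?;
    if end < start {
        return None;
    }
    Some((chrom, start..end))
}

/// Slice an ASCII sequence, clamping the range to the sequence length.
fn extract(seq: &str, r: Range<usize>) -> &str {
    let end = r.end.min(seq.len());
    let start = r.start.min(end);
    &seq[start..end]
}

fn main() {
    let seq = "ACGTACGTAC";
    if let Some(r) = parse_range("1:4") {
        println!("{}", extract(seq, r));
    }
    if let Some((name, r)) = parse_bed("seq1\t2\t6") {
        println!("{name}: {}", extract(seq, r));
    }
}
```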

`search`

seqx search -i input.fa "ATG"
seqx search -i input.fa "ATG.*TAA" --regex
seqx search -i input.fa "ATG" --mismatches 1 --threads 8
seqx search -i input.fa "ATG" --bed --strand
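
One way to read --mismatches 1 is Hamming-distance matching: report every start position where the pattern aligns with at most k substitutions and no insertions or deletions. The naive O(n*m) scan below is a sketch of that assumption only. seqx's actual matching rules and algorithm are not modeled, and neither is its --regex, --bed or --strand handling. `find_with_mismatches` is a hypothetical name.

```rust
/// Naive approximate search: 0-based start positions where `pattern` aligns to `text`
/// with at most `max_mismatches` substitutions (case-insensitive, no indels).
fn find_with_mismatches(text: &[u8], pattern: &[u8], max_mismatches: usize) -> Vec<usize> {
    if pattern.is_empty() || pattern.len() > text.len() {
        return Vec::new();
    }
    (0..=text.len() - pattern.len())
        .filter(|&i| {
            let window = &text[i..i + pattern.len()];
            let mismatches = window
                .iter()
                .zip(pattern)
                .filter(|(a, b)| !a.eq_ignore_ascii_case(b))
                .count();
            mismatches <= max_mismatches
        })
        .collect()
}

fn main() {
    let hits = find_with_mismatches(b"ACGTATGCATG", b"ATG", 1);
    println!("{hits:?}");
}
```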

`modify`

seqx modify -i input.fa --upper
seqx modify -i input.fa --lower
seqx modify -i input.fa --slice 10:200
seqx modify -i input.fa --remove-gaps
seqx modify -i input.fa --reverse-complement
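
A minimal reverse-complement sketch for nucleotide sequences. The README does not say how seqx treats the following cases, so each is an assumption:

- Case is preserved, and U is treated like T.
- IUPAC ambiguity codes are complemented (R/Y, K/M, B/V, D/H); S, W and N map to themselves.
- Gap characters pass through unchanged, and any other symbol becomes N.

```rust
/// Complement one nucleotide symbol, preserving case (assumed handling of ambiguity codes).
fn complement(b: u8) -> u8 {
    let c = match b.to_ascii_uppercase() {
        b'A' => b'T',
        b'T' | b'U' => b'A',
        b'G' => b'C',
        b'C' => b'G',
        b'R' => b'Y',
        b'Y' => b'R',
        b'K' => b'M',
        b'M' => b'K',
        b'B' => b'V',
        b'V' => b'B',
        b'D' => b'H',
        b'H' => b'D',
        b'S' => b'S',
        b'W' => b'W',
        b'-' => b'-',
        b'.' => b'.',
        _ => b'N',
    };
    if b.is_ascii_lowercase() { c.to_ascii_lowercase() } else { c }
}

/// Reverse the sequence and complement each symbol.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

fn main() {
    let rc = reverse_complement(b"ACGTNacgt");
    println!("{}", String::from_utf8(rc).unwrap());
}
```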

`sample`

seqx sample -i input.fa --count 1000 --seed 42
seqx sample -i input.fa --fraction 0.1
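
The sampling algorithm seqx uses is not documented. A standard way to draw exactly --count records in one streaming pass with a reproducible --seed is reservoir sampling (Algorithm R). --fraction can be modeled as an independent keep/drop draw per record.

The sketch below uses a small SplitMix64 generator to stay dependency-free, so its picks will not match seqx's for the same seed. All names are hypothetical.

```rust
/// Tiny deterministic PRNG (SplitMix64), used here only to avoid external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Integer in 0..n (slight modulo bias ignored for brevity).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
    /// Float in [0, 1) built from the top 53 bits.
    fn unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Algorithm R: uniform sample of `k` items from a stream of unknown length, in one pass.
fn reservoir_sample<T, I: IntoIterator<Item = T>>(items: I, k: usize, seed: u64) -> Vec<T> {
    let mut rng = SplitMix64(seed);
    let mut reservoir = Vec::with_capacity(k);
    for (i, item) in items.into_iter().enumerate() {
        if i < k {
            reservoir.push(item);
        } else {
            // Item i replaces a reservoir slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}

/// Bernoulli sampling: keep each item independently with probability `fraction`.
fn fraction_sample<T, I: IntoIterator<Item = T>>(items: I, fraction: f64, seed: u64) -> Vec<T> {
    let mut rng = SplitMix64(seed);
    items.into_iter().filter(|_| rng.unit() < fraction).collect()
}

fn main() {
    println!("{:?}", reservoir_sample(0..100, 5, 42));
    println!("{}", fraction_sample(0..1000, 0.1, 42).len());
}
```

A reservoir keeps memory at O(count) regardless of input size. A fraction sample returns approximately, not exactly, the requested share of records.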

`sort`

seqx sort -i input.fa --by-name
seqx sort -i input.fa --by-len --desc
seqx sort -i input.fa --by-gc --max-memory 256 --threads 8
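
The Performance Model section describes sort as an external chunk sort followed by an mmap merge. The in-memory Rust sketch below shows only the algorithmic shape:

1. The input is cut into runs of at most `max_chunk` keys (a hypothetical stand-in for --max-memory).
2. Each run is sorted.
3. The runs are combined with a k-way heap merge.

A real external sort writes each run to a temporary file and merges from disk. `chunked_sort` is not a seqx API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sort keys by building sorted runs of at most `max_chunk` items, then k-way merging them.
fn chunked_sort<K: Ord>(items: Vec<K>, max_chunk: usize) -> Vec<K> {
    assert!(max_chunk > 0);
    // Phase 1: sorted runs (each would be spilled to disk in a real external sort).
    let mut runs: Vec<std::vec::IntoIter<K>> = Vec::new();
    let mut iter = items.into_iter().peekable();
    while iter.peek().is_some() {
        let mut run: Vec<K> = iter.by_ref().take(max_chunk).collect();
        run.sort(); // stable: equal keys keep input order within a run
        runs.push(run.into_iter());
    }
    // Phase 2: min-heap holding the smallest unconsumed key of each run.
    let mut heap = BinaryHeap::new();
    for (idx, run) in runs.iter_mut().enumerate() {
        if let Some(k) = run.next() {
            heap.push(Reverse((k, idx)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((k, idx))) = heap.pop() {
        out.push(k);
        // Refill from the run that just supplied the minimum.
        if let Some(next) = runs[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}

fn main() {
    let lens = vec![120, 45, 300, 45, 980, 10];
    println!("{:?}", chunked_sort(lens, 2));
}
```

Breaking heap ties by run index keeps equal keys in input order. The merge is therefore stable as long as each run's sort is stable, which Rust's `sort` is.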

`dedup`

seqx dedup -i input.fa
seqx dedup -i input.fa --by-id
seqx dedup -i input.fa --prefix 12 --ignore-case
seqx dedup -i input.fa --buckets 256 --threads 8
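
Per the Performance Model, dedup partitions records into disk buckets, deduplicates each bucket, then performs a stable merge. This in-memory sketch mirrors that shape under stated assumptions:

- A record's bucket is chosen by hashing its sequence, so duplicates always meet in the same bucket.
- Each bucket keeps the first occurrence of each sequence.
- The survivors are returned in input order.

`bucket_dedup` is hypothetical. --ignore-case is modeled as uppercasing the key, and --prefix and --by-id are not modeled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Route a key to one of `buckets` partitions by hash.
fn bucket_of(key: &str, buckets: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % buckets as u64) as usize
}

/// Returns the input indices of the first occurrence of each distinct sequence.
fn bucket_dedup(seqs: &[&str], buckets: usize, ignore_case: bool) -> Vec<usize> {
    assert!(buckets > 0);
    // Phase 1: partition (index, key) pairs; identical keys land in the same bucket.
    let mut parts: Vec<Vec<(usize, String)>> = vec![Vec::new(); buckets];
    for (i, s) in seqs.iter().enumerate() {
        let key = if ignore_case { s.to_ascii_uppercase() } else { s.to_string() };
        parts[bucket_of(&key, buckets)].push((i, key));
    }
    // Phase 2: dedup each bucket independently (buckets could be processed in parallel).
    let mut keep: Vec<usize> = Vec::new();
    for part in parts {
        let mut seen = HashSet::new();
        for (i, key) in part {
            if seen.insert(key) {
                keep.push(i);
            }
        }
    }
    // Phase 3: stable merge back into original input order.
    keep.sort_unstable();
    keep
}

fn main() {
    let seqs = ["ACGT", "acgt", "TTTT", "ACGT"];
    println!("{:?}", bucket_dedup(&seqs, 256, true));
}
```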

`merge`

seqx merge a.fa b.fa c.fa -o merged.fa
seqx merge a.fa b.fa c.fa --add-prefix --sep ":" -o merged_with_source.fa

`split`

seqx split -i input.fa --parts 10 -o out_dir
seqx split -i input.fa --chunk-size 1000 -o out_dir
seqx split -i input.fa --by-id -o out_dir --prefix seq
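
The Performance Model calls split --parts a two-pass streaming split. A plausible reason: sending records to N contiguous, evenly sized parts requires knowing the total record count before the first record is written.

Assuming contiguous parts whose sizes differ by at most one (seqx may balance differently), the first pass counts records. The second pass maps each record index to a part with a function like this hypothetical `part_for`:

```rust
/// 0-based part for record `i` when `n` records are split into `parts` contiguous parts;
/// the first `n % parts` parts receive one extra record.
fn part_for(i: usize, n: usize, parts: usize) -> usize {
    assert!(parts > 0 && i < n);
    let base = n / parts;
    let extra = n % parts;
    let big = extra * (base + 1); // records covered by the larger parts
    if i < big {
        i / (base + 1)
    } else {
        // Only reached when base > 0, so the division is safe.
        extra + (i - big) / base
    }
}

/// Second pass, given the count `n` from the first pass: records per part.
fn split_sizes(n: usize, parts: usize) -> Vec<usize> {
    let mut sizes = vec![0; parts];
    for i in 0..n {
        sizes[part_for(i, n, parts)] += 1;
    }
    sizes
}

fn main() {
    println!("{:?}", split_sizes(10, 3));
}
```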

`compress`

# Compress using pigz if available, otherwise the built-in implementation
seqx compress -i input.fa
seqx compress -i input.fa -o output.fa.gz -l 9

# Decompress
seqx compress -d -i input.fa.gz
seqx compress -d -i input.fa.gz -o output.fa

# Use stdin/stdout
cat input.fa | seqx compress > output.fa.gz
cat input.fa.gz | seqx compress -d > output.fa

# Force built-in implementation
seqx compress -i input.fa --no-pigz
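
compress prefers pigz when it is available and otherwise uses its built-in encoder (gzp, per the Performance Model). How seqx detects pigz is not documented. A common approach, sketched here as an assumption, is to look for a file named `pigz` in each PATH directory. The sketch does not check the executable bit or handle Windows `.exe` suffixes, and `find_on_path` is a hypothetical name.

```rust
use std::env;
use std::path::PathBuf;

/// First file called `name` found in a PATH directory, if any (Unix-style lookup).
fn find_on_path(name: &str) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    env::split_paths(&path)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

fn main() {
    match find_on_path("pigz") {
        Some(p) => println!("would use external pigz at {}", p.display()),
        None => println!("pigz not found; would fall back to the built-in encoder"),
    }
}
```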

`guide`

# List all commands
seqx guide

# Show detailed help for a specific command
seqx guide filter
seqx guide compress

# Output in JSON format (for programmatic use)
seqx guide --format json
seqx guide filter --format json

# Output in Markdown format
seqx guide --format markdown

Behavior Notes

- Input defaults to stdin where supported.
- Output defaults to stdout where supported.
- Format detection is extension-based (.fa/.fasta/.fq/.fastq, optionally followed by .gz).
- FASTA/FASTQ parsing uses the noodles crate.
- extract currently supports FASTA input only.
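
Since format detection is extension-based, a simple mental model is a suffix check after stripping an optional `.gz`. The sketch makes these assumptions:

- Matching is case-insensitive.
- Unknown extensions are treated as undetectable.
- `SeqFormat` and `detect_format` are hypothetical names.

seqx's behavior for stdin, which has no extension, is not modeled.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum SeqFormat {
    Fasta,
    Fastq,
}

/// Detect (format, gzipped) from a file name's extension(s).
fn detect_format(path: &str) -> Option<(SeqFormat, bool)> {
    let lower = path.to_ascii_lowercase();
    // Strip an optional .gz first, remembering whether it was present.
    let (stem, gz) = match lower.strip_suffix(".gz") {
        Some(s) => (s, true),
        None => (lower.as_str(), false),
    };
    let fmt = if stem.ends_with(".fa") || stem.ends_with(".fasta") {
        SeqFormat::Fasta
    } else if stem.ends_with(".fq") || stem.ends_with(".fastq") {
        SeqFormat::Fastq
    } else {
        return None;
    };
    Some((fmt, gz))
}

fn main() {
    for p in ["a.fa", "b.fq.gz", "c.txt"] {
        println!("{p}: {:?}", detect_format(p));
    }
}
```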

Nucleotide vs Protein Behavior

Protein FASTA records are supported by all commands.
Nucleotide-only operations are explicitly guarded against protein input:
- filter --gc-min/--gc-max
- modify --reverse-complement
- reverse-complement matching in search (enabled only when both the record and the pattern are nucleotide sequences)
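
The README does not say how seqx classifies a record or pattern as nucleotide. One simple heuristic, shown purely as an assumption, calls a sequence nucleotide when every non-gap symbol is A, C, G, T, U or N (case-insensitive).

Protein alphabets also contain A, C, G, T and N, so a short peptide made only of those letters would be misclassified. Real tools often use a composition threshold instead. Both function names are hypothetical.

```rust
/// Heuristic (assumed, not seqx's rule): nucleotide if every non-gap symbol is A/C/G/T/U/N.
fn is_nucleotide(seq: &[u8]) -> bool {
    let mut saw_residue = false;
    for &b in seq {
        match b.to_ascii_uppercase() {
            b'A' | b'C' | b'G' | b'T' | b'U' | b'N' => saw_residue = true,
            b'-' | b'.' => {} // gaps are ignored
            _ => return false,
        }
    }
    saw_residue // an empty or all-gap sequence is not classified as nucleotide
}

/// Reverse-complement matching is only enabled when both sides are nucleotide.
fn allow_revcomp_search(record: &[u8], pattern: &[u8]) -> bool {
    is_nucleotide(record) && is_nucleotide(pattern)
}

fn main() {
    println!("{}", allow_revcomp_search(b"ACGTACGT", b"CGT"));
    println!("{}", allow_revcomp_search(b"MKTAYIAKQR", b"CGT"));
}
```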

Performance Model

- sort: external sort (sorted chunks are spilled to disk, then merged via mmap); tune with --max-memory and --threads.
- dedup: records are partitioned into on-disk buckets, deduplicated per bucket, then combined with a stable merge; tune with --buckets and --threads.
- split --parts: two-pass streaming split; because stdin cannot be read twice, stdin input may be materialized to a temporary file.
- compress: uses pigz if available; otherwise uses gzp (parallel gzip in Rust) with automatic thread-count detection.
- Code paths that write temporary binary records use packed_seq_io (2-bit packing for A/C/G/T when applicable).
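
packed_seq_io is described as using 2-bit packing for A/C/G/T when applicable. The sketch below shows the general technique under these assumptions:

- The code is A=00, C=01, G=10, T=11.
- Four bases are packed per byte, with the first base in the low bits.
- Any other symbol (N, IUPAC codes, lowercase, protein letters) triggers a fallback to raw bytes.

seqx's real layout and fallback rules may differ, and the function names are hypothetical.

```rust
/// Pack an uppercase A/C/G/T sequence at 2 bits per base, 4 bases per byte.
/// Returns None if any other symbol is present (caller would store raw bytes).
fn pack_2bit(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &b) in seq.iter().enumerate() {
        let code = match b {
            b'A' => 0u8,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None, // packing not applicable
        };
        out[i / 4] |= code << ((i % 4) * 2);
    }
    Some(out)
}

/// Inverse of `pack_2bit`; the base count must be stored alongside the packed bytes.
fn unpack_2bit(packed: &[u8], len: usize) -> Vec<u8> {
    const BASES: [u8; 4] = [b'A', b'C', b'G', b'T'];
    (0..len)
        .map(|i| BASES[((packed[i / 4] >> ((i % 4) * 2)) & 0b11) as usize])
        .collect()
}

fn main() {
    let seq = b"ACGTTGCAAC";
    match pack_2bit(seq) {
        Some(packed) => {
            assert_eq!(unpack_2bit(&packed, seq.len()), seq.to_vec());
            println!("{} bases -> {} bytes", seq.len(), packed.len());
        }
        None => println!("not packable; store raw bytes"),
    }
}
```

For pure A/C/G/T records this cuts sequence storage to roughly a quarter. The cost is that the base count must be kept separately, since the final byte may be partly unused.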

Bench Script

./scripts/bench_packed_io.sh

# Custom workload
N_RECORDS=1000000 SEQ_LEN=200 DUP_RATE=40 ./scripts/bench_packed_io.sh

Developer Docs

See DEVELOPMENT.md and QUICKREF.md in the repository root.

License

MIT
