seqx is an agent-friendly CLI for FASTA/FASTQ sequence processing.
It is designed around streaming I/O, predictable command behavior, and low-memory execution for large files.
pip install seqx
cargo install seqx
Prebuilt binaries for Linux and macOS are available on the releases page.
# Show help
seqx --help
# Show guide (agent-friendly help)
seqx guide
seqx guide filter
# Basic stats
seqx stats -i input.fa
# Convert FASTA -> FASTQ
seqx convert -i input.fa -T fastq -o output.fq
# Filter short sequences
seqx filter -i input.fa --min-len 100 -o filtered.fa

seqx stats -i input.fa
seqx stats -i input.fa --gc
seqx stats -i input.fq --qual --min-len 50

seqx convert -i input.fa -T fastq -Q 30 -o output.fq
seqx convert -i input.fq -T fasta -o output.fa

seqx filter -i input.fa --min-len 100 --max-len 2000
seqx filter -i input.fa --pattern "ATG.*TAA"
seqx filter -i input.fa --exclude-pattern "N{10,}"
seqx filter -i input.fa --id-file ids.txt
seqx filter -i input.fq --min-qual 30

seqx extract -i input.fa --id seq1
seqx extract -i input.fa --id-file ids.txt
seqx extract -i input.fa --range 1:100
seqx extract -i input.fa --bed regions.bed -F 20

seqx search -i input.fa "ATG"
seqx search -i input.fa "ATG.*TAA" --regex
seqx search -i input.fa "ATG" --mismatches 1 --threads 8
seqx search -i input.fa "ATG" --bed --strand

seqx modify -i input.fa --upper
seqx modify -i input.fa --lower
seqx modify -i input.fa --slice 10:200
seqx modify -i input.fa --remove-gaps
seqx modify -i input.fa --reverse-complement

seqx sample -i input.fa --count 1000 --seed 42
seqx sample -i input.fa --fraction 0.1

seqx sort -i input.fa --by-name
seqx sort -i input.fa --by-len --desc
seqx sort -i input.fa --by-gc --max-memory 256 --threads 8

seqx dedup -i input.fa
seqx dedup -i input.fa --by-id
seqx dedup -i input.fa --prefix 12 --ignore-case
seqx dedup -i input.fa --buckets 256 --threads 8

seqx merge a.fa b.fa c.fa -o merged.fa
seqx merge a.fa b.fa c.fa --add-prefix --sep ":" -o merged_with_source.fa

seqx split -i input.fa --parts 10 -o out_dir
seqx split -i input.fa --chunk-size 1000 -o out_dir
seqx split -i input.fa --by-id -o out_dir --prefix seq

# Compress using pigz if available, otherwise built-in
seqx compress -i input.fa
seqx compress -i input.fa -o output.fa.gz -l 9
# Decompress
seqx compress -d -i input.fa.gz
seqx compress -d -i input.fa.gz -o output.fa
# Use stdin/stdout
cat input.fa | seqx compress > output.fa.gz
cat input.fa.gz | seqx compress -d > output.fa
# Force built-in implementation
seqx compress -i input.fa --no-pigz

# List all commands
seqx guide
# Show detailed help for a specific command
seqx guide filter
seqx guide compress
# Output in JSON format (for programmatic use)
seqx guide --format json
seqx guide filter --format json
# Output in Markdown format
seqx guide --format markdown

- Input defaults to `stdin` where supported.
- Output defaults to `stdout` where supported.
- Format detection is extension-based (`.fa`/`.fasta`/`.fq`/`.fastq`, optional `.gz`).
- FASTA/FASTQ parsing uses `noodles`.
- `extract` currently supports FASTA extraction only.
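
The extension rule above can be illustrated with a small shell function. This is only a sketch of the rule as stated, not seqx's actual implementation; `detect_format` is a hypothetical helper name, and the `unknown` fallback is an assumption.

```shell
# Hypothetical helper (not part of seqx): classify a file name by extension.
# Prints "<format> <compression>", e.g. "fastq gzip".
detect_format() {
  name=$1
  compression=none
  # Strip an optional trailing .gz before looking at the sequence extension.
  case "$name" in
    *.gz) compression=gzip; name=${name%.gz} ;;
  esac
  case "$name" in
    *.fa|*.fasta) format=fasta ;;
    *.fq|*.fastq) format=fastq ;;
    *) format=unknown ;;   # assumption: unrecognized extensions are reported, not guessed
  esac
  printf '%s %s\n' "$format" "$compression"
}

detect_format reads.fq.gz    # prints "fastq gzip"
detect_format genome.fasta   # prints "fasta none"
```

Because detection depends only on the name, a file with an unusual extension may need to be renamed (or its format stated explicitly, if the command offers a way to do so) before processing.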
- Protein FASTA records are supported by all commands.
- Nucleotide-only operations are explicitly guarded:
  - `filter --gc-min/--gc-max`
  - `modify --reverse-complement`
  - reverse-complement matching in `search` (enabled only when both record and pattern are nucleotide)
- `sort`: external chunk sort + mmap merge, configurable with `--max-memory` and `--threads`.
- `dedup`: disk bucket partitioning + per-bucket dedup + stable merge, configurable with `--buckets` and `--threads`.
- `split --parts`: two-pass streaming split (stdin may be materialized to a temp file).
- `compress`: uses `pigz` if available, otherwise uses `gzp` (parallel gzip in Rust) with automatic thread detection.
- Temp binary record paths use `packed_seq_io` (2-bit packing for A/C/G/T when applicable).
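
To give a rough idea of what "external chunk sort + merge" means, the sketch below sorts a plain one-record-per-line file using standard `split` and `sort`. It is an illustration of the general technique only: `external_sort` is a hypothetical helper, it does not use mmap, and it does not model seqx's record keys (`--by-name`, `--by-len`, `--by-gc`) or its on-disk format.

```shell
# Hypothetical helper (not seqx code): sort a one-record-per-line file by
# sorting fixed-size chunks on disk, then merging the sorted chunks.
external_sort() {
  input=$1
  chunk_lines=$2
  [ -s "$input" ] || return 0          # nothing to sort for an empty file
  tmpdir=$(mktemp -d) || return 1
  # Pass 1: split into chunks small enough to sort in memory; sort each in place.
  split -l "$chunk_lines" "$input" "$tmpdir/chunk."
  for chunk in "$tmpdir"/chunk.*; do
    LC_ALL=C sort -o "$chunk" "$chunk"
  done
  # Pass 2: merge the already-sorted chunks into one sorted stream on stdout.
  LC_ALL=C sort -m "$tmpdir"/chunk.*
  rm -rf "$tmpdir"
}

demo=$(mktemp)
printf '%s\n' seq3 seq1 seq4 seq2 seq5 > "$demo"
external_sort "$demo" 2
rm -f "$demo"
```

The chunk size plays the role that `--max-memory` plays in the description above: smaller chunks keep peak memory low at the cost of more temporary files to merge.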
./scripts/bench_packed_io.sh
# Custom workload
N_RECORDS=1000000 SEQ_LEN=200 DUP_RATE=40 ./scripts/bench_packed_io.sh

MIT