A set of utilties available as subcommands within a single program, bfqutils.
- FastQ PE read merging: bfqutils merge
- FastQ trimming for single-end reads: bfqutils trimse
- FastQ statistics: bfqutils stats
Requires gcc/clang and GNU Make, tested with macOS and Linux. bfqutils is written in C and comes bundled with the Cloudflare fork of Zlib.
git clone https://github.com/noborilab/bfqutils
cd bfqutils
make libz
make release
make test # optionalTo dynamically link to a system Zlib instead of building a static library:
make z_dyn=1 releaseThis tool is meant to be used as a drop-in replacement for fastp --merge. The only differences from fastp are a smaller default minimum merge length (since this tool is primarily intended for libraries with small inserts) and more aggressive base correction. Apart from that, bfqmerge is intended to be a lightweight replacement with extremely low memory usage (<2 MB) and very good single-thread performance (<15 min on 2x100M PE150 reads, ~10 min without the -z flag). Remember to change the default values in src/bfqmerge.c if that would better suit your primary use case.
bfqutils v1.1.0 Copyright (C) 2025 Benjamin Jean-Marie Tremblay
Usage: bfqutils merge [options] R1.fq[.gz] R2.fq[.gz] > merged.fq
-o <int> Required overlap for a merge to occur. Default: 15
-d <int> Maximum number of mismatches between alignments. Default: 5
-p <dbl> Maximum fraction of mismatches between alignments. Default: 0.2
-Q <int> Minimum PHRED+33 quality to consider a base high quality. Default: 15
-u <dbl> Maximum fraction of bases allowed to be low quality. Default: 0.4
-n <int> Maximum number of Ns allowed. Default: 5
-g <int> Number of Gs to trigger polyG tail trimming. Default: 10
-t <int> Mean window quality threshold for trimming 3-prime bases. Default: 20.
-w <int> Window size of 3-prime base trimming. Default: 5
-m <int> Max merged read length.
-z Compress the output as gzip.
-q Make the program quiet.
-v Print the version and exit.
-h Print this help message and exit.
bfqmerge is a simple program, which performs the following operations:
-
Detect and trim low quality bases from the 3' end of the reads (
-t,-w). This helps reduce the possibility of false positive alignments between incorrect read segments, especially when expecting short inserts. -
Detect and trim polyG tails (
-g). Failure to remove these can lead to bfqmerge thinking that sufficiently long stretches of Gs found in both reads of a pair are the overlapping parts of the reads. Due to the way some machines work, sequencing past the end of a read can lead to long stretches of Gs with high quality scores (which won't be trimmed in step 1). -
Find the best overlap, depending on user settings (
-o,-d,-p). -
If no overlap is found, discard the reads. Otherwise, create a new merged read. For each overlapping position, the base and quality score is taken from whichever of the two reads has a better score in that position.
-
Check if the merged read passes quality filters (
-Q,-u,-n). If yes, then compress the output if desired (-z), then write tostdout.
With the exception of -o, all default values are identical to fastp. These generally work quite well, though be aware that it is impossible to get read merging right 100% of the time. False positive merging events and false negative read discards will almost always occur for any combination of settings given a sufficiently diverse set of reads.
A simple FastQ trimming and quality filtering tool for single-end reads. The order of operations is similar to bfqmerge, replacing overlap analysis with adapter sequence matching. Use '-' for stdin.
bfqutils v1.1.0 Copyright (C) 2025 Benjamin Jean-Marie Tremblay
Usage: bfqutils trimse [options] reads.fq[.gz] > trimmed.fq
-a <str> Adapter sequence. Default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
-Q <int> Minimum PHRED+33 quality to consider a base high quality. Default: 15
-u <dbl> Maximum fraction of bases allowed to be low quality. Default: 0.4
-n <int> Maximum number of Ns allowed. Default: 5
-g <int> Number of Gs to trigger polyG tail trimming. Default: 10
-t <int> Mean window quality threshold for trimming 3-prime bases. Default: 20.
-w <int> Window size of 3-prime base trimming. Default: 5
-M <int> Min trimmed read length. Default: 15
-m <int> Max trimmed read length.
-z Compress the output as gzip.
-q Make the program quiet.
-v Print the version and exit.
-h Print this help message and exit.
Meant to be used in a pipe with bfqmerge or bfqtrimse. The default top enriched K-mer summary can help spot bad trimming/merging. Use '-' for stdin.
bfqutils v1.1.0 Copyright (C) 2025 Benjamin Jean-Marie Tremblay
Usage: bfqutils stats [options] reads.fq[.gz]
-l <file> Read Length histogram.
-g <file> Read GC content histogram.
-q <file> Mean read quality histogram.
-Q <file> Per-position mean quality histogram.
-b <file> Per-position base content histogram.
-k <file> K-mer counts and obs/exp ratios.
-o <file> Send summary stats to a file instead of stderr.
-K <int> K-mer size for -k. Default: 6
-n <int> Only examine this number of reads.
-O Send the reads to stdout.
-z If -O, compress as gzip.
-N Do not print summary stats to stderr.
-v Print the version and exit.
-h Print this help message and exit.