A reproducible and modular Nextflow pipeline for SARS-CoV-2 genome assembly and lineage analysis from Illumina paired-end sequencing data.
This pipeline automates:
- Read quality control
- Reference-based mapping
- Primer clipping
- Variant calling
- Consensus generation
- Lineage assignment
- Phylogenetic analysis
It is based on best-practice tools and was developed as part of the SARS-2 Bioinformatics & Data Science course by Freie Universität Berlin and the Robert Koch Institute.
git clone https://github.com/bibymaths/nf-illumina2lineage.git
cd nf-illumina2lineage

💡 See docs/quickstart.md for full details.
- Illumina paired-end .fastq.gz files
- SARS-CoV-2 reference genome (downloaded automatically)
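Because the pipeline expects paired-end reads, it can help to confirm that every forward-read file has a matching reverse-read file before starting a run. The sketch below is not part of the pipeline. It assumes a hypothetical naming scheme of `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and the helper name `unpaired_samples` is also made up for illustration; adjust the suffixes to whatever your files actually use.

```shell
# Hypothetical helper: print the sample name of every R1 file in a
# directory that has no matching R2 file.
# ASSUMPTION: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
unpaired_samples() {
  dir="$1"
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                      # glob matched nothing
    sample="$(basename "$r1" _R1.fastq.gz)"       # strip the R1 suffix
    [ -e "$dir/${sample}_R2.fastq.gz" ] || echo "$sample"
  done
}
```

Calling `unpaired_samples path/to/reads` prints nothing when every sample is complete, and one sample name per line otherwise.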
- QC reports: FastQC, Fastp, MultiQC
- BAM & VCF files
- Consensus FASTA sequences
- Pangolin lineage annotations
- Phylogenetic tree (.treefile)
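A quick sanity check on a consensus FASTA is to look at each sequence's length and how many ambiguous `N` bases it contains. The sketch below is only an illustration: the helper name `fasta_n_summary` is hypothetical, and it assumes that masked or uncovered positions appear as `N`, which is common for consensus callers but is not confirmed here for this pipeline.

```shell
# Hypothetical helper: for each record in a FASTA file, print
# "<header><TAB><sequence length><TAB><number of N bases>".
# ASSUMPTION: ambiguous or masked positions are written as N (or n).
fasta_n_summary() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len "\t" n   # flush previous record
      name = substr($0, 2); len = 0; n = 0
      next
    }
    {
      len += length($0)          # count bases before gsub edits the line
      n += gsub(/[Nn]/, "")      # gsub returns the number of N bases removed
    }
    END { if (name != "") print name "\t" len "\t" n }
  ' "$1"
}
```

Run it as `fasta_n_summary path/to/consensus.fasta`. A sequence whose N count is close to its length usually points to poor coverage for that sample.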
For a full output structure, see docs/outputs.md.
Managed via mamba or Docker:
- QC: fastqc, fastp, multiqc
- Mapping: minimap2, samtools, bamclipper
- Variant Calling: freebayes, vcftools, bcftools
- Consensus: vcfR, bcftools, president
- Lineage & MSA: pangolin, mafft, iqtree
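If you manage the tools yourself rather than through the provided environments, a small check like the one below can show which commands are missing from your `PATH`. It is a sketch, not part of the pipeline. The helper name `missing_tools` is hypothetical, and the executable names in the example call are assumptions: some tools install under different names (for example a versioned binary or a wrapper script), and R packages such as vcfR are not command-line executables, so they are not covered by this check.

```shell
# Hypothetical helper: print every given command name that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example call. ASSUMPTION: these are the executable names on your system.
missing_tools fastqc fastp multiqc minimap2 samtools freebayes \
  vcftools bcftools pangolin mafft iqtree
```

No output means every listed command was found; otherwise each missing command is printed on its own line.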
Complete documentation is available under the docs/ folder and is rendered via MkDocs.
This project is licensed under the BSD 3-Clause License. See LICENSE.
Abhinav Mishra
Email: [email protected]
Developed during the SARS-2 Bioinformatics & Data Science course at FU Berlin & RKI, under the guidance of Max von Kleist and Martin Hölzer.