A Nextflow pipeline for SARS-CoV-2 genome assembly and analysis from Illumina reads—includes QC, mapping, variant calling, consensus generation, lineage annotation, and phylogenetics.

bibymaths/nf-illumina2lineage

nf-illumina2lineage

A reproducible and modular Nextflow pipeline for SARS-CoV-2 genome assembly and lineage analysis from Illumina paired-end sequencing data.

Overview

This pipeline automates:

  • Read quality control
  • Reference-based mapping
  • Primer clipping
  • Variant calling
  • Consensus generation
  • Lineage assignment
  • Phylogenetic analysis

It builds on established, best-practice tools and was developed as part of the SARS-2 Bioinformatics & Data Science course run by Freie Universität Berlin and the Robert Koch Institute.

Quickstart

git clone https://github.com/bibymaths/nf-illumina2lineage.git
cd nf-illumina2lineage

💡 See docs/quickstart.md for full details.

Inputs

  • Illumina paired-end .fastq.gz files
  • SARS-CoV-2 reference genome (downloaded automatically)
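As a rough illustration of how paired-end inputs might be collected before a run, the Python sketch below groups `.fastq.gz` files into R1/R2 pairs. The `_R1`/`_R2` filename convention and the `find_read_pairs` helper are assumptions made for this example only; the naming the pipeline actually expects is described in docs/quickstart.md and may differ.

```python
import re
from pathlib import Path

# Assumed naming convention (not confirmed by the pipeline docs):
#   <sample>_R1<anything>.fastq.gz  pairs with  <sample>_R2<anything>.fastq.gz
R1_PATTERN = re.compile(r"^(?P<sample>.+?)_R1(?P<rest>.*)\.fastq\.gz$")


def find_read_pairs(directory):
    """Return {sample: (r1_path, r2_path)} for every complete pair in `directory`.

    Files that do not look like an R1 file, and R1 files without a matching
    R2 file, are skipped.
    """
    directory = Path(directory)
    pairs = {}
    for r1 in sorted(directory.glob("*.fastq.gz")):
        match = R1_PATTERN.match(r1.name)
        if match is None:
            continue  # R2 files and unrelated files are ignored here
        r2 = directory / f"{match['sample']}_R2{match['rest']}.fastq.gz"
        if r2.exists():
            pairs[match["sample"]] = (r1, r2)
    return pairs
```

Calling `find_read_pairs("reads/")` (where `reads/` is a hypothetical input folder) returns a dictionary keyed by sample name, which makes it easy to spot samples whose mate file is missing before starting a run.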

Outputs

  • QC reports: FastQC, Fastp, MultiQC
  • BAM & VCF files
  • Consensus FASTA sequences
  • Pangolin lineage annotations
  • Phylogenetic tree (.treefile)

For the full output structure, see docs/outputs.md.
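To give a feel for working with the lineage output, here is a minimal Python sketch that tallies lineages from a Pangolin CSV report. Pangolin reports include `taxon` and `lineage` columns; the `count_lineages` helper and any file path passed to it are hypothetical, since the exact location of the report in this pipeline's results is documented in docs/outputs.md rather than here.

```python
import csv
from collections import Counter


def count_lineages(report_path):
    """Count how many sequences were assigned to each lineage.

    Expects a Pangolin-style CSV with a `lineage` column.
    """
    with open(report_path, newline="") as handle:
        return Counter(row["lineage"] for row in csv.DictReader(handle))
```

For example, `count_lineages("lineage_report.csv").most_common(5)` would list the five most frequent lineages in a report, assuming a file of that name exists in the working directory.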

Dependencies

Managed via mamba or Docker:

  • QC: fastqc, fastp, multiqc
  • Mapping: minimap2, samtools, bamclipper
  • Variant Calling: freebayes, vcftools, bcftools
  • Consensus: vcfR, bcftools, president
  • Lineage & MSA: pangolin, mafft, iqtree
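When using a mamba environment rather than Docker, a quick check that the tools are on `PATH` can save a failed run. The Python sketch below is one way to do that. The executable names in `EXPECTED_TOOLS` are assumptions derived from the tool names above: some installations ship `iqtree2` instead of `iqtree`, bamclipper is typically a shell script with its own filename, and vcfR is an R package rather than an executable, so both of the latter are left out.

```python
import shutil

# Assumed executable names; adjust to match the environment actually in use.
EXPECTED_TOOLS = [
    "fastqc", "fastp", "multiqc",          # QC
    "minimap2", "samtools",                # Mapping
    "freebayes", "vcftools", "bcftools",   # Variant calling
    "president",                           # Consensus QC
    "pangolin", "mafft", "iqtree",         # Lineage & MSA
]


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found on PATH.")
```

Run it inside the activated environment; an empty result only shows that the executables are discoverable, not that their versions match what the pipeline expects.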

Documentation

Complete documentation is available under the docs/ folder and is rendered via MkDocs.

License

This project is licensed under the BSD 3-Clause License. See LICENSE.

Author

Abhinav Mishra
Email: [email protected]

Acknowledgments

Developed during the SARS-2 Bioinformatics & Data Science course at FU Berlin & RKI, under the guidance of Max von Kleist and Martin Hölzer.

