RNA-seq pipeline for Drosophila melanogaster: QC, trimming, alignment, and read counting (81.7% alignment, ~50 million bp processed)

trisharaj1/rna-seq-pipeline

RNA-seq Workflow on Drosophila

This is my RNA-seq analysis pipeline for Drosophila melanogaster.
It goes from raw reads → QC/trimming → genome index → alignment → counting reads per gene with featureCounts.

I didn’t upload the large FASTQ/BAM files to GitHub because they’re too big.
If you want to run this yourself, you can download the same data I used from the links below. :)

This dataset contained approximately 50 million base pairs of sequence data, with an 81.7% alignment rate to the annotated Drosophila melanogaster genome.

Data I used

Paired-end FASTQ files (subsampled)

Reference genome (Ensembl Release 113, BDGP6.46)


Tools I used

  • fastqc – for quality check
  • fastp – for trimming
  • bwa – for alignment
  • samtools – for sorting and indexing BAM files
  • featureCounts (from subread) – for counting reads per gene

How I ran it (short version)

  1. QC + trimming – fastqc + fastp
  2. Genome index – built with bwa index genome.fa
  3. Alignment – bwa mem with paired reads
  4. Sort + index – samtools sort and samtools index
  5. Counting – featureCounts -T 4 -p -B -C ...
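
The steps above could be strung together roughly like this. This is only a sketch and not the exact commands from this repo. The file names (trimmed_R1.fastq.gz, aligned.sorted.bam, annotation.gtf), the Results/qc folder, the thread count for bwa, and the helper names build_commands and run_pipeline are all assumptions made for illustration. Only `bwa index genome.fa` and `featureCounts -T 4 -p -B -C` come from the list above. The remaining featureCounts arguments here are guesses at a typical call.

```python
import shlex
import subprocess


def build_commands(r1, r2, genome="genome.fa", gtf="annotation.gtf",
                   outdir="Results", threads=4):
    """Return the pipeline steps as (name, shell command) pairs.

    All output paths and the GTF name are hypothetical placeholders.
    """
    q = shlex.quote
    qc_dir = f"{outdir}/qc"
    counts_dir = f"{outdir}/counts"
    t1 = f"{outdir}/trimmed_R1.fastq.gz"
    t2 = f"{outdir}/trimmed_R2.fastq.gz"
    bam = f"{outdir}/aligned.sorted.bam"
    counts = f"{counts_dir}/featurecounts.txt"
    return [
        # fastqc needs its output directory to exist already
        ("dirs", f"mkdir -p {q(qc_dir)} {q(counts_dir)}"),
        ("qc", f"fastqc -o {q(qc_dir)} {q(r1)} {q(r2)}"),
        # paired-end trimming: -i/-I are inputs, -o/-O are outputs
        ("trim", f"fastp -i {q(r1)} -I {q(r2)} -o {q(t1)} -O {q(t2)}"),
        ("index", f"bwa index {q(genome)}"),
        # bwa mem writes SAM to stdout, so pipe it straight into samtools sort
        ("align_sort",
         f"bwa mem -t {threads} {q(genome)} {q(t1)} {q(t2)}"
         f" | samtools sort -o {q(bam)} -"),
        ("bam_index", f"samtools index {q(bam)}"),
        ("count",
         f"featureCounts -T {threads} -p -B -C"
         f" -a {q(gtf)} -o {q(counts)} {q(bam)}"),
    ]


def run_pipeline(commands, dry_run=True):
    """Print each step; run it through bash only when dry_run is False."""
    for name, cmd in commands:
        print(f"[{name}] {cmd}")
        if not dry_run:
            # pipefail makes the align step fail if bwa mem itself fails
            subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands, nothing is executed
    run_pipeline(build_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz"))
```

The dry run is the default so you can check the commands before anything touches your data. Depending on your Subread version, featureCounts may also need `--countReadPairs` together with `-p` to count fragments instead of reads, so it is worth checking `featureCounts -v` and the help text first.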

Output

  • Results/counts/featurecounts.txt – counts per gene
  • Results/counts/featurecounts.txt.summary – mapping/counting summary
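
To get a quick look at the summary file, something like the sketch below may help. It assumes the usual featureCounts summary layout: tab-separated, a header row starting with `Status`, then one row per category such as `Assigned` or `Unassigned_NoFeatures`. The function names parse_summary and assigned_fraction are made up for this example. Note that the assigned fraction it reports is the share of reads counted toward a gene, which is not the same number as the alignment rate.

```python
from pathlib import Path


def parse_summary(text):
    """Parse featureCounts .summary text into {status: count}.

    Only the first BAM column is read; extra columns are ignored.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # skip blank lines and the "Status <bam>" header row
        if len(fields) < 2 or fields[0] == "Status":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def assigned_fraction(counts):
    """Fraction of all rows' reads that ended up in the Assigned row."""
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


if __name__ == "__main__":
    path = Path("Results/counts/featurecounts.txt.summary")
    if path.exists():
        summary = parse_summary(path.read_text())
        for status, n in summary.items():
            if n:  # only show categories that actually occurred
                print(f"{status}\t{n}")
        print(f"assigned fraction: {assigned_fraction(summary):.3f}")
    else:
        print(f"{path} not found, run the counting step first")
```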
