Just a heads-up to anyone uploading raw Quantseq (Pool) reads to, or downloading them from, NCBI's Sequence Read Archive (SRA): SRA discards FASTQ headers, so the Quantseq UMI information is permanently lost if SRA is your only archive.
Example of a Quantseq raw read file from SRA:
@SRR11768435.1 1/1
CCTGAATATAGCAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCTAAAAGAGCACACCCGTCTATGTAGCAAAAAAAAAA
+
AAAAAEEEEEEEEEE6EEEEEEE/EEEEEEEEEEEEEEEEEEE/EEAEEEEEEEE/A<EEEEEEEEAEEA/666<AEEEEEAEE<
Normally, a Quantseq file should look like this, with the UMI carried in the read header:
@A00383:847:H7TVYDSXF:1:1101:25735:1000_AATGGGTCGG 1:N:0:CGGGAACC+AGGGCCAA+CGGGAACCCGCA
TNGAACTCCTCACACCCAATTGGACCAATCTATCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAATCCTGCGTC
+
F#FFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFF:FFFFFFFFF:FFF,FFFFFFFFFF,FF,F,,F:FFF
Discarding these headers makes it impossible to reproduce some public Quantseq datasets.
More info on why they do this: ncbi/sra-tools#130
Consider uploading to the European Nucleotide Archive (ENA), which preserves the headers.
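If you want to check whether a downloaded file still carries its UMIs, something like the following rough Python sketch might help. It assumes the UMI is the run of bases after the last underscore in the read name, as in the second example above; that layout is my reading of the example header, not a guarantee for every Quantseq pipeline, and `extract_umi` is just a hypothetical helper name.

```python
import re

# Assumption: the UMI is the run of A/C/G/T/N bases after the last underscore
# in the read name, as in the original (non-SRA) example header.
UMI_SUFFIX = re.compile(r"_([ACGTN]+)$")


def extract_umi(header_line):
    """Return the UMI found in a FASTQ header line, or None if there is none."""
    # The read name is the first whitespace-delimited token, without the '@'.
    tokens = header_line.lstrip("@").split()
    if not tokens:
        return None
    match = UMI_SUFFIX.search(tokens[0])
    return match.group(1) if match else None


sra_header = "@SRR11768435.1 1/1"
original_header = (
    "@A00383:847:H7TVYDSXF:1:1101:25735:1000_AATGGGTCGG "
    "1:N:0:CGGGAACC+AGGGCCAA+CGGGAACCCGCA"
)

print(extract_umi(sra_header))       # None: the SRA header has no UMI left
print(extract_umi(original_header))  # AATGGGTCGG
```

Running this over the first few header lines of a file should be enough to tell whether the UMIs survived.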