taffy norm affected by changes between blocks (that aren't in rows to be merged) #75

@glennhickey

(edited for real example):

wget -q https://public.gi.ucsc.edu/~hickey/debug/irwin/case2-tiny-h2m.raw.maf.gz
wget -q https://public.gi.ucsc.edu/~hickey/debug/irwin/genome.list

taffy view -i case2-tiny-h2m.raw.maf.gz | taffy norm -k | mafFilter -i hg38,Tarsius_lariang -m - > norm.maf

zcat case2-tiny-h2m.raw.maf.gz | mafRowOrderer -m - --order-file genome.list | taffy view | taffy norm -k | mafFilter -i hg38,Tarsius_lariang -m - > sort.norm.maf

Tarsius is one row in the first maf (norm.maf)

a
s       hg38.chr1       11707   6       +       248956422       GGGGC-C
s       Tarsius_lariang.k141_1886670    1879    6       -       2088    AGGGC-C
s       hg38.chr1       182226  6       +       248956422       GGGGC-C
s       hg38.chr12      9309644 6       +       133275309       GGGGC-C
s       hg38.chr12      123853115       6       -       133275309       GGGGC-C
s       hg38.chr12      31100331        6       +       133275309       GGGGC-C
s       hg38.chr12      11838   6       +       133275309       GGGGC-C
s       hg38.chr15      11930   6       -       101991189       GGGGC-C
s       hg38.chr16      11388   6       +       90338345        GGGGC-C
s       hg38.chr2       128591799       6       -       242193529       GGGGC-C
s       hg38.chr9       11820   6       +       138394717       GGGGC-C
s       hg38.chrX       12546   6       -       156040895       GGGGC-C

but two rows in the second maf (sort.norm.maf)

a
s       hg38.chr1       11707   6       +       248956422       GGGG-C-C
s       hg38.chr1       182226  6       +       248956422       GGGG-C-C
s       hg38.chr12      9309644 6       +       133275309       GGGG-C-C
s       hg38.chr12      123853115       6       -       133275309       GGGG-C-C
s       hg38.chr12      31100331        6       +       133275309       GGGG-C-C
s       hg38.chr12      11838   6       +       133275309       GGGG-C-C
s       hg38.chr15      11930   6       -       101991189       GGGG-C-C
s       hg38.chr16      11388   6       +       90338345        GGGG-C-C
s       hg38.chr2       128591799       6       -       242193529       GGGG-C-C
s       hg38.chr9       11820   6       +       138394717       GGGG-C-C
s       hg38.chrX       12546   6       -       156040895       GGGG-C-C
s       Tarsius_lariang.k141_1886670    1882    3       -       2088    ---G-C-C
s       Tarsius_lariang.k141_1886670    1879    3       -       2088    AGG-----
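The two Tarsius rows in the second block look like they should have been merged: same sequence, same strand, 1879 + 3 = 1882, and their bases sit in disjoint, ordered columns. To scan other blocks for the same pattern, a rough Python sketch along these lines might help. It is not part of taffy; `parse_maf_blocks` and `mergeable_pairs` are hypothetical helper names, and the adjacency test is only my assumption about what "could have been merged" means here.

```python
from collections import defaultdict

# Three rows copied from the second block above (sort.norm.maf).
SAMPLE = """a
s hg38.chr1 11707 6 + 248956422 GGGG-C-C
s hg38.chr1 182226 6 + 248956422 GGGG-C-C
s Tarsius_lariang.k141_1886670 1882 3 - 2088 ---G-C-C
s Tarsius_lariang.k141_1886670 1879 3 - 2088 AGG-----
"""

def parse_maf_blocks(text):
    """Yield each alignment block as a list of (name, start, size, strand, aligned) tuples."""
    block = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line ends the current block.
            if block:
                yield block
            block = None
        elif fields[0] == "a":
            if block:
                yield block
            block = []
        elif fields[0] == "s" and block is not None:
            name, start, size, strand, _src_size, aligned = fields[1:7]
            block.append((name, int(start), int(size), strand, aligned))
    if block:
        yield block

def mergeable_pairs(block):
    """Return (name, strand, start_a, start_b) for row pairs that look mergeable.

    Assumed criterion: same sequence and strand, row a ends exactly where
    row b starts, and every base of a lies in a column before every base of b.
    """
    by_key = defaultdict(list)
    for name, start, size, strand, aligned in block:
        by_key[(name, strand)].append((start, size, aligned))
    pairs = []
    for (name, strand), rows in by_key.items():
        rows.sort()  # order rows by start coordinate
        for (s1, n1, a1), (s2, _n2, a2) in zip(rows, rows[1:]):
            if s1 + n1 != s2:
                continue
            last_a = max((i for i, c in enumerate(a1) if c != "-"), default=-1)
            first_b = min((i for i, c in enumerate(a2) if c != "-"), default=len(a2))
            if last_a < first_b:
                pairs.append((name, strand, s1, s2))
    return pairs

found = [pair for block in parse_maf_blocks(SAMPLE) for pair in mergeable_pairs(block)]
print(found)
```

Running this over norm.maf and sort.norm.maf should make it easy to see which blocks contain split rows in one output but not the other.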
