
Problems with GTDB taxonomy assignments to MARdb sequences #40

@micronuria


I have encountered two problems when using the relabeltree.py --taxonomy option to assign taxonomy to MARdb sequences:

  1. Some genomes from the initial MARdb files that José sent are missing the MARdb ID, so those sequences cannot be classified by their ID. To solve this, we need to:
    a) First, figure out whether all teams are working with the same files (Complete and Partial genomes, QC'd by José).
    b) Check with José whether the MAR IDs were removed during the QC he did. I will do that once we figure out the first part.

  2. For the sequences that do have MAR IDs, the script is assigning the NCBI taxonomy and not the GTDB one. For example:

One of the genera in my tree is classified as in NCBI (Beta), but in GTDB it is Gamma.
This happens with several Betas that have been reclassified as Gammas, for example in the tree:

The label color is the classification shown on the GTDB website, and the square on the right is the taxonomy applied by the script. Green: Zproteobacteria, Pink: Gamma, Orange: Beta
I have another example with a Streptomyces that has the NCBI taxonomy in the tree label and not the GTDB one.
Are we using an older release?
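
To list every affected label instead of spotting them by eye, something like the following sketch might help. It assumes we can get two things that are not part of this issue: a two-column, tab-separated file of ID and GTDB lineage (the layout GTDB uses for its taxonomy files, e.g. `d__Bacteria;p__...;c__...`), and a mapping from each ID to the class that ended up in the tree. All function names, IDs and example data below are made up for illustration, and nothing here calls relabeltree.py itself.

```python
import csv
import io


def parse_gtdb_lineage(lineage):
    """Split a GTDB-style lineage ('d__Bacteria;p__...;c__...') into {rank prefix: name}."""
    ranks = {}
    for field in lineage.split(";"):
        field = field.strip()
        if "__" in field:
            prefix, name = field.split("__", 1)
            ranks[prefix] = name
    return ranks


def load_gtdb_classes(handle):
    """Read a two-column TSV (ID, GTDB lineage) and return {ID: class name}."""
    classes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        classes[row[0]] = parse_gtdb_lineage(row[1]).get("c", "")
    return classes


def find_mismatches(tree_classes, gtdb_classes):
    """Return (ID, class in tree, class in GTDB) for every ID where the two disagree.

    IDs that are absent from the GTDB table are skipped, not reported.
    """
    mismatches = []
    for seq_id, tree_class in sorted(tree_classes.items()):
        gtdb_class = gtdb_classes.get(seq_id)
        if gtdb_class is not None and gtdb_class != tree_class:
            mismatches.append((seq_id, tree_class, gtdb_class))
    return mismatches


if __name__ == "__main__":
    # Hypothetical example data; real IDs and lineages would come from our own files.
    gtdb_tsv = io.StringIO(
        "example_1\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria\n"
        "example_2\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria\n"
    )
    tree_classes = {
        "example_1": "Betaproteobacteria",   # label as applied in the tree
        "example_2": "Gammaproteobacteria",
    }
    for seq_id, in_tree, in_gtdb in find_mismatches(tree_classes, load_gtdb_classes(gtdb_tsv)):
        print(f"{seq_id}: tree={in_tree} GTDB={in_gtdb}")
```

Running the same comparison against taxonomy files from different GTDB releases could also help answer the release question, since an ID whose class changes between releases would show up as a mismatch in one run and not in the other.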

Labels

bug: Something isn't working
