Extending Profile Coverage

Matt Thompson edited this page Jun 30, 2025 · 2 revisions

The following is a basic workflow for updating profiles in TaxonoPy to improve case coverage. Note that it uses paths and resources specific to the Ohio Supercomputer Center, and modifications may be needed for other contexts.

  1. Execute TaxonoPy on a dataset:
taxonopy resolve \
  --input /fs/ess/PAS2136/TreeOfLife/annotations/source_taxa/source=bioscan \
  --output-dir ./temp_out \
  --output-format parquet

By default, entries that cannot be resolved are forced to use the input data.

  2. Scan outputs for final_status_force_accepted or final_status_failed_forced_input (statuses by name should also be available in resolution_stats.json in the output directory, in addition to the printed message). Note that in the output records the status appears in uppercase with underscores, e.g. FAILED_FORCED_INPUT:
parquet cat temp_out/*.parquet | grep "FAILED_FORCED_INPUT" | head | jq

Example output (one record among others):

{
  "uuid": "90a7dba8-702c-48f7-9f9b-b3aaa68009de",
  "scientific_name": "Pheidole sp. TZ01",
  "common_name": "",
  "source_dataset": "bioscan",
  "source_id": "GMNAZ5929-22",
  "resolution_status": "FAILED_FORCED_INPUT",
  "kingdom": "Animalia",
  "phylum": "Arthropoda",
  "class": "Insecta",
  "order": "Hymenoptera",
  "family": "Formicidae",
  "genus": "Pheidole",
  "species": "Pheidole sp. TZ01",
  "resolution_path": "RESOLVED",
  "resolution_strategy": "ForceFailedToInput",
  "final_query_term": "Pheidole sp. TZ01",
  "final_query_rank": "species",
  "final_data_source_id": 11,
# ...
}
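When many records are forced to input, it can help to see which names recur most often before investigating them one by one. The following is a minimal sketch, not part of TaxonoPy: it assumes the records arrive as one JSON object per line (as the grep pipeline above suggests) and uses only the resolution_status and scientific_name fields shown in the example. The function name and the second sample record are hypothetical.

```python
import json
from collections import Counter


def forced_input_names(json_lines):
    """Count scientific names among records with status FAILED_FORCED_INPUT.

    `json_lines` is any iterable of strings, each holding one JSON record
    (an assumption about the shape of the dumped Parquet output).
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("resolution_status") == "FAILED_FORCED_INPUT":
            counts[record.get("scientific_name", "")] += 1
    return counts


# Illustrative input: the first record mirrors the example above,
# the second is made up to show a non-matching status.
sample = [
    '{"scientific_name": "Pheidole sp. TZ01", "resolution_status": "FAILED_FORCED_INPUT"}',
    '{"scientific_name": "Pheidole sp. TZ01", "resolution_status": "FAILED_FORCED_INPUT"}',
    '{"scientific_name": "Apis mellifera", "resolution_status": "SOME_OTHER_STATUS"}',
]

for name, count in forced_input_names(sample).most_common():
    print(count, name)
```

To use it on real output, the lines could be piped in from the same command and read from sys.stdin instead of the sample list.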
  3. Use the gnverifier CLI to investigate the case further. This can be installed using any method outlined in the gnverifier CLI installation instructions or the Docker image. On HPC, an Apptainer image may be retrieved with:
apptainer pull /fs/ess/PAS2136/apps/gnverifier_latest.sif docker://gnames/gnverifier:latest

This has already been done on OSC, so the image is available at the path above.

For use on OSC, in your ~/.bashrc, add:

alias gnverifier='apptainer --silent exec --bind /etc/ssl:/etc/ssl --bind /etc/pki:/etc/pki /fs/ess/PAS2136/apps/gnverifier_latest.sif gnverifier'

and run source ~/.bashrc. You can then enter CLI commands that replicate TaxonoPy queries; the alias runs them in a container from this image. For example:

gnverifier -s 11 -j 1 --format compact --all_matches --capitalize "Pheidole sp. TZ01" | jq

to untangle what happened in the resolution algorithm and why it failed.

In this case, we get:

{
  "id": "d6c06faa-d170-513c-940e-fe785efb3415",
  "name": "Pheidole sp. TZ01",
  "cardinality": 0,
  "matchType": "Exact",
  "results": [
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "1321654",
      "outlink": "https://gbif.org/species/1321654",
      "entryDate": "2024-01-11",
      "sortScore": 8.630727526355619,
      "matchedNameID": "08355229-3832-5df8-9b7f-8a98a7416537",
      "matchedName": "Pheidole Westwood, 1839",
      "matchedCardinality": 1,
      "matchedCanonicalSimple": "Pheidole",
      "matchedCanonicalFull": "Pheidole",
      "currentRecordId": "1321654",
      "currentNameId": "08355229-3832-5df8-9b7f-8a98a7416537",
      "currentName": "Pheidole Westwood, 1839",
      "currentCardinality": 1,
      "currentCanonicalSimple": "Pheidole",
      "currentCanonicalFull": "Pheidole",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole",
      "classificationRanks": "kingdom|phylum|class|order|family|genus",
      "classificationIds": "1|54|216|1457|4342|1321654",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 0,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 1
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "11559931",
      "outlink": "https://gbif.org/species/11559931",
      "entryDate": "2024-01-11",
      "sortScore": 8.629125946856817,
      "matchedNameID": "9b66c191-984f-5f02-b544-7ad035a0e454",
      "matchedName": "Pheidole spec Santschi, 1933",
      "matchedCardinality": 1,
      "matchedCanonicalSimple": "Pheidole",
      "matchedCanonicalFull": "Pheidole",
      "currentRecordId": "11559931",
      "currentNameId": "9b66c191-984f-5f02-b544-7ad035a0e454",
      "currentName": "Pheidole spec Santschi, 1933",
      "currentCardinality": 1,
      "currentCanonicalSimple": "Pheidole",
      "currentCanonicalFull": "Pheidole",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole|Pheidole spec",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "1|54|216|1457|4342|1321654|11559931",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 0,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 0
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "12044081",
      "outlink": "https://gbif.org/species/12044081",
      "entryDate": "2024-01-11",
      "sortScore": 8.629125946856817,
      "matchedNameID": "814c92d6-ca07-5b70-ae38-1765e550c97a",
      "matchedName": "Pheidole spec Mayr, 1870",
      "matchedCardinality": 1,
      "matchedCanonicalSimple": "Pheidole",
      "matchedCanonicalFull": "Pheidole",
      "currentRecordId": "12044081",
      "currentNameId": "814c92d6-ca07-5b70-ae38-1765e550c97a",
      "currentName": "Pheidole spec Mayr, 1870",
      "currentCardinality": 1,
      "currentCanonicalSimple": "Pheidole",
      "currentCanonicalFull": "Pheidole",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole|Pheidole spec",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "1|54|216|1457|4342|1321654|12044081",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 0,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 0
      }
    }
  ],
  "curation": "AutoCurated"
}
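In the output above, every result has matchedCardinality 1 and the canonical form Pheidole, which suggests that only the genus part of the input was matched, even though the query was made at species rank. A compact view of each result can make this easier to see. The following is a minimal sketch that works on the gnverifier JSON shown above; the function name and the shape of the summary are illustrative assumptions, and the sample is a trimmed copy of the first two results.

```python
import json


def summarize_matches(verification):
    """Return one small summary dict per entry in a gnverifier result."""
    rows = []
    for result in verification.get("results", []):
        # classificationRanks and classificationPath are parallel,
        # pipe-delimited strings, so they can be zipped into a lineage.
        ranks = result["classificationRanks"].split("|")
        names = result["classificationPath"].split("|")
        rows.append({
            "matched_name": result["matchedName"],
            "matched_cardinality": result["matchedCardinality"],
            "sort_score": result["sortScore"],
            "deepest_rank": ranks[-1],
            "lineage": dict(zip(ranks, names)),
        })
    return rows


# Trimmed copy of the output above (first two results only).
sample = json.loads("""
{
  "name": "Pheidole sp. TZ01",
  "results": [
    {
      "matchedName": "Pheidole Westwood, 1839",
      "matchedCardinality": 1,
      "sortScore": 8.630727526355619,
      "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole",
      "classificationRanks": "kingdom|phylum|class|order|family|genus"
    },
    {
      "matchedName": "Pheidole spec Santschi, 1933",
      "matchedCardinality": 1,
      "sortScore": 8.629125946856817,
      "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole|Pheidole spec",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species"
    }
  ]
}
""")

for row in summarize_matches(sample):
    print(row["matched_name"], row["deepest_rank"], row["sort_score"])
```

Reading the deepest rank next to the matched name shows at a glance whether a result resolves to a genus or to a species-level record, which is the kind of detail to look at when deciding how a profile should handle the case.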
