Extending Profile Coverage
Matt Thompson edited this page Jun 30, 2025 · 2 revisions
The following is a basic workflow for updating profiles in TaxonoPy to improve case coverage. Note that it uses paths and resources specific to the Ohio Supercomputer Center, and modifications may be needed for other contexts.
- Execute TaxonoPy on a dataset:
taxonopy resolve \
--input /fs/ess/PAS2136/TreeOfLife/annotations/source_taxa/source=bioscan \
--output-dir ./temp_out \
--output-format parquet
By default, entries that cannot be resolved are forced to fall back on their input data.
- Scan outputs for final_status_force_accepted or final_status_failed_forced_input (counts of statuses by name should be available in resolution_stats.json in the output directory, in addition to the printed message). Note that in the output records the status appears in upper snake case, e.g. FAILED_FORCED_INPUT.
parquet cat temp_out/*.parquet | grep "FAILED_FORCED_INPUT" | head | jq
Example output (one record among others):
{
"uuid": "90a7dba8-702c-48f7-9f9b-b3aaa68009de",
"scientific_name": "Pheidole sp. TZ01",
"common_name": "",
"source_dataset": "bioscan",
"source_id": "GMNAZ5929-22",
"resolution_status": "FAILED_FORCED_INPUT",
"kingdom": "Animalia",
"phylum": "Arthropoda",
"class": "Insecta",
"order": "Hymenoptera",
"family": "Formicidae",
"genus": "Pheidole",
"species": "Pheidole sp. TZ01",
"resolution_path": "RESOLVED",
"resolution_strategy": "ForceFailedToInput",
"final_query_term": "Pheidole sp. TZ01",
"final_query_rank": "species",
"final_data_source_id": 11,
# ...
}
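The same scan can be sketched in Python once the records are available as dictionaries, for example one JSON object per line as produced by the command above. This is a minimal sketch and not part of TaxonoPy: all function names are hypothetical, and the mapping from stats-style names to status values assumes that the final_status_ prefix is dropped and the remainder upper-cased. That matches final_status_failed_forced_input and FAILED_FORCED_INPUT, but is otherwise unverified.

```python
import json
from collections import Counter

# Assumption: stats-style names map to record statuses by dropping this prefix.
STATS_PREFIX = "final_status_"


def stat_name_to_status(stat_name):
    """Convert e.g. 'final_status_failed_forced_input' to 'FAILED_FORCED_INPUT'."""
    if stat_name.startswith(STATS_PREFIX):
        stat_name = stat_name[len(STATS_PREFIX):]
    return stat_name.upper()


def parse_json_lines(text):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_statuses(records):
    """Tally records by their resolution_status field."""
    return Counter(r.get("resolution_status", "") for r in records)


def select_by_status(records, status):
    """Return only the records whose resolution_status equals status."""
    return [r for r in records if r.get("resolution_status") == status]


if __name__ == "__main__":
    # A single record, trimmed from the example output above.
    sample = json.dumps({
        "scientific_name": "Pheidole sp. TZ01",
        "source_dataset": "bioscan",
        "resolution_status": "FAILED_FORCED_INPUT",
    })
    records = parse_json_lines(sample)
    wanted = stat_name_to_status("final_status_failed_forced_input")
    print(count_statuses(records))
    for record in select_by_status(records, wanted):
        print(record["scientific_name"])
```

Tallying by status first gives a quick sense of how many cases need attention before any individual record is inspected.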
- Use the gnverifier CLI to investigate the case further. It can be installed using any method outlined in the gnverifier CLI installation instructions, or run from the Docker image. On HPC, an Apptainer image may be retrieved with:
apptainer pull /fs/ess/PAS2136/apps/gnverifier_latest.sif docker://gnames/gnverifier:latest
This is already done on OSC.
For use on OSC, in your ~/.bashrc, add:
alias gnverifier='apptainer --silent exec --bind /etc/ssl:/etc/ssl --bind /etc/pki:/etc/pki /fs/ess/PAS2136/apps/gnverifier_latest.sif gnverifier'
Then run source ~/.bashrc. The alias runs gnverifier inside the container from this image, so you can enter CLI commands that replicate TaxonoPy queries, such as:
gnverifier -s 11 -j 1 --format compact --all_matches --capitalize "Pheidole sp. TZ01" | jq
to untangle what happened in the resolution algorithm and why it failed.
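When several names need checking, the query above can be wrapped in a small Python helper. The sketch below is hypothetical and not part of TaxonoPy. It reuses only the flags shown in the command above, and it assumes that a gnverifier executable (or a wrapper script) is on PATH, because a shell alias is not visible to subprocess. It also assumes that the compact format prints a single JSON object for a single name.

```python
import json
import shutil
import subprocess


def build_gnverifier_command(name, data_source_id=11, jobs=1, executable="gnverifier"):
    """Build the argument list for the query shown above (flags copied from it)."""
    return [
        executable,
        "-s", str(data_source_id),
        "-j", str(jobs),
        "--format", "compact",
        "--all_matches",
        "--capitalize",
        name,
    ]


def run_gnverifier(name, **kwargs):
    """Run the query and parse stdout as JSON.

    Assumption: stdout holds exactly one JSON object for one input name.
    """
    cmd = build_gnverifier_command(name, **kwargs)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


if __name__ == "__main__":
    query = "Pheidole sp. TZ01"
    print(" ".join(build_gnverifier_command(query)))
    # Only attempt the call when an executable is actually available.
    if shutil.which("gnverifier"):
        response = run_gnverifier(query)
        print(response.get("matchType"))
```

Passing the arguments as a list avoids shell quoting problems with names that contain spaces or punctuation.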
In this case, we get:
{
"id": "d6c06faa-d170-513c-940e-fe785efb3415",
"name": "Pheidole sp. TZ01",
"cardinality": 0,
"matchType": "Exact",
"results": [
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "1321654",
"outlink": "https://gbif.org/species/1321654",
"entryDate": "2024-01-11",
"sortScore": 8.630727526355619,
"matchedNameID": "08355229-3832-5df8-9b7f-8a98a7416537",
"matchedName": "Pheidole Westwood, 1839",
"matchedCardinality": 1,
"matchedCanonicalSimple": "Pheidole",
"matchedCanonicalFull": "Pheidole",
"currentRecordId": "1321654",
"currentNameId": "08355229-3832-5df8-9b7f-8a98a7416537",
"currentName": "Pheidole Westwood, 1839",
"currentCardinality": 1,
"currentCanonicalSimple": "Pheidole",
"currentCanonicalFull": "Pheidole",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole",
"classificationRanks": "kingdom|phylum|class|order|family|genus",
"classificationIds": "1|54|216|1457|4342|1321654",
"editDistance": 0,
"stemEditDistance": 0,
"matchType": "Exact",
"scoreDetails": {
"cardinalityScore": 0,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 1,
"curatedDataScore": 0.33333334,
"authorMatchScore": 0.14285715,
"acceptedNameScore": 1,
"parsingQualityScore": 1
}
},
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "11559931",
"outlink": "https://gbif.org/species/11559931",
"entryDate": "2024-01-11",
"sortScore": 8.629125946856817,
"matchedNameID": "9b66c191-984f-5f02-b544-7ad035a0e454",
"matchedName": "Pheidole spec Santschi, 1933",
"matchedCardinality": 1,
"matchedCanonicalSimple": "Pheidole",
"matchedCanonicalFull": "Pheidole",
"currentRecordId": "11559931",
"currentNameId": "9b66c191-984f-5f02-b544-7ad035a0e454",
"currentName": "Pheidole spec Santschi, 1933",
"currentCardinality": 1,
"currentCanonicalSimple": "Pheidole",
"currentCanonicalFull": "Pheidole",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole|Pheidole spec",
"classificationRanks": "kingdom|phylum|class|order|family|genus|species",
"classificationIds": "1|54|216|1457|4342|1321654|11559931",
"editDistance": 0,
"stemEditDistance": 0,
"matchType": "Exact",
"scoreDetails": {
"cardinalityScore": 0,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 1,
"curatedDataScore": 0.33333334,
"authorMatchScore": 0.14285715,
"acceptedNameScore": 1,
"parsingQualityScore": 0
}
},
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "12044081",
"outlink": "https://gbif.org/species/12044081",
"entryDate": "2024-01-11",
"sortScore": 8.629125946856817,
"matchedNameID": "814c92d6-ca07-5b70-ae38-1765e550c97a",
"matchedName": "Pheidole spec Mayr, 1870",
"matchedCardinality": 1,
"matchedCanonicalSimple": "Pheidole",
"matchedCanonicalFull": "Pheidole",
"currentRecordId": "12044081",
"currentNameId": "814c92d6-ca07-5b70-ae38-1765e550c97a",
"currentName": "Pheidole spec Mayr, 1870",
"currentCardinality": 1,
"currentCanonicalSimple": "Pheidole",
"currentCanonicalFull": "Pheidole",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole|Pheidole spec",
"classificationRanks": "kingdom|phylum|class|order|family|genus|species",
"classificationIds": "1|54|216|1457|4342|1321654|12044081",
"editDistance": 0,
"stemEditDistance": 0,
"matchType": "Exact",
"scoreDetails": {
"cardinalityScore": 0,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 1,
"curatedDataScore": 0.33333334,
"authorMatchScore": 0.14285715,
"acceptedNameScore": 1,
"parsingQualityScore": 0
}
}
],
"curation": "AutoCurated"
}
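A response this long is easier to compare once each result is reduced to a few fields. The following Python sketch (hypothetical helper, field names taken from the output above) pairs classificationRanks with classificationPath and reports the deepest rank of each match. For this example it shows that every result has the canonical form Pheidole, with the first match stopping at genus. That may be a useful clue, though the sketch does not by itself explain why TaxonoPy's resolution failed.

```python
def summarize_matches(response):
    """Return one summary dict per entry in the response's 'results' list."""
    summaries = []
    for result in response.get("results", []):
        # Both fields are pipe-delimited and aligned position by position.
        ranks = [r for r in result.get("classificationRanks", "").split("|") if r]
        path = [p for p in result.get("classificationPath", "").split("|") if p]
        summaries.append({
            "matched_name": result.get("matchedName"),
            "matched_canonical": result.get("matchedCanonicalSimple"),
            "deepest_rank": ranks[-1] if ranks else None,
            "lineage": dict(zip(ranks, path)),
            "sort_score": result.get("sortScore"),
        })
    return summaries


if __name__ == "__main__":
    # Trimmed copy of the first two results shown above.
    response = {
        "name": "Pheidole sp. TZ01",
        "results": [
            {
                "matchedName": "Pheidole Westwood, 1839",
                "matchedCanonicalSimple": "Pheidole",
                "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole",
                "classificationRanks": "kingdom|phylum|class|order|family|genus",
                "sortScore": 8.630727526355619,
            },
            {
                "matchedName": "Pheidole spec Santschi, 1933",
                "matchedCanonicalSimple": "Pheidole",
                "classificationPath": "Animalia|Arthropoda|Insecta|Hymenoptera|Formicidae|Pheidole|Pheidole spec",
                "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
                "sortScore": 8.629125946856817,
            },
        ],
    }
    for row in summarize_matches(response):
        print(row["matched_name"], row["matched_canonical"], row["deepest_rank"])
```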