Adds bulk_ntr_workflow/ — a 4-stage pipeline for converting HRA ASCTB
unmapped terms into UBERON-ready ROBOT templates with QC reports.
Stages:
1. generate_template.py — reads xlsx/csv, classifies parent IDs
(UBERON/FMA/ASCTB-TEMP), assigns UBERON:99xxxxx NTR IDs, writes
initial template to src/templates/ and TSV reports to <name>-reports/
2. group_terms_by_parent.py — groups by parent for parallel subagents
3. ntr-term-researcher subagent — OLS4 existing-term check, Wikipedia
definition lookup, relationship inference, FMA+ASCTB-TEMP parent resolution
4. merge_definitions.py — merges subagent JSON back into template
Also adds:
- .claude/agents/ntr-term-researcher.md (Stage 3 subagent)
- .claude/agents/ontology-term-lookup.md (OLS4 helper subagent)
- .claude/skills/fetch-wiki-info/ (Wikipedia/Wikidata skill)
- .mcp.json (OLS4, playwright, artl-mcp servers)
- src/templates/hra-muscular.template.tsv (initial Stage 1 output, muscular system)
- src/templates/hra-muscular-reports/ (input, errors, candidates TSVs)
Signed-off-by: @dragon-ai-agent
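The parent-ID classification described for Stage 1 could look roughly like the sketch below. The function name, the regular expressions, and the "UNKNOWN" fallback are assumptions made for illustration; the actual logic in generate_template.py may differ.

```python
import re

# Hypothetical patterns for the three parent-ID types named in the commit
# message. They accept both CURIE form (UBERON:0001103) and IRI tails
# (.../UBERON_0001103, .../fma22424); this is an assumption, not the
# script's confirmed behaviour.
_PARENT_PATTERNS = [
    ("UBERON", re.compile(r"UBERON[:_]\d{7}$")),
    ("FMA", re.compile(r"(?:FMA[:_]|/fma)\d+$", re.IGNORECASE)),
    ("ASCTB-TEMP", re.compile(r"ASCTB-TEMP[:_]\S+$")),
]


def classify_parent_id(raw):
    """Return 'UBERON', 'FMA', 'ASCTB-TEMP', or 'UNKNOWN' for a parent ID."""
    value = raw.strip()
    for kind, pattern in _PARENT_PATTERNS:
        if pattern.search(value):
            return kind
    return "UNKNOWN"
```

Anything that falls through to "UNKNOWN" would be a natural candidate for the errors report rather than the template.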
Adds --limit arg to generate_template.py and runs the 4-stage pipeline on the
first 10 muscular-system terms from hra_unmapped-asct-term-list:
Stage 1: generated hra-muscular-test template + reports
Stage 2: 8 groups (all 3 parent-ID types + wrong parent cases)
Stage 3: 8 parallel subagents: OLS4 matching, Wikipedia lookup, definitions
Stage 4: merged into 6-row final template; 4 confirmed matches excluded
Key findings:
- 4 confirmed UBERON matches: ceratoglossus (UBERON:0008428), chondroglossus
(UBERON:0011535), costal diaphragm (UBERON:0035831), aryepiglotticus
(UBERON:0011025)
- FMA parents resolved: FMA:22424→UBERON:0004252, FMA:46583→UBERON:0008575
- Wrong parents corrected: UBERON:0000975 (sternum)→UBERON:0001103 (diaphragm);
meningohypophyseal artery→UBERON:0008242 (lower back muscle)
- All 6 new terms have content-rich Aristotelian definitions and resolved
relationship types (no PENDING/INFER remaining)
Also fixes merge_definitions.py to tolerate field-name variations in subagent
JSON outputs (ntr_label/label, matched_id/uberon_id, existing_term_match dict form).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
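A minimal sketch of tolerating the field-name variations listed above. The helper names and the assumption that the dict form of existing_term_match carries the matched ID under one of the same two keys are illustrative only; merge_definitions.py may handle these cases differently.

```python
def first_present(record, *keys, default=None):
    """Return the value of the first key that is present and non-empty."""
    for key in keys:
        value = record.get(key)
        if value not in (None, ""):
            return value
    return default


def normalise_result(record):
    """Map one subagent JSON record onto a fixed set of keys.

    Assumption: when 'existing_term_match' is a dict, the matched ID sits
    inside it under 'matched_id' or 'uberon_id'; otherwise the ID is looked
    up at the top level of the record.
    """
    matched_id = None
    match = record.get("existing_term_match")
    if isinstance(match, dict):
        matched_id = first_present(match, "matched_id", "uberon_id")
    if matched_id is None:
        matched_id = first_present(record, "matched_id", "uberon_id")
    return {
        "label": first_present(record, "ntr_label", "label", default=""),
        "matched_id": matched_id,
    }
```

Normalising once at the point of reading keeps the rest of the merge step working against a single record shape.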
Adds a new 'xref' column (A oboInOwl:hasDbXref SPLIT=|) to the ROBOT template
for direct term annotations, distinct from def_xref which annotates only the
definition.
The column holds pipe-separated values populated by the subagent:
Wikipedia:Article_Title (article page for the specific term)
FMA:NNNNN (FMA ID from OLS4 cross-references)
Changes:
- generate_template.py: new TEMPLATE_HEADERS/DIRECTIVES entry + blank cell
- merge_definitions.py: reads 'xrefs' key from subagent JSON; COL_TERMREF=11
- ntr-term-researcher.md: documents when/how to populate xrefs output key
- CLAUDE.md: adds xref row to column reference table
Xrefs column is empty in the current test run; subagents on future groups will
populate it once they record Wikipedia article URLs + FMA IDs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a term's own IRI in the source table is an FMA IRI, Stage 1 now pre-populates the xref column with the extracted FMA:NNNNN curie rather than leaving it to the subagent alone. Stage 4 merge appends subagent-provided xrefs (Wikipedia URL + FMA IDs found via OLS4) to the pre-existing cell value, deduplicating by pipe-segment, so neither source overwrites the other. This covers the case where future input files have FMA IRIs in the 'as' column. The current ASCTB-TEMP-only dataset produces empty xref cells in Stage 1 as before; subagent xrefs continue to be added in Stage 4. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
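The append-and-deduplicate behaviour described above can be sketched as follows. The function name and signature are hypothetical, and the FMA ID in the usage line is a placeholder; the real Stage 4 code may be organised differently.

```python
def merge_xrefs(existing_cell, new_xrefs):
    """Append new xrefs to a pipe-separated cell, dropping duplicates.

    Existing segments keep their position, so a Stage 1 value such as a
    pre-populated FMA curie is never overwritten or reordered.
    """
    merged = []
    seen = set()
    for segment in [*existing_cell.split("|"), *new_xrefs]:
        segment = segment.strip()
        if segment and segment not in seen:
            seen.add(segment)
            merged.append(segment)
    return "|".join(merged)
```

For example, `merge_xrefs("FMA:12345", ["Wikipedia:Article_Title", "FMA:12345"])` keeps the existing FMA entry once and appends the Wikipedia entry after it.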
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviews all 75 muscular-system terms; classifies ~22 as grouping terms and ~49 as leaf nodes using latent knowledge. Documents linguistic cues, borderline cases, required workflow changes (term_type column, subagent behaviour split), draft rule set, and Phases 3-5. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Processes ovarycsv.csv via the bulk NTR workflow:
- Stage 1: generates hra-ovary.template.tsv from 13 ASCTB-TEMP ovary terms
- Stage 3: 9 subagent groups resolve definitions, correct wrong parent IDs
(source CSV had anatomically wrong parent IDs, bones for follicle terms;
subagents found correct UBERON parents via OLS4 and biological context)
- Stage 4: merges definitions; 0 PENDING, 0 INFER remaining
New terms (UBERON:9900100–9900112): corona radiata, corpus luteum granulosa
lutein/theca layers, cumulus oophorus oocyte complex,
dominant/early/pre-selection/selection antral follicles, hemorrhagic
anovulatory follicle, luteinized unruptured follicle,
multilayer/transitional primary/primordial ovarian follicles
Also:
- Fix generate_template.py to handle UTF-8 BOM in CSV input (utf-8-sig)
- Add hra-ovary-prefixes.owl (declares foaf:depiction for Wikipedia images)
- Register hra_ovary.owl in uberon-odk.yaml and uberon.Makefile
Curator notes (see candidates.tsv):
- corona radiata: label collision with UBERON:0004682 synonym; consider
"corona radiata of cumulus oophorus" as primary label
- cumulus oophorus oocyte complex: possible overlap with UBERON:0001306
- multilayer ovarian follicle: possible overlap with UBERON:0000036
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> @dragon-ai-agent
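The UTF-8 BOM fix mentioned above relies on Python's utf-8-sig codec, which strips a leading byte-order mark if one is present and otherwise behaves like plain UTF-8. A minimal sketch, with a hypothetical function name and made-up column names:

```python
import csv
import io


def read_rows(raw_bytes):
    """Parse CSV bytes into dict rows, ignoring a leading UTF-8 BOM."""
    # With plain "utf-8" the BOM would be glued onto the first header name
    # (e.g. "\ufeffname"), so lookups by that column name would fail.
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))
```

When reading from disk, the same effect comes from `open(path, newline="", encoding="utf-8-sig")`.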
Source input file from HRA ASCTB unmapped terms for the ovary table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> @dragon-ai-agent
Claude finished @dosumis's task in 5m 33s
PR Review: Add HRA Ovary NTR
Review checklist:
Summary
This PR adds 13 new ovary NTR terms (UBERON:9900100–9900112) via a ROBOT template, together with the bulk NTR workflow tooling used to generate them. The workflow approach is well-structured, but there are two critical ontological errors in the final template that must be fixed before merge, plus several important hierarchy and metadata issues.
🔴 CRITICAL
1. Corpus luteum layers classified as types of corpus luteum (
Fix: move
🟡 IMPORTANT
2. The COC is a structure that resides within an antral follicle; it is not a type of ovarian follicle (UBERON:0001305). The correct relationship is
3. Missing real def_xrefs for three terms
4. Parent is
5. Parent is
6. Pathological terms may not belong in UBERON
7. Non-standard term name: The standard anatomical term is dominant antral follicle, not "dominance". The source HRA data used "dominance" but UBERON should normalise to the standard form; "dominance antral follicle" can be kept as a synonym if needed.
🔵 SUGGESTION
8.
9. This source file (
10. Adding
11. Intermediate workflow artifacts in
This directory contains many per-group JSON files that are processing intermediates. Consider adding
12. Missing trailing newline in
The file still ends without a trailing newline after the PR's additions.
Merge recommendation
🔴 Request changes: the two corpus-luteum-layer hierarchy errors (#1) and the COC hierarchy error (#2) are logical axiom mistakes that will produce incorrect OWL class assertions once the template is run, and the missing real references (#3) violate UBERON metadata policy. These must be fixed before the template is finalised.
This PR adds 13 new HRA ovary NTR terms and bulk NTR workflow tooling. There are two critical ontological errors that must be fixed before merge:
- Corpus luteum layers in wrong relationship (rows 4–5 of hra-ovary.template.tsv): UBERON:9900101 and UBERON:9900102 use is_a UBERON:0002512 (corpus luteum) but should use part_of UBERON:0002512; a layer is part of the corpus luteum, not a type of it.
- Cumulus oophorus oocyte complex hierarchy (row 6): UBERON:9900103 is classified is_a UBERON:0001305 (ovarian follicle) but the COC is part of a follicle, not a type of follicle. Should be part_of UBERON:0001305.
Additionally: three terms lack real definition xrefs (only have ASCTB-TEMP placeholders); transitional primary ovarian follicle should use UBERON:0000035 as parent rather than the generic UBERON:0001305; dominance antral follicle should be renamed to dominant antral follicle; and two terms (hemorrhagic anovulatory follicle, luteinized unruptured follicle) describe pathological states that may not belong in a normal-anatomy ontology.
See full review in the PR comment for details on all issues.
Testing agentic workflow for adding term bulk requests from HRA as ROBOT templated terms.