Add hra muscular ntr by dosumis · Pull Request #3700 · obophenotype/uberon

dosumis · 2026-04-28T09:49:29Z

No description provided.

Four-stage pipeline for generating UBERON new term request ROBOT templates from HRA ASCTB unmapped term tables:
Stage 1 (generate_template.py): reads xlsx/csv input, classifies parent IDs (UBERON/FMA/ASCTB-TEMP), assigns UBERON:99xxxxx provisional IDs, writes initial ROBOT template TSV + error and candidate reports.
Stage 2 (group_terms_by_parent.py): groups template rows by parent and writes per-group JSON files for parallel subagent processing.
Stage 3 (ntr-term-researcher agent): resolves FMA/ASCTB-TEMP parents via OLS4, checks for existing UBERON matches, writes Aristotelian definitions from Wikipedia, resolves is_a vs part_of relationship types.
Stage 4 (merge_definitions.py): merges subagent outputs back into the template; appends confirmed/possible OLS4 matches to candidates report.
Template columns: ID, LABEL, Definition, def_xref (definition annotation), is_a, part_of, In_subset, Date, Contributor, Present_in_taxon, Wikipedia_image (foaf:depiction), xref (direct oboInOwl:hasDbXref for Wikipedia article URL + FMA ID).
Supporting agents/skills:
- ntr-term-researcher: Stage 3 subagent spec
- ontology-term-lookup: OLS4 structured search
- fetch-wiki-info: Wikidata + Wikipedia lookup
- .mcp.json: ols4, artl-mcp, playwright MCP servers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@dragon-ai-agent

… plans
Phases covered:
- Phase 2: grouping vs leaf-node term distinction (linguistic rules, subagent behaviour)
- Phase 3: detect UBERON label-ID mismatches in Stage 1; new WRONG_PARENT: placeholder; multi-valued parent column splitting; subagent protocol for mismatch correction (informed by ovary run where 7/13 terms had wrong-domain UBERON parent IDs silently accepted)
- Phase 4: scale to full muscular-system table
- Phase 5: generalise to other ASCTB anatomical systems
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dragon-ai-agent

merge_definitions.py:
- Fallback path (parent resolved but rel type unknown) now leaves both is_a and part_of blank rather than double-setting them, and lists affected term labels in the summary output under 'Relationship unresolved' for curator attention
- Remove dead 'if jf.parent.name == "input"' guard: glob never matches files in subdirectories
generate_template.py:
- Remove dead write_tsv call with doubled headers that was immediately overwritten by the block below it
- Fix counter order: use counter for ID, then increment (was: increment then use counter-1)
- Remove hardcoded CONTRIBUTOR_IRI constant; add --contributor CLI arg with ORCID format validation; prompts interactively if not supplied
group_terms_by_parent.py:
- Remove derive_wikipedia_urls call and wikipedia_urls field from output JSON: parent_label is always "" so the call always returned []; the subagent derives Wikipedia URLs independently during lookup
ntr-term-researcher.md:
- Clarify that Wikipedia article page URL (not image URL) goes in xrefs at point of successful lookup, as Wikipedia:Article_Title
- Add image relevance check: verify caption/alt text confirms the image illustrates the target structure before storing it
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
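
The counter-order fix and the --contributor validation in generate_template.py can be sketched roughly as below. This is an illustration only, not the code in the PR: the names `provisional_ids`, `validate_orcid` and `ORCID_RE`, the start value, and the exact regular expression (which assumes contributors are given as full https://orcid.org/ IRIs) are all assumptions. Only the UBERON:99xxxxx ID shape and the "use the counter, then increment" order come from the commit message.

```python
import itertools
import re

# Assumed ORCID IRI shape: four hyphen-separated groups of four characters,
# where the last character may be the checksum letter X.
ORCID_RE = re.compile(r"^https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def validate_orcid(value: str) -> str:
    """Return the value unchanged if it looks like an ORCID IRI, else raise."""
    if not ORCID_RE.match(value):
        raise ValueError(f"not an ORCID IRI: {value!r}")
    return value


def provisional_ids(start: int = 1):
    """Yield UBERON:99xxxxx IDs; each counter value is used first, then incremented."""
    for n in itertools.count(start):
        yield f"UBERON:99{n:05d}"


ids = provisional_ids()
first, second = next(ids), next(ids)
```

Using a generator keeps "use" and "increment" in a single place, which avoids the off-by-one pattern (increment, then use counter-1) that the commit removes.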

…name flagging
Addresses issues found in the ovary branch test run where the agent:
- classified layers (corpus luteum granulosa lutein/theca) as is_a parents (should be part_of)
- accepted source-provided broad parents instead of finding more specific ones
- left ASCTB-TEMP placeholders as the only def_xref (no real PMIDs)
- did not flag pathological terms (hemorrhagic, luteinized unruptured) as out of scope
- did not normalise non-standard names ('dominance' instead of 'dominant')
ntr-term-researcher.md changes:
- Step 1 expanded: after confirming source parent, agent must search OLS4 for a more specific parent (e.g. primary/secondary ovarian follicle vs generic ovarian follicle)
- New Step 3: scope check (pathological/dysfunctional → out_of_scope) and name check (non-standard → name_corrections with curator-reviewable suggestion)
- New Step 5: literature search; must find at least one real PMID/DOI for def_xref; ASCTB-TEMP placeholders explicitly disallowed as the only reference
- Step 7 (relationship resolution) rewritten with explicit structural vocabulary:
  - layers, zones, heads, bellies, parts, compartments, walls → ALWAYS part_of
  - subtypes/stages/members of grouping classes → is_a
  - Quick test ('is a kind of' vs 'is part of') with worked examples
- Output JSON adds: def_xrefs_to_add, out_of_scope, name_corrections keys
- Quality checks expanded with explicit rules for layers, pathology, naming
merge_definitions.py changes:
- Refactored load_subagent_outputs to return single dict (less argument tuple churn)
- New behaviour: out_of_scope terms excluded from template (not just confirmed_matches); written to <name>-reports/out_of_scope.tsv for curator review
- New behaviour: name_corrections applied to LABEL column; original-source mapping written to <name>-reports/name_corrections.tsv
- New behaviour: def_xrefs_to_add appended to def_xref column with deduplication
- Lookup helper accepts both source and corrected labels (agent may key by either)
- Summary output extended with new counters
CLAUDE.md changes:
- Stage 3 description updated to enumerate the new agent responsibilities
- QC checklist extended: real def_xref required, layer/part_of rule, out_of_scope and name_corrections review steps
- Output Files Reference adds the two new report files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
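
The Step 7 structural-vocabulary rule is written as prose for the agent, but its core can be approximated with a keyword check. The sketch below is an assumption-laden illustration: `PART_WORDS`, `PART_RE` and `guess_relationship` are hypothetical names, and a real resolver would also need the 'is a kind of' vs 'is part of' test and curator review.

```python
import re

# Structural nouns that the Step 7 rule maps to part_of (from the commit message).
PART_WORDS = ("layer", "zone", "head", "belly", "bellies", "part", "compartment", "wall")
PART_RE = re.compile(r"\b(" + "|".join(PART_WORDS) + r")s?\b", re.IGNORECASE)


def guess_relationship(label: str) -> str:
    """Return 'part_of' for subdivision-style labels, otherwise 'is_a'."""
    return "part_of" if PART_RE.search(label) else "is_a"
```

For example, `guess_relationship("clavicular head of pectoralis major muscle")` gives `part_of`, while a label with no structural noun falls through to `is_a`.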

Surveyed 19 existing UBERON 'muscle of X' terms. 14 (74%) use the simple 'genus + part_of some Y' pattern with UBERON:0014892 (skeletal muscle organ, vertebrate) as genus. 3 use attaches_to_part_of, 2 lack logical definition. Decision gate passed: simple part_of pattern covers majority of existing convention. Phase 2 implementation will support genus + part_of only; attaches_to_part_of, innervated_by, and multi-axiom patterns deferred to future phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ates
generate_template.py now classifies each input row as 'leaf' or 'group' using linguistic regex rules (GROUP_PATTERNS / LEAF_PART_PATTERNS in classify_term_type).
- Leaf rows go to <name>.template.tsv with SC/part_of directives (existing)
- Group rows go to <name>-groups.template.tsv with EC genus + EC part_of some location directives (new); genus and location columns left blank for the agent to fill
input.tsv gains a term_type column so curators can see the classification.
Smoke-tested on muscular-system: 20 group / 55 leaf rows out of 75 input terms, matching ROADMAP prediction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
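
A regex-based leaf/group classifier of the kind described might look like the sketch below. The function and constant names come from the commit message, but the pattern contents are invented for illustration (the real patterns are not shown on this page), as is the choice to let leaf-part patterns take precedence.

```python
import re

# Hypothetical patterns; the actual GROUP_PATTERNS / LEAF_PART_PATTERNS may differ.
LEAF_PART_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\b(head|belly|part|layer) of\b",   # named subdivision of a single structure
)]
GROUP_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\bmuscles\b",                      # plural label
    r"\bmuscle group\b",
    r"\bmuscle of\b",                    # 'muscle of X' collective label
)]


def classify_term_type(label: str) -> str:
    """Return 'group' or 'leaf'; leaf-part patterns are checked first."""
    if any(p.search(label) for p in LEAF_PART_PATTERNS):
        return "leaf"
    if any(p.search(label) for p in GROUP_PATTERNS):
        return "group"
    return "leaf"  # default when nothing matches
```

Checking the leaf-part patterns first means a label such as "clavicular head of pectoralis major muscle" is treated as a leaf even though it also names a muscle.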

group_terms_by_parent.py now reads both template_initial.tsv and template_groups_initial.tsv. Leaf rows are grouped by parent UBERON ID as before. Grouping rows are pooled into a single 'grouping_terms' bucket since their genus + location values are agent-determined per term, not shared by a common parent. Each per-term entry includes term_type ('leaf' or 'group'). Each per-group JSON has a term_counts summary so curators can see the leaf/group split. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
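
The bucketing described above can be sketched as follows. The row keys (`label`, `parent`, `term_type`), the output key `terms`, and the function name `group_rows` are assumptions made for the example; only the 'grouping_terms' bucket name and the term_counts summary come from the commit message.

```python
from collections import Counter, defaultdict


def group_rows(rows):
    """Group leaf rows by parent ID; pool all group rows into one bucket."""
    buckets = defaultdict(list)
    for row in rows:
        key = "grouping_terms" if row["term_type"] == "group" else row["parent"]
        buckets[key].append(row)
    # Attach a leaf/group count summary to each bucket.
    return {
        key: {
            "terms": terms,
            "term_counts": dict(Counter(t["term_type"] for t in terms)),
        }
        for key, terms in buckets.items()
    }


rows = [
    {"label": "term A", "parent": "UBERON:0002381", "term_type": "leaf"},
    {"label": "term B", "parent": "UBERON:0002381", "term_type": "leaf"},
    {"label": "term C", "parent": "", "term_type": "group"},
]
grouped = group_rows(rows)
```

Each value in `grouped` would then be written to its own per-group JSON file for a subagent to process.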

merge_definitions.py now merges subagent outputs into both the leaf and groups templates. Common fields (definitions, images, xrefs, def_xrefs) are applied identically; logic columns differ:
- Leaf template: resolved_relationships -> is_a/part_of (existing)
- Groups template: group_template_rows[label] -> {genus, location} populates the EC genus and EC part_of some location columns
Group rows missing the agent's genus+location output are flagged 'EC incomplete' in the summary so curators can investigate.
New report: manual_curation.tsv lists group terms the agent punted (couldn't fit the simple genus + part_of some Y pattern); includes proposed definition, reason, and similar UBERON terms found via obo-grep for curator reference.
Refactored row processing into _apply_common_fields helper plus per-template merge functions (merge_leaf_template, merge_groups_template) so the two templates share definition/xref/image logic without duplication.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
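
As a rough sketch of the groups-template merge and the 'EC incomplete' flag: the function name `merge_groups_template` and the `group_template_rows` key come from the commit message, while the signature, the row keys (`LABEL`, `genus`, `location`) and the return shape are assumptions. The location ID in the example is a placeholder, not a real term.

```python
def merge_groups_template(rows, group_template_rows):
    """Fill genus/location from agent output; return (rows, labels lacking output)."""
    incomplete = []
    for row in rows:
        agent = group_template_rows.get(row["LABEL"], {})
        genus, location = agent.get("genus", ""), agent.get("location", "")
        if genus and location:
            row["genus"], row["location"] = genus, location
        else:
            incomplete.append(row["LABEL"])  # would be reported as 'EC incomplete'
    return rows, incomplete


template_rows = [
    {"LABEL": "group term A", "genus": "", "location": ""},
    {"LABEL": "group term B", "genus": "", "location": ""},
]
agent_output = {
    # UBERON:0014892 is the genus named elsewhere in this PR; the location is a placeholder.
    "group term A": {"genus": "UBERON:0014892", "location": "UBERON:XXXXXXX"},
}
merged, incomplete = merge_groups_template(template_rows, agent_output)
```

Requiring both genus and location before writing either keeps a half-filled equivalence axiom out of the template.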

…ia obo-grep
ntr-term-researcher.md updated to handle the leaf/group split introduced by Stage 1 pre-classification:
- New top-of-file 'Term types' paragraph explaining the leaf vs group split
- Input section documents term_type field, term_counts, GROUPING_TERMS bucket
- Step 6 (Write Definitions) now branches: leaf gets Aristotelian form, group gets collective form ('A group of muscles that...')
- Step 7 (Resolve Relationship Types) explicitly LEAF-only
- New Step 8 for GROUP terms: use awk over uberon-edit.obo to find similar group terms; if they use 'genus + part_of some Y' pattern, populate group_template_rows[label] with {genus, location}; otherwise punt to manual_curation with similar UBERON stanzas as curator reference
- Output JSON gains group_template_rows and manual_curation keys
- Quality checks updated: every group term must end up in either group_template_rows OR manual_curation
- Tools section notes obo-grep.pl may not be in PATH; awk fallback documented
CLAUDE.md updated with the dual-template flow:
- Stage 1 documents the term_type pre-classification
- Stage 3 enumerates the new agent responsibilities (steps 8 and 9)
- QC checklist split: shared / leaf-template / groups-template / reports
- Final Delivery registers both templates in uberon-odk.yaml
- Output Files Reference includes new groups template + manual_curation.tsv
- Column reference table now has separate sections for leaf and groups
ROADMAP marks Phase 2 implementation complete (pending end-to-end agent test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

10 input terms processed by the new dual-template flow:
- Stage 1: pre-classified into 8 leaf + 2 group rows
- Stage 2: 8 group JSONs (7 leaf parent groups + 1 grouping_terms bucket)
- Stage 3: 8 ntr-term-researcher agents (3 needed retry due to API stream timeouts; grouping_terms handled inline after retry stalled)
- Stage 4: dual-template merge
Final output:
- 4 NEW leaf terms with Aristotelian definitions, real PMID/DOI refs, resolved is_a/part_of, Wikipedia + FMA xrefs
- 0 NEW group terms: both 'anterior vertebral muscle' and 'circular pharyngeal muscle' resolved to existing UBERON terms via Step 2 OLS4 check, validating that confirmed_match takes precedence over Step 8 EC pattern detection
- 5 confirmed_match (5 of 10 input terms already exist in UBERON):
  - ceratoglossus → UBERON:0008428
  - chondroglossus muscle → UBERON:0011535
  - costal diaphragm → UBERON:0035831
  - aryepiglotticus muscle → UBERON:0011025
  - constrictor muscle of pharynx → UBERON:0001569 (NEW from group flow)
- 1 possible_match: anterior vertebral muscle ≈ UBERON:0008549 (prevertebral muscle), flagged for curator decision (synonym vs new term)
- 1 manual_curation: anterior vertebral muscle; UBERON's similar term uses attaches_to + innervated_by axioms, not the simple genus + part_of pattern, so out of EC template scope
The test exercises: leaf+INFER parent, leaf+NEEDS_MAPPING:FMA parent, leaf+UNRESOLVABLE parent (including wrong-domain), group→confirmed_match, group→possible_match+manual_curation.
All paths working as designed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ai4c-agent · 2026-04-28T09:50:14Z

Claude finished @dosumis's task in 5m 44s

PR Review: Add HRA Muscular NTR

Review checklist:

Hierarchy and parents are consistent
Definitions and supporting references look adequate
Obsolete or replacement handling is safe
OBO and ODK conventions appear preserved
No obvious CI, auth, or workflow regression

Summary

This PR adds ROBOT template infrastructure and four new NTR (New Term Request) leaf terms for HRA (Human Reference Atlas) muscular-system structures, along with a bulk NTR workflow toolkit. The term-level research appears careful and the definition quality is high. However, there are two issues that should be addressed before merge, and several important follow-on items.

🔴 CRITICAL

1. New templates not registered in ODK config — they will never compile

src/templates/hra-muscular.template.tsv and src/templates/hra-muscular-groups.template.tsv are added, but neither appears in src/ontology/uberon-odk.yaml nor has a corresponding rule in src/ontology/uberon.Makefile.

The precedent is hra-skeleton.template.tsv, which is listed under use_template: true in uberon-odk.yaml and has a custom $(COMPONENTSDIR)/hra_skeleton.owl rule in uberon.Makefile. Without equivalent registration the four NTR terms will never be built into the ontology.

2. dorsal part of intertransversarii laterales lumborum muscle — relationship type mismatch between workflow output and template

The definition research output (bulk_ntr_workflow/outputs/definitions/unresolvable_meningohypophyseal_artery.json) explicitly says:

"Note that 'dorsal part of' implies a subdivision, hence part_of relationship rather than is_a."

and sets resolved_relationships: "part_of". Yet src/templates/hra-muscular.template.tsv (row 6, UBERON:9900010) places UBERON:0008242 in the is_a column, not the part_of column. The research output and the final template are in direct conflict. Curator should decide the correct relationship and make the template consistent.

🟡 IMPORTANT

3. clavicular head of pectoralis major muscle (UBERON:9900008) — no is_a in template

Row 5 of hra-muscular.template.tsv has an empty is_a column and only part_of UBERON:0002381. OBO format itself does not reject a term without is_a, but Uberon convention expects every non-root term to have at least one asserted is_a parent (or a logical definition from which the reasoner can infer one). As it stands, the template produces a term whose only axiom is SubClassOf BFO:0000050 some UBERON:0002381 (part_of some pectoralis major), with no named superclass. A parent such as "muscle head" (if such a class exists or should be created), or at minimum a generic skeletal muscle parent, should be added.

4. ASCTB-TEMP IRIs in the def_xref column

All four terms carry https://purl.org/ccf/ASCTB-TEMP_<label> in the def_xref column (the column mapped to oboInOwl:hasDbXref on the definition). These are ephemeral provenance tracking IRIs from the HRA ASCT+B tables, not stable literature citations. They should be moved to the general xref column (or dropped) so they are not asserted as definition sources.

For example, anterior cervical intertransversarii muscle cites:

https://fipat.library.dal.ca/ta2/|ISBN:9780323393225|https://purl.org/ccf/ASCTB-TEMP_anterior-cervical-intertransversarii-muscle|PMID:12650404

The ASCTB-TEMP IRI is the third token in the definition xref list.
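
One way to separate the provenance IRIs from the literature citations is to split the pipe-delimited cell and partition on the ASCTB-TEMP prefix. The helper name `split_def_xrefs` and its signature are hypothetical; the input string is the def_xref value quoted above.

```python
def split_def_xrefs(def_xref: str, temp_prefix: str = "https://purl.org/ccf/ASCTB-TEMP_"):
    """Split a pipe-delimited def_xref cell into (citations, provenance IRIs)."""
    tokens = [t for t in def_xref.split("|") if t]
    citations = [t for t in tokens if not t.startswith(temp_prefix)]
    provenance = [t for t in tokens if t.startswith(temp_prefix)]
    return "|".join(citations), "|".join(provenance)


cell = (
    "https://fipat.library.dal.ca/ta2/|ISBN:9780323393225|"
    "https://purl.org/ccf/ASCTB-TEMP_anterior-cervical-intertransversarii-muscle|PMID:12650404"
)
kept, moved = split_def_xrefs(cell)
```

Here `kept` would stay in the def_xref column, and `moved` could be appended to the general xref column or dropped, depending on the curator's decision.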

5. Missing term_tracker_item in both templates

Per CLAUDE.md and Uberon conventions, terms should link back to the originating GitHub issue with a term_tracker_item property. Neither template has a column for this. The hra-skeleton precedent and general UBERON practice require it for traceable provenance.

6. hra-muscular-reports/ placed under src/templates/

src/templates/hra-muscular-reports/ contains pipeline outputs (candidates.tsv, errors.tsv, input.tsv, manual_curation.tsv), not template files. These will confuse any tool that iterates src/templates/*.tsv or src/templates/**/*.tsv. Reports should live in bulk_ntr_workflow/outputs/ or similar, not inside the template directory.

🔵 SUGGESTIONS

7. Binary Excel file in repo root

hra_unmapped-asct-term-list-with-refs.xlsx (277 KB) is added to the repository root. Binary blobs in a git repo grow the clone size permanently. If this is source data, consider storing it under src/datasets/ with a brief README, or keeping it outside git (linked from the issue tracker).

8. ISBN:9780323393225 as a def_xref

ISBN references are non-standard in OBO/UBERON; PMID: and doi: are the expected citation forms. The TA2 URL (https://fipat.library.dal.ca/ta2/) is already included and serves as the Terminologia Anatomica reference. The ISBN could be removed from def_xref or moved to a comment.

9. FMA parent for articularis genu (UBERON:0004252) is very broad

FMA:22424 "Muscle of anterior compartment of thigh" has no UBERON equivalent, so the workflow fell back to UBERON:0004252 "hindlimb stylopod muscle". The fallback is documented in the definition output and the errors report, but UBERON:0004252 is a very coarse grouping. The definition output itself suggests considering UBERON:0001377 (quadriceps femoris) and explicitly notes that "articularis genu is anatomically discrete from the quadriceps". Curator should confirm the fallback is acceptable or note it for a follow-up parent term request.

10. @playwright/mcp@latest in .mcp.json is a floating version

Using @latest means the Playwright MCP server version is not pinned. Different contributors will pull different versions, which can affect reproducibility of the research workflow. Pinning to a specific semver is safer for shared infrastructure.

11. bulk_ntr_workflow/ adds substantial non-ontology content

The PR adds 1,700+ lines of Python scripts, JSON outputs, and documentation for the bulk NTR pipeline. This infrastructure is valuable but may belong in a separate utilities repository or be .gitignored when done. If it stays, the bulk_ntr_workflow/CLAUDE.md should be moved to somewhere that does not accidentally extend the project-level CLAUDE.md scope.

Per-term verdict

| NTR ID | Label | Parent | Status |
| --- | --- | --- | --- |
| UBERON:9900001 | anterior cervical intertransversarii muscle | `is_a` UBERON:0002324 (muscle of back) | ✅ Reasonable; parent broad but defensible |
| UBERON:9900003 | articularis genu muscle | `is_a` UBERON:0004252 (hindlimb stylopod muscle) | 🟡 Fallback parent; FMA:22437 xref looks correct |
| UBERON:9900008 | clavicular head of pectoralis major muscle | `part_of` UBERON:0002381 (pectoralis major) | 🔴 No `is_a` |
| UBERON:9900010 | dorsal part of intertransversarii laterales lumborum muscle | template: `is_a` UBERON:0008242; workflow output: `part_of` UBERON:0008242 | 🔴 Relationship type conflict |

Confirmed matches correctly excluded from template: aryepiglottic part of oblique arytenoid muscle (→ UBERON:0011025), costal part of respiratory diaphragm muscle (→ UBERON:0035831), ceratoglossus of hyoglossus muscle (→ UBERON:0008428), chondroglossus of hyoglossus muscle (→ UBERON:0011535), circular pharyngeal muscle (→ UBERON:0001569).

The term research quality and definition writing are solid. The main blockers are the missing ODK registration (templates won't compile without it) and the relationship-type inconsistency on UBERON:9900010. Addressing the missing is_a on UBERON:9900008 and the ASCTB-TEMP def_xrefs would bring the templates into line with Uberon conventions.

ai4c-agent

Two blocking issues before merge:

Templates not wired into the build — hra-muscular.template.tsv and hra-muscular-groups.template.tsv are not registered in uberon-odk.yaml or uberon.Makefile. Without equivalent entries to the hra-skeleton precedent these terms will never compile into the ontology.
Relationship-type conflict on UBERON:9900010 — the definition workflow output (unresolvable_meningohypophyseal_artery.json) explicitly recommends part_of for dorsal part of intertransversarii laterales lumborum muscle but the template uses is_a. These need to be reconciled.

Also flagged as important (see full comment):

UBERON:9900008 (clavicular head) has no is_a, only part_of
ASCTB-TEMP IRIs in the def_xref column should be in xref
term_tracker_item missing from both templates
Report files under src/templates/hra-muscular-reports/ should not live inside the template directory

ai4c-agent

Two blocking issues before merge. 1) Templates not wired into the build: hra-muscular.template.tsv and hra-muscular-groups.template.tsv are not registered in uberon-odk.yaml or uberon.Makefile. 2) Relationship-type conflict on UBERON:9900010: workflow output says part_of but template uses is_a. See full review comment for details.

dosumis and others added 10 commits April 27, 2026 15:01

ai4c-agent Bot requested changes Apr 28, 2026

dosumis wants to merge 10 commits into master from add-hra-muscular-ntr
