
NEW: add report to extract-reads enabling interrogation of why reads were/were not extracted #222

Open
gregcaporaso wants to merge 5 commits into qiime2:dev from gregcaporaso:add-extract-reads-report

@gregcaporaso
Member

Replacing #221.

Adds a second output `read_extraction_stats` (ImmutableMetadata) to the
`extract_reads` action. The new artifact is a per-input-sequence TSV
reporting: extraction outcome, match orientation and method (exact vs
approximate), forward/reverse primer binding positions in the
matched-orientation sequence, per-primer match percentages, amplicon
length before and after trimming, and input sequence length.
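
For reviewers, here is a minimal sketch of how a per-sequence table like this might be interrogated once exported as TSV. The column names (`extraction_outcome`, `match_orientation`, `match_method`, `input_length`) and the example rows are assumptions made up for illustration; they are not the actual schema or output of `read_extraction_stats`.

```python
import csv
import io
from collections import Counter

# Hypothetical TSV content. Column names and values are illustrative
# assumptions, not the real schema written by extract_reads.
EXAMPLE_TSV = (
    "id\textraction_outcome\tmatch_orientation\tmatch_method\tinput_length\n"
    "seq1\textracted\tforward\texact\t1500\n"
    "seq2\textracted\treverse\tapproximate\t1480\n"
    "seq3\tnot_extracted\t\t\t900\n"
)


def summarize_outcomes(tsv_text):
    """Count rows per (outcome, method) pair in a per-sequence stats table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(
        # Unextracted rows have no match method in this toy table.
        (row["extraction_outcome"], row["match_method"] or "n/a")
        for row in reader
    )


summary = summarize_outcomes(EXAMPLE_TSV)
for (outcome, method), count in sorted(summary.items()):
    print(f"{outcome}\t{method}\t{count}")
```

Grouping by outcome and method is only one possible view; the same pattern would apply to any other column the real table provides.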

Adds two unit tests asserting that `_gen_reads` returns the same
amplicon when given a sequence and its reverse complement
(`read_orientation='both'`). The tests cover both the exact-match and
approximate-match code paths and assert the expected match-method and
match-orientation stats.
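
The property these tests check can be sketched with a toy, exact-match-only extractor. This is not the plugin's `_gen_reads`: `toy_extract`, `reverse_complement`, the primers, and the sequence below are all hypothetical stand-ins chosen to show the reverse-complement invariance.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_extract(seq, fwd_primer, rev_primer):
    """Toy exact-match extraction that tries both orientations.

    Returns (amplicon, orientation), or (None, None) if no orientation
    contains both primer sites. Primers are trimmed from the amplicon.
    """
    rev_site = reverse_complement(rev_primer)
    candidates = (("forward", seq), ("reverse", reverse_complement(seq)))
    for orientation, candidate in candidates:
        start = candidate.find(fwd_primer)
        if start == -1:
            continue
        amplicon_start = start + len(fwd_primer)
        end = candidate.find(rev_site, amplicon_start)
        if end != -1:
            return candidate[amplicon_start:end], orientation
    return None, None


# Hypothetical primers and input sequence.
FWD, REV = "ACGTAC", "CATGCA"
SEQ = "TT" + FWD + "GGGTTTCCC" + reverse_complement(REV) + "AA"

amplicon_fwd, orientation_fwd = toy_extract(SEQ, FWD, REV)
amplicon_rc, orientation_rc = toy_extract(reverse_complement(SEQ), FWD, REV)

# The amplicon is the same either way; only the orientation differs.
assert amplicon_fwd == amplicon_rc == "GGGTTTCCC"
assert (orientation_fwd, orientation_rc) == ("forward", "reverse")
```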

Updated internal helpers: `_align_primer` now returns a named tuple
with primer binding coordinates; `_exact_match` and `_approx_match`
return richer result tuples. `_gen_reads` always returns a
(amplicon, stats) pair.
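
As a rough illustration of this return-shape pattern (not the actual helper signatures), the sketch below uses a named tuple for primer coordinates and a function that always returns an `(amplicon, stats)` pair, even when nothing is extracted. `PrimerHit`, `locate_primer`, `extract_with_stats`, and the stats keys are hypothetical names.

```python
from typing import NamedTuple, Optional


class PrimerHit(NamedTuple):
    """Hypothetical container for where a primer binds in a sequence."""
    start: int
    end: int
    percent_match: float


def locate_primer(seq, primer) -> Optional[PrimerHit]:
    """Exact-match lookup only; returns None when the primer is absent."""
    start = seq.find(primer)
    if start == -1:
        return None
    return PrimerHit(start, start + len(primer), 100.0)


def extract_with_stats(seq, fwd_primer, rev_site):
    """Always return (amplicon, stats); amplicon is None on failure."""
    stats = {
        "input_length": len(seq),
        "extracted": False,
        "fwd_start": None,
        "rev_start": None,
    }
    fwd_hit = locate_primer(seq, fwd_primer)
    rev_hit = locate_primer(seq, rev_site)
    # Require both sites, with the reverse site downstream of the forward one.
    if fwd_hit is None or rev_hit is None or rev_hit.start < fwd_hit.end:
        return None, stats
    stats.update(
        extracted=True, fwd_start=fwd_hit.start, rev_start=rev_hit.start
    )
    return seq[fwd_hit.end:rev_hit.start], stats


amplicon, stats = extract_with_stats(
    "TTACGTACGGGTTTCCCTGCATGAA", "ACGTAC", "TGCATG"
)
missing, missing_stats = extract_with_stats("AAAAAAAA", "ACGTAC", "TGCATG")
```

Returning the stats even on failure is what lets a caller record why a sequence was not extracted instead of silently dropping it.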

Note from @gregcaporaso: Claude was used in the development of the code
and the tests. I did a full review of the code and tests before passing
this along for code review by another human. Claude's tests were a bit
underwhelming - that is where I made most of my modifications. For
example, all values in test_extract_reads_stats_all_extracted were
hand-calculated by me (Claude previously had very general tests of the
output stats).

This is my first use of Claude for new functionality going into a QIIME
2 plugin. The new output table has already been very helpful in a few
exploratory analyses that I've run. Next, I hope to use it to validate
that some code simplifications I want to make in extract_reads don't
change the results in unintended ways (this was very hard to assess in
my recent work on this action).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gregcaporaso force-pushed the add-extract-reads-report branch from 3d88165 to 86fb4a2 (May 1, 2026 22:29)
@gregcaporaso requested a review from colinvwood (May 1, 2026 22:31)
@colinvwood self-assigned this (May 1, 2026)
@colinvwood moved this to In Review in 2026.7 🐐 (May 1, 2026)
@github-project-automation (bot) moved this from In Review to Ready for Merge in 2026.7 🐐 (May 5, 2026)

Projects

Status: Ready for Merge

3 participants