Bidirectional AnnData <-> Seurat/SCE conversion (v0.3.0) by Huangy57 · Pull Request #4 · settylab/convert2anndata

Huangy57 · 2026-05-19T19:54:11Z

Summary

Adds bidirectional conversion to convert2anndata (v0.3.0): in addition
to SCE/Seurat -> AnnData, the package now converts AnnData (.h5ad) into
Seurat or SingleCellExperiment objects, with cli_convert() dispatching
on file extension.

New: convert_anndata_to_seurat(), convert_anndata_to_sce(), the
extract_anndata_* family, attach_reductions_seurat(), and the
setup/check/diagnose_anndata_python() reticulate helpers. Includes
Seurat 5 layer/slot robustness and use_raw='auto'. ~10 new test files
plus a standalone eval harness.

This PR also includes PR-polish on top of the feature commits:

- Export extract_anndata_obsp() / extract_anndata_raw() for parity
  with the rest of the extract_anndata_* family; regenerate
  NAMESPACE + man pages.
- CI: Python 3.8 -> 3.11, setup-python v2 -> v5, and pin
  RETICULATE_PYTHON across all steps so the AnnData tests run instead
  of silently skipping.
- pkgdown: bidirectional description + grouped reference index.
- Add NEWS.md (0.3.0).

Validation

Validated with R 4.4.1, Seurat 5.4.0, SingleCellExperiment 1.28.1,
reticulate 1.45.0, Python anndata 0.12.0:

- R CMD check: 0 errors, 0 warnings, 3 benign notes.
- testthat: 387 pass, 0 fail, 1 skip (CLI subprocess test, skips
  until the package is installed on the libpath).
- Eval harness (roundtrip, edge cases, real pbmc3k, new-vs-original
  comparison) all pass.
- Quantitative roundtrip fidelity: 22 pass / 3 known-gap / 0 fail.

Inspectable evidence (logs + fidelity table) is attached to
[settylab/sarah-nexus#18](https://github.com/settylab/sarah-nexus/issues/18).

Known limitations

- The Seurat-mediated path (AnnData -> Seurat -> ...) cannot preserve
  arbitrary extra layers or var metadata, because a Seurat assay has
  only counts/data/scale.data slots. Use the SCE-mediated path when
  verbatim layers / var matter.
- convert_anndata_to_sce() does not yet attach obsp as colPairs
  (the Seurat target does attach obsp as graphs). Planned follow-up.

Test plan

- CI green on the updated workflow (Python 3.11, reticulate pinned).
- R CMD check clean locally.
- pkgdown site builds with the new reference index.

Standalone script merged: convert_anndata_to_seurat() now accepts a .h5ad path or an in-memory AnnData. setup_anndata_python() reproduces the script's conda/RETICULATE_PYTHON/CONDA_PREFIX resolution chain. The standalone convert_anndata2seurat.R becomes a thin shim.

Hardcoded knobs made flexible:

* counts_layer accepts a vector of candidates (default covers "counts", "raw_counts", "raw_count").
* attach_reductions_seurat() exposes a user-extensible reduction_map; default_reduction_map() covers the common scanpy embeddings; reduction_map = list(<key> = list(name = NA)) disables a default.
* assay name override propagates to data layer, var feature meta, and reductions.
* orig.ident resolution: explicit arg -> uns$conversion_source -> file basename -> "AnnData".

New scanpy-shaped data:

* use_raw = c("auto", "always", "never") wires adata.raw into the Seurat counts layer, including the typical scanpy pattern where raw carries the unfiltered gene set.
* attach_obsp = TRUE maps adata.obsp graphs (connectivities, distances) to Seurat Graphs.

Python env hardening (the user's biggest concern):

* check_anndata_python() runs five layered probes: interpreter reachable, PYTHONPATH not pinning a different Python's site-packages (the cryptic HPC numpy-source-directory error), anndata importable, numpy importable, end-to-end smoke probe. Each failure raises a tailored, actionable message naming the selected interpreter and the one-line fix.
* diagnose_anndata_python() prints a one-shot env snapshot (python, version, anndata, numpy, RETICULATE_PYTHON, CONDA_PREFIX, PYTHONPATH).
* setup_anndata_python() now resolves a name against ~/micromamba/envs/, ~/miniconda3/envs/, ~/anaconda3/envs/, and $MAMBA_ROOT_PREFIX/envs/, and validates by default.
* Conversion path entrypoints auto-call check_anndata_python() before read_h5ad(), so the user never sees the original cryptic numpy error from deep inside Python.
* .onAttach() prints a one-liner pointing at diagnose_anndata_python().
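The Python-env checks described in this commit run from R through reticulate. As a rough illustration of what the Python-side portion of such a snapshot could look like, here is a minimal standard-library sketch. It is not the package's implementation: `probe_env` and the returned dictionary layout are hypothetical, and the PYTHONPATH check is a simple heuristic that flags entries living outside the running interpreter's prefix.

```python
import importlib.util
import os
import sys


def probe_env(environ=None):
    """Return a snapshot of the interpreter and the packages a converter needs.

    Hypothetical helper: collects the same kind of fields the commit says
    diagnose_anndata_python() prints, using only the standard library.
    """
    environ = os.environ if environ is None else environ
    prefix = os.path.realpath(sys.prefix)

    # Assumption: a PYTHONPATH entry outside this interpreter's prefix may
    # shadow its site-packages with packages built for another Python.
    foreign = [
        p
        for p in environ.get("PYTHONPATH", "").split(os.pathsep)
        if p and not os.path.realpath(p).startswith(prefix)
    ]

    return {
        "python": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # find_spec() locates a module without importing it.
        "anndata_importable": importlib.util.find_spec("anndata") is not None,
        "numpy_importable": importlib.util.find_spec("numpy") is not None,
        "RETICULATE_PYTHON": environ.get("RETICULATE_PYTHON"),
        "CONDA_PREFIX": environ.get("CONDA_PREFIX"),
        "foreign_pythonpath": foreign,
    }


if __name__ == "__main__":
    for key, value in probe_env().items():
        print(f"{key}: {value}")
```

Locating the modules with `find_spec()` rather than importing them keeps the snapshot cheap and avoids triggering the very import error being diagnosed.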
Real bugs uncovered & fixed:

* Seurat sanitises feature names ('gene_01' -> 'gene-01') during CreateSeuratObject; previously SetAssayData(layer = 'data') and the var feature-meta assignment failed with "No feature overlap" whenever inputs contained underscores. Realign rownames/colnames to the seurat object's actual ones after construction.
* convert_seurat_to_sce: ref_features/ref_cells were overwritten per iteration with the *current* assay's, defeating the mismatched-assay -> altExp routing. Set them once from the default assay.
* convert_to_anndata: process_other_assays(sce) was missing the assayName argument, so the assay used as X was duplicated under layers. Pass assayName through.
* anndata_mapping_keys() helper handles both the older-reticulate Python-KeysView and the newer R-character-vector return shapes. Without this, single-key layers/obsm with name "raw_count" got split character-by-character ('r','a','w','_'...) corrupting layer detection and SCE exclude lists.

Test coverage (all green):

* 35 comprehensive tests covering matrix dtypes, obs/var fidelity with NAs and factors, layer fallback chain, reduction map override and disable, orig.ident precedence ladder, file-path entry, custom assay, full round-trip, edge sizes.
* 12 tests for use_raw / obsp / Seurat Graphs.
* 4 realistic scanpy-shaped tests (1000 cells, raw + layers + obsm + obsp + uns), including backed='r' read.
* 14 tests for check_anndata_python and diagnose_anndata_python covering each failure branch via mocking.
* Pre-existing failures fixed in process_layers, ensure_csparse_matrix, process_main_assay, sparse_matrix_conversion, convert_seurat_to_sce, cli_convert (gracefully skips when not installed).

R CMD check: 0 errors, 0 warnings, 3 NOTEs (pre-existing housekeeping items: .github dir, sandbox clock, LICENSE not in DESCRIPTION).
testthat: 387 expectations, 0 failures, 0 errors.
4 standalone eval scripts (eval_new_pkg, eval_compare, eval_roundtrip, eval_edge_cases) all green.
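The "raw_count" splitting bug has a close analogue in plain Python, where iterating over a string yields its characters. The sketch below illustrates the normalisation idea only; `mapping_keys` is a hypothetical stand-in for the R helper anndata_mapping_keys(), not a port of it.

```python
def mapping_keys(keys):
    """Normalise a key collection to a list of whole key names.

    A bare string is treated as ONE key; any other iterable (a dict keys
    view, a list, a tuple) is materialised element by element.
    """
    if keys is None:
        return []
    if isinstance(keys, str):
        # list(keys) would split the name into single characters.
        return [keys]
    return [str(k) for k in keys]


layers = {"raw_count": object()}

print(list("raw_count")[:4])        # the buggy shape: ['r', 'a', 'w', '_']
print(mapping_keys("raw_count"))    # ['raw_count']
print(mapping_keys(layers.keys()))  # ['raw_count']
```

Checking for the scalar-string case before iterating is what keeps a single-key mapping from being mistaken for many one-letter keys.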
Readme: new "Python environment" section plus a Troubleshooting subsection covering the common reticulate failure modes (no Python found, anndata not installed, PYTHONPATH pollution, silent typos in conda env names).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
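The "silent typos in conda env names" failure mode comes down to looking a name up under a fixed list of environment roots and failing loudly when nothing matches. A rough sketch of that lookup follows, using the four roots named in this commit message. `candidate_roots`, `resolve_env_python` and the error text are hypothetical, and the `bin/python` layout assumes a Unix-style environment.

```python
import os
from pathlib import Path


def candidate_roots(environ=None):
    """List the directories that may contain named environments."""
    environ = os.environ if environ is None else environ
    home = Path(environ.get("HOME", str(Path.home())))
    roots = [
        home / "micromamba" / "envs",
        home / "miniconda3" / "envs",
        home / "anaconda3" / "envs",
    ]
    mamba_root = environ.get("MAMBA_ROOT_PREFIX")
    if mamba_root:
        roots.append(Path(mamba_root) / "envs")
    return roots


def resolve_env_python(name, roots):
    """Return the interpreter of the first environment called `name`.

    Raises instead of returning None so a misspelt name cannot pass silently.
    """
    for root in roots:
        python = root / name / "bin" / "python"  # Unix layout (assumption)
        if python.exists():
            return python
    searched = ", ".join(str(r) for r in roots)
    raise FileNotFoundError(f"No environment named {name!r} under: {searched}")
```

A call such as `resolve_env_python("r_env", candidate_roots())` then either returns an interpreter path or reports every directory it searched.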

…aw='auto'

Two issues surfaced when running the same test suite against a second R + Python configuration (micromamba r_env: R 4.4.1 + Python 3.11 + anndata 0.12 + Seurat 5.x with strict-mode SeuratObject):

* SeuratObject >= 5.0.0 made `slot=` defunct (was deprecated in 5.0.0, now errors). Three call sites still passed `slot = "counts"`: extract_counts_matrix, identify_alt_exps_seurat, attach_alt_experiments_sce. Switch to `layer = "counts"` -- accepted by both pre-5.0 and >= 5.0.
* use_raw = "auto" was using adata.raw whenever no counts_layer matched. On real scanpy datasets like pbmc3k_processed, raw retains the unfiltered gene set (13714 genes) while adata.X is the analysis-ready filtered set (1838 genes). Auto silently swapped the user's filtered Seurat object for the unfiltered one. Now "auto" only uses raw when raw and adata.X share the gene set (the `adata.raw = adata` "save-raw-then-normalize" pattern). When they differ, prefer adata.X and emit a one-line notice; users who explicitly want the unfiltered raw can pass use_raw = "always".

Verified against the real pbmc3k_processed h5ad: 2638 cells x 1838 filtered genes, 8 louvain clusters preserved, all 4 obsm reductions (pca, tsne, umap, draw_graph_fr) attached with exact-match embeddings, both kNN graphs (connectivities + distances) attached as Seurat Graphs, and downstream FindNeighbors -> FindClusters succeeds.

testthat: 387 expectations, 0 failures, 0 errors under both configurations (r_env and fhR + scvi_env).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
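The revised use_raw = "auto" rule can be summarised as a small decision function. The Python sketch below illustrates the rule as described in this commit and is not the package's R code. `choose_counts_source` and its return labels are hypothetical, "share the gene set" is taken to mean equal gene names regardless of order, and falling back to adata.X when there is no .raw is also an assumption.

```python
def choose_counts_source(use_raw, x_genes, raw_genes=None):
    """Pick "raw" or "X" as the counts source once no counts layer matched.

    raw_genes is None when the object carries no .raw.
    Returns (source, notice); notice is None unless "auto" declined raw.
    """
    if use_raw not in ("auto", "always", "never"):
        raise ValueError(f"unknown use_raw value: {use_raw!r}")
    if raw_genes is None or use_raw == "never":
        return "X", None  # fallback when .raw is absent is an assumption
    if use_raw == "always":
        return "raw", None
    # "auto": only trust raw when it describes the same genes as X.
    if set(raw_genes) == set(x_genes):
        return "raw", None
    notice = (
        f"adata.raw has {len(raw_genes)} genes but adata.X has {len(x_genes)}; "
        "using adata.X (pass use_raw='always' to use raw)"
    )
    return "X", notice


# Gene counts taken from the pbmc3k_processed example above; names are made up.
x_genes = [f"g{i}" for i in range(1838)]
raw_genes = [f"g{i}" for i in range(13714)]

source, notice = choose_counts_source("auto", x_genes, raw_genes)
print(source)  # X
print(notice)
```

Returning the notice rather than printing it inside the function leaves it to the caller to decide how the one-line message is surfaced.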

…ormly

- Export `extract_anndata_obsp()` and `extract_anndata_raw()` for parity with the rest of the `extract_anndata_*` family (X / layers / obsm were already exported). Regenerate NAMESPACE + man pages.
- pkgdown: bidirectional description on the home page + grouped reference index (Bidirectional conversion / Python environment / AnnData component extractors / Attaching components / Internal building blocks). All 35 exports covered.
- NEWS.md: new 0.3.0 entry covering bidirectional conversion, CLI dispatch by extension, reticulate helpers, Seurat 5 layer/slot robustness, `use_raw='auto'`, expanded tests / eval harness, validation results, and the documented known limitations.
- DESCRIPTION: bump RoxygenNote 7.3.1 -> 7.3.3 (regenerated by devtools::document() under roxygen2 7.3.3); update URL to list the GitHub repo + the pkgdown site (required by pkgdown::check_pkgdown).

- Bump `actions/setup-python` v2 -> v5 and pin Python 3.8 (EOL) -> 3.11. Modern `anndata` will not install on 3.8.
- Export `RETICULATE_PYTHON` from `steps.setup-python.outputs.python-path` via `$GITHUB_ENV` so it persists to the test/coverage step. Previously it was set only on the "Install Python dependencies" step (and to a hard-coded /opt/hostedtoolcache path that no longer exists), so reticulate had no interpreter pinned at test time and the new AnnData <-> Seurat/SCE tests silently skipped.
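GitHub Actions carries an environment variable into later steps when a step appends a `NAME=value` line to the file named by `$GITHUB_ENV`. The workflow change takes the interpreter path from the setup-python step output. The sketch below shows the same mechanism from a Python step instead, with `sys.executable` standing in for that output; `pin_reticulate_python` is a hypothetical helper name and is not part of the workflow in this PR.

```python
import os
import sys


def pin_reticulate_python(python_path=None, env_file=None):
    """Append RETICULATE_PYTHON=<path> to the GitHub Actions env file.

    Defaults: the running interpreter and the file named by $GITHUB_ENV.
    Variables written this way are visible to subsequent steps, not to
    the step that writes them.
    """
    python_path = python_path or sys.executable
    env_file = env_file or os.environ["GITHUB_ENV"]
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"RETICULATE_PYTHON={python_path}\n")
    return python_path
```

Setting the variable with a step-level `env:` block, as the old workflow did, only affects that one step, which is why the tests in a later step saw no pinned interpreter.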

codecov · 2026-05-19T20:06:56Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

katosh · 2026-05-19T22:16:15Z

Please ensure good testing coverage and include round-trip testing for complex AnnData and Seurat objects.

Add testthat coverage that round-trips COMPLEX objects through the full conversion cycle and asserts structural + numeric fidelity, per the issue #18 ask. Three new files plus a shared fixture/helper:

- helper-roundtrip.R: shared skip guard + seeded-RNG (reused from the existing suite), output-quieting + comparison utilities, and complex fixture builders for AnnData, SCE, and Seurat objects (multiple layers/assays, PCA+UMAP, varm, obsp/varp graphs, altExps, raw, sparse+dense matrices, and obs/var with factor/NA/logical/character).
- test-roundtrip_complex_anndata.R: AnnData -> SCE -> AnnData (faithful, through on-disk .h5ad) and AnnData -> Seurat -> SCE -> AnnData; asserts X/layers/obsm/obs/var fidelity and obsp connectivity.
- test-roundtrip_complex_sce.R: SCE -> AnnData -> SCE; asserts assays, reducedDims, colData/rowData, and colPair->obsp / altExp->uns carry.
- test-roundtrip_complex_seurat.R: Seurat -> SCE -> AnnData -> Seurat; asserts counts, reductions, factor/NA metadata, and NN-graph connectivity.

The converters are one-directional per component; the documented structural gaps (Seurat-mediated path drops extra layers and var; convert_anndata_to_sce has no obsp->colPairs; altExps/colPairs/varm/varp/raw not restored on the reverse leg) are written as skip()'d tests with TODO(#18) so they are recorded rather than silently missing.

Test-only change: no edits to R/, NAMESPACE, man/, DESCRIPTION, or NEWS.md.

Verified with R 4.4.1 + Seurat 5.4.0 + anndata: testthat 448 pass / 0 fail / 8 skip; R CMD check 0 errors / 0 warnings / 3 benign notes; package coverage 80.1% -> 82.7%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Huangy57 and others added 4 commits May 8, 2026 16:51

Huangy57 wants to merge 5 commits into main from yhuang2/combine-convert2seurat-dev
