v0.2.0 follow-up: extract MHD helper, snake_case rename, lift knobs (API-breaking) by katosh · Pull Request #7 · settylab/scEcho

katosh · 2026-05-19T03:06:17Z

Summary

Implements PR #4's deferred Tier A items + drops the redundant sqrt-guard +1e-16 per operator comment 4483947672.

This PR stacks on top of #4, not amends it. Base branch is dominik/v0.1.0-polish; will need to be retargeted to main after #4 merges (no rebase required — fast-forward only).

⚠️ API-breaking changes

Users importing from the renamed modules must update their import paths:

from scEcho import Echo_states → from scEcho import echo_states
from scEcho import Echo_features → from scEcho import echo_features
from scEcho.Echo_features import compute_ncells → from scEcho.echo_features import compute_ncells
scEcho.Echo_states.dn_comp_obsm(...) → scEcho.echo_states.dn_comp_obsm(...)
scEcho.Echo_features.run_echo_features(...) → scEcho.echo_features.run_echo_features(...)

Combined with PR #4's try_models / test_components removal, this constitutes the v0.2.0 API boundary. No backward-compat shims (operator-approved).

What changed

_compute_group_mhd_and_stats helper extracted (echo_features internals); ~50 LOC of duplication between get_desynch_stats and run_null_desynch_test eliminated. Determinism regression passes bit-identically. Future drift between the two paths now has only one site to update.
Echo_states.py → echo_states.py, Echo_features.py → echo_features.py (PEP 8 snake_case). Kept as two separate git mv commits so reviewers can see one rename at a time and git log --follow stays clean.
direction_colors kwarg added to dn_comp_obsm and run_null_desynch_test (default ("#ff7f0e", "#1f77b4", "lightgrey")). Order matches each function's CategoricalDtype ordering — see docstrings.
adjust_text arrowprops moved to module-level _DEFAULT_ADJUST_TEXT_KWARGS in plotting.py; callers can override any adjust_text key (including arrowprops) via plot_scores's existing **adjust_text_kwargs catch-all. The repulsion knobs (expand, force_text, force_points, max_move_frac, iter_lim) stay as explicit kwargs (already lifted in wave-1).
+ 1e-16 at echo_states.py:176 dropped. Instrumentation across the full pytest suite showed variance_model.min() = 0.0225 — ~2e14× above the epsilon, so the guard contributed nothing but noise. Operator confirms upstream pipeline guarantees positivity. Kompot's internal Cholesky stabilization (different concern) stays untouched.
notebooks/example.ipynb source cells use the snake_case import paths (no re-execution; cached D59-retina outputs preserved).
tests/ updated to match renamed modules.
CHANGELOG.md added at repo root covering both PR v0.1.0 polish from audit #4 and this PR's changes.

Verification

pytest --cov=scEcho --cov-report=term tests/
==================== 50 passed, 3 warnings in 84.77s ====================

src/scEcho/__init__.py            8      2    75%
src/scEcho/echo_features.py     232     39    83%
src/scEcho/echo_states.py        62      8    87%
src/scEcho/plotting.py          262     61    77%
src/scEcho/utils.py             222     48    78%
TOTAL                           786    158    80%

All 50 tests pass; coverage 80 % (unchanged from polish-branch baseline).
Determinism regression (tests/test_determinism.py::test_dn_comp_obsm_is_deterministic) passes — refactor is bit-identical.
New import paths work; old scEcho.Echo_states / scEcho.Echo_features paths correctly raise ModuleNotFoundError.
Notebook source cells use snake_case only (20 cells inspected); cached warning paths in outputs retain the historical filename (harmless).

Refs operator comment 4483947672 on #4.

katosh · 2026-05-19T03:37:08Z

@settylab-dotto-bot let's

not increment the version.
keep the + 1e-16 in the sqrt but make it an adjustable eps
skip the CHANGELOG.md but merge it into the v0.1.0-polish and update its PR description

The ``+1e-16`` inside ``np.sqrt(variance_model + 1e-16)`` was a hardcoded floor; lift it to a public kwarg so callers can override (raise it for extra-noisy variance, set it to 0 to disable). Default unchanged at 1e-16 — empirical floor (min observed 0.0225 across the test suite) makes the default a no-op, but the guard stays for robustness per operator decision on PR #7.

…t_desynch_stats and run_null_desynch_test Removes ~50 LOC of near-identical Mahalanobis-distance + obsp-key-fallback code between the two functions, plus the dead 'diff' assignment in the null path. The two call sites differ only in (a) which layer namespace the predicted-covariance keys live under (observed layer vs null layer) and (b) which layer namespace the precomputed LFC layer lives under (always the observed layer in current code paths). Helper signature: _compute_group_mhd_and_stats( ad, ind, layer, layer_for_lfc, embedding1, embedding2, diagonal_variance, ) - get_desynch_stats: layer=layer, layer_for_lfc=layer - run_null_desynch_test: layer=null_layer, layer_for_lfc=layer Determinism regression passes bit-identically. Future drift between the two paths now has only one site to update.

Three previously hardcoded knobs lifted so downstream callers can customize without monkey-patching: - ``dn_comp_obsm(..., direction_colors=("#ff7f0e", "#1f77b4", "lightgrey"))`` — sequence written to ``ad.uns[f"{direction_key}_colors"]``, in the order ``(modality2-higher, modality1-higher, neutral)`` matching the CategoricalDtype. - ``run_null_desynch_test(..., direction_colors=(...))`` — sibling lift, same default, same order matching its own CategoricalDtype ``(modality2-structure, modality1-structure, not-significant)``. - ``plot_scores`` — module-level ``_DEFAULT_ADJUST_TEXT_KWARGS`` with the previously hardcoded ``arrowprops`` style; callers can override any key (or add new ones) via the existing ``**adjust_text_kwargs`` catch-all. The repulsion knobs (``expand``, ``force_text``, ``force_points``, ``max_move_frac``, ``iter_lim``) stay as explicit kwargs — already lifted in wave-1.

API-breaking. No backward-compat shim (operator-approved). Users importing the module qualifier or referencing it as ``scEcho.Echo_states.<name>`` must update their import paths: from scEcho import Echo_states -> from scEcho import echo_states scEcho.Echo_states.dn_comp_obsm(...) -> scEcho.echo_states.dn_comp_obsm(...) Symbols inside the module (``dn_comp_obsm``) are unchanged. Notebook source cells updated in-place (no re-run; cached D59-retina outputs preserved); cached warning paths still reference the old filename and that is fine. Echo_features renamed in the next commit; bundling both into one commit would obscure git's rename detection.

API-breaking. No backward-compat shim (operator-approved). Users importing the module qualifier or referencing it as ``scEcho.Echo_features.<name>`` must update their import paths: from scEcho import Echo_features -> from scEcho import echo_features scEcho.Echo_features.run_echo_features(...) -> scEcho.echo_features.run_echo_features(...) from scEcho.Echo_features import compute_ncells -> from scEcho.echo_features import compute_ncells Symbols inside the module (``embeddings_predict_layer``, ``get_desynch_stats``, ``run_null_desynch_test``, ``run_echo_features``, ``make_null_layer``, ``get_reconstruction_results``, ``compute_ncells``) are unchanged. Notebook source cells updated in-place; cached RuntimeWarning paths in outputs still reference the old filename and that is harmless. Internal cross-module import in ``utils.py`` updated as part of this commit (``from .Echo_features import embeddings_predict_layer`` → ``from .echo_features import ...``). Combined with the Echo_states rename in the previous commit and the try_models / test_components absorption from PR #4's structure stream, this completes the v0.2.0 API boundary.

katosh · 2026-05-19T03:45:58Z

Folded into PR #4 per operator request (comment 4484179757) — the Tier A commits are now part of the v0.1.0 polish branch. See PR #4's body for the Tier A subsection. Closing markers: 70debbb is the HEAD of both dominik/v0.1.0-polish and dominik/v0.2.0-tier-a; GitHub auto-marked this PR "merged" when the head became reachable from the base.

* fix(plotting): return fig from plot_scores interactive path * docs(README): rewrite Usage, Modules, AnnData inputs, Data, install sections * fix(Echo_states): ravel SampleVarianceEstimator.predict(diag=True) output Kompot's SampleVarianceEstimator.predict(diag=True) returns shape (n_cells, 1) for density estimators (see kompot/differential/sample_variance_estimator.py:296). Assigning the un-raveled (n_cells, 1) array into ad.obs produces object-dtype nested values that the downstream variance accumulator at Echo_states.py:151-154 cannot sum elementwise. Ravel to (n_cells,) so the obs column is plain numeric. * fix(Echo_features): drop no-op +1e-16 regularization; Kompot handles Cholesky stabilization Both compute_mahalanobis_distances call sites (Echo_features.py:284,633) added 1e-16 to the covariance before passing in. Kompot's compute_mahalanobis_distances already stabilizes via `cov + jnp.eye(...) * eps` with eps=1e-8 by default (kompot/utils.py:98, 329). The smaller 1e-16 is dominated and so a true no-op. Remove and document that Kompot owns the stabilization. * perf(Echo_features): hoist .toarray() out of group loops get_desynch_stats already materialized `arr = ad.layers[layer].toarray()` once before the per-group loop for the total-variance computation; the per-group variance branch then re-densified `ad[ind].layers[layer]` on every iteration. Reuse the hoisted `arr` and slice by boolean mask. run_null_desynch_test had the same pattern on `ad.layers[null_layer]`; materialize once before the per-group loop and slice by mask. Avoids per-iteration AnnData view → toarray densification, which at real data scale (50k cells × 20k features × N groups) is multi-GB per loop. * fix(utils): column-wise obs assignment preserves categorical dtypes run_and_store_pr_res rebuilt ad.obs via ad.obs = pd.concat((ad.obs, pr_res.branch_probs, masks), axis=1) which routes through AnnData's whole-row obs setter — that setter has historically silently dropped existing categorical dtypes on re-assignment. Assign new branch-probability and mask columns one at a time so AnnData treats them as ordinary new columns and leaves existing columns alone. * fix(Echo_features): avoid view .obsp access in masked-Mahalanobis paths Both get_desynch_stats and run_null_desynch_test fetched the per-group uncertainty covariances via `ad[ind].obsp[key]`. AnnData ≥0.8 issues an ImplicitModificationWarning on boolean-masked-view .obsp access, and the view materialization is wasted work since the access is read-only. Index the underlying obsp ndarray directly with np.ix_(ind, ind) — same numerical result, no view, no warning. * feat(Echo_states): default optimizer to Mellon's L-BFGS-B; document ls_factor override dn_comp_obsm previously forced optimizer='advi' silently overriding Mellon's DEFAULT_OPTIMIZER ('L-BFGS-B'); ADVI is much slower on the typical scEcho workload. Change the default to None and only pass the kwarg through when explicitly set, preserving the API for callers who opt into ADVI. Document the rationale for ls_factor=2 (Mellon default is 1) and note that fit_predict + uncertainty are separate calls in Mellon 1.7.1 (no combined idiom). * feat(Echo_features): add save_covariance/save_predictions opt-outs; document ls_factor override embeddings_predict_layer writes an (n_cells, n_cells) float64 posterior covariance into ad.obsp per modality per layer — ~800 MB per entry at 10k cells, scaling quadratically. Add save_covariance/save_predictions kwargs (default True) to skip those writes, lifted into run_echo_features and run_null_desynch_test so the null layer call respects the same opt-out. Downstream get_desynch_stats and run_null_desynch_test now check for the covariance keys before computing Mahalanobis distance, mirroring the existing smoothed- residual fallback (warn + NaN if missing). Document ls_factor=10 (Mellon default is 1) rationale at both call sites. * refactor: replace assert validation with raise across src/scEcho assert statements are stripped under python -O, leaving the package unable to surface user-facing validation errors. Converted 31 sites across Echo_features, plotting, test_components, try_models, and utils to explicit raise statements — KeyError for missing AnnData columns/layers/obsm/varm/uns keys and dict lookups, ValueError for bad numeric ranges, bad string formats, and missing required kwargs. Original messages preserved verbatim. * fix: avoid mutable-default-arg footguns in try_models / Echo_features / plotting try_models.name_dict, get_desynch_stats.extra_ncells_layers, and linked_plot.line_mask all carried mutable defaults (dict / list) that risk silent state leakage across calls — try_models in particular mutated the default dict in place at the per-embedding name-resolution step. Switched all three to None and initialize fresh inside the function body. * chore: pin minimum versions for mellon/kompot/scanpy/anndata/palantir/h5py Per audit recommendation: unpinned deps risk silent breakage from upstream releases. Pins use ranges with known wheel coverage (h5py>=3.10 avoids the 3.16 sdist trap). * chore: dynamic __version__, __all__, type hints, and logging on public API - __version__ via importlib.metadata.version, with a "0.0.0+local" fallback so an unpackaged checkout still imports. - Add __all__ to each public-API module (Echo_states, Echo_features, plotting, try_models, utils, test_components) listing the audit-confirmed public entry points; internals (e.g. make_lines_df, make_points_df, compute_ncells) stay un-renamed but unlisted. - Type hints on the public surface — parameter types + return annotations, using `from __future__ import annotations` to keep runtime cost zero on the existing >=3.9 floor. - Replace library print() in Echo_states.dn_comp_obsm and try_models with module-level loggers; run_echo_features's verbose=True/False now gates info vs debug instead of stdout. - Add ruff config (E/F/I, line 100, py311 target) and ruff to dev extras. * test: add pytest smoke + determinism suite under tests/ Six smoke tests covering each major public entry point (dn_comp_obsm, run_echo_features, plot_scores both paths, try_models, plot_SE, utils) plus a determinism regression on dn_comp_obsm guarded with the slow marker. Fixture under tests/conftest.py builds a 100-cell synthetic AnnData with the obsm/obs layout scEcho expects; matplotlib uses Agg backend so the plotting tests run headless. pyproject [tool.pytest] scopes discovery to tests/ so src/scEcho/test_components.py (the diffusion-component sweep utility, not a pytest suite) isn't collected. * ci: add GitHub Actions workflow running pytest on push Single-job workflow on push + pull_request, Python 3.11 matrix slot, uses uv with --only-binary=:all: to dodge the h5py-sdist install trap. Skips the slow-marked determinism test in CI (run it locally before release). ruff runs --exit-zero for now — pre-existing F841 dead-code locals would otherwise gate CI; ruff will be tightened once the structure stream cleans them up. * docs(Echo_states): correct fit_predict/uncertainty cache note The previous comment claimed there was no combined-call idiom, which sidestepped the actual concern. Empirical timing on n=300, d=6 shows predict.uncertainty after fit_predict is ~4 % of the fit cost for ADVI (107 ms vs 2678 ms) and ~15 % for L-BFGS-B (87 ms vs 570 ms); Mellon caches cov_func / Lp / L on the estimator, so the second call reuses them. The audit's "doubles the slow step" assertion was wrong. * feat(plotting): gate fig.show() on show=True kwarg in plot_scores * docs(README): minimal delta from main — keep only Modules list + install-snag note + dev install * docs(README): link example.ipynb for GitHub preview * ci: add coverage step + codecov upload conditional on CODECOV_TOKEN secret Replace the plain pytest step with pytest-cov producing coverage.xml + terminal summary. Codecov upload step runs only when CODECOV_TOKEN is set, so the workflow is a no-op for that step until the operator wires the secret on the public repo. Simplify `on:` to the `[push, pull_request]` shorthand (the prior `branches: ["**"]` form coincided with GitHub flagging the run as "workflow file issue" with zero jobs scheduled — the shorthand is the form documented as universal-trigger). Add pytest-cov to [project.optional-dependencies].dev so CI's `uv pip install -e ".[dev]"` pulls it. .gitignore picks up .coverage, coverage.xml, htmlcov/ so local runs don't leak artifacts. * chore: add .codecov.yml skeleton with sensible defaults Auto project target (tracks current coverage, warns on regression); 70% patch target with 5% threshold (new code expected at >=70% line coverage). Ignores tests/ and notebooks/ from coverage computation. Operator can tune once the repo flips public and codecov starts ingesting uploads. * test: correctness tests for public API surface (value-equality on seeded input) Consolidate wave-1 per-function smoke tests into per-module suites covering every entry in each module's __all__ plus the unexposed compute_ncells helper, and add value-equality assertions alongside smoke checks. Hardcoded expected arrays for numeric outputs use rtol=1e-4 (Mellon L-BFGS has minor numerical jitter across jax/jaxopt patch versions); dn_comp_obsm tests pass optimizer="L-BFGS-B" explicitly so determinism doesn't depend on Mellon ADVI's PRNG-key derivation staying stable upstream. Coverage on src/scEcho/ is 83% (50 tests; fast suite 47 tests in ~80s, slow suite 3 tests in ~61s). The two Echo_states dn_comp_obsm tests carry @pytest.mark.slow at module scope — each call is ~25s on the 100-cell fixture due to Mellon's L-BFGS + JAX compilation. The plotting tests that depend on dn_comp_obsm output via the adata_with_dn_comp fixture stay in the fast suite (the JAX compile cache amortizes across them). read_test_results, plot_model_heatmap, sweep_diffusion_components, collect_sweep_residual_means, make_null_layer, get_reconstruction_results, rotate_coords (with rotation/flip/translation correctness), regress_embedding (zero-Pearson-residual property), and calc_corr (hardcoded Spearman values) all gain correctness coverage. run_and_store_pr_res is skipped — palantir.core.run_palantir doesn't run on a 100-cell Poisson fixture without the palantir preprocessor stack; brittle to mock. * docs(README): drop defensive parentheticals; section is now self-explanatory * test: align exception-class expectations with validation-stream raises Wave-1 used `pytest.raises(AssertionError)` for the input-validation paths because the codebase asserted. The validation stream's c1faa59 converted those 31 sites to typed raises; updating the five tests that pinned the old class: - try_models.try_models invalid layer → KeyError - try_models.plot_model_heatmap bad embedding → ValueError - test_components.sweep_diffusion_components bad min_components → ValueError - plotting.plot_direction_fractions missing obs_col → KeyError - Echo_features.get_reconstruction_results missing group → KeyError The 5th (get_reconstruction_results) is also an exception-class swap, not a column-matching issue — the for-loop substring check on the happy-path output holds for all 20 returned columns; only the missing-group sentinel changed class. Verified against origin/dominik/v0.1.0-polish HEAD (69ec381): 50 passed, 3 warnings in 114s. Coverage 80% (slight dip from wave-2's 83% — polish added ~58 LOC across Echo_features/plotting; absolute lines covered grew, denominator grew faster). * notebook: polish example.ipynb — strip parallelization, notebook tqdm, end-to-end run on D59 retina dataset Operator directive: settylab/scEcho PR #4 comment 4483735623. Changes - Drop 14 abandoned-scaffolding empty cells (audit finding). - Merge `# Echo states` / `# Echo Features` header cells with their bodies so each section is one markdown unit. - Remove `threadpoolctl.threadpool_limits(limits=4)` wrapper around `run_echo_features` — operator wants a clean single-process reference (BLAS thread autodetect is intra-library, not "additional parallelization"). - Drop unused `import scanpy as sc`. - Add `warnings.filterwarnings` for two noisy informational warnings from transitive deps (jaxopt deprecation, joblib loky CPU-count probe). - Fix mis-labelled markdown ("cyNpre" → "MPC") to match the code path. - Normalize `kernelspec.name` to `python3` so the notebook is portable across environments. Progress rendering - `scEcho.Echo_features` / `Echo_states` already import `tqdm.auto`, which picks up `tqdm.notebook` widgets in Jupyter; widget-state metadata is preserved so GitHub's notebook viewer renders the bars in their final state. Execution - Ran end-to-end on `D59_retina.h5ad` (5412 cells × 23519 features, 8 combo_type groups). Wall clock ~10:15. Zero errors. - `MPC` group yields 2321 genes called "associated with RNA structure" (>30 cells), fed into kompot's StringDB enrichment in the final cell. Notebook source-tree footprint - Data file is operator-placed at `work/echo-audit/D59_retina.h5ad`. Symlinked in as `notebooks/D59_retina.h5ad` (path already in .gitignore); not committed. * refactor: absorb try_models and test_components into utils Move all three public functions from try_models.py (try_models, read_test_results, plot_model_heatmap) and both from test_components.py (sweep_diffusion_components, collect_sweep_residual_means) into utils.py and delete the now-empty source files. Eliminates the pytest-collision module name (test_components) and collapses two near-singleton modules into the existing utility surface. No backward-compat shims — operator accepted the API break pre-release. - src/scEcho/utils.py: extend imports (logging, Sequence, mellon, sns, Axes, scipy.sparse, tqdm, Echo_features.embeddings_predict_layer) and __all__; append the five moved functions verbatim at the end. - src/scEcho/__init__.py: drop try_models and test_components from the module re-exports. - src/scEcho/try_models.py, src/scEcho/test_components.py: removed. - README.md: collapse the try_models and test_components bullets into a single utils bullet that describes the broadened surface. * chore(tests): fold test_try_models + test_test_components into test_utils Tests now live alongside the rest of utils coverage. Imports switched from scEcho.try_models.* / scEcho.test_components.* to scEcho.utils.*, matching the source merge in the previous commit. All 50 tests still pass; coverage 80% (down ~3pts from the wave-2 baseline because the moved try_models source has a couple of branches that the existing fixtures don't exercise — not a regression in test behavior, just more denominator). * feat(Echo_states): expose sqrt_eps kwarg on dn_comp_obsm (default 1e-16) The ``+1e-16`` inside ``np.sqrt(variance_model + 1e-16)`` was a hardcoded floor; lift it to a public kwarg so callers can override (raise it for extra-noisy variance, set it to 0 to disable). Default unchanged at 1e-16 — empirical floor (min observed 0.0225 across the test suite) makes the default a no-op, but the guard stays for robustness per operator decision on PR #7. * refactor(Echo_features): extract _compute_group_mhd_and_stats from get_desynch_stats and run_null_desynch_test Removes ~50 LOC of near-identical Mahalanobis-distance + obsp-key-fallback code between the two functions, plus the dead 'diff' assignment in the null path. The two call sites differ only in (a) which layer namespace the predicted-covariance keys live under (observed layer vs null layer) and (b) which layer namespace the precomputed LFC layer lives under (always the observed layer in current code paths). Helper signature: _compute_group_mhd_and_stats( ad, ind, layer, layer_for_lfc, embedding1, embedding2, diagonal_variance, ) - get_desynch_stats: layer=layer, layer_for_lfc=layer - run_null_desynch_test: layer=null_layer, layer_for_lfc=layer Determinism regression passes bit-identically. Future drift between the two paths now has only one site to update. * feat: lift direction_colors and adjust_text arrowprops to public surface Three previously hardcoded knobs lifted so downstream callers can customize without monkey-patching: - ``dn_comp_obsm(..., direction_colors=("#ff7f0e", "#1f77b4", "lightgrey"))`` — sequence written to ``ad.uns[f"{direction_key}_colors"]``, in the order ``(modality2-higher, modality1-higher, neutral)`` matching the CategoricalDtype. - ``run_null_desynch_test(..., direction_colors=(...))`` — sibling lift, same default, same order matching its own CategoricalDtype ``(modality2-structure, modality1-structure, not-significant)``. - ``plot_scores`` — module-level ``_DEFAULT_ADJUST_TEXT_KWARGS`` with the previously hardcoded ``arrowprops`` style; callers can override any key (or add new ones) via the existing ``**adjust_text_kwargs`` catch-all. The repulsion knobs (``expand``, ``force_text``, ``force_points``, ``max_move_frac``, ``iter_lim``) stay as explicit kwargs — already lifted in wave-1. * refactor: rename Echo_states.py to echo_states.py (PEP 8 snake_case) API-breaking. No backward-compat shim (operator-approved). Users importing the module qualifier or referencing it as ``scEcho.Echo_states.<name>`` must update their import paths: from scEcho import Echo_states -> from scEcho import echo_states scEcho.Echo_states.dn_comp_obsm(...) -> scEcho.echo_states.dn_comp_obsm(...) Symbols inside the module (``dn_comp_obsm``) are unchanged. Notebook source cells updated in-place (no re-run; cached D59-retina outputs preserved); cached warning paths still reference the old filename and that is fine. Echo_features renamed in the next commit; bundling both into one commit would obscure git's rename detection. * refactor: rename Echo_features.py to echo_features.py (PEP 8 snake_case) API-breaking. No backward-compat shim (operator-approved). Users importing the module qualifier or referencing it as ``scEcho.Echo_features.<name>`` must update their import paths: from scEcho import Echo_features -> from scEcho import echo_features scEcho.Echo_features.run_echo_features(...) -> scEcho.echo_features.run_echo_features(...) from scEcho.Echo_features import compute_ncells -> from scEcho.echo_features import compute_ncells Symbols inside the module (``embeddings_predict_layer``, ``get_desynch_stats``, ``run_null_desynch_test``, ``run_echo_features``, ``make_null_layer``, ``get_reconstruction_results``, ``compute_ncells``) are unchanged. Notebook source cells updated in-place; cached RuntimeWarning paths in outputs still reference the old filename and that is harmless. Internal cross-module import in ``utils.py`` updated as part of this commit (``from .Echo_features import embeddings_predict_layer`` → ``from .echo_features import ...``). Combined with the Echo_states rename in the previous commit and the try_models / test_components absorption from PR #4's structure stream, this completes the v0.2.0 API boundary. * fix(Echo_features): silence intentional divide-by-zero in null distribution Wrap `mse_null_diff / null_var` (run_null_desynch_test, ~L600) in np.errstate(divide="ignore", invalid="ignore"). Zero-variance features produce inf / NaN here by design, and the immediately-following `expressed_mask = null_var > 0` filters them out before any statistic that would propagate the NaN. The RuntimeWarnings were noise about an edge case the algorithm already handles correctly; the per-group UserWarning "N features have zero variance in group G ..." remains the user-facing channel for that information. Surfaced by polish PR #5's executed notebook outputs (operator pointed at the 14 RuntimeWarning blocks in cell 13). Found via inspection, fixed at source rather than suppressed at the notebook level — the fix is unambiguous and would help anyone running run_null_desynch_test outside the notebook too. * refactor(echo_features): drop per-group "N features have zero variance" UserWarning Per operator pushback on PR #6 follow-up (comment 4484264635). The ``expressed_mask = null_var > 0`` filter on the next line already drops the zero-variance features silently; the per-group warning was noise the user does not need to act on. The count is recoverable by selecting ``var_explained_diff_*_null_*`` columns and counting non-finite entries if downstream code wants it. Pairs with the np.errstate guard one commit prior: together they make ``run_null_desynch_test`` quiet about zero-variance features, matching how the algorithm treats them at every other step. pytest 50/50 in scEcho-notebook worktree. * notebook(example): nbqa-black + snake_case module refs + rerun on D59 Per operator review on PR #6 (comment 4484264635): rebase on the current dominik/v0.1.0-polish tip and re-execute. Source-side changes (np.errstate guard in run_null_desynch_test, drop of the per-group zero-variance UserWarning) landed directly on dominik/v0.1.0-polish in c0d9d23 + f948697 — this commit is notebook-only. Notebook delta vs. the pre-rebase state - nbqa black formatting (target py311). Each cell's multi-arg calls wrap one-per-line, decimal literals zero-prefixed, trailing commas per black convention. No semantic changes. - Module references updated to the Tier A snake_case names: ``scEcho.Echo_states.*`` → ``scEcho.echo_states.*`` ``scEcho.Echo_features.*`` → ``scEcho.echo_features.*`` - Markdown / structural polish carried forward from PR #5: 36→20 cells, merged section headers, suppressed jaxopt + loky transitive warnings, cyNpre→MPC markdown fix, kernelspec normalised to python3. End-to-end run on D59_retina.h5ad (5412 cells × 23519 features, 8 combo_type groups). Wall clock 9:11, zero errors, zero RuntimeWarning blocks, zero zero-variance UserWarning blocks (both surfaces now silent thanks to the source-side fixes on polish), 220 widget states preserved. * notebook: revert example.ipynb to main version — landing via #8 PR #4 carried ~7.4k lines of notebook diff alongside the source/API changes. Notebook is being landed standalone in #8 (off main, same polish-version content) so this PR's diff stays focused on the source surface. Three-way merge handles either merge order: - If #8 lands first: main has polish notebook; polish branch has main-original notebook in this commit; merging polish makes no notebook change because polish's tip-state == base-state == polish made no net change. #8's notebook is preserved on main. - If #4 lands first: main has main-original notebook (this commit's state); then #8 merges to update notebook. Either way, the notebook ends up correctly on main and #4's diff is clean.

katosh mentioned this pull request May 19, 2026

v0.1.0 polish from audit #4

Merged

katosh added 5 commits May 18, 2026 20:42

katosh force-pushed the dominik/v0.2.0-tier-a branch from f0ab273 to 70debbb Compare May 19, 2026 03:45

katosh merged commit 70debbb into dominik/v0.1.0-polish May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.2.0 follow-up: extract MHD helper, snake_case rename, lift knobs (API-breaking)#7

v0.2.0 follow-up: extract MHD helper, snake_case rename, lift knobs (API-breaking)#7
katosh merged 5 commits into
dominik/v0.1.0-polishfrom
dominik/v0.2.0-tier-a

katosh commented May 19, 2026

Uh oh!

katosh commented May 19, 2026

Uh oh!

katosh commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

katosh commented May 19, 2026

Summary

⚠️ API-breaking changes

What changed

Verification

Uh oh!

katosh commented May 19, 2026

Uh oh!

katosh commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant