v0.2.0 follow-up: extract MHD helper, snake_case rename, lift knobs (API-breaking)#7
Merged
Merged
Conversation
Merged
Contributor
Author
|
@settylab-dotto-bot let's
|
The ``+1e-16`` inside ``np.sqrt(variance_model + 1e-16)`` was a hardcoded floor; lift it to a public kwarg so callers can override (raise it for extra-noisy variance, set it to 0 to disable). Default unchanged at 1e-16 — empirical floor (min observed 0.0225 across the test suite) makes the default a no-op, but the guard stays for robustness per operator decision on PR #7.
…t_desynch_stats and run_null_desynch_test
Removes ~50 LOC of near-identical Mahalanobis-distance + obsp-key-fallback
code between the two functions, plus the dead 'diff' assignment in the
null path. The two call sites differ only in (a) which layer namespace
the predicted-covariance keys live under (observed layer vs null layer)
and (b) which layer namespace the precomputed LFC layer lives under
(always the observed layer in current code paths).
Helper signature:
_compute_group_mhd_and_stats(
ad, ind, layer, layer_for_lfc, embedding1, embedding2,
diagonal_variance,
)
- get_desynch_stats: layer=layer, layer_for_lfc=layer
- run_null_desynch_test: layer=null_layer, layer_for_lfc=layer
Determinism regression passes bit-identically. Future drift between the
two paths now has only one site to update.
Three previously hardcoded knobs lifted so downstream callers can
customize without monkey-patching:
- ``dn_comp_obsm(..., direction_colors=("#ff7f0e", "#1f77b4", "lightgrey"))``
— sequence written to ``ad.uns[f"{direction_key}_colors"]``, in the
order ``(modality2-higher, modality1-higher, neutral)`` matching the
CategoricalDtype.
- ``run_null_desynch_test(..., direction_colors=(...))`` — sibling lift,
same default, same order matching its own CategoricalDtype
``(modality2-structure, modality1-structure, not-significant)``.
- ``plot_scores`` — module-level ``_DEFAULT_ADJUST_TEXT_KWARGS`` with
the previously hardcoded ``arrowprops`` style; callers can override
any key (or add new ones) via the existing ``**adjust_text_kwargs``
catch-all. The repulsion knobs (``expand``, ``force_text``,
``force_points``, ``max_move_frac``, ``iter_lim``) stay as explicit
kwargs — already lifted in wave-1.
API-breaking. No backward-compat shim (operator-approved). Users
importing the module qualifier or referencing it as
``scEcho.Echo_states.<name>`` must update their import paths:
from scEcho import Echo_states -> from scEcho import echo_states
scEcho.Echo_states.dn_comp_obsm(...) -> scEcho.echo_states.dn_comp_obsm(...)
Symbols inside the module (``dn_comp_obsm``) are unchanged. Notebook
source cells updated in-place (no re-run; cached D59-retina outputs
preserved); cached warning paths still reference the old filename and
that is fine.
Echo_features renamed in the next commit; bundling both into one
commit would obscure git's rename detection.
API-breaking. No backward-compat shim (operator-approved). Users
importing the module qualifier or referencing it as
``scEcho.Echo_features.<name>`` must update their import paths:
from scEcho import Echo_features
-> from scEcho import echo_features
scEcho.Echo_features.run_echo_features(...)
-> scEcho.echo_features.run_echo_features(...)
from scEcho.Echo_features import compute_ncells
-> from scEcho.echo_features import compute_ncells
Symbols inside the module (``embeddings_predict_layer``,
``get_desynch_stats``, ``run_null_desynch_test``, ``run_echo_features``,
``make_null_layer``, ``get_reconstruction_results``, ``compute_ncells``)
are unchanged. Notebook source cells updated in-place; cached
RuntimeWarning paths in outputs still reference the old filename and
that is harmless.
Internal cross-module import in ``utils.py`` updated as part of this
commit (``from .Echo_features import embeddings_predict_layer``
→ ``from .echo_features import ...``).
Combined with the Echo_states rename in the previous commit and the
try_models / test_components absorption from PR #4's structure stream,
this completes the v0.2.0 API boundary.
f0ab273 to
70debbb
Compare
Contributor
Author
|
Folded into PR #4 per operator request (comment 4484179757) — the Tier A commits are now part of the v0.1.0 polish branch. See PR #4's body for the Tier A subsection. Closing markers: |
Connorr0
pushed a commit
that referenced
this pull request
May 21, 2026
* fix(plotting): return fig from plot_scores interactive path
* docs(README): rewrite Usage, Modules, AnnData inputs, Data, install sections
* fix(Echo_states): ravel SampleVarianceEstimator.predict(diag=True) output
Kompot's SampleVarianceEstimator.predict(diag=True) returns shape
(n_cells, 1) for density estimators (see kompot/differential/sample_variance_estimator.py:296).
Assigning the un-raveled (n_cells, 1) array into ad.obs produces
object-dtype nested values that the downstream variance accumulator
at Echo_states.py:151-154 cannot sum elementwise. Ravel to (n_cells,)
so the obs column is plain numeric.
* fix(Echo_features): drop no-op +1e-16 regularization; Kompot handles Cholesky stabilization
Both compute_mahalanobis_distances call sites (Echo_features.py:284,633)
added 1e-16 to the covariance before passing in. Kompot's
compute_mahalanobis_distances already stabilizes via
`cov + jnp.eye(...) * eps` with eps=1e-8 by default
(kompot/utils.py:98, 329). The smaller 1e-16 is dominated and so a
true no-op. Remove and document that Kompot owns the stabilization.
* perf(Echo_features): hoist .toarray() out of group loops
get_desynch_stats already materialized `arr = ad.layers[layer].toarray()`
once before the per-group loop for the total-variance computation; the
per-group variance branch then re-densified `ad[ind].layers[layer]` on
every iteration. Reuse the hoisted `arr` and slice by boolean mask.
run_null_desynch_test had the same pattern on `ad.layers[null_layer]`;
materialize once before the per-group loop and slice by mask.
Avoids per-iteration AnnData view → toarray densification, which at real
data scale (50k cells × 20k features × N groups) is multi-GB per loop.
* fix(utils): column-wise obs assignment preserves categorical dtypes
run_and_store_pr_res rebuilt ad.obs via
ad.obs = pd.concat((ad.obs, pr_res.branch_probs, masks), axis=1)
which routes through AnnData's whole-row obs setter — that setter has
historically silently dropped existing categorical dtypes on re-assignment.
Assign new branch-probability and mask columns one at a time so AnnData
treats them as ordinary new columns and leaves existing columns alone.
* fix(Echo_features): avoid view .obsp access in masked-Mahalanobis paths
Both get_desynch_stats and run_null_desynch_test fetched the per-group
uncertainty covariances via `ad[ind].obsp[key]`. AnnData ≥0.8 issues an
ImplicitModificationWarning on boolean-masked-view .obsp access, and the
view materialization is wasted work since the access is read-only.
Index the underlying obsp ndarray directly with np.ix_(ind, ind) — same
numerical result, no view, no warning.
* feat(Echo_states): default optimizer to Mellon's L-BFGS-B; document ls_factor override
dn_comp_obsm previously forced optimizer='advi' silently overriding
Mellon's DEFAULT_OPTIMIZER ('L-BFGS-B'); ADVI is much slower on the
typical scEcho workload. Change the default to None and only pass the
kwarg through when explicitly set, preserving the API for callers who
opt into ADVI. Document the rationale for ls_factor=2 (Mellon default
is 1) and note that fit_predict + uncertainty are separate calls in
Mellon 1.7.1 (no combined idiom).
* feat(Echo_features): add save_covariance/save_predictions opt-outs; document ls_factor override
embeddings_predict_layer writes an (n_cells, n_cells) float64 posterior
covariance into ad.obsp per modality per layer — ~800 MB per entry at
10k cells, scaling quadratically. Add save_covariance/save_predictions
kwargs (default True) to skip those writes, lifted into
run_echo_features and run_null_desynch_test so the null layer call
respects the same opt-out. Downstream get_desynch_stats and
run_null_desynch_test now check for the covariance keys before
computing Mahalanobis distance, mirroring the existing smoothed-
residual fallback (warn + NaN if missing). Document ls_factor=10
(Mellon default is 1) rationale at both call sites.
* refactor: replace assert validation with raise across src/scEcho
assert statements are stripped under python -O, leaving the package
unable to surface user-facing validation errors. Converted 31 sites
across Echo_features, plotting, test_components, try_models, and
utils to explicit raise statements — KeyError for missing AnnData
columns/layers/obsm/varm/uns keys and dict lookups, ValueError for
bad numeric ranges, bad string formats, and missing required kwargs.
Original messages preserved verbatim.
* fix: avoid mutable-default-arg footguns in try_models / Echo_features / plotting
try_models.name_dict, get_desynch_stats.extra_ncells_layers, and
linked_plot.line_mask all carried mutable defaults (dict / list) that
risk silent state leakage across calls — try_models in particular
mutated the default dict in place at the per-embedding name-resolution
step. Switched all three to None and initialize fresh inside the
function body.
* chore: pin minimum versions for mellon/kompot/scanpy/anndata/palantir/h5py
Per audit recommendation: unpinned deps risk silent breakage from upstream
releases. Pins use ranges with known wheel coverage (h5py>=3.10 avoids the
3.16 sdist trap).
* chore: dynamic __version__, __all__, type hints, and logging on public API
- __version__ via importlib.metadata.version, with a "0.0.0+local" fallback
so an unpackaged checkout still imports.
- Add __all__ to each public-API module (Echo_states, Echo_features, plotting,
try_models, utils, test_components) listing the audit-confirmed public
entry points; internals (e.g. make_lines_df, make_points_df, compute_ncells)
stay un-renamed but unlisted.
- Type hints on the public surface — parameter types + return annotations,
using `from __future__ import annotations` to keep runtime cost zero on
the existing >=3.9 floor.
- Replace library print() in Echo_states.dn_comp_obsm and try_models with
module-level loggers; run_echo_features's verbose=True/False now gates
info vs debug instead of stdout.
- Add ruff config (E/F/I, line 100, py311 target) and ruff to dev extras.
* test: add pytest smoke + determinism suite under tests/
Six smoke tests covering each major public entry point (dn_comp_obsm,
run_echo_features, plot_scores both paths, try_models, plot_SE, utils)
plus a determinism regression on dn_comp_obsm guarded with the slow
marker. Fixture under tests/conftest.py builds a 100-cell synthetic
AnnData with the obsm/obs layout scEcho expects; matplotlib uses Agg
backend so the plotting tests run headless. pyproject [tool.pytest]
scopes discovery to tests/ so src/scEcho/test_components.py (the
diffusion-component sweep utility, not a pytest suite) isn't collected.
* ci: add GitHub Actions workflow running pytest on push
Single-job workflow on push + pull_request, Python 3.11 matrix slot,
uses uv with --only-binary=:all: to dodge the h5py-sdist install trap.
Skips the slow-marked determinism test in CI (run it locally before
release). ruff runs --exit-zero for now — pre-existing F841 dead-code
locals would otherwise gate CI; ruff will be tightened once the
structure stream cleans them up.
* docs(Echo_states): correct fit_predict/uncertainty cache note
The previous comment claimed there was no combined-call idiom, which
sidestepped the actual concern. Empirical timing on n=300, d=6 shows
predict.uncertainty after fit_predict is ~4 % of the fit cost for ADVI
(107 ms vs 2678 ms) and ~15 % for L-BFGS-B (87 ms vs 570 ms); Mellon
caches cov_func / Lp / L on the estimator, so the second call reuses
them. The audit's "doubles the slow step" assertion was wrong.
* feat(plotting): gate fig.show() on show=True kwarg in plot_scores
* docs(README): minimal delta from main — keep only Modules list + install-snag note + dev install
* docs(README): link example.ipynb for GitHub preview
* ci: add coverage step + codecov upload conditional on CODECOV_TOKEN secret
Replace the plain pytest step with pytest-cov producing coverage.xml +
terminal summary. Codecov upload step runs only when CODECOV_TOKEN is
set, so the workflow is a no-op for that step until the operator wires
the secret on the public repo. Simplify `on:` to the `[push, pull_request]`
shorthand (the prior `branches: ["**"]` form coincided with GitHub
flagging the run as "workflow file issue" with zero jobs scheduled —
the shorthand is the form documented as universal-trigger).
Add pytest-cov to [project.optional-dependencies].dev so CI's
`uv pip install -e ".[dev]"` pulls it. .gitignore picks up .coverage,
coverage.xml, htmlcov/ so local runs don't leak artifacts.
* chore: add .codecov.yml skeleton with sensible defaults
Auto project target (tracks current coverage, warns on regression);
70% patch target with 5% threshold (new code expected at >=70% line
coverage). Ignores tests/ and notebooks/ from coverage computation.
Operator can tune once the repo flips public and codecov starts
ingesting uploads.
* test: correctness tests for public API surface (value-equality on seeded input)
Consolidate wave-1 per-function smoke tests into per-module suites
covering every entry in each module's __all__ plus the unexposed
compute_ncells helper, and add value-equality assertions alongside
smoke checks. Hardcoded expected arrays for numeric outputs use
rtol=1e-4 (Mellon L-BFGS has minor numerical jitter across jax/jaxopt
patch versions); dn_comp_obsm tests pass optimizer="L-BFGS-B"
explicitly so determinism doesn't depend on Mellon ADVI's PRNG-key
derivation staying stable upstream.
Coverage on src/scEcho/ is 83% (50 tests; fast suite 47 tests in ~80s,
slow suite 3 tests in ~61s).
The two Echo_states dn_comp_obsm tests carry @pytest.mark.slow at
module scope — each call is ~25s on the 100-cell fixture due to
Mellon's L-BFGS + JAX compilation. The plotting tests that depend on
dn_comp_obsm output via the adata_with_dn_comp fixture stay in the
fast suite (the JAX compile cache amortizes across them).
read_test_results, plot_model_heatmap, sweep_diffusion_components,
collect_sweep_residual_means, make_null_layer, get_reconstruction_results,
rotate_coords (with rotation/flip/translation correctness), regress_embedding
(zero-Pearson-residual property), and calc_corr (hardcoded Spearman
values) all gain correctness coverage. run_and_store_pr_res is skipped
— palantir.core.run_palantir doesn't run on a 100-cell Poisson fixture
without the palantir preprocessor stack; brittle to mock.
* docs(README): drop defensive parentheticals; section is now self-explanatory
* test: align exception-class expectations with validation-stream raises
Wave-1 used `pytest.raises(AssertionError)` for the input-validation paths
because the codebase asserted. The validation stream's c1faa59 converted
those 31 sites to typed raises; updating the five tests that pinned the
old class:
- try_models.try_models invalid layer → KeyError
- try_models.plot_model_heatmap bad embedding → ValueError
- test_components.sweep_diffusion_components bad min_components → ValueError
- plotting.plot_direction_fractions missing obs_col → KeyError
- Echo_features.get_reconstruction_results missing group → KeyError
The 5th (get_reconstruction_results) is also an exception-class swap, not
a column-matching issue — the for-loop substring check on the happy-path
output holds for all 20 returned columns; only the missing-group
sentinel changed class.
Verified against origin/dominik/v0.1.0-polish HEAD (69ec381):
50 passed, 3 warnings in 114s. Coverage 80% (slight dip from wave-2's
83% — polish added ~58 LOC across Echo_features/plotting; absolute lines
covered grew, denominator grew faster).
* notebook: polish example.ipynb — strip parallelization, notebook tqdm, end-to-end run on D59 retina dataset
Operator directive: settylab/scEcho PR #4 comment 4483735623.
Changes
- Drop 14 abandoned-scaffolding empty cells (audit finding).
- Merge `# Echo states` / `# Echo Features` header cells with their bodies
so each section is one markdown unit.
- Remove `threadpoolctl.threadpool_limits(limits=4)` wrapper around
`run_echo_features` — operator wants a clean single-process reference
(BLAS thread autodetect is intra-library, not "additional parallelization").
- Drop unused `import scanpy as sc`.
- Add `warnings.filterwarnings` for two noisy informational warnings from
transitive deps (jaxopt deprecation, joblib loky CPU-count probe).
- Fix mis-labelled markdown ("cyNpre" → "MPC") to match the code path.
- Normalize `kernelspec.name` to `python3` so the notebook is portable
across environments.
Progress rendering
- `scEcho.Echo_features` / `Echo_states` already import `tqdm.auto`, which
picks up `tqdm.notebook` widgets in Jupyter; widget-state metadata is
preserved so GitHub's notebook viewer renders the bars in their final state.
Execution
- Ran end-to-end on `D59_retina.h5ad` (5412 cells × 23519 features, 8
combo_type groups). Wall clock ~10:15. Zero errors.
- `MPC` group yields 2321 genes called "associated with RNA structure"
(>30 cells), fed into kompot's StringDB enrichment in the final cell.
Notebook source-tree footprint
- Data file is operator-placed at `work/echo-audit/D59_retina.h5ad`. Symlinked
in as `notebooks/D59_retina.h5ad` (path already in .gitignore); not
committed.
* refactor: absorb try_models and test_components into utils
Move all three public functions from try_models.py (try_models,
read_test_results, plot_model_heatmap) and both from test_components.py
(sweep_diffusion_components, collect_sweep_residual_means) into utils.py
and delete the now-empty source files.
Eliminates the pytest-collision module name (test_components) and
collapses two near-singleton modules into the existing utility surface.
No backward-compat shims — operator accepted the API break pre-release.
- src/scEcho/utils.py: extend imports (logging, Sequence, mellon, sns,
Axes, scipy.sparse, tqdm, Echo_features.embeddings_predict_layer) and
__all__; append the five moved functions verbatim at the end.
- src/scEcho/__init__.py: drop try_models and test_components from the
module re-exports.
- src/scEcho/try_models.py, src/scEcho/test_components.py: removed.
- README.md: collapse the try_models and test_components bullets into a
single utils bullet that describes the broadened surface.
* chore(tests): fold test_try_models + test_test_components into test_utils
Tests now live alongside the rest of utils coverage. Imports switched
from scEcho.try_models.* / scEcho.test_components.* to scEcho.utils.*,
matching the source merge in the previous commit.
All 50 tests still pass; coverage 80% (down ~3pts from the wave-2
baseline because the moved try_models source has a couple of branches
that the existing fixtures don't exercise — not a regression in test
behavior, just more denominator).
* feat(Echo_states): expose sqrt_eps kwarg on dn_comp_obsm (default 1e-16)
The ``+1e-16`` inside ``np.sqrt(variance_model + 1e-16)`` was a hardcoded
floor; lift it to a public kwarg so callers can override (raise it for
extra-noisy variance, set it to 0 to disable). Default unchanged at
1e-16 — empirical floor (min observed 0.0225 across the test suite)
makes the default a no-op, but the guard stays for robustness per
operator decision on PR #7.
* refactor(Echo_features): extract _compute_group_mhd_and_stats from get_desynch_stats and run_null_desynch_test
Removes ~50 LOC of near-identical Mahalanobis-distance + obsp-key-fallback
code between the two functions, plus the dead 'diff' assignment in the
null path. The two call sites differ only in (a) which layer namespace
the predicted-covariance keys live under (observed layer vs null layer)
and (b) which layer namespace the precomputed LFC layer lives under
(always the observed layer in current code paths).
Helper signature:
_compute_group_mhd_and_stats(
ad, ind, layer, layer_for_lfc, embedding1, embedding2,
diagonal_variance,
)
- get_desynch_stats: layer=layer, layer_for_lfc=layer
- run_null_desynch_test: layer=null_layer, layer_for_lfc=layer
Determinism regression passes bit-identically. Future drift between the
two paths now has only one site to update.
* feat: lift direction_colors and adjust_text arrowprops to public surface
Three previously hardcoded knobs lifted so downstream callers can
customize without monkey-patching:
- ``dn_comp_obsm(..., direction_colors=("#ff7f0e", "#1f77b4", "lightgrey"))``
— sequence written to ``ad.uns[f"{direction_key}_colors"]``, in the
order ``(modality2-higher, modality1-higher, neutral)`` matching the
CategoricalDtype.
- ``run_null_desynch_test(..., direction_colors=(...))`` — sibling lift,
same default, same order matching its own CategoricalDtype
``(modality2-structure, modality1-structure, not-significant)``.
- ``plot_scores`` — module-level ``_DEFAULT_ADJUST_TEXT_KWARGS`` with
the previously hardcoded ``arrowprops`` style; callers can override
any key (or add new ones) via the existing ``**adjust_text_kwargs``
catch-all. The repulsion knobs (``expand``, ``force_text``,
``force_points``, ``max_move_frac``, ``iter_lim``) stay as explicit
kwargs — already lifted in wave-1.
* refactor: rename Echo_states.py to echo_states.py (PEP 8 snake_case)
API-breaking. No backward-compat shim (operator-approved). Users
importing the module qualifier or referencing it as
``scEcho.Echo_states.<name>`` must update their import paths:
from scEcho import Echo_states -> from scEcho import echo_states
scEcho.Echo_states.dn_comp_obsm(...) -> scEcho.echo_states.dn_comp_obsm(...)
Symbols inside the module (``dn_comp_obsm``) are unchanged. Notebook
source cells updated in-place (no re-run; cached D59-retina outputs
preserved); cached warning paths still reference the old filename and
that is fine.
Echo_features renamed in the next commit; bundling both into one
commit would obscure git's rename detection.
* refactor: rename Echo_features.py to echo_features.py (PEP 8 snake_case)
API-breaking. No backward-compat shim (operator-approved). Users
importing the module qualifier or referencing it as
``scEcho.Echo_features.<name>`` must update their import paths:
from scEcho import Echo_features
-> from scEcho import echo_features
scEcho.Echo_features.run_echo_features(...)
-> scEcho.echo_features.run_echo_features(...)
from scEcho.Echo_features import compute_ncells
-> from scEcho.echo_features import compute_ncells
Symbols inside the module (``embeddings_predict_layer``,
``get_desynch_stats``, ``run_null_desynch_test``, ``run_echo_features``,
``make_null_layer``, ``get_reconstruction_results``, ``compute_ncells``)
are unchanged. Notebook source cells updated in-place; cached
RuntimeWarning paths in outputs still reference the old filename and
that is harmless.
Internal cross-module import in ``utils.py`` updated as part of this
commit (``from .Echo_features import embeddings_predict_layer``
→ ``from .echo_features import ...``).
Combined with the Echo_states rename in the previous commit and the
try_models / test_components absorption from PR #4's structure stream,
this completes the v0.2.0 API boundary.
* fix(Echo_features): silence intentional divide-by-zero in null distribution
Wrap `mse_null_diff / null_var` (run_null_desynch_test, ~L600) in
np.errstate(divide="ignore", invalid="ignore"). Zero-variance features
produce inf / NaN here by design, and the immediately-following
`expressed_mask = null_var > 0` filters them out before any statistic
that would propagate the NaN. The RuntimeWarnings were noise about an
edge case the algorithm already handles correctly; the per-group
UserWarning "N features have zero variance in group G ..." remains the
user-facing channel for that information.
Surfaced by polish PR #5's executed notebook outputs (operator pointed
at the 14 RuntimeWarning blocks in cell 13). Found via inspection,
fixed at source rather than suppressed at the notebook level — the
fix is unambiguous and would help anyone running run_null_desynch_test
outside the notebook too.
* refactor(echo_features): drop per-group "N features have zero variance" UserWarning
Per operator pushback on PR #6 follow-up (comment 4484264635). The
``expressed_mask = null_var > 0`` filter on the next line already drops
the zero-variance features silently; the per-group warning was noise
the user does not need to act on. The count is recoverable by selecting
``var_explained_diff_*_null_*`` columns and counting non-finite entries
if downstream code wants it.
Pairs with the np.errstate guard one commit prior: together they make
``run_null_desynch_test`` quiet about zero-variance features, matching
how the algorithm treats them at every other step.
pytest 50/50 in scEcho-notebook worktree.
* notebook(example): nbqa-black + snake_case module refs + rerun on D59
Per operator review on PR #6 (comment 4484264635): rebase on the
current dominik/v0.1.0-polish tip and re-execute. Source-side changes
(np.errstate guard in run_null_desynch_test, drop of the per-group
zero-variance UserWarning) landed directly on dominik/v0.1.0-polish
in c0d9d23 + f948697 — this commit is notebook-only.
Notebook delta vs. the pre-rebase state
- nbqa black formatting (target py311). Each cell's multi-arg calls
wrap one-per-line, decimal literals zero-prefixed, trailing commas
per black convention. No semantic changes.
- Module references updated to the Tier A snake_case names:
``scEcho.Echo_states.*`` → ``scEcho.echo_states.*``
``scEcho.Echo_features.*`` → ``scEcho.echo_features.*``
- Markdown / structural polish carried forward from PR #5: 36→20 cells,
merged section headers, suppressed jaxopt + loky transitive warnings,
cyNpre→MPC markdown fix, kernelspec normalised to python3.
End-to-end run on D59_retina.h5ad (5412 cells × 23519 features,
8 combo_type groups). Wall clock 9:11, zero errors, zero
RuntimeWarning blocks, zero zero-variance UserWarning blocks
(both surfaces now silent thanks to the source-side fixes on polish),
220 widget states preserved.
* notebook: revert example.ipynb to main version — landing via #8
PR #4 carried ~7.4k lines of notebook diff alongside the source/API
changes. Notebook is being landed standalone in #8 (off main, same
polish-version content) so this PR's diff stays focused on the source
surface. Three-way merge handles either merge order:
- If #8 lands first: main has polish notebook; polish branch has main-original
notebook in this commit; merging polish makes no notebook change because
polish's tip-state == base-state == polish made no net change. #8's notebook
is preserved on main.
- If #4 lands first: main has main-original notebook (this commit's state);
then #8 merges to update notebook.
Either way, the notebook ends up correctly on main and #4's diff is clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements PR #4's deferred Tier A items + drops the redundant sqrt-guard
+1e-16per operator comment 4483947672.This PR stacks on top of #4, not amends it. Base branch is
dominik/v0.1.0-polish; will need to be retargeted tomainafter #4 merges (no rebase required — fast-forward only).Users importing from the renamed modules must update their import paths:
from scEcho import Echo_states→from scEcho import echo_statesfrom scEcho import Echo_features→from scEcho import echo_featuresfrom scEcho.Echo_features import compute_ncells→from scEcho.echo_features import compute_ncellsscEcho.Echo_states.dn_comp_obsm(...)→scEcho.echo_states.dn_comp_obsm(...)scEcho.Echo_features.run_echo_features(...)→scEcho.echo_features.run_echo_features(...)Combined with PR #4's
try_models/test_componentsremoval, this constitutes the v0.2.0 API boundary. No backward-compat shims (operator-approved).What changed
_compute_group_mhd_and_statshelper extracted (echo_featuresinternals); ~50 LOC of duplication betweenget_desynch_statsandrun_null_desynch_testeliminated. Determinism regression passes bit-identically. Future drift between the two paths now has only one site to update.Echo_states.py→echo_states.py,Echo_features.py→echo_features.py(PEP 8 snake_case). Kept as two separategit mvcommits so reviewers can see one rename at a time andgit log --followstays clean.direction_colorskwarg added todn_comp_obsmandrun_null_desynch_test(default("#ff7f0e", "#1f77b4", "lightgrey")). Order matches each function'sCategoricalDtypeordering — see docstrings.adjust_textarrowpropsmoved to module-level_DEFAULT_ADJUST_TEXT_KWARGSinplotting.py; callers can override any adjust_text key (includingarrowprops) viaplot_scores's existing**adjust_text_kwargscatch-all. The repulsion knobs (expand,force_text,force_points,max_move_frac,iter_lim) stay as explicit kwargs (already lifted in wave-1).+ 1e-16atecho_states.py:176dropped. Instrumentation across the full pytest suite showedvariance_model.min() = 0.0225— ~2e14× above the epsilon, so the guard contributed nothing but noise. Operator confirms upstream pipeline guarantees positivity. Kompot's internal Cholesky stabilization (different concern) stays untouched.notebooks/example.ipynbsource cells use the snake_case import paths (no re-execution; cached D59-retina outputs preserved).tests/updated to match renamed modules.CHANGELOG.mdadded at repo root covering both PR v0.1.0 polish from audit #4 and this PR's changes.Verification
tests/test_determinism.py::test_dn_comp_obsm_is_deterministic) passes — refactor is bit-identical.scEcho.Echo_states/scEcho.Echo_featurespaths correctly raiseModuleNotFoundError.Refs operator comment 4483947672 on #4.