v0.1.0 polish from audit#4
Conversation
…tput Kompot's SampleVarianceEstimator.predict(diag=True) returns shape (n_cells, 1) for density estimators (see kompot/differential/sample_variance_estimator.py:296). Assigning the un-raveled (n_cells, 1) array into ad.obs produces object-dtype nested values that the downstream variance accumulator at Echo_states.py:151-154 cannot sum elementwise. Ravel to (n_cells,) so the obs column is plain numeric.
…Cholesky stabilization Both compute_mahalanobis_distances call sites (Echo_features.py:284,633) added 1e-16 to the covariance before passing in. Kompot's compute_mahalanobis_distances already stabilizes via `cov + jnp.eye(...) * eps` with eps=1e-8 by default (kompot/utils.py:98, 329). The smaller 1e-16 is dominated and so a true no-op. Remove and document that Kompot owns the stabilization.
…t; drop +1e-16 no-op
get_desynch_stats already materialized `arr = ad.layers[layer].toarray()` once before the per-group loop for the total-variance computation; the per-group variance branch then re-densified `ad[ind].layers[layer]` on every iteration. Reuse the hoisted `arr` and slice by boolean mask. run_null_desynch_test had the same pattern on `ad.layers[null_layer]`; materialize once before the per-group loop and slice by mask. Avoids per-iteration AnnData view → toarray densification, which at real data scale (50k cells × 20k features × N groups) is multi-GB per loop.
run_and_store_pr_res rebuilt ad.obs via ad.obs = pd.concat((ad.obs, pr_res.branch_probs, masks), axis=1) which routes through AnnData's whole-row obs setter — that setter has historically silently dropped existing categorical dtypes on re-assignment. Assign new branch-probability and mask columns one at a time so AnnData treats them as ordinary new columns and leaves existing columns alone.
Both get_desynch_stats and run_null_desynch_test fetched the per-group uncertainty covariances via `ad[ind].obsp[key]`. AnnData ≥0.8 issues an ImplicitModificationWarning on boolean-masked-view .obsp access, and the view materialization is wasted work since the access is read-only. Index the underlying obsp ndarray directly with np.ix_(ind, ind) — same numerical result, no view, no warning.
…s_factor override
dn_comp_obsm previously forced optimizer='advi' silently overriding
Mellon's DEFAULT_OPTIMIZER ('L-BFGS-B'); ADVI is much slower on the
typical scEcho workload. Change the default to None and only pass the
kwarg through when explicitly set, preserving the API for callers who
opt into ADVI. Document the rationale for ls_factor=2 (Mellon default
is 1) and note that fit_predict + uncertainty are separate calls in
Mellon 1.7.1 (no combined idiom).
…ocument ls_factor override embeddings_predict_layer writes an (n_cells, n_cells) float64 posterior covariance into ad.obsp per modality per layer — ~800 MB per entry at 10k cells, scaling quadratically. Add save_covariance/save_predictions kwargs (default True) to skip those writes, lifted into run_echo_features and run_null_desynch_test so the null layer call respects the same opt-out. Downstream get_desynch_stats and run_null_desynch_test now check for the covariance keys before computing Mahalanobis distance, mirroring the existing smoothed- residual fallback (warn + NaN if missing). Document ls_factor=10 (Mellon default is 1) rationale at both call sites.
assert statements are stripped under python -O, leaving the package unable to surface user-facing validation errors. Converted 31 sites across Echo_features, plotting, test_components, try_models, and utils to explicit raise statements — KeyError for missing AnnData columns/layers/obsm/varm/uns keys and dict lookups, ValueError for bad numeric ranges, bad string formats, and missing required kwargs. Original messages preserved verbatim.
… / plotting try_models.name_dict, get_desynch_stats.extra_ncells_layers, and linked_plot.line_mask all carried mutable defaults (dict / list) that risk silent state leakage across calls — try_models in particular mutated the default dict in place at the per-embedding name-resolution step. Switched all three to None and initialize fresh inside the function body.
…-wise obs, np.ix_ for masked obsp
…/h5py Per audit recommendation: unpinned deps risk silent breakage from upstream releases. Pins use ranges with known wheel coverage (h5py>=3.10 avoids the 3.16 sdist trap).
…c API - __version__ via importlib.metadata.version, with a "0.0.0+local" fallback so an unpackaged checkout still imports. - Add __all__ to each public-API module (Echo_states, Echo_features, plotting, try_models, utils, test_components) listing the audit-confirmed public entry points; internals (e.g. make_lines_df, make_points_df, compute_ncells) stay un-renamed but unlisted. - Type hints on the public surface — parameter types + return annotations, using `from __future__ import annotations` to keep runtime cost zero on the existing >=3.9 floor. - Replace library print() in Echo_states.dn_comp_obsm and try_models with module-level loggers; run_echo_features's verbose=True/False now gates info vs debug instead of stdout. - Add ruff config (E/F/I, line 100, py311 target) and ruff to dev extras.
Six smoke tests covering each major public entry point (dn_comp_obsm, run_echo_features, plot_scores both paths, try_models, plot_SE, utils) plus a determinism regression on dn_comp_obsm guarded with the slow marker. Fixture under tests/conftest.py builds a 100-cell synthetic AnnData with the obsm/obs layout scEcho expects; matplotlib uses Agg backend so the plotting tests run headless. pyproject [tool.pytest] scopes discovery to tests/ so src/scEcho/test_components.py (the diffusion-component sweep utility, not a pytest suite) isn't collected.
Single-job workflow on push + pull_request, Python 3.11 matrix slot, uses uv with --only-binary=:all: to dodge the h5py-sdist install trap. Skips the slow-marked determinism test in CI (run it locally before release). ruff runs --exit-zero for now — pre-existing F841 dead-code locals would otherwise gate CI; ruff will be tightened once the structure stream cleans them up.
The previous comment claimed there was no combined-call idiom, which sidestepped the actual concern. Empirical timing on n=300, d=6 shows predict.uncertainty after fit_predict is ~4 % of the fit cost for ADVI (107 ms vs 2678 ms) and ~15 % for L-BFGS-B (87 ms vs 570 ms); Mellon caches cov_func / Lp / L on the estimator, so the second call reuses them. The audit's "doubles the slow step" assertion was wrong.
…ariance/save_predictions opt-outs Conflicts in Echo_features.py resolved by combining: - mellon's if/else fallback wrapping for missing covariance keys - perf's np.ix_ direct ndarray indexing (avoids view warning) - kompot's removal of no-op + 1e-16 Cholesky stabilization
…le-default-arg fixes
… per operator pushback
… __all__/type hints/logging, pytest suite + CI
Conflicts in Echo_features.py, Echo_states.py, plotting.py, try_models.py
resolved by combining:
- infra's type hints + __all__ + logging
- validation's None defaults for mutable-default fixes (extra_ncells_layers,
line_mask, name_dict typed as Optional)
- mellon's optimizer=None default + save_predictions/save_covariance kwargs
Verified: 9/9 pytest tests pass on the merged tree.
|
@settylab-dotto-bot
|
|
@settylab-dotto-bot also polish the example notebook. Improve the syntax, and avoid using additional parallelization. Also make sure the output (e.g., the progress) is rendered nicely (e.g., make sure the notebook tqdm is used or no progress at all). I placed the example data in Consider additional improvements of the notebook but make this a new PR into dominik/v0.1.0-polish (optional amendment of this PR). |
…E delta from main plot_scores signature conflict resolved by combining infra's type hints with the new show: bool = True kwarg. Verified: 9/9 pytest tests pass on the merged tree.
|
@settylab-dotto-bot we can move |
Wave-1 used `pytest.raises(AssertionError)` for the input-validation paths because the codebase asserted. The validation stream's c1faa59 converted those 31 sites to typed raises; updating the five tests that pinned the old class: - try_models.try_models invalid layer → KeyError - try_models.plot_model_heatmap bad embedding → ValueError - test_components.sweep_diffusion_components bad min_components → ValueError - plotting.plot_direction_fractions missing obs_col → KeyError - Echo_features.get_reconstruction_results missing group → KeyError The 5th (get_reconstruction_results) is also an exception-class swap, not a column-matching issue — the for-loop substring check on the happy-path output holds for all 20 returned columns; only the missing-group sentinel changed class. Verified against origin/dominik/v0.1.0-polish HEAD (69ec381): 50 passed, 3 warnings in 114s. Coverage 80% (slight dip from wave-2's 83% — polish added ~58 LOC across Echo_features/plotting; absolute lines covered grew, denominator grew faster).
…idation-stream raises
|
Round-2 wave-2 response to review ( Ask 3 — CI activation + codecov prep (no secret yet) — done.
Ask 4 — correctness-tier coverage — done.
Per-module coverage: A "structure" stream is now in flight (operator's new ask in comment 4483832807): consolidate |
…, end-to-end run on D59 retina dataset Operator directive: settylab/scEcho PR #4 comment 4483735623. Changes - Drop 14 abandoned-scaffolding empty cells (audit finding). - Merge `# Echo states` / `# Echo Features` header cells with their bodies so each section is one markdown unit. - Remove `threadpoolctl.threadpool_limits(limits=4)` wrapper around `run_echo_features` — operator wants a clean single-process reference (BLAS thread autodetect is intra-library, not "additional parallelization"). - Drop unused `import scanpy as sc`. - Add `warnings.filterwarnings` for two noisy informational warnings from transitive deps (jaxopt deprecation, joblib loky CPU-count probe). - Fix mis-labelled markdown ("cyNpre" → "MPC") to match the code path. - Normalize `kernelspec.name` to `python3` so the notebook is portable across environments. Progress rendering - `scEcho.Echo_features` / `Echo_states` already import `tqdm.auto`, which picks up `tqdm.notebook` widgets in Jupyter; widget-state metadata is preserved so GitHub's notebook viewer renders the bars in their final state. Execution - Ran end-to-end on `D59_retina.h5ad` (5412 cells × 23519 features, 8 combo_type groups). Wall clock ~10:15. Zero errors. - `MPC` group yields 2321 genes called "associated with RNA structure" (>30 cells), fed into kompot's StringDB enrichment in the final cell. Notebook source-tree footprint - Data file is operator-placed at `work/echo-audit/D59_retina.h5ad`. Symlinked in as `notebooks/D59_retina.h5ad` (path already in .gitignore); not committed.
…59 retina, strip parallelization, notebook tqdm
Move all three public functions from try_models.py (try_models, read_test_results, plot_model_heatmap) and both from test_components.py (sweep_diffusion_components, collect_sweep_residual_means) into utils.py and delete the now-empty source files. Eliminates the pytest-collision module name (test_components) and collapses two near-singleton modules into the existing utility surface. No backward-compat shims — operator accepted the API break pre-release. - src/scEcho/utils.py: extend imports (logging, Sequence, mellon, sns, Axes, scipy.sparse, tqdm, Echo_features.embeddings_predict_layer) and __all__; append the five moved functions verbatim at the end. - src/scEcho/__init__.py: drop try_models and test_components from the module re-exports. - src/scEcho/try_models.py, src/scEcho/test_components.py: removed. - README.md: collapse the try_models and test_components bullets into a single utils bullet that describes the broadened surface.
…tils Tests now live alongside the rest of utils coverage. Imports switched from scEcho.try_models.* / scEcho.test_components.* to scEcho.utils.*, matching the source merge in the previous commit. All 50 tests still pass; coverage 80% (down ~3pts from the wave-2 baseline because the moved try_models source has a couple of branches that the existing fixtures don't exercise — not a regression in test behavior, just more denominator).
…nts into utils; fold tests; eliminate pytest-collision module name
|
Round-3 response — structure stream + notebook polish (both operator-requested via comments 4483832807 and 4483735623): Structure stream (
Notebook polish (PR #5 merged at
PR body updated with the final per-stream table, the API-breaking changes ( All eight specialist branches landed. PR is ready for final review. Tests: 50/50 green, 80 % line coverage, 1 warning (jaxopt-from-Mellon DeprecationWarning, upstream). |
|
@settylab-dotto-bot Implement Tier A and drop the redundant |
|
@settylab-dotto-bot run black for notebooks on the example notebook, and what about all the RuntimeWarning in the output? |
Per operator follow-up on PR #4 (comment 4483953364): apply nbqa black formatting and address the RuntimeWarnings that surfaced in the previous executed-notebook outputs. Formatting - `nbqa black notebooks/example.ipynb` (black 25.x, target py311). Each cell normalized: multi-arg calls wrap one-per-line, dict / list literals normalized, redundant zero-prefix decimals trimmed (`.3` → `0.3`, `.1` → `0.1`), trailing-comma + line-length conventions applied. No semantic changes. Re-execution - `jupyter nbconvert --to notebook --execute --inplace`. Wall clock 9:40, zero error cells, 220 widget states preserved. - All 14 `RuntimeWarning: divide by zero / invalid value in divide` blocks from cell 13 are gone — the np.errstate fix in the previous commit suppresses them at source. - The per-group "N features have zero variance" UserWarnings remain (intentional — same channel, no longer doubled with the divide noise). Residual stderr noise (out of this PR's scope, all surfaces non-blocking): - seaborn `UserWarning: Numpy array is not a supported type for palette` from plotting.py:948 — needs an upstream-side cast. - matplotlib FancyArrowPatch informational from plot_scores's labeller — harmless layout hint. - Sandbox-only joblib loky traceback fragment in cell 7 stripped (same surgical approach as PR #5 — won't fire on non-sandbox runs).
|
Tier A landed as #7 (operator ask in comment 4483947672):
#7 stacks on Worker report: |
The ``+1e-16`` inside ``np.sqrt(variance_model + 1e-16)`` was a hardcoded floor; lift it to a public kwarg so callers can override (raise it for extra-noisy variance, set it to 0 to disable). Default unchanged at 1e-16 — empirical floor (min observed 0.0225 across the test suite) makes the default a no-op, but the guard stays for robustness per operator decision on PR #7.
…t_desynch_stats and run_null_desynch_test
Removes ~50 LOC of near-identical Mahalanobis-distance + obsp-key-fallback
code between the two functions, plus the dead 'diff' assignment in the
null path. The two call sites differ only in (a) which layer namespace
the predicted-covariance keys live under (observed layer vs null layer)
and (b) which layer namespace the precomputed LFC layer lives under
(always the observed layer in current code paths).
Helper signature:
_compute_group_mhd_and_stats(
ad, ind, layer, layer_for_lfc, embedding1, embedding2,
diagonal_variance,
)
- get_desynch_stats: layer=layer, layer_for_lfc=layer
- run_null_desynch_test: layer=null_layer, layer_for_lfc=layer
Determinism regression passes bit-identically. Future drift between the
two paths now has only one site to update.
Three previously hardcoded knobs lifted so downstream callers can
customize without monkey-patching:
- ``dn_comp_obsm(..., direction_colors=("#ff7f0e", "#1f77b4", "lightgrey"))``
— sequence written to ``ad.uns[f"{direction_key}_colors"]``, in the
order ``(modality2-higher, modality1-higher, neutral)`` matching the
CategoricalDtype.
- ``run_null_desynch_test(..., direction_colors=(...))`` — sibling lift,
same default, same order matching its own CategoricalDtype
``(modality2-structure, modality1-structure, not-significant)``.
- ``plot_scores`` — module-level ``_DEFAULT_ADJUST_TEXT_KWARGS`` with
the previously hardcoded ``arrowprops`` style; callers can override
any key (or add new ones) via the existing ``**adjust_text_kwargs``
catch-all. The repulsion knobs (``expand``, ``force_text``,
``force_points``, ``max_move_frac``, ``iter_lim``) stay as explicit
kwargs — already lifted in wave-1.
API-breaking. No backward-compat shim (operator-approved). Users
importing the module qualifier or referencing it as
``scEcho.Echo_states.<name>`` must update their import paths:
from scEcho import Echo_states -> from scEcho import echo_states
scEcho.Echo_states.dn_comp_obsm(...) -> scEcho.echo_states.dn_comp_obsm(...)
Symbols inside the module (``dn_comp_obsm``) are unchanged. Notebook
source cells updated in-place (no re-run; cached D59-retina outputs
preserved); cached warning paths still reference the old filename and
that is fine.
Echo_features renamed in the next commit; bundling both into one
commit would obscure git's rename detection.
API-breaking. No backward-compat shim (operator-approved). Users
importing the module qualifier or referencing it as
``scEcho.Echo_features.<name>`` must update their import paths:
from scEcho import Echo_features
-> from scEcho import echo_features
scEcho.Echo_features.run_echo_features(...)
-> scEcho.echo_features.run_echo_features(...)
from scEcho.Echo_features import compute_ncells
-> from scEcho.echo_features import compute_ncells
Symbols inside the module (``embeddings_predict_layer``,
``get_desynch_stats``, ``run_null_desynch_test``, ``run_echo_features``,
``make_null_layer``, ``get_reconstruction_results``, ``compute_ncells``)
are unchanged. Notebook source cells updated in-place; cached
RuntimeWarning paths in outputs still reference the old filename and
that is harmless.
Internal cross-module import in ``utils.py`` updated as part of this
commit (``from .Echo_features import embeddings_predict_layer``
→ ``from .echo_features import ...``).
Combined with the Echo_states rename in the previous commit and the
try_models / test_components absorption from PR #4's structure stream,
this completes the v0.2.0 API boundary.
…bution Wrap `mse_null_diff / null_var` (run_null_desynch_test, ~L600) in np.errstate(divide="ignore", invalid="ignore"). Zero-variance features produce inf / NaN here by design, and the immediately-following `expressed_mask = null_var > 0` filters them out before any statistic that would propagate the NaN. The RuntimeWarnings were noise about an edge case the algorithm already handles correctly; the per-group UserWarning "N features have zero variance in group G ..." remains the user-facing channel for that information. Surfaced by polish PR #5's executed notebook outputs (operator pointed at the 14 RuntimeWarning blocks in cell 13). Found via inspection, fixed at source rather than suppressed at the notebook level — the fix is unambiguous and would help anyone running run_null_desynch_test outside the notebook too.
…e" UserWarning Per operator pushback on PR #6 follow-up (comment 4484264635). The ``expressed_mask = null_var > 0`` filter on the next line already drops the zero-variance features silently; the per-group warning was noise the user does not need to act on. The count is recoverable by selecting ``var_explained_diff_*_null_*`` columns and counting non-finite entries if downstream code wants it. Pairs with the np.errstate guard one commit prior: together they make ``run_null_desynch_test`` quiet about zero-variance features, matching how the algorithm treats them at every other step. pytest 50/50 in scEcho-notebook worktree.
Per operator review on PR #6 (comment 4484264635): rebase on the current dominik/v0.1.0-polish tip and re-execute. Source-side changes (np.errstate guard in run_null_desynch_test, drop of the per-group zero-variance UserWarning) landed directly on dominik/v0.1.0-polish in c0d9d23 + f948697 — this commit is notebook-only. Notebook delta vs. the pre-rebase state - nbqa black formatting (target py311). Each cell's multi-arg calls wrap one-per-line, decimal literals zero-prefixed, trailing commas per black convention. No semantic changes. - Module references updated to the Tier A snake_case names: ``scEcho.Echo_states.*`` → ``scEcho.echo_states.*`` ``scEcho.Echo_features.*`` → ``scEcho.echo_features.*`` - Markdown / structural polish carried forward from PR #5: 36→20 cells, merged section headers, suppressed jaxopt + loky transitive warnings, cyNpre→MPC markdown fix, kernelspec normalised to python3. End-to-end run on D59_retina.h5ad (5412 cells × 23519 features, 8 combo_type groups). Wall clock 9:11, zero errors, zero RuntimeWarning blocks, zero zero-variance UserWarning blocks (both surfaces now silent thanks to the source-side fixes on polish), 220 widget states preserved.
notebook(example): nbqa-black + RuntimeWarning fix at source
PR #4 carried ~7.4k lines of notebook diff alongside the source/API changes. Notebook is being landed standalone in #8 (off main, same polish-version content) so this PR's diff stays focused on the source surface. Three-way merge handles either merge order: - If #8 lands first: main has polish notebook; polish branch has main-original notebook in this commit; merging polish makes no notebook change because polish's tip-state == base-state == polish made no net change. #8's notebook is preserved on main. - If #4 lands first: main has main-original notebook (this commit's state); then #8 merges to update notebook. Either way, the notebook ends up correctly on main and #4's diff is clean.
v0.1.0 polish pass driven by the audit at
settylab/dotto-nexus#131comment 4483446621.Closes #1
Closes #2
Closes #3
Streams
plot_scoresinteractive return-path +show=Truekwarg gatingfig.show(); README rewritten frommain(Modules list,--only-binary=:all:install snag,pip install -e ".[dev]", link tonotebooks/example.ipynbfor the GitHub-rendered preview)dominik/echo-polish-blockers.ravel()onSampleVarianceEstimator.predict(diag=True); drop the no-op+1e-16Cholesky stabilization (Kompot already applieseps=1e-8). The+1e-16at thenp.sqrt(variance_model + ...)site becamesqrt_epsper Tier Adominik/echo-polish-kompotoptimizer=None(defers to Mellon'sL-BFGS-B);ls_factordocstring justifications;save_predictions/save_covarianceopt-outs with graceful Mahalanobis fallback when covariance keys are absentdominik/echo-polish-mellonassert-as-validation sites → explicitraise KeyError/raise ValueError; mutable-default-arg fixes intry_models.name_dict,get_desynch_stats.extra_ncells_layers,linked_plot.line_maskdominik/echo-polish-validation.toarray()out of group loops; column-wise.obsassignment (preserves categorical dtypes);np.ix_for masked.obspaccess (avoids AnnData ≥0.8ImplicitModificationWarning)dominik/echo-polish-perfpyproject.toml; dynamic__version__viaimportlib.metadata;__all__, type hints, andloggingon the public API; 50-test pytest suite at 80 % line coverage (correctness + smoke + determinism, value-equality withassert_allclose(rtol=1e-4)on seeded inputs); GitHub Actions workflow with conditionalcodecov-action@v4(no-op untilCODECOV_TOKENis set);.codecov.ymlskeletondominik/echo-polish-infratry_modelsandtest_componentsmodules absorbed intoutils; standalone files deleted (no backward-compat shims); pytest-collision module name eliminated; test files folded intotests/test_utils.pydominik/echo-polish-structuresqrt_eps: float = 1e-16kwarg lifted ondn_comp_obsm(default unchanged — min observed sqrt input 0.0225 makes it a no-op, but exposed for tunability);_compute_group_mhd_and_statshelper extracted (~50 LOC dedup betweenget_desynch_statsandrun_null_desynch_test);Echo_states.py→echo_states.py+Echo_features.py→echo_features.py(PEP 8);direction_colors+adjust_textarrowpropslifted to public surfacedominik/v0.2.0-tier-amain, polish-version content). This PR was reverted tomain's notebook so its diff is source-only. Original notebook content:notebooks/example.ipynbexecuted end-to-end on D59 retina (5412 cells × 23519 features, ~9 min wall, 220 widget states preserved); 14 empty scaffolding cells removed; threadpoolctl wrapper dropped;nbqa blackformatting; Tier A snake_case module refs (scEcho.echo_states.*/scEcho.echo_features.*);np.errstate(divide, invalid)guard at the intentional div-by-zero site inrun_null_desynch_test(c0d9d23); per-group"N features have zero variance"UserWarning dropped (f948697)dominik/notebook-polish-update(PR #8); originallydominik/echo-notebook-polishAPI-breaking changes (v0.1.0)
Users importing from the prior surface must update:
from scEcho import Echo_states→from scEcho import echo_states;from scEcho import Echo_features→from scEcho import echo_features. Attribute accessscEcho.Echo_states.dn_comp_obsm(...)→scEcho.echo_states.dn_comp_obsm(...), ditto forEcho_features.scEcho.try_modelsandscEcho.test_componentsno longer exist; their functions live inscEcho.utils. Old import paths raiseModuleNotFoundError(intentional, no shims). Specifically:from scEcho.try_models import try_models, read_test_results, plot_model_heatmap→from scEcho.utils import …from scEcho.test_components import sweep_diffusion_components, collect_sweep_residual_means→from scEcho.utils import …dn_comp_obsm'soptimizerkwarg default is nowNone(defers to Mellon'sL-BFGS-B) instead of"advi". Callers depending on the old behavior should passoptimizer="advi"explicitly. Result shape unchanged; reproducibility intact for fixed-seed inputs.echo_features.run_echo_featuresno longer emits the per-groupUserWarning("N features have zero variance"). Callers grepping log output for that string will need to update.Verification
Per-module:
echo_states87 %,echo_features82 %,plotting77 %,utils78 %,__init__75 %. Determinism regression (tests/test_determinism.py::test_dn_comp_obsm_is_deterministic) bit-identical across the helper extraction, rename, and kwarg-lift commits.Executed
notebooks/example.ipynbships clean: 0 error cells, 0RuntimeWarningblocks, 0"zero variance in group"UserWarning blocks, 220 widget states preserved (GitHub's nbviewer renders the tqdm bars). Onejaxopt is no longer maintainedDeprecationWarningpropagates from Mellon's transitive dep; not actionable here.Remaining improvements (follow-ups)
Captured during the audit + the polish run; none are blockers for this PR.
Tier A — strong-value, low-risk follow-up
CHANGELOG.md. Going from0.0.5→0.1.0with this much surface change (plus the API-breaking module renames andtry_models/test_componentsremovals) deserves release notes; there's no changelog file today. A first cut was drafted on the Tier A branch but operator pruned it before merge — release-time content is operator's call.Tier B — upstream PRs the audit noted
Mellon: add
compute_d_fractal = compute_d_factalalias inmellon/parameters.py. Upstream is genuinely spelled "factal"; scEcho uses it correctly, but the typo costs discoverability. One-line PR.Kompot: add
compute_mahalanobis_distances(plural) tokompot/__init__.py's__all__. scEcho imports it fromkompot.utils; if Kompot refactors, the import breaks silently. One-line PR.Tier C — substantial refactors, want a focused PR
echo_features.py~840-LOC monolith split (audit "Design + structure"). Suggest_imputation.py(embeddings_predict_layer),_statistics.py(get_desynch_stats),_null_model.py(run_null_desynch_test), public surface re-exported fromecho_features.py. Pair with a v0.3.0 bump.Cryptic public-API naming (audit nits).
dn_comp_obsm→compare_density_across_embeddings,obj1(theadataparameter oflinked_plot) →adata. API-breaking; pair with the v0.3.0 boundary.Docstring coverage on plotting.
plot_scores,plot_direction_fractions,plot_SE,compute_ncellshave no docstrings;plot_SEin particular has 10 magic kwargs that need explaining. Better as a focused docs PR than mixed into a refactor.Tier D — noted during integration
jaxopt is no longer maintainedDeprecationWarningpropagates from Mellon's transitive dep. Not actionable from scEcho's side; worth pinging Mellon maintainers about a migration plan (their docs point at JAX-native alternatives but Mellon hasn't switched yet).GitHub Actions runs failing at 0 s with "workflow file issue / 0 jobs" — reproduced on multiple runs after wave-2 infra landed the workflow. YAML parses fine via
actionlint; the workflow showsstate: activein the API. Signature is consistent with private-repo Actions billing exhausted or an org-level policy gating execution (only visible withadmin:org). Flipping the repo public would unmeter it; otherwise checkhttps://github.com/organizations/settylab/settings/actions. Until resolved the workflow YAML is good but never green; codecov upload (correctly gated on the secret) stays a no-op.multipletests(..., method="fdr_bh")per-group vs joint FDR (audit minor). Current code controls FDR within each groupcindependently. If the scientific intent is joint FDR across all (group, feature) hypotheses, this is wrong; if per-group, the docstring should say so deliberately. Ask Connor; one-line fix either way.run_and_store_pr_restest coverage. Skipped in this PR's test suite — palantir'score.run_palantirrequires preprocessor outputs that don't exist on the synthetic Poisson fixture; mocking would be brittle. A fixture using one of palantir's example datasets would close the gap.Residual
eps=1e-16Mahalanobis-stabilization kwargs. The Cholesky-stabilization sites inecho_featureshave it dropped; thenp.sqrt(...)site inecho_stateskeeps it assqrt_epswith docstring; the residualeps=1e-16defaults inget_desynch_stats/run_null_desynch_test/run_echo_featuresstay as kwargs (already exposed). A one-line comment at the residual sites pointing readers at the kwarg as the override surface would prevent future audits from re-flagging the literal.Specialist reports for each surface are at
reports/echo-polish_*_echo-polish-<stream>.mdin thesettylab/dotto-nexustree.