Mp/netvae refactor #60

Open
matthewpeterkort wants to merge 11 commits into feature/netvae-refactor from MP/netvae-refactor

matthewpeterkort (Collaborator) commented Apr 10, 2026

Cleans up a lot of what broke in the NetVAE refactor.
Adds some e2e tests to prove functionality.
Removes old deprecated code and wires up existing code to use existing patterns.

Adds a verify command, e.g.: embkit model verify netvae_latent2061_epochs20.model --ci
It can be used to inspect a model file and see what the model did, which features it uses, etc. It can also flag potential issues when the model is not what you expect it to be. Help output:

((venv) ) peterkor@RNB11238 embedding-kit % embkit model verify --help                               
Usage: embkit model verify [OPTIONS] MODEL_PATH

  Verify model integrity and architecture sanity.

Options:
  --json                          Emit a machine-readable JSON report.
  --ci                            CI mode: same as '--json --fail-on-
                                  unhealthy'.
  --fail-on-unhealthy / --no-fail-on-unhealthy
                                  Return non-zero exit code when integrity
                                  checks fail.  [default: no-fail-on-
                                  unhealthy]
  --strict                        Enable strict identity checks against
                                  expected model shape.
  --expected-feature-count INTEGER
                                  Expected number of input features in strict
                                  mode.
  --expected-latent-dim INTEGER   Expected latent dimension in strict mode.
  --expected-features-file FILE   Path to newline-delimited expected feature
                                  names in strict mode.
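
To make the flag combinations above concrete, here is a minimal sketch of a wrapper that assembles a verify invocation for a CI job. The flag names are taken from the help text; the helper names `build_verify_argv` and `run_verify` are hypothetical and not part of embkit. Whether the CLI rejects the `--expected-*` options outside strict mode is not stated, so the sketch does not enforce it.

```python
import subprocess
from typing import List, Optional


def build_verify_argv(
    model_path: str,
    ci: bool = False,
    strict: bool = False,
    expected_feature_count: Optional[int] = None,
    expected_latent_dim: Optional[int] = None,
    expected_features_file: Optional[str] = None,
) -> List[str]:
    """Assemble an `embkit model verify` command line (hypothetical helper)."""
    argv = ["embkit", "model", "verify", model_path]
    if ci:
        # Per the help text, --ci is the same as '--json --fail-on-unhealthy'.
        argv.append("--ci")
    if strict:
        argv.append("--strict")
    # The help text describes the --expected-* options as strict-mode inputs.
    if expected_feature_count is not None:
        argv += ["--expected-feature-count", str(expected_feature_count)]
    if expected_latent_dim is not None:
        argv += ["--expected-latent-dim", str(expected_latent_dim)]
    if expected_features_file is not None:
        argv += ["--expected-features-file", expected_features_file]
    return argv


def run_verify(argv: List[str]) -> int:
    """Run the command and return its exit code (non-zero means unhealthy with --ci)."""
    return subprocess.run(argv).returncode
```

A CI step could then call `run_verify(build_verify_argv("netvae_latent2061_epochs20.model", ci=True))` and fail the job on a non-zero result.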

AI PR SUMMARY:

1. Constraint Architecture Migration

  • New Components: Added src/embkit/constraints/pathway_constraint.py featuring PathwayConstraintInfo.
  • Deprecations: Removed legacy src/embkit/constraints/network_constraint.py.
  • Factory Updates:
    • Updated constraint deserialization in src/embkit/factory/layers.py.
    • ConstraintInfo is now an abstract interface.
    • Concrete pathway payloads now deserialize via PathwayConstraintInfo.from_dict.
  • Logic Changes:
    • Removed old FeatureGroups-based mask generation from the layer factory path.
    • Layer.gen_layer() now passes (in_features, out_features) into constraints and enforces strict shape validation.
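
As an illustration of the strict shape validation described above, the following sketch builds a 0/1 connectivity mask from pathway membership and refuses to proceed when the requested layer shape does not match. It is not the code in pathway_constraint.py: the function name `pathway_mask`, the dict-of-lists group format, and the one-output-unit-per-group layout are all assumptions made for the example.

```python
import numpy as np


def pathway_mask(features, groups, in_features, out_features):
    """Return an (out_features, in_features) 0/1 mask, one output row per group.

    `features` is an ordered list of input feature names and `groups` maps a
    group name to its member feature names (both formats are assumptions).
    """
    # Strict validation: the layer shape must agree with the constraint data.
    if in_features != len(features):
        raise ValueError(
            f"in_features={in_features} but constraint has {len(features)} features"
        )
    if out_features != len(groups):
        raise ValueError(
            f"out_features={out_features} but constraint has {len(groups)} groups"
        )

    index = {name: i for i, name in enumerate(features)}
    mask = np.zeros((out_features, in_features), dtype=np.float32)
    for row, members in enumerate(groups.values()):
        for name in members:
            # Members that are not in the feature list are skipped here.
            if name in index:
                mask[row, index[name]] = 1.0
    return mask
```

Failing early on a shape mismatch keeps a mask built for one feature set from being silently broadcast or truncated against a layer built for another.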

2. NetVAE Rebuilt Around Pathway Constraints

  • Layer Construction: src/embkit/models/vae/net_vae.py now constructs encoder/decoder masked layers directly from features, latent_groups, and group_layer_size.
  • New Constraint Hooks:
    • set_constraint_active(...)
    • update_membership(...)
    • refresh_masks(...)
    • get_constraint_projection_weights(...)
  • Integrity & Auditing:
    • Added verify_integrity for per-layer sparsity and leakage checks.
    • Removed deprecated config alias behavior; from_dict now errors on group_layer_scaling.
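
The per-layer sparsity and leakage checks can be pictured with a short sketch. This is not the actual verify_integrity implementation; `mask_report`, its return keys, and the `tol` parameter are hypothetical. Here "leakage" is taken to mean a non-zero weight sitting on an edge the mask forbids, which is an interpretation of the summary.

```python
import numpy as np


def mask_report(weight, mask, tol=0.0):
    """Summarise how well a weight matrix respects its 0/1 mask."""
    if weight.shape != mask.shape:
        raise ValueError(f"shape mismatch: {weight.shape} vs {mask.shape}")

    blocked = mask == 0  # edges the constraint forbids
    # Leakage: forbidden edges whose weight magnitude exceeds the tolerance.
    leaked = int(np.count_nonzero(np.abs(weight[blocked]) > tol))
    finite = bool(np.isfinite(weight).all())  # no NaN/Inf anywhere
    return {
        "sparsity": float(blocked.mean()),  # fraction of forbidden edges
        "leaked_weights": leaked,
        "finite": finite,
        "healthy": leaked == 0 and finite,
    }
```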

3. Training Path Hardening for Masked Weights

  • Masked Modules:
    • In src/embkit/modules/masked_linear.py, added in_features / out_features passthrough properties.
    • Forward pass now uses effective weight * mask.
    • Added clamp_masked_weights() for hard post-step enforcement.
  • Optimization Logic:
    • In src/embkit/optimize/__init__.py, added _enforce_model_masks(...) called after optimizer steps.
    • fit_vae(...) now refreshes masks before training.
    • Kept fit_net_vae(...) (API path) with explicit TODO/context comments.
    • Removed unused fit_alt(...).
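
The interaction between the masked forward pass and the post-step clamp can be sketched in NumPy. This is a simplified stand-in, not the PyTorch code in masked_linear.py or optimize/__init__.py: `MaskedLinearSketch` and `sgd_step` are hypothetical names, and plain SGD with optional weight decay is swapped in for whatever optimizer the project uses.

```python
import numpy as np


class MaskedLinearSketch:
    """Bias-free linear layer whose output only uses mask-permitted edges."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = np.asarray(mask, dtype=np.float64)  # (out, in), 0/1
        # Raw weights are dense: masked entries may be non-zero after init.
        self.weight = rng.normal(size=self.mask.shape)

    def forward(self, x):
        # Effective weight = weight * mask, so masked edges contribute nothing.
        return x @ (self.weight * self.mask).T

    def clamp_masked_weights(self):
        # Hard enforcement: zero the raw weights on forbidden edges.
        self.weight *= self.mask


def sgd_step(layer, x, grad_out, lr=0.1, weight_decay=0.0):
    """One SGD update followed by mask enforcement."""
    # For y = x @ (W * M).T, dL/dW = (grad_out.T @ x) * M: zero on masked edges.
    grad_w = (grad_out.T @ x) * layer.mask
    layer.weight -= lr * (grad_w + weight_decay * layer.weight)
    # Gradients never clear non-zero raw values left on masked edges,
    # so clamp after every step.
    layer.clamp_masked_weights()
```

The forward mask alone already keeps forbidden edges out of the output; the clamp matters for what ends up in the stored parameters, which is presumably why save(...) also clamps before serialization.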

4. Serialization and Verification Pipeline

  • Safety Nets: src/embkit/factory/core.py:save(...) now clamps masked weights before serialization.
  • Verification Tools: Added run_model_verification(...) to load models and run integrity checks.
  • Implementations: Added updates across VAE and FFNN classes for base checks (NaN/Inf, norms), RNAVAE diagnostics, and FFNN serialization support.

5. CLI Improvements

  • Command Updates:
    • train-netvae: Instantiates NetVAE(...) directly; added --group-layer-size.
    • train-vae: Added normalization guard for HDF5 input.
  • Pathway Parsing: Improved extract_sif_interactions and build_feature_map_indices.
  • New Command: Added model verify with --json, --ci, and --strict flags for health checks and latent assertions.

6. Documentation & API Cleanup

  • Removed docs for deleted legacy network constraint page.
  • Updated CLI/concepts/training/netvae docs for new architecture.
  • Added docs/change-notes.md.
  • Updated mkdocs navigation and requirements.

7. Testing Expansion (Unit + E2E)

  • E2E Workflow: Added toy NetVAE workflow with assertions for mask equality, gradient blocking on masked edges, and encode output validation.
  • Refactoring: Updated tests across commands, factory, and modules; removed tests for legacy constraints.

8. CI Coverage Enforcement Overhaul

  • Workflow Updates: /.github/workflows/pr_coverage_check.yml now fetches full history for diff coverage.
  • Reporting: Generates XML + JSON; prints breakdowns in logs.
  • Thresholds: Enforces 80% total project coverage and 80% new file coverage via custom scripts.
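
A threshold gate of this kind might look like the sketch below. It is not the PR's custom script: `coverage_failures` is a hypothetical name, the input is assumed to be the dict parsed from coverage.py's JSON report (`totals.percent_covered` and `files[path].summary.percent_covered`), and the new-file threshold is assumed to apply per file rather than in aggregate.

```python
from typing import Dict, Iterable, List


def coverage_failures(
    report: Dict,
    new_files: Iterable[str],
    total_min: float = 80.0,
    new_file_min: float = 80.0,
) -> List[str]:
    """Return human-readable threshold violations; an empty list means pass."""
    failures = []

    total = report["totals"]["percent_covered"]
    if total < total_min:
        failures.append(f"total coverage {total:.1f}% < {total_min:.0f}%")

    for path in new_files:
        entry = report["files"].get(path)
        if entry is None:
            # A new file that never appears in the report counts as a failure.
            failures.append(f"{path}: missing from coverage report")
            continue
        pct = entry["summary"]["percent_covered"]
        if pct < new_file_min:
            failures.append(f"{path}: {pct:.1f}% < {new_file_min:.0f}%")

    return failures
```

The list of new files could come from `git diff --name-only --diff-filter=A` against the base branch, which would explain why the workflow needs full history.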

Breaking / Behavioral Notes

  • group_layer_scaling is no longer accepted for NetVAE config loading.
  • Legacy embkit.constraints.network_constraint has been removed.
  • Constraint behavior is now strictly via PathwayConstraintInfo and MaskedLinear.
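
For anyone with older saved configs, the first breaking change behaves roughly like the sketch below. The class `NetVAEConfigSketch` is hypothetical, the exception type is assumed to be ValueError, and treating group_layer_size as the replacement key is an inference from the summary rather than something it states.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NetVAEConfigSketch:
    """Stand-in for the NetVAE config; only one field is modelled here."""

    group_layer_size: int

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "NetVAEConfigSketch":
        # The deprecated alias is rejected outright instead of being remapped.
        if "group_layer_scaling" in data:
            raise ValueError(
                "group_layer_scaling is no longer accepted; "
                "use group_layer_size (assumed replacement)"
            )
        return cls(group_layer_size=int(data["group_layer_size"]))
```

Rejecting the old key loudly, rather than silently mapping it, means stale configs fail at load time instead of training with an unintended layer size.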

github-actions Bot commented Apr 10, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|---|---|---|---|---|
| 2819 | 2342 | 83% | 0% | 🟢 |

New Files

| File | Coverage | Status |
|---|---|---|
| src/embkit/constraints/pathway_constraint.py | 89% | 🟢 |
| TOTAL | 89% | 🟢 |

Modified Files

| File | Coverage | Status |
|---|---|---|
| src/embkit/c_bio/api.py | 100% | 🟢 |
| src/embkit/commands/align.py | 100% | 🟢 |
| src/embkit/commands/matrix.py | 100% | 🟢 |
| src/embkit/commands/model.py | 85% | 🟢 |
| src/embkit/commands/protein.py | 98% | 🟢 |
| src/embkit/constraints/__init__.py | 100% | 🟢 |
| src/embkit/factory/__init__.py | 100% | 🟢 |
| src/embkit/factory/core.py | 48% | 🟢 |
| src/embkit/factory/layers.py | 78% | 🟢 |
| src/embkit/files/h5.py | 100% | 🟢 |
| src/embkit/files/read_csv.py | 93% | 🟢 |
| src/embkit/losses/vae_loss.py | 89% | 🟢 |
| src/embkit/models/vae/base_vae.py | 42% | 🟢 |
| src/embkit/models/vae/decoder.py | 93% | 🟢 |
| src/embkit/models/vae/encoder.py | 87% | 🟢 |
| src/embkit/models/vae/net_vae.py | 68% | 🟢 |
| src/embkit/models/vae/rna_vae.py | 81% | 🟢 |
| src/embkit/models/vae/vae.py | 86% | 🟢 |
| src/embkit/modules/masked_linear.py | 88% | 🟢 |
| src/embkit/modules/tsp.py | 100% | 🟢 |
| src/embkit/optimize/__init__.py | 68% | 🟢 |
| src/embkit/pathway.py | 98% | 🟢 |
| src/embkit/preprocessing/normalize.py | 100% | 🟢 |
| src/embkit/resources/resource.py | 86% | 🟢 |
| src/embkit/utilities/pca.py | 100% | 🟢 |
| TOTAL | 88% | 🟢 |

updated for commit: 62f197e by action🐍
