feat: Add Prov-GigaPath linear probe test workflows and prediction-map utilities #15
Merged
Commits
129 commits
24668c3 feat: create ml pipeline for linear probe (vojtech-cifka)
f340038 refactor(ml): switch DataModule to HF datasets with fold-based split (vojtech-cifka)
c3ef38a feat(ml): wire up linear probe training with k-fold CV on cached embe… (vojtech-cifka)
c644f22 fix(configs): use override for class_mapping in experiment yaml (vojtech-cifka)
564b0b1 fix(scripts): drop duplicate +ml= from linear-probe submit command (vojtech-cifka)
3a77adc fix(ml): register random_seed/len resolvers and unflatten class_mappi… (vojtech-cifka)
11b19f0 fix(ml): accept already-canonical labels in datamodule label map (vojtech-cifka)
c6bfe8e feat(ml): class-weighted CE, raise class_coverage_min to 0.5 (vojtech-cifka)
894c27b fix: sort only tiles parquet (vojtech-cifka)
fc824ad fix: log join types of tile keys (vojtech-cifka)
11931d1 fix: remove embeddings from the join (vojtech-cifka)
fb6b320 fix: remove label column (vojtech-cifka)
7434ae9 fix: prevent overflow (vojtech-cifka)
1b18daa Merge remote-tracking branch 'origin/master' into feature/linear-probe (vojtech-cifka)
bef70df feat: add embedding dataset build pipeline (vojtech-cifka)
911bec2 feat: add class tresholds and run ids (vojtech-cifka)
1a02395 fix: wrong run id (vojtech-cifka)
08d7ba5 Merge remote-tracking branch 'origin/master' into feature/embedding-d… (vojtech-cifka)
b38465e feat: add timing (vojtech-cifka)
bfc9578 refactor: use pyarrow to avoid to pandas conversion (vojtech-cifka)
eb213c6 fix: join on keys only (vojtech-cifka)
c92d9a1 fix: typing (vojtech-cifka)
01cc394 fix: add prints (vojtech-cifka)
cad0d37 refactor: use combine chunks (vojtech-cifka)
ae04552 fix: lazy-cast embeddings to large_list and stay in Arrow during join (vojtech-cifka)
82320db fix: validate label/tissue_prop columns when derive=False (vojtech-cifka)
3b0137f chore: remove time (vojtech-cifka)
8df47aa feat: add timing (vojtech-cifka)
926753d chore: revert to the previous state (vojtech-cifka)
b0e9ba4 feat: add prints (vojtech-cifka)
6a915de refactor: use discusssed thresholds (vojtech-cifka)
0f50307 refactor: use different labeling strategy (vojtech-cifka)
4d953dc feat: implement training pipeline (vojtech-cifka)
d5798bc feat: add class weights (vojtech-cifka)
ae45cd5 refactor: join embeddings with metadata while loading the dataset (vojtech-cifka)
bdce760 feat: add prints (vojtech-cifka)
ac633d5 fix: use chunks (vojtech-cifka)
2793562 fix: use numpy chunks (vojtech-cifka)
e81973e fix: call end at the end of the main (vojtech-cifka)
0071592 chore: remove prints (vojtech-cifka)
c0a7499 chore: remove debug prints, stale TODO, and unused preprocessing pipe… (vojtech-cifka)
fe918d1 chore: remove markdown file (vojtech-cifka)
6b7d1e8 fix: edge cases (vojtech-cifka)
4ff988e feat: normalize the confusion matrix rows per class recall (vojtech-cifka)
32375b2 fix: format (vojtech-cifka)
af9538a feat: use stratified k fold run (vojtech-cifka)
bc0819a fix: remove criterion (vojtech-cifka)
b8e85e0 fix: remove criterion from configs (vojtech-cifka)
c387189 feat: implement test pipeline (vojtech-cifka)
1216504 fix: Hydra unreached (vojtech-cifka)
7ec86ef fix: set weights only to false (vojtech-cifka)
c9b566e fix: criterion weight (vojtech-cifka)
ff4d307 Merge branch 'master' into feature/ml-linear-classifier (vojtech-cifka)
3cc670d feat: add option to use different kfold strategies (vojtech-cifka)
ad0a4e7 feat: add training without validation (vojtech-cifka)
811e21c feat: implement final test run (vojtech-cifka)
27ceea3 fix: lower LR and patience (vojtech-cifka)
efde82a fix: use f1 macro as a monitor (vojtech-cifka)
c8102de fix: rever back to validation loss (vojtech-cifka)
c5bab90 fix: add weight decay 1e-3 to linear classifier (vojtech-cifka)
475b67c Revert "fix: add weight decay 1e-3 to linear classifier" (vojtech-cifka)
43663a9 feat: add logistic regression (vojtech-cifka)
a2fe451 feat: polish and add two distinct submission scripts (vojtech-cifka)
31ecf6d fix: submission scripts (vojtech-cifka)
ff8d0bf feat: implement knn (vojtech-cifka)
1f87154 refactor: focus on convergence (vojtech-cifka)
7039307 Remove kNN sklearn baseline (vojtech-cifka)
729eccd fix: change monitor to focus on train losss (vojtech-cifka)
d3ed2ed feat: add run name (vojtech-cifka)
e9fd559 chore: remove logistic regression (vojtech-cifka)
6dadbd7 feat: implement lbfgs (vojtech-cifka)
d5d3edd fix: run id (vojtech-cifka)
2163699 fix: cache the tiles and embeddings so they do not need to be downloa… (vojtech-cifka)
9286807 fix: limit num of workers (vojtech-cifka)
bb8a043 fix: support checkpoint test and prediction export (vojtech-cifka)
c284d8d Merge remote-tracking branch 'origin/feature/linear-probe' into featu… (vojtech-cifka)
efddcd6 Revert "Merge remote-tracking branch 'origin/feature/linear-probe' in… (vojtech-cifka)
420534e Merge remote-tracking branch 'origin/feature/ml-linear-classifier' in… (vojtech-cifka)
14909e2 feat: add functionality to submit final train for both adamw and lbfgs (vojtech-cifka)
8167363 feat: implement prediction maps (vojtech-cifka)
4e45ce1 fix: change the adamw checkpoint dir name to last (vojtech-cifka)
8f9ce70 fix: lower the batch so the compute does not hang (vojtech-cifka)
99c2d0d fix: put num workers to 0 (vojtech-cifka)
01486bd feat: add prints (vojtech-cifka)
64963ac Merge branch 'master' into feature/ml-test-mode (vojtech-cifka)
85270fd feat: add diagnostic prints (vojtech-cifka)
5db671c fix: use numpy buffer (vojtech-cifka)
3aea3c2 refactor: use HeatmapAssembler (vojtech-cifka)
756642a chore: clean config structure (vojtech-cifka)
2771e78 fix: prediction maps class indices (vojtech-cifka)
918b691 fix: format and mypy (vojtech-cifka)
4032df3 feat: add posibility to predict the whole slide with tissue area (vojtech-cifka)
ca50a7c feat: add embeddings for whole slide (vojtech-cifka)
6489cd0 refactor: compute grayscale mask per each class (vojtech-cifka)
afddfc1 feat: add the provgigapath train and test runs (vojtech-cifka)
b618807 feat: set final weight decay for train (vojtech-cifka)
0ec0da8 feat: turn on prediction maps for the test runs over the annotated re… (vojtech-cifka)
9d8729a refactor: do not generate error masks (vojtech-cifka)
ac0ce16 chore: config cleanup (vojtech-cifka)
099e277 feat: add prints to the prediction maps writer (vojtech-cifka)
4909324 feat: add embeddings run id for the whole tissue tiles run (vojtech-cifka)
ee9d2da feat: add prediction maps in configs (vojtech-cifka)
8b3a82d chore: deduplicate, apply safety nets (vojtech-cifka)
27c7596 Merge branch 'feature/ml-test-mode' into feature/provgigapath-metrics… (vojtech-cifka)
e16426e fix: pytorch checkpoint loading (vojtech-cifka)
fd3fdd6 chore: remove redundancy, rename variables (vojtech-cifka)
c401015 chore: remove username and branch (vojtech-cifka)
847c3cc refactor: rename configs (vojtech-cifka)
b138a42 Merge branch 'feature/ml-test-mode' into feature/provgigapath-metrics… (vojtech-cifka)
51f36fb chore: remove pgp test prediction maps (vojtech-cifka)
2ba0562 fix: keep criterion.weight in state_dict for strict checkpoint load (vojtech-cifka)
e370417 fix: criterion weight (vojtech-cifka)
632a8f6 fix: keep space in MUG prediction masks names (vojtech-cifka)
3cd0243 fix: log test accuracy as jsons (vojtech-cifka)
76e4194 chore: remove username from the submission script (vojtech-cifka)
597e348 fix: force the entering of the write phase of the prediction maps (vojtech-cifka)
e4a4cc5 fix: surface why prediction-map write phase skips (vojtech-cifka)
3829ebd fix: remove username (vojtech-cifka)
0b2d38e feat: generate embeddings up to a budget (vojtech-cifka)
ff7c06e Merge branch 'feature/ml-test-mode' into feature/provgigapath-metrics… (vojtech-cifka)
4a528d7 Merge branch 'master' into feature/provgigapath-metrics-test (vojtech-cifka)
b3d803a chore: rename ml experiments for clarity (vojtech-cifka)
1703c01 feat: add original slide name in the per slide statistics (vojtech-cifka)
3c72c4f Merge remote-tracking branch 'origin/master' into feature/provgigapat… (vojtech-cifka)
67d3ef4 refactor: simplify the preprocessing name scripts (vojtech-cifka)
cd4b19a fix: commnets, generate safer filenames (vojtech-cifka)
c9b4c67 fix: remove erorr masks (vojtech-cifka)
df7888f fix: remove rendundant column selection (vojtech-cifka)
191892c fix: format (vojtech-cifka)
21 changes: 21 additions & 0 deletions
configs/experiment/ml/final_linear_provgigapath_adamw.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,21 @@` |
| | 1 | `# @package _global_` |
| | 2 | |
| | 3 | `defaults:` |
| | 4 | `  - /experiment/ml/final_linear_virchow2_adamw` |
| | 5 | `  - _self_` |
| | 6 | |
| | 7 | `embedding_model_name: ProvGigaPath` |
| | 8 | `embedding_dim: 1536` |
| | 9 | `embedding_run_id: 410c8672471348ceb4c58817f70fa097` |
| | 10 | `kfold_strategy: stratified_group` |
| | 11 | `kfold_run_id: ${dataset.mlflow_artifacts.stratified_group_kfold_run_id}` |
| | 12 | `mlflow_artifact_path: linear_classifier_final_provgigapath` |
| | 13 | |
| | 14 | `# Set after Stage 1 from ProvGigaPath's own AdamW sweep selected by` |
| | 15 | `# validation/f1_macro.` |
| | 16 | `model:` |
| | 17 | `  weight_decay: 1.0e-4` |
| | 18 | |
| | 19 | `metadata:` |
| | 20 | `  run_name: Final Linear Classifier AdamW ProvGigaPath ${dataset.name}` |
| | 21 | `  description: "Final AdamW linear probe over frozen ProvGigaPath embeddings, trained on all training folds with the ProvGigaPath-selected weight decay."` |
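
This config inherits from `final_linear_virchow2_adamw` through the Hydra defaults list, and because `_self_` is listed last, the keys set in this file take precedence over the inherited ones. The sketch below illustrates that override order with a plain recursive dict merge. It is not Hydra's composition code, `deep_merge` is a hypothetical helper, and the `base` values are placeholders because the Virchow2 config is not shown in this diff.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested mappings are merged key by key, not replaced wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Placeholder stand-in for the inherited Virchow2 config (values are assumptions).
base = {
    "embedding_model_name": "Virchow2",
    "embedding_dim": 2560,
    "model": {"optimizer": "adamw", "weight_decay": 1.0e-3},
}

# Values copied from final_linear_provgigapath_adamw.yaml.
provgigapath = {
    "embedding_model_name": "ProvGigaPath",
    "embedding_dim": 1536,
    "model": {"weight_decay": 1.0e-4},
}

composed = deep_merge(base, provgigapath)
print(composed)
```

Only `model.weight_decay` is replaced inside `model`; sibling keys inherited from the base config, such as the placeholder `optimizer`, are left in place.
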
21 changes: 21 additions & 0 deletions
configs/experiment/ml/final_linear_provgigapath_lbfgs.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,21 @@` |
| | 1 | `# @package _global_` |
| | 2 | |
| | 3 | `defaults:` |
| | 4 | `  - /experiment/ml/final_linear_virchow2_lbfgs` |
| | 5 | `  - _self_` |
| | 6 | |
| | 7 | `embedding_model_name: ProvGigaPath` |
| | 8 | `embedding_dim: 1536` |
| | 9 | `embedding_run_id: 410c8672471348ceb4c58817f70fa097` |
| | 10 | `kfold_strategy: stratified_group` |
| | 11 | `kfold_run_id: ${dataset.mlflow_artifacts.stratified_group_kfold_run_id}` |
| | 12 | `mlflow_artifact_path: linear_classifier_final_provgigapath` |
| | 13 | |
| | 14 | `# Set after Stage 1 from ProvGigaPath's own LBFGS sweep selected by` |
| | 15 | `# validation/f1_macro.` |
| | 16 | `model:` |
| | 17 | `  weight_decay: 1.0e-4` |
| | 18 | |
| | 19 | `metadata:` |
| | 20 | `  run_name: Final Linear Classifier LBFGS ProvGigaPath ${dataset.name}` |
| | 21 | `  description: "Final LBFGS linear probe over frozen ProvGigaPath embeddings, exact full-batch solve with the ProvGigaPath-selected weight decay."` |
File renamed without changes.
File renamed without changes.
13 changes: 0 additions & 13 deletions
configs/experiment/ml/linear_classifier_adamw_stratified_kfold.yaml
This file was deleted.
File renamed without changes.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,16 @@` |
| | 1 | `# @package _global_` |
| | 2 | |
| | 3 | `defaults:` |
| | 4 | `  - /experiment/ml/final_linear_provgigapath_adamw` |
| | 5 | `  - _self_` |
| | 6 | |
| | 7 | `# Held-out test for the final ProvGigaPath AdamW checkpoint. Uses the same` |
| | 8 | `# filtered labeled test split, thresholds, metrics, and checkpoint convention as` |
| | 9 | `# the Virchow2 test config.` |
| | 10 | `mode: test` |
| | 11 | `final_train_run_id: fe172ccd8c1140269f7f3d1fdbd351ea` |
| | 12 | `checkpoint: mlflow-artifacts:/104/${final_train_run_id}/artifacts/checkpoints/last/checkpoint.ckpt` |
| | 13 | `checkpoint_weights_only: false` |
| | 14 | |
| | 15 | `data:` |
| | 16 | `  num_workers: 0` |
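
The `checkpoint` value in the test config above is derived from `final_train_run_id` through a `${...}` interpolation, so pointing the test at a different training run only requires changing the run id. The following is a rough sketch of how such a reference expands, using a small regex-based resolver. `resolve` and `lookup` are hypothetical helpers written for illustration; the real expansion is done by OmegaConf inside Hydra, and this sketch handles only a single, non-nested pass.

```python
import re

# Matches ${some.dotted.key} with no nested braces.
_REFERENCE = re.compile(r"\$\{([^{}]+)\}")


def lookup(config: dict, dotted_key: str):
    """Walk a nested dict along a dotted path such as 'a.b.c'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value: str, config: dict) -> str:
    """Replace every ${key} in `value` with the matching config entry."""
    return _REFERENCE.sub(lambda m: str(lookup(config, m.group(1))), value)


# Values copied from the test config above.
config = {
    "final_train_run_id": "fe172ccd8c1140269f7f3d1fdbd351ea",
    "checkpoint": "mlflow-artifacts:/104/${final_train_run_id}"
                  "/artifacts/checkpoints/last/checkpoint.ckpt",
}

resolved = resolve(config["checkpoint"], config)
print(resolved)
```
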
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,18 @@` |
| | 1 | `# @package _global_` |
| | 2 | |
| | 3 | `defaults:` |
| | 4 | `  - /experiment/ml/final_linear_provgigapath_lbfgs` |
| | 5 | `  - override /ml/trainer: early_stopping` |
| | 6 | `  - _self_` |
| | 7 | |
| | 8 | `# Held-out test for the final ProvGigaPath LBFGS checkpoint. Uses the same` |
| | 9 | `# filtered labeled test split, thresholds, metrics, and checkpoint convention as` |
| | 10 | `# the Virchow2 test config.` |
| | 11 | `mode: test` |
| | 12 | `final_train_run_id: 067b08dcbdb54d9187fbd4dd8d5599a1` |
| | 13 | `checkpoint: mlflow-artifacts:/104/${final_train_run_id}/artifacts/checkpoints/last/checkpoint.ckpt` |
| | 14 | `checkpoint_weights_only: false` |
| | 15 | |
| | 16 | `data:` |
| | 17 | `  train_batch_size: 1024` |
| | 18 | `  num_workers: 0` |
2 changes: 1 addition & 1 deletion
...ment/ml/linear_classifier_test_lbfgs.yaml → ...riment/ml/test_linear_virchow2_lbfgs.yaml
14 changes: 14 additions & 0 deletions
configs/experiment/ml/train_linear_provgigapath_adamw_group_kfold.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,14 @@` |
| | 1 | `# @package _global_` |
| | 2 | |
| | 3 | `defaults:` |
| | 4 | `  - /experiment/ml/train_linear_virchow2_adamw_group_kfold` |
| | 5 | `  - _self_` |
| | 6 | |
| | 7 | `embedding_model_name: ProvGigaPath` |
| | 8 | `embedding_dim: 1536` |
| | 9 | `embedding_run_id: 410c8672471348ceb4c58817f70fa097` |
| | 10 | `mlflow_artifact_path: linear_classifier_provgigapath` |
| | 11 | |
| | 12 | `metadata:` |
| | 13 | `  run_name: Linear Classifier ProvGigaPath ${dataset.name} ${kfold_strategy} fold=${val_fold} opt=${model.optimizer} wd=${model.weight_decay}` |
| | 14 | `  description: "Linear probe over frozen ProvGigaPath embeddings (run ${embedding_run_id}), ${kfold_strategy} kfold metadata ${kfold_run_id}."` |
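
The training runs are parameterised by `kfold_strategy` and `val_fold`, and the final configs set `kfold_strategy: stratified_group`. The fold metadata itself comes from a separate MLflow run (`kfold_run_id`) and its construction is not part of this diff. As a rough illustration of what a stratified group split guarantees, the sketch below greedily assigns whole groups (for example slides) to folds while trying to keep per-class counts balanced. `stratified_group_folds` is a hypothetical helper and the greedy cost is an assumption; scikit-learn ships a `StratifiedGroupKFold` splitter for production use, and the diff does not show whether the pipeline relies on it.

```python
from collections import Counter, defaultdict


def stratified_group_folds(labels, groups, n_folds):
    """Greedy sketch: map each group to one fold, balancing class counts."""
    per_group = defaultdict(Counter)
    for label, group in zip(labels, groups):
        per_group[group][label] += 1

    totals = Counter(labels)
    fold_counts = [Counter() for _ in range(n_folds)]
    assignment = {}

    # Place the largest groups first; break ties by name for determinism.
    ordered = sorted(per_group, key=lambda g: (-sum(per_group[g].values()), str(g)))
    for group in ordered:
        counts = per_group[group]

        def cost(fold):
            # Penalise folds that already hold a large share of this group's classes.
            return sum(((fold_counts[fold][c] + n) / totals[c]) ** 2
                       for c, n in counts.items())

        best = min(range(n_folds), key=cost)
        assignment[group] = best
        fold_counts[best].update(counts)
    return assignment


# Toy data: one label and one slide id per tile (made up for illustration).
labels = ["tumor", "tumor", "normal", "normal", "tumor", "normal", "normal", "tumor"]
groups = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s4"]

folds = stratified_group_folds(labels, groups, n_folds=2)
tile_folds = [folds[g] for g in groups]
print(folds, tile_folds)
```

Because the assignment is made per group, all tiles of a slide share a fold, which avoids leaking a slide between the training and validation sides of a split.
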
File renamed without changes.
2 changes: 1 addition & 1 deletion
...ssifier_lbfgs_stratified_group_kfold.yaml → ...in_linear_virchow2_lbfgs_group_kfold.yaml
File renamed without changes.
File renamed without changes.
17 changes: 0 additions & 17 deletions
configs/experiment/preprocessing/embeddings_virchow2_tissue_tiles_05mpp.yaml
This file was deleted.
20 changes: 20 additions & 0 deletions
configs/experiment/preprocessing/embeddings_virchow2_tissue_tiles_0_5mpp.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,20 @@` |
| | 1 | `# @package _global_` |
| | 2 | |
| | 3 | `defaults:` |
| | 4 | `  - /experiment/preprocessing/embeddings_virchow2_0_5mpp` |
| | 5 | `  - _self_` |
| | 6 | |
| | 7 | `# Embeddings for a deterministic sampled subset of test slides whose tiles` |
| | 8 | `# intersect the tissue mask. The sample is capped by slide_sample_max_tiles and` |
| | 9 | `# selected with slide_sample_seed for doctor-review prediction maps.` |
| | 10 | `splits:` |
| | 11 | `  - test` |
| | 12 | `tile_source_run_id: ${dataset.mlflow_artifacts.tissue_stats_run_id}` |
| | 13 | `tile_source_artifact_template: "tissue_stats/{split}_tiles.parquet"` |
| | 14 | `tile_filter_column: tile_tissue_coverage` |
| | 15 | `slide_sample_max_tiles: 2000000` |
| | 16 | `slide_sample_seed: 0` |
| | 17 | |
| | 18 | `metadata:` |
| | 19 | `  run_name: "Embeddings: ${model} tissue tiles"` |
| | 20 | `  description: "Tile embeddings using ${model} over a sampled held-out test slide subset with tile_tissue_coverage > 0, capped by slide_sample_max_tiles=${slide_sample_max_tiles} and selected with slide_sample_seed=${slide_sample_seed}."` |
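
The config comment describes a deterministic slide sample capped by `slide_sample_max_tiles` and selected with `slide_sample_seed`. The selection code is not part of the visible diff, so the following is only a sketch of one way such a budgeted, seeded sample could work. `sample_slides` is a hypothetical function, and the policy of skipping slides that would exceed the budget (rather than stopping at the first one) is an assumption.

```python
import random


def sample_slides(tile_counts: dict[str, int], max_tiles: int, seed: int) -> list[str]:
    """Pick slides in a seeded random order while staying within a tile budget."""
    # Sort first so the shuffle does not depend on dict insertion order.
    slide_ids = sorted(tile_counts)
    random.Random(seed).shuffle(slide_ids)

    selected: list[str] = []
    used = 0
    for slide_id in slide_ids:
        n_tiles = tile_counts[slide_id]
        if used + n_tiles > max_tiles:
            continue  # Assumption: skip slides that would exceed the budget.
        selected.append(slide_id)
        used += n_tiles
    return selected


# Toy tile counts per slide (made up for illustration).
counts = {"slide_a": 900, "slide_b": 700, "slide_c": 400, "slide_d": 300}

picked = sample_slides(counts, max_tiles=1500, seed=0)
print(picked, sum(counts[s] for s in picked))
```

Using a dedicated `random.Random(seed)` instance keeps the selection reproducible across runs without touching the global random state.
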
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.