A pipeline for testing the differential effect (DE) of gene targets on gene programs. It tests target gene → program usage effects using a Conditional Randomization Test (CRT) with optional skew-normal calibration. The core analysis pipeline lives in the src.sceptre module.
Notation:

- Cells: $i = 1,\dots,N$
- Programs: $k = 1,\dots,K$ (here $K \approx 70$)
- Covariates: $C \in \mathbb{R}^{N \times p}$ (includes an intercept column)
- cNMF usage (composition): $u_i \in \Delta^{K-1}$ with $\sum_{k=1}^K u_{ik}=1$ and $u_{ik}\ge 0$
For each target gene $g$, define the union treatment indicator $x_i \in \{0,1\}$: $x_i = 1$ if cell $i$ carries any guide targeting $g$, and $x_i = 0$ otherwise.

Because program usages are compositional (each row sums to 1), we apply a centered log-ratio (CLR) transform. After flooring and renormalization (to avoid $\log 0$), the CLR-transformed outcome is:

$$y_{ik} = \log u_{ik} - \frac{1}{K}\sum_{k'=1}^{K} \log u_{ik'}$$

Collect outcomes in $Y \in \mathbb{R}^{N \times K}$.
For each program $k$, fit the covariate-adjusted OLS regression

$$y_{ik} = \beta_k x_i + C_i^\top \gamma_k + \varepsilon_{ik},$$

where $C_i$ is the covariate row for cell $i$. Test statistic: the OLS coefficient $\hat\beta_k$.

The CRT tests the conditional-independence null $H_0: x \perp y_{\cdot k} \mid C$ by comparing $\hat\beta_k$ to its distribution under resampled treatments.
Fit a regularized logistic regression for the union indicator to obtain propensities $\hat\pi_i = \widehat{\Pr}(x_i = 1 \mid C_i)$.

Generate $B$ resampled treatment vectors $\tilde x^{(b)}$, $b = 1,\dots,B$, with $\tilde x_i^{(b)} \sim \mathrm{Bernoulli}(\hat\pi_i)$ drawn independently across cells.
Efficient Bernoulli resampling (index sampler). Instead of drawing $N \times B$ Bernoulli variables one by one:

- Draw how many resamples include cell $i$ as treated: $M_i \sim \mathrm{Binomial}(B, \hat\pi_i)$.
- Sample $M_i$ distinct resample indices uniformly without replacement: $S_i \subseteq \{1,\dots,B\}$ with $|S_i| = M_i$.
- Set $\tilde x_i^{(b)}=1$ for $b\in S_i$ and $\tilde x_i^{(b)}=0$ otherwise.

This yields exactly the same distribution as i.i.d. Bernoulli draws across cells and resamples, while only touching the nonzero entries.
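The index sampler above can be sketched in NumPy. This is a minimal illustration, not the pipeline's optimized implementation; the function name is made up, and a real implementation would keep the result sparse:

```python
import numpy as np

def sample_bernoulli_index(pi, B, rng=None):
    """Draw B Bernoulli resamples of a binary treatment vector via the
    index-sampler trick: for each cell, draw how many resamples treat it
    (Binomial(B, pi_i)), then place those treatments at uniformly chosen
    distinct resample indices. Returns a dense (N, B) 0/1 matrix."""
    rng = np.random.default_rng(rng)
    pi = np.asarray(pi, dtype=float)
    N = pi.shape[0]
    X = np.zeros((N, B), dtype=np.int8)
    M = rng.binomial(B, pi)  # M_i ~ Binomial(B, pi_i)
    for i in range(N):
        if M[i] > 0:
            S = rng.choice(B, size=M[i], replace=False)  # distinct indices
            X[i, S] = 1
    return X

# Column b of X is one resampled treatment vector x~^(b).
```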
For each resample $b$, recompute the test statistic $\tilde\beta_k^{(b)}$. In the implementation, $\hat\beta_k = \dfrac{x^\top M_C\, y_{\cdot k}}{x^\top M_C\, x}$ with $M_C = I - C(C^\top C)^{-1}C^\top$, so the residualized outcomes $M_C Y$ are computed once and reused across all resamples.

Let the empirical two-sided CRT p-value be

$$p_k = \frac{1 + \#\{b : |\tilde\beta_k^{(b)}| \ge |\hat\beta_k|\}}{1 + B}.$$
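As a minimal sketch, the empirical CRT p-value can be computed from the observed and resampled statistics with the standard +1 correction (names are illustrative; the pipeline vectorizes this across programs):

```python
import numpy as np

def crt_empirical_pvalue(beta_obs, beta_null):
    """Two-sided empirical CRT p-value with the +1 correction,
    so the smallest achievable value is 1/(B+1)."""
    beta_null = np.asarray(beta_null, dtype=float)
    B = beta_null.size
    n_extreme = np.sum(np.abs(beta_null) >= abs(beta_obs))
    return (1 + n_extreme) / (1 + B)
```

With B = 1023 resamples, the smallest possible p-value is 1/1024.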
CRT nulls can be skewed (sparse perturbations, heterogeneous covariates). We optionally smooth tails by fitting a skew-normal distribution to null z-scores.
Compute mean and standard deviation from the null draws:

$$\hat\mu = \frac{1}{B}\sum_{b=1}^{B} \tilde\beta^{(b)}, \qquad \hat\sigma = \mathrm{sd}\big(\tilde\beta^{(b)}\big)$$

Define null z-scores and the observed z-score:

$$z^{(b)} = \frac{\tilde\beta^{(b)} - \hat\mu}{\hat\sigma}, \qquad z_{\mathrm{obs}} = \frac{\hat\beta - \hat\mu}{\hat\sigma}$$

Fit a skew-normal distribution $\mathrm{SN}(\xi, \omega, \alpha)$ to $\{z^{(b)}\}$, where $\xi$ is the location, $\omega$ the scale, and $\alpha$ the shape (skewness) parameter, and let $F$ be its CDF. Then:

- Two-sided: $p = 2F(z_{\mathrm{obs}})$ or $p = 2\,(1 - F(z_{\mathrm{obs}}))$, whichever tail is selected (see the note below), capped at 1
- Right-tailed: $p = 1 - F(z_{\mathrm{obs}})$
- Left-tailed: $p = F(z_{\mathrm{obs}})$
Note: The skew-normal calibration is two-sided by default (skew_normal_side_code=0). The p-value is computed as 2 × one tail, where the tail is chosen using the empirical median of the null z-scores: use the right tail if $z_{\mathrm{obs}}$ lies above the null median, otherwise the left tail.
If the skew-normal fit is unstable or fails diagnostics, fall back to the empirical CRT p-values.
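As an illustration of this calibration, here is a minimal sketch using scipy.stats.skewnorm (the pipeline's own skew_normal module may differ in fitting details and diagnostics; the function name is made up):

```python
import numpy as np
from scipy import stats

def skew_normal_pvalue(beta_obs, beta_null):
    """Two-sided skew-normal calibrated p-value: z-score against the null
    draws, fit SN(xi, omega, alpha) to the null z-scores, then double the
    tail selected by the empirical null median."""
    beta_null = np.asarray(beta_null, dtype=float)
    mu, sd = beta_null.mean(), beta_null.std(ddof=1)
    z_null = (beta_null - mu) / sd
    z_obs = (beta_obs - mu) / sd
    a, loc, scale = stats.skewnorm.fit(z_null)  # shape, location, scale
    cdf = stats.skewnorm.cdf(z_obs, a, loc=loc, scale=scale)
    if z_obs >= np.median(z_null):
        p = 2.0 * (1.0 - cdf)   # right tail
    else:
        p = 2.0 * cdf           # left tail
    return min(p, 1.0)
```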
Because the empirical p-value includes the +1 correction, the smallest achievable value is $1/(B+1)$ (with $B = 1023$, about $9.8\times 10^{-4}$). If finer resolution in the tail is needed, increase $B$ or use the skew-normal calibration, which extrapolates beyond the empirical floor.
cNMF usages can be extremely close to zero, making $\log u_{ik}$ numerically unstable. Before the CLR transform, usages are floored at $\varepsilon$ and each row is renormalized to sum to 1.

Practical choices for $\varepsilon$:

- Quantile-based: $\varepsilon = \mathrm{quantile}(\{u_{ik}\}, q)$ with small $q$ (e.g., $10^{-4}$ to $10^{-6}$)
- Fixed: $\varepsilon = 10^{-6}$

Smaller $\varepsilon$ preserves more of the dynamic range of low usages but amplifies noise near zero.
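A minimal sketch of the flooring, renormalization, and CLR step (prepare_crt_inputs performs the pipeline's own version via its eps_quantile option; this helper is purely illustrative):

```python
import numpy as np

def clr_transform(U, eps_quantile=1e-5):
    """Floor usages at a small quantile of the positive entries, renormalize
    rows back onto the simplex, then apply the centered log-ratio transform."""
    U = np.asarray(U, dtype=float)
    eps = np.quantile(U[U > 0], eps_quantile)    # quantile-based floor
    U = np.maximum(U, eps)
    U = U / U.sum(axis=1, keepdims=True)         # renormalize rows to sum to 1
    logU = np.log(U)
    return logU - logU.mean(axis=1, keepdims=True)  # CLR: subtract row log-mean
```

CLR rows always sum to zero, which is why one program's usage can only be interpreted relative to the others.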
- Python 3
- conda or virtualenv

1. Create a virtual environment:

   It is recommended to create a dedicated environment. You can use the built-in Makefile command:

   make create_environment

   This will create a conda or virtualenv environment named programDE. Activate it before proceeding.

2. Install dependencies:

   Install the required Python packages using pip:

   pip install -r requirements.txt

3. Install the project source code:

   To make the src directory importable as a package, install it in editable mode:

   pip install -e .
prepare_crt_inputs expects the following to be present in adata. It searches
common AnnData containers (.obsm, .layers, .obsp, .uns, .obs) by key.
- Covariates: `adata.obsm["covar"]` (shape N x p; numeric, categorical, or object). A `DataFrame` or `ndarray` is fine. Count-based covariates should already be log-transformed. Columns are z-scored and an intercept is added by the pipeline. If `covar` is a `DataFrame` with categorical/string/bool/mixed columns, they are one-hot encoded automatically (dropping one level by default to avoid collinearity). Numeric columns with a small number of unique values (<= 20 by default) are also treated as categorical and one-hot encoded. You can override the numeric threshold via `prepare_crt_inputs(..., numeric_as_category_threshold=...)` or set `numeric_as_category_threshold=None` to disable this heuristic. If you want stratified-permutation CRT with batch stratification, include a `batch` column (or choose another `batch_key`) in this DataFrame.
- cNMF usage: `adata.obsm["cnmf_usage"]` (shape N x K, numeric). Each row should sum to 1 (the CLR step will floor and renormalize).
- Guide assignment: `adata.obsm["guide_assignment"]` (shape N x G). Can be dense or sparse; nonzero means guide present.
- Guide → gene mapping: `adata.uns["guide2gene"]` (dict-like). Keys are guide IDs, values are gene names.
- Guide names (if `guide_assignment` is not a DataFrame with columns): `adata.uns["guide_names"]`, a list of guide IDs in column order.
- Program names (optional): `adata.uns["program_names"]`, a list of length K.
All matrices must have the same number of rows (N = number of cells).
The core functionality is the SCEPTRE-style union CRT. The recommended starting point is skew-normal calibrated p-values:
from src.sceptre import (
prepare_crt_inputs,
run_all_genes_union_crt,
store_results_in_adata,
limit_threading,
)
# It is recommended to limit BLAS threads for reproducibility
limit_threading()
inputs = prepare_crt_inputs(
adata=adata,
usage_key="cnmf_usage",
covar_key="covar",
guide_assignment_key="guide_assignment",
guide2gene_key="guide2gene",
)
out = run_all_genes_union_crt(
inputs=inputs,
B=1023, # Number of CRT resamples
n_jobs=16, # Number of parallel jobs
calibrate_skew_normal=True,
return_skew_normal=True,
)
# Store results back into the AnnData object
store_results_in_adata(
adata=adata,
pvals_df=out["pvals_df"],
betas_df=out["betas_df"],
treated_df=out["treated_df"],
)

Optional: switch to the stratified-permutation CRT sampler (S-CRT):
out = run_all_genes_union_crt(
inputs=inputs,
B=1023,
n_jobs=16,
resampling_method="stratified_perm",
resampling_kwargs=dict(
n_bins=20,
stratify_by_batch=True,
batch_key="batch",
min_stratum_size=2,
),
calibrate_skew_normal=False,
return_raw_pvals=True,
)

Rank-based U-test statistic (S-CRT U-test):
out = run_all_genes_union_crt(
inputs=inputs,
B=1023,
n_jobs=16,
resampling_method="stratified_perm",
resampling_kwargs=dict(
n_bins=20,
stratify_by_batch=True,
batch_key="batch",
min_stratum_size=2,
),
test_stat="utest",
test_stat_kwargs=dict(use="clr", rank_method="average"),
calibrate_skew_normal=False,
)
# Rank-biserial effect sizes live in out["betas_df"];
# out["stats_df"] is provided as an explicit alias when test_stat="utest".
# To rank raw usage instead of CLR, use test_stat_kwargs={"use": "usage"}.
# This uses the same eps_quantile flooring/renormalization from prepare_crt_inputs.

Batch stratification uses the raw covariate DataFrame (before one-hot encoding). If your covariates were provided as an ndarray, stratify_by_batch is ignored.
This mode uses NTC pseudo-genes as the null distribution and matches on:

- n1: treated count (union)
- d: OLS denominator $x^\top M_C x$ (computed from the same CLR-OLS statistic)
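To illustrate the idea of matching on (n1, d), here is a generic quantile-binning sketch. This is not the pipeline's internal matching code; the helper name and arrays are hypothetical:

```python
import numpy as np

def quantile_bins(values, n_bins):
    """Assign each value to one of up to n_bins quantile bins."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, np.unique(edges))

# Each real gene is compared to NTC units in the same (n1-bin, d-bin) cell:
n1 = np.array([30, 500, 45, 520, 33])       # treated counts per unit
d = np.array([12.0, 80.0, 15.0, 77.0, 13.5])  # OLS denominators per unit
keys = list(zip(quantile_bins(n1, 3), quantile_bins(d, 3)))
```

Units with similar treated counts and denominators land in the same bin, so their null statistics are comparable.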
Workflow (don't skip steps):

1. Ensure `adata.obsm["covar"]` is a DataFrame with a `batch` column.
2. Call `prepare_crt_inputs(...)` (captures raw covariates for batch meta-analysis).
3. Choose NTC labels that exist in `adata.uns["guide2gene"]` values.
4. Run `run_all_genes_union_crt(..., null_method="ntc_empirical")`.
5. Use `out["pvals_df"]` as the empirical-null p-values; inspect `out["ntc_matching_info"]` for bin sizes and fallbacks.
End‑to‑end example:
from src.sceptre import prepare_crt_inputs, run_all_genes_union_crt
# 1) Ensure covariates are a DataFrame with batch
covar_df = adata.obsm["covar"].copy()
if "batch" not in covar_df.columns:
raise ValueError("covar_df must contain a 'batch' column for ntc_empirical.")
adata.obsm["covar"] = covar_df
# 2) Prepare inputs
inputs = prepare_crt_inputs(adata=adata, usage_key="cnmf_usage", covar_key="covar")
# 3) Run NTC empirical‑null CRT
out = run_all_genes_union_crt(
inputs=inputs,
null_method="ntc_empirical",
null_kwargs=dict(
ntc_labels=["SAFE", "non-targeting", "NTC"],
guides_per_unit=6,
n_ntc_units=5000,
batch_mode="meta", # per-batch pvals + Fisher combine
combine_method="fisher",
matching=dict(
n_n1_bins=10,
n_d_bins=10,
min_ntc_per_bin=50,
use_pbar=True, # optional: add pbar matching
n_pbar_bins=8,
),
min_treated=10,
min_control=10,
),
)
pvals_ntc = out["pvals_df"]
betas = out["betas_df"]
matching_info = out.get("ntc_matching_info", {})

These QQ diagnostics cross-fit the NTC null: split NTC pseudo-genes into A/B, use A to define the null, then evaluate calibration using B vs A. This avoids the overly optimistic "in-sample" QQ that happens if you compare NTC_A to itself.
Recommended usage (gene-level QQ is the primary diagnostic):
from src.sceptre import make_ntc_empirical_qq_plots, run_all_genes_union_crt
out = run_all_genes_union_crt(
inputs=inputs,
null_method="ntc_empirical",
qq_crossfit=True,
null_kwargs=dict(
ntc_labels=["SAFE", "non-targeting", "NTC"],
guides_per_unit=6,
n_ntc_units=5000,
batch_mode="meta",
combine_method="fisher",
matching=dict(n_n1_bins=10, n_d_bins=10, min_ntc_per_bin=50),
min_treated=10,
min_control=10,
qq_crossfit_seed=11,
qq_crossfit_frac_A=0.5,
),
)
make_ntc_empirical_qq_plots(
out=out,
out_dir="results/qq_ntc",
programs_to_plot="top_var",
n_programs=6,
make_per_batch=True,
make_meta=True,
envelope_boot=200,
seed=0,
)

Step-by-step workflow (including what to do after plotting):
1. Prepare inputs: `adata.obsm["covar"]` must be a DataFrame with a `batch` column, then run `prepare_crt_inputs(...)`.
2. Run NTC empirical-null with cross-fit: set `null_method="ntc_empirical"` and `qq_crossfit=True` so `out["ntc_crossfit"]` is produced.
3. Generate QQ plots: call `make_ntc_empirical_qq_plots(...)` as shown above.
4. Inspect outputs on disk (the function saves PNGs; it does not display them):
from pathlib import Path
print(sorted(Path("results/qq_ntc").glob("**/*.png")))

Expected files:
- Meta (combined across batches):
  - results/qq_ntc/qq_ntc_empirical_genelevel_meta.png
  - results/qq_ntc/qq_ntc_empirical_program_<program>_meta.png
- Per-batch (if `make_per_batch=True`):
  - results/qq_ntc/per_batch/<batch>/qq_ntc_empirical_genelevel.png
  - results/qq_ntc/per_batch/<batch>/qq_ntc_empirical_program_<program>.png
If no images appear, verify:

- `qq_crossfit=True` and `null_method="ntc_empirical"` were both set
- `out["ntc_crossfit"]` is present (it's required by the plotting function)
- `programs_to_plot` matches program names in `out["betas_df"].columns`
Notes:

- The holdout calibration curve is NTC_B vs NTC_A (with a bootstrap band).
- The "Real genes vs A" curve should be interpreted relative to that holdout band.
- Program-level QQ uses a subset of programs (default: top variance in `betas_df`).
- Enabling `use_pbar` adds a propensity fit per unit, so it increases runtime.
Use this checklist to avoid missing any steps:

1. Ensure `adata.obsm["covar"]` is a DataFrame with a `batch` column (or your chosen `batch_key`).
2. (Optional) Add a burden column to `covar` (e.g., log1p non-NTC guides per cell).
3. Call `prepare_crt_inputs(...)` (captures raw covariates for stratification).
4. Define `resampling_kwargs` and reuse them everywhere (main CRT, NTC groups, CRT-null pvals).
5. Run `run_all_genes_union_crt(..., resampling_method="stratified_perm")`.
6. Build NTC guide groups and compute grouped NTC p-values with the same sampler options.
7. Compute CRT-null p-values for the same NTC units (grouped guides).
8. Plot the QQ curve using the NTC p-values + CRT-null p-values.
End-to-end example:
from src.sceptre import (
add_burden_covariate,
build_ntc_group_inputs,
compute_ntc_group_null_pvals_parallel,
crt_pvals_for_ntc_groups_ensemble,
crt_pvals_for_ntc_groups_ensemble_skew,
make_ntc_groups_ensemble,
prepare_crt_inputs,
run_all_genes_union_crt,
)
from src.visualization import qq_plot_ntc_pvals
# 1) Make covariates a DataFrame with batch
covar_df = adata.obsm["covar"].copy()
if "batch" not in covar_df.columns:
raise ValueError("covar_df must contain a 'batch' column for stratification.")
adata.obsm["covar"] = covar_df
# 2) Optional burden column (exclude NTC guides if desired)
add_burden_covariate(
adata=adata,
guide_assignment_key="guide_assignment",
covar_key="covar",
guide_names_key="guide_names",
guide2gene_key="guide2gene",
burden_key="log1p_non_ntc_guides_per_cell",
ntc_labels=["non-targeting", "safe-targeting", "NTC"],
include_ntc=False,
use_log1p=True,
)
# 3) Prepare CRT inputs
inputs = prepare_crt_inputs(adata=adata, covar_key="covar", batch_key="batch")
# 4) Shared S-CRT settings
resampling_kwargs = dict(
n_bins=20,
stratify_by_batch=True,
batch_key="batch",
min_stratum_size=2,
# Optional burden stratification:
# burden_key="log1p_non_ntc_guides_per_cell",
# n_burden_bins=8,
# burden_bin_method="quantile",
)
# 5) Run gene-level CRT with stratified permutation
out = run_all_genes_union_crt(
inputs=inputs,
B=1023,
n_jobs=16,
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
calibrate_skew_normal=True,
return_raw_pvals=True,
return_skew_normal=True,
)
# 6) Build NTC guide groups (6-guide units) + compute NTC p-values
ntc_labels = ["non-targeting", "safe-targeting", "NTC"]
ntc_guides, guide_freq, guide_to_bin, real_sigs = build_ntc_group_inputs(
inputs=inputs,
ntc_label=ntc_labels,
group_size=6,
n_bins=10,
)
ntc_groups_ens = make_ntc_groups_ensemble(
ntc_guides=ntc_guides,
ntc_freq=guide_freq,
real_gene_bin_sigs=real_sigs,
guide_to_bin=guide_to_bin,
n_ensemble=10,
seed0=7,
group_size=6,
max_groups=None,
)
ntc_group_pvals_ens = crt_pvals_for_ntc_groups_ensemble(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
seed0=23,
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
# If you ran the main CRT with test_stat="utest", pass the same here.
# test_stat="utest",
# test_stat_kwargs={"use": "clr", "rank_method": "average"},
)
ntc_group_pvals_skew_ens = crt_pvals_for_ntc_groups_ensemble_skew(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
seed0=23,
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
)
# 7) CRT-null p-values matched to the same NTC units
null_pvals = compute_ntc_group_null_pvals_parallel(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
n_jobs=8,
backend="threading",
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
# Keep test_stat/test_stat_kwargs in sync with your main run if needed.
# test_stat="utest",
# test_stat_kwargs={"use": "clr", "rank_method": "average"},
)
# 8) QQ plot
ax = qq_plot_ntc_pvals(
pvals_raw_df=out["pvals_raw_df"],
guide2gene=adata.uns["guide2gene"],
ntc_genes=ntc_labels,
pvals_skew_df=out["pvals_df"],
null_pvals=null_pvals,
ntc_group_pvals_ens=ntc_group_pvals_ens,
ntc_group_pvals_skew_ens=ntc_group_pvals_skew_ens,
show_ntc_ensemble_band=True,
show_all_pvals=True,
title="QQ plot: S-CRT (grouped NTC controls)",
)

run_all_genes_union_crt always returns pvals_df as the main p-value output. What it contains depends on calibrate_skew_normal:

- `calibrate_skew_normal=False`: `pvals_df` = raw CRT p-values.
- `calibrate_skew_normal=True`: `pvals_df` = skew-normal calibrated p-values.

Optional outputs (only when requested):

- `pvals_raw_df` (if `return_raw_pvals=True`): raw CRT p-values.
- `pvals_skew_df` (if `return_skew_normal=True`): skew-normal p-values. When `calibrate_skew_normal=True`, this matches `pvals_df`.
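Downstream, you will typically apply multiple-testing correction across all gene × program p-values. A self-contained Benjamini-Hochberg sketch over pvals_df (the helper name is illustrative; statsmodels' multipletests with method="fdr_bh" is an equivalent alternative):

```python
import numpy as np
import pandas as pd

def bh_adjust(pvals_df):
    """Benjamini-Hochberg FDR adjustment pooled across all gene x program
    tests; returns q-values with the same index/columns as pvals_df."""
    p = pvals_df.to_numpy(dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / (np.arange(m) + 1)       # p_(i) * m / i
    # enforce monotone q-values, working down from the largest p-value
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return pd.DataFrame(q.reshape(pvals_df.shape),
                        index=pvals_df.index, columns=pvals_df.columns)
```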
Use this to sanity-check calibration of p-values for negative-control genes. The dashed null curve is built from CRT-null p-values (leave-one-out), so you must provide null_pvals directly or pass null_stats and let the helper compute them.
Required inputs:

- `pvals_raw_df`: observed raw CRT p-values (genes × programs).
- `guide2gene`: guide → gene mapping.
- `ntc_genes`: negative-control gene labels (must exist in `guide2gene` values).
- `null_pvals` or `null_stats`.
Optional inputs:

- `pvals_skew_df`: skew-calibrated p-values (for comparison).
- `ntc_group_pvals_ens`: grouped NTC control p-values to match real genes, which have several guides per gene (recommended; see below).
- `ntc_group_pvals_skew_ens`: grouped skew p-values (required when comparing skew).
- `show_null_skew=True`: add a second null curve drawn from a skew-normal fit to `null_stats`.
- `show_all_pvals=True`: scatter all observed p-values from `pvals_raw_df`.
Terminology:

- `null_stats`: raw CRT null test statistics (e.g., `beta_null` from resamples).
- `null_pvals`: leave-one-out CRT-null p-values computed from `null_stats`.
Curve notes:
Important: the null curve should be built from the same unit as the observed NTC curve. If you plot grouped NTC controls (6‑guide units), compute CRT‑null p-values for those same guide groups (not the whole NTC union into one giant gene). When you use grouped NTC controls, concatenate p-values from all groups across all ensemble partitions for both raw and skew curves. One ensemble replicate = one random partition of NTC guides into groups while respecting the guide frequency signature of a real gene.
If you want all groups generated within each ensemble replicate (no truncation), use max_groups=None when building the ensemble and pass the full grouped outputs into qq_plot_ntc_pvals.
- NTC raw curve (grouped): QQ of all grouped raw p-values pooled across all ensembles.
- NTC skew curve (grouped): QQ of all grouped skew p-values pooled across all ensembles.
- NTC ensemble IQR band: variability across ensembles (median + IQR of per-ensemble QQ quantiles), not the curve itself.
- 95% CI band: theoretical uniform QQ band using `m_null` (the number of null p-values), not fit to NTC or skew curves.
- Each curve uses its own `m` (number of p-values) when forming expected quantiles.
- By default, NTC curves include all programs; slice DataFrames first if you want a subset.
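The expected-quantile convention above (each curve uses its own m) can be sketched as follows; the helper name is illustrative, not the plotting module's API:

```python
import numpy as np

def qq_points(pvals):
    """Observed vs expected -log10 p-values for a uniform QQ plot.
    Expected quantiles use the curve's own m via plotting positions
    (i - 0.5) / m for i = 1..m."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    expected = (np.arange(1, m + 1) - 0.5) / m
    return -np.log10(expected), -np.log10(p)
```

Plotting the two returned arrays against each other (smallest p-values last) gives the standard QQ curve; a well-calibrated null tracks the diagonal.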
Example (raw vs skew, grouped NTC controls, CRT-null curve):
from src.sceptre import (
build_ntc_group_inputs,
compute_ntc_group_null_pvals_parallel,
crt_pvals_for_ntc_groups_ensemble,
crt_pvals_for_ntc_groups_ensemble_skew,
make_ntc_groups_ensemble,
run_all_genes_union_crt,
)
from src.visualization import qq_plot_ntc_pvals
out = run_all_genes_union_crt(
inputs=inputs,
B=1023,
n_jobs=16,
calibrate_skew_normal=True,
return_raw_pvals=True,
return_skew_normal=True,
)
ntc_labels = ["non-targeting", "safe-targeting"]
# Identify NTC guides and build guide-frequency bins / real-gene signatures
ntc_guides, guide_freq, guide_to_bin, real_sigs = build_ntc_group_inputs(
inputs=inputs,
ntc_label=ntc_labels,
group_size=6,
n_bins=10,
)
# Create multiple random partitions (ensembles) of NTC guides into 6-guide groups
ntc_groups_ens = make_ntc_groups_ensemble(
ntc_guides=ntc_guides,
ntc_freq=guide_freq,
real_gene_bin_sigs=real_sigs,
guide_to_bin=guide_to_bin,
n_ensemble=10,
seed0=7,
group_size=6,
max_groups=None,
)
# Compute raw CRT p-values for each NTC group in each ensemble
ntc_group_pvals_ens = crt_pvals_for_ntc_groups_ensemble(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
seed0=23,
)
# Compute skew-calibrated CRT p-values for each NTC group in each ensemble
ntc_group_pvals_skew_ens = crt_pvals_for_ntc_groups_ensemble_skew(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
seed0=23,
)
# Build CRT-null p-values matched to NTC group units (recommended)
null_pvals = compute_ntc_group_null_pvals_parallel(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
n_jobs=8,
backend="threading",
)
ax = qq_plot_ntc_pvals(
pvals_raw_df=out["pvals_raw_df"],
guide2gene=adata.uns["guide2gene"],
ntc_genes=ntc_labels,
pvals_skew_df=out["pvals_df"],
null_pvals=null_pvals,
ntc_group_pvals_ens=ntc_group_pvals_ens,
ntc_group_pvals_skew_ens=ntc_group_pvals_skew_ens,
show_ntc_ensemble_band=True,
show_all_pvals=True,
title="QQ plot: grouped NTC controls (raw vs skew) vs CRT null",
)
import matplotlib.pyplot as plt
plt.show()

Stratified-permutation variant (S-CRT) for the QQ plot:
resampling_kwargs = dict(
n_bins=20,
stratify_by_batch=True,
batch_key="batch",
min_stratum_size=2,
burden_key="log1p_non_ntc_guides_per_cell",
n_burden_bins=8,
burden_bin_method="quantile",
burden_clip_quantiles=(0.0, 1.0),
)
out = run_all_genes_union_crt(
inputs=inputs,
B=1023,
n_jobs=16,
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
calibrate_skew_normal=True,
return_raw_pvals=True,
return_skew_normal=True,
)
ntc_group_pvals_ens = crt_pvals_for_ntc_groups_ensemble(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
seed0=23,
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
# If you ran the main CRT with test_stat="utest", pass the same here.
# test_stat="utest",
# test_stat_kwargs={"use": "clr", "rank_method": "average"},
)
ntc_group_pvals_skew_ens = crt_pvals_for_ntc_groups_ensemble_skew(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
seed0=23,
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
)
null_pvals = compute_ntc_group_null_pvals_parallel(
inputs=inputs,
ntc_groups_ens=ntc_groups_ens,
B=1023,
n_jobs=8,
backend="threading",
resampling_method="stratified_perm",
resampling_kwargs=resampling_kwargs,
# Keep test_stat/test_stat_kwargs in sync with your main run if needed.
# test_stat="utest",
# test_stat_kwargs={"use": "clr", "rank_method": "average"},
)

Burden helper (compute and store a recommended burden column):
from src.sceptre import add_burden_covariate
add_burden_covariate(
adata=adata,
guide_assignment_key="guide_assignment",
covar_key="covar",
guide_names_key="guide_names",
guide2gene_key="guide2gene",
burden_key="log1p_non_ntc_guides_per_cell",
ntc_labels=["non-targeting", "safe-targeting", "NTC"],
include_ntc=False,
count_nonzero=True,
use_log1p=True,
)

Use the synthetic generator for large-scale diagnostics:
from tests.synthetic_data import make_sceptre_style_synth
from src.sceptre import prepare_crt_inputs
adata = make_sceptre_style_synth(
N=10000,
K=20,
n_target_genes=200,
guides_per_gene=6,
ntc_frac_guides=0.15,
moi_mean=5.5,
frac_causal_genes=0.10,
n_effect_programs=3,
effect_size=0.6,
confound_strength=0.0,
seed=0,
)
# The generator stores usage in adata.obsm["usage"].
# Pass usage_key="usage" when preparing inputs.
inputs = prepare_crt_inputs(adata=adata, usage_key="usage")

Note on scale: this generator is designed for large synthetic cohorts (e.g., tens of thousands of cells). For quick smoke tests, reduce N, K, and n_target_genes to keep runtime and memory low.
This project uses a Makefile to streamline common tasks.
- `make requirements`: Install Python dependencies.
- `make data`: Run the data processing pipeline.
- `make lint`: Lint the source code using `flake8`.
- `make clean`: Remove compiled Python files.
Run the mock in-memory smoke test:
python3 scripts/test_mock_crt.py

Run the full test suite (excluding performance):

python3 -m pytest -q -m "not performance"

Run performance tests explicitly:

python3 -m pytest -q -m performance

Project organization:

├── Makefile # Makefile with useful commands
├── README.md # Project README
├── requirements.txt # Python package requirements
├── setup.py # Makes the project pip-installable
├── scripts # Utility scripts
│ └── test_mock_crt.py
├── src # Project source code
│ ├── data # Scripts to download or generate data
│ ├── features # Scripts to generate features
│ ├── models # Scripts for model training and prediction
│ ├── sceptre # Core CRT analysis pipeline
│ │ ├── __init__.py
│ │ ├── adata_utils.py
│ │ ├── crt.py
│ │ ├── pipeline.py
│ │ ├── pipeline_helpers.py
│ │ ├── propensity.py
│ │ ├── shared_low_level_functions.cpp
│ │ └── skew_normal.py
│ └── visualization # Scripts for plotting and visualization
│     ├── __init__.py
│     └── qq_plot.py
├── data
│ ├── external # Data from third-party sources
│ ├── interim # Intermediate transformed data
│ ├── processed # Final, canonical datasets
│ └── raw # Original, immutable data
├── docs # Project documentation
├── models # Trained models
├── notebooks # Jupyter notebooks for exploration
└── reports # Generated analysis reports and figures