The Genotypes and Phenotypes in Families (GPF) system manages large databases of genetic variants and phenotypic measurements from family collections.
User documentation: see the GPF documentation at https://iossifovlab.com/gpfuserdocs/.
- Core library: DAE (Data Access Environment) in dae/
- Web application and REST API: WDAE in wdae/
- Optional genotype storages: impala_storage/, impala2_storage/, gcp_storage/
- Federation and REST client: federation/, rest_client/
- Documentation sources: docs/
Primary stack: Python 3.12, Django 5.2, dask, pandas, pyarrow, duckdb, pysam, pytest, mypy, ruff.
Release notes live in docs/changes.rst.
We recommend using a Conda/Mamba environment. All development tools (pytest, ruff, mypy) are installed via Conda, not system pip.
From the repository root:
mamba env create --name gpf --file ./environment.yml
mamba env update --name gpf --file ./dev-environment.yml
conda activate gpf
Notes:
- Prefer environment.yml over the legacy requirements.txt.
- Always activate the gpf environment before running tools or tests.
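As a hedged sketch of the note about activating the environment, a small guard function can refuse to run tools when the gpf environment is not active. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the function name require_gpf_env is hypothetical and is not part of GPF.

```shell
#!/bin/sh
# Hypothetical helper (not part of GPF): fail early unless the expected
# Conda environment is active. CONDA_DEFAULT_ENV is set by `conda activate`.
require_gpf_env() {
    expected="${1:-gpf}"   # environment name to require; defaults to gpf
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "error: run 'conda activate $expected' first" >&2
        return 1
    fi
    echo "ok: '$expected' environment is active"
}
```

One possible use is to chain it in front of a command, for example `require_gpf_env && pytest -v tests/small/`, so the command only runs when the check passes.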
Install GPF packages into the active gpf environment:
pip install -e dae
pip install -e wdae
Tip: after changing package code, re-run the editable installs if imports fail.
Quick cycles (examples):
cd dae
pytest -v tests/small/test_file.py
pytest -v tests/small/module/
Full suites (parallel):
cd dae
conda run -n gpf pytest -v -n 10 tests/
cd ../wdae
conda run -n gpf pytest -v -n 5 wdae/
Test markers and configuration are defined in dae/pytest.ini (e.g., gs_impala, gs_gcp, gs_duckdb, grr_rw, grr_ro).
ruff check --fix .
mypy dae --exclude dae/docs/
mypy wdae --exclude wdae/docs/ --exclude wdae/conftest.py
# Optional extra checks
pylint dae/dae -f parseable --reports=no -j 4
If you want to work with the federation and rest_client modules, install the additional dependencies and then the packages:
mamba env update --name gpf --file ./federation/federation-environment.yml
pip install -e rest_client
pip install -e federation
Some storages are not included in the default installation. Install their dependencies only if you plan to use or develop them.
mamba env update --name gpf --file ./impala_storage/impala-environment.yml
pip install -e impala_storage
mamba env update --name gpf --file ./impala2_storage/impala2-environment.yml
pip install -e impala2_storage
mamba env update --name gpf --file ./gcp_storage/gcp-environment.yml
pip install -e gcp_storage
Authenticate for the seqpipe-gcp-storage-testing project before running tests:
gcloud config list project
gcloud auth application-default login
Run GCP storage tests (from the gcp_storage/ directory):
pytest -v gcp_storage/tests/
Run integration tests against the GCP genotype storage definition:
pytest -v ../dae/tests/ --gsf gcp_storage/tests/gcp_storage.yaml
A git pre-commit hook for lint checking with Ruff is included. Install it from the repository root:
cp pre-commit .git/hooks/
To bypass the pre-commit hook when committing:
git commit --no-verify
- Always activate the gpf Conda environment before running commands: conda activate gpf.
- Prefer environment.yml over requirements.txt (legacy).
- If imports fail after changes, re-run pip install -e dae and/or pip install -e wdae.
- Some tests may be flaky with high parallelism; reduce -n or run without it.
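The advice about reducing -n can be sketched as a small wrapper that retries a command with progressively fewer workers. This is only an illustration: run_with_fewer_workers is a hypothetical name, not part of GPF, and the wrapper simply appends `-n <count>` to whatever command it is given, which assumes a runner such as pytest with the xdist plugin that accepts that option.

```shell
#!/bin/sh
# Hypothetical helper (not part of GPF): run a command with decreasing
# worker counts, stopping at the first count for which it succeeds.
# Usage: run_with_fewer_workers "10 4 1" pytest -v tests/
run_with_fewer_workers() {
    counts="$1"   # space-separated worker counts, highest first
    shift
    for n in $counts; do
        echo "running: $* -n $n" >&2
        # Append "-n <count>" to the given command and run it.
        if "$@" -n "$n"; then
            return 0
        fi
    done
    echo "error: command failed at every worker count ($counts)" >&2
    return 1
}
```

A test that fails for a real reason will fail at every worker count, so the wrapper ends with a non-zero status. A run that passes only at a lower count hints at a parallelism-related flake that is worth investigating.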
This project is licensed under the MIT License. See LICENSE for details.