diff --git a/.gitignore b/.gitignore index 583b596..4dfc8f2 100644 --- a/.gitignore +++ b/.gitignore @@ -136,6 +136,13 @@ uv.lock # Quarto docs/_site/ - -# created by quartodoc -docs/api \ No newline at end of file +docs/.quarto/ +docs/**/*.quarto_ipynb* +docs/api/*.qmd +!docs/api/index.qmd +!docs/api/_metadata.yml +# Generated from notebooks/ by docs/scripts/generate_examples_from_notebooks.py +docs/examples/*.ipynb +docs/examples/*_files/ + +# created by quartodoc \ No newline at end of file diff --git a/Makefile b/Makefile index 460f4cf..2127f69 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ LIB = src/tsod -.PHONY: check build lint format test coverage docs clean +.PHONY: check build lint format test coverage docs examples clean check: lint test @@ -19,13 +19,15 @@ test: coverage: uv run pytest --cov-report html --cov=$(LIB) tests/ +examples: + $(MAKE) -C docs examples + docs: - cd docs && uv run quartodoc build - uv run quarto render docs + $(MAKE) -C docs build clean: rm -rf .pytest_cache rm -rf .mypy_cache rm -rf .coverage rm -rf dist - rm -rf docs/_build + $(MAKE) -C docs clean diff --git a/docs/.gitignore b/docs/.gitignore deleted file mode 100644 index ad29309..0000000 --- a/docs/.gitignore +++ /dev/null @@ -1,2 +0,0 @@ -/.quarto/ -**/*.quarto_ipynb diff --git a/docs/Makefile b/docs/Makefile index 7d837ac..04ea590 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -1,7 +1,7 @@ # Minimal makefile for Quarto documentation # -.PHONY: help api build preview clean +.PHONY: help api examples build preview clean help: @echo "Please use 'make ' where is one of:" @@ -13,11 +13,17 @@ help: api: uv run quartodoc build -build: api +examples: + uv run python scripts/generate_examples_from_notebooks.py + +build: api examples uv run quarto render -preview: api +preview: api examples uv run quarto preview clean: - rm -rf _site api objects.json + rm -rf _site .quarto objects.json + find api -name "*.qmd" ! 
-name "index.qmd" -delete + rm -f examples/*.ipynb + rm -rf examples/*_files diff --git a/docs/_quarto.yml b/docs/_quarto.yml index 34bab29..735ff17 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -2,13 +2,15 @@ project: type: website website: - title: "tsod" + title: "" page-footer: "© 2025 DHI Group" repo-url: https://github.com/DHI/tsod repo-actions: [edit] repo-subdir: docs + page-navigation: true navbar: + logo: https://raw.githubusercontent.com/DHI/tsod/main/images/logo/tsod.png tools: - icon: github menu: @@ -17,13 +19,30 @@ website: - text: Report a Bug url: https://github.com/DHI/tsod/issues left: - - href: index.qmd - text: Home - - href: getting-started.qmd - text: Getting Started - - href: design.qmd - - href: api/index.qmd - text: API Reference + - text: Home + href: index.qmd + - text: User Guide + href: user-guide/getting-started.qmd + - text: Examples + href: examples/index.qmd + - text: API Reference + href: api/index.qmd + + sidebar: + - title: "User Guide" + style: docked + contents: + - user-guide/getting-started.qmd + - user-guide/design.qmd + - title: "Examples" + style: docked + contents: + - examples/index.qmd + # BEGIN_GENERATED_EXAMPLES — managed by docs/scripts/generate_examples_from_notebooks.py + - examples/Getting started.ipynb + - examples/Example Water Level.ipynb + - examples/Detect on DataFrames.ipynb + # END_GENERATED_EXAMPLES filters: - interlinks @@ -63,6 +82,7 @@ quartodoc: format: html: theme: cosmo + css: custom.css toc: true - ipynb: - toc: true + # ipynb: + # toc: true diff --git a/docs/api/index.qmd b/docs/api/index.qmd new file mode 100644 index 0000000..0857ce4 --- /dev/null +++ b/docs/api/index.qmd @@ -0,0 +1,17 @@ +# API Reference {.doc .doc-index} + +## tsod + + + +| | | +| --- | --- | +| [RangeDetector](RangeDetector.qmd#tsod.RangeDetector) | Detect values outside range. 
| +| [ConstantValueDetector](ConstantValueDetector.qmd#tsod.ConstantValueDetector) | Detect contiguous periods of constant values within a configurable time window. | +| [ConstantGradientDetector](ConstantGradientDetector.qmd#tsod.ConstantGradientDetector) | Detect constant gradients. | +| [GradientDetector](GradientDetector.qmd#tsod.GradientDetector) | Detect abrupt changes in time series data. | +| [DiffDetector](DiffDetector.qmd#tsod.DiffDetector) | Detect sudden shifts in data, irrespective of time axis. | +| [RollingStandardDeviationDetector](RollingStandardDeviationDetector.qmd#tsod.RollingStandardDeviationDetector) | Detect large variations. | +| [CombinedDetector](CombinedDetector.qmd#tsod.CombinedDetector) | Combine detectors. | +| [HampelDetector](HampelDetector.qmd#tsod.HampelDetector) | Hampel filter implementation that works on numpy arrays, implemented with numba. | +| [load](load.qmd#tsod.load) | Load a saved model from disk saved with `Detector.save` | \ No newline at end of file diff --git a/docs/custom.css b/docs/custom.css new file mode 100644 index 0000000..4470ead --- /dev/null +++ b/docs/custom.css @@ -0,0 +1,7 @@ +#quarto-content.page-layout-full main.content.column-body > #title-block-header + p { + display: none; +} + +#quarto-content.page-layout-full main.content.column-body { + max-width: min(1600px, calc(100vw - 4rem)); +} \ No newline at end of file diff --git a/docs/examples/index.qmd b/docs/examples/index.qmd new file mode 100644 index 0000000..76c938c --- /dev/null +++ b/docs/examples/index.qmd @@ -0,0 +1,17 @@ +--- +title: Examples +page-layout: full +toc: false +--- + +# Examples + +This page is auto-generated from notebooks in `notebooks/`. + +## Available notebook examples + +- [Getting started](Getting%20started.ipynb) +- [Example Water Level](Example%20Water%20Level.ipynb) +- [Detect on DataFrames](Detect%20on%20DataFrames.ipynb) + +Regenerate with `make examples`. 
diff --git a/docs/index.qmd b/docs/index.qmd index 5577cf2..9806e30 100644 --- a/docs/index.qmd +++ b/docs/index.qmd @@ -20,12 +20,13 @@ format-links: false Install **tsod** with [`pip`](https://pypi.org/project/tsod/) and get up and running in minutes - +[**Getting started**](user-guide/getting-started.qmd) ## {{< fa brands python >}} **It's just Python** Use familiar Python workflows to integrate anomaly detection into your models and pipelines +[**API Reference**](api/index.qmd) ::: @@ -40,6 +41,7 @@ Choose from detectors like `RangeDetector` and `ConstantValueDetector` to identi **tsod** is licensed under MIT and the source code is available on [GitHub](https://github.com/DHI/tsod) +[**Design philosophy**](user-guide/design.qmd) ::: diff --git a/docs/scripts/generate_examples_from_notebooks.py b/docs/scripts/generate_examples_from_notebooks.py new file mode 100644 index 0000000..11c9591 --- /dev/null +++ b/docs/scripts/generate_examples_from_notebooks.py @@ -0,0 +1,174 @@ +import json +import re +from pathlib import Path +from urllib.parse import quote + +REPO_ROOT = Path(__file__).resolve().parents[2] +NOTEBOOKS_DIR = REPO_ROOT / "notebooks" +EXAMPLES_DIR = REPO_ROOT / "docs" / "examples" +QUARTO_YML = REPO_ROOT / "docs" / "_quarto.yml" + +_SIDEBAR_BEGIN = " # BEGIN_GENERATED_EXAMPLES — managed by docs/scripts/generate_examples_from_notebooks.py" +_SIDEBAR_END = " # END_GENERATED_EXAMPLES" +_EXAMPLE_ORDER = { + "Getting started.ipynb": 0, + "Example Water Level.ipynb": 1, + "Detect on DataFrames.ipynb": 2, +} + + +def sort_entries(entries: list[tuple[str, str]]) -> list[tuple[str, str]]: + """Keep generated example pages in a stable, user-defined order.""" + return sorted( + entries, + key=lambda item: (_EXAMPLE_ORDER.get(item[1], len(_EXAMPLE_ORDER)), item[0].lower()), + ) + + +def title_from_notebook(notebook: dict, fallback: str) -> str: + metadata_title = notebook.get("metadata", {}).get("title") + if isinstance(metadata_title, str) and 
metadata_title.strip(): + return metadata_title.strip() + return fallback + + +def rewrite_notebook_relative_paths(source: str) -> str: + """Rewrite paths that are valid in notebooks/ to paths valid in docs/examples/.""" + return source.replace("../tests/", "../../tests/") + + +def rewrite_cell_source_paths(notebook: dict) -> None: + """Rewrite relative paths in markdown and code cell sources.""" + for cell in notebook.get("cells", []): + source = cell.get("source") + if isinstance(source, str): + cell["source"] = rewrite_notebook_relative_paths(source) + continue + if isinstance(source, list): + cell["source"] = [rewrite_notebook_relative_paths(line) for line in source] + + +def notebook_front_matter_source(title: str, notebook_name: str) -> str: + lines = [ + "---", + f"title: {title}", + f"description: Auto-generated from notebooks/{notebook_name}", + "jupyter: tsod", + "page-layout: full", + "---", + "", + "", + ] + return "\n".join(lines) + "\n" + + +def apply_front_matter_cell(notebook: dict, title: str, notebook_name: str) -> None: + source = notebook_front_matter_source(title=title, notebook_name=notebook_name) + front_matter_cell = { + "cell_type": "markdown", + "metadata": {"language": "markdown", "tags": ["remove-cell"]}, + "source": source, + } + + cells = notebook.setdefault("cells", []) + if not cells: + cells.append(front_matter_cell) + return + + first_cell = cells[0] + first_source = first_cell.get("source") + lines = first_source if isinstance(first_source, list) else [str(first_source or "")] + first_line = lines[0].strip() if lines else "" + + if first_cell.get("cell_type") == "markdown" and first_line == "---": + first_cell["source"] = source + return + + cells.insert(0, front_matter_cell) + + +def copy_notebook_to_examples(notebook_path: Path) -> tuple[str, str]: + notebook = json.loads(notebook_path.read_text(encoding="utf-8")) + stem = notebook_path.stem + title = title_from_notebook(notebook, fallback=stem) + ipynb_path = EXAMPLES_DIR / 
notebook_path.name + + rewrite_cell_source_paths(notebook) + apply_front_matter_cell(notebook, title=title, notebook_name=notebook_path.name) + + ipynb_path.write_text(json.dumps(notebook, indent=2, ensure_ascii=False) + "\n", encoding="utf-8") + return title, ipynb_path.name + + +def write_index(entries: list[tuple[str, str]]) -> None: + index_lines = [ + "---", + "title: Examples", + "page-layout: full", + "toc: false", + "---", + "", + "# Examples", + "", + "This page is auto-generated from notebooks in `notebooks/`.", + "", + "## Available notebook examples", + "", + ] + + for title, rel_path in sort_entries(entries): + encoded_path = quote(rel_path, safe="/") + index_lines.append(f"- [{title}]({encoded_path})") + + index_lines.append("") + index_lines.append("Regenerate with `make examples`.") + + (EXAMPLES_DIR / "index.qmd").write_text("\n".join(index_lines) + "\n", encoding="utf-8") + + +def update_quarto_sidebar(entries: list[tuple[str, str]]) -> None: + """Keep the Examples sidebar in _quarto.yml in sync with generated notebooks.""" + content = QUARTO_YML.read_text(encoding="utf-8") + + lines = [_SIDEBAR_BEGIN] + for _title, ipynb_name in sort_entries(entries): + lines.append(f" - examples/{ipynb_name}") + lines.append(_SIDEBAR_END) + new_block = "\n".join(lines) + + updated = re.sub( + re.escape(_SIDEBAR_BEGIN) + r".*?" + re.escape(_SIDEBAR_END), + new_block, + content, + flags=re.DOTALL, + ) + QUARTO_YML.write_text(updated, encoding="utf-8") + + +def main() -> None: + EXAMPLES_DIR.mkdir(parents=True, exist_ok=True) + + # Remove previously generated files to avoid stale pages. 
+ for ipynb_file in EXAMPLES_DIR.glob("*.ipynb"): + ipynb_file.unlink() + + for qmd_file in EXAMPLES_DIR.glob("*.qmd"): + if qmd_file.name != "index.qmd": + qmd_file.unlink() + + for quarto_ipynb in EXAMPLES_DIR.glob("*.quarto_ipynb"): + quarto_ipynb.unlink() + + notebook_files = sorted(NOTEBOOKS_DIR.glob("*.ipynb")) + entries: list[tuple[str, str]] = [] + + for notebook_path in notebook_files: + title, ipynb_name = copy_notebook_to_examples(notebook_path) + entries.append((title, ipynb_name)) + + write_index(entries) + update_quarto_sidebar(entries) + + +if __name__ == "__main__": + main() diff --git a/docs/design.qmd b/docs/user-guide/design.qmd similarity index 97% rename from docs/design.qmd rename to docs/user-guide/design.qmd index ce25eb4..411108c 100644 --- a/docs/design.qmd +++ b/docs/user-guide/design.qmd @@ -1,41 +1,41 @@ -# Design philosophy - - -## {{< fa brands python >}} Familiar - -tsod aims to use a syntax familiar to users of scientific computing libraries such as Pandas & sckit-learn. - -## {{< fa download >}} Easy to install - -```bash -$ pip install tsod -``` - - -## {{< fa brands osi >}} Open Source​ -tsod is an open source project licensed under the MIT license. -The software is provided free of charge with the source code available for inspection and modification. - -Contributions are welcome! - -## {{< fa comments >}} Easy to collaborate -By developing tsod on GitHub along with a completely open discussion, we believe that the collaboration between developers and end-users results in a useful library. - -## {{< fa list-ol >}} Reproducible -By providing the historical versions of tsod on PyPI it is possible to reproduce the behaviour of an older existing system, based on an older version. 
- -**Install specific version** - -```bash -pip install tsod==0.2.0 -``` - -## {{< fa brands github >}} Easy access to new features -Features are being added all the time, by developers at DHI in offices all around the globe as well as external contributors using tsod in their work. -These new features are always available from the [main branch on GitHub](https://github.com/DHI/tsod) and thanks to automated testing, it is always possible to verify that the tests passes before downloading a new development version. - -**Install development version** - -```bash -$ pip install https://github.com/DHI/tsod/archive/main.zip +# Design philosophy + + +## {{< fa brands python >}} Familiar + +tsod aims to use a syntax familiar to users of scientific computing libraries such as Pandas & scikit-learn. + +## {{< fa download >}} Easy to install + +```bash +$ pip install tsod +``` + + +## {{< fa brands osi >}} Open Source +tsod is an open source project licensed under the MIT license. +The software is provided free of charge with the source code available for inspection and modification. + +Contributions are welcome! + +## {{< fa comments >}} Easy to collaborate +By developing tsod on GitHub along with a completely open discussion, we believe that the collaboration between developers and end-users results in a useful library. + +## {{< fa list-ol >}} Reproducible +By providing the historical versions of tsod on PyPI it is possible to reproduce the behaviour of an older existing system, based on an older version. + +**Install specific version** + +```bash +pip install tsod==0.2.0 +``` + +## {{< fa brands github >}} Easy access to new features +Features are being added all the time, by developers at DHI in offices all around the globe as well as external contributors using tsod in their work. 
+These new features are always available from the [main branch on GitHub](https://github.com/DHI/tsod) and thanks to automated testing, it is always possible to verify that the tests pass before downloading a new development version. + +**Install development version** + +```bash +$ pip install https://github.com/DHI/tsod/archive/main.zip ``` \ No newline at end of file diff --git a/docs/getting-started.qmd b/docs/user-guide/getting-started.qmd similarity index 95% rename from docs/getting-started.qmd rename to docs/user-guide/getting-started.qmd index 5c2a9c0..5273396 100644 --- a/docs/getting-started.qmd +++ b/docs/user-guide/getting-started.qmd @@ -1,53 +1,57 @@ -Getting started -=============== - -![](https://raw.githubusercontent.com/DHI/tsod/main/images/anomaly.png) - -Sensors often provide faulty or missing observations. These anomalies must be detected automatically and replaced with more feasible values before feeding the data to numerical simulation engines as boundary conditions or real time decision systems. - -This package aims to provide examples and algorithms for detecting anomalies in time series data specifically tailored to DHI users and the water domain. It is simple to install and deploy operationally and is accessible to everyone (open-source). - -`tsod` is library for timeseries data. The format of a timeseries is always a [](`pandas.Series`) and in some cases with a [](`pandas.DatetimeIndex`) - -1. Get data in the form of a a [](`pandas.Series`) (see Data formats below) -2. Select one or more detectors e.g. [](`~tsod.RangeDetector`) or [](`~tsod.ConstantValueDetector`) -3. Define parameters (e.g. min/max, max rate of change) or... -4. Fit parameters based on normal data, i.e. without outliers -5. Detect outliers in any dataset - -Example -------- - -```{python} -import pandas as pd -from tsod import RangeDetector -rd = RangeDetector(max_value=2.0) -data = pd.Series([0.0, 1.0, 3.0]) # 3.0 is out of range i.e. 
an anomaly -anom = rd.detect(data) -anom -``` - -```{python} -data[anom] # get anomalous data -``` - -```{python} -data[~anom] # get normal data -``` - - -Saving and loading -------------------- -Save a configured detector -```python -cd = CombinedDetector([ConstantValueDetector(), RangeDetector()]) -cd.fit(normal_data) -cd.save("detector.joblib") -``` - -... and then later load it from disk -```python -my_detector = tsod.load("detector.joblib") -my_detector.detect(some_data) -``` +--- +title: Getting started +execute: + enabled: false +--- + + +![](https://raw.githubusercontent.com/DHI/tsod/main/images/anomaly.png) + +Sensors often provide faulty or missing observations. These anomalies must be detected automatically and replaced with more feasible values before feeding the data to numerical simulation engines as boundary conditions or real time decision systems. + +This package aims to provide examples and algorithms for detecting anomalies in time series data specifically tailored to DHI users and the water domain. It is simple to install and deploy operationally and is accessible to everyone (open-source). + +`tsod` is a library for timeseries data. The format of a timeseries is always a [](`pandas.Series`) and in some cases with a [](`pandas.DatetimeIndex`). + +1. Get data in the form of a [](`pandas.Series`) (see Data formats below) +2. Select one or more detectors e.g. [](`~tsod.RangeDetector`) or [](`~tsod.ConstantValueDetector`) +3. Define parameters (e.g. min/max, max rate of change) or... +4. Fit parameters based on normal data, i.e. without outliers +5. Detect outliers in any dataset + +Example +------- + +```{python} +import pandas as pd +from tsod import RangeDetector +rd = RangeDetector(max_value=2.0) +data = pd.Series([0.0, 1.0, 3.0]) # 3.0 is out of range i.e. 
an anomaly +anom = rd.detect(data) +anom +``` + +```{python} +data[anom] # get anomalous data +``` + +```{python} +data[~anom] # get normal data +``` + + +Saving and loading +------------------ +Save a configured detector +```python +cd = CombinedDetector([ConstantValueDetector(), RangeDetector()]) +cd.fit(normal_data) +cd.save("detector.joblib") +``` + +... and then later load it from disk +```python +my_detector = tsod.load("detector.joblib") +my_detector.detect(some_data) +``` \ No newline at end of file
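A note on `update_quarto_sidebar` in `docs/scripts/generate_examples_from_notebooks.py`: as written, `re.sub` returns the content unchanged, with no error, when the `BEGIN_GENERATED_EXAMPLES` / `END_GENERATED_EXAMPLES` markers are missing or have been reindented, and a plain string replacement has its backslashes interpreted as escapes. The sketch below shows one possible hardening and is not part of this change. `replace_marked_block` and the shortened, unindented marker strings are hypothetical names chosen for illustration.

```python
import re

# Hypothetical, shortened markers; the script in this change uses longer, indented ones.
BEGIN = "# BEGIN_GENERATED_EXAMPLES"
END = "# END_GENERATED_EXAMPLES"


def replace_marked_block(content: str, entries: list[str]) -> str:
    """Replace everything from BEGIN to END (inclusive) with a fresh block."""
    new_block = "\n".join([BEGIN, *(f"- examples/{name}" for name in entries), END])
    pattern = re.escape(BEGIN) + r".*?" + re.escape(END)
    # A function replacement is inserted literally, so backslashes in names are safe.
    updated, count = re.subn(
        pattern, lambda _match: new_block, content, count=1, flags=re.DOTALL
    )
    if count == 0:
        # Fail loudly instead of silently writing back an unchanged file.
        raise ValueError("sidebar markers not found; nothing was updated")
    return updated


yaml_text = (
    "contents:\n"
    "- examples/index.qmd\n"
    "# BEGIN_GENERATED_EXAMPLES\n"
    "- examples/Old.ipynb\n"
    "# END_GENERATED_EXAMPLES\n"
)
result = replace_marked_block(
    yaml_text, ["Getting started.ipynb", "Detect on DataFrames.ipynb"]
)
print(result)
```

Raising on a missing block is a design choice: a failed `make examples` is easier to notice than a sidebar that quietly stops tracking the notebooks.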