CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Runnable is a Python-based pipeline orchestration framework for data science and machine learning workflows. It provides a plugin-based architecture for creating reproducible, configurable pipelines that can execute Python functions, Jupyter notebooks, and shell scripts across different environments.

When matching examples to documentation, always use the Python-based examples. DO NOT use YAML examples unless I ask you to do so.

When running Python code in this repository, ALWAYS use uv run instead of python directly. This ensures proper dependency management and virtual environment isolation.

ANY PLANS related to this project should be stored in the current directory only, not anywhere else.

Development Commands

Environment Setup

# Install with development dependencies and docs
uv sync --all-extras --dev --group docs

# Install pre-commit hooks
pre-commit install

Testing

# Run all tests
pytest

# Run specific test file
pytest tests/runnable/test_sdk.py

# Run tests matching pattern
pytest -k "test_function_name"

Code Quality

# Format code (pre-commit will run this automatically)
ruff format .

# Run linting
ruff check . --fix

# Pre-commit: ensure that the checks in the pre-commit config are respected
pre-commit run --all-files

# Type checking
mypy runnable/ extensions/

CLI Usage

# Execute a pipeline from YAML definition
runnable execute pipeline.yaml --config config.yaml --parameters params.yaml

# Get help for CLI commands
runnable --help
runnable execute --help

Architecture Overview

Core Components

Pipeline Definition: Declarative YAML or programmatic Python API for defining workflows
Task Types: Support for Python functions, Jupyter notebooks, shell scripts, and PyTorch models
Node Types: Linear, parallel, map (iterative), conditional, and nested pipeline execution
Plugin System: Extensible architecture with entry points for executors, catalogs, secrets, and storage

Workspace Structure

This is a uv workspace with multiple extensions:

runnable/: Core framework code
extensions/: Plugin implementations
  - catalog/: Data storage backends (file system, S3, Minio)
  - job_executor/: Job execution backends (local, Kubernetes)
  - nodes/: Node type implementations
  - pipeline_executor/: Pipeline execution backends (local, container, Argo)
  - run_log_store/: Execution metadata storage
  - secrets/: Secret management backends
visualization/: Pipeline visualization tools

Key Design Patterns

Plugin Architecture: Uses Python entry points for extensibility
Separation of Concerns: Domain code remains independent of orchestration
Reproducibility: Built-in metadata tracking and execution history
Environment Agnostic: Same pipeline can run locally, in containers, or on Kubernetes
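
As a rough illustration of the entry-point pattern named above, the sketch below lists whatever plugins are registered under a given entry-point group using only the standard library. The helper name discover_plugins is hypothetical, and this is not necessarily how Runnable itself loads plugins; the group name "tasks" is taken from the task registration step described in this file.

```python
from importlib.metadata import entry_points


def discover_plugins(group: str) -> dict:
    """Return a mapping of entry-point name -> entry point for one group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the returned object supports select(group=...).
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a plain dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


# Names registered under the "tasks" group in the current environment.
# The list is empty when no installed package registers that group.
print(sorted(discover_plugins("tasks")))
```

Calling ep.load() on any returned entry point imports the module and returns the registered object, so discovery stays cheap until a plugin is actually needed.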

Common Development Workflows

Adding New Task Types

Create task implementation in runnable/tasks.py
Register via entry point in pyproject.toml under [project.entry-points.'tasks']
Add tests in tests/runnable/test_tasks.py

Adding New Executors

Implement executor in appropriate extensions/ subdirectory
Register via entry point in main pyproject.toml
Add configuration examples in examples/configs/
Add integration tests

Running YAML Examples (legacy, may be removed)

# Execute example pipeline
runnable execute examples/01-tasks/python_tasks.yaml

# Execute with custom config
runnable execute examples/01-tasks/python_tasks.yaml --config examples/configs/local-container.yaml

# Execute with parameters
runnable execute examples/03-parameters/passing_parameters_python.yaml --parameters examples/03-parameters/parameters.yaml

Running Python-based Examples

uv run examples/01-tasks/python_tasks.py

Extension Development

Each extension is a separate package in the workspace:

Has its own pyproject.toml with dependencies
Registers plugins via entry points in the main pyproject.toml
Can be developed and tested independently
Uses workspace dependencies for shared code

Pipeline Definition Patterns

Python API

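from runnable import Pipeline, PythonTask, pickled


def my_function():
    # Placeholder domain function; its return value is stored as "result".
    return {"status": "done"}


# Wrap the function as a task and declare its pickled return value.
task = PythonTask(
    function=my_function,
    name="process_data",
    returns=[pickled("result")],
)

# Build a single-step pipeline and run it.
pipeline = Pipeline(steps=[task])
pipeline.execute()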

YAML Definition (legacy, may be removed)

dag:
  start_at: process_data

  process_data:
    type: task
    task_type: python
    command: my_module.my_function
    returns:
      - name: result
        type: pickled
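
The command field above is a dotted path to a function. As a hedged sketch of how such a path can be turned into a callable, the helper below uses importlib from the standard library. The name resolve_command is hypothetical, and Runnable's own resolution logic may differ; math.sqrt stands in for my_module.my_function so the example runs anywhere.

```python
import importlib
from typing import Any, Callable


def resolve_command(command: str) -> Callable[..., Any]:
    """Resolve a dotted path such as 'my_module.my_function' to a callable."""
    # Split on the last dot: everything before it is the module path.
    module_name, _, attribute = command.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {command!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attribute)
    if not callable(func):
        raise TypeError(f"{command!r} is not callable")
    return func


# Standard-library stand-in for my_module.my_function.
sqrt = resolve_command("math.sqrt")
print(sqrt(16.0))  # 4.0
```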

Release Process

Uses semantic-release for automated versioning
Conventional commit format required
GitHub Actions handle CI/CD
Supports alpha releases from alpha-* branches
Main release branch is main
Never EVER put a ! in the commit prefix (for example, feat!:), as it triggers a major release.
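
The commit rules above can be checked mechanically. The sketch below is a minimal, assumed implementation: the function name check_commit_header and its three return values are hypothetical, and it inspects only the header line of a commit message, not the body or footers.

```python
import re

# type, optional (scope), optional "!", then ": " and a description.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def check_commit_header(header: str) -> str:
    """Return 'ok', 'not-conventional', or 'breaking-marker' for a commit header."""
    match = HEADER.match(header)
    if match is None:
        return "not-conventional"
    if match.group("breaking"):
        # A "!" before the colon marks a breaking change.
        return "breaking-marker"
    return "ok"


print(check_commit_header("feat: add map node"))         # ok
print(check_commit_header("feat!: drop yaml support"))   # breaking-marker
print(check_commit_header("updated stuff"))              # not-conventional
```

A check like this could run as a commit-msg hook, but wiring it into pre-commit is left out here because the repository's hook configuration is not shown in this file.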

Testing Strategy

Unit tests for core functionality in tests/runnable/
Extension tests in tests/extensions/
Integration tests for pipeline examples in tests/test_pipeline_examples.py
Mock executor for testing without external dependencies
Assertion helpers in tests/assertions.py

Documentation

The docs are built with MkDocs. The base configuration file is mkdocs.yaml, and the content lives in the docs folder.

The docs rely heavily on code snippets drawn from the examples; keep that pattern wherever possible. The docs explain the contextual example first and then show a detailed working example from the examples folder.

When writing docs, always use code from the examples directory, and always use code snippets to avoid duplication.

Remember that when writing lists in Markdown, there must be an empty line between the list and the preceding line. This applies to all lists, including those following headings, text, or other elements.
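
The list rule above can be spot-checked with a small script. This is a simple heuristic sketch, not a Markdown parser: the function name lists_missing_blank_line is hypothetical, fenced code blocks are skipped, and wrapped continuation lines inside a list item may be reported as false positives.

```python
import re

# Bullet ("-", "*", "+") or numbered ("1.", "1)") item followed by text.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+\S")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def lists_missing_blank_line(markdown: str) -> list:
    """Return 1-based line numbers where a list starts right after a non-blank, non-list line."""
    problems = []
    lines = markdown.splitlines()
    in_fence = False
    for index, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            # Toggle on every fence line so code block contents are ignored.
            in_fence = not in_fence
            continue
        if in_fence or index == 0:
            continue
        previous = lines[index - 1]
        if LIST_ITEM.match(line) and previous.strip() and not LIST_ITEM.match(previous):
            problems.append(index + 1)
    return problems


sample = "Heading\n- first\n- second\n\nText\n\n- third\n"
print(lists_missing_blank_line(sample))  # [2]
```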

I prefer to write prompts in a visual editor and keep them in a file called prompt.md. I will refer to the particular section that I want to discuss and use it as the basis for our interaction.
