Merged
143 changes: 89 additions & 54 deletions docs/user_guide.md
@@ -13,27 +13,14 @@ keywords: HED tutorial, Python guide, validation examples, BIDS datasets,
```{index} user guide, tutorial, getting started, HED, Hierarchical Event Descriptors
```

HED (Hierarchical Event Descriptors) is a framework for systematically describing events and experimental metadata in machine-actionable form. This guide provides comprehensive documentation for using the HED Python tools for validation, BIDS integration, and analysis:
HED (Hierarchical Event Descriptors) is a framework for systematically describing events and experimental metadata in machine-actionable form. This guide provides comprehensive documentation for using the HED Python tools for validation, BIDS integration, and analysis.

01. [What is HED?](#what-is-hed)
02. [Getting started](#getting-started)
03. [Working with HED schemas](#working-with-hed-schemas)
04. [Validating HED strings](#validating-hed-strings)
05. [Working with BIDS datasets](#working-with-bids-datasets)
06. [Working with sidecars](#working-with-sidecars)
07. [Jupyter notebooks](#jupyter-notebooks)
08. [Command-line tools](#command-line-tools)
09. [Best practices](#best-practices)
10. [Troubleshooting](#troubleshooting)

## Getting started
## Installation

```{index} installation, pip, PyPI
```

### Installation

#### From PyPI (recommended)
### From PyPI (recommended)

Install the latest stable release:

@@ -43,7 +30,7 @@ pip install hedtools

**Note**: The PyPI package includes the core hedtools library but **not the example Jupyter notebooks**. To access the notebooks, see the options below.

#### For Jupyter notebook examples
### For Jupyter notebook examples

The example notebooks are only available in the GitHub repository. Choose one of these options:

@@ -66,7 +53,7 @@ pip install hedtools jupyter notebook

See [examples/README.md](https://github.com/hed-standard/hed-python/tree/main/examples) for detailed notebook documentation.

#### From GitHub (latest development version)
### From GitHub (latest development version)

```bash
pip install git+https://github.com/hed-standard/hed-python/@main
@@ -82,13 +69,15 @@ cd hed-python
pip install -e .
```

#### Python requirements
### Python requirements

- **Python 3.10 or later** is required
- Core dependencies: pandas, numpy, defusedxml, openpyxl
- Jupyter support: Install with `pip install jupyter notebook`

### Basic example
## Basic usage

### A starter example

Here's a simple example to get you started with HED validation:

@@ -110,12 +99,12 @@ else:
    print("✓ HED string is valid!")
```

## Working with HED schemas
### Loading schemas

```{index} schema; loading, schema; validation, load_schema, load_schema_version
```

### Loading schemas
A HED schema specifies an allowed HED vocabulary. The official HED schemas are hosted in the [hed-standard/hed-schemas](https://github.com/hed-standard/hed-schemas) GitHub repository. Most HEDTools operations require you to specify which versions of the HED schemas you are using. You may also specify a file containing your own schema for testing and development. HEDTools assumes that the schema you are using has already been validated.

```python
from hed import load_schema, load_schema_version
@@ -153,13 +142,11 @@ schema = load_schema_version(["score_2.1.0", "lang_1.1.0"])

Note: It is now standard for a library schema to be partnered with a standard schema. In general, you should not use an earlier, non-partnered version of a library schema.

## Validating HED strings
### Validating HED strings

```{index} validation; HED strings, HedString class, validate method
```

### Basic validation

The HED string must be created with a schema, and validation is performed on the string object:

```python
@@ -172,34 +159,11 @@ issues = hed_string.validate()

```

### Batch validation

```python
from hed import HedString, load_schema_version, get_printable_issue_string


schema = load_schema_version("8.4.0")

hed_strings = [
"Sensory-event, Visual-presentation",
"Invalid-tag, Another-invalid",
"(Red, Square)"
]
issues = []
for i, hed_str in enumerate(hed_strings, 1):
    hed_string = HedString(hed_str, schema)
    issues += hed_string.validate()
if issues:
    print(get_printable_issue_string(issues))
```

## Working with BIDS datasets
### Validating a BIDS dataset

```{index} BIDS; dataset validation, BidsDataset class, dataset_description.json
```

### Dataset-level validation

```python
from hed.tools import BidsDataset

@@ -221,7 +185,7 @@ issues = dataset.validate(check_for_warnings=True)

Because a BIDS dataset records its HED version in `dataset_description.json`, you do not need to supply a HED version separately for validation. The `BidsDataset` only holds information about the relevant `.tsv` and `.json` files, not the imaging data. The constructor has a number of parameters that restrict which of these files are considered. The relevant JSON files are all read in, but the `.tsv` content is only loaded when needed.
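
To make the role of `dataset_description.json` concrete, here is a minimal sketch, using only the standard library, of looking up the `HEDVersion` field that BIDS stores there. The helper name `read_hed_version` and the throwaway dataset are assumptions made for illustration; this is not how `BidsDataset` is implemented.

```python
import json
import tempfile
from pathlib import Path


def read_hed_version(dataset_root):
    """Return the HEDVersion entry of dataset_description.json, or None if absent.

    Hypothetical helper for illustration only; BidsDataset does its own lookup.
    """
    description_path = Path(dataset_root) / "dataset_description.json"
    with open(description_path, encoding="utf-8") as fp:
        description = json.load(fp)
    return description.get("HEDVersion")


# Build a throwaway dataset root so the sketch runs without real data.
with tempfile.TemporaryDirectory() as root:
    description = {"Name": "Demo dataset", "BIDSVersion": "1.10.0", "HEDVersion": "8.4.0"}
    (Path(root) / "dataset_description.json").write_text(
        json.dumps(description), encoding="utf-8"
    )
    demo_version = read_hed_version(root)

print(demo_version)
```

Note that `HEDVersion` may be a list rather than a single string when library schemas are involved, so code like this should not assume a string.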

### Working with individual event files
### Validating a tabular file

```{index} TabularInput, event files, tsv files
```
@@ -246,12 +210,14 @@ issues = tabular.validate(schema)
def_dict = tabular.get_def_dict(schema)
```

## Working with sidecars
### Using sidecars

```{index} Sidecar class, JSON sidecar, sidecar validation
```

### Loading and validating sidecars
Sidecars are JSON files that BIDS uses to hold metadata. NWB has an equivalent mechanism.
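
As a rough illustration of what such a sidecar holds, the sketch below builds a small events sidecar as a Python dictionary and collects its HED annotations. The column name `event_type`, its levels, and the helper `collect_hed_strings` are made up for this example, and the tag strings are illustrative; only the `HED` key follows the BIDS convention.

```python
import json

# Hypothetical sidecar: one categorical column whose levels map to HED strings.
sidecar_dict = {
    "event_type": {
        "Description": "Type of event (illustrative column).",
        "HED": {
            "show": "Sensory-event, Visual-presentation",
            "square": "(Red, Square)",
        },
    }
}


def collect_hed_strings(sidecar):
    """Return {(column, level): hed_string} for every HED annotation in the dict."""
    found = {}
    for column, info in sidecar.items():
        hed = info.get("HED") if isinstance(info, dict) else None
        if isinstance(hed, dict):  # categorical column: one string per level
            for level, hed_string in hed.items():
                found[(column, level)] = hed_string
        elif isinstance(hed, str):  # value column: a single string
            found[(column, None)] = hed
    return found


hed_annotations = collect_hed_strings(sidecar_dict)
print(json.dumps(sidecar_dict, indent=2))
```

Writing `sidecar_dict` to a `.json` file would give something the `Sidecar` class can load, although whether it validates depends on the schema version in use.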

#### Validating a sidecar

```python
from hed import Sidecar, load_schema_version
@@ -266,7 +232,7 @@ sidecar = Sidecar("task-rest_events.json")
issues = sidecar.validate(schema)
```

### Extracting definitions
#### Extracting definitions

```{index} definitions, DefinitionDict, get_def_dict
```
@@ -282,7 +248,7 @@ sidecar = Sidecar("task-rest_events.json")
def_dict = sidecar.get_def_dict(schema)
```

### Saving sidecars
#### Saving sidecars

```python
from hed import Sidecar
@@ -296,6 +262,75 @@ sidecar.save_as_json("output_sidecar.json")
json_string = sidecar.get_as_json_string()
```

### Schema validation

```{index} schema validation, check_compliance, SchemaValidator, ComplianceSummary
```

HED schemas can be validated for compliance, which checks that attribute domains, ranges, and semantic rules are all satisfied. There are three levels of access, depending on how much control you need.

#### Quick check via HedSchema

The simplest approach is to call `check_compliance()` directly on a loaded schema. It returns a list of issue dictionaries in the standard HED issue format, so you can pass the result to `get_printable_issue_string` as you would for any other HED validation result:

```python
from hed import load_schema_version
from hed.errors import get_printable_issue_string

schema = load_schema_version("8.4.0")
issues = schema.check_compliance()
if issues:
    print(get_printable_issue_string(issues, title="Schema compliance issues"))
else:
    print("Schema is compliant")
```

Pass `check_for_warnings=False` to suppress formatting warnings and report only errors:

```python
issues = schema.check_compliance(check_for_warnings=False)
```

#### Getting a structured summary

The returned list carries a `compliance_summary` attribute (`ComplianceSummary`) that provides a human-readable report of what was checked:

```python
issues = schema.check_compliance()
print(issues.compliance_summary.get_summary())
```

The summary shows each of the five checks (prerelease version, prologue/epilogue, invalid characters, attributes, and duplicate names) with pass/fail status, entry counts, and sub-check details.

#### Using SchemaValidator directly

For fine-grained control you can instantiate `SchemaValidator` and run individual checks:

```python
from hed import load_schema_version
from hed.errors.error_reporter import ErrorHandler
from hed.schema.schema_validation.compliance import SchemaValidator

schema = load_schema_version("8.4.0")
error_handler = ErrorHandler(check_for_warnings=True)
sv = SchemaValidator(schema, error_handler)

# Run only the checks you need
issues = sv.check_attributes()
issues += sv.check_invalid_characters()
```

The five available checks are:

| Method | What it validates |
| ------------------------------- | -------------------------------------------------------- |
| `check_if_prerelease_version()` | Warns if the version is newer than all known releases |
| `check_prologue_epilogue()` | Validates characters in prologue and epilogue text |
| `check_invalid_characters()` | Validates entry names and descriptions for illegal chars |
| `check_attributes()` | Domain, range, and semantic validation of all attributes |
| `check_duplicate_names()` | Detects duplicate entry names within or across libraries |

Each method returns a list of issue dictionaries and updates `sv.summary` (a `ComplianceSummary` instance) with what was checked.
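
The method names in the table can also be driven from a list, which may be handy when the set of checks comes from configuration. The following is a sketch only: `run_checks` and the stand-in class `StubValidator` are hypothetical and exist so that the snippet runs without hedtools installed, and the keys in the sample issue dictionary are illustrative. With the real library you would pass the `sv` object created above instead of the stub.

```python
def run_checks(validator, check_names):
    """Call each named zero-argument check on validator and pool the issues."""
    issues = []
    for name in check_names:
        check = getattr(validator, name)  # raises AttributeError for a misspelled name
        issues += check()
    return issues


class StubValidator:
    """Hypothetical stand-in that mimics the 'returns a list of issue dicts' contract."""

    def check_attributes(self):
        # Illustrative issue dictionary; real issues come from the library.
        return [{"code": "EXAMPLE_ISSUE", "message": "illustrative issue only"}]

    def check_invalid_characters(self):
        return []


pooled = run_checks(StubValidator(), ["check_attributes", "check_invalid_characters"])
print(len(pooled))
```

Looking methods up by name fails on a typo before any later check runs, so a misspelled check name is caught rather than silently skipped.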

## Jupyter notebooks

```{index} Jupyter notebooks, examples, workflows
3 changes: 0 additions & 3 deletions hed/schema/schema_validation_util.py

This file was deleted.

2 changes: 1 addition & 1 deletion hed/scripts/add_hed_ids.py
@@ -1,4 +1,4 @@
from hed.scripts.hed_script_util import get_prerelease_path
from hed.scripts.schema_script_util import get_prerelease_path
from hed.scripts.hed_convert_schema import convert_and_update
import argparse
from hed.schema.schema_io.df_util import convert_filenames_to_dict
2 changes: 1 addition & 1 deletion hed/scripts/hed_convert_schema.py
@@ -1,4 +1,4 @@
from hed.scripts.hed_script_util import sort_base_schemas, validate_all_schemas, add_extension
from hed.scripts.schema_script_util import sort_base_schemas, validate_all_schemas, add_extension
from hed.schema.schema_io import load_dataframes, save_dataframes
from hed.schema.schema_io.hed_id_util import update_dataframes_from_schema
from hed.schema.hed_schema_io import load_schema, from_dataframes
File renamed without changes.
2 changes: 1 addition & 1 deletion hed/scripts/validate_schemas.py
@@ -1,5 +1,5 @@
import sys
from hed.scripts.hed_script_util import validate_all_schemas, sort_base_schemas
from hed.scripts.schema_script_util import validate_all_schemas, sort_base_schemas
from hed.errors import get_printable_issue_string
import argparse

2 changes: 0 additions & 2 deletions hed/validator/util/class_util.py
@@ -109,8 +109,6 @@ def _get_problem_indices(self, stripped_value, class_name, start_index=0):
        if indices:
            indices = [(char, index + start_index) for index, char in indices]
        return indices
        # value_classes = original_tag.value_classes.values()
        # allowed_characters = schema_validation_util.get_allowed_characters(original_tag.value_classes.values())

    def _check_value_class(self, original_tag, stripped_value, report_as):
        """Return any issues found if this is a value tag,
2 changes: 1 addition & 1 deletion tests/scripts/test_hed_convert_schema.py
@@ -4,7 +4,7 @@
import os
from hed import load_schema, load_schema_version
from hed.schema import HedSectionKey, HedKey
from hed.scripts.hed_script_util import add_extension
from hed.scripts.schema_script_util import add_extension
from hed.scripts.hed_convert_schema import convert_and_update
import contextlib

2 changes: 1 addition & 1 deletion tests/scripts/test_script_util.py
@@ -2,7 +2,7 @@
import os
import shutil
from hed import load_schema_version
from hed.scripts.hed_script_util import (
from hed.scripts.schema_script_util import (
    add_extension,
    sort_base_schemas,
    validate_all_schema_formats,