Commit 1614d7f

feat: add more stations + major refont
1 parent 2693b1d commit 1614d7f

458 files changed

Lines changed: 1748795 additions & 4595 deletions

.github/workflows/run_tests.yml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
name: test

on:
  push:
    branches:
      - "master"
  pull_request:

jobs:
  test:
    name: "test"
    runs-on: "${{ matrix.os }}"
    strategy:
      fail-fast: false
      matrix:
        os: ["ubuntu-latest"]
        python: ["3.11"]
    defaults:
      run:
        shell: "bash -eo pipefail {0}"

    steps:
      - uses: "actions/checkout@v3"
      - uses: "actions/setup-python@v3"
        with:
          python-version: "${{ matrix.python }}"
      - uses: "actions/cache@v3"
        id: "cache"
        with:
          path: "${{ env.pythonLocation }}"
          key: "test-${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('pyproject.toml', 'requirements/*') }}"
      - run: "python --version"
      - run: "python -mpip install -U pip"
      - run: "python -mpip --version"
      - run: "python -mpip install -r requirements/requirements-dev.txt"
      - run: "python -mpip install ./"
      - name: "Run tests"
        run: "make cov"

Makefile

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ test:
 
 cov:
 	coverage erase
-	python -WError -m pytest --cov=ioc_cleanup --cov-report term-missing --durations=10 --pdb
+	python -WError -m pytest --cov=ioc_cleanup --cov-report term-missing --durations=10
 
 deps:
 	pre-commit run poetry-lock -a

README.md

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
# IOC Cleanup

`ioc_cleanup` provides a reproducible, transparent, and traceable workflow for cleaning tide gauge (sea level) data from IOC (Intergovernmental Oceanographic Commission) stations worldwide.

![demo](./assets/presentation.png)

## Motivation & Concept

Cleaning tide gauge data is often:
* manual,
* poorly documented,
* hard to reproduce,
* and difficult to review or share.

This project proposes a community-driven, version-controlled approach to data cleaning, where all cleaning decisions are explicitly recorded and can be audited or improved over time.

**What this approach enables**

* Flagging timestamps or time ranges affected by:
  * bad or corrupt data
  * sensor breakpoints
  * singular phenomena (e.g. tsunamis, meteo-tsunamis, seiches, or unidentified events)
* Fully reproducible cleaning
* Transparent and traceable decisions stored in plain JSON
* Peer review of cleaning decisions via GitHub
* Easy extension to other datasets (e.g. GESLA, NDBC)
* Gradual growth in station coverage through community contributions

## Repository Overview

This repository contains a set of Python routines to **clean IOC sea level data** using **declarative JSON transformations**.

### Core idea

The core asset of this repository is the set of **JSON files** located in `./transformations/`.

Each JSON file describes:

* the valid time window
* dropped timestamps
* dropped time ranges
* breakpoints
* metadata and notes

Together, these JSON files define the transformation from **raw data to clean signal**.

## Caveats and limitations
Please be aware of the following:
* ❌ This repository does NOT contain IOC data
  * Data download is not handled internally
  * Examples (in this `README` or in `tests`) use the [`searvey`](https://github.com/oceanmodeling/searvey) package
* Step changes in data are currently only flagged via the `breakpoints` item in the JSON
  * No offset correction is applied
  * Vertical datums are not addressed
* Distinguishing noise (e.g. boat wakes) from real physical events can be difficult for noisy sensors
* Cleaning decisions are inherently subjective
  * Different operators may disagree on what should be discarded

## Getting Started
### Prerequisites
* Python 3.11 (recommended).
* **~24GB** of free disk space for storing raw and processed data.

### Installation

```bash
git clone https://github.com/seareport/ioc_cleanup.git
cd ioc_cleanup
pip install -r requirements.txt
```

## Usage
Example with one station: `abed` (Aberdeen), sensor `bub`:
```python
station = "abed"
sensor = "bub"
```

### Download Raw Data:

```python
import searvey
df_raw = searvey.fetch_ioc_station(station, "2020-01-01", "2026-01-01")
```

### Apply Cleaning Transformation:

```python
import ioc_cleanup as C

trans = C.load_transformation_from_path(
    "../transformations/maya_pwl.json"
)
df_clean = C.transform(df_raw, trans)
```

Example for `maya` station:

![example](./assets/maya_example.png)

## Transformation Files (JSON)
All transformation logic lives in `./transformations/`.
### Example JSON:
```json
{
  "ioc_code": "abed",
  "sensor": "bub",
  "notes": "",
  "skip": false,
  "wip": false,
  "start": "2020-01-01T00:00:00",
  "end": "2026-01-01T00:00:00",
  "high": null,
  "low": null,
  "dropped_date_ranges": [
    ["2022-03-27 03:00:00", "2022-03-27 03:45:00"],
    ["2023-03-26 03:00:00", "2023-03-26 03:45:00"]
  ],
  "dropped_timestamps": [
    "2022-09-30T14:45:00",
    "2022-09-30T15:30:00",
    "2022-10-02T06:45:00",
    "2022-10-02T07:00:00",
    "2023-06-21T00:15:00",
    "2024-04-24T11:00:00",
    "2024-09-07 12:00:00"
  ],
  "breakpoints": []
}
```
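
A quick sanity check on a transformation file might look like the sketch below. It uses only the standard library and is not part of `ioc_cleanup`; the helper name `missing_fields` is hypothetical, and treating every field of the example as required is an assumption.

```python
import json

# Field names copied from the example JSON; requiring all of them
# is an assumption made for this sketch.
EXPECTED_FIELDS = {
    "ioc_code", "sensor", "notes", "skip", "wip", "start", "end",
    "high", "low", "dropped_date_ranges", "dropped_timestamps", "breakpoints",
}


def missing_fields(transformation: dict) -> list[str]:
    """Return the expected field names absent from a parsed transformation."""
    return sorted(EXPECTED_FIELDS - set(transformation))


# A deliberately incomplete document, parsed the same way a file would be.
partial = json.loads(
    '{"ioc_code": "abed", "sensor": "bub", "start": "2020-01-01T00:00:00"}'
)
missing = missing_fields(partial)
print(missing)
```

Running such a check before opening a pull request could catch a forgotten key early.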
#### Field descriptions

* `ioc_code` : IOC station code
* `sensor` : sensor identifier
* `notes` : free-text comments
* `skip` : skip this station entirely
* `wip` : mark transformation as work-in-progress
* `start`, `end` : valid data window
* `high`, `low` : optional value thresholds
* `dropped_date_ranges` : continuous time ranges to remove
* `dropped_timestamps` : individual timestamps to remove
* `breakpoints` : timestamps where sensor behavior changes
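
As a rough illustration of how these fields could combine, here is a dependency-free sketch in plain Python. It is not the `ioc_cleanup` implementation. The helper name `apply_transformation_sketch`, the inclusive bounds, and the reading of `high`/`low` as drop-above/drop-below thresholds are all assumptions. `breakpoints` are left out because they only flag step changes.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    # fromisoformat accepts both "T" and " " between date and time
    return datetime.fromisoformat(ts)


def apply_transformation_sketch(records, trans):
    """Hypothetical helper: filter (iso_timestamp, value) pairs."""
    start, end = parse(trans["start"]), parse(trans["end"])
    ranges = [(parse(a), parse(b)) for a, b in trans["dropped_date_ranges"]]
    dropped = {parse(ts) for ts in trans["dropped_timestamps"]}
    high, low = trans["high"], trans["low"]
    kept = []
    for ts, value in records:
        t = parse(ts)
        if not (start <= t <= end):
            continue  # outside the valid window
        if any(a <= t <= b for a, b in ranges):
            continue  # inside a dropped range (bounds assumed inclusive)
        if t in dropped:
            continue  # individually dropped timestamp
        if high is not None and value > high:
            continue  # above the assumed upper threshold
        if low is not None and value < low:
            continue  # below the assumed lower threshold
        kept.append((ts, value))
    return kept


trans = {
    "start": "2022-03-27T00:00:00",
    "end": "2022-03-27T05:00:00",
    "high": 5.0,
    "low": None,
    "dropped_date_ranges": [["2022-03-27 03:00:00", "2022-03-27 03:45:00"]],
    "dropped_timestamps": ["2022-03-27T02:15:00"],
}
records = [
    ("2022-03-27T02:00:00", 1.0),
    ("2022-03-27T02:15:00", 1.1),  # dropped timestamp
    ("2022-03-27T03:15:00", 1.2),  # inside dropped range
    ("2022-03-27T04:00:00", 9.9),  # above `high`
    ("2022-03-27T04:15:00", 1.3),
    ("2022-03-27T06:00:00", 1.4),  # after `end`
]
kept = apply_transformation_sketch(records, trans)
print(kept)
```

Only the first and fifth records survive in this toy input; each of the others is removed by exactly one rule.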

## Downloading IOC Data in Bulk
Shortcut functions are provided to download, load, and clean data.

### Example: download all IOC stations for 2025

```python
import ioc_cleanup as C
ioc_all = C.get_meta()
year = 2025
for station in ioc_all.ioc_code.tolist():
    C.download_year_station(station, year, data_folder="../data")
```
This downloads station data as Parquet files into:
```bash
./data/2025
```
### Important: the files are archived with the following directory layout:
```
./data/
├── 2020
├── 2021
├── 2022
├── 2023
├── 2024
└── 2025
```
This layout makes it possible to extend the cleaning to more years in the future.
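
With one folder per year, finding out which years are already on disk is a directory listing. The sketch below uses only `pathlib`; the helper name `available_years` is hypothetical and not part of `ioc_cleanup`, and it assumes year folders are named with exactly four digits.

```python
from pathlib import Path
import tempfile


def available_years(data_folder):
    """Hypothetical helper: years that have a sub-folder under data_folder."""
    return sorted(
        int(p.name)
        for p in Path(data_folder).iterdir()
        if p.is_dir() and p.name.isdigit() and len(p.name) == 4
    )


# Demonstrate on a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2020", "2021", "2025", "notes"):
        (Path(tmp) / name).mkdir()
    years = available_years(tmp)
print(years)
```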

## Interactive Cleaning Dashboard

### Run the dashboard

```bash
python -mpanel serve dashboard/cleanup_dashboard.py
```

You will be directed to this page:

![dashboard](./assets/dashboard_light.png)

#### How stations are discovered

* The station list is defined by files in `./transformations/`
* To add a station, create a file following this convention:

```text
./transformations/<ioc_code>_<sensor>.json
```
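
The naming convention can be turned into a station list with a few lines of `pathlib`. This is only a sketch, not the dashboard's actual discovery code. The helper name `discover_stations` is hypothetical, and splitting on the last underscore assumes sensor names never contain one.

```python
from pathlib import Path
import tempfile


def discover_stations(folder):
    """Hypothetical helper: list (ioc_code, sensor) pairs from file names."""
    stations = []
    for path in sorted(Path(folder).glob("*.json")):
        # Split on the last underscore; assumes the sensor name has none.
        ioc_code, sep, sensor = path.stem.rpartition("_")
        if sep:  # skip files that do not follow the convention
            stations.append((ioc_code, sensor))
    return stations


# Demonstrate on a throwaway directory with two stations and one stray file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("abed_bub.json", "maya_pwl.json", "README.txt"):
        (Path(tmp) / name).write_text("{}")
    found = discover_stations(tmp)
print(found)
```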

### Error handling

![error](./assets/dashboard_error.png)

### Dark mode

Dark mode can be enabled using the toggle in the top-right corner.

![dark mode](./assets/dashboard_dark.png)

## Contributing

Contributions are very welcome!

### How to contribute

1. Fork the repository
2. Add or update a JSON transformation file
3. Use the dashboard to clean or flag data
4. Submit a pull request with a clear description of your changes

### Areas for improvement

* Add more IOC stations
* Extend the cleaned time range (currently 2020–2025)

assets/dashboard_dark.png

254 KB

assets/dashboard_error.png

62.6 KB

assets/dashboard_light.png

246 KB

assets/maya_example.png

199 KB

assets/presentation.png

1.45 MB

dashboard/cleanup_dashboard.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
from __future__ import annotations

import ioc_cleanup as C


def main():
    return C.select_points()


if __name__ == "__main__":
    main()

data

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
/home/tomsail/work/gist/Best_ioc_stations_for_cleanup/data/
