
Commit f01e781

doc: update readme (#52)
Co-authored-by: Ciaran Sweet <[email protected]>
1 parent a64f0ed commit f01e781

File tree: 2 files changed (+41, −16 lines)


README.md

Lines changed: 32 additions & 13 deletions
@@ -81,20 +81,20 @@ echo $HARBOR_PASSWORD | docker login w9mllyot.c1.de1.container-registry.ovh.net
  On macOS, the Linux architecture needs to be specified when building the image with the flag `--platform linux/amd64`:

  ```bash
- docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v0 --platform linux/amd64 .
+ docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v1-staging --platform linux/amd64 .
  ```

  On Linux:

  ```bash
- docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v0 .
+ docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v1-staging .
  ```

  - Push to container registry:

  ```bash
- docker push w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v0
+ docker push w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v1-staging
  ```

  - Once the new image is pushed, run the example [Notebook](operator-tools/submit_stac_items_notebook.ipynb) and verify that workflows are running in [Argo Workflows](https://argo-workflows.hub-eopf-explorer.eox.at/workflows/devseed-staging)
@@ -148,21 +148,27 @@ kubectl get wf -n devseed-staging --watch
  ## Pipeline

- **Flow:** STAC item URL → Extract zarr → Convert to GeoZarr → Upload S3 → Register STAC item → Add visualization links
+ **Flow:** STAC item URL → Extract zarr → Convert to GeoZarr → Upload S3 → Register STAC item → Optimize Storage → Add visualization links

- **Processing:**
- 1. **convert_v0.py** - Fetch STAC item, extract zarr URL, convert to cloud-optimized GeoZarr, upload to S3
- 2. **register.py** - Create STAC item with asset hrefs, add projection metadata and TiTiler links, register to catalog
+ **Processing Steps:**
+
+ **V0 Pipeline (2 steps):**
+ 1. **Convert** - Fetch STAC item, extract zarr URL, convert to cloud-optimized GeoZarr, upload to S3
+ 2. **Register** - Create STAC item with asset hrefs, add projection metadata and TiTiler links, register to catalog
+
+ **V1 Pipeline (3 steps):**
+ 1. **Convert** - S2-optimized conversion with enhanced performance
+ 2. **Register** - Enhanced registration with the alternate extension and consolidated assets
+ 3. **Change Storage Tier** - Optimize storage costs by moving data to the appropriate S3 storage class (default: `EXPRESS_ONEZONE`)

  **Runtime:** ~15-20 minutes per item

  **Stack:**

  - Processing: eopf-geozarr, Dask, Python 3.13
  - Storage: S3 (OVH)
  - Catalog: pgSTAC, TiTiler

- **Infrastructure:** Deployment configuration and infrastructure details are maintained in [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline)
+ **Infrastructure & Workflow Details:** For the complete workflow architecture, event flow, and deployment configuration, see the [platform-deploy data-pipeline README](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline)

  ---
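The V1 three-step flow described above can be sketched as plain functions to show the data handoff between steps. This is a minimal illustration only: the function names, return shapes, and the `storage:tier` property are hypothetical placeholders, not the actual `scripts/` entry points.

```python
# Minimal sketch of the V1 three-step flow. All names, signatures, and the
# "storage:tier" property are illustrative placeholders, not the real pipeline code.

def convert(stac_item_url: str) -> str:
    """Step 1 placeholder: fetch the STAC item, convert zarr to GeoZarr, upload to S3."""
    item_id = stac_item_url.rsplit("/", 1)[-1]
    return f"s3://example-bucket/geozarr/{item_id}"  # href of the uploaded GeoZarr

def register(geozarr_href: str) -> dict:
    """Step 2 placeholder: build a STAC item whose asset points at the GeoZarr."""
    return {"assets": {"geozarr": {"href": geozarr_href}}}

def change_storage_tier(item: dict, storage_class: str = "EXPRESS_ONEZONE") -> dict:
    """Step 3 placeholder: record the S3 storage class applied to the data."""
    item["properties"] = {"storage:tier": storage_class}
    return item

def run_v1_pipeline(stac_item_url: str) -> dict:
    # Steps run strictly in order; each consumes the previous step's output.
    return change_storage_tier(register(convert(stac_item_url)))
```

In the real pipeline each step runs as its own container inside an Argo workflow; chaining plain functions here only makes the step ordering and handoff explicit.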

@@ -197,10 +203,22 @@ kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp \
  ```
  scripts/
- ├── convert_v0.py                    # Zarr → GeoZarr conversion and S3 upload
- └── register.py                      # STAC item creation and catalog registration
+ ├── convert_v0.py                    # Generic Zarr → GeoZarr converter (V0 pipeline)
+ ├── convert_v1_s2.py                 # S2-optimized GeoZarr converter (V1 pipeline)
+ ├── register_v0.py                   # Basic STAC registration (V0 pipeline)
+ ├── register_v1.py                   # Enhanced STAC registration (V1 pipeline)
+ ├── change_storage_tier.py           # S3 storage tier optimization (V1 pipeline step 3)
+ ├── test_complete_workflow.py        # Workflow testing script
+ ├── test_gateway_format.py           # Gateway format testing
+ └── README_storage_tier.md           # Storage tier management documentation
+
+ operator-tools/
+ ├── manage_collections.py            # STAC collection management (create/clean/update)
+ ├── submit_test_workflow_wh.py       # HTTP webhook submission script
+ ├── submit_stac_items_notebook.ipynb # Batch submission notebook
+ ├── README.md                        # Operator tools documentation
+ └── README_collections.md            # Collection management guide
- operator-tools/                      # Tools for submitting workflows
  docker/Dockerfile                    # Container image
  tests/                               # Unit and integration tests
  ```
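The contents of `change_storage_tier.py` are not shown in this diff, so here is a hedged sketch of what an S3 re-tiering step can look like with boto3: each object is copied onto itself with a new `StorageClass`. The helper names, bucket layout, and the set of accepted classes are assumptions for illustration only.

```python
# Illustrative sketch only -- the real scripts/change_storage_tier.py may differ.
# Re-tiering on S3 is done by copying an object onto itself with a new StorageClass.

# Assumed subset of storage classes; the README names EXPRESS_ONEZONE as the default.
VALID_STORAGE_CLASSES = {"STANDARD", "STANDARD_IA", "ONEZONE_IA", "EXPRESS_ONEZONE"}

def validate_storage_class(storage_class: str) -> str:
    """Fail fast on a mistyped storage class before touching any objects."""
    if storage_class not in VALID_STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class!r}")
    return storage_class

def change_storage_tier(bucket: str, prefix: str, storage_class: str = "EXPRESS_ONEZONE") -> int:
    """Copy every object under `prefix` in place with the requested StorageClass."""
    import boto3  # imported lazily so the pure helper above has no dependency

    validate_storage_class(storage_class)
    s3 = boto3.client("s3")
    moved = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=bucket,
                Key=obj["Key"],
                CopySource={"Bucket": bucket, "Key": obj["Key"]},
                StorageClass=storage_class,
                MetadataDirective="COPY",  # keep existing metadata while changing tier
            )
            moved += 1
    return moved
```

Note that a single `copy_object` call is limited to objects up to 5 GB; larger objects would need a multipart copy.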
@@ -250,7 +268,8 @@ For infrastructure issues, see platform-deploy troubleshooting: [staging](https:
  ## Documentation

- - **Operator Tools:** [operator-tools/README.md](operator-tools/README.md)
+ - **Operator Tools:** [operator-tools/README.md](operator-tools/README.md) - Workflow submission and collection management
+ - **Storage Management:** [scripts/README_storage_tier.md](scripts/README_storage_tier.md) - S3 storage tier optimization
  - **Tests:** `tests/` - pytest unit and integration tests
  - **Deployment:** [platform-deploy/workspaces/devseed-staging/data-pipeline](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline)

operator-tools/README.md

Lines changed: 9 additions & 3 deletions
@@ -25,16 +25,22 @@ Examples below use `devseed-staging`. For production, replace with `devseed`.
  Before using these tools, you need to set up port forwarding to access the webhook service:

+ #### For the staging environment:
+
  ```bash
  # Port forward from the webhook eventsource service (staging)
- kubectl port-forward -n devseed-staging svc/eopf-explorer-webhook-eventsource-svc 12000:12000 &
+ kubectl port-forward -n devseed-staging svc/eopf-explorer-webhook-eventsource-svc 12001:12000 &
+ ```
+
+ This makes the webhook endpoint available at `http://localhost:12001/samples`.

- # For production, use:
- # kubectl port-forward -n devseed svc/eopf-explorer-webhook-eventsource-svc 12000:12000 &
+ #### For the production environment:
+
+ ```bash
+ # Port forward from the webhook eventsource service (production)
+ kubectl port-forward -n devseed svc/eopf-explorer-webhook-eventsource-svc 12000:12000 &
  ```

  This makes the webhook endpoint available at `http://localhost:12000/samples`.
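The port-forwarded endpoint can also be exercised directly from the standard library. This is a sketch only: the payload field name is a hypothetical example, since the webhook's expected schema lives with the tools themselves, not in this excerpt.

```python
# Sketch only: see operator-tools/submit_test_workflow_wh.py for the real submission
# script. The payload field name below is a hypothetical example, not the actual schema.
import json
import urllib.request

WEBHOOK_URL = "http://localhost:12001/samples"  # staging port-forward from above

def build_payload(stac_item_url: str) -> dict:
    # Hypothetical payload: a single field carrying the STAC item to process.
    return {"stac_item_url": stac_item_url}

def submit(stac_item_url: str) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(build_payload(stac_item_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For batch submissions, prefer the provided notebook (`submit_stac_items_notebook.ipynb`) or `submit_test_workflow_wh.py`, which know the actual payload format.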
## Available Tools
3945

4046
### 1. `manage_collections.py` - Collection Management Tool
