187 changes: 120 additions & 67 deletions scripts/README_storage_tier.md → scripts/README_change_storage_tier.md
@@ -50,21 +50,11 @@ uv run python scripts/change_storage_tier.py \
For easier command execution, define the STAC item ID as a variable:

```bash
ITEM_ID="S2B_MSIL2A_20250730T113319_N0511_R080_T29UQP_20250730T135754"
ITEM_ID="S2A_MSIL2A_20251209T123131_N0511_R009_T26SPG_20251209T163109"
```

## Usage

### Basic Usage

Run the script using the STAC item ID variable defined in the setup:

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER
```

### Dry Run

Test the script without making actual changes. Dry-run mode will:
@@ -75,17 +65,27 @@ Test the script without making actual changes. Dry-run mode will:

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--dry-run
```

### Basic Usage

Run the script using the STAC item ID variable defined in the setup:

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA
```

### With Custom S3 Endpoint

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--s3-endpoint https://s3.de.io.cloud.ovh.net
```

@@ -96,45 +96,80 @@ Only change storage class for specific parts of the Zarr store:
```bash
# Only process reflectance data
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/reflectance/*"

# Process multiple subdirectories
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/*" \
--include-pattern "quality/*"

# Exclude metadata files
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--exclude-pattern "*.zattrs" \
--exclude-pattern "*.zmetadata"

# Only process 60m resolution data
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "*/r60m/*"
```
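The include and exclude options above can be pictured as glob matching on object keys relative to the Zarr root. The following is a minimal sketch under that assumption; `should_process` is a hypothetical helper, and the actual matching rules of `change_storage_tier.py` (for example, how `*` treats `/`) may differ.

```python
from fnmatch import fnmatchcase


def should_process(relative_key, include_patterns=(), exclude_patterns=()):
    """Return True if an object key passes the include/exclude filters.

    Assumed semantics (not confirmed by the script):
    - no include patterns means "include everything";
    - an exclude match always wins over an include match.
    """
    if include_patterns and not any(
        fnmatchcase(relative_key, pattern) for pattern in include_patterns
    ):
        return False
    return not any(fnmatchcase(relative_key, pattern) for pattern in exclude_patterns)


# Made-up keys, relative to the Zarr root, for illustration only.
keys = [
    "measurements/reflectance/r10m/b02/0.0",
    "measurements/reflectance/r60m/b01/0.0",
    "quality/mask/r60m/b01/0.0",
    "measurements/reflectance/r60m/b01/.zattrs",
    ".zmetadata",
]

# Keep only keys under any r60m directory.
r60m_only = [k for k in keys if should_process(k, include_patterns=["*/r60m/*"])]

# Keep everything except Zarr metadata files.
no_metadata = [
    k for k in keys
    if should_process(k, exclude_patterns=["*.zattrs", "*.zmetadata"])
]
```

Note that `fnmatchcase` lets `*` match across `/`, which is why `*/r60m/*` matches keys at any depth in this sketch.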

## Available Storage Classes

- **STANDARD** - Standard storage tier (default, immediate access, higher cost)
- **GLACIER** - Archive storage tier (lower cost, retrieval required before access)
- **STANDARD_IA** - Infrequent-access storage tier (lower cost, intended for rarely accessed data)
- **EXPRESS_ONEZONE** - High-performance storage tier (single availability zone)

### OVH Cloud Storage Classes

**Important**: This script uses OVH Cloud Storage class naming directly to avoid confusion.

**Supported Storage Classes:**
- `STANDARD` - Standard storage (default)
- `STANDARD_IA` - Standard, Infrequent Access (archive storage, low-cost)
- `EXPRESS_ONEZONE` - High Performance (low-latency storage)

**Full AWS to OVH Storage Class Mapping:**

| AWS Storage Class | OVH Storage Class | CLI Value (this script) |
|-------------------|-------------------|------------------------|
| `EXPRESS_ONEZONE` | High Performance | `EXPRESS_ONEZONE` |
| `STANDARD` | Standard | `STANDARD` |
| `INTELLIGENT_TIERING` | Standard | `STANDARD` |
| `STANDARD_IA` | Standard, Infrequent Access | `STANDARD_IA` |
| `ONEZONE_IA` | Standard, Infrequent Access | `STANDARD_IA` |
| `GLACIER_IR` | Standard, Infrequent Access | `STANDARD_IA` |
| `GLACIER` | Standard, Infrequent Access | `STANDARD_IA` |
| `DEEP_ARCHIVE` | Cold Archive | N/A (not supported) |

**Note**: Multiple AWS storage classes map to the same OVH tier. This script uses OVH naming (`STANDARD_IA`) instead of AWS naming (`GLACIER`) to avoid confusion.

**Reference**: [OVH Cloud Storage S3 Location Documentation](https://help.ovhcloud.com/csm/en-public-cloud-storage-s3-location?id=kb_article_view&sysparm_article=KB0047384)
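For readers used to AWS names, the mapping table can be expressed as a small lookup. This is an illustrative sketch only: `AWS_TO_CLI` and `to_cli_storage_class` are hypothetical names, not part of the script, and the entries are copied from the table in this section.

```python
# AWS storage class -> CLI value accepted by this script (from the mapping table).
# DEEP_ARCHIVE is deliberately absent because the table lists it as not supported.
AWS_TO_CLI = {
    "EXPRESS_ONEZONE": "EXPRESS_ONEZONE",
    "STANDARD": "STANDARD",
    "INTELLIGENT_TIERING": "STANDARD",
    "STANDARD_IA": "STANDARD_IA",
    "ONEZONE_IA": "STANDARD_IA",
    "GLACIER_IR": "STANDARD_IA",
    "GLACIER": "STANDARD_IA",
}


def to_cli_storage_class(aws_class):
    """Translate an AWS storage class name into the value to pass to --storage-class."""
    key = aws_class.strip().upper()
    try:
        return AWS_TO_CLI[key]
    except KeyError:
        raise ValueError(
            f"No supported CLI value for storage class {aws_class!r}"
        ) from None


cli_value = to_cli_storage_class("GLACIER")
```

A command that previously used `--storage-class GLACIER` would, under this mapping, use `--storage-class STANDARD_IA`.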

## How It Works

1. Fetches the STAC item from the provided URL
2. Extracts S3 URLs from the `alternate.s3.href` fields in each asset
3. Identifies the root Zarr store location
4. Lists all objects in the Zarr store recursively
4. Lists all objects in the Zarr store recursively (the listing response includes each object's storage class)
5. Optionally filters objects based on include/exclude patterns
6. Changes the storage class for each object using the S3 API
6. **Optimization**: Skips objects that are already in the target storage class (no additional API calls)
7. Changes the storage class only for objects that need it using the S3 API
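The URL extraction and root detection steps in the list above can be sketched as two pure functions over an already-fetched STAC item. This is a sketch under stated assumptions, not the script's implementation: `extract_s3_urls` and `find_zarr_root` are hypothetical names, the example item is made up, and the rule "the root ends at the first path segment ending in `.zarr`" is an assumption. Fetching the item over HTTP is left out so the example runs offline.

```python
def extract_s3_urls(item):
    """Collect `alternate.s3.href` values from every asset of a STAC item dict."""
    urls = []
    for asset in item.get("assets", {}).values():
        href = asset.get("alternate", {}).get("s3", {}).get("href")
        if href and href.startswith("s3://"):
            urls.append(href)
    return urls


def find_zarr_root(urls, suffix=".zarr"):
    """Return the single Zarr store root that all given URLs belong to.

    Assumption: the root is the URL up to and including the first path
    segment that ends with `.zarr`.
    """
    roots = set()
    for url in urls:
        parts = url.split("/")
        for i, part in enumerate(parts):
            if part.endswith(suffix):
                roots.add("/".join(parts[: i + 1]))
                break
    if len(roots) != 1:
        raise ValueError(f"expected exactly one Zarr root, found {sorted(roots)}")
    return roots.pop()


# Hypothetical item fragment; bucket name and paths are invented for illustration.
item = {
    "assets": {
        "reflectance": {
            "href": "https://example.invalid/item.zarr/measurements/reflectance",
            "alternate": {
                "s3": {
                    "href": "s3://example-bucket/path/item.zarr/measurements/reflectance"
                }
            },
        },
        "quality": {
            "href": "https://example.invalid/item.zarr/quality",
            "alternate": {
                "s3": {"href": "s3://example-bucket/path/item.zarr/quality"}
            },
        },
        # Assets without an S3 alternate are simply ignored.
        "thumbnail": {"href": "https://example.invalid/thumb.png"},
    }
}

urls = extract_s3_urls(item)
root = find_zarr_root(urls)
```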

### Performance Optimizations

The script has been optimized to minimize S3 API calls:

- **Storage class from list**: Retrieves storage class during initial listing (no extra `head_object` calls)
- **Smart filtering**: Only makes `copy_object` API calls for objects that actually need to change storage class
- **Progress tracking**: Shows how many objects need changes vs. already correct
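The skip-if-already-correct idea can be illustrated with a short sketch. It assumes a boto3-style S3 client is passed in and that listing entries look like the `Contents` items of a ListObjectsV2 response (`Key` plus `StorageClass`). `plan_changes` and `apply_changes` are hypothetical names, and treating a missing `StorageClass` as `STANDARD` is an assumption rather than confirmed script behavior.

```python
def plan_changes(listed_objects, target_class):
    """Split listed objects into keys to change and keys already correct.

    Uses only the listing data, so no per-object HEAD request is needed.
    """
    to_change, already_correct = [], []
    for obj in listed_objects:
        # Assumption: a missing StorageClass is treated as STANDARD.
        current = obj.get("StorageClass", "STANDARD")
        if current == target_class:
            already_correct.append(obj["Key"])
        else:
            to_change.append(obj["Key"])
    return to_change, already_correct


def apply_changes(s3_client, bucket, keys, target_class):
    """Copy each object onto itself with the new storage class."""
    for key in keys:
        s3_client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            StorageClass=target_class,
            MetadataDirective="COPY",  # keep the existing object metadata
        )


# Made-up listing for illustration.
listing = [
    {"Key": "store.zarr/a/0", "StorageClass": "STANDARD"},
    {"Key": "store.zarr/a/1", "StorageClass": "STANDARD_IA"},
]
to_change, already_correct = plan_changes(listing, "STANDARD_IA")
```

Only the keys in `to_change` would be passed to `apply_changes`, so an item that is already fully in the target class triggers no `copy_object` calls at all.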

## Path Filtering

@@ -187,8 +222,8 @@ uv run python scripts/register_v1.py \
# 3. Change storage tier (optional)
ITEM_ID="your-item-id"
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA
```

## Error Handling
@@ -213,8 +248,8 @@ The script provides detailed logging at different levels:
Set the `LOG_LEVEL` environment variable to control verbosity:
```bash
LOG_LEVEL=DEBUG uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA
```
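One common way for a Python script to honor a `LOG_LEVEL` variable is sketched below. This is an assumption about how such a setting could be wired up, not a description of the script's code; `resolve_log_level` is a hypothetical helper, and the fallback to `INFO` for unknown values is a design choice made for this example.

```python
import logging
import os


def resolve_log_level(environ=os.environ, default="INFO"):
    """Map the LOG_LEVEL environment variable to a numeric logging level."""
    name = environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    # Fall back to INFO when the name is not a standard logging level.
    return level if isinstance(level, int) else logging.INFO


logging.basicConfig(level=resolve_log_level())
logging.getLogger(__name__).debug("only visible when LOG_LEVEL=DEBUG")
```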

## Examples
@@ -233,34 +268,52 @@ Use dry-run to see the current storage classes without making changes:

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--dry-run
```

Output example:
```
Summary for S2A_MSIL2A_20250831T103701_N0511_R008_T31TFL_20250831T145420:
Total objects: 1500
Skipped (filtered): 0
Processed: 1500
Succeeded: 1500
Failed: 0

Current storage class distribution:
GLACIER: 300 objects (20.0%)
STANDARD: 1200 objects (80.0%)
Processing: S2A_MSIL2A_20251209T123131_N0511_R009_T26SPG_20251209T163109
Target storage class: STANDARD
Include patterns: */r60m/*
Found 4 S3 URLs
Zarr root: s3://esa-zarr-sentinel-explorer-fra/.../S2A_MSIL2A_20251209T123131_N0511_R009_T26SPG_20251209T163109.zarr
Listing objects in s3://esa-zarr-sentinel-explorer-fra/.../S2A_MSIL2A_20251209T123131_N0511_R009_T26SPG_20251209T163109.zarr/
Found 1058 total objects
After filtering: 260 objects to process, 798 excluded

Initial storage class distribution (before changes):
EXPRESS_ONEZONE: 260 objects (100.0%)
(DRY RUN)

Processing 260 objects...
0 already have target storage class STANDARD
260 need to be changed
Progress: 100/260 objects (38%)
Progress: 200/260 objects (76%)
Progress: 260/260 objects (100%)
============================================================
Summary for S2A_MSIL2A_20251209T123131_N0511_R009_T26SPG_20251209T163109:
Total objects: 1058
Skipped (filtered): 798
Already correct storage class: 0
Changed: 260
Succeeded: 260
Failed: 0
```

**Note**: The storage class distribution is shown before processing starts. When not in dry-run mode, an expected distribution after changes is also displayed. To verify changes were applied, run the same command again - you should see all objects already at the target storage class.
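The storage class distribution lines in the output can be reproduced from a listing with a simple count. The sketch below is illustrative: `storage_class_distribution` is a hypothetical helper, the input mimics ListObjectsV2 `Contents` entries, and the numbers are chosen to give a 20/80 split.

```python
from collections import Counter


def storage_class_distribution(listed_objects):
    """Return {storage_class: (count, percent)} for the listed objects."""
    counts = Counter(obj.get("StorageClass", "STANDARD") for obj in listed_objects)
    total = sum(counts.values())
    return {
        cls: (n, round(100.0 * n / total, 1))
        for cls, n in sorted(counts.items())
    }


# Made-up listing: 300 GLACIER objects and 1200 STANDARD objects.
listing = (
    [{"Key": f"g/{i}", "StorageClass": "GLACIER"} for i in range(300)]
    + [{"Key": f"s/{i}", "StorageClass": "STANDARD"} for i in range(1200)]
)

distribution = storage_class_distribution(listing)
lines = [
    f"  {cls}: {n} objects ({pct:.1f}%)"
    for cls, (n, pct) in distribution.items()
]
```

Running the same computation before and after a real change is one way to confirm that every object ended up in the target class.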

### Preview changes for specific data subset

Test what would happen when archiving only 60m resolution data:

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "*/r60m/*" \
--dry-run
```
@@ -269,8 +322,8 @@ uv run python scripts/change_storage_tier.py \

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/reflectance/*" \
--dry-run
```
@@ -279,27 +332,27 @@ uv run python scripts/change_storage_tier.py \

```bash
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--exclude-pattern "*.zattrs" \
--exclude-pattern "*.zmetadata" \
--dry-run
```

### Archive only reflectance data to GLACIER
### Archive only reflectance data to STANDARD_IA

```bash
# First, preview the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/reflectance/*" \
--dry-run

# Then apply the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/reflectance/*"
```

@@ -308,47 +361,47 @@ uv run python scripts/change_storage_tier.py \
```bash
# Preview first
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/*" \
--exclude-pattern "*/r10m/*" \
--dry-run

# Apply changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--include-pattern "measurements/*" \
--exclude-pattern "*/r10m/*"
```

### Archive old data to GLACIER
### Archive old data to STANDARD_IA

```bash
# Preview the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA \
--dry-run

# Apply the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--storage-class GLACIER
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD_IA
```

### Restore data from GLACIER to STANDARD
### Restore data from STANDARD_IA to STANDARD

```bash
# Preview the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD \
--dry-run

# Apply the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class STANDARD
```

@@ -357,12 +410,12 @@ uv run python scripts/change_storage_tier.py \
```bash
# Preview the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class EXPRESS_ONEZONE \
--dry-run

# Apply the changes
uv run python scripts/change_storage_tier.py \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a/items/$ITEM_ID \
--stac-item-url https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/$ITEM_ID \
--storage-class EXPRESS_ONEZONE
```