A Python package for spatial resampling and binning of geospatial data, specifically designed for oceanographic datasets. This tool enables efficient downsampling of high-resolution gridded data onto coarser grids while preserving spatial accuracy through intelligent neighborhood averaging.
If you use this tool in your research, please cite:
```bibtex
@software{map_binning_2025,
  author = {Chia-Wei Hsu},
  title  = {Map Binning Tool: Spatial Resampling for Oceanographic Data},
  url    = {https://github.com/chiaweh2/map_binning},
  doi    = {10.5281/zenodo.17095448},
  year   = {2025}
}
```

The Map Binning Tool provides a robust solution for spatial data aggregation, particularly useful for:
- Downsampling high-resolution oceanographic data (e.g., sea level anomaly, ocean currents)
- Creating consistent multi-resolution datasets
- Reducing computational load while maintaining spatial representativeness
- Processing time-series of gridded data efficiently
The package uses k-d tree algorithms for fast spatial queries and supports both in-memory processing and persistent caching of spatial indices for repeated operations.
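To illustrate the underlying idea, here is a minimal sketch of radius-based neighborhood averaging with scipy's `cKDTree` and plain numpy. All grids, values, and names below are synthetic and illustrative; they are not the package's internal data structures.

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic fine grid (50x50) and coarse grid (10x10) on a unit square
fine_lat, fine_lon = np.meshgrid(
    np.linspace(0, 1, 50), np.linspace(0, 1, 50), indexing='ij')
coarse_lat, coarse_lon = np.meshgrid(
    np.linspace(0, 1, 10), np.linspace(0, 1, 10), indexing='ij')
values = np.sin(np.pi * fine_lon) * np.cos(np.pi * fine_lat)  # fake field

# Build a k-d tree over the fine-grid point cloud (flattened lon/lat pairs)
tree = cKDTree(np.column_stack([fine_lon.ravel(), fine_lat.ravel()]))

# For each coarse cell center, average every fine point within the radius
centers = np.column_stack([coarse_lon.ravel(), coarse_lat.ravel()])
neighbor_lists = tree.query_ball_point(centers, r=0.08)
flat_values = values.ravel()
binned = np.array([flat_values[idx].mean() if idx else np.nan
                   for idx in neighbor_lists])
binned = binned.reshape(coarse_lon.shape)  # coarse (10, 10) field
```

Caching amounts to saving the `neighbor_lists` step, which is the expensive part; the averaging itself is cheap and can be repeated per variable or per time step.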
- Efficient Spatial Binning: Uses scipy's cKDTree for fast nearest-neighbor searches
- Flexible Grid Support: Works with any xarray-compatible gridded dataset
- Automatic Radius Calculation: Intelligently determines search radius based on target grid spacing
- Persistent Caching: Save and reuse spatial indices using pickle serialization
- Time Series Support: Handles datasets with temporal dimensions
- Memory Efficient: Processes large datasets without excessive memory usage
- Oceanographic Focus: Optimized for CMEMS and similar oceanographic data formats
Install from conda-forge:

```shell
conda install -c conda-forge map-binning
```

Or from PyPI:

```shell
pip install map-binning
```

To include development dependencies:

```shell
pip install map-binning[dev]
```

Or install from source:

```shell
git clone <repository-url>
cd map_binning
pip install -e .
```

```python
import xarray as xr
from map_binning import Binning

# Load your datasets
ds_high = xr.open_dataset('high_resolution_data.nc')
ds_low = xr.open_dataset('low_resolution_grid.nc')

# Initialize the binning tool
binning = Binning(
    ds_high=ds_high,
    ds_low=ds_low,
    var_name='sla',          # variable in the dataset to bin (e.g., sea level anomaly)
    xdim_name='longitude',   # longitude dimension name
    ydim_name='latitude',    # latitude dimension name
    search_radius=0.1        # optional: search radius in degrees
)

# Perform binning
result = binning.mean_binning()
```

```python
# Create the binning index and save it for reuse
result = binning.mean_binning(
    precomputed_binning_index=False,
    pickle_filename="my_binning_index.pkl",
    pickle_location="./cache"
)

# Reuse the saved index for subsequent operations
result = binning.mean_binning(
    precomputed_binning_index=True,
    pickle_filename="my_binning_index.pkl",
    pickle_location="./cache"
)
```

The tool automatically handles time dimensions:

```python
# Works seamlessly with time-varying datasets
# Input: (time, lat, lon) -> Output: (time, lat_low, lon_low)
result = binning.mean_binning()
```

Copy `.env.template` to `.env` and configure:
```shell
# Copernicus Marine Service credentials (if using CMEMS data)
COPERNICUSMARINE_SERVICE_USERNAME=<your_username>
COPERNICUSMARINE_SERVICE_PASSWORD=<your_password>
```

`Binning` constructor parameters:

- `ds_high` (xr.Dataset): High-resolution source dataset
- `ds_low` (xr.Dataset): Low-resolution target grid dataset
- `var_name` (str): Name of the variable to bin
- `xdim_name` (str, optional): Longitude dimension name (default: 'lon')
- `ydim_name` (str, optional): Latitude dimension name (default: 'lat')
- `search_radius` (float, optional): Search radius in degrees (auto-calculated if None)
`create_binning_index()`

Creates a spatial mapping between the high- and low-resolution grids.

`mean_binning(precomputed_binning_index=False, pickle_filename=None, pickle_location=None)`

Performs spatial binning using mean aggregation.

Parameters:

- `precomputed_binning_index` (bool): Use a pre-saved spatial index
- `pickle_filename` (str): Filename for saving/loading the spatial index
- `pickle_location` (str): Directory path for pickle files

Returns: `xr.DataArray` with binned data on the target grid
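The point of `precomputed_binning_index` is that the expensive point-to-cell mapping is built once and then reused, for example across time steps. A minimal numpy sketch of that reuse pattern (illustrative only; the names and data layout here are assumptions, not the package's internals):

```python
import numpy as np

ntime, n_fine, n_coarse = 3, 100, 10

# Hypothetical precomputed index: the coarse cell each fine point falls
# into. In practice this mapping comes from the spatial query and is what
# would be pickled for reuse.
cell_of_point = np.arange(n_fine) % n_coarse

# Fake (time, points) high-resolution data
data = np.arange(ntime * n_fine, dtype=float).reshape(ntime, n_fine)

# Reuse the same index for every time step: per-cell sums / counts = mean
counts = np.bincount(cell_of_point, minlength=n_coarse)
binned = np.empty((ntime, n_coarse))
for t in range(ntime):
    sums = np.bincount(cell_of_point, weights=data[t], minlength=n_coarse)
    binned[t] = sums / counts
```

Only the mapping is cached; the aggregation itself is a cheap vectorized pass over each time slice.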
```
map_binning/
├── map_binning/              # Main package directory
│   ├── __init__.py           # Package initialization
│   ├── binning.py            # Core binning algorithms
│   ├── index_store.py        # Pickle serialization utilities
│   └── main.py               # Command-line interface
├── notebooks/                # Jupyter notebooks for examples
│   └── cmems_nrt_coastal_bin.ipynb
├── tests/                    # Unit tests
│   ├── __init__.py
│   └── ...
├── pyproject.toml            # Project configuration
├── environment.yml           # Conda environment specification
├── .env.template             # Environment variables template
└── README.md                 # This file
```
- Memory Usage: The tool processes data in chunks and uses efficient numpy operations
- Spatial Index Caching: Save computed spatial indices to avoid recalculation
- Grid Resolution: Performance scales with the product of grid sizes
- Search Radius: Smaller radii improve performance but may miss relevant data points
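As a rough illustration of how a default radius could be derived from the target grid spacing, here is one plausible heuristic: half the diagonal of a coarse grid cell. This is an assumption for illustration, not the package's documented formula.

```python
import numpy as np

def default_search_radius(lon_low, lat_low):
    """Half the diagonal of one coarse grid cell, in degrees.

    Illustrative heuristic only (assumes regularly spaced 1-D coordinate
    arrays); the package's actual auto-calculation may differ.
    """
    dlon = float(np.abs(np.diff(lon_low)).mean())
    dlat = float(np.abs(np.diff(lat_low)).mean())
    return 0.5 * np.hypot(dlon, dlat)

# A 0.25-degree target grid yields a radius of ~0.18 degrees
radius = default_search_radius(np.arange(0, 10, 0.25), np.arange(-5, 5, 0.25))
```

A radius on this order guarantees every fine-grid point inside a coarse cell is reachable from that cell's center, at the cost of some overlap between neighboring cells.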
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes and add tests
- Run the test suite (`pytest`)
- Format your code (`black map_binning/`)
- Submit a pull request
```shell
# Clone and set up the development environment
git clone <repository-url>
cd map-binning-project
conda env create -f environment.yml
conda activate map-binning
pip install -e .[dev]

# Run tests
pytest

# Format code
black map_binning/

# Type checking
mypy map_binning/
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Please report bugs and feature requests via GitHub Issues
- Documentation: Additional examples are available in the `notebooks/` directory
- Contact: Chia-Wei Hsu ([email protected])
- Built with support for Copernicus Marine Environment Monitoring Service (CMEMS) data
- Utilizes scipy's efficient spatial algorithms
- Designed for the oceanographic research community