diff --git a/figures/noaa_ER_diagram.png b/figures/noaa_ER_diagram.png new file mode 100644 index 0000000..a27f39e Binary files /dev/null and b/figures/noaa_ER_diagram.png differ diff --git a/pyleotups_logo.png b/figures/pyleotups_logo.png similarity index 100% rename from pyleotups_logo.png rename to figures/pyleotups_logo.png diff --git a/myst.yml b/myst.yml index 5fe4045..04df250 100644 --- a/myst.yml +++ b/myst.yml @@ -10,6 +10,13 @@ project: # To autogenerate a Table of Contents, run "jupyter book init --write-toc" toc: # Auto-generated by `myst init --write-toc` + - file: README.md + - title: Getting Started + children: + - file: notebooks/01_a_DataProvider.md + - file: notebooks/01_b_PyeloTUPSDesign.md + - file: notebooks/01_c_PangaeaCredentialSetup.md + - file: README.md - title: Working with PyleoTUPS children: diff --git a/notebooks/01_a_DataProvider.md b/notebooks/01_a_DataProvider.md new file mode 100644 index 0000000..33bcf87 --- /dev/null +++ b/notebooks/01_a_DataProvider.md @@ -0,0 +1,271 @@ +# Understanding Data Providers: NOAA & PANGAEA + +## Overview + +PyleoTUPS integrates with two major paleoclimate data repositories to provide researchers with unified access to paleoclimate datasets. Understanding how these repositories work is essential for effectively using PyleoTUPS. + +### Data Provider: + +In PyleoTUPS, a "Data Provider" is a paleoclimate repository that: +- Hosts paleoclimate datasets (tree rings, ice cores, marine records, etc.) +- Provides search/query capabilities via an API or web interface +- Stores metadata (location, authors, time periods, variables measured) + +PyleoTUPS works with: +a. NOAA +b. Pangaea + +PyleoTUPS acts as a bridge between you and these repositories, handling API calls, data parsing, and format conversion so you don't have to. + +--- + +## NOAA NCEI Paleoclimate Database + +### What is NOAA? 
+ +The **National Oceanic and Atmospheric Administration (NOAA)** maintains the **NCEI Paleoclimate Global Monitoring Program**, one of the world's largest collections of paleoclimate data. + +### Understanding the NOAA Data Structure + +``` +Study (or "Individual Dataset") +├── Sites (with geographic coordinates) +│ └── Paleo Data +│ └── Data Files (text, CSV, Excel) +│ └── Data Tables (spreadsheet-like tables) +└── Metadata + ├── Authors/Investigators + ├── Funding Information + ├── Publication Citation + └── Links to raw files +``` + +**Key Concepts:** +- **Study**: A research publication or dataset. Each study has a unique NOAA Study ID (e.g., `13156`) +- **Site**: A specific geographic location where measurements were taken +- **Data Table**: The actual data, often embedded in text files with varying file extensions and formats + +The diagram below shows how these entities relate: + +![NOAA entity-relationship diagram](../figures/noaa_ER_diagram.png) + +**Entity Relations:** +In NOAA, data is organized in a hierarchical, one-to-many structure: + +- A Study (a publication or dataset) can contain multiple Sites +- Each Site can contain multiple Paleo Data records +- Each Paleo Data entry can include multiple Data Files (e.g., CSV, TXT) +- Each Data File may correspond to one or more Data Tables (NOAA template files generally contain a single table, whereas older legacy files may contain several) + +### NOAA API Endpoints + +PyleoTUPS uses the **NOAA NCEI Paleo Study Search API**: + +``` +Base URL: https://www.ncei.noaa.gov/access/paleo-search/api/study/search.json +``` + +The API accepts a rich set of query parameters [\[View complete list here\]](https://www.ncei.noaa.gov/access/paleo-search/api): + +| Category | Parameter | Example | +|----------|-----------|---------| +| **Identifiers** | `noaa_id`, `xml_id` | `noaa_id=13156` | +| **Text** | `search_text` | `search_text="younger dryas"` | +| **People** | `investigators` | `investigators="Smith, JS"` | +| **Location** | `locations`, `min_lat`, 
`max_lat`, `min_lon`, `max_lon` | `min_lat=30, max_lat=40` | +| **Data Type** | `data_type_id` | `4` (Corals), `18` (Tree Ring) | +| **Variables** | `variable_name` (cvWhats), `cv_materials`, `cv_seasonalities` | `variable_name="Radial growth"` | +| **Time** | `earliest_year`, `latest_year`, `time_format`, `time_method` | `earliest_year=-8000` | +| **Elevation** | `min_elevation`, `max_elevation` | `min_elevation=0, max_elevation=3000` | +| **Pagination** | `limit`, `skip` | `limit=50, skip=100` | + +### How PyleoTUPS Uses NOAA + +When you call `NOAADataset.search_studies(**kwargs)`: + +1. **Query Building** → Translates Pythonic parameter names to NOAA API names +2. **API Request** → Sends an HTTP GET request to the NOAA study search endpoint +3. **Response Parsing** → Receives JSON containing study metadata and file URLs +4. **Data Registration** → Stores studies internally and builds indexes for efficient file lookups +5. **Returns** → A DataFrame summarizing the studies found + +Each study returned includes file URLs pointing to text/CSV/Excel files hosted on NOAA servers. + +### Example NOAA Workflow + +```python +import pyleotups as pt + +ds = pt.NOAADataset() + +# Search by ID (direct lookup) +df = ds.search_studies(noaa_id=13156) + +# Search by location (bounding box in degrees) +df = ds.search_studies(min_lat=30, max_lat=40, min_lon=-100, max_lon=-80, limit=20) + +# Search by data type (e.g., tree rings) +df = ds.search_studies(data_type_id=18, limit=50) + +# Get a data table from a study (the ID below is a placeholder) +df_data = ds.get_data("some_datatable_id") +``` + +--- + +## PANGAEA Database + +### What is PANGAEA? + +**PANGAEA** is a scientific data repository operated jointly by the **Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI)** and the **Center for Marine Environmental Sciences (MARUM)** at the University of Bremen. It hosts interdisciplinary datasets, with a growing collection of paleoclimate studies. 
+ +### PANGAEA Data Organization + +PANGAEA organizes datasets differently from NOAA: + +``` +Dataset (standalone publication) +├── Metadata +│ ├── Title and description +│ ├── Authors/Investigators +│ ├── Publication DOI +│ ├── Funding Information +│ └── Topics +├── Data Table +│ ├── Columns (parameters with units and descriptions) +│ ├── Geographic locations (one or more, often one per row) +│ └── Rows (measurements or observations) +└── Related Datasets + └── Child datasets or related publications +``` + +**Key Concepts:** +- **Dataset**: A standalone data publication with a unique DOI and PANGAEA ID (e.g., `830587`) +- **Collection**: A group of Datasets that has its own unique ID. +- **Parameter** (or Column): A variable (e.g., "δ18O", "Age"). It closely aligns with `cvWhats` in NOAA and, more generally, with `variableName` in TUPS. +- **Event**: For paleoclimate studies, events most closely match the concept of Sites in NOAA. An event contains geographic and temporal metadata. +- **Citation**: Every dataset has a formal data reference that aligns with publication standards. This is different from the citation of the publication that references the dataset. + +```NOTE: +Unlike a NOAA Study, a PANGAEA Dataset contains exactly one Data Table, i.e., a single CSV/TSV-style file. A single PANGAEA Dataset can nevertheless contain multiple events. 
+``` + +### PANGAEA Query Interface + +PANGAEA uses a **filter-based search model** with advanced query syntax: + +``` +Base URL: https://www.pangaea.de/advanced/search.php +``` + +Query parameters and operators [\[View complete list here\]](https://wiki.pangaea.de/wiki/PANGAEA_search#:~:text=Searches%20in%20specific%20fields): + +| Feature | Syntax | Example | +|---------|--------|---------| +| **Full-text** | `q=` | `q=stable isotopes` | +| **Author** | `author:` | `author:"Khider, D"` | +| **Parameter** | `parameter:` | `parameter:"δ18O"` | +| **Topic** | `topic=` | `topic="Paleontology"` | +| **Geographic** | Bounding box | `minlon=-100&maxlon=-80&minlat=30&maxlat=40` | +| **Operators** | AND, OR, NOT | `(isotopes OR δ18O) AND paleoclimate` | +| **Field Search** | property:value | Multiple field combinations | + +**Logical Operators:** +- `AND` (default): Both conditions must be met +- `OR`: Either condition can be met +- `NOT`: Exclude results matching the term +- Parentheses `()`: Group terms for precedence + +PyleoTUPS constructs this PANGAEA query for you from the Python parameters you pass. + +### How PyleoTUPS Uses PANGAEA + +When you call `PangaeaDataset.search_studies(**kwargs)`: + +1. **Query Building** → Translates Python parameters into PANGAEA query syntax +2. **Query Execution** → Sends requests to the PANGAEA search API via the `pangaeapy` library +3. **Result Processing** → Retrieves dataset metadata and constructs summary DataFrames +4. **ID Registration** → Stores dataset IDs and metadata for later data retrieval +5. **Returns** → A DataFrame summarizing the datasets found + +PyleoTUPS uses the **`pangaeapy`** library (an existing wrapper for the PANGAEA API) under the hood to handle low-level API interactions. 
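The query-building step can be illustrated with a minimal sketch. The helper name `build_pangaea_params` and its output layout are hypothetical and are not taken from the PyleoTUPS or `pangaeapy` source; the sketch only combines the field prefixes (`author:`, `parameter:`), the `AND` operator, and the bounding-box keys listed in the table above, using the standard library alone.

```python
from urllib.parse import urlencode


def build_pangaea_params(search_text=None, investigators=None, variable_name=None,
                         min_lat=None, max_lat=None, min_lon=None, max_lon=None):
    """Hypothetical helper: map Python keyword arguments to PANGAEA search syntax."""
    terms = []
    if search_text:
        # Parentheses keep any OR/NOT inside the free text grouped together
        terms.append(f"({search_text})")
    if investigators:
        terms.append(f'author:"{investigators}"')
    if variable_name:
        terms.append(f'parameter:"{variable_name}"')

    # Join the individual filters with an explicit AND
    params = {"q": " AND ".join(terms)}

    # Add only the bounding-box edges that were actually supplied
    bbox = {"minlat": min_lat, "maxlat": max_lat, "minlon": min_lon, "maxlon": max_lon}
    params.update({key: value for key, value in bbox.items() if value is not None})
    return params


params = build_pangaea_params(
    search_text="isotopes OR δ18O",
    investigators="Khider, D",
    min_lat=30, max_lat=40, min_lon=-100, max_lon=-80,
)
print(params["q"])        # the combined query string
print(urlencode(params))  # the same parameters, URL-encoded
```

In practice you never build this string yourself; the sketch is only meant to show why one set of Python keyword arguments can be mapped onto a repository-specific query.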
+ +### Example PANGAEA Workflow + +```python +import pyleotups as pt + +ds = pt.PangaeaDataset() + +# Search by ID (direct lookup) +df = ds.search_studies(study_ids=830587) + +# Search by text +df = ds.search_studies(search_text="stable isotopes", limit=20) + +# Search by parameter +df = ds.search_studies(variable_name="δ18O", limit=20) + +# Search with geographic bounds +df = ds.search_studies(min_lat=-10, max_lat=10, min_lon=120, max_lon=160) + +# Get data from a dataset +df_data = ds.get_data(830587) +``` + +--- + +## Comparison: NOAA vs. PANGAEA + +### Data Model + +| Aspect | NOAA | PANGAEA | +|--------|------|---------| +| **Structure** | Hierarchical (Study → Site → PaleoData → DataTable) | Flat (Dataset with multiple parameters) | +| **Geography** | Multiple sites per study | One or more events/locations per dataset | +| **Primary Focus** | Paleoclimate proxy records | Interdisciplinary geoscience data | +| **File Formats** | Legacy text formats, NOAA Templated Text formats, CSV, Excel | Standardized table format (tab-delimited), net-cdf | +| **Metadata** | Rich hierarchical structure | Standardized metadata fields | + +### Query Capabilities + +| Feature | NOAA | PANGAEA | +|---------|------|---------| +| **ID-Based Search** | Yes (NOAA ID, XML ID) | Yes (DOI, numeric ID) | +| **Full-Text** | Yes (Oracle syntax) | Yes (faceted search) | +| **Variable Filter** | Via cvWhats (controlled vocab) | Via parameter name (text-based) | +| **Geographic** | Bounding box | Bounding box | +| **Time Range** | Explicit earliest/latest year | Implicit in data timestamps | +| **Data Type** | Filtered via dataTypeID | Filtered via topic | +| **Authors** | Supported | Supported | +| **Multi-value Logic** | AND/OR operators | AND/OR operators | + +### Data Access + +| Aspect | NOAA | PANGAEA | +|--------|------|---------| +| **Files** | Links to remote files (text, CSV, Excel) | Tables accessed via API or download | +| **Parsing** | Complex legacy formats → requires dedicated 
parser | Standardized format → no parsing needed | +| **File Handling** | PyleoTUPS downloads and parses | pangaeapy handles retrieval | +| **Metadata in Data** | Embedded within files | Separate dataset-level metadata | + +--- + +## For PyleoTUPS Users +- **Unified API**: Search both repositories with consistent Python syntax +- **Flexible workflow**: Choose the repository that best fits your research needs +- **Data integration**: Make use of datasets from multiple sources with metadata intact +- **Proxy comparison**: Cross-validate findings using multiple independent datasets + +--- + +## Summary + +PyleoTUPS bridges NOAA and PANGAEA by: + +1. **Normalizing search parameters** into repository-specific query formats +2. **Abstracting repository differences** so users think in terms of paleoclimate concepts, not backend APIs +3. **Parsing diverse file formats** (especially NOAA's legacy formats) +4. **Providing unified access** to both searches and data retrieval +5. **Preserving metadata** throughout the data pipeline + +In the next section, we'll explore how PyleoTUPS' architecture enables this unified interface. \ No newline at end of file diff --git a/notebooks/01_b_PyleoTUPSDesign.md b/notebooks/01_b_PyleoTUPSDesign.md new file mode 100644 index 0000000..4d62860 --- /dev/null +++ b/notebooks/01_b_PyleoTUPSDesign.md @@ -0,0 +1,414 @@ +# PyleoTUPS Functionalities: A Unified Interface to Paleoclimate Data + +## Overview + +PyleoTUPS presents a **unified Python interface** to two fundamentally different paleoclimate data repositories: **NOAA NCEI Paleoclimate** and **PANGAEA**. Although these repositories organize data differently, use different search interfaces, and provide data in different formats, PyleoTUPS lets you work with both using the **same Python methods and parameters**. + +This document explains what PyleoTUPS does for you and how to use it, regardless of which repository you're accessing. 
+ +For detailed information about NOAA and PANGAEA themselves, see `DataProviders.md`. + +--- + +## Why a Unified Interface? + +### The Problem +- **NOAA** has a hierarchical structure with rich metadata but complex legacy file formats +- **PANGAEA** has clean standardized tables but a simpler, flatter organization +- **Researchers** want to access both but don't want to learn two completely different APIs + +### The Solution +PyleoTUPS wraps both backends with a single Python API: +- Same method names work for both (`search_studies()`, `get_data()`, etc.) +- Same parameter names for common searches (`search_text`, `min_lat`, `max_lat`, etc.) +- Consistent output format (pandas DataFrames with predictable columns) +- Automatic handling of provider differences (you don't need to worry about them) + +### What This Means for You +Write your Python code in one style. It works with NOAA, PANGAEA, or both. + +--- + +## The Unified Interface: Visual Overview + +``` +┌────────────────────────────────────────────────────────────────┐ +│ YOUR PYTHON CODE (Same for Both) │ +│ │ +│ dataset = pt.NOAADataset() │ +│ # OR │ +│ dataset = pt.PangaeaDataset() │ +│ │ +│ # SAME METHODS & PARAMETERS FOR BOTH │ +│ results = dataset.search_studies( │ +│ search_text="tree rings", ← Works for both │ +│ min_lat=30, max_lat=40, ← Works for both │ +│ investigators="Smith, J", ← Works for both │ +│ limit=50 ← Works for both │ +│ ) │ +└────────────────────────┬───────────────────────────────────────┘ + │ + ┌───────────────┴────────────────┐ + ▼ ▼ + ╔─────────────╗ ╔──────────────╗ + ║ NOAA API ║ ║ PANGAEA API ║ + ║ (Converted ║ ║ (Converted ║ + ║ internally) ║ ║ internally) ║ + ╚─────────────╝ ╚──────────────╝ + │ │ + └───────────────┬────────────────┘ + ▼ + ╔──────────────────────────────────╗ + │ SAME OUTPUT (pandas DataFrame) │ + │ ├─ Identical column names │ + │ ├─ Predictable structure │ + │ └─ Ready for analysis │ + ╚──────────────────────────────────╝ +``` + +--- + +## The Six Core 
Methods (Work Identically for Both) + +All of these methods work the same way whether you're using `NOAADataset` or `PangaeaDataset`: + +### 1. search_studies() + +**What it does**: Search and filter studies based on your criteria + +**Common parameters** (work for both providers): +- `search_text` – Free-text search (e.g., "δ18O", "paleoclimate reconstructions") +- `min_lat`, `max_lat`, `min_lon`, `max_lon` – Geographic bounding box (degrees) +- `investigators` – Author/researcher names +- `variable_name` – Measured variable (e.g., "ring width", "oxygen isotope") +- `limit` – Maximum number of results (default: 100) +- `skip` – Offset for pagination + +**Returns**: DataFrame with study overview +- Columns: `study_id`, `title`, `authors`, `year`, `location`, `data_type`, `summary` + +**Example**: +```python +ds = pt.NOAADataset() # Same code works for pt.PangaeaDataset() +results = ds.search_studies( + search_text="tree rings", + min_lat=30, + max_lat=40, + limit=20 +) +print(results.head()) # Identical format from either provider +``` + +--- + +### 2. get_summary() + +**What it does**: Get a comprehensive summary of all studies you've found + +**Parameters**: None + +**Returns**: DataFrame with all registered studies +- Columns: `study_id`, `title`, `authors`, `year`, `location`, `data_type`, `geographic_bounds`, `summary_info` + +**Use case**: Review all studies from your search before downloading data + +**Example**: +```python +ds.search_studies(search_text="paleoclimate") +summary = ds.get_summary() +# See all studies found +``` + +--- + +### 3. 
get_publications() + +**What it does**: Get publication and citation information for all studies + +**Parameters**: None + +**Returns**: DataFrame with bibliographic information +- Columns: `study_id`, `citation`, `doi`, `journal`, `year`, `authors` + +**Use case**: Properly cite the datasets you download in your research papers + +**Example**: +```python +publications = ds.get_publications() +# Get ready-to-use citations for your bibliography +print(publications[['citation', 'doi']]) +``` + +--- + +### 4. get_funding() + +**What it does**: Get funding source information for all studies + +**Parameters**: None + +**Returns**: DataFrame with funding details +- Columns: `study_id`, `funding_agency`, `grant_number`, `award_title` + +**Use case**: Understand who funded the research and acknowledge funding sources + +**Example**: +```python +funding = ds.get_funding() +# Shows which agencies (NSF, NOAA, etc.) supported each study +``` + +--- + +### 5. get_geo() + +**What it does**: Get geographic and site-level information for all studies + +**Parameters**: None + +**Returns**: DataFrame with location details +- Columns: `site_id`, `latitude`, `longitude`, `elevation`, `site_name`, `location_details` + +**Use case**: Understand where measurements were taken, plan field visits, visualize study locations + +**Example**: +```python +locations = ds.get_geo() +# All sites with coordinates, ready for mapping +print(locations[['site_name', 'latitude', 'longitude']]) +``` + +--- + +### 6. get_data(identifier) + +**What it does**: +- PyleoTUPS automatically loads NOAA datasets stored in text files into pandas DataFrames. Its rule-based parsers extract the tables from most files in the most common formats. 
+ + +**Parameters**: +- `identifier` – Study ID in NOAA or DOI in Pangaea + +**Returns**: DataFrame with measurements + metadata +- **Columns**: Time axis (age, depth, or year), measured values (δ18O, ring width, etc.), uncertainty +- **Attributes**: Units, variable descriptions, measurement methods, data source, study metadata + +**Use case**: Get the actual numbers to analyze in your research + +**Example**: +```python +# Get data from a specific study +data = ds.get_data("") + +# Access measurements +print(data) +# Output: age, value, uncertainty columns + +# Access metadata attached to the data +print(data.attrs) +# Output: variables, StudyId, etc. +``` + +--- + +## Provider-Specific Features (Optional Extensions) + +For most uses, the common parameters above are sufficient. However, each provider has additional filters available if you need them. + +### NOAA-Specific Search Parameters + +Available when using `NOAADataset.search_studies()`: + +| Parameter | Purpose | Example | +|-----------|---------|---------| +| `noaa_id` | Direct lookup by NOAA Study ID | `noaa_id=13156` | +| `data_type_id` | Filter by archive type | `data_type_id=18` (tree rings), `4` (corals), `1` (boreholes) | +| `locations` | Hierarchical geographic filter | `locations="Africa>Kenya"` | +| `species` | Tree species filter (4-letter codes) | `species="PIAM"` (Pinus aristata) | +| `cv_materials` | Material type filter | Follow NOAA PaST thesaurus | +| `cv_seasonalities` | Seasonal information | Follow NOAA PaST thesaurus | +| `earliest_year`, `latest_year` | Time window | `earliest_year=-8000, latest_year=-1000` (BP/CE) | +| `min_elevation`, `max_elevation` | Elevation range in meters | `min_elevation=0, max_elevation=3000` | +| `reconstruction` | Climate reconstructions only | `reconstruction=True` | + +**Refer [this link](https://www.ncei.noaa.gov/access/paleo-search/api) for detailed options** + +**When to use**: When `search_text` and location filters aren't specific enough + 
+**Example**: +```python +noaa_ds = pt.NOAADataset() +# Advanced search: tree ring data from high elevations in the Alps +results = noaa_ds.search_studies( + data_type_id=18, # Tree rings + locations="Europe>Alps", # Hierarchical location + min_elevation=1500, # High elevation + limit=50 +) +``` + +--- + +### PANGAEA-Specific Search Parameters + +Available when using `PangaeaDataset.search_studies()`: + +| Parameter | Purpose | Example | +|-----------|---------|---------| +| `topic` | Filter by research topic | `topic="Paleontology"`, `"Cryosphere"`, `"Oceans"`, `"Atmosphere"` | + +**When to use**: To filter by broad research categories + +**Example**: +```python +pangaea_ds = pt.PangaeaDataset() +# Find all paleontology-related datasets +results = pangaea_ds.search_studies( + topic="Paleontology", + search_text="holocene", + limit=50 +) +``` + +--- + +## Side-by-Side Comparison + +``` +SHARED (Both Providers) PROVIDER-SPECIFIC +═══════════════════════════════════ ════════════════════════════════ + +Methods: NOAA-Only Filters: +├─ search_studies() ├─ data_type_id (archive type) +├─ get_summary() ├─ species (tree codes) +├─ get_publications() ├─ elevation range +├─ get_funding() ├─ time window (CE/BP) +├─ get_geo() ├─ hierarchical locations +└─ get_data() └─ controlled vocabulary + +Common Parameters: PANGAEA-Only: +├─ search_text └─ topic (Paleontology, etc.) 
+├─ min_lat, max_lat +├─ min_lon, max_lon +├─ investigators +├─ variable_name +├─ limit +└─ skip + +Same Output Format: Internal Differences (You Don't See): +├─ columns identical ├─ NOAA: Various file formats +├─ column names consistent ├─ PANGAEA: Standardized format +└─ data types predictable └─ Speed (NOAA slower for first download) +``` + +--- + +## What Information You Get + +### From search_studies() or get_summary() + +``` +Columns you receive: +├── study_id : Unique identifier +├── title : Study title +├── authors : Research team members +├── year : Publication year +├── data_type : Type of paleoclimate data +├── geographic_bounds : Region covered (lat/lon) +└── summary_info : Brief description +``` + +### From get_publications() + +``` +Columns you receive: +├── study_id : Which study? +├── citation : Full bibliographic reference +├── doi : Digital Object Identifier +├── journal : Journal name +└── year : Published year +``` + +### From get_funding() + +``` +Columns you receive: +├── study_id : Which study? +├── funding_agency : Organization (NSF, NOAA, etc.) +├── grant_number : Award ID +└── award_title : Project name +``` + +### From get_geo() + +``` +Columns you receive: +├── site_id : Unique site identifier +├── latitude : Degrees North +├── longitude : Degrees East +├── elevation : Meters above sea level +├── site_name : Location name +└── location_details : Additional geographic info +``` + +### From get_data(identifier) + +``` +A Pandas Dataframe with optional columns like: +├── age / depth / year : Time axis (format varies by data type) +├── value : The actual measurement (δ18O, ring width, etc.) 
+└── uncertainty : Measurement error / precision + +``` + +--- + +## Practical Differences You Might Notice + +| Aspect | NOAA | PANGAEA | What to Expect | +|--------|------|---------|-----------------| +| **Search options** | Many detailed filters | Simpler filters | NOAA for precise queries, PANGAEA for broad searches | +| **Data extraction** | Slower (parses legacy formats) | Faster (clean format) | The first NOAA download takes longer | +| **Metadata detail** | Site-level (sites within studies) | Dataset-level | NOAA tells you exactly where each point was measured | +| **Data consistency** | Varies (legacy formats) | Standardized (FAIR) | PANGAEA data is easier to combine across studies | +| **Search breadth** | Fewer results (curated) | More results (interdisciplinary) | PANGAEA results may need more filtering | + +--- + +## Quick Reference: What to Use + +| Your Goal | Use This Method | Works With | +|-----------|-----------------|-----------| +| Find relevant studies | `search_studies()` + common parameters | Both NOAA & PANGAEA | +| Review all results | `get_summary()` | Both | +| Cite your datasets | `get_publications()` | Both | +| Acknowledge funders | `get_funding()` | Both | +| Map study locations | `get_geo()` | Both | +| Get actual measurements | `get_data(identifier)` | Both | +| Need NOAA-specific filters | Add `data_type_id`, `species`, `min_elevation`/`max_elevation`, etc. | NOAA only | +| Filter PANGAEA by topic | Add `topic` parameter | PANGAEA only | + +--- + +## Key Takeaway + +**Most of your work uses the common methods.** Only use provider-specific parameters when you need fine-grained control. + +The unified interface means: +- ✓ Learn PyleoTUPS once, use it everywhere +- ✓ Switch between NOAA and PANGAEA with one line of code +- ✓ Get consistent output regardless of provider +- ✓ Focus on your paleoclimate research, not API differences + +--- + +## Next Sections: + +- Setting up credentials for [Pangaea](01_c_PangaeaCredentialSetup.md). 
+- [Working with PyleoTUPS](02_a_NOAAObject.ipynb) (tutorials using the NOAA & Pangaea Dataset objects) + +## References: + +- PyleoTUPS Documentation: https://pyleotups.readthedocs.io/en/latest/ +- NOAA API Documentation: https://www.ncei.noaa.gov/access/paleo-search/api +- Pangaea Search API Documentation: https://wiki.pangaea.de/wiki/PANGAEA_search diff --git a/notebooks/01_c_PangaeaCredentialSetup.md b/notebooks/01_c_PangaeaCredentialSetup.md new file mode 100644 index 0000000..3d52f15 --- /dev/null +++ b/notebooks/01_c_PangaeaCredentialSetup.md @@ -0,0 +1,139 @@ +# Setting Up PANGAEA Credentials + +## Overview + +Some datasets on PANGAEA are protected and require authentication to access. This means you'll need to set up credentials (a special access key) before you can download certain data. This guide walks you through the process step by step, even if you're not familiar with programming. + +PANGAEA credentials are personal to you and should be kept secure. We'll show you two easy ways to manage them: using built-in PyleoTUPS tools or a simple file-based approach. + +## Why Do You Need Credentials? + +- **Protected Datasets**: Some research data on PANGAEA is restricted to authorized users only +- **Your Account**: Credentials are linked to your PANGAEA account +- **Security**: This ensures data is used appropriately and citations are tracked + +If you're only accessing public datasets, you might not need credentials. But for complete access to paleoclimate data, setting this up is recommended. + +## Step 1: Obtaining Your PANGAEA API Token + +First, you need to get your personal access token from PANGAEA. This is a unique code that identifies you. + +### Instructions: + +1. **Create or Log In to Your Account** + - Go to: https://www.pangaea.de/user/login.php + - If you don't have an account, create one (it's free for researchers) + +2. **Access Your Profile** + - After logging in, find your user profile or account settings + +3. 
**Find Your API Token** + - Look for your "API login token" or "access key" in your profile + - This is usually a long string of letters and numbers + - Copy this token and keep it private + +> **Important**: This token is unique to your account. Don't share it with others, and don't post it online. + +## Step 2: Securely Storing Your Token + +To keep your credentials safe and avoid typing them repeatedly, we'll store them securely. You have two options: + +### Option A: Using PyleoTUPS Built-in Tools (Recommended) + +PyleoTUPS provides simple functions to save and load your credentials automatically. + +#### Save Your Credentials (One-Time Setup) + +Run this code in a Python notebook or script: + +```python +from pyleotups import save_pangaea_credentials + +# Replace 'your_token_here' with your actual API token +save_pangaea_credentials("your_token_here") +``` + +This securely saves your token in a hidden location on your computer. + +#### Load Credentials in Your Code + +Whenever you need to use PANGAEA data, load your credentials like this: + +```python +from pyleotups import load_pangaea_credentials + +# This gets your saved token +pan_api = load_pangaea_credentials() +``` + +The variable `pan_api` now contains your token and can be used with PyleoTUPS. + +### Option B: Manual Storage with a `.env` File + +If you prefer more control or are working in different environments, you can store credentials in a `.env` file. + +#### Create a `.env` File + +1. In your project folder, create a new file named exactly `.env` +2. Add this line to the file: + ``` + PANGAEA_API="your_pangaea_token_here" + ``` +3. Replace `your_pangaea_token_here` with your actual token +4. 
**Important**: Enclose the token in double quotes + +#### Security Notes: +- Add `.env` to your `.gitignore` file if using version control (Git) +- Never share or upload `.env` files +- Keep the file in your project root directory + +## Step 3: Using Credentials in Your Research + +Once stored, you can use your credentials with PyleoTUPS: + +### For the Manual `.env` Method: + +```python +# Requires the python-dotenv package (pip install python-dotenv) +from dotenv import load_dotenv +import os + +# Load the variables defined in the .env file into the environment +load_dotenv() + +# Get your token (None if PANGAEA_API is not defined) +pan_api = os.getenv("PANGAEA_API") +``` + +### Initialize PyleoTUPS with Credentials: + +```python +import pyleotups as pt + +# Create a PANGAEA dataset object with your credentials +dataset = pt.PangaeaDataset(auth_token=pan_api) +``` + +Now you can search and download protected datasets! + +## Troubleshooting + +### "Credentials not found" error: +- Make sure you've run the save function or created the `.env` file correctly +- Check that your token is copied exactly (no extra spaces) + +### "Invalid token" error: +- Verify your token against the one shown in your PANGAEA profile +- Tokens can expire, so you may need to generate a new one + +### Still can't access data: +- Some datasets may have additional restrictions +- Contact the dataset authors or PANGAEA support + +## Next Steps + +With credentials set up, you can now: +- Access protected PANGAEA datasets +- Use the full PyleoTUPS functionality for paleoclimate research +- Proceed to tutorials on searching and downloading data + +For more information about PyleoTUPS capabilities, see the next guides on [NOAAObject](./02_a_NOAAObject.ipynb) and [PangaeaObject](./02_b_PANGAEAObject.ipynb). \ No newline at end of file