CLI Usage

Ingestion and Export

The following example shows how to minimally ingest a 3D seismic stack into a local MDIO file. Only one lossless copy will be made.

There are many more options; see the CLI Reference.

$ mdio segy import \
    path_to_segy_file.segy \
    path_to_mdio_file.mdio \
    -loc 181,185 \
    -names inline,crossline

To export the same file back to SEG-Y format, run:

$ mdio segy export \
    path_to_mdio_file.mdio \
    path_to_segy_file.segy
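For a quick round trip, the two commands can be wrapped in a small shell function. This is a minimal sketch: segy_roundtrip is a hypothetical helper, not part of MDIO, and the -loc 181,185 and -names inline,crossline values are copied from the import example above, so they must be adjusted to wherever your file stores its inline and crossline numbers.

```shell
# Hypothetical helper (not part of MDIO): ingest a SEG-Y file into MDIO and
# export it straight back to a second SEG-Y file.
# Usage: segy_roundtrip INPUT.segy OUTPUT.mdio ROUNDTRIP.segy
segy_roundtrip() {
  # Byte locations and names are copied from the import example above;
  # change them to match where your file stores inline/crossline numbers.
  # The export only runs if the import exits successfully (&&).
  mdio segy import \
    "$1" \
    "$2" \
    -loc 181,185 \
    -names inline,crossline &&
    mdio segy export \
      "$2" \
      "$3"
}
```

Running segy_roundtrip path_to_segy_file.segy path_to_mdio_file.mdio roundtrip.segy issues the import and export commands above in order, and skips the export if the import fails.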

Cloud Connection Strings

MDIO supports I/O on the major cloud service providers. Cloud I/O is built on fsspec and its provider-specific implementations:

  • Amazon Web Services (AWS S3) - s3fs
  • Google Cloud Platform (GCP GCS) - gcsfs
  • Microsoft Azure (Data Lake Storage Gen2) - adlfs

Any other file system supported by fsspec is also supported by MDIO, but this section focuses on the major providers.

The backend is selected by the protocol prefix on the MDIO path (e.g. s3://, gs://, or az://).

The connection options can be passed to the command-line interface (CLI) as a JSON string with the -storage-{input,output} or --storage-options-{input,output} flags, or to the Python API as a Python dictionary with the storage_options_{input,output} keyword arguments.

On Windows clients, the inner double quotes of a JSON string must be escaped when it is passed to the CLI.

For instance, the JSON string:
```json
{"key": "my_super_private_key", "secret": "my_super_private_secret"}
```
must be passed with each inner quote escaped by a backslash (`\`):
```shell
"{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```
whereas in bash on Linux, single quotes are enough:
```shell
'{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```
If this is done incorrectly, the CLI reports an invalid JSON string error.
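Before starting a long ingestion, it can help to confirm that a storage-options string is valid JSON. The sketch below is an illustration under stated assumptions: is_valid_json is a hypothetical helper, it relies on Python's built-in json.tool module with python3 on PATH, and it only shows POSIX shell quoting (in bash, the single-quoted and backslash-escaped forms yield the same string; cmd.exe and PowerShell follow their own rules).

```shell
# Hypothetical helper: succeed only if $1 parses as JSON.
# Uses Python's built-in json.tool module (assumes python3 is on PATH).
is_valid_json() {
  printf '%s' "$1" | python3 -m json.tool >/dev/null 2>&1
}

# In a POSIX shell, both quoting styles shown above produce the same string...
single_quoted='{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
escaped="{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
[ "$single_quoted" = "$escaped" ] && echo "same string"

# ...and that string parses as JSON, so the CLI should accept it.
is_valid_json "$single_quoted" && echo "valid JSON"
```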

Amazon Web Services

Credentials can be fetched automatically from a pre-authenticated AWS CLI; see the s3fs documentation for the order in which s3fs checks credential sources. If the AWS CLI is not authenticated, you need to pass --storage-options-{input,output}.

Prefix: s3://

Storage Options:

  • key: AWS access key ID
  • secret: AWS secret access key

Using UNIX:

mdio segy import \
  path/to/my.segy \
  s3://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options-output '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'

Using Windows (note the extra escape characters \, and that the command is written on one line because \ is not a line-continuation character in cmd.exe or PowerShell):

mdio segy import path/to/my.segy s3://bucket/prefix/my.mdio --header-locations 189,193 --storage-options-output "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
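On UNIX shells, the JSON can also be assembled at run time from environment variables, which keeps secrets out of the command line and shell history. A minimal sketch, assuming the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables are set; s3_storage_options is a hypothetical helper, and plain printf templating assumes the values contain no double quotes or backslashes. s3fs may already pick these variables up through its default credential chain, in which case the flag is not needed at all.

```shell
# Hypothetical helper: print S3 storage options built from the standard AWS
# environment variables. Assumes neither value contains '"' or '\'.
s3_storage_options() {
  printf '{"key": "%s", "secret": "%s"}' \
    "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
}

# Example use; the command substitution is quoted so the JSON stays one argument:
# mdio segy import \
#   path/to/my.segy \
#   s3://bucket/prefix/my.mdio \
#   --header-locations 189,193 \
#   --storage-options-output "$(s3_storage_options)"
```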

Google Cloud Platform

Credentials can be fetched automatically from a pre-authenticated gcloud CLI; see the gcsfs documentation for the order in which gcsfs checks credential sources. If the gcloud CLI is not authenticated, you need to pass --storage-options-{input,output}.

GCP uses service accounts to pass authentication information to APIs.

Prefix: gs:// or gcs://

Storage Options:

  • token: The service account JSON value as a string, or a local path to the JSON file

Using a service account:

mdio segy import \
  path/to/my.segy \
  gs://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options-output '{"token": "~/.config/gcloud/application_default_credentials.json"}'

Using the browser to authenticate:

mdio segy import \
  path/to/my.segy \
  gs://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options-output '{"token": "browser"}'
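The shell does not expand ~ inside single quotes, so in the service-account example above the literal ~ is handed to gcsfs, which then has to expand it itself. To avoid depending on that, a helper can substitute an absolute path first. A minimal sketch: gcs_storage_options is a hypothetical name, it defaults to the credentials path used above, and the path is assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: print GCS storage options whose "token" is an absolute
# path. Defaults to the application-default credentials path shown above.
gcs_storage_options() {
  creds=${1:-"$HOME/.config/gcloud/application_default_credentials.json"}
  printf '{"token": "%s"}' "$creds"
}

# Example use:
# mdio segy import \
#   path/to/my.segy \
#   gs://bucket/prefix/my.mdio \
#   --header-locations 189,193 \
#   --storage-options-output "$(gcs_storage_options)"
```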

Microsoft Azure

There are various ways to authenticate with Azure Data Lake (ADL); see the adlfs documentation for details. If ADL is not pre-authenticated, you need to pass --storage-options-{input,output}.

Prefix: az:// or abfs://

Storage Options:

  • account_name: Azure Data Lake storage account name
  • account_key: Azure Data Lake storage account access key

mdio segy import \
  path/to/my.segy \
  az://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options-output '{"account_name": "myaccount", "account_key": "my_super_private_key"}'

Advanced Cloud Features

fsspec provides additional, more advanced features; refer to the fsspec documentation for details. Some useful examples are:

  • Caching Files Locally
  • Remote Write Caching
  • File Buffering and random access
  • Mount anything with FUSE

Buffered Reads in Ingestion

MDIO v0.8.2 introduces the MDIO__IMPORT__CLOUD_NATIVE environment variable to optimize SEG-Y header scans by balancing bandwidth usage with read latency through buffered reads.

When to Use: This variable is most effective in high-throughput environments such as cloud-based ingestion systems, but it can also improve performance on mechanical drives or high-latency connections.

How to Enable: Set the variable to one of {"True", "1", "true"}. For example:

$ export MDIO__IMPORT__CLOUD_NATIVE="true"

How It Works: Buffered reads avoid issuing millions of small remote requests during SEG-Y header scans:

  • Cloud Environments: Ideal for high-throughput connections between cloud ingestion machines and object stores.
  • Slow Connections: When bandwidth is the bottleneck, ingestion may be faster without it.
  • Local Reads: May benefit mechanical drives; SSDs typically perform fine without it.

While buffered reads process the file twice, the tradeoff improves ingestion performance and reduces object-store request costs.
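If you would rather not export the variable for the whole shell session, POSIX shells also accept a one-off assignment in front of a command, which sets it for that process only. A minimal sketch; cloud_native_import is a hypothetical wrapper, and its arguments are forwarded unchanged to mdio segy import.

```shell
# Hypothetical wrapper: run a single import with buffered header scans enabled.
# The assignment applies only to this mdio process, not to the shell session.
cloud_native_import() {
  MDIO__IMPORT__CLOUD_NATIVE=true mdio segy import "$@"
}

# Example use:
# cloud_native_import path/to/my.segy s3://bucket/prefix/my.mdio \
#   --header-locations 189,193
```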

Chaining fsspec Protocols

When combining an advanced protocol such as simplecache with a remote store such as s3, the protocols are chained in the URL, e.g. simplecache::s3://bucket/prefix/file.mdio. In that case, the --storage-options-{input,output} argument must give the parameters for each protocol separately, keyed by protocol name: one entry for the cloud backend and one for the extra protocol. For the example above it looks like this:

{
  "s3": {
    "key": "my_super_private_key",
    "secret": "my_super_private_secret"
  },
  "simplecache": {
    "cache_storage": "/custom/temp/storage/path"
  }
}

In one line:

{"s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"}, "simplecache": {"cache_storage": "/custom/temp/storage/path"}}
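When scripting, the nested options can be generated instead of typed by hand. A minimal sketch: chained_storage_options is a hypothetical helper, the credentials and cache path are the placeholders used above, and the values are assumed to contain no double quotes or backslashes because they are spliced into the JSON with plain printf.

```shell
# Hypothetical helper: print storage options for a simplecache::s3:// URL.
# Each protocol in the chain gets its own top-level key.
# Usage: chained_storage_options KEY SECRET CACHE_DIR
chained_storage_options() {
  printf '{"s3": {"key": "%s", "secret": "%s"}, "simplecache": {"cache_storage": "%s"}}' \
    "$1" "$2" "$3"
}

# Example use with the chained URL from above:
# mdio segy import \
#   path/to/my.segy \
#   simplecache::s3://bucket/prefix/file.mdio \
#   --header-locations 189,193 \
#   --storage-options-output "$(chained_storage_options \
#     my_super_private_key my_super_private_secret /custom/temp/storage/path)"
```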

CLI Reference

MDIO provides a convenient command-line interface (CLI) for various tasks.

Pass the --help argument to any command or subcommand to get usage information.

.. typer:: mdio.cli:app
    :prog: mdio
    :theme: monokai
    :width: 100
    :show-nested:
    :make-sections: