The following example shows how to minimally ingest a 3D seismic stack into a local MDIO file. Only one lossless copy will be made.
There are many more options; please see the CLI Reference.
```shell
$ mdio segy import \
    path_to_segy_file.segy \
    path_to_mdio_file.mdio \
    -loc 181,185 \
    -names inline,crossline
```

To export the same file back to SEG-Y format, run the following command.
```shell
$ mdio segy export \
    path_to_mdio_file.mdio \
    path_to_segy_file.segy
```

MDIO supports I/O on major cloud service providers. The cloud I/O capabilities are built on fsspec and its specialized implementations for:
- Amazon Web Services (AWS S3) - s3fs
- Google Cloud Provider (GCP GCS) - gcsfs
- Microsoft Azure (Datalake Gen2) - adlfs
Any other file-system supported by fsspec will also be supported by MDIO. However,
we will focus on the major providers here.
The protocol that selects a backend (e.g. `s3://`, `gs://`, or `az://`) is prepended to the MDIO path.
The connection string can be passed to the command-line interface (CLI) using the
`--storage-options-{input,output}` flags as a JSON string, or to the Python API with
the `storage_options_{input,output}` keyword arguments as a Python dictionary.
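For reference, here is a minimal Python sketch of the dictionary form. The `segy_to_mdio` entry point and its keyword names are assumptions and may differ across MDIO versions; check the API reference of your installed version.

```python
# A hedged sketch of passing storage options through the Python API as a
# dictionary. `segy_to_mdio`, `index_bytes`, and `storage_options_output`
# are assumed names; verify against your installed MDIO version.
from mdio import segy_to_mdio

segy_to_mdio(
    "path/to/my.segy",             # input SEG-Y
    "s3://bucket/prefix/my.mdio",  # output MDIO on S3
    index_bytes=(189, 193),
    storage_options_output={       # a dict, not a JSON string
        "key": "my_super_private_key",
        "secret": "my_super_private_secret",
    },
)
```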
On Windows clients, JSON strings are passed to the CLI with a special escape character.
For instance, a JSON string:
```json
{"key": "my_super_private_key", "secret": "my_super_private_secret"}
```
must be passed with an escape character `\` for inner quotes as:
```shell
"{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```
whereas on Linux bash this works just fine:
```shell
'{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```
If this is done incorrectly, the CLI will raise an invalid JSON string error.
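To check whether a JSON string survives your shell's quoting, you can parse it with the standard library before handing it to the CLI (not MDIO-specific):

```python
import json

# Paste the exact string your shell passes to the CLI; json.loads raises
# json.JSONDecodeError if quoting or escaping mangled it.
options = '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
json.loads(options)
print("valid JSON")
```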
Credentials can be automatically fetched from a pre-authenticated AWS CLI.
See the s3fs documentation for the order in which credentials are checked.
If the environment is not pre-authenticated, you need to pass `--storage-options-{input,output}`.
Prefix: `s3://`

Storage Options:
- `key`: The auth key from AWS
- `secret`: The auth secret from AWS
Using UNIX:
```shell
mdio segy import \
    path/to/my.segy \
    s3://bucket/prefix/my.mdio \
    --header-locations 189,193 \
    --storage-options-output '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```

Using Windows (note the extra escape characters `\`):
```shell
mdio segy import \
    path/to/my.segy \
    s3://bucket/prefix/my.mdio \
    --header-locations 189,193 \
    --storage-options-output "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```

Credentials can be automatically fetched from a pre-authenticated gcloud CLI.
See the gcsfs documentation for the order in which credentials are checked.
If the environment is not pre-authenticated, you need to pass `--storage-options-{input,output}`.
GCP uses service accounts to pass authentication information to APIs.
Prefix: `gs://` or `gcs://`

Storage Options:
- `token`: The service account JSON value as a string, or a local path to the JSON file
Using a service account:
```shell
mdio segy import \
    path/to/my.segy \
    gs://bucket/prefix/my.mdio \
    --header-locations 189,193 \
    --storage-options-output '{"token": "~/.config/gcloud/application_default_credentials.json"}'
```

Using a browser to populate authentication:
```shell
mdio segy import \
    path/to/my.segy \
    gs://bucket/prefix/my.mdio \
    --header-locations 189,193 \
    --storage-options-output '{"token": "browser"}'
```

There are various ways to authenticate with Azure Data Lake (ADL).
See the adlfs documentation for details.
If ADL is not pre-authenticated, you need to pass `--storage-options-{input,output}`.
Prefix: `az://` or `abfs://`

Storage Options:
- `account_name`: Azure Data Lake storage account name
- `account_key`: Azure Data Lake storage account access key
```shell
mdio segy import \
    path/to/my.segy \
    az://bucket/prefix/my.mdio \
    --header-locations 189,193 \
    --storage-options-output '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
```

fsspec provides additional functionality beyond basic I/O. These are advanced
features, and we refer the user to the fsspec documentation for details.
Some useful examples are listed below (a caching sketch follows the list):
- Caching Files Locally
- Remote Write Caching
- File Buffering and random access
- Mount anything with FUSE
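As an illustration of the first item, here is a hedged sketch of local caching through fsspec's `simplecache` protocol; the path and credentials are placeholders:

```python
import fsspec

# With chained URLs, per-protocol options are passed as dictionaries keyed
# by protocol name. The first read downloads the object into the local
# cache; subsequent reads are served from disk.
with fsspec.open(
    "simplecache::s3://bucket/prefix/some_file",  # placeholder path
    mode="rb",
    s3={"key": "my_super_private_key", "secret": "my_super_private_secret"},
    simplecache={"cache_storage": "/custom/temp/storage/path"},
) as f:
    data = f.read()
```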
MDIO v0.8.2 introduces the `MDIO__IMPORT__CLOUD_NATIVE` environment variable to optimize
SEG-Y header scans by balancing bandwidth usage with read latency through buffered reads.
When to Use: This variable is most effective in high-throughput environments like cloud-based ingestion systems but can also improve performance for mechanical drives or slow connections.
How to Enable: Set the variable to one of {"True", "1", "true"}. For example:

```shell
$ export MDIO__IMPORT__CLOUD_NATIVE="true"
```

How It Works: Buffered reads minimize millions of remote requests during SEG-Y header scans:
- Cloud Environments: Ideal for high-throughput connections between cloud ingestion machines and object stores.
- Slow Connections: When bandwidth is the bottleneck, ingestion may be faster with buffered reads disabled.
- Local Reads: May benefit mechanical drives; SSDs typically perform fine without it.
While buffered reads process the file twice, the tradeoff improves ingestion performance and reduces object-store request costs.
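If you drive ingestion from Python rather than the shell, the same switch can be set in the process environment before the import starts; a trivial sketch:

```python
import os

# Equivalent to `export MDIO__IMPORT__CLOUD_NATIVE="true"`; set it before
# the SEG-Y import runs so MDIO picks it up.
os.environ["MDIO__IMPORT__CLOUD_NATIVE"] = "true"
```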
When combining advanced protocols like `simplecache` with a remote store like S3, the
URL can be chained as `simplecache::s3://bucket/prefix/file.mdio`. When doing this, the
`--storage-options-{input,output}` argument must explicitly state parameters for both the
cloud backend and the extra protocol. For the above example it would look like this:
```json
{
  "s3": {
    "key": "my_super_private_key",
    "secret": "my_super_private_secret"
  },
  "simplecache": {
    "cache_storage": "/custom/temp/storage/path"
  }
}
```

In one line:
{"s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"}, "simplecache": {"cache_storage": "/custom/temp/storage/path"}MDIO provides a convenient command-line-interface (CLI) to do various tasks.
For each command / subcommand you can provide --help argument to
get information about usage.
.. typer:: mdio.cli:app
:prog: mdio
:theme: monokai
:width: 100
:show-nested:
:make-sections: