A lightweight Nextflow workflow designed for testing infrastructure.
After configuring and deploying your infrastructure, validating the functionality of a Nextflow pipeline is crucial. Common issues include network problems, permission discrepancies, and invalid paths. The purpose of this pipeline is to execute a series of straightforward tasks to ensure they function properly. By employing simple tasks that test specific functionalities, we can pinpoint and resolve individual issues effectively.
This pipeline is intentionally minimalistic, arriving with no additional configuration. Users are required to tailor the run settings to their specific infrastructure.
All tests are consolidated in the main.nf file, prioritizing simplicity and readability. If any test is challenging to comprehend, please raise an issue.
To execute locally, use the following command:
nextflow run seqeralabs/nf-canaryThis will run on your local machine with default settings. Successful execution of all tests indicates a correctly configured Nextflow installation. Any failures suggest potential issues with your configuration.
Configure your infrastructure using the appropriate configuration files listed in the Nextflow documentation. Run the pipeline with the additional configuration to execute it on your setup.
nf-canary adheres to the --outdir convention established by nf-core to determine the output file destination. If not specified, output files are written to the work directory under a subfolder named outputs.
The container is specified by the container parameter, which defaults to quay.io/biocontainers/ubuntu:24.04. If you wish to use a different container, you can specify an alternative using the --container parameter.
params.container = 'docker.io/ubuntu:24.04'Skip individual tests by name using the --skip parameter, e.g., --skip TEST_INPUT. Multiple tests can be specified with comma-delimited values, e.g., --skip TEST_CREATE_FILE,TEST_PASS_FILE. The parameter is case-insensitive.
By default, all tests are executed. However, you can selectively run a specific test with the --run parameter, e.g., --run TEST_INPUT. When using this parameter, only the selected tests will run. Multiple tests can be specified with comma-delimited values, e.g., --run TEST_CREATE_FILE,TEST_PASS_FILE. Case insensitive.
Certain tests depend on the successful execution of previous tests and will be skipped if an upstream test fails. The dependencies for each test are listed below:
TEST_CREATE_FILE:
- TEST_PASS_FILE
TEST_CREATE_FOLDER:
- TEST_PASS_FOLDEREach test includes a brief comment explaining its purpose. In case of failure, review the comment to understand the intended outcome and compare it to the error message reported by Nextflow.
This process should succeed automatically with exit status 0.
This process creates a file on the worker machine and then moves it to the working directory.
This process creates an empty file on the worker machine and then moves it to the working directory.
This process creates a folder in the working directory.
This process retrieves a file from the working directory and reads its contents on the worker machine.
Tests a shell script in the bin/ directory that creates a single file.
Note: Enabled only if the parameter --remoteFile is specified.
This process retrieves a file from a remote resource to the worker machine and reads its contents. Use the --remoteFile parameter, e.g., to download this README:
nextflow run seqeralabs/nf-canary --remoteFile 'https://raw.githubusercontent.com/seqeralabs/nf-canary/main/README.md'Use this parameter to specify a file to access during runtime.
This process stages a file from the working directory to the worker node, copies it, and stages it back to the working directory.
This process stages a folder from the working directory to the worker node, copies it, and stages it back to the working directory.
This process creates a file on the worker machine and writes it to the publishDir directory. By default, this is written to a subfolder called output in the working directory, but it can be overridden using the --outdir parameter. Use this to demonstrate the ability to publish to the relevant output directory.
This process creates a folder on the worker machine and writes it to the publishDir directory. By default, this is written to a subfolder called output in the working directory, but it can be overridden using the --outdir parameter. Use this to demonstrate the ability to publish to the relevant output directory.
This process should fail immediately but be ignored using the default configuration.
Tests moving a file within the working directory.
Tests moving the contents of a folder to a new folder within the working directory.
Test a process can accept a value as input.
Note: Enabled only if the parameter --gpu is specified.
This process tests the ability to use a GPU. It uses the pytorch conda environment to test CUDA is available and working. This is disabled by default as it requires a GPU to be available which may not be true.
Note: Enabled only if the parameter --fusion is specified (or set to true by a profile).
This process runs the fusion-doctor diagnostics tool to validate the Fusion filesystem configuration. It checks system requirements (e.g. kernel version, memory, disk space, etc.), FUSE device availability, and cloud bucket accessibility among others. The process produces a JSON diagnostic report published to ${outdir}/fusion/.
Use a built-in profile to enable Fusion validation with predefined thresholds:
nextflow run seqeralabs/nf-canary -profile fusion_aws_recommendedTip
If in doubt, start with the recommended tier:
- Choose
lowfor small workloads or quick smoke tests, andhighfor large-scale production with big datasets. - Choose
recommendedto match Seqera's documented minimum requirements for production workloads with Fusion. - Choose
high(or a custom threshold of 400 GB+ storage) if your pipeline processes files larger than 100 GB.
Note
Seqera Platform will auto-select NVMe-based instance families when Fusion is enabled (e.g. m6id, c6id, r6id) and the "Fast instance storage" toggle is active in the CE settings. Fusion can also work without NVMe instances, but in this case the EBS disk shall be bumped to 100 GB (gp3, 325 MB/s); this is what the low profile validates.
| Profile | Disk | Memory | Kernel | Based on |
|---|---|---|---|---|
fusion_aws_low |
100 GB | 4 GB | 5.10+ | EBS gp3 (no NVMe) |
fusion_aws_recommended |
200 GB | 8 GB | 5.10+ | Seqera docs minimum |
fusion_aws_high |
474 GB | 16 GB | 5.10+ | m6id.2xlarge NVMe |
Google Cloud
Note
Seqera Platform auto-selects families supporting local SSDs (e.g. n2, c2, n2d) and provisions a 375 GB NVMe SSD per job.
| Profile | Disk | Memory | Kernel | Based on |
|---|---|---|---|---|
fusion_google_low |
50 GB | 4 GB | 5.15+ | GCP persistent disk |
fusion_google_recommended |
375 GB | 8 GB | 5.15+ | 1x local NVMe SSD |
fusion_google_high |
750 GB | 16 GB | 5.15+ | 2x local NVMe SSDs |
Note
Unlike AWS and Google Cloud, there is no auto-selection: the user must pick the VM size. Seqera recommends E-series with a d suffix (e.g. Standard_E8d_v5, Standard_E16d_v5).
| Profile | Disk | Memory | Kernel | Based on |
|---|---|---|---|---|
fusion_azure_low |
75 GB | 4 GB | 5.15+ | Standard_E2d_v5 temp disk |
fusion_azure_recommended |
300 GB | 8 GB | 5.15+ | Standard_E8d_v5 temp disk |
fusion_azure_high |
600 GB | 16 GB | 5.15+ | Standard_E16d_v5 temp disk |
In AWS Batch, the Seqera Platform UI allows selecting instance families (e.g. m6id) but not specific sizes. Small instances in an otherwise valid family may not meet the recommended disk threshold. For example, on AWS the m6id family NVMe ranges from 118 GB (.large) to 1,900 GB (.8xlarge), with only .xlarge and above meeting the 200 GB threshold.
If fusion-doctor reports a disk requirement failure, request more CPUs/memory to get a larger instance, or use the low profile for small tasks.
You can override the default requirements using command-line parameters:
nextflow run seqeralabs/nf-canary \
--fusion \
--fusion_kernel_version_min "5.10" \
--fusion_memory_capacity_gb_min 8 \
--fusion_disk_capacity_gb_min 100Available parameters:
--fusion_kernel_version_min- Minimum Linux kernel version (e.g., "5.10")--fusion_memory_capacity_gb_min- Minimum memory in GB (e.g., 8)--fusion_disk_capacity_gb_min- Minimum disk space in GB (e.g., 100)--fusion_cache_path- Path for Fusion cache directory (default:/tmp)--fusion_read_write_buckets- Comma-separated list of read-write bucket URIs--fusion_read_only_buckets- Comma-separated list of read-only bucket URIs