Skip to content

Fgumi initial commit#165

Open
freerkvandijk wants to merge 26 commits intodevfrom
fgumi
Open

Fgumi initial commit#165
freerkvandijk wants to merge 26 commits intodevfrom
fgumi

Conversation

@freerkvandijk
Copy link
Copy Markdown

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

matthdsm and others added 3 commits June 23, 2025 12:48
- module updates
- faster bclconvert output handling
- modified output structure, featuring qc reports per library and enhanced run QC
- fastq output when aligner equals false, with falco report
- per sample analysis definition for more flexibility
@freerkvandijk freerkvandijk requested a review from matthdsm April 7, 2026 13:31
"umi_aware": {
"meta": ["umi_aware"],
"type": "boolean",
"description": "Run markdup in UMI-aware mode. This applies to Samtools only and requires the UMI to be in the read name.",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change this, but add another option

"umi_aware": {
"meta": ["umi_aware"],
"type": "boolean",
"description": "Run markdup in UMI-aware mode. This applies to Samtools only and requires the UMI to be in the read name.",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change this, but add another option

ext.prefix = { "${meta.id}.fgumi.unmapped" }
ext.args = {
[
"--read-structures ${params.fgumi_read_structures}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using pipeline wide params for sample specific settings, instead use meta values

//// FGUMI extract (step 1)
withName: '.*FASTQ_TO_CRAM:FGUMI_EXTRACT' {
cpus = 8
memory = 32.GB
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarked? Seems high

"${params.fgumi_snap_extra_args}",
].join(" ").trim()
}
ext.args2 = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snap params should be equivalent to default alignment config

"--strategy ${params.fgumi_group_strategy}",
"--edits ${params.fgumi_group_edits}",
"--threads ${task.cpus}",
"--queue-memory ${params.fgumi_queue_memory}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be handled in the module code and depend on process.memory

script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}.fgumi.unmapped"
def sample_name = meta.samplename ?: meta.id
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume meta.samplename and meta.library do not exist to make it portable to nf-core

prefix = task.ext.prefix ?: "${meta.id}.fgumi.unmapped"
def sample_name = meta.samplename ?: meta.id
def library_name = meta.library ?: meta.id
def input_files = (reads instanceof List ? reads : [reads]).collect { read -> "${read}" }.join(' ')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nextflow automatically parses lists, so this isn't needed

--ref ${fasta} \
${args}

fgumi sort \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate modules here?

-o -sam - \
-t ${task.cpus} \
${snap_args} \
| samtools sort \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop samtools, and let snap sort the output


// MODULES
include { BIOBAMBAM_BAMSORMADUP } from "../../../modules/nf-core/biobambam/bamsormadup/main.nf"
include { FGUMI_DUPLEX_METRICS } from "../../../modules/local/fgumi/duplexmetrics/main.nf"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put all of this in a subworkflow

// ALIGNMENT([meta,fastq], index, sort)
ch_meta_reads_aligner_index_fasta_datatype.dna
.branch { meta, reads, aligner, index, fasta ->
umi: meta.umi_aware == true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not override existing settings, this will break samtools UMI handling

.set { ch_cram_crai }
ch_cram_crai.dump(tag: "FASTQ_TO_CRAM: cram and crai", pretty: true)

// Keep a dedicated channel for UMI-aware sample CRAM outputs.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y THO?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add test data in the repo, use nf-cmgg/test-datasets or data available on S3

Comment on lines +61 to +62
markdup: bamsormadup
umi_aware: true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is inherently wrong, bamsormadup does not support UMI markdup

Copy link
Copy Markdown
Member

@matthdsm matthdsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

"--read-structures ${params.fgumi_read_structures}",
params.fgumi_extract_umis_from_read_names ? "--extract-umis-from-read-names" : "",
"--read-structures ${meta.fgumi_read_structures ?: params.fgumi_read_structures}",
((meta.fgumi_extract_umis_from_read_names != null ? meta.fgumi_extract_umis_from_read_names : params.fgumi_extract_umis_from_read_names) ? "--extract-umis-from-read-names" : ""),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the params

"-sa",
"-xf 2",
meta.readgroup ? "-R \"@RG\\t" + meta.readgroup.findResults { rg -> rg.value?.trim() ? "${rg.key}:${rg.value}" : null }.join("\\t") + "\"" : "",
"${params.fgumi_snap_extra_args}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this param, should go in modules.config

"--family-size-histogram ${meta.id}.fgumi.group.family_size_histogram.txt",
].join(" ").trim()
}
ext.strategy = { params.fgumi_group_strategy }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only ext.args is allowed

ext.prefix = { "${meta.id}.fgumi.filter" }
ext.args = {
[
"--min-reads ${params.fgumi_filter_min_reads}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no params

def library_name = meta.library ?: meta.id
def input_files = (reads instanceof List ? reads : [reads]).collect { read -> "${read}" }.join(' ')
// Keep module portable: only meta.id is assumed, with optional task.ext overrides.
def sample_name = task.ext.sample_name ?: meta.id
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only ext.args is allowed, check out nf-core/modules for examples of complex module config

fgumi group \
--input ${bam} \
--output ${prefix}.bam \
--strategy ${strategy} \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that isn't resource related must be specified in modules.config under ext.args

fgumi sort \
--input ${bam} \
--output ${prefix}.bam \
--order coordinate \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hard code options

Comment on lines +42 to +49
FGUMI_SIMPLEX(
FGUMI_GROUP.out.bam
)

FGUMI_DUPLEX_METRICS(
FGUMI_GROUP.out.bam
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the workflow differentiate between duplex and simplex?

input[0] = [
[id: "test"],
file("${projectDir}/tests/inputs/fgumi/grouped.bam", checkIfExists: true)
file("${projectDir}/tests/inputs/fgumi/template.bam", checkIfExists: true)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test-data in repo

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add test with data, not only stubs

umi_filtered_consensus_bam = FASTQ_TO_CRAM.out.filtered_consensus_bam
umi_duplex_metrics = FASTQ_TO_CRAM.out.duplex_metrics
umi_crams = FASTQ_TO_CRAM.out.umi_cram_crai
umi_crams = FASTQ_TO_CRAM.out.cram_crai.filter { meta, _cram, _crai -> meta.fgumi_aware == true }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of separate outputs?

Copy link
Copy Markdown
Member

@matthdsm matthdsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More being difficult

Comment on lines 245 to 249
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some params left over.
threads and memory need to go in to module code

Comment on lines +303 to +305
"--strategy ${params.fgumi_group_strategy}",
"--edits ${params.fgumi_group_edits}",
"--compression-level ${params.fgumi_compression_level}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params -> meta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants