Commit 1be9fd6

Centralize all parameters (except single/pair commands) into check_aligner_params. Add information to user via check_aligner
1 parent 8aa4632 commit 1be9fd6

15 files changed: +437 -287 lines changed

README.md

Lines changed: 20 additions & 9 deletions
@@ -7,6 +7,9 @@ AliNe (Alignment in Nextflow) - RNAseq DNAseq
 
 AliNe is a pipeline written in Nextflow that aims to efficiently align reads against a reference using the tools of your choice.
 
+Input: file, list of file, folder or csv
+Output: Coordinate sorted BAM file.
+
 ## Table of Contents
 
 * [Foreword](#foreword)
@@ -45,7 +48,7 @@ You can choose to run one or several aligner in parallel.
 
 | Tool | Single End (short reads) | Paired end (short reads) | Pacbio | ONT |
 | --- | --- | --- | --- | --- |
-| bbmap ||| ⚠️ | ⚠️ |
+| bbmap ||| ✅ use mapPacBio.sh | ✅ use mapPacBio.sh |
 | bowtie ||| ⚠️ | ⚠️ |
 | bowtie2 ||| ⚠️ | ⚠️ |
 | bwaaln || ✅ R1 and R2 independently aligned then merged with bwa sampe | ⚠️ | ⚠️ |
@@ -56,7 +59,7 @@ You can choose to run one or several aligner in parallel.
 | hisat2 ||| ⚠️ | ⚠️ |
 | kallisto ||| ⚠️ | ⚠️ |
 | last | ⚠️ | ⚠️ R1 and R2 independently aligned then merged with maf-convert |||
-| minimap2 | ⚠️ | ⚠️ |||
+| minimap2 | | |||
 | ngmlr | ⚠️ | ⚠️ R1 and R2 independently aligned then merged with cat |||
 | novoalign |||| ⚠️ |
 | nucmer || ✅ R1 and R2 independently aligned then merged with cat | ⚠️ | ⚠️ |
@@ -322,21 +325,26 @@ On success you should get a message looking like this:
 --help prints the help section
 
 General Parameters
---reads path to the reads folder or (remote) file (commma separated list of remote file accepted).
-If a folder is provided, all the files with proper extension are detected.
+--reads path to the reads file, folder or csv. If a folder is provided, all the files with proper extension in the folder will be used. You can provide remote files (commma separated list).
 file extension expected : <.fastq.gz>, <.fq.gz>, <.fastq> or <.fq>
-for paired reads extra <_R1_001> or <_R2_001> is expected where <R> and <_001> are optional. e.g. <sample_id_1.fastq.gz>, <sample_id_R1.fastq.gz>, <sample_id_R1_001.fastq.gz>)
+for paired reads extra <_R1_001> or <_R2_001> is expected where <R> and <_001> are optional. e.g. <sample_id_1.fastq.gz>, <sample_id_R1.fastq.gz>, <sample_id_R1_001.fastq.gz>)
+csv input expects 6 columns: sample, fastq_1, fastq_2, strandedness, read_type and data_type.
+fastq_2 is optional and can be empty. Strandedness, read_type and data_type expect same values as corresponding AliNe parameters; If a value is provided via AliNe parameter, it will override the value in the csv file.
+Example of csv file:
+sample,fastq_1,fastq_2,strandedness,read_type,data_type
+control1,path/to/data1.fastq.gz,,auto,short_single,rna
+control2,path/to/data2_R1.fastq.gz,path/to/data2_R2.fastq.gz,auto,short_paired,rna
 --reference path to the reference file (fa, fa.gz, fasta or fasta.gz)
 --aligner aligner(s) to use among this list (comma or space separated) [bbmap, bowtie, bowtie2, bwaaln, bwamem, bwamem2, bwasw, graphmap2, hisat2, kallisto, minimap2, novoalign, nucmer, ngmlr, star, subread, sublong]
 --outdir path to the output directory (default: alignment_results)
 --annotation [Optional][used by graphmap2, STAR, subread] Absolute path to the annotation file (gtf or gff3)
 
 Type of input reads
+--data_type type of data among this list [DNA, RNA] (no default)
 --read_type type of reads among this list [short_paired, short_single, pacbio, ont] (default: short_paired)
---library_type Set the library_type of your reads (default: auto). In auto mode salmon will guess the library type for each sample.
+--strandedness Set the library_type of your reads (default: auto). In auto mode salmon will guess the library type for each sample.
 If you know the library type you can set it to one of the following: [U, IU, MU, OU, ISF, ISR, MSF, MSR, OSF, OSR]. See https://salmon.readthedocs.io/en/latest/library_type.html for more information.
 In such case the sample library type will be used for all the samples.
---skip_libray_usage Skip the usage of library type provided by the user or guessed by salmon.
 
 Extra steps
 --trimming_fastp run fastp for trimming (default: false)
@@ -392,8 +400,11 @@ Here the description of typical ouput you will get from AliNe:
 ├── mean_read_length # Folder with mean read length computed in bash (optional - done if selected aligners need the info and no value provided by the user)
 │ └── sample1_seqkit_trim_sampled_read_length.txt # Mean read length for sample1
 
-├── salmon_libtype # Librairy information (read orientation and strand information) detected via Salmon
-│ └── sample1_lib_format_counts.json # Librairy information detectected for sample1
+├── salmon_libtype # Library information (read orientation and strand information) detected via Salmon
+│ └── sample1_lib_format_counts.json # Library information detectected for sample1
+|
+├── aline_updated_params
+| └── sample1.txt # File resuming the parameters automatically set by AliNe
 |
 ├── alignment # Folder gathering all alignment output (indicies, sorted bam and logs)
 │ ├── aligner1 # Folder gathering data produced by aligner
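
The csv rule added to the README in this commit (six columns, `fastq_2` may be empty, and a value passed as an AliNe parameter overrides the csv value) can be sketched as below. This is only an illustrative Python sketch, not AliNe's Nextflow code: the helper name `resolve_sample_settings` and the `cli_params` dictionary are hypothetical, and it assumes that a parameter counts as "provided" whenever it is non-empty.

```python
import csv
import io

# Column names taken from the README text in this commit.
EXPECTED_COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness", "read_type", "data_type"]
# Columns that a command-line parameter of the same name may override.
OVERRIDABLE = ("strandedness", "read_type", "data_type")


def resolve_sample_settings(csv_text, cli_params):
    """Hypothetical helper: merge csv rows with command-line parameters."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"csv is missing column(s): {', '.join(missing)}")

    samples = []
    for row in reader:
        resolved = dict(row)
        # An empty fastq_2 cell means the sample is single-end.
        resolved["fastq_2"] = row["fastq_2"] or None
        for key in OVERRIDABLE:
            # Assumption: a non-empty command-line value wins over the csv value.
            if cli_params.get(key):
                resolved[key] = cli_params[key]
        samples.append(resolved)
    return samples


EXAMPLE_CSV = "\n".join([
    "sample,fastq_1,fastq_2,strandedness,read_type,data_type",
    "control1,path/to/data1.fastq.gz,,auto,short_single,rna",
    "control2,path/to/data2_R1.fastq.gz,path/to/data2_R2.fastq.gz,auto,short_paired,rna",
])

samples = resolve_sample_settings(EXAMPLE_CSV, {"strandedness": "ISR"})
for s in samples:
    print(s["sample"], s["strandedness"], s["read_type"], s["fastq_2"])
```

With `--strandedness ISR` given on the command line, both samples end up with `ISR` while `read_type` and `data_type` keep their per-row csv values.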

aline.nf

Lines changed: 32 additions & 21 deletions
@@ -88,7 +88,7 @@ if (params.help) { exit 0, helpMSG() }
 //*************************************************
 // STEP 1 - PARAMS CHECK
 //*************************************************
-def aline_processed_params = "aline_processed_params"
+def aline_processed_params = "aline_updated_params"
 def path_reads = params.reads
 def via_csv = false
 if ( path_reads.endsWith('.csv') ){
@@ -119,9 +119,9 @@ if( !params.aligner ){
 // check read library type parameter
 println """check strandedness parameter: ..."""
 if (! params.strandedness ){
-println """ No value provided, strandedness set to null! (equivalent to unstranded)"""
+println """ No value provided by --strandedness"""
 if( via_csv ) {
-println """ This value will replace any strandedness value found in your csv!"""
+println """ value will be taken from the csv file."""
 }
 }
 else {
@@ -233,12 +233,15 @@ Report Parameters
 
 Aligner Parameters (provided by user)
 """
-println printAlignerOptions(aligner_list, aline_processed_params)
-
+println printAlignerOptions(aligner_list)
+println """
+Aligner parameters (updated by Aline)
+Available at : ${params.outdir}/${aline_processed_params}
+"""
 //*************************************************
 // STEP 2 - Include needed modules
 //*************************************************
-include {read_length; check_aligner_params} from "$baseDir/modules/bash.nf"
+include {read_length; check_aligner; check_aligner_params} from "$baseDir/modules/bash.nf"
 include {bbmap_index; bbmap} from "$baseDir/modules/bbmap.nf"
 include {bowtie_index; bowtie} from "$baseDir/modules/bowtie.nf"
 include {bowtie2_index; bowtie2} from "$baseDir/modules/bowtie2.nf"
@@ -577,17 +580,13 @@ Please specify the read type either by including a read_type column in the input
 params.debug && raw_reads.view()
 }
 
-// --------------------- set aligner params ----------------------
-// Add annotation file within the tool options if annotation provided
-// Add specific options for aligner according to the read type
-println """check aligner parameters ..."""
-check_aligner_params( raw_reads, aligner_list, annotation.collect(), aline_processed_params )
-params.debug && raw_reads.view()
+// -------------- Warning aligner read_type usage ---------------
+log.info """Check aligner ..."""
+check_aligner( raw_reads, aligner_list )
 
 // Initialize channels
 Channel.empty().set{logs}
-Channel.empty().set{sorted_bam}
-
+
 // extra params
 salmon_index_done = false // to avoid multiple calls to salmon_index
 // ------------------------------------------------------------------------------------------------
@@ -743,9 +742,19 @@ Please specify the read type either by including a read_type column in the input
 reads = sample_to_notguess.concat(sample_to_guess_done)
 params.debug && reads.view()
 
+// ------------------------------------------------------------------------------------------------
+// ADAPT ALIGNER PARAMETERS
+// ------------------------------------------------------------------------------------------------
+log.info """Adapt aligner parameters ..."""
+check_aligner_params( raw_reads, aligner_list, annotation.collect(), aline_processed_params )
+params.debug && raw_reads.view()
+
 // ------------------------------------------------------------------------------------------------
 // ALIGNEMENT
 // ------------------------------------------------------------------------------------------------
+// Initialize sorted_bam channel
+Channel.empty().set{sorted_bam}
+
 params.debug && log.info('library type alignment')
 // ------------------- BBMAP -----------------
 if ("bbmap" in aligner_list ){
@@ -1260,9 +1269,15 @@ def helpMSG() {
 --help prints the help section
 
 General Parameters
---reads path to the reads file or folder. If a folder is provided, all the files with proper extension in the folder will be used. You can provide remote files (commma separated list).
+--reads path to the reads file, folder or csv. If a folder is provided, all the files with proper extension in the folder will be used. You can provide remote files (commma separated list).
 file extension expected : <.fastq.gz>, <.fq.gz>, <.fastq> or <.fq>
-for paired reads extra <_R1_001> or <_R2_001> is expected where <R> and <_001> are optional. e.g. <sample_id_1.fastq.gz>, <sample_id_R1.fastq.gz>, <sample_id_R1_001.fastq.gz>)
+for paired reads extra <_R1_001> or <_R2_001> is expected where <R> and <_001> are optional. e.g. <sample_id_1.fastq.gz>, <sample_id_R1.fastq.gz>, <sample_id_R1_001.fastq.gz>)
+csv input expects 6 columns: sample, fastq_1, fastq_2, strandedness, read_type and data_type.
+fastq_2 is optional and can be empty. Strandedness, read_type and data_type expect same values as corresponding AliNe parameters; If a value is provided via AliNe paramter, it will override the value in the csv file.
+Example of csv file:
+sample,fastq_1,fastq_2,strandedness,read_type,data_type
+control1,path/to/data1.fastq.gz,,auto,short_single,rna
+control2,path/to/data2_R1.fastq.gz,path/to/data2_R2.fastq.gz,auto,short_paired,rna
 --reference path to the reference file (fa, fa.gz, fasta or fasta.gz)
 --aligner aligner(s) to use among this list (comma or space separated) ${align_tools}
 --outdir path to the output directory (default: alignment_results)
@@ -1310,7 +1325,7 @@ def helpMSG() {
 """
 }
 
-def printAlignerOptions(aligner_list, aline_processed_params) {
+def printAlignerOptions(aligner_list) {
 def sentence = ""
 if ("bbmap" in aligner_list){
 sentence += """
@@ -1403,10 +1418,6 @@ def printAlignerOptions(aligner_list, aline_processed_params) {
 subread parameters
 subread_options : ${params.subread_options}
 """}
-sentence += """
-Aligner parameters processed by Aline can be retrieved in ${params.outdir}/${aline_processed_params} file.
-"""
-
 return sentence
 }
 
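
The `--reads` help text in this diff describes the naming convention for paired reads: an `_R1_001` or `_R2_001` suffix where `R` and `_001` are optional, followed by one of the listed fastq extensions. The regular expression below is one possible reading of that convention, written in Python for illustration only. AliNe's actual detection logic is not part of this diff, and the names `PAIRED_NAME` and `split_paired_name` are hypothetical.

```python
import re

# Assumed reading of the help text: <sample>_[R]1|2[_001].<fastq|fq>[.gz]
PAIRED_NAME = re.compile(
    r"^(?P<sample>.+)_R?(?P<mate>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)


def split_paired_name(filename):
    """Return (sample_id, mate) for a paired-style name, else None."""
    match = PAIRED_NAME.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("mate"))


# File names from the help text, plus one without a mate suffix.
for name in (
    "sample_id_1.fastq.gz",
    "sample_id_R1.fastq.gz",
    "sample_id_R1_001.fastq.gz",
    "data1.fastq.gz",
):
    print(name, "->", split_paired_name(name))
```

A name without a mate suffix, such as `data1.fastq.gz`, yields `None` in this sketch, which a caller could treat as a single-end file.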
