Download, sort and merge the alignments generated by DistMap.

Description

Download, sort and merge the alignments generated by DistMap (Pandey & Schlötterer 2013).

This tool scans the folder provided as input for multi-part BAM/SAM/CRAM files (e.g. 'part-*'), sorts and merges them in batches (in the temp directory), and finally merges all the batches into a single output file.
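
The batch-wise strategy described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: each part file is stood in for by a list of `(contig_index, position)` tuples, and the names `batched`, `sort_and_merge` and `batch_size` are hypothetical (`batch_size` plays the role that `--numberOfParts` is described as having).

```python
import heapq
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def sort_and_merge(parts, batch_size):
    """Sort each batch of parts, then merge the sorted batches.

    `parts` is a list of lists of (contig_index, position) tuples,
    a stand-in for the records held in each part file.
    """
    sorted_batches = []
    for batch in batched(parts, batch_size):
        # Pool the records of one batch and sort them together;
        # only one batch is held in memory at a time in this step.
        records = [rec for part in batch for rec in part]
        sorted_batches.append(sorted(records))
    # k-way merge of the already-sorted batches into one stream.
    return list(heapq.merge(*sorted_batches))


# Three toy "part files", processed two at a time.
parts = [[(0, 50), (0, 10)], [(1, 5), (0, 30)], [(0, 20)]]
merged = sort_and_merge(parts, batch_size=2)
```

A smaller batch size lowers the peak amount of data sorted at once, at the cost of more batches to merge in the final step.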

Arguments

Required Arguments

| Argument name(s) | Type | Description |
| --- | --- | --- |
| --input, -I | String | Input folder to search for DistMap multi-part result files. Expected to be on an HDFS file system. |
| --output, -O | String | Output SAM/BAM/CRAM file. |
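
As a rough illustration of how the two required arguments fit together, the sketch below assembles them into an argument list in Python. The launcher command is not given on this page, so it is deliberately left out; the function name `required_arguments` and both paths are hypothetical placeholders.

```python
def required_arguments(input_folder, output_file):
    """Return the required arguments as a list of command-line tokens."""
    if not input_folder or not output_file:
        raise ValueError("both --input and --output must be provided")
    return ["--input", input_folder, "--output", output_file]


# Hypothetical HDFS folder and local output file, for illustration only.
args = required_arguments("hdfs://namenode/distmap/run1", "merged.bam")
```

The resulting list can be appended to whatever command is used to launch the tool, for example through `subprocess.run`.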

Optional Arguments

| Argument name(s) | Type | Default value(s) | Description |
| --- | --- | --- | --- |
| --arguments_file | List[File] | [] | Read one or more arguments files and add them to the command line. |
| --gcs_max_retries, -gcs_retries | int | 20 | If the GCS bucket channel errors out, how many times to attempt to re-initiate the connection. |
| --help, -h | boolean | false | Display the help message. |
| --version | boolean | false | Display the version number for this tool. |

Optional Common Arguments

| Argument name(s) | Type | Default value(s) | Description |
| --- | --- | --- | --- |
| --addOutputSAMProgramRecord, -addOutputSAMProgramRecord | boolean | true | If true, adds a PG tag to created SAM/BAM/CRAM files. |
| --createOutputBamIndex, -OBI | boolean | true | If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. |
| --createOutputBamMD5, -OBM | boolean | false | If true, create an MD5 digest for any BAM/SAM/CRAM file created. |
| --forceOverwrite, -forceOverwrite | Boolean | false | Force overwriting of the output file if it exists. |
| --QUIET | Boolean | false | Whether to suppress job-summary info on System.err. |
| --readValidationStringency, -VS | ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT. |
| --reference, -R | String | null | Reference sequence file. Required for CRAM input/output. |
| --secondsBetweenProgressUpdates, -secondsBetweenProgressUpdates | double | 10.0 | Output traversal statistics every time this many seconds elapse. |
| --SORT_ORDER, -SO | SortOrder | coordinate | Sort order of the output file. Possible values: unsorted, queryname, coordinate, duplicate, unknown. |
| --TMP_DIR | List[File] | [] | Undocumented option. |
| --use_jdk_deflater, -jdk_deflater | boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater). |
| --use_jdk_inflater, -jdk_inflater | boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater). |
| --verbosity, -verbosity | LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG. |
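
Two constraints listed among the common arguments lend themselves to a quick pre-flight check: `--SORT_ORDER` accepts only a fixed set of values, and `--reference` is required for CRAM input/output. The Python sketch below shows one way a wrapper script might enforce them before launching the tool. The function name `common_arguments` and the extension-based CRAM detection are assumptions made for illustration, not behaviour documented here.

```python
# Values listed for --SORT_ORDER in the argument reference.
SORT_ORDERS = {"unsorted", "queryname", "coordinate", "duplicate", "unknown"}


def common_arguments(output_file, reference=None, sort_order="coordinate"):
    """Validate and assemble a few of the optional common arguments."""
    if sort_order not in SORT_ORDERS:
        raise ValueError(f"unsupported sort order: {sort_order}")
    # Assumption: CRAM output is recognised by its file extension.
    if output_file.lower().endswith(".cram") and reference is None:
        raise ValueError("--reference is required for CRAM output")
    args = ["--SORT_ORDER", sort_order]
    if reference is not None:
        args += ["--reference", reference]
    return args


bam_args = common_arguments("merged.bam")
cram_args = common_arguments("merged.cram", reference="genome.fasta")
```

Failing early in the wrapper avoids starting a long download only to have it rejected for a missing reference.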

Advanced Arguments

| Argument name(s) | Type | Default value(s) | Description |
| --- | --- | --- | --- |
| --disable-success-check | boolean | false | Disable the check for the _SUCCESS file in the input folder. |
| --noRemoveTaskProgramGroup | boolean | false | Do not remove the @PG lines generated by every task in the MapReduce DistMap run (by default they are removed completely). Note: @PG tags may be merged if they are completely identical. |
| --numberOfParts | int | 100 | Number of part files to download, merge and pre-sort at the same time. Reduce this number if you run into memory errors. |
| --showHidden, -showHidden | boolean | false | Display hidden arguments. |
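
The `_SUCCESS` check and the 'part-*' scan can be pictured with the short Python sketch below. It works on a plain list of file names rather than a real HDFS listing, and the function name `select_parts`, the `check_success` parameter and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_parts(file_names, check_success=True):
    """Return the sorted part files from a folder listing.

    When `check_success` is true, the listing must contain a _SUCCESS
    marker, mirroring the check that --disable-success-check turns off.
    """
    if check_success and "_SUCCESS" not in file_names:
        raise FileNotFoundError("no _SUCCESS file in the input folder")
    return sorted(n for n in file_names if fnmatchcase(n, "part-*"))


# Hypothetical folder listing, for illustration only.
listing = ["_SUCCESS", "part-00001", "part-00000", "_logs"]
parts = select_parts(listing)
```

Passing `check_success=False` corresponds to running with the success check disabled, which may be useful for a folder whose marker file is missing but whose part files are known to be complete.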