Download, sort and merge the alignments generated by DistMap.
Description
Download, sort and merge the alignments generated by DistMap (Pandey & Schlötterer 2013).
This tool scans the input folder for multi-part BAM/SAM/CRAM files (e.g. 'part-*'), sorts and merges them in batches (in the temp directory), and finally merges all the batches into a single output file.
Note: The DistMap results are expected to be located in the Hadoop Distributed File System (HDFS) and the output file on the local computer for downstream use, but neither is required.
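
The batch strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's implementation: the `Record` tuple is a hypothetical stand-in for an alignment record, parts are plain in-memory lists, and sorted batches stay in memory instead of being written to the temp directory.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple

# Hypothetical stand-in for an alignment: (reference index, position, read name).
Record = Tuple[int, int, str]


def batched(parts: Sequence[List[Record]], size: int) -> Iterator[Sequence[List[Record]]]:
    """Yield consecutive groups of at most `size` parts."""
    for start in range(0, len(parts), size):
        yield parts[start:start + size]


def sort_and_merge(parts: Sequence[List[Record]], parts_per_batch: int = 100) -> List[Record]:
    """Sort each batch of parts, then merge the sorted batches into one list."""
    if parts_per_batch < 1:
        raise ValueError("parts_per_batch must be positive")
    sorted_batches = []
    for batch in batched(parts, parts_per_batch):
        # Pool the records of one batch and sort them by coordinate.
        # (Assumption: a real run would spill this batch to the temp directory.)
        sorted_batches.append(sorted(record for part in batch for record in part))
    # k-way merge of the already sorted batches.
    return list(heapq.merge(*sorted_batches))
```

A smaller `parts_per_batch` means fewer records are pooled and sorted at once, at the cost of more batches in the final merge.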
Arguments
Required Arguments
Argument name(s) | Type | Description |
---|---|---|
--input -I | String | Input folder to look for DistMap multi-part result files. Expected to be in an HDFS file system. |
--output -O | String | Output SAM/BAM/CRAM file. |
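
As a rough illustration of how these arguments combine, the following Python helper assembles an argument vector. The option names come from this page. The `launcher` parameter is a hypothetical placeholder for however the tool is started in a given installation, since that command is not documented here.

```python
from typing import List, Optional


def build_command(launcher: List[str], input_dir: str, output: str,
                  number_of_parts: Optional[int] = None,
                  reference: Optional[str] = None) -> List[str]:
    """Assemble an argument vector from options documented on this page."""
    # `launcher` is an assumption: whatever starts the tool locally.
    command = list(launcher) + ["--input", input_dir, "--output", output]
    if number_of_parts is not None:
        command += ["--numberOfParts", str(number_of_parts)]
    if reference is not None:
        # The reference is required for CRAM input/output.
        command += ["--reference", reference]
    return command
```

The resulting list can be handed to `subprocess.run` once `launcher` has been replaced with the real invocation.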
Optional Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--arguments_file | List[File] | [] | read one or more arguments files and add them to the command line |
--gcs_max_retries -gcs_retries | int | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection |
--help -h | boolean | false | display the help message |
--version | boolean | false | display the version number for this tool |
Optional Common Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--addOutputSAMProgramRecord -addOutputSAMProgramRecord | boolean | true | If true, adds a PG tag to created SAM/BAM/CRAM files. |
--createOutputBamIndex -OBI | boolean | true | If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. |
--createOutputBamMD5 -OBM | boolean | false | If true, create an MD5 digest for any BAM/SAM/CRAM file created. |
--forceOverwrite -forceOverwrite | Boolean | false | Force overwriting the output if it exists. |
--QUIET | Boolean | false | Whether to suppress job-summary info on System.err. |
--readValidationStringency -VS | ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT |
--reference -R | String | null | Reference sequence file. Required for CRAM input/output. |
--secondsBetweenProgressUpdates -secondsBetweenProgressUpdates | double | 10.0 | Output traversal statistics every time this many seconds elapse. |
--SORT_ORDER -SO | SortOrder | coordinate | Sort order of the output file. Possible values: unsorted, queryname, coordinate, duplicate, unknown |
--TMP_DIR | List[File] | [] | Undocumented option |
--use_jdk_deflater -jdk_deflater | boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
--use_jdk_inflater -jdk_inflater | boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
--verbosity -verbosity | LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG |
Advanced Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--disable-success-check | boolean | false | Disable the check for the _SUCCESS file in the input folder. |
--noRemoveTaskProgramGroup | boolean | false | Do not remove the @PG lines generated by every task in the MapReduce DistMap run (by default they are removed completely). Note: @PG lines that are completely identical might be merged. |
--numberOfParts | int | 100 | Number of part files to download, merge and pre-sort at the same time. Reduce this number if you run into memory errors. |
--showHidden -showHidden | boolean | false | display hidden arguments |
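
The `_SUCCESS` check that `--disable-success-check` turns off can be pictured with the sketch below. It is an illustration under assumptions rather than the tool's code: a local directory stands in for the HDFS input folder, and `find_parts` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_parts(folder: Path, check_success: bool = True) -> List[Path]:
    """Return the part-* files in `folder`, sorted by name."""
    # The _SUCCESS marker signals that the job writing the folder finished.
    if check_success and not (folder / "_SUCCESS").exists():
        raise FileNotFoundError(f"no _SUCCESS marker in {folder}")
    return sorted(p for p in folder.glob("part-*") if p.is_file())
```

Calling `find_parts(folder, check_success=False)` mirrors passing `--disable-success-check`: the parts are listed even when the marker file is missing.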