Converts any kind of ReadTools source to Distmap format.
Description
Converts any kind of ReadTools source to the Distmap format (Pandey & Schlötterer 2013). Barcode information (BC tag) is included in the read name (Illumina format) so that sample data is kept.
If requested, it also applies a trimming/filtering pipeline to the reads.
See additional information in the following pages:
Arguments
Required Arguments
Argument name(s) | Type | Description |
---|---|---|
--input -I | String | BAM/SAM/CRAM/FASTQ source of reads. |
--output -O | String | Output in Distmap format. Expected to be in an HDFS file system. |
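
A minimal run needs only the two required arguments. The sketch below assembles such a command line in Python. The jar name `ReadTools.jar`, the tool name `ReadsToDistmap`, and both paths are assumptions made for illustration and should be adapted to the local installation.

```python
import shlex


def build_distmap_command(input_path, output_path,
                          jar="ReadTools.jar",      # assumed jar location
                          tool="ReadsToDistmap"):   # assumed tool name
    """Return the argument vector for a conversion with required arguments only."""
    return ["java", "-jar", jar, tool,
            "--input", input_path,
            "--output", output_path]


# Hypothetical paths: a local BAM file and an HDFS destination.
cmd = build_distmap_command("reads.bam", "hdfs://namenode/user/me/reads.distmap")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.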
Optional Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--arguments_file | List[File] | [] | read one or more arguments files and add them to the command line |
--disable3pTrim -D3PT | boolean | false | Disable 3’-trimming. Cannot be true when disable5pTrim (D5PT) is true. |
--disable5pTrim -D5PT | boolean | false | Disable 5’-trimming. May be useful for downstream marking of duplicate reads, which are usually identified by the 5’ mapping position. Cannot be true when disable3pTrim (D3PT) is true. |
--gcs_max_retries -gcs_retries | int | 20 | If the GCS bucket channel errors out, how many times to attempt to re-initiate the connection |
--help -h | boolean | false | display the help message |
--readFilter -RF | List[String] | [] | Read filters to be applied in the Distmap pipeline |
--trimmer -TM | List[String] | [] | Trimmers to be applied in the Distmap pipeline |
--version | boolean | false | display the version number for this tool |
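
The trimming options interact: `--disable3pTrim` and `--disable5pTrim` cannot both be true. The sketch below builds the optional trimming/filtering arguments and checks that constraint before anything is run. It assumes that list arguments are given by repeating the flag and that boolean flags may be passed without a value. The names `ExampleTrimmer` and `ExampleFilter` are hypothetical placeholders, not real ReadTools plugins.

```python
def trimming_args(trimmers=(), read_filters=(), disable_3p=False, disable_5p=False):
    """Build the optional trimming/filtering arguments for the pipeline."""
    # Constraint stated in the argument table: both sides cannot be disabled.
    if disable_3p and disable_5p:
        raise ValueError("disable3pTrim and disable5pTrim cannot both be true")
    args = []
    for name in trimmers:            # assumption: one --trimmer flag per entry
        args += ["--trimmer", name]
    for name in read_filters:        # assumption: one --readFilter flag per entry
        args += ["--readFilter", name]
    if disable_3p:
        args.append("--disable3pTrim")
    if disable_5p:
        args.append("--disable5pTrim")
    return args


# Hypothetical plugin names; keep 5' ends intact for later duplicate marking.
extra = trimming_args(trimmers=["ExampleTrimmer"],
                      read_filters=["ExampleFilter"],
                      disable_5p=True)
print(extra)
```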
Optional Common Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--barcodeInReadName -barcodeInReadName | boolean | false | Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files. |
--forceEncoding -forceEncoding | FastqQualityFormat | null | Force original quality encoding of the input files. Possible values: Solexa, Illumina, Standard |
--forceOverwrite -forceOverwrite | Boolean | false | Force output overwriting if it exists |
--input2 -I2 | String | null | Second BAM/SAM/CRAM/FASTQ source of reads (if paired-end). |
--interleavedInput -interleaved | boolean | false | Interleaved input. |
--QUIET | Boolean | false | Whether to suppress job-summary info on System.err. |
--rawBarcodeSequenceTags -rawBarcodeSequenceTags | List[String] | [BC] | Include the barcodes encoded in these tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: these tag(s) will be removed/updated as necessary. |
--readValidationStringency -VS | ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT |
--reference -R | String | null | Reference sequence file. Required for CRAM input. |
--secondsBetweenProgressUpdates -secondsBetweenProgressUpdates | double | 10.0 | Output traversal statistics every time this many seconds elapse. |
--TMP_DIR | List[File] | [] | Undocumented option |
--use_jdk_deflater -jdk_deflater | boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
--use_jdk_inflater -jdk_inflater | boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
--verbosity -verbosity | LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG |
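
The input-related arguments above can be combined as in the following sketch, which covers paired-end input via `--input2` and enforces the rule that `--reference` is required for CRAM input. Detecting CRAM by file extension is an assumption of this sketch, and all file names are hypothetical.

```python
def input_args(input_path, input2=None, interleaved=False, reference=None):
    """Build the input-related arguments, checking the CRAM reference rule."""
    paths = [p for p in (input_path, input2) if p is not None]
    # Assumption: CRAM sources are recognised by their ".cram" extension.
    needs_reference = any(p.lower().endswith(".cram") for p in paths)
    if needs_reference and reference is None:
        raise ValueError("--reference is required for CRAM input")
    args = ["--input", input_path]
    if input2 is not None:
        args += ["--input2", input2]      # second source for paired-end reads
    if interleaved:
        args.append("--interleavedInput")
    if reference is not None:
        args += ["--reference", reference]
    return args


# Hypothetical paired-end FASTQ files.
paired = input_args("sample_1.fq.gz", input2="sample_2.fq.gz")
# Hypothetical CRAM file with its reference.
cram = input_args("sample.cram", reference="genome.fasta")
print(paired)
print(cram)
```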
Advanced Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--hdfsBlockSize -hdfsBlockSize | Long | null | Block-size (in bytes) for files in HDFS. If not provided, use default configuration. |
--showHidden -showHidden | boolean | false | display hidden arguments |
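
Because `--hdfsBlockSize` takes a value in bytes, it can be convenient to compute it from a size in MiB. The helper below is a small illustrative sketch; the function name and the 128 MiB figure are assumptions chosen for the example, not recommended settings.

```python
def hdfs_block_size_args(mebibytes=None):
    """Return the --hdfsBlockSize argument pair, or nothing to keep the default."""
    if mebibytes is None:
        return []                          # fall back to the default configuration
    if mebibytes <= 0:
        raise ValueError("block size must be positive")
    size_in_bytes = mebibytes * 1024 * 1024
    return ["--hdfsBlockSize", str(size_in_bytes)]


block_args = hdfs_block_size_args(128)
print(block_args)  # ['--hdfsBlockSize', '134217728']
```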