Converts any kind of ReadTools source to Distmap format.

Description

Converts any kind of ReadTools source to the Distmap format (Pandey & Schlötterer 2013). Barcode information (BC tag) is written into the read name (Illumina format) so that sample data is kept.
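To illustrate what carrying the barcode in the read name can look like, the sketch below splits an Illumina-style read name of the form `name#BARCODE/1` into its parts. This page does not spell out the exact naming convention, so the `#` separator, the `/1` or `/2` pair suffix, and the helper name `split_barcode` are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def split_barcode(read_name: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'name#BARCODE/1' into (name, barcode, pair_number).

    Assumes the barcode follows a '#' and the pair number follows a '/'.
    Parts that are absent are returned as None.
    """
    pair = None
    # Peel off an optional '/1' or '/2' pair suffix.
    if len(read_name) > 2 and read_name[-2] == "/" and read_name[-1] in "12":
        read_name, pair = read_name[:-2], read_name[-1]
    # Peel off an optional '#BARCODE' part.
    name, sep, barcode = read_name.partition("#")
    return name, (barcode if sep else None), pair


print(split_barcode("HWUSI-EAS100R:6:73:941:1973#ATCACG/1"))
```

A read name without a `#` simply yields `None` for the barcode, which is the case where sample information would be lost without the BC tag.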

If requested, it also applies a trimming/filtering pipeline to the reads.

Arguments

Required Arguments

| Argument name(s) | Type | Description |
| --- | --- | --- |
| --input, -I | String | BAM/SAM/CRAM/FASTQ source of reads. |
| --output, -O | String | Output in Distmap format. Expected to be in an HDFS file system. |

Optional Arguments

| Argument name(s) | Type | Default value(s) | Description |
| --- | --- | --- | --- |
| --arguments_file | List[File] | [] | Read one or more arguments files and add them to the command line. |
| --disable3pTrim, -D3PT | boolean | false | Disable 3’-trimming. Cannot be true when disable5pTrim (D5PT) is true. |
| --disable5pTrim, -D5PT | boolean | false | Disable 5’-trimming. May be useful for downstream marking of duplicate reads, which are usually identified by the 5’ mapping position. Cannot be true when disable3pTrim (D3PT) is true. |
| --gcs_max_retries, -gcs_retries | int | 20 | Number of times to attempt to re-initiate the connection if the GCS bucket channel errors out. |
| --help, -h | boolean | false | Display the help message. |
| --readFilter, -RF | List[String] | [] | Read filters to be applied in the distmap pipeline. |
| --trimmer, -TM | List[String] | [] | Trimmers to be applied in the distmap pipeline. |
| --version | boolean | false | Display the version number for this tool. |
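The constraint that disable3pTrim and disable5pTrim cannot both be true can be checked before the tool is launched. The following is a minimal sketch of a wrapper that assembles an argument list from the options documented above. The helper name `build_distmap_args`, the file paths, and the trimmer name `ExampleTrimmer` are hypothetical; repeating a list-valued argument once per value and passing booleans as `--flag true` are assumed conventions that this page does not confirm.

```python
from typing import List, Sequence


def build_distmap_args(
    input_path: str,
    output_path: str,
    trimmers: Sequence[str] = (),
    read_filters: Sequence[str] = (),
    disable_3p_trim: bool = False,
    disable_5p_trim: bool = False,
) -> List[str]:
    """Assemble the argument list, rejecting the documented invalid combination."""
    if disable_3p_trim and disable_5p_trim:
        raise ValueError("disable3pTrim and disable5pTrim cannot both be true")
    args = ["--input", input_path, "--output", output_path]
    # List-valued arguments are repeated once per value (assumed convention).
    for trimmer in trimmers:
        args += ["--trimmer", trimmer]
    for read_filter in read_filters:
        args += ["--readFilter", read_filter]
    # Boolean flags are passed with an explicit value (assumed convention).
    if disable_3p_trim:
        args += ["--disable3pTrim", "true"]
    if disable_5p_trim:
        args += ["--disable5pTrim", "true"]
    return args


# Hypothetical paths and trimmer name, for illustration only.
print(" ".join(build_distmap_args(
    "reads.bam",
    "hdfs:///user/example/reads.distmap",
    trimmers=["ExampleTrimmer"],
    disable_5p_trim=True,
)))
```

Failing early in the wrapper avoids submitting a job that the tool would reject on argument validation.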

Optional Common Arguments

| Argument name(s) | Type | Default value(s) | Description |
| --- | --- | --- | --- |
| --barcodeInReadName, -barcodeInReadName | boolean | false | Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files. |
| --forceEncoding, -forceEncoding | FastqQualityFormat | null | Force original quality encoding of the input files. Possible values: Solexa, Illumina, Standard |
| --forceOverwrite, -forceOverwrite | Boolean | false | Force overwriting the output if it exists. |
| --input2, -I2 | String | null | Second BAM/SAM/CRAM/FASTQ source of reads (for paired-end data). |
| --interleavedInput, -interleaved | boolean | false | Interleaved input. |
| --QUIET | Boolean | false | Whether to suppress job-summary info on System.err. |
| --rawBarcodeSequenceTags, -rawBarcodeSequenceTags | List[String] | [BC] | Include the barcodes encoded in the given tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: the tag(s) will be removed/updated as necessary. |
| --readValidationStringency, -VS | ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT |
| --reference, -R | String | null | Reference sequence file. Required for CRAM input. |
| --secondsBetweenProgressUpdates, -secondsBetweenProgressUpdates | double | 10.0 | Output traversal statistics every time this many seconds elapse. |
| --TMP_DIR | List[File] | [] | Undocumented option |
| --use_jdk_deflater, -jdk_deflater | boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater). |
| --use_jdk_inflater, -jdk_inflater | boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater). |
| --verbosity, -verbosity | LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG |
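To see why forcing the quality encoding can matter, the sketch below decodes the same quality string under two offsets. It assumes the common convention that Standard means Phred+33 and Illumina means Phred+64; Solexa uses a different score scale and is left out. The names `OFFSETS` and `decode_phred` are illustrative and not part of ReadTools.

```python
from typing import List

# Assumed ASCII offsets for two of the encodings named above.
OFFSETS = {"Standard": 33, "Illumina": 64}


def decode_phred(qualities: str, encoding: str) -> List[int]:
    """Decode a FASTQ quality string into integer scores for the given encoding."""
    offset = OFFSETS[encoding]
    return [ord(ch) - offset for ch in qualities]


# The same characters read under the right and the wrong encoding.
print(decode_phred("hhhh", "Illumina"))  # [40, 40, 40, 40]
print(decode_phred("hhhh", "Standard"))  # [71, 71, 71, 71]
```

If the encoding is guessed wrongly, every score is shifted by 31, which would distort any quality-based trimming or filtering applied afterwards.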

Advanced Arguments

| Argument name(s) | Type | Default value(s) | Description |
| --- | --- | --- | --- |
| --hdfsBlockSize, -hdfsBlockSize | Long | null | Block size (in bytes) for files in HDFS. If not provided, the default configuration is used. |
| --showHidden, -showHidden | boolean | false | Display hidden arguments. |
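Because hdfsBlockSize is given in bytes, it is easy to pass a value that is off by a factor of 1024. A small sketch of the conversion, assuming the size is chosen in MiB; the helper name `hdfs_block_size_args` is hypothetical.

```python
from typing import List


def hdfs_block_size_args(size_mib: int) -> List[str]:
    """Return the hdfsBlockSize argument pair for a block size given in MiB."""
    if size_mib <= 0:
        raise ValueError("block size must be positive")
    size_bytes = size_mib * 1024 * 1024
    return ["--hdfsBlockSize", str(size_bytes)]


print(hdfs_block_size_args(128))  # ['--hdfsBlockSize', '134217728']
```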