Assigns read groups based on barcode tag(s) for all kinds of sources for ReadTools.

Description

Assigns read groups (@RG) using the barcode information present in the raw barcode tag(s).

Read groups are assigned by matching the barcodes provided in the barcode file against the ones present in the tag(s), allowing mismatches and unknown bases (Ns) in the sequence. A barcode is matched only if its number of mismatches does not exceed the threshold set with --maximumMismatches. Ambiguous barcodes are discarded: these are barcodes whose best match is fewer than --minimumDistance mismatches (default: 1) apart from the second-best match. If several indexes are used and none of them uniquely identifies the read group, it is assigned by majority vote.
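The matching rules described above can be sketched for a single index as follows. This is a minimal illustration, not the ReadTools implementation: the function names `count_mismatches` and `assign_sample`, the example barcodes, and the assumption that all barcodes have the same length are hypothetical. Majority voting across several indexes is not covered.

```python
from typing import Dict, Optional


def count_mismatches(observed: str, expected: str, n_no_mismatch: bool = False) -> int:
    """Count positions that differ; assumes both sequences have equal length."""
    mismatches = 0
    for obs, exp in zip(observed.upper(), expected.upper()):
        if obs == exp:
            continue
        # Mirrors the idea of --nNoMismatch: unknown bases are not counted.
        if n_no_mismatch and "N" in (obs, exp):
            continue
        mismatches += 1
    return mismatches


def assign_sample(
    observed: str,
    barcodes: Dict[str, str],          # hypothetical mapping: sample name -> barcode
    max_mismatches: int = 0,           # idea behind --maximumMismatches
    min_distance: int = 1,             # idea behind --minimumDistance
    max_n: Optional[int] = None,       # idea behind --maximumN (None = no threshold)
    n_no_mismatch: bool = False,
) -> Optional[str]:
    """Return the matched sample name, or None if the barcode is discarded."""
    if max_n is not None and observed.upper().count("N") > max_n:
        return None
    scored = sorted(
        (count_mismatches(observed, seq, n_no_mismatch), name)
        for name, seq in barcodes.items()
    )
    if not scored:
        return None
    best_mismatches, best_name = scored[0]
    if best_mismatches > max_mismatches:
        return None  # no barcode is close enough
    if len(scored) > 1 and scored[1][0] - best_mismatches < min_distance:
        return None  # ambiguous: second-best match is too close to the best
    return best_name


barcodes = {"sampleA": "ACGTAC", "sampleB": "TTGTAC"}
print(assign_sample("ACGTAC", barcodes))                      # sampleA
print(assign_sample("ATGTAC", barcodes, max_mismatches=1))    # None (tie between both)
print(assign_sample("ACGTAN", barcodes, n_no_mismatch=True))  # sampleA
```

In the second call both barcodes are one mismatch away, so the difference between the best and second-best match is 0, which is below the minimum distance of 1, and the barcode is treated as ambiguous.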

Arguments

Required Arguments

Argument name(s) Type Description
--barcodeFile
-bc
String Tab-delimited file with a header for barcode sequences (‘barcode_sequence’ or ‘barcode_sequence_1’ for the first barcode, ‘barcode_sequence_$(number)’ for subsequent barcodes if more than one index is used), sample name (‘sample_name’ or ‘barcode_name’) and, optionally, library name (‘library_name’). The barcode file overrides any Read Group argument that provides the same information. WARNING: this file should contain all the barcodes present in the multiplexed file.
--input
-I
String BAM/SAM/CRAM/FASTQ source of reads.
--output
-O
String Output SAM/BAM/CRAM file prefix.
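
As an illustration of the barcode file layout described for --barcodeFile, the following sketch builds a small two-index table and parses it by header name. The sample names, library names, barcode sequences, the column order, and the helper names `barcode_index` and `read_barcode_table` are all invented for this example; only the header names come from the description above.

```python
import csv
import io

# Hypothetical barcode file content: tab-delimited, with a header line.
BARCODE_FILE = (
    "sample_name\tlibrary_name\tbarcode_sequence_1\tbarcode_sequence_2\n"
    "sampleA\tlib1\tACGTAC\tGGATCC\n"
    "sampleB\tlib1\tTTGTAC\tCCTAGG\n"
)


def barcode_index(column: str) -> int:
    """'barcode_sequence' counts as index 1; 'barcode_sequence_N' as index N."""
    suffix = column.rsplit("_", 1)[-1]
    return int(suffix) if suffix.isdigit() else 1


def read_barcode_table(text: str) -> list:
    """Parse the table into one dictionary per sample."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        columns = sorted(
            (c for c in record if c.startswith("barcode_sequence")),
            key=barcode_index,
        )
        rows.append(
            {
                # Either header name is accepted for the sample column.
                "sample": record.get("sample_name") or record.get("barcode_name"),
                "library": record.get("library_name"),  # optional column
                "barcodes": [record[c] for c in columns],
            }
        )
    return rows


for row in read_barcode_table(BARCODE_FILE):
    print(row["sample"], row["library"], row["barcodes"])
```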

Optional Arguments

Argument name(s) Type Default value(s) Description
--arguments_file List[File] [] read one or more arguments files and add them to the command line
--gcs_max_retries
-gcs_retries
int 20 If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--help
-h
boolean false display the help message
--keepDiscarded
-keepDiscarded
boolean false Keep reads that are not assigned to any record in a separate file.
--maximumMismatches
-mm
List[Integer] [0] Maximum number of mismatches allowed for a matched barcode. Specify more than once to apply a different threshold to each index.
--maximumN
-maxN
Integer null Maximum number of unknown bases (Ns) allowed in a barcode for it to be considered. If ‘null’, no threshold will be applied.
--minimumDistance
-md
List[Integer] [1] Minimum distance (difference in number of mismatches) between the best match and the second-best match. Specify more than once to apply a different threshold to each index.
--nNoMismatch
-nnm
boolean false Do not count unknown bases (Ns) as mismatches.
--RGCN
-CN
String null Read Group sequencing center name
--RGDT
-DT
Iso8601Date null Read Group run date
--RGLB
-LB
String null Read Group Library
--RGPI
-PI
Integer null Read Group predicted insert size
--RGPL
-PL
PlatformValue null Read Group platform (e.g. illumina, solid)

Possible values: CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT, ONT, PACBIO
--RGPM
-PM
String null Read Group platform model
--RGPU
-PU
String null Read Group platform unit (e.g. run barcode)
--runName
-runName
String null Run name to add to the ID in the read group information.
--splitLibraryName boolean false Split file by library.
--splitReadGroup boolean false Split file by read group.
--splitSample boolean false Split file by sample.
--version boolean false display the version number for this tool
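
Because --maximumMismatches and --minimumDistance are list arguments that are specified once per index, a wrapper script has to repeat the flag for each value. The sketch below assembles only the argument list; the launcher and tool name are left out because this page does not state them, and the helper name `build_arguments` and the file names are hypothetical. Passing the boolean as `--keepDiscarded true` is an assumption about the accepted syntax.

```python
from typing import List, Optional, Sequence


def build_arguments(
    input_path: str,
    barcode_file: str,
    output_prefix: str,
    max_mismatches: Sequence[int] = (0,),
    min_distance: Sequence[int] = (1,),
    max_n: Optional[int] = None,
    keep_discarded: bool = False,
) -> List[str]:
    """Build the argument list (without launcher or tool name)."""
    args = [
        "--input", input_path,
        "--barcodeFile", barcode_file,
        "--output", output_prefix,
    ]
    # List arguments: repeat the flag once per index.
    for value in max_mismatches:
        args += ["--maximumMismatches", str(value)]
    for value in min_distance:
        args += ["--minimumDistance", str(value)]
    if max_n is not None:
        args += ["--maximumN", str(max_n)]
    if keep_discarded:
        # Assumption: booleans may be given an explicit value.
        args += ["--keepDiscarded", "true"]
    return args


# Two indexes: allow 1 mismatch in the first and 2 in the second.
print(" ".join(build_arguments("reads.bam", "barcodes.tsv", "demultiplexed",
                               max_mismatches=(1, 2), min_distance=(1, 1))))
```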

Optional Common Arguments

Argument name(s) Type Default value(s) Description
--addOutputSAMProgramRecord
-addOutputSAMProgramRecord
boolean true If true, adds a PG tag to created SAM/BAM/CRAM files.
--barcodeInReadName
-barcodeInReadName
boolean false Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files.
--createOutputBamIndex
-OBI
boolean true If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.
--createOutputBamMD5
-OBM
boolean false If true, create an MD5 digest for any BAM/SAM/CRAM file created.
--forceEncoding
-forceEncoding
FastqQualityFormat null Force original quality encoding of the input files.

Possible values: Solexa, Illumina, Standard
--forceOverwrite
-forceOverwrite
Boolean false Force output overwriting if it exists
--input2
-I2
String null Second BAM/SAM/CRAM/FASTQ source of reads (if paired-end).
--interleavedInput
-interleaved
boolean false Interleaved input.
--outputFormat
-outputFormat
BamFormat BAM SAM/BAM/CRAM output format.

Possible values: SAM, BAM, CRAM
--QUIET Boolean false Whether to suppress job-summary info on System.err.
--rawBarcodeSequenceTags
-rawBarcodeSequenceTags
List[String] [BC] Include the barcodes encoded in these tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: these tag(s) will be removed/updated as necessary.
--readValidationStringency
-VS
ValidationStringency SILENT Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

Possible values: STRICT, LENIENT, SILENT
--reference
-R
String null Reference sequence file. Required for CRAM input.
--secondsBetweenProgressUpdates
-secondsBetweenProgressUpdates
double 10.0 Output traversal statistics every time this many seconds elapse.
--TMP_DIR List[File] [] Undocumented option
--use_jdk_deflater
-jdk_deflater
boolean false Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use_jdk_inflater
-jdk_inflater
boolean false Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity
-verbosity
LogLevel INFO Control verbosity of logging.

Possible values: ERROR, WARNING, INFO, DEBUG

Advanced Arguments

Argument name(s) Type Default value(s) Description
--showHidden
-showHidden
boolean false display hidden arguments