Assigns read groups based on barcode tag(s) for all kinds of sources supported by ReadTools.


Assigns read groups (@RG) using the barcode information present in the raw barcode tag(s).

Read groups are assigned by matching the barcodes present in the tag(s) against the ones provided in the barcode file, allowing mismatches and unknown bases (Ns) in the sequence. A barcode is considered matched when its number of mismatches does not exceed the threshold set with --maximumMismatches. Ambiguous barcodes are discarded: a barcode is ambiguous when the difference in mismatches between its best match and its second-best match is smaller than the minimum distance (one mismatch by default; change it with --minimumDistance). If several indexes are used and none of them uniquely identifies the read group, the read group is assigned by majority vote.
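The matching rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the ReadTools implementation: the function names (count_mismatches, best_match, majority_vote) are hypothetical, and treating a tied vote as "unassigned" is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def count_mismatches(observed, expected, n_as_mismatch=True):
    """Count positions where the observed barcode differs from the expected one."""
    mismatches = 0
    for obs, exp in zip(observed.upper(), expected.upper()):
        if obs == "N":
            # Unknown bases count as mismatches unless the caller disables it.
            if n_as_mismatch:
                mismatches += 1
        elif obs != exp:
            mismatches += 1
    return mismatches


def best_match(observed, candidates, max_mismatches=0, min_distance=1,
               n_as_mismatch=True):
    """Return the name of the matched barcode, or None if unmatched or ambiguous.

    candidates maps a barcode name to its expected sequence for one index.
    """
    scored = sorted(
        (count_mismatches(observed, seq, n_as_mismatch), name)
        for name, seq in candidates.items()
    )
    if not scored:
        return None
    best_score, best_name = scored[0]
    if best_score > max_mismatches:
        return None  # too many mismatches to count as a match
    if len(scored) > 1 and scored[1][0] - best_score < min_distance:
        return None  # second-best barcode is too close: ambiguous
    return best_name


def majority_vote(votes):
    """Pick the name proposed by most indexes (assumption: a tie gives None)."""
    counts = Counter(vote for vote in votes if vote is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For example, with candidates "ACGTAC" and "TTGCAA", the observed barcode "ACGTAA" is unmatched at the default threshold of 0 mismatches, but matches the first candidate once one mismatch is allowed.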


Required Arguments

Argument name(s) Type Description
String Tab-delimited file with a header line, containing columns for the barcode sequences (‘barcode_sequence’ or ‘barcode_sequence_1’ for the first barcode, ‘barcode_sequence_$(number)’ for subsequent ones if more than one index is used), the sample name (‘sample_name’ or ‘barcode_name’) and, optionally, the library name (‘library_name’). Values in the barcode file override any Read Group arguments that provide the same information. WARNING: this file should contain all the barcodes present in the multiplexed file.
String BAM/SAM/CRAM/FASTQ source of reads.
String Output SAM/BAM/CRAM file prefix.
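
As an illustration of the barcode file layout described above, the following Python sketch builds a small two-index example and parses it. The column names come from the description; the sample names, library names, barcode sequences, and the read_barcode_file helper are made up for the example and are not part of ReadTools.

```python
import csv
import io

# Hypothetical barcode file with two indexes; fields are separated by tabs.
EXAMPLE = (
    "sample_name\tlibrary_name\tbarcode_sequence_1\tbarcode_sequence_2\n"
    "sample1\tlib1\tACGTAC\tGGTTAA\n"
    "sample2\tlib1\tTTGCAA\tCCAATT\n"
)


def read_barcode_file(handle):
    """Parse a tab-delimited barcode file into a list of dictionaries."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # The sample column may be called 'sample_name' or 'barcode_name'.
        sample = row.get("sample_name") or row.get("barcode_name")
        if not sample:
            raise ValueError("missing 'sample_name' or 'barcode_name' column")
        # The first barcode may be 'barcode_sequence' or 'barcode_sequence_1'.
        first = row.get("barcode_sequence") or row.get("barcode_sequence_1")
        if not first:
            raise ValueError("missing first barcode sequence column")
        barcodes = [first]
        index = 2
        # Collect barcode_sequence_2, barcode_sequence_3, ... while present.
        while row.get(f"barcode_sequence_{index}"):
            barcodes.append(row[f"barcode_sequence_{index}"])
            index += 1
        records.append({
            "sample": sample,
            "library": row.get("library_name") or None,  # optional column
            "barcodes": barcodes,
        })
    return records


records = read_barcode_file(io.StringIO(EXAMPLE))
```

Each record then holds one sample with its optional library and one barcode per index, which is the shape a matcher would need when several indexes are used.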

Optional Arguments

Argument name(s) Type Default value(s) Description
--arguments_file List[File] [] read one or more arguments files and add them to the command line
int 20 If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
boolean false display the help message
boolean false Keep reads that are not assigned to any record in a separate file.
List[Integer] [0] Maximum number of mismatches allowed for a matched barcode. Specify more than once to apply a different threshold to each index.
Integer null Maximum number of unknown bases (Ns) allowed in a barcode for it to be considered. If ‘null’, no threshold will be applied.
List[Integer] [1] Minimum distance (difference in number of mismatches) between the best match and the second-best match. Specify more than once to apply a different threshold to each index.
boolean false Do not count unknown bases (Ns) as mismatches.
String null Read Group sequencing center name
Iso8601Date null Read Group run date
String null Read Group Library
Integer null Read Group predicted insert size
PlatformValue null Read Group platform (e.g. illumina, solid)
String null Read Group platform model
String null Read Group platform unit (e.g. run barcode)
String null Run name to add to the ID in the read group information.
--splitLibraryName boolean false Split file by library.
--splitReadGroup boolean false Split file by read group.
--splitSample boolean false Split file by sample.
--version boolean false display the version number for this tool

Optional Common Arguments

Argument name(s) Type Default value(s) Description
boolean true If true, adds a PG tag to created SAM/BAM/CRAM files.
boolean false Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files.
boolean true If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.
boolean false If true, create an MD5 digest for any BAM/SAM/CRAM file created
FastqQualityFormat null Force original quality encoding of the input files.

Possible values: Solexa, Illumina, Standard
Boolean false Force output overwriting if it exists
String null Second BAM/SAM/CRAM/FASTQ source of reads (if paired-end).
boolean false Interleaved input.
BamFormat BAM SAM/BAM/CRAM output format.

Possible values: SAM, BAM, CRAM
--QUIET Boolean false Whether to suppress job-summary info on System.err.
List[String] [BC] Include the barcodes encoded in these tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: these tag(s) will be removed/updated as necessary.
ValidationStringency SILENT Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

Possible values: STRICT, LENIENT, SILENT
String null Reference sequence file. Required for CRAM input.
double 10.0 Output traversal statistics every time this many seconds elapse.
--TMP_DIR List[File] [] Undocumented option
boolean false Whether to use the JdkDeflater (as opposed to IntelDeflater)
boolean false Whether to use the JdkInflater (as opposed to IntelInflater)
LogLevel INFO Control verbosity of logging.

Possible values: ERROR, WARNING, INFO, DEBUG

Advanced Arguments

Argument name(s) Type Default value(s) Description
boolean false display hidden arguments