Assigns read groups based on barcode tag(s) for all kinds of sources supported by ReadTools.
Description
Assigns read groups (@RG) using the barcode information present in the raw barcode tag(s).
Read groups are assigned by matching the barcodes provided in the barcode file against the ones
present in the tag(s), allowing mismatches and unknown bases (Ns) in the sequence. A barcode
matches when it has at most the number of mismatches specified with --maximumMismatches.
Ambiguous barcodes are discarded: a match is ambiguous when the best barcode is not far enough
from the second best barcode (by default at least one mismatch of difference is required; change
it with --minimumDistance).
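The matching rule described above can be sketched as follows. This is a minimal illustration, not the ReadTools implementation: the function names are hypothetical, barcodes are assumed to have equal length, and the parameters only mirror the meaning of --maximumMismatches, --minimumDistance, --maximumN and --nNoMismatch as documented in the argument tables.

```python
def count_mismatches(observed, expected, n_no_mismatch=False):
    """Count positions where two equal-length barcodes differ."""
    if len(observed) != len(expected):
        raise ValueError("barcodes must have the same length")
    mismatches = 0
    for obs, exp in zip(observed.upper(), expected.upper()):
        if obs == exp:
            continue
        # Mirrors --nNoMismatch: optionally ignore unknown bases (Ns).
        if n_no_mismatch and "N" in (obs, exp):
            continue
        mismatches += 1
    return mismatches


def best_barcode(observed, known, max_mismatches=0, min_distance=1,
                 max_n=None, n_no_mismatch=False):
    """Return the matching known barcode, or None if unmatched or ambiguous."""
    if not known:
        return None
    # Mirrors --maximumN: skip barcodes with too many unknown bases.
    if max_n is not None and observed.upper().count("N") > max_n:
        return None
    scored = sorted((count_mismatches(observed, k, n_no_mismatch), k) for k in known)
    best_score, best = scored[0]
    # Mirrors --maximumMismatches: the best match must be close enough.
    if best_score > max_mismatches:
        return None
    # Mirrors --minimumDistance: the second best must be far enough away.
    if len(scored) > 1 and scored[1][0] - best_score < min_distance:
        return None
    return best
```

For example, with known barcodes `AAAA` and `AATT`, the observed barcode `AAAT` is one mismatch away from both, so this sketch treats it as ambiguous and returns `None` even when one mismatch is allowed.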
If several indexes are used and none of them uniquely identifies the read group, the read group
is assigned by majority vote.
Warning: If several barcodes are present and one of them
uniquely identifies the read group, that read group is assigned directly. Thus, it is recommended
to provide all the barcodes present in the library in the barcode file (--barcodeFile).
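One way to read the multi-index rule is sketched below. It is an interpretation under stated assumptions, not the tool's code: `assign_read_group` is a hypothetical helper that receives, for each index, the set of read-group names whose barcode matched at that index. Returning `None` on a tie, and letting the first unique index win, are assumptions made for the example; the documentation does not say how those cases are handled.

```python
from collections import Counter


def assign_read_group(candidates_per_index):
    """Pick a read group from one set of candidate names per index."""
    # An index that points to exactly one read group decides directly
    # (assumption: the first such index wins).
    for candidates in candidates_per_index:
        if len(candidates) == 1:
            return next(iter(candidates))
    # Otherwise every index votes for each of its candidates.
    votes = Counter(name for candidates in candidates_per_index
                    for name in candidates)
    ranked = votes.most_common(2)
    if not ranked:
        return None  # no index matched anything
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: treated as unassigned (assumption)
    return ranked[0][0]
```

The direct-assignment branch shows why a complete barcode file matters: if a barcode present in the library is missing from the file, an index may appear to identify a read group uniquely when it does not.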
Note: For paired-end reads, only one read is used to assign the barcode.
Arguments
Required Arguments
Argument name(s) | Type | Description |
---|---|---|
--barcodeFile -bc | String | Tab-delimited file with header for barcode sequences (‘barcode_sequence’ or ‘barcode_sequence_1’ for the first barcode, ‘barcode_sequence_$(number)’ for subsequent ones if more than one index is used), sample name (‘sample_name’ or ‘barcode_name’) and, optionally, library name (‘library_name’). The barcode file overrides any Read Group argument that carries the same information. WARNING: this file should contain all the barcodes present in the multiplexed file. |
--input -I | String | BAM/SAM/CRAM/FASTQ source of reads. |
--output -O | String | Output SAM/BAM/CRAM file prefix. |
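The barcode file layout described for --barcodeFile can be illustrated with a small parser. This is a sketch only: the column names come from the description above, while the barcode sequences, sample names, and the `read_barcode_table` helper are made up for the example.

```python
import csv
import io

# Hypothetical two-index barcode file (tab-delimited, with header).
EXAMPLE = (
    "barcode_sequence_1\tbarcode_sequence_2\tsample_name\tlibrary_name\n"
    "ACGTAC\tTTGCAA\tsample1\tlib1\n"
    "GATTCA\tCCAGTG\tsample2\tlib1\n"
)


def read_barcode_table(handle):
    """Parse a barcode file into a list of dictionaries, one per row."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # The first barcode may be named with or without the "_1" suffix.
        first = row.get("barcode_sequence") or row.get("barcode_sequence_1")
        barcodes = [first]
        number = 2
        # Subsequent indexes use barcode_sequence_2, barcode_sequence_3, ...
        while row.get(f"barcode_sequence_{number}"):
            barcodes.append(row[f"barcode_sequence_{number}"])
            number += 1
        records.append({
            "barcodes": barcodes,
            "sample": row.get("sample_name") or row.get("barcode_name"),
            "library": row.get("library_name"),  # optional column
        })
    return records


records = read_barcode_table(io.StringIO(EXAMPLE))
```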
Optional Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--arguments_file | List[File] | [] | Read one or more arguments files and add them to the command line. |
--gcs_max_retries -gcs_retries | int | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection. |
--help -h | boolean | false | Display the help message. |
--keepDiscarded -keepDiscarded | boolean | false | Keep reads that are not assigned to any record in a separate file. |
--maximumMismatches -mm | List[Integer] | [0] | Maximum number of mismatches allowed for a matched barcode. Specify more than once to apply a different threshold to each index. |
--maximumN -maxN | Integer | null | Maximum number of unknown bases (Ns) allowed in a barcode for it to be considered. If ‘null’, no threshold is applied. |
--minimumDistance -md | List[Integer] | [1] | Minimum distance (difference in number of mismatches) between the best match and the second best. Specify more than once to apply a different threshold to each index. |
--nNoMismatch -nnm | boolean | false | Do not count unknown bases (Ns) as mismatches. |
--RGCN -CN | String | null | Read Group sequencing center name. |
--RGDT -DT | Iso8601Date | null | Read Group run date. |
--RGLB -LB | String | null | Read Group library. |
--RGPI -PI | Integer | null | Read Group predicted insert size. |
--RGPL -PL | PlatformValue | null | Read Group platform (e.g. illumina, solid). Possible values: CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT, ONT, PACBIO |
--RGPM -PM | String | null | Read Group platform model. |
--RGPU -PU | String | null | Read Group platform unit (e.g. run barcode). |
--runName -runName | String | null | Run name to add to the ID in the read group information. |
--splitLibraryName | boolean | false | Split file by library. |
--splitReadGroup | boolean | false | Split file by read group. |
--splitSample | boolean | false | Split file by sample. |
--version | boolean | false | Display the version number for this tool. |
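The sketch below shows how the list-valued arguments might be repeated for a two-index run. Only the argument names come from the tables above; the jar name `ReadTools.jar`, the tool name `AssignReadGroupByBarcode`, the file names, and the `build_command` helper are assumptions for illustration. The command is only assembled here, not executed.

```python
import shlex


def build_command(barcode_file, input_path, output_prefix,
                  max_mismatches=(0,), min_distance=(1,), split_sample=False):
    """Assemble an argument list; jar and tool names are assumptions."""
    cmd = ["java", "-jar", "ReadTools.jar", "AssignReadGroupByBarcode",
           "--barcodeFile", barcode_file,
           "--input", input_path,
           "--output", output_prefix]
    # Repeat the argument once per index to set per-index thresholds.
    for value in max_mismatches:
        cmd += ["--maximumMismatches", str(value)]
    for value in min_distance:
        cmd += ["--minimumDistance", str(value)]
    if split_sample:
        # Passing an explicit value for the boolean argument.
        cmd += ["--splitSample", "true"]
    return cmd


command = build_command("barcodes.tsv", "reads.bam", "demultiplexed",
                        max_mismatches=(1, 2), min_distance=(1, 1),
                        split_sample=True)
print(shlex.join(command))
```

Building the command as a list keeps each value a separate argument, so paths containing spaces do not need manual quoting if the list is later passed to `subprocess.run`.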
Optional Common Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--addOutputSAMProgramRecord -addOutputSAMProgramRecord | boolean | true | If true, adds a PG tag to created SAM/BAM/CRAM files. |
--barcodeInReadName -barcodeInReadName | boolean | false | Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files. |
--createOutputBamIndex -OBI | boolean | true | If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. |
--createOutputBamMD5 -OBM | boolean | false | If true, create an MD5 digest for any BAM/SAM/CRAM file created. |
--forceEncoding -forceEncoding | FastqQualityFormat | null | Force original quality encoding of the input files. Possible values: Solexa, Illumina, Standard |
--forceOverwrite -forceOverwrite | Boolean | false | Force overwriting the output if it exists. |
--input2 -I2 | String | null | Second BAM/SAM/CRAM/FASTQ source of reads (if paired-end). |
--interleavedInput -interleaved | boolean | false | Interleaved input. |
--outputFormat -outputFormat | BamFormat | BAM | SAM/BAM/CRAM output format. Possible values: SAM, BAM, CRAM |
--QUIET | Boolean | false | Whether to suppress job-summary info on System.err. |
--rawBarcodeSequenceTags -rawBarcodeSequenceTags | List[String] | [BC] | Include the barcodes encoded in these tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: these tag(s) will be removed/updated as necessary. |
--readValidationStringency -VS | ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT |
--reference -R | String | null | Reference sequence file. Required for CRAM input. |
--secondsBetweenProgressUpdates -secondsBetweenProgressUpdates | double | 10.0 | Output traversal statistics every time this many seconds elapse. |
--TMP_DIR | List[File] | [] | Undocumented option |
--use_jdk_deflater -jdk_deflater | boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater). |
--use_jdk_inflater -jdk_inflater | boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater). |
--verbosity -verbosity | LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG |
Advanced Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--showHidden -showHidden | boolean | false | Display hidden arguments. |