Standardizes quality and format for all kinds of sources for ReadTools.
Description
General tool to standardize any kind of read source (both raw and mapped reads).
This tool outputs a SAM/BAM/CRAM file as defined in the SAM specifications:
- Quality encoding: the standard quality encoding is Sanger. The input encoding is detected automatically, but it can be forced with --forceEncoding.
- Raw barcodes: the standard barcode tags are BC for the sequence and QT for the quality. To correctly handle information in a SAM/BAM/CRAM file with misencoded barcode tags, one of the following options can be used:
  - Barcodes in the read name: use the --barcodeInReadName option. This may be useful for files produced by mapping a multiplexed library stored as FASTQ files.
  - Barcodes in different tag(s): use --rawBarcodeSequenceTags. This may be useful if the barcode is present in a different tag (e.g., when using illumina2bam with dual indexing, the second index will be in the B2 tag).
- FASTQ file(s): the output is an unmapped SAM/BAM/CRAM file with the quality header added to the CO tag. The raw barcode is extracted from the read name, if present, independently of the read name encoding (Casava or Illumina legacy). With the Casava read name encoding, the PF binary tag is also updated.
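To illustrate what standardizing the quality encoding means, here is a minimal Python sketch that re-encodes a Phred+64 quality string (the older Illumina encoding) as Sanger (Phred+33). It is not the tool's implementation; the function name is hypothetical, and the Solexa encoding, which uses a different score scale, is not covered.

```python
def phred64_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Sanger (Phred+33)."""
    converted = []
    for char in quality:
        score = ord(char) - 64  # Phred score under the +64 offset
        if score < 0:
            raise ValueError(f"{char!r} is not a valid Phred+64 quality character")
        converted.append(chr(score + 33))  # same score, Sanger offset
    return "".join(converted)


print(phred64_to_sanger("hhhhB@"))  # prints IIII#!
```

Only the ASCII offset changes here; the Phred scores themselves are preserved.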
Warning: If several barcode indexes are present, barcodes are separated by hyphens and qualities by spaces, as defined in the SAM specifications.
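The read-name extraction and the multi-index convention described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: the regular expressions cover only the common Casava and Illumina legacy name layouts, multiple indexes in the read name are assumed to be hyphen-separated, and both function names are hypothetical.

```python
import re

# Casava-style name: "<id> <read>:<filtered Y/N>:<control>:<barcode>"
CASAVA = re.compile(r"^(\S+) [12]:[YN]:\d+:([ACGTN]+(?:-[ACGTN]+)*)$")
# Illumina legacy name: "<id>#<barcode>/<read>"
LEGACY = re.compile(r"^([^#\s]+)#([ACGTN]+(?:-[ACGTN]+)*)(?:/[12])?$")


def barcodes_from_read_name(read_name):
    """Return the barcode sequences found in a read name (empty list if none)."""
    for pattern in (CASAVA, LEGACY):
        match = pattern.match(read_name)
        if match:
            return match.group(2).split("-")
    return []


def barcode_tags(sequences, qualities=None):
    """Build BC/QT values: sequences joined by hyphens, qualities by spaces."""
    tags = {"BC": "-".join(sequences)}
    if qualities is not None:
        if [len(q) for q in qualities] != [len(s) for s in sequences]:
            raise ValueError("each quality string must match its barcode length")
        tags["QT"] = " ".join(qualities)
    return tags


indexes = barcodes_from_read_name("read1#ACGT-TTGA/2")
print(barcode_tags(indexes, ["IIII", "HHHH"]))
# prints {'BC': 'ACGT-TTGA', 'QT': 'IIII HHHH'}
```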
Note: FASTQ files do not require the --barcodeInReadName option.
Arguments
Required Arguments
Argument name(s) | Type | Description |
---|---|---|
--input -I | String | BAM/SAM/CRAM/FASTQ source of reads. |
--output -O | String | Output SAM/BAM/CRAM file. |
Optional Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--arguments_file | List[File] | [] | read one or more arguments files and add them to the command line |
--gcs_max_retries -gcs_retries | int | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection |
--help -h | boolean | false | display the help message |
--version | boolean | false | display the version number for this tool |
Optional Common Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--addOutputSAMProgramRecord -addOutputSAMProgramRecord | boolean | true | If true, adds a PG tag to created SAM/BAM/CRAM files. |
--barcodeInReadName -barcodeInReadName | boolean | false | Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files. |
--createOutputBamIndex -OBI | boolean | true | If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. |
--createOutputBamMD5 -OBM | boolean | false | If true, create an MD5 digest for any BAM/SAM/CRAM file created. |
--forceEncoding -forceEncoding | FastqQualityFormat | null | Force original quality encoding of the input files. Possible values: Solexa, Illumina, Standard |
--forceOverwrite -forceOverwrite | Boolean | false | Force output overwriting if it exists |
--input2 -I2 | String | null | Second BAM/SAM/CRAM/FASTQ source of reads (if paired-end). |
--interleavedInput -interleaved | boolean | false | Interleaved input. |
--QUIET | Boolean | false | Whether to suppress job-summary info on System.err. |
--rawBarcodeQualityTag -rawBarcodeQualityTag | List[String] | [] | Use the qualities encoded in this tag(s) as raw barcode qualities. Requires --rawBarcodeSequenceTags. WARNING: this tag(s) will be removed/updated as necessary. |
--rawBarcodeSequenceTags -rawBarcodeSequenceTags | List[String] | [BC] | Include the barcodes encoded in this tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: this tag(s) will be removed/updated as necessary. |
--readValidationStringency -VS | ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT |
--reference -R | String | null | Reference sequence file. Required for CRAM input. |
--secondsBetweenProgressUpdates -secondsBetweenProgressUpdates | double | 10.0 | Output traversal statistics every time this many seconds elapse. |
--TMP_DIR | List[File] | [] | Undocumented option |
--use_jdk_deflater -jdk_deflater | boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
--use_jdk_inflater -jdk_inflater | boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
--verbosity -verbosity | LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG |
Advanced Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--showHidden -showHidden | boolean | false | display hidden arguments |
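
As a rough usage illustration, the Python sketch below assembles (without running) a command line from the arguments documented above. The jar file name `ReadTools.jar`, the tool name `StandardizeReads`, and the helper function itself are assumptions made for this example and do not come from this page. It also assumes that list-valued arguments such as --rawBarcodeSequenceTags are supplied by repeating the flag.

```python
import shlex


def standardize_reads_command(input_path, output_path, *, input2=None,
                              force_encoding=None, barcode_tags=(),
                              jar="ReadTools.jar", tool="StandardizeReads"):
    """Assemble, but do not run, an argument list for the tool."""
    # ASSUMPTION: the jar and tool names are illustrative defaults.
    command = ["java", "-jar", jar, tool,
               "--input", input_path, "--output", output_path]
    if input2 is not None:
        command += ["--input2", input2]
    if force_encoding is not None:
        # Values listed for --forceEncoding in the table above.
        if force_encoding not in ("Solexa", "Illumina", "Standard"):
            raise ValueError(f"unsupported encoding: {force_encoding}")
        command += ["--forceEncoding", force_encoding]
    # ASSUMPTION: list arguments are passed by repeating the flag.
    for tag in barcode_tags:
        command += ["--rawBarcodeSequenceTags", tag]
    return command


cmd = standardize_reads_command("reads_1.fq", "standardized.bam",
                                input2="reads_2.fq", force_encoding="Illumina")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting issues once the real jar path and tool name are filled in.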