Standardizes quality and format for all kinds of sources for ReadTools.

Description

General tool for standardizing any kind of read source (both raw and mapped reads).

This tool outputs a SAM/BAM/CRAM file as defined in the SAM specifications:

  • Quality encoding: the standard quality encoding is Sanger. The input encoding is detected automatically, but it can be forced with --forceEncoding.
  • Raw barcodes: the standard barcode tags are BC for the sequence and QT for the quality. To correctly handle information in a SAM/BAM/CRAM file with misencoded barcode tags, one of the following options can be used:
    • Barcodes in the read name: use the --barcodeInReadName option. This may be useful for files produced by mapping a multiplexed library stored as FASTQ files.
    • Barcodes in different tag(s): use --rawBarcodeSequenceTags. This may be useful if the barcode is present in a different tag (e.g., when using illumina2bam with dual indexing, the second index will be in the B2 tag).
  • FASTQ file(s): the output is an unmapped SAM/BAM/CRAM file with the quality header added to the CO tag. The raw barcode is extracted from the read name if present, regardless of the read name encoding (Casava or Illumina legacy). With the Casava read name encoding, the PF binary tag is also updated.
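
To make the two read name encodings mentioned above concrete, the following is a minimal Python sketch of how a barcode and the filter status could be pulled out of a FASTQ read name. It is not the ReadTools implementation: the function name parse_read_name, the regular expressions, and the returned keys are assumptions made for illustration. It assumes Casava 1.8-style names ("name pair:filtered:control:barcode") and Illumina legacy names ("name#barcode/pair").

```python
import re

# Casava 1.8 style: "<name> <pair>:<filtered Y/N>:<control number>:<barcode>"
CASAVA = re.compile(
    r"^(?P<name>\S+) (?P<pair>[12]):(?P<filtered>[YN]):\d+:(?P<barcode>[ACGTN+-]*)$"
)
# Illumina legacy style: "<name>#<barcode>/<pair>" (the pair suffix is optional)
LEGACY = re.compile(r"^(?P<name>[^#\s]+)#(?P<barcode>[ACGTN+-]+)(?:/(?P<pair>[12]))?$")


def parse_read_name(header: str) -> dict:
    """Split a FASTQ read name (without the leading '@') into its parts.

    'fails_filter' is True/False for Casava names and None when the
    encoding carries no filter information.
    """
    match = CASAVA.match(header)
    if match:
        return {
            "name": match["name"],
            "barcode": match["barcode"] or None,
            "fails_filter": match["filtered"] == "Y",
        }
    match = LEGACY.match(header)
    if match:
        return {"name": match["name"], "barcode": match["barcode"], "fails_filter": None}
    # Unknown encoding: keep the first whitespace-separated token as the name.
    tokens = header.split()
    return {"name": tokens[0] if tokens else header, "barcode": None, "fails_filter": None}


if __name__ == "__main__":
    print(parse_read_name("EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
    print(parse_read_name("HWUSI-EAS100R:6:73:941:1973#ATCACG/1"))
```

In this sketch only the Casava form yields a filter status, which is the information that would feed the PF flag; the legacy form has no such field.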

Arguments

Required Arguments

Argument name(s) Type Description
--input, -I String BAM/SAM/CRAM/FASTQ source of reads.
--output, -O String Output SAM/BAM/CRAM file.
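
As a usage illustration, the sketch below assembles a command line from the two required arguments. The jar file name ("ReadTools.jar") and the tool name ("StandardizeReads") are assumptions not stated on this page, and the helper standardize_command is hypothetical; only the --input and --output flags come from the table above.

```python
from typing import List, Sequence


def standardize_command(
    input_path: str,
    output_path: str,
    jar: str = "ReadTools.jar",       # assumed jar location
    tool: str = "StandardizeReads",   # assumed tool name
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Build an argument list suitable for subprocess.run (not executed here)."""
    command = ["java", "-jar", jar, tool, "--input", input_path, "--output", output_path]
    command.extend(extra_args)  # any optional arguments, passed through unchanged
    return command


if __name__ == "__main__":
    print(" ".join(standardize_command("reads.fastq", "reads.bam")))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.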

Optional Arguments

Argument name(s) Type Default value(s) Description
--arguments_file List[File] [] read one or more arguments files and add them to the command line
--gcs_max_retries, -gcs_retries int 20 If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--help, -h boolean false display the help message
--version boolean false display the version number for this tool

Optional Common Arguments

Argument name(s) Type Default value(s) Description
--addOutputSAMProgramRecord, -addOutputSAMProgramRecord boolean true If true, adds a PG tag to created SAM/BAM/CRAM files.
--barcodeInReadName, -barcodeInReadName boolean false Use the barcode encoded in SAM/BAM/CRAM read names. Note: this is not necessary for input FASTQ files.
--createOutputBamIndex, -OBI boolean true If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.
--createOutputBamMD5, -OBM boolean false If true, create an MD5 digest for any BAM/SAM/CRAM file created.
--forceEncoding, -forceEncoding FastqQualityFormat null Force original quality encoding of the input files.

Possible values: Solexa, Illumina, Standard
--forceOverwrite, -forceOverwrite Boolean false Force overwriting of the output if it exists.
--input2, -I2 String null Second BAM/SAM/CRAM/FASTQ source of reads (for paired-end data).
--interleavedInput, -interleaved boolean false Interleaved input.
--QUIET Boolean false Whether to suppress job-summary info on System.err.
--rawBarcodeQualityTag, -rawBarcodeQualityTag List[String] [] Use the qualities encoded in these tag(s) as raw barcode qualities. Requires --rawBarcodeSequenceTags. WARNING: these tag(s) will be removed/updated as necessary.
--rawBarcodeSequenceTags, -rawBarcodeSequenceTags List[String] [BC] Include the barcodes encoded in these tag(s) in the read name. Note: this is not necessary for input FASTQ files. WARNING: these tag(s) will be removed/updated as necessary.
--readValidationStringency, -VS ValidationStringency SILENT Validation stringency for all SAM/BAM/CRAM files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

Possible values: STRICT, LENIENT, SILENT
--reference, -R String null Reference sequence file. Required for CRAM input.
--secondsBetweenProgressUpdates, -secondsBetweenProgressUpdates double 10.0 Output traversal statistics every time this many seconds elapse.
--TMP_DIR List[File] [] Undocumented option
--use_jdk_deflater, -jdk_deflater boolean false Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use_jdk_inflater, -jdk_inflater boolean false Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity, -verbosity LogLevel INFO Control verbosity of logging.

Possible values: ERROR, WARNING, INFO, DEBUG
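
The --forceEncoding values listed in this table correspond to the usual FASTQ quality conventions: Standard (Sanger, Phred scores at ASCII offset 33), Illumina (Phred scores at offset 64), and Solexa (odds-based scores at offset 64). The Python sketch below shows what re-encoding a quality string to Sanger involves under those conventions. It is an illustration only, not the code ReadTools runs; the function names are hypothetical.

```python
import math

SANGER_OFFSET = 33   # Standard (Sanger) ASCII offset
LEGACY_OFFSET = 64   # offset shared by the Illumina and Solexa encodings


def solexa_to_phred(q_solexa: int) -> int:
    """Convert an odds-based Solexa score to the nearest Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def to_sanger(quality: str, encoding: str) -> str:
    """Re-encode a quality string as Sanger, given its original encoding."""
    if encoding == "Standard":
        return quality  # already Sanger, nothing to do
    if encoding not in ("Illumina", "Solexa"):
        raise ValueError(f"unknown encoding: {encoding}")
    scores = [ord(char) - LEGACY_OFFSET for char in quality]
    if encoding == "Solexa":
        scores = [solexa_to_phred(score) for score in scores]
    return "".join(chr(score + SANGER_OFFSET) for score in scores)


if __name__ == "__main__":
    print(to_sanger("hhhh", "Illumina"))  # IIII
```

Because the Illumina and Solexa ranges overlap with each other and partly with Sanger, automatic detection can be ambiguous for some inputs, which is the situation in which forcing the encoding helps.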

Advanced Arguments

Argument name(s) Type Default value(s) Description
--showHidden, -showHidden boolean false display hidden arguments