Computes proper-paired reads statistics over windows

Description

Computes statistics for properly-paired reads over non-overlapping windows.

Statistics are computed only for proper pairs (both reads mapped on the same contig). A statistic may nevertheless use information from a single read (e.g., ContainIndelCounter, which counts the number of reads in the window that contain indels) or from both reads of the pair (e.g., PairIntegerTagCounter for NM<2, which counts the number of reads in the window for which both reads of the pair have fewer than 2 mismatches stored in the NM tag).
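
The two kinds of statistic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's implementation: reads are hypothetical (name, contig, position, has_indel, NM) tuples, positions are 1-based, each read is assigned to the window containing its start position, and windows are labelled contig:start-end. The function names are invented for the example.

```python
from collections import defaultdict


def window_label(contig, pos, window_size):
    """Label (contig:start-end) of the non-overlapping window holding a 1-based position."""
    index = (pos - 1) // window_size
    start = index * window_size + 1
    return f"{contig}:{start}-{start + window_size - 1}"


def count_per_window(reads, window_size, nm_threshold=2):
    """Count reads with indels and reads from pairs where both mates have NM < nm_threshold.

    `reads` is an iterable of hypothetical (name, contig, pos, has_indel, nm) tuples.
    """
    # Group reads by name so that both mates of a pair can be inspected together.
    by_name = defaultdict(list)
    for read in reads:
        by_name[read[0]].append(read)

    indel_counts = defaultdict(int)
    pair_nm_counts = defaultdict(int)
    for mates in by_name.values():
        # Skip reads whose mate is missing and pairs mapped on different contigs.
        if len(mates) != 2 or mates[0][1] != mates[1][1]:
            continue
        both_pass = all(mate[4] < nm_threshold for mate in mates)
        for _, contig, pos, has_indel, _ in mates:
            label = window_label(contig, pos, window_size)
            if has_indel:          # single-read statistic
                indel_counts[label] += 1
            if both_pass:          # statistic that needs both reads of the pair
                pair_nm_counts[label] += 1
    return dict(indel_counts), dict(pair_nm_counts)
```

Note that a read without its mate contributes to neither count in this sketch, even to the single-read statistic, which mirrors the requirement for paired-end data.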

Caveats

  • Paired-end data is required, even when computing only single-read statistics.
  • Coordinate-sorted SAM/BAM/CRAM is required.
  • Intervals are not allowed in this tool. The statistics are computed over the genome.
  • It is recommended that the file include all the paired-end data (not only a subset of the reads). Otherwise, reads whose mate is missing are not used for the statistics.

Arguments

Required Arguments

Argument name(s) Type Description
--count-pair-int-tag-list List[String] Integer SAM tag to count for pairs
--count-pair-int-tag-operator-list List[RelationalOperator] Operation for the integer SAM tag (with respect to the threshold). Should be specified the same number of times as --count-pair-int-tag-list
--count-pair-int-tag-threshold-list List[Integer] Threshold for the integer SAM tag (with respect to the operation). Should be specified the same number of times as --count-pair-int-tag-list
--input, -I List[String] BAM/SAM/CRAM file containing reads
--output, -O String Tab-delimited output file with the statistics for each window. A header defines the order of the statistics; the first column is the window in the form contig:start-end.
--window-size Integer Window size to perform the analysis
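
The three --count-pair-int-tag-* arguments form parallel lists: the n-th tag is compared with the n-th threshold using the n-th operator. The Python sketch below illustrates that pairing only. The operator names in OPERATORS are hypothetical (the accepted RelationalOperator values are not listed on this page), and the helper names are invented for the example.

```python
import operator

# Hypothetical operator names; the real RelationalOperator values may differ.
OPERATORS = {
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
    "GE": operator.ge,
    "GT": operator.gt,
}


def build_tag_filters(tags, operators, thresholds):
    """Zip the three parallel argument lists into (tag, comparison, threshold) triples."""
    if not (len(tags) == len(operators) == len(thresholds)):
        raise ValueError("tag, operator and threshold must be given the same number of times")
    return [(tag, OPERATORS[op], thr) for tag, op, thr in zip(tags, operators, thresholds)]


def pair_passes(tag_filter, first_tags, second_tags):
    """True if both reads of the pair carry the tag and satisfy the comparison."""
    tag, compare, threshold = tag_filter
    return all(
        tag in read_tags and compare(read_tags[tag], threshold)
        for read_tags in (first_tags, second_tags)
    )
```

For instance, build_tag_filters(["NM"], ["LT"], [2]) would describe the NM<2 counter mentioned in the description, and a pair passes only when both of its reads satisfy it.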

Optional Arguments

Argument name(s) Type Default value(s) Description
--arguments_file List[File] [] read one or more arguments files and add them to the command line
--cloud-index-prefetch-buffer, -CIPB int -1 Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset.
--cloud-prefetch-buffer, -CPB int 40 Size of the cloud-only prefetch buffer (in MB; 0 to disable).
--contig List[String] [] Limit the computation to the provided contig(s). This argument is used instead of interval arguments and might be removed in the future if intervals are supported.
--disable-bam-index-caching, -DBIC boolean false If true, don’t cache BAM indexes; this reduces memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified.
--disable-sequence-dictionary-validation, -disable-sequence-dictionary-validation boolean false If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk!
--gcs_max_retries, -gcs_retries int 20 If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--help, -h boolean false display the help message
--interval-merging-rule, -imr IntervalMergingRule ALL By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not actually overlap) into a single continuous interval. However you can change this behavior if you want them to be treated as separate intervals instead. Possible values: ALL, OVERLAPPING_ONLY
--intervals, -L List[String] [] One or more genomic intervals over which to operate
--reference, -R String null Reference sequence
--sites-only-vcf-output boolean false If true, don’t emit genotype fields when writing vcf file output.
--stat Set[Statistic] [] Statistics to compute (currently only for single-reads)
--version boolean false display the version number for this tool
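
The difference between the two --interval-merging-rule values can be illustrated with a short Python sketch. It is a simplified model, not the underlying implementation: intervals are hypothetical (start, end) tuples on a single contig with 1-based inclusive coordinates, and merge_intervals is a name invented for the example.

```python
def merge_intervals(intervals, rule="ALL"):
    """Merge (start, end) intervals on one contig (1-based, inclusive).

    rule="ALL" also merges abutting intervals such as (1, 10) and (11, 20);
    rule="OVERLAPPING_ONLY" merges only intervals that share at least one base.
    """
    # Abutting intervals are separated by a gap of exactly one coordinate step.
    allowed_gap = 1 if rule == "ALL" else 0
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + allowed_gap:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Under these assumptions, (1, 10) and (11, 20) collapse into (1, 20) with ALL but stay separate with OVERLAPPING_ONLY.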

Optional Common Arguments

Argument name(s) Type Default value(s) Description
--add-output-sam-program-record, -add-output-sam-program-record boolean true If true, adds a PG tag to created SAM/BAM/CRAM files.
--add-output-vcf-command-line, -add-output-vcf-command-line boolean true If true, adds a command line header line to created VCF files.
--create-output-bam-index, -OBI boolean true If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.
--create-output-bam-md5, -OBM boolean false If true, create an MD5 digest for any BAM/SAM/CRAM file created.
--create-output-variant-index, -OVI boolean true If true, create a VCF index when writing a coordinate-sorted VCF file.
--create-output-variant-md5, -OVM boolean false If true, create an MD5 digest for any VCF file created.
--disableReadFilter, -DF List[String] [] Read filters to be disabled before analysis
--disableToolDefaultReadFilters, -disableToolDefaultReadFilters boolean false Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on)
--exclude-intervals, -XL List[String] [] Use this argument to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -XL 1 or -XL 1:100-200) or by loading in a file containing a list of intervals (e.g. -XL myFile.intervals).
--forceOverwrite, -forceOverwrite Boolean false Force output overwriting if it exists
--interval-exclusion-padding, -ixp int 0 Use this to add padding to the intervals specified using -XL. For example, ‘-XL 1:100’ with a padding value of 20 would turn into ‘-XL 1:80-120’. This is typically used to add padding around targets when analyzing exomes.
--interval-padding, -ip int 0 Use this to add padding to the intervals specified using -L. For example, ‘-L 1:100’ with a padding value of 20 would turn into ‘-L 1:80-120’. This is typically used to add padding around targets when analyzing exomes.
--interval-set-rule, -isr IntervalSetRule UNION By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. E.g. to perform the analysis only on chromosome 1 exomes, you could specify -L exomes.intervals -L 1 --interval-set-rule INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will always be merged using UNION). Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set. Possible values: UNION, INTERSECTION
--lenient, -LE boolean false Lenient processing of VCF files
--QUIET Boolean false Whether to suppress job-summary info on System.err.
--read-index, -read-index List[String] [] Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically.
--read-validation-stringency, -VS ValidationStringency SILENT Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT
--readFilter, -RF List[String] [] Read filters to be applied before analysis
--seconds-between-progress-updates, -seconds-between-progress-updates double 10.0 Output traversal statistics every time this many seconds elapse
--sequence-dictionary, -sequence-dictionary String null Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file.
--TMP_DIR List[File] [] Undocumented option
--use_jdk_deflater, -jdk_deflater boolean false Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use_jdk_inflater, -jdk_inflater boolean false Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity, -verbosity LogLevel INFO Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG
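
The padding arithmetic described for --interval-padding and --interval-exclusion-padding (1:100 with a padding of 20 becoming 1:80-120) can be written out as a small Python sketch. The function name pad_interval is invented for the example, only the contig:pos and contig:start-end forms are handled, and clamping the start at position 1 is an assumption rather than documented behavior.

```python
def pad_interval(interval, padding):
    """Pad a 'contig:pos' or 'contig:start-end' interval on both sides."""
    # Split on the last ':' so contig names that contain ':' are kept whole.
    contig, _, span = interval.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    # A single position is treated as an interval of length one.
    end = int(end_text) if end_text else start
    # Assumption: the padded start never goes below the first base of the contig.
    return f"{contig}:{max(1, start - padding)}-{end + padding}"
```

The sketch does not clamp the padded end to the contig length, which would require the sequence dictionary.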

Advanced Arguments

Argument name(s) Type Default value(s) Description
--do-not-print-all boolean false If set, skip printing windows with 0 reads
--showHidden, -showHidden boolean false display hidden arguments