Computes proper-paired reads statistics over windows
Warning: This a EXPERIMENTAL tool and should not be used for production
Description
Computes statistics for properly-paired reads over non-overlapping windows.
Statistics are computed only for proper reads (mapped on the same contig). Nevertheless,
statistics might use only single-read information (e.g., ContainIndelCounter
,
which counts the number of reads on the window containing indels) or from both pairs (e.g.,
PairIntegerTagCounter
for NM<2 would count the number of reads on the window where
both reads on the pair has more than 2 mismatches stored in the NM tag).
Caveats
- Pair-end data is required even for computing only single read statistics.
- Coordinate-sorted SAM/BAM/CRAM is required.
- Intervals are not allowed in this tool. The statistics are computed over the genome.
- It is recommended that the file includes all the pair-end data (not only a subset of the reads). Otherwise, missing pairs would not be used for the statistics.
Warning: Please, note that disabling default read filters on this tool will produce
wrong results.
Note: In this tool, proper pairs are defined as mapping on the same contig, without
taking into consideration the SAM flag (0x2).
Arguments
Required Arguments
Argument name(s) | Type | Description |
---|---|---|
--count-pair-int-tag-list |
List[String] | Integer SAM tag to count for pairs |
--count-pair-int-tag-operator-list |
List[RelationalOperator] | Operation for the integer SAM tag (with respect to the threshold). Should be specified the same number of times as –count-pair-int-tag-list |
--count-pair-int-tag-threshold-list |
List[Integer] | Threshold for the integer SAM tag (with respect to the operation). Should be specified the same number of times as –count-pair-int-tag-list |
--input -I |
List[String] | BAM/SAM/CRAM file containing reads |
--output -O |
String | Tab-delimited output file with the statistic over the windows. A header defines the order of each statistic and the first column the window in the form contig:start-end. |
--window-size |
Integer | Window size to perform the analysis |
Optional Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--arguments_file |
List[File] | [] | read one or more arguments files and add them to the command line |
--cloud-index-prefetch-buffer -CIPB |
int | -1 | Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset. |
--cloud-prefetch-buffer -CPB |
int | 40 | Size of the cloud-only prefetch buffer (in MB; 0 to disable). |
--contig |
List[String] | [] | Limit the computation to the provided contig(s). This argument is used instead of interval arguments and might be removed in the future if intervals are supported. |
--disable-bam-index-caching -DBIC |
boolean | false | If true, don’t cache bam indexes, this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified. |
--disable-sequence-dictionary-validation -disable-sequence-dictionary-validation |
boolean | false | If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk! |
--gcs_max_retries -gcs_retries |
int | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection |
--help -h |
boolean | false | display the help message |
--interval-merging-rule -imr |
IntervalMergingRule | ALL | By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not actually overlap) into a single continuous interval. However you can change this behavior if you want them to be treated as separate intervals instead. Possible values: ALL, OVERLAPPING_ONLY |
--intervals -L |
List[String] | [] | One or more genomic intervals over which to operate |
--reference -R |
String | null | Reference sequence |
--sites-only-vcf-output |
boolean | false | If true, don’t emit genotype fields when writing vcf file output. |
--stat |
Set[Statistic] | [] | Statistics to compute (currently only for single-reads) |
--version |
boolean | false | display the version number for this tool |
Optional Common Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--add-output-sam-program-record -add-output-sam-program-record |
boolean | true | If true, adds a PG tag to created SAM/BAM/CRAM files. |
--add-output-vcf-command-line -add-output-vcf-command-line |
boolean | true | If true, adds a command line header line to created VCF files. |
--create-output-bam-index -OBI |
boolean | true | If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. |
--create-output-bam-md5 -OBM |
boolean | false | If true, create a MD5 digest for any BAM/SAM/CRAM file created |
--create-output-variant-index -OVI |
boolean | true | If true, create a VCF index when writing a coordinate-sorted VCF file. |
--create-output-variant-md5 -OVM |
boolean | false | If true, create a a MD5 digest any VCF file created. |
--disableReadFilter -DF |
List[String] | [] | Read filters to be disabled before analysis |
--disableToolDefaultReadFilters -disableToolDefaultReadFilters |
boolean | false | Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on) |
--exclude-intervals -XL |
List[String] | [] | Use this argument to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -XL 1 or -XL 1:100-200) or by loading in a file containing a list of intervals (e.g. -XL myFile.intervals). |
--forceOverwrite -forceOverwrite |
Boolean | false | Force output overwriting if it exists |
--interval-exclusion-padding -ixp |
int | 0 | Use this to add padding to the intervals specified using -XL. For example, ‘-XL 1:100’ with a padding value of 20 would turn into ‘-XL 1:80-120’. This is typically used to add padding around targets when analyzing exomes. |
--interval-padding -ip |
int | 0 | Use this to add padding to the intervals specified using -L. For example, ‘-L 1:100’ with a padding value of 20 would turn into ‘-L 1:80-120’. This is typically used to add padding around targets when analyzing exomes. |
--interval-set-rule -isr |
IntervalSetRule | UNION | By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. E.g. to perform the analysis only on chromosome 1 exomes, you could specify -L exomes.intervals -L 1 –interval-set-rule INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will always be merged using UNION). Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set. Possible values: UNION, INTERSECTION |
--lenient -LE |
boolean | false | Lenient processing of VCF files |
--QUIET |
Boolean | false | Whether to suppress job-summary info on System.err. |
--read-index -read-index |
List[String] | [] | Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically. |
--read-validation-stringency -VS |
ValidationStringency | SILENT | Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Possible values: STRICT, LENIENT, SILENT |
--readFilter -RF |
List[String] | [] | Read filters to be applied before analysis |
--seconds-between-progress-updates -seconds-between-progress-updates |
double | 10.0 | Output traversal statistics every time this many seconds elapse |
--sequence-dictionary -sequence-dictionary |
String | null | Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file. |
--TMP_DIR |
List[File] | [] | Undocumented option |
--use_jdk_deflater -jdk_deflater |
boolean | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
--use_jdk_inflater -jdk_inflater |
boolean | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
--verbosity -verbosity |
LogLevel | INFO | Control verbosity of logging. Possible values: ERROR, WARNING, INFO, DEBUG |
Advanced Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--do-not-print-all |
boolean | false | If set, skip printing windows with 0 reads |
--showHidden -showHidden |
boolean | false | display hidden arguments |