Filter out reads that are over-soft-clipped
Description
Filter out reads where the number of aligned (non-soft-clipped) bases, i.e. bases covered by the M, I, X, and = CIGAR operators, is lower than a threshold.
This filter is intended to remove reads that potentially originate from foreign organisms. When sequencing human DNA, we have found cases of contamination by bacterial organisms; the symptom of such contamination is a class of reads with only a small number of aligned bases and many soft-clipped bases.
Note: Consecutive soft-clipped blocks are treated as a single block. For example, 1S2S10M1S2S is treated as 3S10M3S.
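The filtering rule described above can be sketched in a few lines of Python. This is a minimal illustration written from the description on this page, not the tool's actual source: the function name `is_overclipped`, its parameter names, and the choice to count soft-clipped blocks (rather than inspect each end of the read explicitly) are assumptions made for the example.

```python
import re

# One CIGAR element: a length followed by an operator character.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators counted as "aligned bases" in the description above.
ALIGNED_OPS = {"M", "I", "X", "="}


def is_overclipped(cigar, min_aligned_bases=30, dont_require_both_ends=False):
    """Return True if a read with this CIGAR string would be filtered out.

    Hypothetical helper: the defaults mirror --filterTooShort (30) and
    --dontRequireSoftClipsBothEnds (false) from the arguments table.
    """
    aligned = 0
    soft_clip_blocks = 0
    prev_op = None
    for length, op in CIGAR_ELEMENT.findall(cigar):
        if op == "S":
            # Consecutive S elements count as a single soft-clipped block.
            if prev_op != "S":
                soft_clip_blocks += 1
        elif op in ALIGNED_OPS:
            aligned += int(length)
        prev_op = op
    # Assumption: "both ends" is approximated as "at least two S blocks".
    required_blocks = 1 if dont_require_both_ends else 2
    return aligned < min_aligned_bases and soft_clip_blocks >= required_blocks
```

For instance, under these assumptions `is_overclipped("1S2S10M1S2S")` treats the read as 3S10M3S, finds 10 aligned bases and two soft-clipped blocks, and reports the read as filtered. A read such as `40S20M` has only one soft-clipped block, so it is filtered only when `dont_require_both_ends=True`.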
Arguments
Optional Arguments
Argument name(s) | Type | Default value(s) | Description |
---|---|---|---|
--dontRequireSoftClipsBothEnds | boolean | false | Allow a read to be filtered out based on having only 1 soft-clipped block. By default, both ends must have a soft-clipped block; setting this flag requires only 1 soft-clipped block. |
--filterTooShort | int | 30 | Minimum number of aligned bases |