Filter out reads that are over-soft-clipped

Description

Filter out reads where the number of aligned (non-soft-clipped) bases, counted over the M, I, X, and = CIGAR operators, is lower than a threshold.

This filter is intended to remove reads that potentially come from foreign organisms. When sequencing human DNA, we have found cases of contamination by bacterial organisms; the symptom of such contamination is a class of reads with only a small number of aligned bases and many soft-clipped bases.

Note: Consecutive soft-clipped blocks are treated as a single block. For example, 1S2S10M1S2S is treated as 3S10M3S.
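
The counting described above can be sketched in Python as follows. This is only an illustration built from the description, not the filter's actual source: the function name summarize_cigar and the regular-expression parsing of a CIGAR string are assumptions made for the example.

```python
import re

# CIGAR operators that the description counts as aligned (non-soft-clipped) bases.
ALIGNED_OPS = {"M", "I", "X", "="}

# One CIGAR element: a length followed by a single operator character.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Return (aligned_bases, soft_clip_blocks) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    aligned_bases = 0
    soft_clip_blocks = 0
    previous_op = None
    for length, op in _CIGAR_ELEMENT.findall(cigar):
        if op == "S":
            # Consecutive S elements are merged into a single block.
            if previous_op != "S":
                soft_clip_blocks += 1
        elif op in ALIGNED_OPS:
            aligned_bases += int(length)
        previous_op = op
    return aligned_bases, soft_clip_blocks


# The example from the note: both forms give the same summary.
print(summarize_cigar("1S2S10M1S2S"))  # (10, 2)
print(summarize_cigar("3S10M3S"))      # (10, 2)
```

Merging consecutive S elements while scanning, rather than rewriting the CIGAR first, keeps the sketch to a single pass over the elements.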

Arguments

Optional Arguments

Argument name(s) Type Default value(s) Description
--dontRequireSoftClipsBothEnds boolean false Allow a read to be filtered out when it has only one soft-clipped block. By default, both ends of the read must have a soft-clipped block; setting this flag requires only one soft-clipped block.
--filterTooShort int 30 Minimum number of aligned bases
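
How the two arguments might combine into a decision can be sketched as below. This is an assumption-laden illustration, not the tool's implementation: the function is_filtered_out and its parameter names are hypothetical, the strict "lower than" comparison follows the wording of the description, and "both ends soft-clipped" is approximated as "at least two soft-clipped blocks".

```python
def is_filtered_out(aligned_bases, soft_clip_blocks,
                    filter_too_short=30,
                    dont_require_soft_clips_both_ends=False):
    """Return True if a read would be removed under the assumed rule.

    aligned_bases: number of M, I, X, and = bases in the read's CIGAR.
    soft_clip_blocks: number of soft-clipped blocks, with consecutive
        blocks already merged into one.
    """
    # Assumption: "both ends" means at least two soft-clipped blocks.
    required_blocks = 1 if dont_require_soft_clips_both_ends else 2
    too_short = aligned_bases < filter_too_short
    return too_short and soft_clip_blocks >= required_blocks


# 10 aligned bases, clipped at both ends: removed with the defaults.
print(is_filtered_out(10, 2))  # True
# 10 aligned bases, clipped at one end only: kept unless the flag is set.
print(is_filtered_out(10, 1))  # False
print(is_filtered_out(10, 1, dont_require_soft_clips_both_ends=True))  # True
```

Under this reading, a read with enough aligned bases is never removed, no matter how many soft-clipped blocks it has.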