Keep only reads that are well-formed

Description

Tests whether a read is "well-formed", that is, free of major internal inconsistencies and issues that could cause errors downstream. If a read passes this filter, the rest of the engine should be able to process it without failing.

Well-formed reads definition

  • Alignment coordinates: start larger than 0 and end after the start position.
  • Alignment agrees with header: contig exists and start is within its range.
  • Read Group and Sequence are present.
  • Consistent read length: bases match in length with the qualities and the CIGAR string.
  • No skipped regions: the CIGAR string does not contain the 'N' operator, which represents a skipped region.
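
The checks listed above can be illustrated with a minimal Python sketch. This is not the engine's actual implementation: the `Read` class, the `header` dictionary (contig name to contig length), and the `is_well_formed` function are hypothetical names introduced only for illustration. The sketch assumes mapped reads with 1-based inclusive coordinates, and it interprets "end after the start position" as `end >= start`, which is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# CIGAR operators that consume bases of the read (per the SAM format).
QUERY_CONSUMING = set("MIS=X")
CIGAR_VALID = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Read:
    """Hypothetical, simplified read record used only for this sketch."""
    contig: str
    start: int            # 1-based, inclusive (assumption)
    end: int              # 1-based, inclusive (assumption)
    bases: str
    quals: List[int]
    cigar: str
    read_group: Optional[str]


def is_well_formed(read: Read, header: Dict[str, int]) -> bool:
    """Return True if the read passes every check in the list above."""
    # Alignment coordinates: start > 0 and end not before start (assumed reading).
    if read.start <= 0 or read.end < read.start:
        return False
    # Alignment agrees with header: contig exists and start is within its range.
    if read.contig not in header or read.start > header[read.contig]:
        return False
    # Read group and sequence are present.
    if not read.read_group or not read.bases:
        return False
    # The CIGAR string must be parseable before it can be compared.
    if not CIGAR_VALID.fullmatch(read.cigar):
        return False
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(read.cigar)]
    # Consistent read length: bases, qualities and CIGAR agree.
    cigar_read_len = sum(n for n, op in ops if op in QUERY_CONSUMING)
    if len(read.bases) != len(read.quals) or len(read.bases) != cigar_read_len:
        return False
    # No skipped regions: the 'N' operator must not appear.
    if any(op == "N" for _, op in ops):
        return False
    return True
```

A caller would build the `header` mapping once and keep only the reads for which `is_well_formed` returns `True`. Returning early on the first failed check keeps the later checks simple, since each one can rely on the earlier ones having passed.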

See additional information in the following pages: