How ReadTools handles barcode information
ReadTools automatically handles barcode information stored in the read name of FASTQ files that use Illumina legacy or Casava formatting.
For SAM/BAM/CRAM files, we assume that the sample barcode is encoded in the ‘BC’ tag as described in the SAM specifications.
If these files were generated from FASTQ files without ReadTools, the barcode may still be attached to the read name. In that case, use the --barcodeInReadName option for those inputs.
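To illustrate what "barcode in the read name" means for the two FASTQ naming styles, here is a minimal Python sketch. It is not ReadTools code: the function name `extract_barcode` and both regular expressions are assumptions made for illustration. They cover the common layouts `name#BARCODE/1` (Illumina legacy) and `name 1:N:0:BARCODE` (Casava), and they only accept barcodes written as nucleotide sequences.

```python
import re

# Illumina legacy style: @name#BARCODE/1 (the pair suffix is optional).
LEGACY = re.compile(
    r"^@?(?P<name>[^#\s]+)#(?P<barcode>[ACGTN+_-]+)(?:/(?P<pair>[12]))?$"
)
# Casava style: @name 1:N:0:BARCODE (pair : filter flag : control bits : barcode).
CASAVA = re.compile(
    r"^@?(?P<name>\S+)\s+(?P<pair>[12]):[YN]:\d+:(?P<barcode>[ACGTN+_-]+)$"
)


def extract_barcode(header):
    """Return (read_name, barcode) for a FASTQ header line, or None if absent."""
    header = header.strip()
    for pattern in (CASAVA, LEGACY):
        match = pattern.match(header)
        if match:
            return match.group("name"), match.group("barcode")
    return None


legacy = extract_barcode("@HWUSI-EAS100R:6:73:941:1973#ATCACG/1")
casava = extract_barcode("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(legacy)  # ('HWUSI-EAS100R:6:73:941:1973', 'ATCACG')
print(casava)  # ('EAS139:136:FC706VJ:2:2104:15343:197393', 'ATCACG')
```

A header without a recognizable barcode, such as `@read1`, yields `None` in this sketch.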
Barcode file format
The ReadTools barcode file follows the same format as the ExtractIlluminaBarcodes tool from Picard: a tab-delimited table with named columns. It may contain other columns, but only the following are used for assigning Read Groups by barcode:
- barcode_sequence or barcode_sequence_1: sequence for the first barcode.
- barcode_sequence_2: sequence for the second barcode in dual indexed libraries. Only required if more than one barcode is expected.
- sample_name or barcode_name: name for the sample, which will appear in the SM Read Group field.
- library_name: the name for the library, which will appear in the LB Read Group field.
sample_name  library_name  barcode_sequence  barcode_sequence_2
sample1      lib1          ATTACTCG          ATAGAGGC
sample2      lib2          TCCGGAGA          CCTATCCT
sample3      lib3          CGCTCATT          GGCTCTGA
sample4      lib4          GAGATTCC          AGGCGAAG
sample5      lib5          ATTCAGAA          TAATCTTA
sample6      lib1          GAATTCGT          CAGGACGT
sample7      lib2          CTGAAGCT          GTACTGAC
sample8      lib3          TAATGCGC          TATAGCCT
sample9      lib4          CGGCTATG          ATAGAGGC
sample1      lib5          TCCGCGAA          CCTATCCT
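As a rough illustration of how the columns above map to Read Group fields, the following Python sketch loads a tab-delimited barcode file into a dictionary keyed by barcode combination. It is not how ReadTools itself reads the file: `load_barcodes` and the returned dictionary layout are hypothetical, and the sketch only applies the column names listed above.

```python
import csv
import io


def load_barcodes(handle):
    """Map each barcode combination to its SM and LB Read Group values."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        # The first barcode may be stored under either column name.
        first = row.get("barcode_sequence") or row.get("barcode_sequence_1")
        if not first:
            raise ValueError("missing barcode_sequence or barcode_sequence_1")
        # The second barcode is only present for dual-indexed libraries.
        second = row.get("barcode_sequence_2")
        key = (first, second) if second else (first,)
        sample = row.get("sample_name") or row.get("barcode_name")
        table[key] = {"SM": sample, "LB": row.get("library_name")}
    return table


# Two rows taken from the example above, written with real tab characters.
example = (
    "sample_name\tlibrary_name\tbarcode_sequence\tbarcode_sequence_2\n"
    "sample1\tlib1\tATTACTCG\tATAGAGGC\n"
    "sample2\tlib2\tTCCGGAGA\tCCTATCCT\n"
)
read_groups = load_barcodes(io.StringIO(example))
print(read_groups[("ATTACTCG", "ATAGAGGC")])  # {'SM': 'sample1', 'LB': 'lib1'}
```

Keying on the tuple of barcodes keeps single-indexed and dual-indexed files on the same code path; a single-barcode file simply produces one-element keys.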