Long Ranger 2.0, printed on 05/16/2022
The longranger run pipeline outputs an indexed BAM file containing position-sorted, aligned reads. Each read in this BAM file has Chromium barcode and phasing information attached. The following assumes basic familiarity with the BAM format. More details on the SAM/BAM standard are available online.
Chromium barcode information for each read is stored as TAG fields:
|BX|Z|Chromium barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences. Use this for analysis.|
|BC|Z|Sample index (I7) read.|
|QT|Z|Sample index (I7) read quality. Phred scores as reported by the sequencer.|
|RX|Z|Raw Chromium barcode sequence. This read is subject to sequencing errors. Do not use for analysis.|
|QX|Z|Raw Chromium barcode read quality. Phred scores as reported by the sequencer.|
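As a rough illustration of how these TAG fields can be read, the sketch below extracts the optional fields from one SAM-formatted text line, such as a line printed by `samtools view`. The helper name `parse_sam_tags` and the example record are hypothetical and invented for illustration. The sketch assumes the standard SAM layout, where optional `TAG:TYPE:VALUE` fields start at the 12th tab-separated column.

```python
def parse_sam_tags(sam_line):
    """Return the optional fields of one SAM text line as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        # Type "i" is an integer; everything else is kept as a string here.
        tags[tag] = int(value) if typ == "i" else value
    return tags


# Hypothetical record: the sequences and values are invented for illustration.
example = "\t".join([
    "read1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
    "ACGT", "FFFF",
    "BX:Z:ACGTACGTACGTACGT-1", "HP:i:1",
])
tags = parse_sam_tags(example)
print(tags["BX"], tags["HP"])
```

In practice a BAM library would normally handle this parsing; the plain-text version is shown only to make the field layout explicit.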
The BX tag value ends with a suffix consisting of a dash separator followed by a number.
This number denotes what we call a GEM group, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM chip channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
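The two options described above (keeping the suffix as part of the identifier, or parsing it out) can be sketched as follows. The function name `split_bx` and the barcode sequences are hypothetical; the only assumption taken from the text is that a BX value is a barcode sequence, a dash, and an integer GEM group.

```python
def split_bx(bx_value):
    """Split a BX value into (barcode_sequence, gem_group)."""
    # rpartition splits on the last dash, so the suffix is always the final part.
    sequence, sep, suffix = bx_value.rpartition("-")
    if not sep:
        raise ValueError(f"BX value has no GEM group suffix: {bx_value!r}")
    return sequence, int(suffix)


# Invented barcodes: the same sequence observed in two GEM groups.
bx_a = "ACGTACGTACGTACGT-1"
bx_b = "ACGTACGTACGTACGT-2"

# Option 1: keep the full value as the unique barcode identifier.
distinct_as_identifiers = bx_a != bx_b

# Option 2: parse the suffix out to recover the bare sequence.
seq_a, group_a = split_bx(bx_a)
seq_b, group_b = split_bx(bx_b)
print(seq_a == seq_b, group_a, group_b)
```

When samples from several GEM chip channels are combined, keeping the full value avoids merging reads that share a sequence but come from different GEM groups.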
The following tags will also be present on reads that were confidently assigned to a haplotype.
|PS|Z|Phase set containing this read.|
|HP|i|Haplotype of the molecule that generated the read.|
|MI|i|Global molecule identifier for the molecule that generated this read.|
Phase sets, defined in the VCF standard, are regions within which identified haplotypes are mutually consistent. As a result, HP tags are only comparable between reads that share a PS. By definition, adjacent phase sets lack sufficient Linked-Reads to determine the relationship between their haplotypes.
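One way to respect this rule in downstream code is to key every haplotype comparison on the pair (PS, HP) rather than on HP alone. The sketch below is a minimal illustration under that assumption. Reads are represented as plain dicts of already-parsed tags, the function names are hypothetical, and the PS values are invented and treated as opaque keys.

```python
from collections import defaultdict


def group_by_phase(reads):
    """Group read names by (PS, HP); reads missing either tag are skipped."""
    groups = defaultdict(list)
    for name, tags in reads:
        if "PS" in tags and "HP" in tags:
            groups[(tags["PS"], tags["HP"])].append(name)
    return dict(groups)


def same_haplotype(tags_a, tags_b):
    """Return True or False when comparable, None when not comparable."""
    for tags in (tags_a, tags_b):
        if "PS" not in tags or "HP" not in tags:
            return None  # read was not confidently assigned to a haplotype
    if tags_a["PS"] != tags_b["PS"]:
        return None  # different phase sets: HP values are unrelated
    return tags_a["HP"] == tags_b["HP"]


# Invented example reads.
reads = [
    ("read1", {"PS": 1000, "HP": 1}),
    ("read2", {"PS": 1000, "HP": 2}),
    ("read3", {"PS": 5000, "HP": 1}),
    ("read4", {}),  # unphased read, carries no PS or HP tag
]
groups = group_by_phase(reads)
print(same_haplotype(reads[0][1], reads[2][1]))
```

Here `read1` and `read3` both carry HP 1, but because their phase sets differ the comparison returns `None` instead of claiming they share a haplotype.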