Long Ranger 1.3, printed on 11/24/2024
The principal output of the longranger run pipeline includes aligned reads with barcode and phasing information in BAM format, phased SNPs and indels in VCF format, and SV calls and candidates in BEDPE format. These are all standard file formats designed to interoperate with existing tools, and the additional information produced by the GemCode Platform is included as standards-compliant fields when appropriate.
The following information assumes you are already familiar with the VCF format. A full description of the VCF standard is available online.
Long Ranger encodes SNP and indel phasing information in the GT (genotype) and PS (phase set) fields specified by the VCF standard.
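As a rough illustration of how these two fields can be read, the sketch below pulls GT and PS out of a single tab-delimited VCF data line using only the Python standard library. The function name `phase_info` and the sample record are hypothetical and are not part of Long Ranger; the sketch relies only on the VCF conventions that the FORMAT column is the ninth column, that `|` separates phased alleles while `/` separates unphased ones, and that `.` marks a missing value.

```python
def phase_info(vcf_line, sample_index=0):
    """Return (alleles, phased, phase_set) for one sample of a VCF data line.

    Hypothetical helper for illustration; not part of Long Ranger.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")                    # FORMAT column, e.g. GT:PS
    values = cols[9 + sample_index].split(":")   # per-sample values
    fields = dict(zip(keys, values))

    gt = fields.get("GT", ".")
    phased = "|" in gt                           # "|" means phased, "/" unphased
    alleles = [None if a == "." else int(a)
               for a in gt.replace("|", "/").split("/")]

    ps = fields.get("PS", ".")
    phase_set = None if ps in (".", "") else int(ps)
    return alleles, phased, phase_set


# Made-up record, used only to exercise the function.
record = "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:1000"
alleles, phased, phase_set = phase_info(record)
```

Two phased variants belong to the same haplotype block only when they share the same PS value, so downstream code would typically group records by `phase_set` before comparing allele order.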
Long Ranger also stores barcode information supporting the phasing for each variant in the BX field, using a comma-delimited string of the form:
BC_STRINGref,BC_STRINGalt1,BC_STRINGalt2,...
Typically only the reference allele (BC_STRINGref) and one alternate allele (BC_STRINGalt1) will be defined.
Each BC_STRING is a semicolon-delimited list of underscore-delimited entries, one per barcode:
BC1_QUAL1-1_QUAL1-2_...;BC2_QUAL2-1_QUAL2-2...
Here BC1 is the first barcode, and QUAL1-1 and QUAL1-2 are the observed Phred qualities, one per read, attaching that barcode to the given allele.
For example, a BX field that contains
AAAA_40_38;CCCC_40,GGGG_39
encodes two BC_STRINGs: one for the reference allele (AAAA_40_38;CCCC_40) and one for the alternate allele (GGGG_39).
The reference allele has three reads supporting it: two with the barcode AAAA (one with a Phred score of 40 and one with a Phred score of 38) and one with the barcode CCCC and a Phred score of 40.
The alternate allele has one read supporting it, with the barcode GGGG and a Phred score of 39.
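The BX layout described above can be decoded with a few string splits. The following is a minimal sketch, assuming the field value has already been extracted from the VCF record as a plain string; the function name `parse_bx` is hypothetical and is not a Long Ranger API. It returns one dictionary per allele (reference first, then alternates), mapping each barcode to its list of Phred qualities.

```python
def parse_bx(bx):
    """Decode a BX string into a list of {barcode: [qualities]} dicts.

    Hypothetical helper for illustration. The list is ordered as
    reference allele first, then alternate alleles.
    """
    alleles = []
    for bc_string in bx.split(","):          # one BC_STRING per allele
        support = {}
        for entry in bc_string.split(";"):   # one entry per barcode
            if not entry:                    # allele with no supporting barcodes
                continue
            barcode, *quals = entry.split("_")
            support[barcode] = [int(q) for q in quals]
        alleles.append(support)
    return alleles


ref, alt = parse_bx("AAAA_40_38;CCCC_40,GGGG_39")
# Each quality value corresponds to one supporting read.
ref_reads = sum(len(q) for q in ref.values())
alt_reads = sum(len(q) for q in alt.values())
```

For the example string, `ref` holds the barcodes AAAA and CCCC with three qualities in total and `alt` holds GGGG with one, which matches the read counts worked out in the text.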