GemCode Genome & Exome

Phased VCF

The principal outputs of the longranger run pipeline are aligned reads with barcode and phasing information in BAM format, phased SNPs and indels in VCF format, and SV calls and candidates in BEDPE format. These are all standard file formats designed to interoperate with existing tools, and the additional information produced by the GemCode Platform is included as standards-compliant fields when appropriate.

The following information assumes you are already familiar with the VCF format. A full description of the VCF standard is available online.

Phased VCF Files

Long Ranger encodes SNP and indel phasing information in the GT (genotype) and PS (phase set) fields specified by the VCF standard. It also stores barcode information supporting the phasing for each variant in the BX field using a comma-delimited string of the form

    BC_STRING_1,BC_STRING_2,...

These BC_STRINGs are semicolon-delimited strings consisting of underscore-delimited strings:

    BC1_QUAL1-1_QUAL1-2;BC2_QUAL2-1;...

Here, BC1 is the first barcode, and QUAL1-1 and QUAL1-2 are the observed Phred qualities of attaching that barcode to the given allele.
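As a rough illustration of the delimiter hierarchy described above (commas between alleles, semicolons between barcodes, underscores between a barcode and its qualities), a BX value could be unpacked along the following lines. This is a minimal sketch, not part of Long Ranger: the function name `parse_bx` and the sample value are hypothetical, and it assumes that barcodes never contain underscores and that an empty BC_STRING means no barcode supports that allele.

```python
from typing import Dict, List


def parse_bx(bx_value: str) -> List[Dict[str, List[int]]]:
    """Return one {barcode: [phred, ...]} mapping per allele, in BX order."""
    alleles = []
    for bc_string in bx_value.split(","):       # one BC_STRING per allele
        support: Dict[str, List[int]] = {}
        for entry in bc_string.split(";"):      # one entry per barcode
            if not entry:                       # assumption: empty means no support
                continue
            barcode, *quals = entry.split("_")  # barcode first, then its qualities
            support.setdefault(barcode, []).extend(int(q) for q in quals)
        alleles.append(support)
    return alleles


# Hypothetical BX value, made up for illustration only.
print(parse_bx("ACGT_30_25;TTTT_37,CCGG_40"))
# [{'ACGT': [30, 25], 'TTTT': [37]}, {'CCGG': [40]}]
```

Keeping one dictionary per allele preserves the allele order of the BX string, so the first element can be matched to the first allele and so on.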

For example, a BX field that contains

    AAAA_40_38;CCCC_40,GGGG_39

encodes two BC_STRINGs--one for the reference allele (AAAA_40_38;CCCC_40) and one for the alternate allele (GGGG_39):