
# Barcoded BAM

The longranger pipeline outputs an indexed BAM file containing position-sorted, aligned reads. Each read in this BAM file has Chromium barcode and phasing information attached. The following assumes basic familiarity with the BAM format. More details on the SAM/BAM standard are available at the hts-specs website. Long Ranger follows standardized tags, and also adds some additional tags that are in the process of being standardized. 10x also provides a bamtofastq tool to convert BAM files produced by longranger back to FASTQs that can be used to re-run longranger.

## BAM Barcode Tags

Chromium barcode information for each read is stored as TAG fields:

| Tag | Type | Description |
|-----|------|-------------|
| BX | Z | Chromium barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences. Use this for analysis. |
| BC | Z | Sample index (I7) read. |
| QT | Z | Sample index (I7) read quality. Phred scores as reported by sequencer. |
| RX | Z | Raw Chromium barcode sequence. This read is subject to sequencing errors. Do not use for analysis. |
| QX | Z | Raw Chromium barcode read quality. Phred scores as reported by sequencer. |
| TR | Z | Sequence of the 7 trimmed bases following the barcode sequence at the start of R1. Can be used to reconstruct the original R1 sequence. |
| TQ | Z | Quality values of the 7 trimmed bases following the barcode sequence at the start of R1. Can be used to reconstruct the original R1 quality values. |
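The sketch below shows one way to read these tags from SAM text records (for example, lines printed by `samtools view`) and to rebuild the original R1 from the RX/QX, TR/TQ and SEQ/QUAL fields. It assumes that the original R1 is laid out as raw barcode, then the 7 trimmed bases, then the aligned sequence, and that SEQ has not been hard-clipped; `parse_tags` and `reconstruct_r1` are hypothetical helper names, not part of Long Ranger.

```python
# Hypothetical helpers for SAM text records (columns 12+ hold TAG:TYPE:VALUE fields).
# Assumption: original R1 = RX (raw barcode) + TR (7 trimmed bases) + SEQ,
# with SEQ present and not hard-clipped.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_tags(sam_line: str) -> dict:
    """Decode the optional fields of one SAM record into a {tag: value} dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:  # Z, A, H and B values are kept as raw strings
            tags[tag] = value
    return tags


def reconstruct_r1(sam_line: str) -> tuple:
    """Rebuild the original R1 (sequence, quality) of a first-in-pair record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, seq, qual = int(fields[1]), fields[9], fields[10]
    if not flag & 0x40:
        raise ValueError("record is not the first read of the pair (R1)")
    if seq == "*":
        raise ValueError("record has no stored sequence")
    if flag & 0x10:
        # Reverse-strand alignments store SEQ reverse-complemented and QUAL reversed.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    tags = parse_tags(sam_line)
    return tags["RX"] + tags["TR"] + seq, tags["QX"] + tags["TQ"] + qual
```

For barcode-aware analysis, look up `tags.get("BX")` rather than RX, since reads whose barcode could not be corrected may carry no BX tag at all. When working on the BAM directly, pysam's `AlignedSegment.get_tag("BX")` returns the same value without a text round-trip.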

The BX tag includes a suffix with a dash separator followed by a number:

AGAATGGTCTGCATCG-1

This number denotes what we call a GEM group, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM chip channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
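If you choose to parse the suffix out, a minimal sketch follows; `split_gem_group` is a hypothetical helper that splits on the last dash, as in the `AGAATGGTCTGCATCG-1` example.

```python
from typing import Tuple


def split_gem_group(bx: str) -> Tuple[str, int]:
    """Split a BX value such as 'AGAATGGTCTGCATCG-1' into (sequence, GEM group)."""
    sequence, sep, group = bx.rpartition("-")
    if not sep:
        raise ValueError(f"BX value has no GEM group suffix: {bx!r}")
    return sequence, int(group)
```

For example, `split_gem_group("AGAATGGTCTGCATCG-1")` returns `("AGAATGGTCTGCATCG", 1)`. Because the suffix is what keeps barcodes from separate GEM chip channel runs distinct, stripping it is only safe when every barcode comes from a single GEM group; otherwise, key on the full BX string.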

## BAM Phasing Tags

The following tags will also be present on reads that were confidently assigned to a haplotype.

| Tag | Type | Description |
|-----|------|-------------|
| PC | i | Phred-scaled confidence that this read was phased correctly. |
| PS | i | Phase set containing this read. This corresponds to the phase set (PS) field in the VCF file. The value is the position of the first SNP in the phase block. |
| HP | i | Haplotype of the molecule that generated the read. |
| MI | i | Global molecule identifier for molecule that generated this read. |

Phase sets, defined in the VCF standard, are regions within which identified haplotypes are mutually consistent. As a result, HP tags are only comparable between reads that share a common PS. By definition, adjacent phase sets lack sufficient Linked-Reads to determine the relationship between their haplotypes.
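The rule that HP values are only comparable within a shared PS can be encoded directly. The sketch below assumes each read's tags have already been decoded into a dict with integer PS, HP and PC values; the function names and the `min_pc` threshold are hypothetical and illustrative, not Long Ranger defaults.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def same_haplotype(a: dict, b: dict) -> Optional[bool]:
    """Compare two reads' HP tags; return None when the comparison is undefined."""
    for tags in (a, b):
        if "PS" not in tags or "HP" not in tags:
            return None  # read was not confidently assigned to a haplotype
    if a["PS"] != b["PS"]:
        return None  # HP values from different phase sets are not comparable
    return a["HP"] == b["HP"]


def group_by_haplotype(
    reads: Iterable[Tuple[str, dict]], min_pc: int = 0
) -> Dict[Tuple[int, int], List[str]]:
    """Bucket phased read names by (PS, HP), optionally requiring PC >= min_pc."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, tags in reads:
        if "PS" in tags and "HP" in tags and tags.get("PC", 0) >= min_pc:
            groups[(tags["PS"], tags["HP"])].append(name)
    return dict(groups)
```

Keying on the (PS, HP) pair rather than on HP alone keeps haplotype 1 of one phase set from being merged with haplotype 1 of a neighboring phase set, whose relationship is unknown.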

## Lariat Alignment Tags

The Lariat aligner used by Long Ranger uses the long-range information carried in the barcodes to improve mapping into duplicated regions of the genome. Lariat emits extra non-standard tags that indicate how the alignment results were affected by the molecule inference process. If Lariat finds strong evidence that a molecule must exist at a particular locus, it can boost the MAPQ of ambiguous reads that have an alignment inside the molecule by roughly 40 MAPQ points.

For more details, see the paper on which Lariat is based: *Read clouds uncover variation in complex regions of the human genome* (Bishara et al.).

| Tag | Type | Description |
|-----|------|-------------|
| AS | i | Defined in the SAM spec. The alignment score of the read to the genome sequence at the mapping location selected for this read. The score includes match, indel, clipping, and mate-pairing penalties, but excludes any molecule scoring. Note that the scaling and offset of this field differ from those used by BWA: perfect alignments have a score of 0, and alignment penalties reduce the score below 0. |
| XS | i | The alignment score of the read to the genome sequence (see AS tag) at the second-best mapping for this read. Because Lariat also considers molecule scoring when selecting the best mapping, it may not choose the mapping with the best reported alignment score. For this reason, XS may be greater than AS. |
| AM | A | 1 if this alignment is in a long molecule, 0 otherwise. Alignments in long molecules will have their MAPQ boosted above alternative alignments that are not in molecules. |
| XM | A | 1 if the second-best alignment is in a long molecule, 0 otherwise. |
| XT | i | Indicates whether a tandem duplication affects this alignment: 1 if the second-best alignment is in the same molecule as the best alignment, 0 otherwise. |
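As an illustration of how these tags combine, the sketch below summarizes one read's Lariat tags. It assumes tags are already decoded, with `A`-typed values (AM, XM) as one-character strings and `i`-typed values (AS, XS, XT) as integers; `lariat_summary` and its output keys are hypothetical names chosen for this example.

```python
def lariat_summary(tags: dict) -> dict:
    """Interpret the Lariat alignment tags of a single read (hypothetical helper)."""
    as_score = tags.get("AS")
    xs_score = tags.get("XS")
    return {
        # AM/XM are 'A'-typed: the character '1' means "in a long molecule".
        "in_molecule": tags.get("AM") == "1",
        "second_best_in_molecule": tags.get("XM") == "1",
        # XT is 'i'-typed: 1 means the second-best hit lies in the same molecule.
        "tandem_duplication": tags.get("XT") == 1,
        # XS > AS means molecule scoring, not raw alignment score, chose this location.
        "molecule_overrode_score": (
            as_score is not None and xs_score is not None and xs_score > as_score
        ),
        # A perfect alignment scores 0, so the total penalty is simply -AS.
        "alignment_penalty": -as_score if as_score is not None else None,
    }
```

For a read with `AS=-5`, `XS=-2` and `AM=1`, this reports a penalty of 5 and flags that the selected location was preferred over a better-scoring alternative because it falls inside an inferred molecule.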