
# Barcoded BAM

The cellranger pipeline outputs an indexed BAM file containing position-sorted reads aligned to the genome and transcriptome. Reads aligned to the transcriptome across exon junctions in the genome have a large gap in their CIGAR string, e.g. 35M225N64M. Each read in this BAM file has Chromium cellular and molecular barcode information attached. Cell Ranger modifies MAPQ values; see the MM tag below. The following assumes basic familiarity with the BAM format. More details on the SAM/BAM standard are available online.
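To make the junction-spanning CIGAR example concrete, the sketch below splits a CIGAR string into its operations and reports the skipped-region (`N`) lengths. The function names are illustrative and not part of Cell Ranger; the operation codes are those defined by the SAM specification.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def spliced_gaps(cigar):
    """Return the lengths of all N (skipped region) operations."""
    return [n for n, op in parse_cigar(cigar) if op == "N"]

def reference_span(cigar):
    """Number of reference bases covered, including skipped regions."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")

print(parse_cigar("35M225N64M"))     # [(35, 'M'), (225, 'N'), (64, 'M')]
print(spliced_gaps("35M225N64M"))    # [225]
print(reference_span("35M225N64M"))  # 324
```

Here the read matches 35 bases, skips 225 reference bases (the gap across the junction), and matches another 64 bases.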

## BAM Barcode Tags

Chromium cellular and molecular barcode information for each read is stored as TAG fields:

| Tag | Type | Description |
|-----|------|-------------|
| CB | Z | Chromium cellular barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences. |
| CR | Z | Chromium cellular barcode sequence as reported by the sequencer. |
| CY | Z | Chromium cellular barcode read quality. Phred scores as reported by the sequencer. |
| UB | Z | Chromium molecular barcode sequence that is error-corrected among other molecular barcodes with the same cellular barcode and gene alignment. |
| UR | Z | Chromium molecular barcode sequence as reported by the sequencer. |
| UY | Z | Chromium molecular barcode read quality. Phred scores as reported by the sequencer. |
| TR | Z | Trimmed sequence. For the Single Cell 3' v1 chemistry, this is trailing sequence following the UMI on Read 2. For the Single Cell 3' v2 chemistry, this is trailing sequence following the cell and molecular barcodes on Read 1. |
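As a rough illustration of how these tags appear in a record, the sketch below reads the optional `TAG:TYPE:VALUE` fields from one line of SAM text (for example, a line produced by `samtools view`). The record is invented for illustration, and `parse_sam_tags` is a hypothetical helper, not a Cell Ranger function. Only `Z`, `A`, and `i` types are handled, which covers the tags described on this page.

```python
def parse_sam_tags(sam_line):
    """Return {tag: value} for the optional fields of one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Columns 1-11 are mandatory in SAM; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        # maxsplit=2 keeps any ':' characters inside quality strings intact.
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags

# Hypothetical record, invented for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "1000", "255", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF",
    "CR:Z:AGAATGGTCTGCAT", "CY:Z:FFFFFF:FFFFFFF", "CB:Z:AGAATGGTCTGCAT-1",
    "UR:Z:ACGTACGTAC", "UB:Z:ACGTACGTAC", "xf:i:25",
])

tags = parse_sam_tags(record)
print(tags["CB"], tags["UB"], tags["xf"])  # AGAATGGTCTGCAT-1 ACGTACGTAC 25
```

In practice a BAM library would normally be used instead of parsing text; this version only shows the field layout.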

The cell barcode CB tag includes a suffix with a dash separator followed by a number:

AGAATGGTCTGCAT-1

This number denotes what we call a GEM well, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM chip channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
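A minimal sketch of parsing the suffix out of a CB value, assuming the `SEQUENCE-N` layout described above; `split_cell_barcode` is an illustrative name rather than an existing API.

```python
def split_cell_barcode(cb):
    """Split 'SEQUENCE-N' into (sequence, gem_well)."""
    sequence, sep, suffix = cb.rpartition("-")
    # rpartition returns an empty separator when no dash is present.
    if not sep or not suffix.isdigit():
        raise ValueError(f"no GEM well suffix in {cb!r}")
    return sequence, int(suffix)

print(split_cell_barcode("AGAATGGTCTGCAT-1"))  # ('AGAATGGTCTGCAT', 1)
```

Keeping the suffix attached is the safer choice when barcodes from several GEM wells are combined, because the same sequence can then occur in more than one well.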

## BAM Alignment Tags

The following tags will also be present on reads that mapped to the genome and overlapped an exon by at least one base pair. A read may align to multiple transcripts and genes, but it is only considered confidently mapped to the transcriptome if it mapped to a single gene.

| Tag | Type | Description |
|-----|------|-------------|
| TX | Z | Present in reads aligned to the same strand as the transcripts in this semicolon-separated list that are compatible with this alignment. Transcripts are specified with the transcript_id key in the reference GTF attribute column. The format of each entry is [transcript_id],[strand][pos],[cigar], where strand is either + or -, pos is the alignment offset in transcript coordinates, and cigar is the CIGAR string in transcript coordinates. |
| AN | Z | Present for reads that are aligned to the antisense strand of annotated transcripts. If intron counts are not included (default, with include_introns=false), this tag is the same as the TX tag. If introns are included (include_introns=true), the AN tag contains the corresponding antisense gene identifier values (starting with ENSG) rather than transcript identifier values (starting with ENST). |
| GX | Z | Semicolon-separated list of gene IDs that are compatible with this alignment. Gene IDs are specified with the gene_id key in the reference GTF attribute column. |
| GN | Z | Semicolon-separated list of gene names that are compatible with this alignment. Gene names are specified with the gene_name key in the reference GTF attribute column. |
| MM | i | Set to 1 if the genome aligner (STAR) originally gave a MAPQ < 255 (the read multi-mapped to the genome) and Cell Ranger changed it to 255 because the read overlapped exactly one gene. |
| RE | A | Single character indicating the region type of this alignment (E = exonic, N = intronic, I = intergenic). |
| pa | i | The number of poly-A nucleotides trimmed from the 3' end of read 2. Up to 10% mismatches are permitted. |
| ts | i | The number of template switch oligo (TSO) nucleotides trimmed from the 5' end of read 2. Up to 3 mismatches are permitted. The 30-bp TSO sequence is AAGCAGTGGTATCAACGCAGAGTACATGGG. |
| xf | i | Extra alignment flags, interpreted bitwise as listed below this table. |

The bits of the xf tag are interpreted as follows:

• 1 - The read is confidently mapped to a feature
• 2 - The read maps to a feature that the majority of other reads with this UMI did not
• 4 - This read pair maps to a discordant pair of genes, and is not treated as a UMI count
• 8 - This read is representative for a transcriptomic molecule and can be treated as a UMI count
• 16 - This read maps to exactly one feature, and is identical to bit 8 for transcriptomic reads. Notably, this bit is set when a feature barcode read is treated as a UMI count, while bit 8 is not
• 32 - This read was removed by targeted UMI filtering
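The sketch below shows one way to decode two of the tags described above: the xf bit field and the TX entry list. The short labels in `XF_BITS` and the function names are invented here for readability. Only the bit values and the `[transcript_id],[strand][pos],[cigar]` layout come from this page, and the transcript IDs in the example are made up.

```python
# Bit values come from the xf description; the labels are illustrative.
XF_BITS = {
    1: "confidently_mapped_to_feature",
    2: "minority_feature_for_umi",
    4: "discordant_gene_pair",
    8: "umi_count_transcriptomic",
    16: "maps_to_exactly_one_feature",
    32: "removed_by_targeted_umi_filtering",
}

def decode_xf(value):
    """Return the labels of all bits set in an xf value."""
    return [name for bit, name in XF_BITS.items() if value & bit]

def parse_tx(tx_value):
    """Parse a TX tag value into (transcript_id, strand, pos, cigar) tuples."""
    entries = []
    for entry in tx_value.split(";"):
        if not entry:
            continue  # tolerate a trailing semicolon
        # Split from the right so only the last two commas are delimiters.
        transcript_id, strand_pos, cigar = entry.rsplit(",", 2)
        strand, pos = strand_pos[0], int(strand_pos[1:])
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand in {entry!r}")
        entries.append((transcript_id, strand, pos, cigar))
    return entries

# 25 = 16 + 8 + 1
print(decode_xf(25))
# Hypothetical TX value with made-up transcript IDs.
print(parse_tx("ENST0001,+1024,98M;ENST0002,+30,98M"))
```

For instance, testing `value & 8` selects the reads that represent a transcriptomic molecule and can be treated as a UMI count, according to the bit definitions above.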

## Feature Barcode Tags

Sequencing data passed in as a Feature Barcode library type is not aligned to the genome. The BAM file will contain unaligned records for these reads, with the following tags representing the Feature Barcode sequence extracted from the read, and the feature reference it was matched to, if any. The BAM read sequence will contain all the bases outside of the cell barcode and UMI regions.

| Tag | Type | Description |
|-----|------|-------------|
| fb | Z | Chromium Feature Barcode sequence that is error-corrected and confirmed against known Feature Barcode sequences from the feature reference. |
| fr | Z | Chromium Feature Barcode sequence as reported by the sequencer. |
| fq | Z | Chromium Feature Barcode read quality. Phred scores as reported by the sequencer. |
| fx | Z | Feature identifier matched to this Feature Barcode read. Specified in the id column of the feature reference. |