If your question is not answered here, please email us at:  support@10xgenomics.com

10x Genomics
Chromium Single Cell Gene Expression

Barcoded BAM

The cellranger pipeline outputs an indexed BAM file containing position-sorted reads aligned to the genome and transcriptome. Reads aligned to the transcriptome across exon junctions in the genome have a large gap in their CIGAR strings, e.g. 35M225N64M. Each read in this BAM file has Chromium cellular and molecular barcode information attached. The following assumes basic familiarity with the BAM format. More details on the SAM/BAM standard are available online.
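To make the junction-spanning CIGAR example concrete, here is a minimal Python sketch that splits a CIGAR string into its operations and reports the skipped (N) regions. The function names are illustrative and not part of cellranger; the operation codes are those defined in the SAM specification.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":
        # "*" means no CIGAR is available (for example, an unmapped read).
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings that contain anything other than valid operations.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops

def skipped_regions(cigar):
    """Return the lengths of all N (skipped reference) operations."""
    return [n for n, op in parse_cigar(cigar) if op == "N"]

def reference_span(cigar):
    """Number of reference bases covered, counting M, D, N, = and X."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")

print(skipped_regions("35M225N64M"))  # [225]
print(reference_span("35M225N64M"))   # 324
```

For 35M225N64M, the read matches 35 bases, skips 225 reference bases (the gap between the two aligned blocks), then matches another 64 bases.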

BAM Barcode Tags

Chromium cellular and molecular barcode information for each read is stored as TAG fields:

Tag  Type  Description
CB   Z     Chromium cellular barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences.
CR   Z     Chromium cellular barcode sequence as reported by the sequencer.
CQ   Z     Chromium cellular barcode read quality. Phred scores as reported by the sequencer.
UB   Z     Chromium molecular barcode sequence that is error-corrected among other molecular barcodes with the same cellular barcode and gene alignment.
UR   Z     Chromium molecular barcode sequence as reported by the sequencer.
UQ   Z     Chromium molecular barcode read quality. Phred scores as reported by the sequencer.
BC   Z     Sample index (I5) read.
QT   Z     Sample index (I5) read quality. Phred scores as reported by the sequencer.
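As an illustration of how these tags can be read, the sketch below parses the optional TAG:TYPE:VALUE fields of a single SAM-format text record using only the Python standard library. The record is made up for illustration (its read name, position, sequences and qualities are not real cellranger output), and the helper names parse_sam_tags and barcode_info are hypothetical.

```python
# Barcode-related tags listed in the table above.
BARCODE_TAGS = ("CB", "CR", "CQ", "UB", "UR", "UQ", "BC", "QT")

def parse_sam_tags(sam_line):
    """Return the optional fields of one SAM text record as {tag: (type, value)}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are the mandatory SAM fields; optional tags follow.
    for field in fields[11:]:
        tag, tag_type, value = field.split(":", 2)
        tags[tag] = (tag_type, value)
    return tags

def barcode_info(sam_line):
    """Collect the barcode tag values, using None for any tag that is absent."""
    tags = parse_sam_tags(sam_line)
    return {name: tags[name][1] if name in tags else None for name in BARCODE_TAGS}

# Made-up record for illustration only; every value here is an assumption.
record = "\t".join([
    "read1", "0", "chr1", "100", "255", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF",
    "CB:Z:AGAATGGTCTGCAT-1", "CR:Z:AGAATGGTCTGCAT", "CQ:Z:FFFFFFFFFFFFFF",
    "UB:Z:ACGTACGTAC", "UR:Z:ACGTACGTAC", "UQ:Z:FFFFFFFFFF",
])

info = barcode_info(record)
print(info["CB"], info["UB"])
```

A BAM file is binary, so in practice the records would first be converted to text (for example with samtools view) or read through a BAM library. A given read may not carry every tag, which is why missing tags are returned as None here rather than raising an error.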

The cell barcode CB tag includes a suffix with a dash separator followed by a number:

AGAATGGTCTGCAT-1

This number denotes what we call a GEM group, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM chip channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
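If the GEM group needs to be parsed out, a minimal Python sketch is shown here. The function name split_cell_barcode is hypothetical, and the code assumes the suffix is always a dash followed by an integer, as in the example above.

```python
def split_cell_barcode(cb):
    """Split a CB value such as 'AGAATGGTCTGCAT-1' into (sequence, gem_group)."""
    sequence, dash, suffix = cb.rpartition("-")
    if not dash:
        # No suffix present: return the input unchanged with no GEM group.
        return cb, None
    return sequence, int(suffix)

sequence, gem_group = split_cell_barcode("AGAATGGTCTGCAT-1")
print(sequence, gem_group)
```

Keeping the suffix attached is the simpler choice when combining samples, because the full string stays unique even if two GEM groups contain the same barcode sequence.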

BAM Alignment Tags

The following tags will also be present on reads that mapped to the genome and overlapped an exon by at least one base pair. A read may align to multiple transcripts and genes, but it is only considered confidently mapped to the transcriptome if it mapped to a single gene.

Tag  Type  Description
TX   Z     Semicolon-separated list of transcripts that the read aligns to. Transcripts are specified with the transcript_id key in the reference GTF attribute column.
GX   Z     Semicolon-separated list of gene IDs that the read aligns to. Gene IDs are specified with the gene_id key in the reference GTF attribute column.
GN   Z     Semicolon-separated list of gene names that the read aligns to. Gene names are specified with the gene_name key in the reference GTF attribute column.
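The semicolon-separated values and the single-gene rule described above can be sketched in Python as follows. The helper names and the gene IDs GENE_A and GENE_B are placeholders, not real identifiers. The check covers only the single-gene condition stated in this section; any other criteria the pipeline may apply are not modeled.

```python
def split_list(value):
    """Split a semicolon-separated tag value into a list, ignoring empty items."""
    if not value:
        return []
    return [item for item in value.split(";") if item]

def maps_to_single_gene(gx_value):
    """True if the GX tag value names exactly one distinct gene ID."""
    return len(set(split_list(gx_value))) == 1

# Placeholder gene IDs for illustration only.
print(split_list("GENE_A;GENE_B"))           # ['GENE_A', 'GENE_B']
print(maps_to_single_gene("GENE_A"))         # True
print(maps_to_single_gene("GENE_A;GENE_B"))  # False
```

A read with no GX tag (represented here as None or an empty string) yields an empty list and is therefore not counted as mapping to a single gene.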