Cell Ranger 3.1, printed on 12/03/2024
The cellranger pipeline outputs an indexed BAM file containing position-sorted reads aligned to the genome and transcriptome. Reads that align to the transcriptome across exon junctions in the genome have a large gap (an N operation) in their CIGAR string, e.g. 35M225N64M. Each read in this BAM file has Chromium cellular and molecular barcode information attached. Cell Ranger modifies MAPQ values; see the MM tag below. The following assumes basic familiarity with the BAM format. More details on the SAM/BAM standard are available online.
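Spliced reads can be recognized from the N (skipped reference) operations in their CIGAR string. The following is a minimal sketch using only the Python standard library; the helper names (parse_cigar, splice_gaps, reference_span, query_length) are hypothetical and not part of Cell Ranger. In practice, a BAM library such as pysam already exposes parsed CIGAR operations (for example `AlignedSegment.cigartuples`).

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# A whole CIGAR string is one or more such operations.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Return a list of (length, op) tuples, e.g. [(35, 'M'), (225, 'N'), (64, 'M')].

    Raises ValueError for strings that are not valid CIGARs, including the
    '*' placeholder used for unaligned records.
    """
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def splice_gaps(cigar):
    """Lengths of the skipped reference regions (N ops), i.e. the intron gaps."""
    return [n for n, op in parse_cigar(cigar) if op == "N"]


def reference_span(cigar):
    """Number of reference bases covered, counting M, D, N, = and X ops."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Number of read bases, counting M, I, S, = and X ops."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")
```

For the example above, `splice_gaps("35M225N64M")` gives `[225]`: a 99-base read whose alignment spans 324 bases of the genome because it skips a 225-base intron.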
Chromium cellular and molecular barcode information for each read is stored in the following tags:
Tag | Type | Description |
---|---|---|
CB | Z | Chromium cellular barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences. |
CR | Z | Chromium cellular barcode sequence as reported by the sequencer. |
CY | Z | Chromium cellular barcode read quality. Phred scores as reported by sequencer. |
UB | Z | Chromium molecular barcode sequence that is error-corrected among other molecular barcodes with the same cellular barcode and gene alignment. |
UR | Z | Chromium molecular barcode sequence as reported by the sequencer. |
UY | Z | Chromium molecular barcode read quality. Phred scores as reported by sequencer. |
BC | Z | Sample index read. |
QT | Z | Sample index read quality. Phred scores as reported by sequencer. |
TR | Z | Trimmed sequence. For the Single Cell 3' v1 chemistry, this is trailing sequence following the UMI on Read 2. For the Single Cell 3' v2 chemistry, this is trailing sequence following the cell and molecular barcodes on Read 1. |
xf | i | Extra alignment flags. The bit flags can be interpreted as follows: 1 - The read is confidently mapped to a feature; 2 - The read maps to a feature that the majority of other reads with this UMI did not; 8 - This read is representative for the molecule and can be treated as a UMI count. Bits 4, 16 and 32 are used internally by 10X. |
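The xf value is a bit field, so each documented flag is tested with a bitwise AND. The sketch below is a minimal illustration, assuming records are available as their optional SAM fields in TAG:TYPE:VALUE text form (as printed by `samtools view`). The helper names are hypothetical, and the per-barcode tally is only a rough count of reads flagged with bit 8, not a reimplementation of Cell Ranger's own counting.

```python
from collections import Counter

# Documented xf bits; bits 4, 16 and 32 are internal to 10x and ignored here.
XF_BITS = {
    1: "confidently mapped to a feature",
    2: "maps to a feature that most other reads with this UMI did not",
    8: "representative read for the molecule (counts as a UMI)",
}


def parse_sam_tags(fields):
    """Parse optional SAM fields such as 'CB:Z:AGAATGGTCTGCAT-1' into a dict.

    Integer ('i') tags are converted to int; all other types are kept as
    strings. maxsplit=2 keeps any ':' inside quality strings intact.
    """
    tags = {}
    for field in fields:
        tag, tag_type, value = field.split(":", 2)
        tags[tag] = int(value) if tag_type == "i" else value
    return tags


def decode_xf(xf):
    """Return descriptions of the documented bits set in an xf value."""
    return [desc for bit, desc in XF_BITS.items() if xf & bit]


def count_representative_reads(records):
    """Per cell barcode (CB), count reads with xf bit 8 set.

    Reads without a corrected CB tag are skipped.
    """
    counts = Counter()
    for tags in records:
        if "CB" in tags and tags.get("xf", 0) & 8:
            counts[tags["CB"]] += 1
    return counts
```

For example, an xf value of 25 (16 + 8 + 1) decodes to bits 1 and 8: the read is confidently mapped and can be treated as a UMI count, while bit 16 is one of the internal flags.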
The cell barcode CB tag includes a suffix with a dash separator followed by a number:
AGAATGGTCTGCAT-1
This number denotes what we call a GEM well, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM chip channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
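If the GEM well suffix is parsed out, it is safest to split on the last dash. The sketch below uses a hypothetical helper, split_gem_well, and illustrates why keeping the suffix matters when samples from several GEM wells are combined: the same barcode sequence in two wells denotes two different cells, so dropping the suffix can merge them.

```python
def split_gem_well(barcode):
    """Split a barcode such as 'AGAATGGTCTGCAT-1' into ('AGAATGGTCTGCAT', 1)."""
    sequence, sep, well = barcode.rpartition("-")
    if not sep or not sequence or not well.isdecimal():
        raise ValueError(f"expected a barcode of the form SEQUENCE-N, got {barcode!r}")
    return sequence, int(well)


# Two cells from different GEM wells that happen to share a barcode sequence.
combined = ["ACGTACGTACGTAC-1", "ACGTACGTACGTAC-2"]

# Treated as full identifiers (suffix kept), they stay distinct.
with_suffix = set(combined)

# With the suffix stripped, they collapse into a single barcode.
sequence_only = {split_gem_well(bc)[0] for bc in combined}
```

Here `with_suffix` holds two barcodes while `sequence_only` holds one, so stripping the suffix is only safe when all barcodes come from a single GEM well.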
The following tags will also be present on reads that mapped to the genome and overlapped an exon by at least one base pair. A read may align to multiple transcripts and genes, but it is only considered confidently mapped to the transcriptome if it mapped to a single gene.
Sequencing data passed in as a Feature Barcoding library type is not aligned to the genome, but processed by the Feature Barcoding read processor. The BAM file will contain unaligned records for these reads, with the following tags representing the Feature Barcode sequence extracted from the read, and the feature reference it was matched to, if any. The BAM read sequence will contain all the bases outside of the cell barcode and UMI regions.
Tag | Type | Description |
---|---|---|
fb | Z | Chromium Feature Barcode sequence that is error-corrected and confirmed against known Feature Barcode sequences from the feature reference. |
fr | Z | Chromium Feature Barcode sequence as reported by the sequencer. |
fq | Z | Chromium Feature Barcode read quality. Phred scores as reported by sequencer. |
fx | Z | Feature identifier matched to this Feature Barcode read. Specified in the id column of the feature reference. |
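Because fx is only set when a Feature Barcode read matched the feature reference, its absence marks an unmatched read. The sketch below assumes each record is a dict of its tags (for example from a SAM text parser) and tallies distinct corrected UMIs (UB) per cell barcode and feature identifier. The function name and the feature IDs in the comments are hypothetical, and this naive count is not guaranteed to match Cell Ranger's feature-barcode matrix.

```python
from collections import defaultdict


def feature_umi_counts(records):
    """Count distinct UB values per (CB, fx) pair.

    Returns (counts, unmatched), where counts maps (cell barcode, feature id)
    to a UMI count and unmatched is the number of reads without an fx tag.
    Matched reads lacking CB or UB are ignored.
    """
    umis = defaultdict(set)
    unmatched = 0
    for tags in records:
        if "fx" not in tags:
            unmatched += 1  # no feature in the reference matched this read
            continue
        if "CB" in tags and "UB" in tags:
            umis[(tags["CB"], tags["fx"])].add(tags["UB"])
    counts = {key: len(ub_set) for key, ub_set in umis.items()}
    return counts, unmatched
```

Deduplicating on UB rather than counting reads means that several reads of the same molecule contribute one count, in the same spirit as UMI counting for gene expression reads.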