Cell Ranger 1.0, printed on 11/17/2024
The cellranger pipeline outputs an HDF5 file containing per-molecule information for all molecules that contain a valid cell-barcode and valid UMI. This file is required by the R kit in order to produce read-subsampled gene-barcode matrices.
Each dataset in the molecule info file corresponds to a single column. Each row corresponds to a unique (cell-barcode, UMI, gene) tuple. There is an additional row per (cell-barcode, UMI) tuple that aggregates information about reads that could not be confidently mapped to a gene.
Column | Type | Description |
---|---|---|
barcode | uint64 | 2-bit encoded processed cell-barcode sequence. |
gem_group | uint8 | When a sample is split across multiple channels, the GEM group identifies which channel a barcode came from. |
gene | uint32 | An integer corresponding to the gene this putative molecule mapped to. This is a zero-based index into the genes.tsv file that accompanies the gene-barcode matrices. When set to the maximum gene index + 1, this row describes reads that did not map confidently to any gene. |
umi | uint32 | 2-bit encoded processed UMI sequence. |
reads | uint32 | Number of reads that confidently mapped to this putative molecule. |
nonconf_mapped_reads | uint32 | The number of reads with this cell-barcode and UMI that mapped to the genome but did not map confidently to any gene. |
unmapped_reads | uint32 | The number of reads with this cell-barcode and UMI that did not map to the genome. |
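To illustrate how the rows relate to a gene-barcode matrix, here is a minimal sketch in Python that counts UMIs per (barcode, GEM group, gene). It assumes the columns have already been loaded from the HDF5 file into parallel sequences (for example with an HDF5 reader, not shown here). The function name `umi_counts` and the parameter `num_genes` are hypothetical; `num_genes` is assumed to equal the maximum gene index + 1, so rows carrying that value are the aggregate rows for reads not confidently mapped to any gene and are skipped.

```python
from collections import Counter


def umi_counts(barcode, gem_group, gene, num_genes):
    """Count rows per (barcode, gem_group, gene), skipping aggregate rows.

    barcode, gem_group, gene: parallel sequences, one entry per row.
    num_genes: assumed to be the maximum gene index + 1 (hypothetical
    parameter); rows with gene == num_genes are not tied to any gene.
    """
    counts = Counter()
    for bc, gg, g in zip(barcode, gem_group, gene):
        if g == num_genes:
            # Aggregate row for this (cell-barcode, UMI): no gene assigned.
            continue
        # Each remaining row is a unique (cell-barcode, UMI, gene) tuple,
        # so counting rows counts UMIs.
        counts[(bc, gg, g)] += 1
    return dict(counts)


# Toy data (made up for illustration): three gene rows and one aggregate row.
toy_barcode = [5, 5, 5, 9]
toy_gem_group = [1, 1, 1, 1]
toy_gene = [0, 0, 2, 3]  # with num_genes = 3, the last row is an aggregate row
toy_counts = umi_counts(toy_barcode, toy_gem_group, toy_gene, num_genes=3)
```

In the toy data, barcode 5 has two rows for gene 0 and one row for gene 2, while the row with gene index 3 is excluded because it equals the assumed `num_genes`.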
The cell-barcode and UMI sequences are 2-bit encoded as follows: