Cell Ranger 1.0, printed on 04/11/2021
The cellranger pipeline outputs an HDF5 file containing per-molecule information for every molecule that has a valid cell-barcode and a valid UMI. The R kit requires this file to produce read-subsampled gene-barcode matrices.
Each dataset in the molecule info file corresponds to a single column. Each row corresponds to a unique (cell-barcode, UMI, gene) tuple. There is an additional row per (cell-barcode, UMI) tuple that aggregates information about reads that could not be confidently mapped to a gene.
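As a rough illustration of this column-per-dataset layout, the sketch below models the file as equal-length parallel arrays and reassembles row `i` by taking element `i` from each array. The column names and values are hypothetical stand-ins chosen for the example, not the dataset names stored in the actual file.

```python
# Hypothetical columns: each list stands in for one dataset in the file.
# Names and values are made up for illustration only.
columns = {
    "col_barcode": [7, 7, 12],
    "col_umi": [3, 3, 9],
    "col_gene": [0, 2, 1],
    "col_reads": [5, 1, 4],
}


def row(cols, i):
    """Return row i as a dict by taking element i of every column."""
    return {name: values[i] for name, values in cols.items()}


# Every column holds one entry per row, so all lengths must agree.
n_rows = len(next(iter(columns.values())))
assert all(len(values) == n_rows for values in columns.values())

# Reassemble the row-oriented view from the column-oriented storage.
rows = [row(columns, i) for i in range(n_rows)]
```
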
|uint64||2-bit encoded processed cell-barcode sequence.|
|uint8||When a sample is split across multiple channels, the GEM group identifies which channel a barcode came from.|
|uint32||An integer corresponding to the gene this putative molecule mapped to. This is a zero-based index into the genes.tsv file that accompanies the gene-barcode matrices. When set to the maximum gene index + 1, this row describes reads that did not map confidently to any gene.|
|uint32||2-bit encoded processed UMI sequence.|
|uint32||Number of reads that confidently mapped to this putative molecule.|
|uint32||The number of reads with this cell-barcode and UMI that mapped to the genome but did not map confidently to any gene.|
|uint32||The number of reads with this cell-barcode and UMI that did not map to the genome.|
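
The table mentions 2-bit encoded sequences and a sentinel gene index without spelling out how to interpret them. The following is a minimal sketch of one common 2-bit scheme, under stated assumptions: bases map as A=0, C=1, G=2, T=3, the first base occupies the most significant bit pair, and the sequence length is known to the caller. None of these details are confirmed by the text above, so check them against real data before relying on this. The helper names `encode_2bit`, `decode_2bit`, and `is_unassigned` are hypothetical.

```python
BASES = "ACGT"  # ASSUMED mapping: A=0, C=1, G=2, T=3


def encode_2bit(seq):
    """Pack a nucleotide string into an integer, 2 bits per base.

    ASSUMPTION: the first base ends up in the most significant bit pair.
    """
    value = 0
    for base in seq:
        value = (value << 2) | BASES.index(base)
    return value


def decode_2bit(value, length):
    """Unpack an integer into a nucleotide string of the given length.

    The length must be supplied because leading 'A' bases encode as zero
    bits and cannot be recovered from the integer alone.
    """
    chars = []
    for _ in range(length):
        chars.append(BASES[value & 0b11])  # lowest bit pair is the last base
        value >>= 2
    return "".join(reversed(chars))


def is_unassigned(gene_index, n_genes):
    """True if a row aggregates reads not confidently mapped to any gene.

    With zero-based gene indices, "maximum gene index + 1" equals the
    number of genes.
    """
    return gene_index == n_genes
```

For example, under these assumptions `decode_2bit(27, 4)` yields `"ACGT"`, since 27 is `00 01 10 11` in binary. A uint32 has room for up to 16 bases in such a scheme and a uint64 for up to 32.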