
10x Genomics
Chromium Single Cell Gene Expression

Molecule info

The cellranger pipeline outputs an HDF5 file containing per-molecule information for all molecules that contain a valid cell-barcode and valid UMI. This file is required by the R kit in order to produce read-subsampled gene-barcode matrices.
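To illustrate why per-molecule read counts make read subsampling possible, here is a minimal Python sketch. It assumes a simple model in which each read is kept independently with a fixed probability, and a molecule survives only if at least one of its reads is kept. This is not necessarily the procedure the R kit uses, and the names `subsample_reads` and `surviving_molecules` are hypothetical.

```python
import random


def subsample_reads(read_counts, rate, seed=0):
    """Keep each read independently with probability `rate`.

    read_counts: per-molecule read counts (for example, the `reads` column).
    Returns a list of subsampled read counts, one per input molecule.
    Assumption: independent per-read sampling, chosen for illustration only.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [sum(1 for _ in range(n) if rng.random() < rate) for n in read_counts]


def surviving_molecules(read_counts, rate, seed=0):
    """Return the row indices of molecules that keep at least one read."""
    kept = subsample_reads(read_counts, rate, seed)
    return [i for i, n in enumerate(kept) if n > 0]
```

Molecules supported by few reads are the most likely to disappear at low sampling rates, which is why a subsampled matrix cannot be derived from the final gene-barcode matrix alone.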

Molecule info columns

Each dataset in the molecule info file corresponds to a single column. Each row corresponds to a unique (cell-barcode, UMI, gene) tuple. There is an additional row per (cell-barcode, UMI) tuple that aggregates information about reads that could not be confidently mapped to a gene.

| Column | Type | Description |
|---|---|---|
| barcode | uint64 | 2-bit encoded processed cell-barcode sequence. |
| gem_group | uint8 | When a sample is split across multiple channels, the GEM group identifies which channel a barcode came from. |
| gene | uint32 | An integer corresponding to the gene this putative molecule mapped to. This is a zero-based index into the genes.tsv file that accompanies the gene-barcode matrices. When set to the maximum gene index + 1, this row describes reads that did not map confidently to any gene. |
| umi | uint32 | 2-bit encoded processed UMI sequence. |
| reads | uint32 | Number of reads that confidently mapped to this putative molecule. |
| nonconf_mapped_reads | uint32 | The number of reads with this cell-barcode and UMI that mapped to the genome but did not map confidently to any gene. |
| unmapped_reads | uint32 | The number of reads with this cell-barcode and UMI that did not map to the genome. |
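The column layout above can be read as a set of parallel arrays, where row i is the i-th element of every column. The following Python sketch counts molecules (UMIs) per cell-barcode and gene while skipping the rows that aggregate reads not confidently mapped to any gene. It assumes the columns have already been loaded into plain lists keyed by column name, and that the caller supplies `num_genes`, the number of genes in the reference, so that `gene == num_genes` marks the "maximum gene index + 1" rows. The function name `count_umis` and the example values are hypothetical.

```python
from collections import Counter


def count_umis(columns, num_genes):
    """Count molecules per (barcode, gem_group, gene).

    columns: dict mapping column name to a list; all lists share one length.
    num_genes: number of genes in the reference; rows whose gene index
        equals this value hold reads not confidently mapped to any gene.
    """
    counts = Counter()
    rows = zip(columns["barcode"], columns["gem_group"], columns["gene"])
    for barcode, gem_group, gene in rows:
        if gene == num_genes:
            continue  # aggregate row, not a gene-level molecule
        # Each remaining row is a unique (cell-barcode, UMI, gene) tuple,
        # so one row contributes exactly one molecule.
        counts[(barcode, gem_group, gene)] += 1
    return counts


# Made-up values for illustration; real barcodes and UMIs are 2-bit encoded.
example = {
    "barcode": [7, 7, 7, 9],
    "gem_group": [1, 1, 1, 1],
    "gene": [0, 0, 2, 1],
    "umi": [11, 12, 11, 5],
}
umi_counts = count_umis(example, num_genes=2)
```

Keying on `gem_group` as well as `barcode` keeps identical barcode sequences from different channels apart.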

2-bit encoding

The cell-barcode and UMI sequences are 2-bit encoded as follows: