
10x Genomics
Chromium Single Cell Gene Expression

Molecule info

The cellranger pipeline outputs an HDF5 file containing per-molecule information for every molecule that has a valid cell-barcode and a valid UMI. The R kit requires this file to produce read-subsampled gene-barcode matrices. The file holds data describing the observed molecules, as well as data describing the reference transcriptome used in the analysis.
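The R kit's subsampling code is not reproduced on this page. The sketch below only illustrates what read subsampling means at the molecule level. It applies binomial thinning to the per-molecule reads column described in the table below. Each read is kept independently with probability rate, and a molecule still counts as a UMI only if at least one of its reads survives. This is an assumed technique, not necessarily the R kit's implementation. The function name subsample_reads is hypothetical, and so is the plain-list input, which stands in for the column loaded from the HDF5 file.

```python
import random

def subsample_reads(reads, rate, seed=0):
    """Thin per-molecule read counts (hypothetical helper, binomial thinning).

    Each read is kept independently with probability `rate`, so a molecule
    with n reads keeps Binomial(n, rate) reads.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    thinned = []
    for n in reads:
        # One Bernoulli(rate) trial per read; the sum is Binomial(n, rate).
        kept = sum(1 for _ in range(n) if rng.random() < rate)
        thinned.append(kept)
    return thinned

# Hypothetical `reads` values for five molecules.
reads = [1, 3, 10, 2, 7]
thinned = subsample_reads(reads, rate=0.5, seed=42)
# A molecule is still detected only if at least one read survived.
surviving_molecules = sum(1 for k in thinned if k > 0)
```

Thinning at the read level, rather than dropping whole molecules, means molecules with many reads are more likely to survive. That matches what happens when a library is sequenced less deeply.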

Molecule info columns

The following datasets in the molecule info file correspond to columns of a table. Each row of that table corresponds to a unique (cell-barcode, UMI, gene) tuple. There is an additional row per (cell-barcode, UMI) tuple that aggregates information about reads that could not be confidently mapped to a gene.

| Column | Type | Description |
|---|---|---|
| barcode | uint64 | 2-bit encoded processed cell-barcode sequence. |
| barcode_corrected_reads | uint32 | Number of reads within this putative molecule that had their cell-barcode corrected. |
| conf_mapped_uniq_read_pos | uint32 | Number of unique read mapping positions associated with this putative molecule. |
| gem_group | uint8 | Integer label that distinguishes data coming from distinct 10x GEM reactions (such as different channels or chips). |
| gene | uint32 | Zero-based index into the gene_ids dataset (see next section) indicating the gene to which this putative molecule was mapped. When set to the maximum gene index + 1 (that is, the length of gene_ids), the row describes reads that did not map confidently to any gene. |
| genome | uint32 | Zero-based index into the genome_ids dataset (see next section) indicating the genome to which this putative molecule was mapped. When set to the maximum genome index + 1 (that is, the length of genome_ids), the row describes reads that did not map confidently to any genome. |
| nonconf_mapped_reads | uint32 | Number of reads with this cell-barcode and UMI that mapped to the genome but did not map confidently to any gene. |
| reads | uint32 | Number of reads that confidently mapped to this putative molecule. |
| umi | uint32 | 2-bit encoded processed UMI sequence. |
| umi_corrected_reads | uint32 | Number of reads within this putative molecule that had their UMI corrected. |
| unmapped_reads | uint32 | Number of reads with this cell-barcode and UMI that did not map to the genome. |
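The barcode and umi columns are stored as packed integers, so they must be decoded before they can be compared with sequences. The sketch below assumes the convention A=0, C=1, G=2, T=3, with the first base in the most significant bits. Confirm this against the 2-bit encoding table for your pipeline version. The sequence length must be passed in explicitly, because leading A bases encode as zero bits and cannot be recovered from the integer alone. The helpers encode_2bit and decode_2bit are hypothetical names.

```python
# ASSUMPTION: A=0, C=1, G=2, T=3, first base in the most significant bits.
BASES = "ACGT"

def encode_2bit(seq):
    """Pack a sequence of A/C/G/T into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASES.index(base)  # ValueError on non-ACGT
    return value

def decode_2bit(value, length):
    """Unpack `length` bases from a 2-bit encoded integer."""
    if value >> (2 * length):
        raise ValueError("value has bits beyond the requested length")
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))

packed = encode_2bit("ACGT")   # 27 == 0b00011011
seq = decode_2bit(packed, 4)   # "ACGT"
```

Calling decode_2bit(0, 4) returns "AAAA". This is why the length (which depends on the chemistry) has to come from outside the integer itself.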

Molecule reference columns

In addition, the molecule info file contains a few datasets describing the reference transcriptome(s) used in this analysis.

| Column | Type | Description |
|---|---|---|
| gene_ids | string | The Ensembl gene IDs contained in this reference. The gene column defined in the previous section is an index into this array. |
| gene_names | string | The common gene symbol associated with each of the above gene_ids. |
| genome_ids | string | The list of genomes represented in this reference. In most cases, this will be a single genome. The genome column defined in the previous section is an index into this array. |
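The gene column indexes into gene_ids, and the value len(gene_ids) marks the rows for reads that did not map confidently to any gene. Below is a minimal sketch of turning the columns into per-cell UMI counts. It assumes the datasets have already been loaded into Python sequences. The loading step is not shown, and the function name count_umis and the example data are hypothetical.

```python
from collections import Counter

def count_umis(gene, barcode, gem_group, gene_ids, reads=None):
    """Count UMIs per (gene_id, gem_group, barcode) from molecule info columns.

    Each row is one (cell-barcode, UMI, gene) molecule, so counting rows
    counts UMIs. Rows whose gene index equals len(gene_ids) aggregate reads
    not confidently mapped to any gene and are skipped.
    """
    no_gene = len(gene_ids)  # sentinel: maximum gene index + 1
    counts = Counter()
    for i, g in enumerate(gene):
        if g == no_gene:
            continue
        # Optionally drop molecules left with zero reads (e.g. after subsampling).
        if reads is not None and reads[i] == 0:
            continue
        counts[(gene_ids[g], gem_group[i], barcode[i])] += 1
    return counts

# Hypothetical reference and molecule columns (barcodes left 2-bit encoded).
gene_ids = ["GENE0", "GENE1"]
gene_names = ["Alpha", "Beta"]
gene = [0, 0, 1, 2, 1]        # 2 == len(gene_ids): not confidently mapped
barcode = [5, 5, 5, 5, 9]
gem_group = [1, 1, 1, 1, 1]
reads = [3, 0, 2, 5, 4]

counts = count_umis(gene, barcode, gem_group, gene_ids)
# GENE0/barcode 5 -> 2 UMIs; GENE1/barcode 5 -> 1; GENE1/barcode 9 -> 1
symbol = dict(zip(gene_ids, gene_names))  # gene_ids -> gene_names lookup
```

Keying on gem_group as well as barcode keeps cells from different GEM reactions separate even when they share a barcode sequence.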

2-bit encoding

The cell-barcode and UMI sequences are 2-bit encoded as follows: