Cell Ranger 3.0, printed on 06/09/2023
The cellranger pipeline outputs an HDF5 file containing per-molecule information for all molecules that contain a valid barcode and valid UMI and were assigned with high confidence to a gene or Feature Barcode. This HDF5 file contains data corresponding to the observed molecules, as well as data about the libraries, feature set(s), and barcode lists used for the analysis.
The following HDF5 datasets in the molecule info file correspond to columns of a table. Each row of that table corresponds to a unique (UMI, cell-barcode, feature) tuple indicating the feature best supported by the reads (i.e., including PCR duplicates) assigned to that UMI and cell-barcode. (If two or more features are tied for the number of supporting reads, as may happen for genes with very low mappability, then one row is output for each of the tied features.)
| Column | Type | Description |
|---|---|---|
| barcode_idx | uint64 | A zero-based index into the |
| count | uint32 | Number of reads associated with this putative molecule that were confidently mapped to the assigned feature. |
| feature_idx | uint32 | A zero-based index into the feature list (see next section), indicating the feature to which this putative molecule was assigned. |
| gem_group | uint16 | Integer label that distinguishes data coming from distinct 10x GEM reactions (such as different channels or chips). |
| library_idx | uint16 | A zero-based index into the |
| umi | uint32 | 2-bit encoded (see note below) processed (i.e., corrected) UMI sequence. |
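Because each of these datasets is one column of the same table, row i of the table is obtained by reading element i of every dataset. The following is a minimal sketch, assuming the file has been opened with h5py (for example `h5py.File(path, "r")`) or is any mapping that exposes the same dataset names as the file hierarchy shown later in this document. The helper name `molecule_row` and the in-memory stand-in data are hypothetical.

```python
def molecule_row(mol_info, i):
    """Return row i of the per-molecule table as a dict of plain ints.

    ASSUMPTION: `mol_info` behaves like an open h5py.File, i.e. indexing
    by dataset name yields an object that supports integer indexing.
    """
    columns = ("barcode_idx", "count", "feature_idx",
               "gem_group", "library_idx", "umi")
    return {name: int(mol_info[name][i]) for name in columns}


# Tiny in-memory stand-in for a real file (made-up values, illustration only).
example = {
    "barcode_idx": [0, 0, 2],
    "count":       [5, 1, 3],
    "feature_idx": [1, 0, 1],
    "gem_group":   [1, 1, 1],
    "library_idx": [0, 0, 0],
    "umi":         [27, 228, 6],
}

row = molecule_row(example, 1)
print(row)
```

Reading one element at a time is convenient for inspection; for bulk work it is usually better to load whole columns at once.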
In addition, the molecule info file has datasets corresponding to information about the libraries, barcode list(s), and feature set(s) that were used in the analysis.
At the top level of the HDF5 file hierarchy, the barcodes and library_info datasets provide information about the experiments contained in this analysis:
| Dataset | Type | Description |
|---|---|---|
| barcodes | string | A list of all cell-barcodes associated with this experiment (including those that were not observed). The |
| library_info | string | A JSON-formatted array of objects, where each object contains metadata for a single library. Each library will at a minimum contain the metadata |
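Since library_info holds a JSON array, it can be decoded with any standard JSON parser and then indexed by position. The sketch below is illustrative only: the key names `library_id` and `library_type` in the sample payload are assumptions, not a confirmed schema, and the raw value is assumed to arrive as `bytes` or `str`. The helper `parse_library_info` is hypothetical.

```python
import json


def parse_library_info(raw):
    """Decode the JSON array stored in library_info into a list of dicts."""
    if isinstance(raw, bytes):  # HDF5 strings are often read back as bytes
        raw = raw.decode("utf-8")
    libraries = json.loads(raw)
    if not isinstance(libraries, list):
        raise ValueError("expected a JSON array of library objects")
    return libraries


# Hypothetical payload; the keys shown here are assumptions for illustration.
sample = (b'[{"library_id": 0, "library_type": "Gene Expression"}, '
          b'{"library_id": 1, "library_type": "Antibody Capture"}]')

libraries = parse_library_info(sample)
library_idx = 1  # e.g. a value taken from the per-molecule table
print(libraries[library_idx]["library_type"])
```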
The HDF5 group barcode_info gives information regarding the barcodes that were called as cells during the analysis. This HDF5 group contains two datasets:
| Dataset | Type | Description |
|---|---|---|
| genomes | string | A list of all genome references used for gene expression libraries in this analysis. |
| pass_filter | uint8 | A matrix with three columns that contains one row per passing cell-barcode. Each row is a tuple |
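A typical use of pass_filter is to restrict the per-molecule table to barcodes that were called as cells. The sketch below assumes that each pass_filter row begins with a barcode index followed by a library index; that column layout is an assumption made for illustration and should be checked against an actual file. The helper `cell_molecule_rows` is hypothetical.

```python
def cell_molecule_rows(barcode_idx, library_idx, pass_filter):
    """Return indices of molecule rows whose barcode was called as a cell.

    ASSUMPTION: every pass_filter row starts with
    (barcode index, library index, ...).
    """
    passing = {(int(row[0]), int(row[1])) for row in pass_filter}
    return [
        i
        for i, (bc, lib) in enumerate(zip(barcode_idx, library_idx))
        if (int(bc), int(lib)) in passing
    ]


# Made-up columns: four molecules, two passing cell-barcodes.
barcode_idx = [0, 0, 2, 3]
library_idx = [0, 0, 0, 0]
pass_filter = [(0, 0, 0), (3, 0, 0)]

kept = cell_molecule_rows(barcode_idx, library_idx, pass_filter)
print(kept)
```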
The HDF5 group features contains information regarding the feature reference(s) used for the analysis. Each dataset within the features group represents a column in a table containing one row per feature. Values in the feature_idx column described in the previous section provide indices into the rows of this hypothetical table.
In addition to the columns described below, user-specified tags may also be present. The dataset _all_tag_keys contains a list of user-specified tags as well as built-in tags (
| Column | Type | Description |
|---|---|---|
| feature_type | string | The type of feature reference to which this feature belongs (Gene Expression, CRISPR Guide Capture, Antibody Capture, or Custom). |
| genome | string | The genome reference for a given feature (e.g., "GRCh38" or "mm10"). For non-gene expression features, this entry is an empty string. |
| id | string | The unique ID corresponding to this feature (for example, an Ensembl gene ID). |
| name | string | A human-readable name associated with this feature (for example, the common name associated with a gene). |
| pattern | string | [Feature Barcoding only] Specifies how to extract the Feature Barcode sequence from the read. |
| read | string | [Feature Barcoding only] Specifies which RNA sequencing read ("R1" or "R2") contains the Feature Barcode. |
| sequence | string | [Feature Barcoding only] Nucleotide barcode sequence associated with this feature (e.g., a sgRNA protospacer sequence). |
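Because each dataset in the features group is a column, the feature table can be rebuilt by zipping the columns together, after which a feature_idx value is an ordinary row index. This is a minimal sketch, assuming `features` behaves like an h5py group (or a dict of lists) and that string values may be read back as `bytes`. The helper names and the sample identifiers are hypothetical, and only the four columns not marked as Feature Barcoding only are read.

```python
def _text(value):
    """Decode bytes read from HDF5 into str; leave other values as str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def feature_table(features):
    """Rebuild the one-row-per-feature table from its column datasets."""
    columns = ("id", "name", "feature_type", "genome")
    data = {c: [_text(v) for v in features[c]] for c in columns}
    n_rows = len(data["id"])
    return [{c: data[c][i] for c in columns} for i in range(n_rows)]


# Made-up stand-in for the features group (illustration only).
features = {
    "id":           [b"GENE_ID_0001", b"GUIDE_0001"],
    "name":         [b"GENE_A", b"guide_A"],
    "feature_type": [b"Gene Expression", b"CRISPR Guide Capture"],
    "genome":       [b"GRCh38", b""],  # empty for non-gene-expression features
}

table = feature_table(features)
feature_idx = 1  # e.g. a value taken from the per-molecule table
print(table[feature_idx])
```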
(root)
|
├─ barcode_idx
├─ barcode_info [HDF5 group]
│   ├─ genomes
│   └─ pass_filter
├─ barcodes
├─ count
├─ feature_idx
├─ features [HDF5 group]
│   ├─ _all_tag_keys
│   ├─ feature_type
│   ├─ genome
│   ├─ id
│   ├─ name
│   ├─ pattern [Feature Barcoding only]
│   ├─ read [Feature Barcoding only]
│   └─ sequence [Feature Barcoding only]
├─ gem_group
├─ library_idx
├─ library_info
├─ metrics [HDF5 group; see below]
└─ umi
The UMI sequences are 2-bit encoded as follows:
Please note that the cell-barcode sequences do not have this encoding; they are stored as plain strings in the library_info HDF5 group.
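A 2-bit encoded UMI can be unpacked with a few shifts and masks. The sketch below assumes the mapping A=0, C=1, G=2, T=3, with the 3'-most nucleotide held in the two least significant bits; both points are assumptions for illustration. The UMI length must be supplied separately, because leading A bases encode as zero bits. The function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"  # ASSUMPTION: 0=A, 1=C, 2=G, 3=T


def decode_umi(encoded, length):
    """Unpack a 2-bit encoded UMI integer into a nucleotide string."""
    bases = []
    for _ in range(length):
        # ASSUMPTION: the lowest two bits hold the 3'-most base.
        bases.append(NUCLEOTIDES[encoded & 0b11])
        encoded >>= 2
    return "".join(reversed(bases))


def encode_umi(seq):
    """Inverse of decode_umi, used here only to build test values."""
    value = 0
    for base in seq:
        value = (value << 2) | NUCLEOTIDES.index(base)
    return value


umi = "GATTACAGGT"
packed = encode_umi(umi)
assert packed < 2**32          # fits the uint32 storage type
print(decode_umi(packed, len(umi)))
```

At two bits per base, a uint32 value can hold at most 16 nucleotides.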
The metrics group is intended for internal use by the Cell Ranger pipeline; users should view metrics using the Cell Ranger metrics outputs.
The attributes of the metrics group contain pipeline metrics stored as serialized Python objects (using