Cell Ranger ARC 1.0, printed on 05/28/2023
The cellranger-arc pipeline outputs an HDF5 file containing per-molecule information for all molecules that contain a valid barcode and valid UMI and were assigned with high confidence to a gene. This HDF5 file contains data corresponding to the observed molecules, as well as data about the libraries, features, and barcode lists used for the analysis.
(root)
├─ barcode_idx
├─ barcode_info [HDF5 group]
│  ├─ genomes
│  └─ pass_filter
├─ barcodes
├─ count
├─ feature_idx
├─ features [HDF5 group]
│  ├─ _all_tag_keys
│  ├─ feature_type
│  ├─ genome
│  ├─ id
│  └─ name
├─ gem_group
├─ library_idx
├─ library_info
├─ metrics_json
├─ umi
└─ umi_type
The following HDF5 datasets in the molecule information file correspond to columns of a table. Each row of that table corresponds to a unique molecule, specified by a (UMI, cell-barcode, feature) tuple. This tuple indicates the feature best supported by the reads (including PCR duplicates) assigned to that unique pairing of UMI and 10x barcode.
|Column|Type|Description|
|---|---|---|
|barcode_idx|uint64|A zero-based index into the barcodes dataset.|
|count|uint32|Number of reads associated with this putative molecule that were confidently mapped to the assigned feature.|
|feature_idx|uint32|A zero-based index into the rows of the features group.|
|gem_group|uint16|Integer label that distinguishes data derived from distinct 10x GEM reactions (such as different chips or chip channels).|
|library_idx|uint16|A zero-based index into the library_info dataset.|
|umi|uint32|2-bit encoded (see note below) processed (i.e., corrected) UMI sequence.|
|umi_type|uint32|A boolean array specifying whether the molecule aligned to an exonic (1) or intronic (0) region of the associated feature.|
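The index columns can be turned back into barcode strings and feature IDs with plain array lookups. The following is a minimal sketch, not part of the Cell Ranger ARC software: the function names `load_columns` and `resolve_molecules` and the toy values are illustrative assumptions. `load_columns` additionally assumes the third-party h5py package is installed and that string datasets read back as bytes; it is defined but not called here.

```python
def load_columns(path):
    """Read the datasets used below (assumes the third-party h5py package)."""
    import h5py  # imported lazily so the rest of the sketch runs without it

    with h5py.File(path, "r") as f:
        return {
            "barcode_idx": f["barcode_idx"][:],
            "feature_idx": f["feature_idx"][:],
            "count": f["count"][:],
            # Assumption: string datasets come back as bytes and need decoding.
            "barcodes": [b.decode() for b in f["barcodes"][:]],
            "feature_ids": [i.decode() for i in f["features/id"][:]],
        }


def resolve_molecules(barcode_idx, feature_idx, count, barcodes, feature_ids):
    """Yield (barcode, feature_id, read_count) for each molecule row."""
    if not (len(barcode_idx) == len(feature_idx) == len(count)):
        raise ValueError("molecule columns must have the same length")
    for b, f, n in zip(barcode_idx, feature_idx, count):
        # Both index columns are zero-based positions into their lookup lists.
        yield barcodes[b], feature_ids[f], n


# Toy data standing in for the HDF5 datasets (example values only).
barcodes = ["AAACAGCCAAACAACA", "AAACAGCCAAACATAG", "AAACAGCCAAACCCTA"]
feature_ids = ["ENSG00000243485", "ENSG00000237613"]
rows = list(
    resolve_molecules([0, 2, 2], [1, 0, 1], [5, 1, 3], barcodes, feature_ids)
)
```

With a real file, the same call could be fed from the dictionary returned by `load_columns`, one keyword per column.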
The barcodes and library_info datasets provide information about the experiments contained in this analysis.
|Column|Type|Description|
|---|---|---|
|barcodes|string|A list of all 10x barcodes associated with this experiment (including those that were not observed). The |
|library_info|string|A JSON-formatted array of objects, where each object contains metadata for a single library. Each library will at a minimum contain the metadata |
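Because library_info is stored as a JSON-formatted string, it has to be decoded before the per-library metadata can be used. The sketch below shows one way to do that with the standard library. The helper name `parse_library_info` and the key names in the sample payload (`library_id`, `library_type`, `gem_group`) are assumptions made for illustration; check the keys present in your own file.

```python
import json


def parse_library_info(raw):
    """Decode the library_info dataset (a JSON array) into a list of dicts."""
    if isinstance(raw, bytes):  # HDF5 strings are often read back as bytes
        raw = raw.decode("utf-8")
    libraries = json.loads(raw)
    if not isinstance(libraries, list):
        raise ValueError("expected a JSON array of library objects")
    return libraries


# Hypothetical payload: the key names are assumptions, not confirmed by the text.
raw = b'[{"library_id": 0, "library_type": "Gene Expression", "gem_group": 1}]'
libraries = parse_library_info(raw)
```

Each element of the returned list is an ordinary dictionary describing one library.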
The HDF5 group barcode_info provides information regarding the barcodes that were called as cells during the analysis. This HDF5 group contains two columns.
|Column|Type|Description|
|---|---|---|
|genomes|string|A list of all genome references used in this analysis. In most cases, this will be a single genome.|
|pass_filter|uint64|A matrix with three columns that contains one row per cell-barcode. Each row is a tuple |
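A common use of pass_filter is to keep only the molecules whose barcode was called as a cell. The sketch below is a hedged illustration: it assumes that the first column of each pass_filter row holds a barcode index comparable to the values in barcode_idx, which the text above does not spell out. The function names and toy values are hypothetical.

```python
def cell_barcode_indices(pass_filter, barcode_column=0):
    """Collect barcode indices from the rows of the three-column matrix.

    Assumption: column `barcode_column` (default 0) holds the barcode index.
    """
    return {row[barcode_column] for row in pass_filter}


def keep_cell_molecules(barcode_idx, pass_filter):
    """Return the row positions of molecules whose barcode was called a cell."""
    cells = cell_barcode_indices(pass_filter)
    return [i for i, b in enumerate(barcode_idx) if b in cells]


# Toy data: two cell-called barcodes (indices 0 and 2), five molecules.
pass_filter = [(0, 0, 0), (2, 0, 0)]
kept = keep_cell_molecules([0, 1, 2, 2, 1], pass_filter)
```

The returned positions can then be used to subset the other molecule columns (count, feature_idx, umi, and so on).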
The HDF5 group features contains information regarding the feature reference used for the analysis. The datasets within the features group represent columns of a table containing one row per feature. Values in the feature_idx column described in the previous section provide indices into the rows of this table.
In addition to the columns described below, _all_tag_keys contains a list of built-in tags (
|Column|Type|Description|
|---|---|---|
|feature_type|string|The type of feature reference to which this feature belongs (Gene Expression).|
|genome|string|The genome reference for a given feature (e.g., "GRCh38" or "mm10").|
|id|string|The Ensembl gene ID corresponding to this feature.|
|name|string|The common gene symbol associated with each of the above |
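Since the datasets in the features group are parallel columns, they can be zipped into one record per feature and then indexed by feature_idx. The sketch below uses plain lists in place of the HDF5 datasets; the function name `feature_table` and the sample rows are illustrative assumptions.

```python
def feature_table(ids, names, feature_types, genomes):
    """Zip the parallel column arrays of the features group into row dicts."""
    columns = (ids, names, feature_types, genomes)
    if len({len(c) for c in columns}) != 1:
        raise ValueError("feature columns must have the same length")
    return [
        {"id": i, "name": n, "feature_type": t, "genome": g}
        for i, n, t, g in zip(*columns)
    ]


# Toy columns (example values only).
features = feature_table(
    ids=["ENSG00000243485", "ENSG00000237613"],
    names=["MIR1302-2HG", "FAM138A"],
    feature_types=["Gene Expression", "Gene Expression"],
    genomes=["GRCh38", "GRCh38"],
)

# A feature_idx value of 1 selects the second row of the table.
second = features[1]
```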
The UMI sequences are 2-bit encoded as follows:
Note that the cell-barcode sequences do not have this encoding. Instead, they are stored as plain strings in the barcodes HDF5 dataset.
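To make the 2-bit UMI packing concrete, here is a small encode/decode sketch. It rests on assumptions that should be checked against the pipeline documentation: the mapping 0=A, 1=C, 2=G, 3=T, the 3'-most base stored in the lowest two bits, and a UMI length supplied by the caller because the packed integer does not record it. The function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 0=A, 1=C, 2=G, 3=T


def encode_umi(seq):
    """Pack a UMI string into an integer, two bits per base."""
    value = 0
    for base in seq:
        # Shift earlier bases up so the last (3'-most) base ends in the low bits.
        value = (value << 2) | NUCLEOTIDES.index(base)
    return value


def decode_umi(encoded, length):
    """Unpack `length` bases from a 2-bit encoded UMI."""
    encoded = int(encoded)  # accept NumPy integers as well as Python ints
    bases = []
    for _ in range(length):
        bases.append(NUCLEOTIDES[encoded & 0b11])  # read the lowest two bits
        encoded >>= 2
    # Bases were read from the 3' end, so reverse to get 5'-to-3' order.
    return "".join(reversed(bases))


packed = encode_umi("ACGT")
unpacked = decode_umi(packed, 4)
```

The length argument matters because leading A bases encode as zero bits and would otherwise be lost.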
The metrics_json dataset contains pipeline metrics in JSON format that are used internally by Cell Ranger. Users should view metrics using the Cell Ranger ARC metrics outputs.