10x Genomics
Visium Spatial Gene Expression

Space Ranger 2.0, printed on 07/26/2024

Molecule Info

The spaceranger pipeline outputs an HDF5 file (molecule_info.h5) containing per-molecule information for all molecules that contain a valid barcode and a valid UMI and that were assigned with high confidence to a gene or protein. This HDF5 file contains data corresponding to the observed molecules, as well as data about the libraries, feature set(s), and barcode lists used for the analysis. Refer to the HDF5 Matrix format for more general information.

HDF5 File Hierarchy

molecule_info.h5
│   ├── file_version
│   ├── filetype
├── barcode_idx
├── barcode_info [HDF5 group]
│   ├── genomes
│   └── pass_filter
├── barcodes
├── count
├── feature_idx
├── features [HDF5 group]
│   ├── _all_tag_keys
│   ├── feature_type
│   ├── genome
│   ├── id
│   ├── name
│   └── target_sets [HDF5 group]
│       ├── target panel CSV  [For Targeted GEX]
│       └── probe set reference CSV  [For Visium FFPE]   
├── gem_group
├── library_idx
├── library_info
├── metrics_json  [Contains Slide Serial Number and Capture Area information if supplied]
├── probe_idx          ---------------------|
├── probes [HDF5 group]                     |
│   ├── feature_id                          | [For Visium FFPE]
│   ├── feature_name                        |
│   ├── probe_id                            |
│   └── region         ---------------------|
│       [Present when v2 probe set reference CSV is used]      
├── umi
└── umi_type

The contents of the .h5 file can be examined using HDFView software or the h5dump command.

h5dump -n molecule_info.h5
 
HDF5 "molecule_info.h5" {
FILE_CONTENTS {
 group      /
 dataset    /barcode_idx
 group      /barcode_info
 dataset    /barcode_info/genomes
 dataset    /barcode_info/pass_filter
 dataset    /barcodes
 dataset    /count
 dataset    /feature_idx
 group      /features
 dataset    /features/_all_tag_keys
 dataset    /features/feature_type
 dataset    /features/genome
 dataset    /features/id
 dataset    /features/name
 group      /features/target_sets
 dataset    /features/target_sets/[target set name]
 dataset    /gem_group
 dataset    /library_idx
 dataset    /library_info
 dataset    /metrics_json
 dataset    /probe_idx
 group      /probes
 dataset    /probes/feature_id
 dataset    /probes/feature_name
 dataset    /probes/probe_id
 dataset    /umi
 dataset    /umi_type
 }
}
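
The same listing can be produced from Python. The following is a minimal sketch, assuming the third-party h5py package is installed; the helper names `walk` and `list_contents` are illustrative and not part of Space Ranger. It relies only on the fact that HDF5 groups expose `items()` like a mapping, while datasets do not.

```python
def walk(node, prefix=""):
    """Yield (path, kind) for every group and dataset below `node`.

    Works on any nested mapping; an h5py.File or h5py.Group qualifies
    because it exposes items(), whereas an h5py.Dataset does not.
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "items"):
            yield path, "group"
            yield from walk(child, path)
        else:
            yield path, "dataset"


def list_contents(h5_path):
    """Print one line per group or dataset, similar to `h5dump -n`."""
    import h5py  # assumption: h5py is available in the environment

    with h5py.File(h5_path, "r") as f:
        for path, kind in walk(f):
            print(f"{kind:<10} {path}")


# Usage (the path is a placeholder): list_contents("molecule_info.h5")
```

Because `walk` only needs `items()`, it can be tried on a nested dict before pointing it at a real file.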

Per-molecule columns

The following HDF5 datasets in the molecule info file correspond to columns of a table. Each row of that table corresponds to a unique (UMI, spot-barcode, feature) tuple indicating the feature best supported by the reads (i.e., including PCR duplicates) assigned to that UMI and spot-barcode.

Column	Type	Description
`barcode_idx`	uint64	A zero-based index into the `barcodes` dataset (see next section), indicating the spot-barcode assigned to this putative molecule.
`count`	uint32	Number of reads associated with this putative molecule that were confidently mapped to the assigned feature.
`feature_idx`	uint32	A zero-based index into the feature list (see next section), indicating the feature to which this putative molecule was assigned.
`gem_group`	uint16	Integer label that is currently one (1) for all Space Ranger output.
`library_idx`	uint16	Integer label that is currently one (1) for all Space Ranger output.
`umi`	uint32	2-bit encoded (see note below) processed (i.e. corrected) UMI sequence.
`umi_type`	uint32	A boolean array specifying whether the molecule aligned to an exonic (1) or intronic (0) region of the associated feature.
`probe_idx`	uint32	Present only when a probe set reference CSV is used. A zero-based index into the `probes` group (see the Probe reference section), indicating the probe with which this transcript was captured.
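
Since each row is one unique (UMI, spot-barcode, feature) tuple, counting rows per spot-barcode and feature yields UMI counts. The sketch below illustrates how the index columns resolve against their reference lists. The function names are hypothetical, `load_columns` assumes the h5py package is installed, and loading whole columns into memory may not be practical for large files.

```python
from collections import Counter


def umi_counts(barcode_idx, feature_idx, barcodes, feature_ids):
    """Count rows (putative molecules) per (spot-barcode, feature id).

    Each row of the per-molecule table is one unique
    (UMI, spot-barcode, feature) tuple, so counting rows counts UMIs.
    """
    counts = Counter()
    for b, f in zip(barcode_idx, feature_idx):
        counts[(barcodes[b], feature_ids[f])] += 1
    return counts


def load_columns(h5_path):
    """Read the datasets needed by umi_counts from a molecule info file."""
    import h5py  # assumption: h5py is installed

    with h5py.File(h5_path, "r") as f:
        # String datasets typically come back as bytes; decode for display.
        barcodes = [b.decode() for b in f["barcodes"][:]]
        feature_ids = [x.decode() for x in f["features/id"][:]]
        return f["barcode_idx"][:], f["feature_idx"][:], barcodes, feature_ids


# Usage (the path is a placeholder):
# counts = umi_counts(*load_columns("molecule_info.h5"))
```

Summing the `count` column instead of counting rows would give read counts rather than UMI counts.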

Reference columns

In addition, the molecule info file has datasets corresponding to information about the libraries, barcode list(s), and feature set(s) that were used in the analysis.

Experiment reference

At the top level of the HDF5 file hierarchy, the barcodes, library_info and metrics_json datasets provide information about the experiments contained in this analysis:

Dataset	Type	Description
`barcodes`	string	A list of all spot-barcodes associated with this experiment (including those that were not observed). The `barcode_idx` column described in the previous section contains indices into this list of barcodes. Each spot-barcode sequence has a trailing digit that is currently one (1) in output generated from Space Ranger (e.g., `AGAATGGTCTGCAT-1`).
`library_info`	string	A JSON-formatted array of objects, where each object contains metadata for a single library. Each library will at a minimum contain the metadata `library_id`, `library_type`, and `gem_group`.
`metrics_json`	string	Pipeline metrics in JSON format that are used internally by Space Ranger (more details on metrics page). From Space Ranger v2.0 onwards, this dataset also contains the slide serial number and capture area information if they were supplied to the `spaceranger count` pipeline.
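
As an illustration of the `library_info` layout, the sketch below parses a JSON array of library objects with the standard library. The example string is made up for demonstration (only the three key names come from the table above), and `library_types` is a hypothetical helper; reading the string out of the HDF5 file is left out.

```python
import json


def library_types(library_info_json):
    """Map each library_id to its library_type."""
    libraries = json.loads(library_info_json)
    return {lib["library_id"]: lib["library_type"] for lib in libraries}


# Illustrative value only, not taken from a real run.
example = '[{"library_id": 0, "library_type": "Gene Expression", "gem_group": 1}]'
print(library_types(example))
```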

Observed spot-barcodes

The HDF5 group barcode_info gives information regarding the barcodes determined to be underneath the tissue. This HDF5 group contains two datasets:

Dataset	Type	Description
`genomes`	string	A list of all genome references used for gene expression libraries in this analysis.
`pass_filter`	uint64	A matrix with three columns that contains one row per passing spot-barcode. Each row is a tuple (`barcode_idx, library_idx, genome_idx`), where `genome_idx` is an index into the `genomes` dataset.
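
One use of `pass_filter` is to restrict the per-molecule table to spots under the tissue. The following is a minimal sketch under the assumption that both inputs have already been read into memory as integer sequences; the function name is hypothetical.

```python
def molecules_under_tissue(barcode_idx, pass_filter):
    """Return the row positions of molecules whose spot-barcode passed the filter.

    `pass_filter` rows are (barcode_idx, library_idx, genome_idx); only the
    first column is needed here.
    """
    passing = {int(row[0]) for row in pass_filter}
    return [i for i, b in enumerate(barcode_idx) if int(b) in passing]
```

The returned positions can then be used to select the matching entries of `count`, `feature_idx`, `umi`, and the other per-molecule columns.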

Feature reference

The HDF5 group features contains information regarding the feature reference(s) used for the analysis. The datasets within the features group represent columns in a table containing one row per feature (gene). Values in the feature_idx column described in the previous section provide indices into the rows of this table of features.

In addition to the columns described below, user-specified tags may also be present. The dataset _all_tag_keys contains a list of user-specified tags as well as built-in tags (e.g. genome).

Column	Type	Description
`feature_type`	string	The type of feature reference to which this feature belongs i.e. Gene Expression.
`genome`	string	The genome reference for a given feature (e.g., "GRCh38" or "mm10").
`id`	string	The unique id corresponding to this feature (for example, an Ensembl gene ID).
`name`	string	A human-readable name associated with this feature (for example, the common name associated with a gene).

The features group also contains an HDF5 group target_sets, which contains the probe set reference CSV for Visium FFPE samples and the target panel CSV for Targeted Gene Expression. When a target gene panel is present, indices of the target genes are stored inside target_sets, in an HDF5 dataset named after the target gene panel (e.g., "Human Gene Signature").

Probe reference

Present only when a probe set reference CSV is used. The HDF5 group probes contains information regarding the probe set used for the analysis. The datasets within the probes group represent the columns in a table containing one row per probe. Values in the probe_idx column described in the Per-molecule columns section provide indices into the rows of this table of probes.

Column	Type	Description
`feature_name`	string	The name of the feature (gene) targeted by this probe.
`feature_id`	string	The Ensembl gene identifier of the gene targeted by this probe.
`probe_id`	string	A unique identifier assigned to each probe.
`region`	string	Present only when a v2 probe set reference CSV is used. The region targeted by the probe is either `spliced` (overlapping a splice junction on the gene) or `unspliced`.

2-bit encoding

The UMI sequences are 2-bit encoded as follows:

Each pair of bits encodes a nucleotide (0="A", 1="C", 2="G", 3="T").
The least significant byte (LSB) contains the 3'-most nucleotides.

Note that the spot-barcode sequences do not have this encoding. Instead, they are stored as plain strings in the barcodes dataset.
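
The encoding rules above can be sketched in a few lines of Python. The function names are hypothetical. The UMI length has to be supplied by the caller because leading "A" nucleotides encode as zero bits and cannot be recovered from the integer alone; the default of 12 is an assumption and should be checked against the chemistry in use.

```python
NUCLEOTIDES = "ACGT"  # codes 0, 1, 2, 3


def decode_umi(value, length=12):
    """Decode a 2-bit encoded UMI integer into a nucleotide string.

    The lowest two bits hold the 3'-most nucleotide, so the string is
    built from the highest pair of bits down to the lowest.
    """
    value = int(value)
    return "".join(
        NUCLEOTIDES[(value >> (2 * (length - 1 - i))) & 0b11]
        for i in range(length)
    )


def encode_umi(seq):
    """Inverse of decode_umi, useful for round-trip checks."""
    value = 0
    for nt in seq:
        value = (value << 2) | NUCLEOTIDES.index(nt)
    return value
```

For example, "ACGT" corresponds to the bit pairs 00 01 10 11, which is the integer 27.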
