10x Genomics
Visium Spatial Gene Expression

Feature-Barcode Matrices

The spaceranger pipeline outputs two types of feature-barcode matrices described in the table below. Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).

Type                                 Description
Unfiltered feature-barcode matrix    Contains every barcode from the fixed list of known-good barcode sequences. This includes both background and tissue-associated barcodes.
Filtered feature-barcode matrix      Contains only tissue-associated barcodes, i.e. those under tissue. For Targeted Gene Expression samples, non-targeted genes are removed from the filtered matrix.

Each matrix is stored in the Market Exchange Format (MEX) for sparse matrices. The matrix directory also contains gzipped TSV files with the features and barcode sequences that correspond to the row and column indices, respectively. For example, the matrix output may look like:

$ cd /home/jdoe/runs/sample345/outs
$ tree filtered_feature_bc_matrix
filtered_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz

0 directories, 3 files
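To make the layout of matrix.mtx.gz concrete, here is a minimal sketch of a reader written with only the Python standard library. It assumes the standard Matrix Market coordinate layout (a %%MatrixMarket header, optional % comment lines, a size line, then one-based "row column value" triplets) and integer counts. The function name read_mex and the toy data are illustrative and not part of Space Ranger.

```python
import gzip
import io


def read_mex(handle):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries maps zero-based (row, col) pairs to integer counts.
    """
    lines = (line.strip() for line in handle)
    header = next(lines, "")
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    # Drop comment lines (they start with '%') and blank lines.
    data = [line for line in lines if line and not line.startswith("%")]
    if not data:
        raise ValueError("missing size line")
    # Size line: number of rows, number of columns, number of stored entries.
    n_rows, n_cols, n_entries = (int(field) for field in data[0].split())
    entries = {}
    for line in data[1:]:
        row, col, value = line.split()
        # Indices in the file are one-based; convert to zero-based.
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


# Toy matrix with 3 features (rows), 2 barcodes (columns), 3 nonzero counts.
toy_mex = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
compressed = gzip.compress(toy_mex.encode("ascii"))
with gzip.open(io.BytesIO(compressed), "rt") as handle:
    n_rows, n_cols, entries = read_mex(handle)
```

Only the nonzero entries appear in the file, which is why the format stays compact for sparse count data. For real analyses, the library loaders shown later on this page are the better choice.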

Features correspond to row indices. For each feature, its feature ID and name are stored in the first and second column of the (unzipped) features.tsv.gz file, respectively. The third column identifies the type of feature, which will be Gene Expression. Below is a minimal example features.tsv.gz file showing data collected for 3 genes.

$ gzip -cd filtered_feature_bc_matrix/features.tsv.gz
ENSG00000141510       TP53         Gene Expression
ENSG00000012048       BRCA1        Gene Expression
ENSG00000139687       RB1          Gene Expression

For Gene Expression data, the ID corresponds to gene_id in the annotation field of the reference GTF. Similarly, the name corresponds to gene_name in the annotation field of the reference GTF. If no gene_name field is present in the reference GTF, gene name is equivalent to gene ID.
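The fallback rule described above can be sketched as follows. This is an illustration only, assuming the usual GTF attribute syntax of key "value" pairs separated by semicolons. The helper name gene_id_and_name is hypothetical, and this is not the code Space Ranger itself uses.

```python
import re

# Matches key "value" pairs in the attribute column of a GTF line.
_ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"')


def gene_id_and_name(attribute_field):
    """Return (gene_id, gene_name), falling back to gene_id when no name is given."""
    attributes = dict(_ATTRIBUTE.findall(attribute_field))
    gene_id = attributes["gene_id"]
    return gene_id, attributes.get("gene_name", gene_id)


with_name = gene_id_and_name('gene_id "ENSG00000141510"; gene_name "TP53";')
without_name = gene_id_and_name('gene_id "ENSG00000141510";')
```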

For multi-species experiments, gene IDs and names are prefixed with the genome name to avoid name collisions between genes of different species, e.g. GAPDH becomes hg19_GAPDH and Gm15816 becomes mm10_Gm15816.
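When working with multi-species output, it can be useful to separate the genome prefix from the gene name again. The sketch below assumes you know the genome names used in your reference (hg19 and mm10 here, taken from the example above). The function split_genome_prefix is a hypothetical helper, not part of Space Ranger.

```python
def split_genome_prefix(name, genomes):
    """Split 'genome_gene' into (genome, gene); return (None, name) if no prefix matches."""
    for genome in genomes:
        prefix = genome + "_"
        if name.startswith(prefix):
            return genome, name[len(prefix):]
    return None, name


genomes = ["hg19", "mm10"]
human = split_genome_prefix("hg19_GAPDH", genomes)
mouse = split_genome_prefix("mm10_Gm15816", genomes)
unprefixed = split_genome_prefix("TP53", genomes)
```

Matching against known genome names, rather than splitting on the first underscore, avoids mis-splitting gene names that themselves contain underscores.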

Barcode sequences correspond to column indices.

$ gzip -cd filtered_feature_bc_matrix/barcodes.tsv.gz
AAACATACAAAACG-1
AAACATACAAAAGC-1
AAACATACAAACAG-1
AAACATACAAACGA-1
AAACATACAAAGCA-1
AAACATACAAAGTG-1
AAACATACAACAGA-1
AAACATACAACCAC-1
AAACATACAACCGT-1
AAACATACAACCTG-1

Each barcode sequence includes a suffix with a dash separator followed by a number:

AGAATGGTCTGCAT-1

More details on the barcode sequence format are available in the barcoded BAM section.
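If you need the nucleotide sequence and the numeric suffix separately, the barcode can be split at the last dash. A minimal sketch, with split_barcode as a hypothetical helper name:

```python
def split_barcode(barcode):
    """Split 'SEQUENCE-N' into (sequence, N) at the last dash."""
    sequence, separator, suffix = barcode.rpartition("-")
    if not separator or not suffix.isdigit():
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    return sequence, int(suffix)


sequence, suffix = split_barcode("AGAATGGTCTGCAT-1")
```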

Both R and Python can read the MEX format, and keeping the matrix in a sparse representation makes manipulation more efficient.

Loading Matrices into R

The R package Matrix supports loading MEX format data, and can be easily used to load the sparse feature-barcode matrix, as shown in the example code below.

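library(Matrix)

# Directory containing matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz
matrix_dir <- "/opt/sample345/outs/filtered_feature_bc_matrix/"
barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
features.path <- paste0(matrix_dir, "features.tsv.gz")
matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")

# Read the sparse matrix (rows are features, columns are barcodes)
mat <- readMM(file = matrix.path)

# Read the feature and barcode tables; neither file has a header row
feature.names <- read.delim(features.path,
                            header = FALSE,
                            stringsAsFactors = FALSE)
barcode.names <- read.delim(barcode.path,
                            header = FALSE,
                            stringsAsFactors = FALSE)

# Label columns with barcodes and rows with feature IDs (first column)
colnames(mat) <- barcode.names$V1
rownames(mat) <- feature.names$V1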

Loading Matrices into Python

The csv, os, gzip and scipy.io modules can be used to load a feature-barcode matrix into Python as shown below.

import csv
import gzip
import os
import scipy.io

# Directory containing matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz
matrix_dir = "/opt/sample345/outs/filtered_feature_bc_matrix"
# mmread returns a sparse matrix (rows are features, columns are barcodes)
mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz"))

# Open in text mode ("rt"): csv.reader needs str, not bytes, in Python 3
features_path = os.path.join(matrix_dir, "features.tsv.gz")
with gzip.open(features_path, "rt") as f:
    feature_rows = list(csv.reader(f, delimiter="\t"))
feature_ids = [row[0] for row in feature_rows]
gene_names = [row[1] for row in feature_rows]
feature_types = [row[2] for row in feature_rows]

barcodes_path = os.path.join(matrix_dir, "barcodes.tsv.gz")
with gzip.open(barcodes_path, "rt") as f:
    barcodes = [row[0] for row in csv.reader(f, delimiter="\t")]
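Once the matrix is loaded, summaries such as total UMI counts per barcode or per feature can be computed without converting to a dense array. The sketch below uses a small toy matrix as a stand-in for the mat, feature_ids and barcodes objects loaded above, so the values are illustrative only.

```python
import numpy as np
import scipy.sparse

# Toy stand-ins for the loaded objects: 3 features (rows) x 3 barcodes (columns).
mat = scipy.sparse.coo_matrix(np.array([[0, 2, 0],
                                        [1, 0, 0],
                                        [0, 3, 4]]))
feature_ids = ["ENSG00000141510", "ENSG00000012048", "ENSG00000139687"]
barcodes = ["AAACATACAAAACG-1", "AAACATACAAAAGC-1", "AAACATACAAACAG-1"]

# CSR format supports efficient row and column arithmetic.
csr = mat.tocsr()

# Sum down each column for per-barcode totals, across each row for per-feature totals.
umis_per_barcode = np.asarray(csr.sum(axis=0)).ravel()
umis_per_feature = np.asarray(csr.sum(axis=1)).ravel()

totals_by_barcode = dict(zip(barcodes, umis_per_barcode.tolist()))
totals_by_feature = dict(zip(feature_ids, umis_per_feature.tolist()))
```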

Converting to CSV Format

Space Ranger represents the feature-barcode matrix using sparse formats (only the nonzero entries are stored) in order to minimize file size. All of our programs, and many other programs for gene expression analysis, support sparse formats.

However, certain programs (e.g. Excel) only support dense formats (where every row-column entry is explicitly stored, even if it's a zero). You can convert a feature-barcode matrix to dense CSV format using the spaceranger mat2csv command. This command takes two arguments: an input matrix generated by Space Ranger (either an H5 file or a MEX directory) and an output path for the dense CSV. For example, to convert a matrix from a pipestance named sample123 in the current directory, either of the following commands would work:

# convert from MEX
$ spaceranger mat2csv sample123/outs/filtered_feature_bc_matrix sample123.csv
# or, convert from H5
$ spaceranger mat2csv sample123/outs/filtered_feature_bc_matrix.h5 sample123.csv

You can then load sample123.csv into Excel.

WARNING: dense files can be very large and may cause Excel to crash, or may even cause mat2csv to fail if your computer doesn't have enough memory. For example, a feature-barcode matrix from a human reference (~33k genes) with ~3k barcodes uses at least 200MB of disk space.
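A rough lower bound on the dense file size can be estimated before running the conversion. The sketch below assumes that every matrix entry occupies at least two bytes in the CSV (one digit plus one delimiter) and ignores the header row and row labels. Both the function name and the two-byte figure are assumptions made for illustration.

```python
def min_dense_csv_bytes(n_features, n_barcodes, bytes_per_entry=2):
    """Lower bound on dense CSV size: one digit plus one delimiter per entry."""
    return n_features * n_barcodes * bytes_per_entry


# ~33k genes x ~3k barcodes, as in the example above.
estimate = min_dense_csv_bytes(33_000, 3_000)
print(f"{estimate / 1e6:.0f} MB")  # prints "198 MB"
```

This is roughly in line with the figure quoted above. Real files will be larger, since counts can have several digits and each row and column carries a label.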