10x Genomics
Chromium Single Cell Gene Expression

Gene-Barcode Matrices

The cellranger pipeline outputs two types of gene-barcode matrices.

Type                                Description
Unfiltered gene-barcode matrices    Contains every barcode from the fixed list of known-good barcode sequences. This includes background and non-cellular barcodes.
Filtered gene-barcode matrices      Contains only detected cellular barcodes.

The cellranger pipeline generates one gene-barcode matrix per species. Each matrix is stored in Market Exchange Format (MEX). Each matrix directory also contains TSV files listing the genes and barcode sequences that correspond to the row and column indices, respectively. For example, if cellranger is run with a human reference, the output may look like:

$ cd /home/jdoe/runs/sample345/outs
$ tree filtered_gene_bc_matrices
filtered_gene_bc_matrices
└── hg19
    ├── barcodes.tsv
    ├── genes.tsv
    └── matrix.mtx
2 directories, 3 files

Genes correspond to row indices. For each gene, the gene ID and gene name are stored in the first and second columns of the genes.tsv file, respectively.

$ head filtered_gene_bc_matrices/hg19/genes.tsv
ENSG00000243485    MIR1302-10
ENSG00000237613    FAM138A
ENSG00000186092    OR4F5
ENSG00000238009    RP11-34P13.7
ENSG00000239945    RP11-34P13.8
ENSG00000237683    AL627309.1
ENSG00000239906    RP11-34P13.14
ENSG00000241599    RP11-34P13.9
ENSG00000228463    AP006222.2
ENSG00000237094    RP4-669L17.10

Gene ID corresponds to gene_id in the annotation field of the reference GTF. Similarly, gene name corresponds to gene_name in the annotation field of the reference GTF. If no gene_name field is present in the reference GTF, gene name is equivalent to gene ID.
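
The fallback rule above can be sketched in Python. This is a minimal illustration, not the pipeline's own code: the function name gene_id_and_name and the regular expression are assumptions made for this example, and the attribute strings are simplified GTF annotation fields.

```python
import re

# Matches key "value" pairs in a GTF annotation field, e.g. gene_id "ENSG...";
# (the pattern is an assumption for this sketch, not taken from cellranger).
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def gene_id_and_name(attributes):
    """Return (gene_id, gene_name) parsed from a GTF annotation field.

    When no gene_name attribute is present, the gene ID is used as the name.
    """
    attrs = dict(_ATTR_RE.findall(attributes))
    gene_id = attrs["gene_id"]
    return gene_id, attrs.get("gene_name", gene_id)


# Hypothetical, simplified annotation fields for illustration.
with_name = 'gene_id "ENSG00000186092"; gene_name "OR4F5";'
without_name = 'gene_id "ENSG00000186092";'

print(gene_id_and_name(with_name))
print(gene_id_and_name(without_name))
```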

For multi-species experiments, gene IDs and names are prefixed with the genome name to avoid name collisions between genes of different species. For example, GAPDH becomes hg19_GAPDH and Gm15816 becomes mm10_Gm15816.
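
When working with multi-species output, it can be useful to recover the genome and the original name from a prefixed value. The sketch below assumes the genome names are known in advance (hg19 and mm10 here); split_genome_prefix is a hypothetical helper and is not part of cellranger.

```python
def split_genome_prefix(prefixed, genomes=("hg19", "mm10")):
    """Split a prefixed gene ID or name into (genome, original_name).

    Returns (None, prefixed) when no known genome prefix is found,
    which is the expected case for single-species output.
    """
    for genome in genomes:
        prefix = genome + "_"
        if prefixed.startswith(prefix):
            return genome, prefixed[len(prefix):]
    return None, prefixed


print(split_genome_prefix("hg19_GAPDH"))
print(split_genome_prefix("mm10_Gm15816"))
```

Matching against a list of known genome names, rather than splitting on the first underscore, avoids misreading gene names that themselves contain an underscore.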

Barcode sequences correspond to column indices.

$ head filtered_gene_bc_matrices/hg19/barcodes.tsv
AAACATACAAAACG-1
AAACATACAAAAGC-1
AAACATACAAACAG-1
AAACATACAAACGA-1
AAACATACAAAGCA-1
AAACATACAAAGTG-1
AAACATACAACAGA-1
AAACATACAACCAC-1
AAACATACAACCGT-1
AAACATACAACCTG-1

Each barcode sequence includes a suffix with a dash separator followed by a number:

AGAATGGTCTGCAT-1

More details on the barcode sequence format are available in the barcoded BAM section.
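
As a small illustration of the format above, the following Python sketch separates a barcode into its sequence and numeric suffix. The helper name split_barcode is an assumption for this example. The sketch only parses the string and makes no claim about what the number means.

```python
def split_barcode(barcode):
    """Split 'SEQUENCE-N' into (sequence, N), where N is an integer suffix."""
    sequence, _, suffix = barcode.rpartition("-")
    if not sequence or not suffix.isdigit():
        raise ValueError("expected '<sequence>-<number>', got %r" % barcode)
    return sequence, int(suffix)


print(split_barcode("AGAATGGTCTGCAT-1"))
```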

Both R and Python can read MEX format, and loading the data as a sparse matrix allows for more efficient manipulation.
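
To make the relationship between matrix.mtx and the two TSV files concrete, here is a pure-Python sketch that reads a coordinate-format file and maps its 1-based row and column indices to genes and barcodes. The counts in toy_mtx are invented for illustration, and read_mex_entries is a hypothetical helper. In practice, a library reader such as scipy.io.mmread is the better choice.

```python
import io


def read_mex_entries(handle):
    """Parse a coordinate-format matrix stream.

    Returns ((n_rows, n_cols), entries), where entries is a list of
    (row, col, value) tuples with indices converted to 0-based.
    """
    stripped = (line.strip() for line in handle)
    # The banner and comment lines start with '%'; skip them and blank lines.
    data = (line for line in stripped if line and not line.startswith("%"))
    n_rows, n_cols, _n_entries = map(int, next(data).split())
    entries = []
    for line in data:
        row, col, value = line.split()
        entries.append((int(row) - 1, int(col) - 1, int(value)))
    return (n_rows, n_cols), entries


# Toy file contents: 3 genes x 2 barcodes with 3 non-zero counts (made up).
toy_mtx = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 3\n"
    "1 1 4\n"
    "3 1 1\n"
    "2 2 7\n"
)
gene_names = ["MIR1302-10", "FAM138A", "OR4F5"]
barcodes = ["AAACATACAAAACG-1", "AAACATACAAAAGC-1"]

shape, entries = read_mex_entries(toy_mtx)
# Row index -> gene, column index -> barcode.
counts = {(gene_names[r], barcodes[c]): v for r, c, v in entries}
print(shape, counts)
```

Only non-zero entries are stored, which is why a sparse representation suits this data.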

Loading matrices into R

The cellrangerRkit library is needed to load a gene-barcode matrix into R. Assuming you have R installed and Rscript on your $PATH, the library can be installed by running the following command:

$ cellranger install-rkit

For example, running the following code loads the filtered human (hg19) gene-barcode matrix into R:

library(cellrangerRkit)
genome <- "hg19"
gene_bc_matrix <- load_cellranger_matrix("/opt/sample345", genome=genome)

To load a gene-barcode matrix from another species, you will need to edit the genome variable above. For example, to load the filtered mouse (mm10) gene-barcode matrix, you would set genome <- "mm10" and rerun the script above.

Loading matrices into Python

The csv, os and scipy.io libraries are recommended for loading a gene-barcode matrix into Python.

import csv
import os
import scipy.io

genome = "hg19"
matrices_dir = "/opt/sample345/outs/filtered_gene_bc_matrices"
human_matrix_dir = os.path.join(matrices_dir, genome)

# Read the matrix in MEX format; rows are genes and columns are barcodes.
mat = scipy.io.mmread(os.path.join(human_matrix_dir, "matrix.mtx"))

# genes.tsv: gene ID in the first column, gene name in the second.
genes_path = os.path.join(human_matrix_dir, "genes.tsv")
with open(genes_path) as f:
    gene_rows = list(csv.reader(f, delimiter="\t"))
gene_ids = [row[0] for row in gene_rows]
gene_names = [row[1] for row in gene_rows]

# barcodes.tsv: one barcode sequence per line.
barcodes_path = os.path.join(human_matrix_dir, "barcodes.tsv")
with open(barcodes_path) as f:
    barcodes = [row[0] for row in csv.reader(f, delimiter="\t")]

As with R, to load a gene-barcode matrix from another species, you will need to edit the genome variable above.
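
Once loaded, the matrix can be converted to a compressed sparse format for efficient slicing and summing. The sketch below uses a small made-up matrix as a stand-in for the mat, gene_names, and barcodes variables from the script above, so the counts are illustrative only. It assumes numpy and scipy are installed.

```python
import numpy as np
import scipy.sparse

# Toy stand-in for the objects loaded above: 3 genes x 2 barcodes.
# scipy.io.mmread returns a matrix in coordinate (COO) form like this one.
mat = scipy.sparse.coo_matrix(
    ([4, 1, 7], ([0, 2, 1], [0, 0, 1])), shape=(3, 2)
)
gene_names = ["MIR1302-10", "FAM138A", "OR4F5"]
barcodes = ["AAACATACAAAACG-1", "AAACATACAAAAGC-1"]

# CSC suits column (barcode) operations; CSR suits row (gene) operations.
csc = mat.tocsc()
csr = mat.tocsr()

# Total count per barcode: sum each column.
counts_per_barcode = np.asarray(csc.sum(axis=0)).ravel()
totals = dict(zip(barcodes, counts_per_barcode.tolist()))

# Counts for a single gene across all barcodes: slice one row.
or4f5 = csr[gene_names.index("OR4F5")].toarray().ravel()

print(totals)
print(or4f5)
```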