10x Genomics
Chromium Single Cell ATAC

Feature-Barcode Matrices

The cellranger-atac pipeline outputs three types of feature-barcode matrices. Each matrix has features as rows and barcodes as columns, and each element is the number of cut sites associated with that feature and barcode.

Type                                  Description
Unfiltered (Raw) peak-barcode matrix  Contains every observed barcode including background and non-cellular barcodes.
Filtered peak-barcode matrix          Contains only detected cellular barcodes.
Filtered tf-barcode matrix            Contains only detected cellular barcodes.

The cellranger-atac pipeline generates peak-barcode and tf-barcode matrices. Each matrix is stored in Market Exchange Format (MEX), in a directory that also contains a BED file with peaks (a TSV file for transcription factors) and a TSV file with barcode sequences. These correspond to the row and column indices of the matrix, respectively. For example, if cellranger-atac is run with a human reference, the matrix output may look like:

$ cd /home/jdoe/runs/sample345/outs
$ tree filtered_peak_bc_matrix
filtered_peak_bc_matrix
├── barcodes.tsv
├── peaks.bed
└── matrix.mtx
0 directories, 3 files
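
To make the MEX layout more concrete, the following minimal sketch writes a tiny, made-up 3 x 2 count matrix with scipy.io.mmwrite and reads it back with scipy.io.mmread. The counts and the temporary directory are illustrative assumptions, not pipeline output.

```python
import os
import tempfile

import scipy.io
import scipy.sparse

# Hypothetical counts: 3 features (rows) x 2 barcodes (columns).
rows = [0, 2, 2]
cols = [0, 0, 1]
counts = [4, 1, 7]
toy = scipy.sparse.coo_matrix((counts, (rows, cols)), shape=(3, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    mtx_path = os.path.join(tmp_dir, "matrix.mtx")
    scipy.io.mmwrite(mtx_path, toy)

    # The file begins with a "%%MatrixMarket" header line. After any comment
    # lines comes a size line, then one "row column value" line per nonzero
    # entry, using 1-based indices.
    with open(mtx_path) as handle:
        mex_lines = handle.read().splitlines()

    # Reading the file back gives a sparse matrix with the same entries.
    round_trip = scipy.io.mmread(mtx_path)
```

Only the three nonzero entries are written, which is why the format stays compact for sparse count data.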

Features correspond to row indices. For each peak, its chromosome, start and end positions are stored in the peaks.bed file.

$ head filtered_peak_bc_matrix/peaks.bed
chr1    237588  237917
chr1    564444  565537
chr1    567478  568248
chr1    569021  569641
chr1    713461  715293
chr1    752379  753032
chr1    762073  763379
chr1    773651  774064
chr1    779547  780286
chr1    793345  794375

These peaks match the original peaks called by the peak calling algorithm. They are duplicated in the MEX directory so that the matrices can be processed safely outside of the pipeline.

For each transcription factor, its fully specified name in the reference and its common name are stored in the first and second columns of motifs.tsv, respectively.

$ head filtered_tf_bc_matrix/motifs.tsv
Arnt_HUMAN.MA0004.1 Arnt
Ahr::Arnt_HUMAN.MA0006.1    Ahr::Arnt
Ddit3::Cebpa_HUMAN.MA0019.1 Ddit3::Cebpa
NFIL3_HUMAN.MA0025.1    NFIL3
Mecom_HUMAN.MA0029.1    Mecom
FOXF2_HUMAN.MA0030.1    FOXF2
FOXD1_HUMAN.MA0031.1    FOXD1
Gfi1_HUMAN.MA0038.1 Gfi1
Foxq1_HUMAN.MA0040.1    Foxq1
Foxd3_HUMAN.MA0041.1    Foxd3

The fully specified transcription factor names correspond to the names in the motifs.pfm reference. The common name is the part of the fully specified name (first column) that precedes the species.
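
The relationship between the two columns can be sketched as follows. This is a minimal illustration that assumes names follow the pattern seen in the listing above (common name, underscore, species, motif ID) and that the species token contains no underscore. The helper name common_name is hypothetical.

```python
def common_name(full_name):
    """Return the part of a fully specified motif name before the species.

    Assumes the pattern '<common>_<SPECIES>.<motif id>', where the species
    token itself contains no underscore.
    """
    prefix, _, _ = full_name.rpartition("_")
    # If there is no underscore at all, fall back to the unchanged name.
    return prefix if prefix else full_name


examples = [
    "Arnt_HUMAN.MA0004.1",
    "Ahr::Arnt_HUMAN.MA0006.1",
    "NFIL3_HUMAN.MA0025.1",
]
names = [common_name(name) for name in examples]
```

In practice the second column of motifs.tsv already holds this value, so a helper like this is mainly useful as a consistency check.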

For multi-species experiments, reference contigs (first column in peaks.bed) are prefixed with the genome name to avoid name collisions between chromosomes of different species, e.g. chr1 becomes hg19_chr1.
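
When working with a multi-species peaks.bed, it can be useful to separate the genome prefix from the contig name. The sketch below assumes the caller supplies the list of genome names in the reference. The helper split_contig and the second genome name mm10 are hypothetical and used only for illustration.

```python
def split_contig(contig, genomes):
    """Split a prefixed contig such as 'hg19_chr1' into (genome, chromosome).

    `genomes` is the list of genome names used in the reference. Matching on
    known names avoids guessing where the prefix ends when a contig name
    itself contains underscores.
    """
    for genome in genomes:
        prefix = genome + "_"
        if contig.startswith(prefix):
            return genome, contig[len(prefix):]
    # No known prefix: treat the contig as coming from a single-species run.
    return None, contig


genome, chrom = split_contig("hg19_chr1", ["hg19", "mm10"])
```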

Barcode sequences correspond to column indices.

$ head filtered_peak_bc_matrix/barcodes.tsv
AAACATACAAAACG-1
AAACATACAAAAGC-1
AAACATACAAACAG-1
AAACATACAAACGA-1
AAACATACAAAGCA-1
AAACATACAAAGTG-1
AAACATACAACAGA-1
AAACATACAACCAC-1
AAACATACAACCGT-1
AAACATACAACCTG-1

Each barcode sequence includes a suffix with a dash separator followed by a number:

AGAATGGTCTGCAT-1

More details on the barcode sequence format are available in the barcoded BAM section.
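
A barcode in this format can be separated into its sequence and numeric suffix with a small helper. This is a minimal sketch; the function name split_barcode is hypothetical.

```python
def split_barcode(barcode):
    """Split 'AGAATGGTCTGCAT-1' into the sequence and the integer suffix."""
    sequence, dash, suffix = barcode.rpartition("-")
    if not dash:
        raise ValueError("barcode has no '-<number>' suffix: " + barcode)
    return sequence, int(suffix)


sequence, suffix = split_barcode("AGAATGGTCTGCAT-1")
```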

R and Python support MEX format, and sparse matrices can be used for more efficient manipulation.

Loading matrices into R

MEX files can be loaded directly into R, for example:

require(magrittr)
require(readr)
require(Matrix)
require(tidyr)
require(dplyr)
 
# peak-bc matrix
mex_dir_path <- "/opt/sample345/outs/filtered_peak_bc_matrix"

mtx_path <- paste(mex_dir_path, "matrix.mtx", sep = '/')
feature_path <- paste(mex_dir_path, "peaks.bed", sep = '/')
barcode_path <- paste(mex_dir_path, "barcodes.tsv", sep = '/')
 
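# unite() joins chromosome, start and end into one name, e.g. "chr1_237588_237917"
features <- readr::read_tsv(feature_path, col_names = FALSE) %>% tidyr::unite(feature)
# barcodes.tsv has a single column; unite() just renames it to "barcode"
barcodes <- readr::read_tsv(barcode_path, col_names = FALSE) %>% tidyr::unite(barcode)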
 
mtx <- Matrix::readMM(mtx_path) %>%
  magrittr::set_rownames(features$feature) %>%
  magrittr::set_colnames(barcodes$barcode)

# tf-bc matrix
mex_dir_path <- "/opt/sample345/outs/filtered_tf_bc_matrix"

mtx_path <- paste(mex_dir_path, "matrix.mtx", sep = '/')
feature_path <- paste(mex_dir_path, "motifs.tsv", sep = '/')
barcode_path <- paste(mex_dir_path, "barcodes.tsv", sep = '/')
 
features <- readr::read_tsv(feature_path, col_names = c('feature', 'common_name'))
barcodes <- readr::read_tsv(barcode_path, col_names = F) %>% tidyr::unite(barcode)
 
mtx <- Matrix::readMM(mtx_path) %>%
  magrittr::set_rownames(features$feature) %>%
  magrittr::set_colnames(barcodes$barcode)

Loading matrices into Python

The csv, os and scipy.io libraries are recommended for loading a feature-barcode matrix into Python.

import csv
import os
import scipy.io


def read_tsv(path):
    # Read a tab-separated file into a list of rows, closing the file afterwards
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


# peak-bc matrix

matrix_dir = "/opt/sample345/outs/filtered_peak_bc_matrix"
mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx"))

# one (chromosome, start, end) tuple per matrix row
peaks_path = os.path.join(matrix_dir, "peaks.bed")
peaks = [(row[0], int(row[1]), int(row[2])) for row in read_tsv(peaks_path)]

# one barcode per matrix column
barcodes_path = os.path.join(matrix_dir, "barcodes.tsv")
barcodes = [row[0] for row in read_tsv(barcodes_path)]


# tf-bc matrix

matrix_dir = "/opt/sample345/outs/filtered_tf_bc_matrix"
mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx"))

# first column: fully specified name, second column: common name
motifs_path = os.path.join(matrix_dir, "motifs.tsv")
motifs = read_tsv(motifs_path)
motif_ids = [row[0] for row in motifs]
motif_names = [row[1] for row in motifs]
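
Once a matrix is loaded, it can be kept sparse for simple summaries. The sketch below shows one possible approach and uses a small made-up matrix in place of the result of scipy.io.mmread. The function name summarize_counts is hypothetical.

```python
import numpy as np
import scipy.sparse


def summarize_counts(mat):
    """Return per-barcode totals, per-feature totals and features per barcode.

    `mat` is a features x barcodes sparse matrix, such as the one returned by
    scipy.io.mmread. It is converted to CSR so that it is never densified.
    """
    csr = scipy.sparse.csr_matrix(mat)
    per_barcode = np.asarray(csr.sum(axis=0)).ravel()  # column sums
    per_feature = np.asarray(csr.sum(axis=1)).ravel()  # row sums
    # Number of stored (nonzero) entries in each column.
    features_per_barcode = csr.getnnz(axis=0)
    return per_barcode, per_feature, features_per_barcode


# Hypothetical 3 features x 2 barcodes matrix standing in for a loaded MEX file.
toy = scipy.sparse.coo_matrix(([4, 1, 7], ([0, 2, 2], [0, 0, 1])), shape=(3, 2))
per_barcode, per_feature, features_per_barcode = summarize_counts(toy)
```

Entry i of per_barcode lines up with entry i of the barcodes list, and entry j of per_feature lines up with row j of peaks.bed or motifs.tsv.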

Converting to CSV Format

Cell Ranger ATAC represents the feature-barcode matrices using sparse formats (only the nonzero entries are stored) in order to reduce file size. All of our programs support sparse formats.

However, certain programs (e.g. Excel) only support dense formats, where every row-column entry is explicitly stored, even if it is zero. You can convert a feature-barcode matrix to dense CSV format using the cellranger mat2csv command from the Cell Ranger software. This command takes two arguments: an input matrix generated by Cell Ranger ATAC (either an H5 file or a MEX directory) and an output path for the dense CSV. For example, to convert a matrix from a pipestance named sample123 in the current directory, either of the following commands would work:

# convert from MEX
$ cellranger mat2csv sample123/outs/filtered_peak_bc_matrix sample123.csv
# or, convert from H5
$ cellranger mat2csv sample123/outs/filtered_peak_bc_matrix.h5 sample123.csv

You can then load sample123.csv into Excel.

WARNING: dense files can be very large and may cause Excel to crash, or even cause mat2csv to fail if your computer doesn't have enough memory. For example, a peak-barcode matrix with 30k peaks and ~3k barcodes uses at least 200MB of disk space.
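
A rough lower bound on the dense file size can be worked out before converting. The sketch below assumes every matrix entry occupies at least two bytes in the CSV (one digit plus one comma or newline) and ignores row and column labels, so real files will be larger. The function name is hypothetical.

```python
def min_dense_csv_bytes(n_features, n_barcodes):
    """Lower bound on dense CSV size, ignoring row and column labels.

    Assumes each entry takes at least one digit plus one delimiter byte.
    """
    return n_features * n_barcodes * 2


estimate_bytes = min_dense_csv_bytes(30_000, 3_000)
estimate_mb = estimate_bytes / 1e6  # 180.0
```

For 30k peaks and 3k barcodes this gives about 180 MB before any labels are added, which is in line with the figure quoted above.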