10x Genomics
Chromium Single Cell Gene Expression

Aggregating Multiple Libraries with cellranger aggr

When doing large studies involving multiple biological samples (or multiple libraries / replicates of the same sample), it is best to run cellranger count on each of the libraries individually, and then pool the results using cellranger aggr.

The cellranger aggr command takes a CSV file specifying a list of cellranger count output files (specifically the molecule_info.h5 from each run), and produces a single gene-barcode matrix containing all the data.

When combining multiple libraries, the barcode sequences for each library are distinguished by their GEM group (see Gem Groups).

By default, each library's reads are subsampled such that all libraries have the same effective sequencing depth, measured in terms of reads per cell (see Depth Normalization).

Requirements

The first step is to run cellranger count on each individual library prepared using the 10x system, as described in Single-Library Analysis.

For example, suppose you ran three count pipelines as follows:

$ cd /opt/runs
$ cellranger count --id=LV123 ...
... wait for pipeline to finish ...
$ cellranger count --id=LB456 ...
... wait for pipeline to finish ...
$ cellranger count --id=LP789 ...
... wait for pipeline to finish ...

Now you want to aggregate these three runs to get a single gene-barcode matrix and analysis. To do so, you need to create an Aggregation CSV.

Setting Up An Aggregation CSV

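Create a CSV file with a header line containing the following columns:

Column        Description
library_id    Unique identifier for this input library (e.g. LV123).
molecule_h5   Path to the molecule_info.h5 file produced by cellranger count for this library.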

You can either make the CSV file in a text editor, or create it in Excel and export to CSV. Continuing the example from the previous section, your Excel spreadsheet would look like this:

     A           B
1    library_id  molecule_h5
2    LV123       /opt/runs/LV123/outs/molecule_info.h5
3    LB456       /opt/runs/LB456/outs/molecule_info.h5
4    LP789       /opt/runs/LP789/outs/molecule_info.h5

When you save it as a CSV, the result would look like this:

library_id,molecule_h5
LV123,/opt/runs/LV123/outs/molecule_info.h5
LB456,/opt/runs/LB456/outs/molecule_info.h5
LP789,/opt/runs/LP789/outs/molecule_info.h5
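
If you have many runs, you may prefer to generate the file with a script rather than by hand. The following is a minimal Python sketch, assuming every run lives under one parent directory and uses the standard outs/molecule_info.h5 layout shown above. The function name build_aggregation_csv is hypothetical and is not part of cellranger.

```python
import csv
import io
from pathlib import PurePosixPath


def build_aggregation_csv(runs_dir, library_ids):
    """Return aggregation CSV text with one row per cellranger count run.

    Assumes each run's output is at <runs_dir>/<library_id>/outs/molecule_info.h5.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header line with the two columns used in the example above.
    writer.writerow(["library_id", "molecule_h5"])
    for library_id in library_ids:
        h5 = PurePosixPath(runs_dir) / library_id / "outs" / "molecule_info.h5"
        writer.writerow([library_id, str(h5)])
    return buf.getvalue()


csv_text = build_aggregation_csv("/opt/runs", ["LV123", "LB456", "LP789"])
print(csv_text, end="")
```

Writing csv_text to a file such as AGG123_libraries.csv gives the same content as the hand-made CSV above.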

Command Line Interface

These are the most common command line arguments (run cellranger aggr --help for a full list):

Argument          Description
--id=ID           A unique run ID string, e.g. AGG123.
--csv=CSV         Path of a CSV file containing a list of cellranger count outputs (see Setting Up An Aggregation CSV).
--normalize=MODE  (Optional) String specifying how to normalize depth across the input libraries. Valid values: mapped (default), raw, or none (see Depth Normalization).
--nosecondary     (Optional) Add this flag to skip secondary analysis of the gene-barcode matrix (dimensionality reduction, clustering and visualization). Set this if you plan to use cellranger reanalyze or your own custom analysis.

After specifying these input arguments, run cellranger aggr:

$ cd /home/jdoe/runs
$ cellranger aggr --id=AGG123 \
                  --csv=AGG123_libraries.csv \
                  --normalize=mapped

The pipeline will begin to run, creating a new folder named with the aggregation ID you specified (e.g. /home/jdoe/runs/AGG123) for its output. If this folder already exists, cellranger will assume it is an existing pipestance and attempt to resume running it.
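
If you launch aggregations from a script, it can help to assemble the command and check for an existing output folder first, since an existing folder causes a resume rather than a fresh run. The sketch below is an illustration only: the helper names build_aggr_command and will_resume are hypothetical, and it uses just the four arguments listed above. It builds the argument list without running anything.

```python
import os

# Valid values for --normalize, as listed in the argument table.
VALID_NORMALIZE_MODES = ("mapped", "raw", "none")


def build_aggr_command(run_id, csv_path, normalize="mapped", nosecondary=False):
    """Return the cellranger aggr command as a list of arguments."""
    if normalize not in VALID_NORMALIZE_MODES:
        raise ValueError(f"normalize must be one of {VALID_NORMALIZE_MODES}")
    cmd = [
        "cellranger", "aggr",
        f"--id={run_id}",
        f"--csv={csv_path}",
        f"--normalize={normalize}",
    ]
    if nosecondary:
        cmd.append("--nosecondary")
    return cmd


def will_resume(work_dir, run_id):
    """True if <work_dir>/<run_id> already exists, so cellranger would try to resume it."""
    return os.path.isdir(os.path.join(work_dir, run_id))


cmd = build_aggr_command("AGG123", "AGG123_libraries.csv")
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, cwd="/home/jdoe/runs") once you have confirmed that will_resume returns the answer you expect.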

Pipeline Outputs

A successful run should conclude with a message similar to this:

2016-11-04 13:36:33 [runtime] (run:local)       ID.AGG123.SC_RNA_AGGREGATOR_CS.SC_RNA_AGGREGATOR.SUMMARIZE_AGGREGATED_REPORTS.fork0.join
2016-11-04 13:36:36 [runtime] (join_complete)   ID.AGG123.SC_RNA_AGGREGATOR_CS.SC_RNA_AGGREGATOR.SUMMARIZE_AGGREGATED_REPORTS
2016-11-04 13:36:45 [runtime] VDR killed 210 files, 29MB.
 
Outputs:
- Aggregation metrics summary HTML:       /home/jdoe/runs/AGG123/outs/web_summary.html
- Aggregation metrics summary JSON:       /home/jdoe/runs/AGG123/outs/summary.json
- Secondary analysis output CSV:          /home/jdoe/runs/AGG123/outs/analysis_csv
- Filtered gene-barcode matrices HDF5:    /home/jdoe/runs/AGG123/outs/filtered_gene_bc_matrices_h5.h5
- Filtered gene-barcode matrices MEX:     /home/jdoe/runs/AGG123/outs/filtered_gene_bc_matrices_mex
- Filtered molecule-level info:           /home/jdoe/runs/AGG123/outs/filtered_molecules.h5
- Unfiltered gene-barcode matrices HDF5:  /home/jdoe/runs/AGG123/outs/raw_gene_bc_matrices_h5.h5
- Unfiltered gene-barcode matrices MEX:   /home/jdoe/runs/AGG123/outs/raw_gene_bc_matrices_mex
- Unfiltered molecule-level info:         /home/jdoe/runs/AGG123/outs/raw_molecules.h5
- Barcodes of cell-containing partitions: /home/jdoe/runs/AGG123/outs/cell_barcodes.csv
- Copy of the input CSV:                  /home/jdoe/runs/AGG123/outs/aggregation_csv.csv
 
Pipestance completed successfully!

Once cellranger has successfully completed, you can browse the resulting summary HTML file in any supported web browser, or refer to the Understanding Output section to explore the data by hand. For machine-readable versions of the summary metrics, refer to the cellranger aggr section of the Summary Metrics page.

Gem Groups

A GEM group is an integer that is appended to each barcode in the gene-barcode matrix to indicate which library the barcode came from. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different libraries, despite having the same nucleotide sequence.
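
When working with the aggregated matrix in your own code, you may want to separate the nucleotide sequence from the GEM group suffix. A minimal Python sketch, assuming barcodes always have the SEQUENCE-N form shown above (the function name split_barcode is hypothetical):

```python
def split_barcode(barcode):
    """Split 'SEQUENCE-N' into (sequence, gem_group)."""
    sequence, sep, suffix = barcode.rpartition("-")
    if not sep or not suffix.isdigit():
        raise ValueError(f"barcode has no integer GEM group suffix: {barcode!r}")
    return sequence, int(suffix)


a = split_barcode("AGACCATTGAGACTTA-1")
b = split_barcode("AGACCATTGAGACTTA-2")

# Same nucleotide sequence, but different GEM groups, so different cells.
print(a[0] == b[0], a == b)  # prints: True False
```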

Depth Normalization

When combining data from multiple libraries, we recommend equalizing the read depth between libraries before merging, to reduce the batch effect introduced by sequencing. The cellranger aggr pipeline does this automatically by default, but you can choose to turn it off or change the way the normalization is done.
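
To make the idea of equalizing depth concrete, the sketch below computes what fraction of each library's reads would be kept so that every library ends up at the same reads per cell. It is an illustration of the arithmetic only, not cellranger's implementation: it assumes the target is the depth of the shallowest library, and the per-library depths are made-up numbers.

```python
def subsample_fractions(reads_per_cell):
    """Map each library to the fraction of reads to keep.

    Assumption: every library is subsampled down to the lowest
    reads-per-cell value among the inputs.
    """
    if any(depth <= 0 for depth in reads_per_cell.values()):
        raise ValueError("reads per cell must be positive for every library")
    target = min(reads_per_cell.values())
    return {lib: target / depth for lib, depth in reads_per_cell.items()}


# Hypothetical reads-per-cell values for the three example libraries.
depths = {"LV123": 50000, "LB456": 40000, "LP789": 80000}
fractions = subsample_fractions(depths)
for lib, frac in fractions.items():
    print(lib, frac)
```

Here the shallowest library (LB456) keeps all of its reads, while the deeper libraries keep proportionally fewer.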

There are three normalization modes: