
# Aggregating Multiple GEM Groups with cellranger-dna aggr

When doing large studies involving multiple GEM wells, run cellranger-dna cnv on FASTQ data from each of the GEM wells individually, and then pool the results using cellranger-dna aggr, as described here.

The cellranger-dna aggr command takes a CSV file specifying a list of cellranger-dna cnv output files (specifically the cnv_data.h5 from each run), and produces a single cnv_data.h5 containing the aggregated data. Please note that only cnv_data.h5 files from cellranger-dna version 1.1 can be aggregated.

When combining data from multiple GEM wells, barcodes from different GEM wells are distinguished by a GEM well suffix appended to the barcode sequence (see Understanding GEM Wells).

## Requirements

The first step is to run cellranger-dna cnv on each individual GEM well prepared using the 10x Chromium™ platform, as described in Copy Number Variation Analysis.

For example, suppose you ran three cnv pipelines as follows:

$ cd /home/jdoe/runs
$ cellranger-dna cnv --id=normal ...
... wait for pipeline to finish ...
$ cellranger-dna cnv --id=tumor_primary ...
... wait for pipeline to finish ...
$ cellranger-dna cnv --id=tumor_metastases ...
... wait for pipeline to finish ...

These three runs can now be aggregated into a single analysis. In order to do so, you must create an Aggregation CSV.

## Creating An Aggregation CSV

Create a CSV file with a header line containing the following columns:

• library_id: Unique identifier for this input GEM well. This will be used for labeling purposes only; it doesn't need to match any previous ID you've assigned to the GEM well.
• cnv_data: Path to the cnv_data.h5 file produced by cellranger-dna cnv. For example, if you processed your GEM well by calling cellranger-dna cnv --id=ID in some directory /DIR, the cnv_data.h5 would be /DIR/ID/outs/cnv_data.h5.

You can either make the CSV file in a text editor or create it in Excel and export to CSV. Continuing the example from the previous section, your Excel spreadsheet would look like this:

|   | A | B |
|---|---|---|
| 1 | library_id | cnv_data |
| 2 | normal | /home/jdoe/runs/normal/outs/cnv_data.h5 |
| 3 | tumor_primary | /home/jdoe/runs/tumor_primary/outs/cnv_data.h5 |
| 4 | tumor_metastases | /home/jdoe/runs/tumor_metastases/outs/cnv_data.h5 |

When you save it as a CSV, the result would look like this:

library_id,cnv_data
normal,/home/jdoe/runs/normal/outs/cnv_data.h5
tumor_primary,/home/jdoe/runs/tumor_primary/outs/cnv_data.h5
tumor_metastases,/home/jdoe/runs/tumor_metastases/outs/cnv_data.h5
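If you have many runs, the same file can be generated with a short script instead of a spreadsheet. The following is a minimal Python sketch, not part of Cell Ranger DNA: the function names `aggregation_rows` and `write_aggregation_csv` are hypothetical, and it assumes every run lives under one directory and that you are happy to reuse each run's `--id` as its library_id (which the pipeline does not require).

```python
import csv
import io
from pathlib import PurePosixPath


def aggregation_rows(runs_dir, run_ids):
    """Return (library_id, cnv_data) rows, one per cellranger-dna cnv run.

    Assumes each run was started in runs_dir with --id=<run_id>, so its
    output is at <runs_dir>/<run_id>/outs/cnv_data.h5.
    """
    if len(set(run_ids)) != len(run_ids):
        raise ValueError("library_id values must be unique")
    base = PurePosixPath(runs_dir)
    return [(run_id, str(base / run_id / "outs" / "cnv_data.h5")) for run_id in run_ids]


def write_aggregation_csv(handle, runs_dir, run_ids):
    """Write the header line and one row per run to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["library_id", "cnv_data"])
    writer.writerows(aggregation_rows(runs_dir, run_ids))


# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_aggregation_csv(
    buffer, "/home/jdoe/runs", ["normal", "tumor_primary", "tumor_metastases"]
)
print(buffer.getvalue(), end="")
```

The order of `run_ids` matters, because the rows are written in that order and the GEM well numbering follows the row order of the Aggregation CSV.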


## Command Line Interface

These are the most common command line arguments (run cellranger-dna aggr --help for a full list):

| Argument | Description |
|---|---|
| --id=ID | A unique run ID string, e.g. AGG123. |
| --csv=CSV | Path to a CSV file listing cellranger-dna cnv outputs (see Creating An Aggregation CSV). |
| --reference=PATH | Path to a Cell Ranger DNA reference. |
| --description=TEXT | (optional) More detailed sample description. |
| --soft-min-avg-ploidy=FLOAT | (optional) Use a known lower limit on the average ploidy of the sample. |
| --soft-max-avg-ploidy=FLOAT | (optional) Use a known upper limit on the average ploidy of the sample. |

After specifying these input arguments, run cellranger-dna aggr:

$ cd /home/jdoe/runs
$ cellranger-dna aggr --id=AGG123 \
                      --csv=AGG123_libraries.csv \
                      --reference=/home/jdoe/refs/GRCh37


The pipeline will begin to run, creating a new folder named with the aggregation ID you specified (e.g. /home/jdoe/runs/AGG123) for its output. If this folder already exists, cellranger-dna aggr will assume it is an existing pipestance and attempt to resume running it.
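When launching aggregations from a script, it can help to check for an existing output folder first, so that a resume is never triggered by accident. The sketch below is illustrative only: `build_aggr_command` and `will_resume` are hypothetical helper names, and the command uses just the three arguments shown in the example above. It prints the command rather than running it.

```python
import os
import shlex


def build_aggr_command(run_id, csv_path, reference):
    """Assemble the cellranger-dna aggr argument list from the required inputs."""
    return [
        "cellranger-dna",
        "aggr",
        f"--id={run_id}",
        f"--csv={csv_path}",
        f"--reference={reference}",
    ]


def will_resume(runs_dir, run_id):
    """Return True if <runs_dir>/<run_id> already exists as a folder.

    In that case the pipeline would treat it as an existing pipestance
    and try to resume it instead of starting a fresh run.
    """
    return os.path.isdir(os.path.join(runs_dir, run_id))


command = build_aggr_command(
    "AGG123", "AGG123_libraries.csv", "/home/jdoe/refs/GRCh37"
)
print(shlex.join(command))
```

To actually start the run, the list could be passed to `subprocess.run(command, cwd=runs_dir, check=True)` once `will_resume` has returned the answer you expect.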

## Pipeline Outputs

The cellranger-dna aggr pipeline generates output files that contain all of the data from the individual input runs for convenient multi-sample analysis. The GEM well suffix of each barcode is updated to prevent barcode collisions, as described below.

Each output file produced by cellranger-dna aggr follows the format described in the Understanding Output section of the documentation, but includes the union of all the relevant barcodes from each input run.

A successful run should conclude with a message similar to this:

2019-05-06 20:35:47 [runtime] (run:local)       ID.AGGR123.CNV_AGGREGATOR_CS.DLOUPE_PREPROCESS.fork0.join
2019-05-06 20:35:48 [runtime] (chunks_complete) ID.AGGR123.CNV_AGGREGATOR_CS._POSTPROCESSING.MAKE_WEBSUMMARY
2019-05-06 20:35:54 [runtime] (join_complete)   ID.AGGR123.CNV_AGGREGATOR_CS.DLOUPE_PREPROCESS

Outputs:
- Aggregation specification:    /home/jdoe/runs/AGGR123/outs/aggregate.csv
- HDF5 file with CNV data:      /home/jdoe/runs/AGGR123/outs/cnv_data.h5
- Loupe visualization file:     /home/jdoe/runs/AGGR123/outs/dloupe.dloupe
- CNV calls with imputation:    /home/jdoe/runs/AGGR123/outs/node_cnv_calls.bed
- CNV calls without imputation: /home/jdoe/runs/AGGR123/outs/node_unmerged_cnv_calls.bed
- Per-cell summary metrics:     /home/jdoe/runs/AGGR123/outs/per_cell_summary_metrics.csv
- Analysis summary metrics:     /home/jdoe/runs/AGGR123/outs/summary.csv
- Run summary HTML:             /home/jdoe/runs/AGGR123/outs/web_summary.html

Pipestance completed successfully!


Once cellranger-dna aggr has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .dloupe file in Loupe scDNA Browser, or refer to the Understanding Output section to explore the data by hand. For machine-readable versions of the summary metrics, refer to the CSV page of the Understanding Output section.

## Understanding GEM Wells

Each GEM well is a physically distinct set of GEM partitions, but draws barcode sequences randomly from the pool of valid barcodes, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer (called a GEM well suffix) identifying the GEM well to the barcode nucleotide sequence, and use the nucleotide sequence plus suffix as the unique identifier. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different GEM wells, despite having the same barcode nucleotide sequence. The numbering of the GEM wells reflects the order in which the GEM wells were listed in the Aggregation CSV.
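For downstream analysis it is often useful to map an aggregated barcode back to the library it came from. The following Python sketch shows one way to do that. The helper names `split_barcode` and `library_for_barcode` are hypothetical, and the code assumes that suffix 1 corresponds to the first data row of the Aggregation CSV, suffix 2 to the second, and so on, which is consistent with the example barcodes above but should be verified against your own output.

```python
def split_barcode(barcode):
    """Split 'AGACCATTGAGACTTA-1' into ('AGACCATTGAGACTTA', 1)."""
    sequence, _, suffix = barcode.rpartition("-")
    if not sequence or not suffix.isdigit():
        raise ValueError(f"barcode has no GEM well suffix: {barcode!r}")
    return sequence, int(suffix)


def library_for_barcode(barcode, library_ids):
    """Look up the library_id for a barcode.

    library_ids must be in Aggregation CSV row order. Assumption: GEM well
    suffixes are 1-based and follow that order.
    """
    _, gem_well = split_barcode(barcode)
    if not 1 <= gem_well <= len(library_ids):
        raise ValueError(f"GEM well {gem_well} is not in the Aggregation CSV")
    return library_ids[gem_well - 1]


# library_id column from the example Aggregation CSV, in row order.
library_ids = ["normal", "tumor_primary", "tumor_metastases"]
for barcode in ("AGACCATTGAGACTTA-1", "AGACCATTGAGACTTA-2"):
    print(barcode, "->", library_for_barcode(barcode, library_ids))
```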
