
# Aggregating Multiple Capture Areas with spaceranger aggr

For large studies that use multiple capture areas with consecutive sections of the same tissue block, run spaceranger count individually on the FASTQ data from each capture area, and then pool the results using spaceranger aggr, as described here.

The spaceranger aggr command takes a CSV file specifying a list of spaceranger count output files (specifically the molecule_info.h5 from each run), and produces a single feature-barcode matrix containing all the data.

When combining multiple capture areas, barcodes from different capture areas are distinguished by a capture area suffix appended to the barcode sequence (see Understanding Capture Areas).

By default, the reads from each capture area are subsampled such that all capture areas have the same effective sequencing depth, measured in terms of reads that are confidently mapped to the transcriptome or assigned to the feature IDs per spot. However, it is possible to change the depth normalization mode (see Depth Normalization).

## Requirements

The first step is to run spaceranger count on each individual capture area prepared using the 10x Visium™ platform, as described in Single-capture area Analysis.

Targeted Spatial Gene Expression data is supported by spaceranger aggr, provided that the same target panel CSV file is used for all targeted libraries. Targeted libraries can also be aggregated with whole transcriptome Spatial Gene Expression libraries.

Data from Visium for FFPE sections is supported by spaceranger aggr, provided that the same probe set reference CSV file is used for all samples. Aggregating FFPE data with Visium data from fresh frozen sections (including Targeted Spatial Gene Expression data) is not supported.
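The compatibility rules above can be expressed as a small pre-flight check before you build the Aggregation CSV. This is a sketch under stated assumptions: the CountRun record and its fields (ffpe, target_panel, probe_set) are hypothetical and are not part of Space Ranger, which performs its own validation; you would fill them in from your own sample sheet.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CountRun:
    """Hypothetical summary of one spaceranger count run (not a Space Ranger API)."""
    library_id: str
    ffpe: bool                           # Visium for FFPE sample?
    target_panel: Optional[str] = None   # target panel CSV, if targeted
    probe_set: Optional[str] = None      # probe set reference CSV, if FFPE


def aggr_compatibility_errors(runs: list[CountRun]) -> list[str]:
    """Return reasons why these runs could not be aggregated (empty list if none)."""
    errors: list[str] = []
    ffpe = [r for r in runs if r.ffpe]
    fresh_frozen = [r for r in runs if not r.ffpe]

    # FFPE data cannot be combined with fresh frozen data (targeted or not).
    if ffpe and fresh_frozen:
        errors.append("cannot mix FFPE and fresh frozen libraries")

    # All FFPE samples must share one probe set reference CSV.
    if len({r.probe_set for r in ffpe}) > 1:
        errors.append("FFPE libraries use different probe set CSVs")

    # All targeted libraries must share one target panel CSV; whole
    # transcriptome libraries (target_panel=None) may be mixed in freely.
    panels = {r.target_panel for r in fresh_frozen if r.target_panel is not None}
    if len(panels) > 1:
        errors.append("targeted libraries use different target panel CSVs")
    return errors
```

For example, one targeted library plus one whole transcriptome library passes the check, while mixing an FFPE run with a fresh frozen run reports a single error.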

For example, suppose you ran three count pipelines as follows:

$ cd /opt/runs
$ spaceranger count --id=LV123 ...
... wait for pipeline to finish ...
$ spaceranger count --id=LB456 ...
... wait for pipeline to finish ...
$ spaceranger count --id=LP789 ...
... wait for pipeline to finish ...

Now you can aggregate these three runs to get a single feature-barcode matrix and analysis. In order to do so, you need to create an Aggregation CSV.

## Setting Up an Aggregation CSV

Create a CSV file with a header line containing the following columns:

• library_id: Unique identifier for this input capture area. This will be used for labeling purposes only; it doesn't need to match any previous ID you've assigned to the capture area.
• molecule_h5: Path to the molecule_info.h5 file produced by spaceranger count. For example, if you processed your capture area by calling spaceranger count --id=ID in some directory /DIR, this path would be /DIR/ID/outs/molecule_info.h5.
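• cloupe_file: Path to the cloupe.cloupe file produced by spaceranger count. For example, if you processed your capture area by calling spaceranger count --id=ID in some directory /DIR, this path would be /DIR/ID/outs/cloupe.cloupe.
• spatial_folder: Path to the spatial folder produced by spaceranger count. For example, if you processed your capture area by calling spaceranger count --id=ID in some directory /DIR, this path would be /DIR/ID/outs/spatial.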
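The path convention above can be captured in a small helper that builds one row of the Aggregation CSV from the directory and ID of a count run. The function name aggr_row is hypothetical; the spatial_folder path (ID/outs/spatial) follows the example spreadsheet in this section.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def aggr_row(run_dir: str, run_id: str, library_id: Optional[str] = None) -> Dict[str, str]:
    """Build one Aggregation CSV row for a run started as
    `spaceranger count --id=<run_id>` inside <run_dir>.

    library_id defaults to run_id, but it is only a label and may differ.
    """
    outs = PurePosixPath(run_dir) / run_id / "outs"
    return {
        "library_id": library_id or run_id,
        "molecule_h5": str(outs / "molecule_info.h5"),
        "cloupe_file": str(outs / "cloupe.cloupe"),
        "spatial_folder": str(outs / "spatial"),
    }
```

For instance, aggr_row("/opt/runs", "LV123") yields the paths shown in the first data row of the example below.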

You can either make the CSV file in a text editor, or create it in Excel and export to CSV. Continuing the example from the previous section, your Excel spreadsheet would look like this:

|   | A | B | C | D (optional) |
|---|---|---|---|---|
| 1 | library_id | molecule_h5 | cloupe_file | spatial_folder |
| 2 | LV123 | /opt/runs/LV123/outs/molecule_info.h5 | /opt/runs/LV123/outs/cloupe.cloupe | /opt/runs/LV123/outs/spatial |
| 3 | LB456 | /opt/runs/LB456/outs/molecule_info.h5 | /opt/runs/LB456/outs/cloupe.cloupe | /opt/runs/LB456/outs/spatial |
| 4 | LP789 | /opt/runs/LP789/outs/molecule_info.h5 | /opt/runs/LP789/outs/cloupe.cloupe | /opt/runs/LP789/outs/spatial |

When you save it as a CSV, the result would look like this:

library_id,molecule_h5,cloupe_file,spatial_folder
LV123,/opt/runs/LV123/outs/molecule_info.h5,/opt/runs/LV123/outs/cloupe.cloupe,/opt/runs/LV123/outs/spatial
LB456,/opt/runs/LB456/outs/molecule_info.h5,/opt/runs/LB456/outs/cloupe.cloupe,/opt/runs/LB456/outs/spatial
LP789,/opt/runs/LP789/outs/molecule_info.h5,/opt/runs/LP789/outs/cloupe.cloupe,/opt/runs/LP789/outs/spatial
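If you generate the Aggregation CSV from a script instead of Excel, a sketch using Python's standard csv module might look like this. The function write_aggr_csv is a hypothetical name; it also rejects duplicate library_id values, since each library_id must be unique.

```python
import csv
import io
from typing import Dict, List, TextIO

FIELDS = ["library_id", "molecule_h5", "cloupe_file", "spatial_folder"]


def write_aggr_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write Aggregation CSV rows (dicts keyed by FIELDS) to a text stream."""
    ids = [row["library_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("library_id values must be unique")
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Rows matching the three example runs above.
rows = [
    {
        "library_id": rid,
        "molecule_h5": f"/opt/runs/{rid}/outs/molecule_info.h5",
        "cloupe_file": f"/opt/runs/{rid}/outs/cloupe.cloupe",
        "spatial_folder": f"/opt/runs/{rid}/outs/spatial",
    }
    for rid in ["LV123", "LB456", "LP789"]
]
buf = io.StringIO()
write_aggr_csv(rows, buf)
print(buf.getvalue(), end="")  # same four lines as the CSV shown above
```

To write a real file, open it with newline="" and encoding="utf-8" (UTF-8 keeps any non-ASCII metadata columns intact) and pass that file object instead of the StringIO buffer.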


In addition to the CSV columns expected by spaceranger aggr, you may optionally supply additional columns containing library metadata (e.g., lab or sample origin). These custom library annotations do not affect the analysis pipeline but can be visualized downstream in Loupe Browser (see below). Note that, unlike other CSV inputs to Space Ranger, these custom columns may contain characters outside the ASCII range (e.g., non-Latin characters).

## Creating Categories

When combining multiple samples into a single dataset with the spaceranger aggr pipeline, you can assign categories and values to individual samples by adding columns to the spaceranger aggr input spreadsheet. These category assignments propagate into Loupe Browser, where you can view them and determine genes that drive differences between samples. For example, the following spreadsheet was used to aggregate the tutorial dataset:

|   | A | B | C | D |
|---|---|---|---|---|
| 1 | library_id | molecule_h5 | cloupe_file | AMLStatus |
| 2 | AMLNormal1 | /path/to/AMLNormal1/molecule_info.h5 | /opt/runs/LV123/outs/cloupe.cloupe | Normal |
| 3 | AMLNormal2 | /path/to/AMLNormal2/molecule_info.h5 | /opt/runs/LB456/outs/cloupe.cloupe | Normal |
| 4 | AMLPatient | /path/to/AMLPatient/molecule_info.h5 | /opt/runs/LP789/outs/cloupe.cloupe | Patient |

Any columns in addition to 'library_id', 'molecule_h5', 'cloupe_file' and 'spatial_folder' will be converted into categories, and the spots in each sample will be assigned to one of the values in that category.
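The rule above (every column other than the four reserved ones becomes a category) can be sketched as follows. library_categories is a hypothetical helper, not a Space Ranger function; it returns, for each category column, the value assigned to each library, and every spot in a library then inherits that library's value.

```python
import csv
import io
from typing import Dict

# Columns that spaceranger aggr interprets itself, per the text above.
RESERVED = {"library_id", "molecule_h5", "cloupe_file", "spatial_folder"}


def library_categories(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Map each extra (category) column to {library_id: value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    categories = [c for c in (reader.fieldnames or []) if c not in RESERVED]
    result: Dict[str, Dict[str, str]] = {c: {} for c in categories}
    for row in reader:
        for c in categories:
            result[c][row["library_id"]] = row[c]
    return result


example = """library_id,molecule_h5,cloupe_file,AMLStatus
AMLNormal1,/path/to/AMLNormal1/molecule_info.h5,/opt/runs/LV123/outs/cloupe.cloupe,Normal
AMLNormal2,/path/to/AMLNormal2/molecule_info.h5,/opt/runs/LB456/outs/cloupe.cloupe,Normal
AMLPatient,/path/to/AMLPatient/molecule_info.h5,/opt/runs/LP789/outs/cloupe.cloupe,Patient
"""
print(library_categories(example))
```

When reading a CSV file from disk instead of a string, open it with encoding="utf-8" so that non-ASCII category values are preserved.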

spaceranger aggr does not perform batch correction for removal of technical artifacts due to differences in assays. For this reason, 10x does not recommend combining Visium data from fundamentally different treatments such as immunofluorescence and H&E stained tissue sections. spaceranger aggr can be used with samples from different biological conditions of the same tissue.

## Aggregating Targeted Spatial Gene Expression Data

The spaceranger aggr command can aggregate results that include Targeted Spatial Gene Expression analyses, provided that the requirements above are met. Secondary analysis for all libraries is done with the non-targeted genes excluded from the feature-barcode matrices. Aggregated feature-barcode matrices follow the same convention as Targeted Spatial Gene Expression analysis: the filtered feature-barcode matrices do not include non-targeted genes, whereas the raw feature-barcode matrices include all genes.
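The filtered-versus-raw convention can be illustrated with a toy matrix. The gene IDs, counts, and the filtered_matrix helper are all hypothetical, and a dict of dense rows stands in for a real feature-barcode matrix.

```python
from typing import Dict, List, Set

# Hypothetical raw matrix: rows are genes, columns are spot barcodes.
raw_matrix: Dict[str, List[int]] = {
    "GENE_A": [3, 0, 5],
    "GENE_B": [1, 4, 0],  # not in the target panel
    "GENE_C": [0, 2, 2],
}
target_panel: Set[str] = {"GENE_A", "GENE_C"}


def filtered_matrix(raw: Dict[str, List[int]], targeted: Set[str]) -> Dict[str, List[int]]:
    """Drop non-targeted genes (rows); spot barcodes (columns) are unchanged."""
    return {gene: counts for gene, counts in raw.items() if gene in targeted}


filtered = filtered_matrix(raw_matrix, target_panel)
print(sorted(filtered))  # genes kept in the filtered matrix
```

The raw matrix keeps all three genes, while the filtered matrix (and any secondary analysis run on it) only sees the targeted ones.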

## Command Line Interface

These are the most common command line arguments (run spaceranger aggr --help for a full list):

| Argument | Description |
|---|---|
| --id=ID | A unique run ID string, e.g. AGG123. |
| --csv=CSV | Path of a CSV file containing a list of spaceranger count outputs (see Setting Up an Aggregation CSV). |
| --normalize=MODE | (Optional) String specifying how to normalize depth across the input libraries. Valid values: mapped (default) or none (see Depth Normalization). |
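When launching aggregations from a script, the arguments in the table can be assembled into an argument list. This sketch uses only the three flags documented above; aggr_command is a hypothetical helper, and it checks --normalize against the two documented modes before anything is run.

```python
import shlex
from typing import List

NORMALIZE_MODES = ("mapped", "none")


def aggr_command(run_id: str, csv_path: str, normalize: str = "mapped") -> List[str]:
    """Assemble a spaceranger aggr argument list from the documented flags."""
    if normalize not in NORMALIZE_MODES:
        raise ValueError(f"--normalize must be one of {NORMALIZE_MODES}, got {normalize!r}")
    return [
        "spaceranger", "aggr",
        f"--id={run_id}",
        f"--csv={csv_path}",
        f"--normalize={normalize}",
    ]


cmd = aggr_command("AGG123", "AGG123_libraries.csv")
print(shlex.join(cmd))
# To launch it, one option is: subprocess.run(cmd, check=True, cwd="/home/jdoe/runs")
```

Passing a list rather than a single shell string avoids quoting problems if paths contain spaces.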

After specifying these input arguments, run spaceranger aggr:

$ cd /home/jdoe/runs
$ spaceranger aggr --id=AGG123 \
                   --csv=AGG123_libraries.csv \
                   --normalize=mapped


The pipeline will begin to run, creating a new folder named with the aggregation ID you specified (e.g. /home/jdoe/runs/AGG123) for its output. If this folder already exists, spaceranger will assume it is an existing pipestance and attempt to resume running it.

## Pipeline Outputs

The spaceranger aggr pipeline generates output files that contain all of the data from the individual input jobs, aggregated into single output files, for convenient multi-sample analysis. The capture area suffix of each barcode is updated to prevent barcode collisions, as described below.

Each output file produced by spaceranger aggr follows the format described in the Understanding Output section of the documentation, but includes the union of all the relevant barcodes from each of the input jobs.

A successful run should conclude with a message similar to this:

2018-10-04 13:36:33 [runtime] (run:local)       ID.AGG123.SPATIAL_RNA_AGGREGATOR_CS.SPATIAL_RNA_AGGREGATOR.SUMMARIZE_AGGREGATED_REPORTS.fork0.join
2018-10-04 13:36:36 [runtime] (join_complete)   ID.AGG123.SPATIAL_RNA_AGGREGATOR_CS.SPATIAL_RNA_AGGREGATOR.SUMMARIZE_AGGREGATED_REPORTS
2018-10-04 13:36:45 [runtime] VDR killed 210 files, 29MB.

Outputs:
- Aggregation metrics summary HTML:                           /home/jdoe/runs/AGG123/outs/web_summary.html
- Aggregation metrics summary JSON:                           /home/jdoe/runs/AGG123/outs/summary.json
- Secondary analysis output CSV:                              /home/jdoe/runs/AGG123/outs/analysis
- Filtered feature-barcode matrices MEX:                      /home/jdoe/runs/AGG123/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5:                     /home/jdoe/runs/AGG123/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX:                    /home/jdoe/runs/AGG123/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5:                   /home/jdoe/runs/AGG123/outs/raw_feature_bc_matrix.h5
- Copy of the input aggregation CSV:                          /home/jdoe/runs/AGG123/outs/aggregation.csv
- Loupe Browser file:                                         /home/jdoe/runs/AGG123/outs/cloupe.cloupe
- Aggregated tissue positions list:                           /home/jdoe/runs/AGG123/outs/aggr_tissue_positions_list.csv
- Spatial folder containing spatial images and scalefactors:  /home/jdoe/runs/AGG123/outs/spatial

Pipestance completed successfully!


Once spaceranger aggr has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .cloupe file in Loupe Browser, or refer to the Understanding Output section to explore the data by hand. For machine-readable versions of the summary metrics, refer to the spaceranger aggr section of the Summary Metrics page.

## Understanding Capture Areas

Each capture area is a physically distinct partition on a Visium slide. However, all capture areas are printed with the same set of barcode-tagged mRNA capture sequences, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer identifying the capture area to the barcode nucleotide sequence, and use the sequence plus this integer as the unique identifier in the feature-barcode matrix. For example, AAACAACGAATAGTTC-1 and AAACAACGAATAGTTC-2 are distinct spot barcodes from different capture areas, despite having the same barcode nucleotide sequence.

This number, called the capture area suffix, tells us which capture area the barcode sequence came from. Capture areas are numbered in the order in which they are listed in the Aggregation CSV.
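The suffix scheme can be sketched as follows, assuming each input barcode already carries a single-capture-area suffix such as -1, which is replaced by the library's 1-based position in the Aggregation CSV. aggregate_barcodes and the second barcode sequence used in the example are hypothetical.

```python
from typing import List


def aggregate_barcodes(libraries: List[List[str]]) -> List[str]:
    """Relabel barcodes so that the i-th library in CSV order (1-based)
    gets suffix -i, replacing any suffix its barcodes already carry."""
    out: List[str] = []
    for suffix, barcodes in enumerate(libraries, start=1):
        for bc in barcodes:
            sequence = bc.split("-", 1)[0]  # drop an existing "-N" suffix, if any
            out.append(f"{sequence}-{suffix}")
    return out


merged = aggregate_barcodes([
    ["AAACAACGAATAGTTC-1", "AAACAAGTATCTCCCA-1"],  # first row of the CSV
    ["AAACAACGAATAGTTC-1"],                        # second row of the CSV
])
print(merged)
```

The shared sequence AAACAACGAATAGTTC appears as -1 and -2 in the merged list, so the two spots no longer collide.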

## Depth Normalization

When combining data from multiple capture areas, the spaceranger aggr pipeline automatically equalizes the read depth between capture areas before merging. This is the recommended approach, because it avoids a batch effect introduced by differences in sequencing depth. You can turn normalization off or change how it is done with the --normalize argument. The none option may be appropriate if you want to maximize sensitivity and plan to handle depth normalization in a downstream step.

There are two normalization modes:

• mapped (default): Subsample reads from higher-depth capture areas until they all have, on average, an equal number of reads per tissue-covered spot that are confidently mapped to the transcriptome. If Targeted Spatial Gene Expression libraries are included, normalization is based on the mean reads per spot mapped confidently to the targeted transcriptome. The subsampling rates for Targeted Spatial Gene Expression libraries are multiplied by 2, provided all libraries can achieve that depth. This multiple is consistent with sequencing depth recommendations, and it also avoids removing large fractions of reads from targeted libraries when they are combined with whole transcriptome libraries.
• none: Do not normalize at all.
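The mapped mode can be sketched as a per-library subsampling rate: the fraction of reads kept so that every capture area ends up at the lowest mean depth. The function, its input format, and the read and spot counts in the example are hypothetical, and the 2x adjustment for Targeted Spatial Gene Expression libraries is deliberately not modeled.

```python
from typing import Dict, Tuple


def subsampling_rates(
    libraries: Dict[str, Tuple[int, int]], mode: str = "mapped"
) -> Dict[str, float]:
    """Fraction of reads to keep per library.

    libraries maps library_id -> (confidently mapped reads, tissue-covered spots).
    mapped: scale every library down to the lowest mean mapped reads per spot.
    none:   keep all reads.
    The 2x rule for targeted libraries is NOT modeled in this sketch.
    """
    if mode == "none":
        return {lib: 1.0 for lib in libraries}
    if mode != "mapped":
        raise ValueError(f"unknown normalization mode {mode!r}")
    depth = {lib: reads / spots for lib, (reads, spots) in libraries.items()}
    if any(d <= 0 for d in depth.values()):
        raise ValueError("every library needs a positive mapped depth")
    target = min(depth.values())  # the shallowest library sets the target
    return {lib: target / d for lib, d in depth.items()}


rates = subsampling_rates({
    "LV123": (100_000_000, 2000),  # 50,000 mapped reads per spot
    "LB456": (50_000_000, 2000),   # 25,000 mapped reads per spot
    "LP789": (120_000_000, 3000),  # 40,000 mapped reads per spot
})
print(rates)
```

Here LB456 is the shallowest library, so it keeps every read, while LV123 keeps half of its reads and LP789 keeps 62.5%; with mode="none", every rate is 1.0.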