
Cell Ranger


10x Genomics
Chromium Single Cell Gene Expression

Running cellranger aggr

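In this tutorial, you will learn how to create an aggregation CSV file, run cellranger aggr to combine two cellranger count runs, and explore the aggregated output.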

The cellranger aggr pipeline is optional. It aggregates, or combines, the outputs of two or more cellranger count runs. In experiments with multiple samples and multiple 10x Chromium GEM wells, each library must be processed in a separate run of cellranger count.

To compare samples with each other, for example in a differential expression analysis, use cellranger aggr to combine the outputs of each cellranger count run into a single feature-barcode matrix and a .cloupe file for visualization in Loupe Browser.

Get data

This tutorial uses two publicly available molecule_info.h5 files, from the pbmc_1k_v3 and pbmc_10k_v3 datasets.

Start by making a directory to run the aggr pipeline in:

mkdir run_cellranger_aggr
cd run_cellranger_aggr

Next, download the data files.

wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_molecule_info.h5
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_molecule_info.h5

Each file is less than 1 GB and usually takes less than a minute to download.

Create aggregation CSV

The next step is to build the aggregation CSV file (CSV stands for comma-separated values). For detailed instructions on creating this file, see the cellranger aggr page.

The CSV file has two columns. The first line is a header naming the columns, and each following line describes one sample. The first column holds the sample ID, which can be any name you choose; pick descriptive IDs, because they are used later in the analysis. The second column holds the path to the molecule_info.h5 output file from each cellranger count run.
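If you prefer not to use an interactive editor, the same kind of file can be written directly from the shell. The sketch below is a minimal example, assuming it is run from run_cellranger_aggr after both downloads finish. The header names sample_id and molecule_h5 are an assumption based on recent Cell Ranger versions (check the cellranger aggr page for the names your version expects), and the sample IDs pbmc_1k and pbmc_10k are hypothetical; replace them with your own descriptive IDs.

```shell
#!/bin/sh
# Write pbmc_aggr.csv without an interactive editor.
# Assumptions: header names sample_id/molecule_h5 (recent Cell Ranger versions),
# hypothetical sample IDs pbmc_1k and pbmc_10k, and the current directory is
# run_cellranger_aggr, where both molecule_info.h5 files were downloaded.
dir=$(pwd)

# Unquoted EOF: ${dir} is expanded, so the CSV gets absolute paths.
cat > pbmc_aggr.csv <<EOF
sample_id,molecule_h5
pbmc_1k,${dir}/pbmc_1k_v3_molecule_info.h5
pbmc_10k,${dir}/pbmc_10k_v3_molecule_info.h5
EOF

# Show the result.
cat pbmc_aggr.csv
```

Because the here-document delimiter is unquoted, ${dir} is expanded when the file is written, so the CSV contains the actual absolute paths instead of placeholder text. This replaces the manual copy-and-paste of the pwd output.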

From the directory where the HDF5 files were downloaded, use the pwd command to print its full path:

pwd

The output is the absolute path of the run_cellranger_aggr directory on your system.

Copy this path; you need it to fill in the CSV file. You can create the file with any text editor; this example uses nano.

nano pbmc_aggr.csv

This opens the nano text editor.


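Enter the CSV contents in the editor: a header line, followed by one line per sample giving the sample ID and the full path to that sample's molecule_info.h5 file. Build each path from the pwd output so that it matches the location of the file on your system.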

Exit nano by pressing Ctrl+X. Nano asks whether to save your changes; press Y for "Yes":

Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES) ?
 Y Yes
 N No           ^C Cancel

Nano then asks you:

File Name to Write: pbmc_aggr.csv

Press Enter to keep this filename and save the file. Nano exits and returns you to the command prompt, with pbmc_aggr.csv saved as a Linux-formatted text file.
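Before running the pipeline, it can help to check the CSV. The check_aggr_csv function below is a hypothetical sketch, not part of Cell Ranger. It assumes a header row, the sample ID in the first column, the molecule_info.h5 path in the second, and no quoted fields or embedded commas. It flags Windows-style (CRLF) line endings, which would leave a stray carriage return at the end of every path, and any path that does not point to a non-empty file.

```shell
#!/bin/sh
# check_aggr_csv: hypothetical helper (not part of Cell Ranger) that
# sanity-checks an aggregation CSV before running cellranger aggr.
# Assumes: a header row, sample ID in column 1, molecule_info.h5 path in
# column 2, and no quoted fields or commas inside values.
check_aggr_csv() {
    csv=$1
    errors=0

    if [ ! -f "$csv" ]; then
        echo "ERROR: $csv not found" >&2
        return 1
    fi

    # CRLF (Windows) line endings would leave a carriage return at the end
    # of every path, so flag them explicitly.
    if grep -q "$(printf '\r')" "$csv"; then
        echo "ERROR: $csv has Windows (CRLF) line endings" >&2
        errors=$((errors + 1))
    fi

    # Read the file in the current shell (not a pipeline subshell) so that
    # updates to $errors are kept. "|| [ -n ... ]" also handles a last line
    # that has no trailing newline.
    {
        read -r _header
        while IFS=, read -r sample path _rest || [ -n "$sample" ]; do
            if [ -z "$sample" ]; then
                continue    # skip blank lines
            fi
            if [ ! -s "$path" ]; then
                echo "ERROR: $sample: missing or empty file: $path" >&2
                errors=$((errors + 1))
            fi
        done
    } < "$csv"

    if [ "$errors" -eq 0 ]; then
        echo "OK: $csv"
        return 0
    fi
    return 1
}
```

After pasting the function into your shell, run check_aggr_csv pbmc_aggr.csv. It prints OK on success and returns a non-zero status, with one error line per problem, otherwise.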

Set up the command for cellranger aggr

Run cellranger aggr with the --help option to print the usage statement and view the input requirements.

cellranger aggr --help

This command prints output that includes the following (abbreviated):

Aggregate data from multiple Cell Ranger runs
    cellranger aggr [FLAGS] [OPTIONS] --id <ID> --csv <CSV>
        --nosecondary    Disable secondary analysis, e.g. clustering
        --dry            Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
        --disable-ui     Do not serve the web UI
        --noexit         Keep web UI running after pipestance completes or fails
        --nopreflight    Skip preflight checks
    -h, --help           Prints help information
        --id <ID>        A unique run id and output folder name [a-zA-Z0-9_-]+
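The help text limits --id to the characters [a-zA-Z0-9_-]. When run IDs are generated in a script, a quick check can catch names with spaces, dots, or other characters before the pipeline starts. The is_valid_run_id function below is a hypothetical sketch, not a Cell Ranger command, and simply mirrors that character set.

```shell
#!/bin/sh
# is_valid_run_id: hypothetical helper (not a Cell Ranger command) that
# checks a proposed --id value against [a-zA-Z0-9_-]+ from the help text.
is_valid_run_id() {
    case $1 in
        "" | *[!a-zA-Z0-9_-]*)
            return 1 ;;   # empty, or contains a character outside the set
        *)
            return 0 ;;
    esac
}

# Example: report whether each candidate ID is allowed.
for id in 1k_10k_pbmc_aggr "my run"; do
    if is_valid_run_id "$id"; then
        echo "valid: $id"
    else
        echo "invalid: $id"
    fi
done
```

The example loop prints valid: 1k_10k_pbmc_aggr and invalid: my run, because the second ID contains a space.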

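This pipeline has two required inputs: --id, a unique run ID that is also used as the output folder name, and --csv, the path to the aggregation CSV file.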

Run cellranger aggr

Next, build the command line and run it.

cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv

The output is similar to the following:

2021-10-28 19:59:07 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
2021-10-28 19:59:13 Shutting down.
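When cellranger aggr runs inside a script, you may want to check for the completion message automatically. If the console output is piped through tee to keep a log, the pipeline's exit status is tee's rather than cellranger's, so searching the saved log is one simple option. This is a minimal sketch; the helper aggr_succeeded and the log name aggr_run.log are hypothetical.

```shell
#!/bin/sh
# aggr_succeeded: hypothetical helper that returns success if a saved
# cellranger log contains the completion message shown above.
aggr_succeeded() {
    grep -qF 'Pipestance completed successfully!' "$1"
}

# Example usage (aggr_run.log is a hypothetical file name):
#   cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv 2>&1 | tee aggr_run.log
#   aggr_succeeded aggr_run.log && echo "Outputs are in 1k_10k_pbmc_aggr/outs/"
```

Single quotes around the message keep the exclamation mark from triggering history expansion if the function is typed into an interactive bash session.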

Explore the output of cellranger aggr

As with the other Cell Ranger pipelines, when you see “Pipestance completed successfully!” the job is done. The pipeline outputs are in the outs/ folder of the pipestance directory, which is named after the --id value (here, 1k_10k_pbmc_aggr). List the contents of this directory:

ls -1 1k_10k_pbmc_aggr/outs/

The ls -1 command lists only the top-level entries. The full directory structure is similar to the following:

├── aggregation.csv
├── count
│   ├── analysis
│   │   ├── clustering
│   │   ├── diffexp
│   │   ├── pca
│   │   ├── tsne
│   │   └── umap
│   ├── cloupe.cloupe
│   ├── filtered_feature_bc_matrix
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── filtered_feature_bc_matrix.h5
│   └── summary.json
└── web_summary.html
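To confirm that a run produced the files shown in this listing, the check can be scripted. The check_aggr_outs function below is a hypothetical sketch whose file list is copied from the listing; other Cell Ranger versions or options (for example --nosecondary, which disables secondary analysis) may produce a different set of files, so treat the list as an assumption to adjust.

```shell
#!/bin/sh
# check_aggr_outs: hypothetical helper that reports which expected files are
# missing from a cellranger aggr outs/ folder. The list below is taken from
# the example directory listing and may differ for other versions/options.
check_aggr_outs() {
    outs=$1
    missing=0
    for f in \
        aggregation.csv \
        web_summary.html \
        count/cloupe.cloupe \
        count/summary.json \
        count/filtered_feature_bc_matrix.h5 \
        count/filtered_feature_bc_matrix/barcodes.tsv.gz \
        count/filtered_feature_bc_matrix/features.tsv.gz \
        count/filtered_feature_bc_matrix/matrix.mtx.gz
    do
        if [ ! -e "$outs/$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]   # success only if nothing was missing
}

# Example (run from run_cellranger_aggr):
#   check_aggr_outs 1k_10k_pbmc_aggr/outs && echo "all expected outputs present"
```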

The outputs are similar to those from the cellranger count pipeline, except that cellranger aggr does not produce BAM files or molecule_info.h5 files. More information about outputs is available in the Understanding Outputs section.
