10x Genomics
Chromium Single Cell Gene Expression

Running cellranger aggr

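In this tutorial, you will learn how to get the example data, create the aggregation CSV, set up and run cellranger aggr, and explore its output.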

The cellranger aggr pipeline is optional. It aggregates, or combines, the outputs of two or more cellranger count runs. In experiments that involve multiple samples and multiple 10x Chromium GEM wells, each library must be processed in a separate run of cellranger count.

To compare samples with each other, for example in differential expression analysis, cellranger aggr combines the output files from each cellranger count run into a single feature-barcode matrix and a .cloupe file that can be visualized in Loupe Browser.

Get data

This tutorial uses two publicly available molecule_info.h5 files, one from the 1k PBMC (v3) dataset and one from the 10k PBMC (v3) dataset.

Start by making a directory to run the aggr pipeline in:

mkdir run_cellranger_aggr
cd run_cellranger_aggr

Next, download the data files.

wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_molecule_info.h5
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_molecule_info.h5

These are small files, less than 1 GB each, and usually take less than one minute to download.
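
Before continuing, it can help to confirm that both downloads are present and non-empty, since an interrupted transfer may leave a missing or zero-byte file. The following is a minimal sketch; check_h5_files is a hypothetical helper name, and the check covers only presence and size, not file integrity.

```shell
# Report any expected molecule_info.h5 file that is missing or empty.
# check_h5_files is a hypothetical helper name, not part of Cell Ranger.
check_h5_files() {
    missing=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            # -s is true only for files that exist and have size > 0
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Return non-zero if any file failed the check
    [ "$missing" -eq 0 ]
}

# Usage, from inside run_cellranger_aggr:
# check_h5_files pbmc_1k_v3_molecule_info.h5 pbmc_10k_v3_molecule_info.h5
```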

Create aggregation CSV

The next step is to build the CSV file. CSV stands for comma-separated values. For specific instructions on creating this CSV, see the cellranger aggr page.

The CSV file has two columns. The first column holds the sample ID, which can be any name you choose; pick descriptive IDs because they are used later in the analysis. The second column holds the path to each molecule_info.h5 file produced by the cellranger count pipeline.

From the same directory where the HDF5 files were downloaded, use the pwd command to print out the path:

pwd

The output is similar to the following:

path/to/run_cellranger_aggr

Copy the path to make the CSV file. Use the text editor of your choice to make this file. This example uses nano.

nano pbmc_aggr.csv

This opens the nano text editor.

sample_id,molecule_h5
1k_pbmcs,path/to/run_cellranger_aggr/pbmc_1k_v3_molecule_info.h5
10k_pbmcs,path/to/run_cellranger_aggr/pbmc_10k_v3_molecule_info.h5

Paste the text above into the editor. Edit the path/to/ part for each molecule_info.h5 file so it matches the path of the file on your system.

Exit the nano text editor by pressing Ctrl+X, and then press Y for "Yes" to save the file.

Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES) ?
 Y Yes
 N No           ^C Cancel

Nano then asks you:

File Name to Write: pbmc_aggr.csv

Press the Enter key to confirm this filename and save the file. You are now back at the command prompt.

The Linux-formatted CSV file is now saved, and you have exited the nano text editor.
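
As an alternative to editing in nano, the same two-column file could be written non-interactively. The sketch below assumes it is run from the directory that holds the two molecule_info.h5 files; write_aggr_csv is a hypothetical helper name, and it converts relative file names to absolute paths using the current directory.

```shell
# Write the two-column aggregation CSV using absolute paths.
# write_aggr_csv is a hypothetical helper; its arguments are the output CSV
# followed by one or more sample_id=file pairs.
write_aggr_csv() {
    out=$1
    shift
    # Header row used in this tutorial's CSV
    printf 'sample_id,molecule_h5\n' > "$out"
    for pair in "$@"; do
        id=${pair%%=*}      # text before the first '='
        file=${pair#*=}     # text after the first '='
        # Prefix relative paths with the current directory to make them absolute
        case $file in
            /*) ;;
            *) file="$(pwd)/$file" ;;
        esac
        printf '%s,%s\n' "$id" "$file" >> "$out"
    done
}

# Usage, from inside run_cellranger_aggr:
# write_aggr_csv pbmc_aggr.csv \
#     1k_pbmcs=pbmc_1k_v3_molecule_info.h5 \
#     10k_pbmcs=pbmc_10k_v3_molecule_info.h5
# cat pbmc_aggr.csv
```

Because printf writes plain newline characters, the resulting file has Linux line endings.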

Set up the command for cellranger aggr

Run cellranger aggr with the --help flag to print the usage statement and view the input requirements.

cellranger aggr --help

This command prints the following:

cellranger-aggr
Aggregate data from multiple Cell Ranger runs
 
USAGE:
    cellranger aggr [FLAGS] [OPTIONS] --id <ID> --csv <CSV>
 
FLAGS:
        --nosecondary    Disable secondary analysis, e.g. clustering
        --dry            Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
        --disable-ui     Do not serve the web UI
        --noexit         Keep web UI running after pipestance completes or fails
        --nopreflight    Skip preflight checks
    -h, --help           Prints help information
 
OPTIONS:
        --id <ID>           A unique run id and output folder name [a-zA-Z0-9_-]+
...

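This pipeline has two required inputs: --id, a unique run ID that also becomes the output folder name, and --csv, the path to the aggregation CSV file created in the previous section.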
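
The --id and --csv values can be sanity-checked before launching the pipeline. The sketch below is not a Cell Ranger feature; check_aggr_inputs is a hypothetical helper that applies the character set shown in the help text ([a-zA-Z0-9_-]+) to the run ID and confirms that the CSV file is readable.

```shell
# Check the two inputs before launching the pipeline.
# check_aggr_inputs is a hypothetical helper, not a Cell Ranger command.
check_aggr_inputs() {
    id=$1
    csv=$2
    # The help text restricts --id to the characters [a-zA-Z0-9_-]+
    case $id in
        ''|*[!a-zA-Z0-9_-]*)
            echo "invalid run id: '$id'" >&2
            return 1 ;;
    esac
    # The CSV must exist and be readable
    if [ ! -r "$csv" ]; then
        echo "cannot read CSV: $csv" >&2
        return 1
    fi
    echo "inputs look valid: --id=$id --csv=$csv"
}

# Usage:
# check_aggr_inputs 1k_10k_pbmc_aggr pbmc_aggr.csv
```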

Run cellranger aggr

Next, build the command line and run it.

cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv

The output is similar to the following:

2021-10-28 19:59:07 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
 
2021-10-28 19:59:13 Shutting down.
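
If the terminal output is captured to a file, the completion message can also be checked from a script. This sketch assumes the output was saved by piping through tee, which the tutorial itself does not do; pipestance_succeeded and aggr.log are hypothetical names.

```shell
# Return success if a captured log contains the completion message.
# pipestance_succeeded and aggr.log are hypothetical names for this sketch.
pipestance_succeeded() {
    # -q: print nothing, -F: match the message as a fixed string
    grep -qF 'Pipestance completed successfully!' "$1"
}

# Usage, assuming the run was captured with tee:
# cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv 2>&1 | tee aggr.log
# if pipestance_succeeded aggr.log; then
#     echo "aggr finished"
# else
#     echo "aggr did not report success" >&2
# fi
```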

Explore the output of cellranger aggr

Just like the other pipelines, when you see “Pipestance completed successfully!” the job is done, and the pipeline outputs are in the pipestance directory in the outs/ folder. List the contents of this directory:

ls -1 1k_10k_pbmc_aggr/outs/

The directory structure under outs/ is similar to the following:

├── aggregation.csv
├── count
│   ├── analysis
│   │   ├── clustering
│   │   ├── diffexp
│   │   ├── pca
│   │   ├── tsne
│   │   └── umap
│   ├── cloupe.cloupe
│   ├── filtered_feature_bc_matrix
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── filtered_feature_bc_matrix.h5
│   └── summary.json
└── web_summary.html

The outputs are similar to those of the cellranger count pipeline, except that there are no BAM or molecule_info.h5 files. More information about outputs is available in the Understanding Outputs section.
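
One quick way to inspect the aggregated matrix is to count barcodes by their numeric suffix. This sketch assumes the convention described in the 10x documentation, where each barcode ends in a suffix such as -1 or -2 that identifies its GEM well, numbered in the order the libraries appear in the aggregation CSV. count_barcodes_by_suffix is a hypothetical helper name.

```shell
# Count barcodes per numeric suffix in a gzipped barcodes.tsv.gz file.
# count_barcodes_by_suffix is a hypothetical helper name.
count_barcodes_by_suffix() {
    # Decompress, split each barcode on '-', and tally the last field
    gzip -dc "$1" \
        | awk -F- '{ n[$NF]++ } END { for (k in n) print k, n[k] }' \
        | sort -n
}

# Usage:
# count_barcodes_by_suffix 1k_10k_pbmc_aggr/outs/count/filtered_feature_bc_matrix/barcodes.tsv.gz
```

Each output line shows a suffix followed by the number of barcodes that carry it, which gives a rough per-sample cell count under the assumption above.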
