Cell Ranger 7.1, printed on 10/05/2024
In this tutorial, you will learn how to aggregate the outputs of multiple cellranger count runs with cellranger aggr.
The cellranger aggr pipeline is optional. It aggregates, or combines, the outputs of two or more cellranger count runs. In experiments involving multiple samples and multiple 10x Chromium GEM wells, each library must be processed in a separate run of cellranger count.
To compare samples to each other for differential expression analysis, cellranger aggr combines the output files from each cellranger count run into a single feature-barcode matrix and a .cloupe file for visualization in Loupe Browser.
This tutorial is written with Cell Ranger v6.1.2. Commands are compatible with other versions of Cell Ranger, unless noted otherwise.
This tutorial uses two publicly available molecule_info.h5 files, from the pbmc_1k_v3 and pbmc_10k_v3 datasets.
Start by making a directory to run the aggr pipeline in:
mkdir run_cellranger_aggr
cd run_cellranger_aggr
Next, download the data files.
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_molecule_info.h5
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_molecule_info.h5
These files are small (less than 1 GB each) and usually take less than one minute to download.
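A truncated or failed download can go unnoticed until the pipeline rejects the file. As a minimal sketch of a quick sanity check, the helper below (is_hdf5 is a hypothetical name, not part of Cell Ranger) compares the first 8 bytes of a file against the standard HDF5 signature. It assumes the signature sits at the very start of the file, and it does not verify that the download is complete.

```shell
# Hypothetical helper: succeed if the file starts with the HDF5 signature
# (bytes 89 48 44 46 0d 0a 1a 0a). Assumes the signature is at offset 0.
is_hdf5() {
    sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "894844460d0a1a0a" ]
}

# Report on the two tutorial files if they are in the current directory.
for f in pbmc_1k_v3_molecule_info.h5 pbmc_10k_v3_molecule_info.h5; do
    if [ ! -e "$f" ]; then
        echo "not found: $f"
    elif is_hdf5 "$f"; then
        echo "looks like HDF5: $f"
    else
        echo "not an HDF5 file: $f"
    fi
done
```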
The next step is to build the CSV file. CSV stands for comma-separated values. For specific instructions on creating this CSV, see the cellranger aggr page.
The CSV file is a two-column file. The first column is for the sample ID. This ID can be anything you want. Choose descriptive IDs, since they are used later in the analysis. The second column contains the paths to the molecule_info.h5 output files from the cellranger count pipelines.
For Cell Ranger v6.0+ and Loupe Browser v5.1.0+, the libraries CSV header should be sample_id,molecule_h5. For prior software versions, it should be library_id,molecule_h5.
From the same directory where the HDF5 files were downloaded, use the pwd command to print out the path:
pwd
The output is similar to the following:
path/to/run_cellranger_aggr
Copy the path to make the CSV file. Use the text editor of your choice to make this file. This example uses nano.
nano pbmc_aggr.csv
This opens the nano text editor.
sample_id,molecule_h5
1k_pbmcs,path/to/run_cellranger_aggr/pbmc_1k_v3_molecule_info.h5
10k_pbmcs,path/to/run_cellranger_aggr/pbmc_10k_v3_molecule_info.h5
Paste the text above into the editor. Edit the path/to/ part for each molecule_info.h5 file so it matches the path of the file on your system.
Exit the nano text editor by pressing Ctrl+X, and then press Y for "Yes" to save the file when nano shows this prompt:
Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES) ?
 Y Yes
 N No           ^C Cancel
Nano then asks you:
File Name to Write: pbmc_aggr.csv
Press the Enter key to confirm the filename and save the file. This returns you to the command prompt.
You have now saved a Linux-formatted CSV file and exited the nano text editor.
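As an alternative to editing by hand, the same CSV can be written and checked from the shell. The sketch below is one possible approach, not part of Cell Ranger: make_aggr_csv and check_aggr_csv are hypothetical helper names, the sample IDs and file names are the ones used in this tutorial, and the header is the v6.0+ form (sample_id,molecule_h5). The check looks for Windows line endings, the expected header, and the existence of each listed file.

```shell
# Hypothetical helper: write the tutorial's aggr CSV.
# Usage: make_aggr_csv OUT_CSV H5_DIR
make_aggr_csv() {
    out=$1
    dir=$2
    {
        printf 'sample_id,molecule_h5\n'
        printf '1k_pbmcs,%s/pbmc_1k_v3_molecule_info.h5\n' "$dir"
        printf '10k_pbmcs,%s/pbmc_10k_v3_molecule_info.h5\n' "$dir"
    } > "$out"
}

# Hypothetical helper: basic sanity checks on an aggr CSV.
# Usage: check_aggr_csv CSV
check_aggr_csv() {
    csv=$1
    # Reject carriage returns (Windows-style line endings).
    if grep -q "$(printf '\r')" "$csv"; then
        echo "carriage returns found in $csv" >&2
        return 1
    fi
    # The first line must be the v6.0+ header.
    if [ "$(head -n 1 "$csv")" != "sample_id,molecule_h5" ]; then
        echo "unexpected header in $csv" >&2
        return 1
    fi
    # Every data row must point at an existing file.
    while IFS=, read -r id h5; do
        [ "$id" = "sample_id" ] && continue   # skip the header row
        [ -z "$id" ] && continue              # skip blank lines
        if [ ! -f "$h5" ]; then
            echo "missing file for $id: $h5" >&2
            return 1
        fi
    done < "$csv"
    return 0
}

# Example (run from the directory that holds the downloaded files):
#   make_aggr_csv pbmc_aggr.csv "$(pwd)"
#   check_aggr_csv pbmc_aggr.csv && echo "CSV looks OK"
```

Writing absolute paths, as the example call does with "$(pwd)", keeps the CSV valid no matter which directory the pipeline is later launched from.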
Run cellranger aggr with the --help option to print the usage statement and view the input requirements.
cellranger aggr --help
This command prints the following:
cellranger-aggr
Aggregate data from multiple Cell Ranger runs

USAGE:
    cellranger aggr [FLAGS] [OPTIONS] --id <ID> --csv <CSV>

FLAGS:
        --nosecondary    Disable secondary analysis, e.g. clustering
        --dry            Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
        --disable-ui     Do not serve the web UI
        --noexit         Keep web UI running after pipestance completes or fails
        --nopreflight    Skip preflight checks
    -h, --help           Prints help information

OPTIONS:
        --id <ID>    A unique run id and output folder name [a-zA-Z0-9_-]+
...
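The help text restricts the run id to the character set [a-zA-Z0-9_-]+. As a small sketch, a candidate id can be checked against that pattern before launching a run. The function name is_valid_run_id is hypothetical, and the check only mirrors the pattern printed above; it is not Cell Ranger's own validation.

```shell
# Hypothetical helper: succeed if the argument is non-empty and contains
# only letters, digits, underscores, and hyphens.
is_valid_run_id() {
    case "$1" in
        "" | *[!a-zA-Z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example with the id used in this tutorial and one containing a space.
for id in "1k_10k_pbmc_aggr" "1k 10k pbmc aggr"; do
    if is_valid_run_id "$id"; then
        echo "ok: $id"
    else
        echo "invalid: $id"
    fi
done
```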
This pipeline has two inputs:
--id is used to name the output directory that the pipeline runs in.
--csv takes a CSV file that points to the outputs from the cellranger count pipeline.
Next, build the command line and run it.
cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv
The output is similar to the following:
2021-10-28 19:59:07 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2021-10-28 19:59:13 Shutting down.
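When a run is scripted rather than watched, the success line can be checked for automatically. The sketch below assumes the pipeline's terminal output was saved to a file; both the helper name pipestance_succeeded and the log file name aggr.log are hypothetical choices for illustration.

```shell
# Hypothetical helper: succeed if the saved output contains the success line.
# Usage: pipestance_succeeded LOG_FILE
pipestance_succeeded() {
    grep -q -F 'Pipestance completed successfully!' "$1"
}

# Example (aggr.log is an assumed file name; tee saves a copy of the output):
#   cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv 2>&1 | tee aggr.log
#   pipestance_succeeded aggr.log && echo "aggr finished successfully"
```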
Just like the other pipelines, when you see “Pipestance completed successfully!” the job is done, and the pipeline outputs are in the outs/ folder of the pipestance directory. List the contents of this directory:
ls -1 1k_10k_pbmc_aggr/outs/
The directory structure is similar to the following (shown as a tree for readability):
├── aggregation.csv
├── count
│   ├── analysis
│   │   ├── clustering
│   │   ├── diffexp
│   │   ├── pca
│   │   ├── tsne
│   │   └── umap
│   ├── cloupe.cloupe
│   ├── filtered_feature_bc_matrix
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── filtered_feature_bc_matrix.h5
│   └── summary.json
└── web_summary.html
The outputs are similar to those from the cellranger count pipeline, except that cellranger aggr does not produce BAM files or molecule_info.h5 files. More information about outputs is available in the Understanding Outputs section.
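To confirm that a run produced the files shown in the listing above, a short check can be sketched as follows. The helper name check_aggr_outs is hypothetical, and the file list is taken from this tutorial's listing, so it may need adjusting for other Cell Ranger versions or for runs that use --nosecondary.

```shell
# Hypothetical helper: report any expected output file that is missing.
# Usage: check_aggr_outs OUTS_DIR
check_aggr_outs() {
    outs=$1
    rc=0
    for rel in \
        aggregation.csv \
        count/cloupe.cloupe \
        count/filtered_feature_bc_matrix/barcodes.tsv.gz \
        count/filtered_feature_bc_matrix/features.tsv.gz \
        count/filtered_feature_bc_matrix/matrix.mtx.gz \
        count/filtered_feature_bc_matrix.h5 \
        count/summary.json \
        web_summary.html
    do
        if [ ! -f "$outs/$rel" ]; then
            echo "missing: $outs/$rel" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_aggr_outs 1k_10k_pbmc_aggr/outs && echo "all expected outputs present"
```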