Cell Ranger ATAC 2.1, printed on 12/18/2024
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data. Cell Ranger ATAC includes four pipelines relevant to Single Cell ATAC experiments:
cellranger-atac mkfastq demultiplexes raw base call (BCL) files generated by Illumina® sequencers into FASTQ files. It is a wrapper around bcl2fastq from Illumina®, with additional useful features that are specific to 10x Genomics libraries and a simplified sample sheet format.
cellranger-atac count takes FASTQ files from cellranger-atac mkfastq and performs ATAC analysis, including:
cellranger-atac aggr aggregates and analyzes the outputs from multiple runs of cellranger-atac count (such as from multiple samples from one experiment) by performing the following steps:
cellranger-atac reanalyze takes the analysis files produced by cellranger-atac count or cellranger-atac aggr and reruns secondary analysis with tunable parameter settings:
Output is delivered in standard BAM, MEX, CSV, TSV, HDF5 and HTML formats that are augmented with cellular information.
The cellranger-atac count pipeline can take input from multiple sequencing runs on the same library.
Cell Ranger ATAC version 2.1 supports libraries generated by the Chromium Single Cell ATAC v1, v1.1 Next GEM, and v2 reagent kits.
10x Genomics recommends using the pipeline analysis programs in order, starting with cellranger-atac mkfastq to demultiplex the raw base call (BCL) files for each flow cell directory, and continuing with cellranger-atac count for single-library analysis. If compatible FASTQ files are available from another source, you can skip cellranger-atac mkfastq and use those FASTQ files as direct input to cellranger-atac count. Compatible FASTQ files can be found in reputable public datasets, or can be built by using bcl2fastq directly. See the Specifying Input FASTQs page for more details.
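The recommended order can be sketched as a small Python script that only builds and prints the two command lines. This is a minimal sketch, not an official wrapper: the flag names (--id, --run, --csv, --reference, --fastqs, --sample) are assumed from the tool's command-line help and should be checked against `cellranger-atac mkfastq --help` and `cellranger-atac count --help` for your installed version. All paths, run IDs, and sample names are hypothetical placeholders.

```python
import shlex


def mkfastq_cmd(run_id, bcl_dir, samplesheet_csv):
    """Build the argument list for demultiplexing one flow cell directory.

    Flag names are assumptions taken from the CLI help, not from this page.
    """
    return [
        "cellranger-atac", "mkfastq",
        f"--id={run_id}",
        f"--run={bcl_dir}",
        f"--csv={samplesheet_csv}",
    ]


def count_cmd(run_id, reference, fastq_dir, sample):
    """Build the argument list for a single-library analysis."""
    return [
        "cellranger-atac", "count",
        f"--id={run_id}",
        f"--reference={reference}",
        f"--fastqs={fastq_dir}",
        f"--sample={sample}",
    ]


# Hypothetical paths and names, for illustration only.
pipeline = [
    mkfastq_cmd("demux_run", "/path/to/flowcell_bcl", "/path/to/samplesheet.csv"),
    count_cmd("sample1_count", "/path/to/reference", "/path/to/fastqs", "sample1"),
]

# Print the commands in the order they would be run; nothing is executed.
for argv in pipeline:
    print(shlex.join(argv))
```

If FASTQ files already exist from another source, the first entry of `pipeline` can simply be dropped and the count command pointed at that FASTQ directory.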
The subsequent steps vary depending on how many samples, GEM wells, and flow cells you have (see the Glossary of Terms for detailed definitions). The relationship between these terms can be complex because there are multiple ways samples can be prepared for the pipeline. They are described here in order of increasing complexity. Note that the terms library and GEM well are treated as equivalent because a single library is associated with data from one GEM well in the 10x Genomics Chromium run.
This is the most basic case. You have a single biological sample, which was prepared into a single library by processing through one GEM well (a set of partitioned cells from a single 10x Chromium™ Chip channel), and then sequenced on a single flow cell. Assuming the FASTQs have been generated with cellranger-atac mkfastq, you just need to run cellranger-atac count as described in Single-Library Analysis.
If you have a library that was generated from a single GEM well but sequenced across multiple flow cells (e.g. to increase sequencing saturation), you can pool the reads from all of the sequencing runs. Follow the steps in Specifying Input FASTQs to combine them in a single cellranger-atac count run.
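As a rough illustration of pooling, the sketch below builds one cellranger-atac count command that points at the FASTQ directories from two flow cells. It assumes that --fastqs accepts a comma-separated list of directories, which should be confirmed on the Specifying Input FASTQs page. The helper name, paths, and sample name are hypothetical.

```python
def pooled_count_cmd(run_id, reference, fastq_dirs, sample):
    """Build one count command covering FASTQs from several flow cells.

    Assumes --fastqs takes a comma-separated list of directories.
    """
    if not fastq_dirs:
        raise ValueError("at least one FASTQ directory is required")
    return [
        "cellranger-atac", "count",
        f"--id={run_id}",
        f"--reference={reference}",
        "--fastqs=" + ",".join(fastq_dirs),
        f"--sample={sample}",
    ]


# Hypothetical directories: the same library sequenced on two flow cells.
cmd = pooled_count_cmd(
    "lib1_pooled",
    "/path/to/reference",
    ["/path/to/flowcell_A/fastqs", "/path/to/flowcell_B/fastqs"],
    "lib1",
)
print(" ".join(cmd))
```

The key point is that there is still only one count run for the library; only the list of input directories grows.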
In this example, you have one sample that is processed through multiple GEM wells. This is often done when conducting technical replicate experiments, or to increase the number of cells in your library without overloading a single GEM well in a 10x Genomics Chromium run. The libraries from the GEM wells are then pooled onto one flow cell and sequenced. In this case, you demultiplex the data from the sequencing run and then run the libraries from each GEM well through a separate instance of cellranger-atac count. Once those runs are complete, you can perform a combined analysis using cellranger-atac aggr, as described in Multi-Library Aggregation. (See figure above.)
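The "one count run per GEM well" rule can be sketched as a loop that emits a separate command for each library after demultiplexing. This is an illustrative sketch under stated assumptions: the flag names are taken from the CLI help rather than from this page, the library names are assumed to match the sample names used at demultiplexing, and every path and name is hypothetical.

```python
def count_cmds_per_gem_well(libraries, reference, fastq_dir):
    """Return one count command per library (that is, per GEM well).

    `libraries` is a list of sample names as they appear in the FASTQ files;
    all libraries are assumed to share one demultiplexed FASTQ directory
    because they were pooled onto a single flow cell.
    """
    return [
        [
            "cellranger-atac", "count",
            f"--id={library}_count",
            f"--reference={reference}",
            f"--fastqs={fastq_dir}",
            f"--sample={library}",
        ]
        for library in libraries
    ]


# Hypothetical example: one sample split across two GEM wells.
commands = count_cmds_per_gem_well(
    ["sampleA_gemwell1", "sampleA_gemwell2"],
    "/path/to/reference",
    "/path/to/fastqs",
)
for argv in commands:
    print(" ".join(argv))
```

Each command gets its own --id so that the outputs of the separate runs do not collide; those output directories are what a later aggregation step would consume.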
In this example, you have multiple samples that are processed through multiple GEM wells, which generate multiple libraries that are pooled onto one flow cell. In this case, after demultiplexing, you must run cellranger-atac count separately for each GEM well to get sample-specific data. For example, if your experimental design involves two samples, you will have to run cellranger-atac count twice, once for each sample. Then you can aggregate them with a single instance of cellranger-atac aggr, as described in Multi-Library Aggregation.
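The aggregation step can be sketched as writing a small CSV that lists each completed count run and then building a single cellranger-atac aggr command. Several details here are assumptions not stated on this page and should be verified on the Multi-Library Aggregation page: the CSV column names (library_id, fragments, cells), the output file names (outs/fragments.tsv.gz and outs/singlecell.csv), and the aggr flags (--id, --csv, --reference). Run directories and IDs are hypothetical.

```python
import csv
import io


def aggr_csv_text(count_run_dirs):
    """Render the aggregation CSV for a mapping of library ID to count run dir.

    Column names and output file names are assumptions; check the
    Multi-Library Aggregation page before relying on them.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["library_id", "fragments", "cells"])
    for library_id, run_dir in count_run_dirs.items():
        writer.writerow([
            library_id,
            f"{run_dir}/outs/fragments.tsv.gz",
            f"{run_dir}/outs/singlecell.csv",
        ])
    return buf.getvalue()


def aggr_cmd(run_id, csv_path, reference):
    """Build the single aggr command that combines all listed libraries."""
    return [
        "cellranger-atac", "aggr",
        f"--id={run_id}",
        f"--csv={csv_path}",
        f"--reference={reference}",
    ]


# Hypothetical example: two samples, one count run each.
csv_text = aggr_csv_text({
    "sample1": "/path/to/sample1_count",
    "sample2": "/path/to/sample2_count",
})
print(csv_text)
print(" ".join(aggr_cmd("combined", "/path/to/libraries.csv", "/path/to/reference")))
```

In practice the CSV text would be written to the file passed to --csv; it is kept in memory here so the sketch has no side effects.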