Cell Ranger ARC 1.0, printed on 09/24/2022
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility, and their linkage. Because the ATAC and gene expression measurements are made on the very same cell, chromatin accessibility can be linked directly to gene expression.
cellranger-arc mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina's bcl2fastq, with additional useful features that are specific to 10x libraries and a simplified sample sheet format. The same command can be used to demultiplex both ATAC and GEX flow cells.
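As a rough illustration of the demultiplexing step described above, the following sketch writes a simplified sample sheet for one ATAC flow cell and prints the matching mkfastq command instead of running it. The flow cell path, sample name, index set name, and run ID are placeholders assumed for this example. The `--id`, `--run`, and `--csv` arguments and the `Lane,Sample,Index` layout should be checked against `cellranger-arc mkfastq --help` for your installed version.

```shell
#!/bin/sh
# Sketch only: the path, sample name, and index name below are placeholders.
RUN_DIR="/path/to/atac_flowcell_bcl"   # hypothetical BCL directory for one flow cell
SHEET="atac_samplesheet.csv"

# Simplified three-column sample sheet: lane, sample name, 10x index set.
cat > "$SHEET" <<'EOF'
Lane,Sample,Index
1,my_sample_atac,SI-NA-A1
EOF

# Print the command rather than running it; remove "echo" to launch mkfastq.
echo cellranger-arc mkfastq --id=atac_fastqs --run="$RUN_DIR" --csv="$SHEET"
```

A GEX flow cell would be handled the same way, with its own sample sheet and its own `--id`.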
cellranger-arc count takes FASTQ files from cellranger-arc mkfastq and performs alignment, filtering, barcode counting, peak calling and counting of both ATAC and GEX molecules. Furthermore, it uses the Chromium cellular barcodes to generate feature-barcode matrices, perform dimensionality reduction, determine clusters, perform differential analysis on clusters and identify linkages between peaks and genes. The count pipeline can take input from multiple sequencing runs on the same GEM well.
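The count step described above can be sketched as follows, assuming one GEX and one ATAC FASTQ directory. The sketch writes a libraries CSV and prints the count command instead of running it. All paths, sample names, and the run ID are placeholders. The three-column `fastqs,sample,library_type` layout and the `--reference` and `--libraries` arguments should be confirmed against `cellranger-arc count --help`.

```shell
#!/bin/sh
# Sketch only: every path and name below is a placeholder.
REFERENCE="/path/to/cellranger-arc-reference"   # hypothetical reference package location
LIBRARIES="libraries.csv"

# One row per library: FASTQ directory, sample name used at mkfastq time, library type.
cat > "$LIBRARIES" <<'EOF'
fastqs,sample,library_type
/path/to/gex_fastqs,my_sample_gex,Gene Expression
/path/to/atac_fastqs,my_sample_atac,Chromatin Accessibility
EOF

# Print the command rather than running it; remove "echo" to launch the pipeline.
echo cellranger-arc count --id=my_sample --reference="$REFERENCE" --libraries="$LIBRARIES"
```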
These pipelines combine Chromium-specific algorithms with the widely used aligners STAR and BWA. Output is delivered in standard BAM, MEX, CSV, HDF5 and HTML formats that are augmented with cellular information, along with a .cloupe file for use with Loupe Browser.
If you are beginning with raw base call (BCL) files, the Cell Ranger ARC workflow starts with demultiplexing the BCL files for each flow cell directory for all relevant ATAC and GEX sequencing runs. 10x recommends using cellranger-arc mkfastq as described in Generating FASTQs. If you are beginning with FASTQ files that have already been demultiplexed with bcl2fastq directly, or from a public source such as SRA, you can skip cellranger-arc mkfastq and begin with cellranger-arc count. Please see the Specifying Input FASTQs page for specific guidelines on which arguments to use for your scenario.
The exact steps of the workflow vary depending on how many samples, GEM wells, and flow cells you have. This section describes the different possible workflows.
In this example you have one sample that is processed through one GEM well (a set of partitioned cells from a single 10x Chromium™ Chip channel) and results in one Multiome ATAC library and one Multiome GEX library. Each library is sequenced separately on one flow cell. In this case you would generate FASTQs separately for ATAC and GEX by running cellranger-arc mkfastq on the respective flow cells and run cellranger-arc count as described in Single-Sample Analysis.
In this example you have one sample that is processed through one GEM well, resulting in one ATAC library and one GEX library. The ATAC and GEX libraries are each sequenced on two flow cells. This may be done, for example, to increase sequencing depth when the first sequencing run did not produce enough raw read pairs per cell. Here you would run cellranger-arc mkfastq a total of four times: once for each of the two ATAC flow cells and once for each of the two GEX flow cells. All of the reads can then be combined in a single instance of the cellranger-arc count pipeline. This process is described in Specifying Input FASTQs.
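The two-flow-cell workflow above might look roughly like the sketch below, which prints the four mkfastq commands and collects all four FASTQ locations into one libraries CSV for a single count run. Flow cell labels, sample names, and every path are hypothetical, and the actual FASTQ output directory produced by mkfastq should be substituted for the placeholder paths. The sketch assumes that multiple rows with the same library type are how FASTQs from several flow cells are combined; confirm this on the Specifying Input FASTQs page.

```shell
#!/bin/sh
# Sketch only: flow cell labels, sample names, and paths are placeholders.
LIBRARIES="libraries_two_flowcells.csv"
echo "fastqs,sample,library_type" > "$LIBRARIES"

# Four mkfastq runs: two ATAC flow cells and two GEX flow cells.
for entry in "atac_fc1:Chromatin Accessibility" "atac_fc2:Chromatin Accessibility" \
             "gex_fc1:Gene Expression" "gex_fc2:Gene Expression"; do
  fc="${entry%%:*}"        # flow cell label before the colon
  lib_type="${entry#*:}"   # library type after the colon
  case "$fc" in
    atac_*) sample="my_sample_atac" ;;
    *)      sample="my_sample_gex" ;;
  esac

  # Print the per-flow-cell demultiplexing command (remove "echo" to run it).
  echo cellranger-arc mkfastq --id="${fc}_fastqs" --run="/path/to/${fc}_bcl" --csv="${fc}_samplesheet.csv"

  # Record where this run's FASTQs are expected to end up (hypothetical path).
  echo "/path/to/${fc}_fastqs,${sample},${lib_type}" >> "$LIBRARIES"
done

# A single count run then reads all four FASTQ sets through the one libraries file.
echo cellranger-arc count --id=my_sample_combined --reference=/path/to/reference --libraries="$LIBRARIES"
```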