Cell Ranger ATAC 2.1, printed on 10/05/2024
Cell Ranger ATAC's pipelines analyze sequencing data produced from Chromium Single Cell ATAC libraries. This involves the following steps:
Run cellranger-atac mkfastq on the Illumina® BCL output folder to generate FASTQ files.
Run cellranger-atac count on each library that was demultiplexed by cellranger-atac mkfastq.
For the following example, assume that the Illumina® BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.
First, follow the instructions on running cellranger-atac mkfastq to generate FASTQ files. For example, if the flow cell serial number was HAWT7ADXX, then cellranger-atac mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
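The relationship between the run folder name and the FASTQ location can be sketched in shell. This is only an illustration: it assumes the run folder follows the naming pattern of the example, in which the last underscore-separated field is a one-letter flow cell position followed by the flow cell serial number. The variable names are made up for this sketch.

```shell
# Example BCL run folder from the text above.
run_dir="/sequencing/140101_D00123_0111_AHAWT7ADXX"

# Last underscore-separated field of the folder name, e.g. "AHAWT7ADXX".
last_field="${run_dir##*_}"

# Assumption: the first character is the flow cell position letter and
# the remainder is the flow cell serial number.
flowcell="${last_field#?}"

# Relative path where the text says cellranger-atac mkfastq writes FASTQs.
fastq_path="${flowcell}/outs/fastq_path"

echo "flow cell:  ${flowcell}"
echo "FASTQ path: ${fastq_path}"
```

Run folders from instruments that use a different naming scheme would need a different rule, so treat the derived serial number as a guess to confirm against the sequencer's own records.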
To generate single cell accessibility counts for a single library, run cellranger-atac count with the following arguments. For a complete list of command-line arguments, run cellranger-atac count --help.
For help on which arguments to use to target a particular set of FASTQs, consult Specifying Input FASTQ Files for 10x Pipelines.
These are the required command line arguments (also available through cellranger-atac count --help):
Additional optional parameters are available:
Option | Description |
---|---|
--description=TEXT | Sample description to embed in output files [default: ] |
--sample | Sample name as specified in the sample sheet supplied to cellranger-atac mkfastq. Can take multiple comma-separated values, which is helpful if the same library was sequenced on multiple flow cells and the sample name used (and therefore FASTQ file prefix) is not identical between them. Doing this will treat all reads from the library, across flow cells, as one sample. Allowable characters in sample names are letters, numbers, hyphens, and underscores. |
--project=TEXT | Name of the project folder within a mkfastq- or bcl2fastq-generated folder to pick FASTQs from |
--lanes=NUMS... | Only use FASTQs from selected lanes |
--force-cells=NUM | Define the top N barcodes with the most fragments overlapping peaks as cells and override the cell calling algorithm. N must be a positive integer <= 20,000. Use this option if the number of cells estimated by Cell Ranger ATAC is not consistent with the barcode rank plot |
--peaks=BED | Override peak caller: specify peaks to use in downstream analyses from supplied 3-column BED file. The supplied peaks file must be sorted by position and not contain overlapping peaks; comment lines beginning with # are allowed |
--dim-reduce=STR | Dimensionality reduction mode for clustering. Note: plsa has been temporarily restricted to run in single-threaded mode due to technical considerations. This could lead to a longer wall time for execution as compared to v1.2. Multi-threading will be restored in a subsequent release [default: lsa] [possible values: lsa, pca, plsa] |
--subsample-rate | Downsample to preserve this fraction of reads |
--jobmode=MODE | Job manager to use. Valid options: local (default), sge, lsf, slurm, or path to a .template file. Consult the Cluster Mode page for details on configuring the pipeline to use a compute cluster [default: local] |
--localcores=NUM | Set max cores the pipeline may request at one time. Only applies to local jobs |
--localmem=NUM | Set max memory (GB) the pipeline may request at one time. Only applies to local jobs |
--localvmem=NUM | Set max virtual address space in GB for the pipeline. Only applies to local jobs |
--mempercore=NUM | Reserve enough threads for each job to ensure enough memory will be available, assuming each core on your cluster has at least this much memory available. Only applies to cluster jobmodes |
--maxjobs | Set max jobs submitted to cluster at one time. Only applies to cluster jobmodes |
--jobinterval | Set delay between submitting jobs to cluster, in ms. Only applies to cluster jobmodes |
--overrides=PATH | The path to a JSON file that specifies stage-level overrides for cores and memory. Finer-grained than --localcores, --mempercore and --localmem. |
--uiport=PORT | Serve web UI at http://localhost:PORT |
--indices | Deprecated. Not needed with the output of cellranger-atac mkfastq or bcl2fastq |
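The constraints listed for --peaks (three columns, sorted by position, no overlapping peaks, # comment lines allowed) can be checked before a run with a short script. The sketch below is a rough approximation and not the validation Cell Ranger ATAC performs itself. It assumes tab-separated columns, treats "sorted" as non-decreasing start positions within one contiguous block per chromosome, and uses half-open BED intervals, so a peak starting exactly where the previous one ends is not counted as an overlap. The function name check_peaks_bed is hypothetical.

```shell
# check_peaks_bed FILE
# Prints one message per problem found and returns non-zero if any exist.
check_peaks_bed() {
    awk -F '\t' '
        /^#/ { next }                       # comment lines are allowed
        NF != 3 {
            printf "line %d: expected 3 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $1 != chrom {
            # A chromosome seen earlier must not reappear later in the file.
            if ($1 in seen) {
                printf "line %d: records for %s are not contiguous\n", NR, $1
                bad = 1
            }
            seen[$1] = 1
            chrom = $1
            prev_start = -1
            prev_end = -1
        }
        {
            s = $2 + 0
            e = $3 + 0
            if (s < prev_start) {
                printf "line %d: start %d is out of order\n", NR, s
                bad = 1
            } else if (s < prev_end) {
                printf "line %d: overlaps the previous peak\n", NR
                bad = 1
            }
            prev_start = s
            prev_end = e
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

A call such as `check_peaks_bed my_peaks.bed && echo "looks OK"` (with my_peaks.bed standing in for your own file) gives a quick sanity check before the file is passed to --peaks.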
$ cd /home/jdoe/runs
$ cellranger-atac count --id=sample345 \
                        --reference=/opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
                        --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                        --sample=mysample \
                        --localcores=8 \
                        --localmem=64
Following a set of preflight checks to validate input arguments, cellranger-atac count pipeline stages will begin to run:
Martian Runtime - 4.0.7
Running preflight checks (please wait)...
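Two of the argument constraints stated in the option table can be tested locally before launching a run: the characters allowed in --sample values, and the range allowed for --force-cells. The helper functions below are hypothetical and are not part of cellranger-atac. They only mirror the rules as written in this document, and the pipeline's own preflight checks may be stricter or cover more cases.

```shell
# valid_sample_arg VALUE
# Accepts one sample name or a comma-separated list of names made of
# letters, numbers, hyphens, and underscores. Empty names are rejected.
valid_sample_arg() {
    case "$1" in
        "" | ,* | *, | *,,*)   return 1 ;;   # empty value or empty list item
        *[!A-Za-z0-9_,-]*)     return 1 ;;   # character outside the allowed set
        *)                     return 0 ;;
    esac
}

# valid_force_cells VALUE
# Accepts a positive integer no larger than 20000.
valid_force_cells() {
    case "$1" in
        "" | *[!0-9]*) return 1 ;;           # not a plain non-negative integer
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 20000 ]
}

# Example: check the values used in the command shown earlier.
if valid_sample_arg "mysample"; then
    echo "sample name accepted"
fi
if ! valid_force_cells "25000"; then
    echo "25000 is outside the allowed range for --force-cells"
fi
```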
By default, cellranger-atac will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger-atac to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger-atac.
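On a shared workstation it can be convenient to derive these limits from the machine rather than hard-coding them. The following sketch builds the two flags under an arbitrary, purely illustrative policy of using half the detected cores. The policy, the variable names, and the fixed 64 GB memory value (taken from the example command) are assumptions, not recommendations from the Cell Ranger ATAC documentation.

```shell
# Detect the number of online cores; fall back to 1 if detection fails.
total_cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || total_cores=1
case "$total_cores" in
    "" | *[!0-9]*) total_cores=1 ;;
esac

# Hypothetical policy: use half the cores, but never fewer than one.
localcores=$(( total_cores / 2 ))
[ "$localcores" -ge 1 ] || localcores=1

# Memory cap in GB; 64 matches the example command in this document.
localmem=64

# Flags that could be appended to a cellranger-atac count command line.
resource_flags="--localcores=${localcores} --localmem=${localmem}"
echo "$resource_flags"
```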
The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger-atac will assume it is an existing pipestance and attempt to resume running it.
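Because an existing folder is treated as a pipestance to resume, it can be worth checking for a name collision before starting what is meant to be a fresh run. A minimal sketch follows; the function name run_id_is_free is hypothetical, and the paths are the ones used in this document's example.

```shell
# run_id_is_free RUNS_DIR ID
# Succeeds only if nothing named ID exists yet inside RUNS_DIR.
run_id_is_free() {
    [ ! -e "$1/$2" ]
}

# Example using the paths from this document.
if run_id_is_free /home/jdoe/runs sample345; then
    echo "sample345 is unused; a new run would start from scratch"
else
    echo "sample345 already exists; cellranger-atac would try to resume it"
fi
```

If the intent is to resume an interrupted run, the existing folder is exactly what is wanted and no check is needed.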
A successful cellranger-atac count run should conclude with a message similar to this:
Outputs:
- Per-barcode fragment counts & metrics: /home/jdoe/runs/sample345/outs/singlecell.csv
- Position sorted BAM file: /home/jdoe/runs/sample345/outs/possorted_bam.bam
- Position sorted BAM index: /home/jdoe/runs/sample345/outs/possorted_bam.bam.bai
- Summary of all data metrics: /home/jdoe/runs/sample345/outs/summary.json
- HTML file summarizing data & analysis: /home/jdoe/runs/sample345/outs/web_summary.html
- Bed file of all called peak locations: /home/jdoe/runs/sample345/outs/peaks.bed
- Raw peak barcode matrix in hdf5 format: /home/jdoe/runs/sample345/outs/raw_peak_bc_matrix.h5
- Raw peak barcode matrix in mex format: /home/jdoe/runs/sample345/outs/raw_peak_bc_matrix
- Directory of analysis files: /home/jdoe/runs/sample345/outs/analysis
- Filtered peak barcode matrix in hdf5 format: /home/jdoe/runs/sample345/outs/filtered_peak_bc_matrix.h5
- Filtered peak barcode matrix in mex format: /home/jdoe/runs/sample345/outs/filtered_peak_bc_matrix
- Barcoded and aligned fragment file: /home/jdoe/runs/sample345/outs/fragments.tsv.gz
- Fragment file index: /home/jdoe/runs/sample345/outs/fragments.tsv.gz.tbi
- Filtered tf barcode matrix in hdf5 format: /home/jdoe/runs/sample345/outs/filtered_tf_bc_matrix.h5
- Filtered tf barcode matrix in mex format: /home/jdoe/runs/sample345/outs/filtered_tf_bc_matrix
- Loupe Browser input file: /home/jdoe/runs/sample345/outs/cloupe.cloupe
- csv summarizing important metrics and values: /home/jdoe/runs/sample345/outs/summary.csv
- Annotation of peaks with genes: /home/jdoe/runs/sample345/outs/peak_annotation.tsv
- Peak-motif associations: /home/jdoe/runs/sample345/outs/peak_motif_mapping.bed

Pipestance completed successfully!
The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs/ will contain the main pipeline output files:
File Name | Description |
---|---|
singlecell.csv | Per-barcode fragment counts & metrics |
possorted_bam.bam | Position sorted BAM file |
possorted_bam.bam.bai | Position sorted BAM index |
summary.json | Summary of all data metrics |
web_summary.html | HTML file summarizing data & analysis |
peaks.bed | Bed file of all called peak locations |
raw_peak_bc_matrix.h5 | Raw peak barcode matrix in hdf5 format |
raw_peak_bc_matrix | Raw peak barcode matrix in mex format |
analysis | Directory of analysis files |
filtered_peak_bc_matrix.h5 | Filtered peak barcode matrix in hdf5 format |
filtered_peak_bc_matrix | Filtered peak barcode matrix in mex format |
fragments.tsv.gz | Barcoded and aligned fragment file |
fragments.tsv.gz.tbi | Fragment file index |
filtered_tf_bc_matrix.h5 | Filtered tf barcode matrix in hdf5 format |
filtered_tf_bc_matrix | Filtered tf barcode matrix in mex format |
cloupe.cloupe | Loupe Browser input file |
summary.csv | Summary of important metrics and values in CSV format |
peak_annotation.tsv | Peak-gene associations based on genome proximity |
peak_motif_mapping.bed | Peak motif associations. Note that one peak could be associated with multiple transcription factor motifs. |
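The file names in the table above can be used for a quick completeness check of an outs/ folder. The function below is a hypothetical helper that prints every listed name not found in the given directory. A missing name is not necessarily an error, since which outputs are produced may depend on the reference and run configuration, so treat the result as a prompt to look closer rather than a verdict.

```shell
# missing_outputs OUTS_DIR
# Prints the name of each expected output that does not exist in OUTS_DIR.
missing_outputs() {
    for name in \
        singlecell.csv possorted_bam.bam possorted_bam.bam.bai \
        summary.json web_summary.html peaks.bed \
        raw_peak_bc_matrix.h5 raw_peak_bc_matrix analysis \
        filtered_peak_bc_matrix.h5 filtered_peak_bc_matrix \
        fragments.tsv.gz fragments.tsv.gz.tbi \
        filtered_tf_bc_matrix.h5 filtered_tf_bc_matrix \
        cloupe.cloupe summary.csv \
        peak_annotation.tsv peak_motif_mapping.bed
    do
        # -e matches both regular files and directories.
        [ -e "$1/$name" ] || echo "$name"
    done
    return 0
}
```

For the example run, `missing_outputs /home/jdoe/runs/sample345/outs` would print nothing if every listed file and directory is present.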
Once cellranger-atac count has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .cloupe file in Loupe Browser, or refer to the Understanding Output section to explore the data by hand.