10x Genomics
Chromium Single Cell ATAC

Cell Ranger ATAC 2.1, printed on 03/29/2025

Customized Secondary Analysis with Cell Ranger ATAC Reanalyze

The cellranger-atac reanalyze command reruns secondary analysis performed on the peak-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings.

Command line interface

These are the required command line arguments:

Argument	Description
`--id=ID`	Required. A unique run id and output folder name [a-zA-Z0-9_-]+ of maximum length 64 characters.
`--peaks=BED`	Required. Specify peaks to use in downstream analyses from supplied 3-column BED file. The supplied peaks file must be sorted by position and not contain overlapping peaks; comment lines beginning with `#` are allowed.
`--fragments=TSV.GZ`	Required. Path to the `fragments.tsv.gz` file generated by `cellranger-atac count` or `aggr`. The tabix index file `fragments.tsv.gz.tbi` is assumed to be present in the same directory.
`--reference=PATH`	Required. Path to folder containing a Cell Ranger ATAC or Cell Ranger ARC reference.
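
The constraints on the `--peaks` file (three columns, sorted, no overlapping peaks, `#` comments allowed) can be checked before launching a run. The following is a minimal sketch, not part of Cell Ranger ATAC. The function name `check_peaks_bed` is hypothetical, and it assumes that "sorted by position" means non-decreasing start coordinates within each chromosome, that each chromosome's peaks are stored contiguously, that intervals are half-open, and that blank lines can be skipped.

```python
def check_peaks_bed(lines):
    """Return a list of problems found in a 3-column BED peak file.

    Assumptions (not confirmed by the documentation): peaks are sorted by
    start within each chromosome, each chromosome forms one contiguous
    block, intervals are half-open, and blank lines are skipped.
    """
    problems = []
    finished_chroms = set()
    prev = None  # (chrom, start, end) of the previous valid peak
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # comment lines beginning with '#' are allowed
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(f"line {lineno}: expected 3 columns, found {len(fields)}")
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append(f"line {lineno}: start and end must be integers")
            continue
        if start >= end:
            problems.append(f"line {lineno}: start must be less than end")
            continue
        if prev is not None and chrom == prev[0]:
            if start < prev[1]:
                problems.append(f"line {lineno}: not sorted by start position")
            elif start < prev[2]:
                problems.append(f"line {lineno}: overlaps the previous peak")
        else:
            if chrom in finished_chroms:
                problems.append(f"line {lineno}: {chrom} records are not contiguous")
            if prev is not None:
                finished_chroms.add(prev[0])
        prev = (chrom, start, end)
    return problems


# In-memory example: the second peak starts inside the first one.
example = ["# my peaks\n", "chr1\t100\t200\n", "chr1\t150\t300\n"]
print(check_peaks_bed(example))
```

To check a real file, pass an open file handle instead of a list. An empty result means only that these particular checks passed; the pipeline performs its own validation.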

Optional command line parameters are listed below (also available through cellranger-atac reanalyze --help):

Option	Description
`--description=TEXT`	Sample description to embed in output files.
`--params=PARAMS_CSV`	A CSV file specifying analysis parameters.
`--force-cells=NUM`	Define the top N barcodes with the most fragments overlapping peaks as cells. N must be a positive integer <= 20,000. Please consult the documentation before using this option.
`--barcodes=LIST`	Specify barcodes to use in analysis. Barcodes can be given in a text file with one barcode per line (blank lines are ignored), or in a CSV (with or without a header) whose first column is used. Exports from Loupe Browser have this format.
`--agg=AGGREGATION_CSV`	If the input matrix was produced by `cellranger-atac aggr`, it is possible to pass the same aggregation CSV in order to retain per-library tag information in the resulting `.cloupe` file.
`--jobmode=MODE`	Job manager to use. Valid options: local (default), sge, lsf, slurm, or path to a `.template` file. Consult the Cluster Mode page for details on configuring the pipeline to use a compute cluster.
`--localcores=NUM`	Set max cores the pipeline may request at one time. Only applies to local jobs.
`--localmem=NUM`	Set max GB the pipeline may request at one time. Only applies to local jobs.
`--localvmem=NUM`	Set max virtual address space in GB for the pipeline. Only applies to local jobs.
`--mempercore=NUM`	Reserve enough threads for each job to ensure enough memory will be available, assuming each core on your cluster has at least this much memory available. Only applies to cluster jobmodes.
`--maxjobs=NUM`	Set max jobs submitted to cluster at one time. Only applies to cluster jobmodes.
`--jobinterval=NUM`	Set delay between submitting jobs to cluster, in ms. Only applies to cluster jobmodes.
`--overrides=PATH`	The path to a JSON file that specifies stage-level overrides for cores and memory. Finer-grained than `--localcores`, `--mempercore`, and `--localmem`.
`--uiport=PORT`	Serve web UI at `http://localhost:PORT`.
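
To preview which barcodes a `--barcodes` file would contribute, the two accepted layouts (one barcode per line, or a CSV whose first column holds the barcodes) can be read with one small function. This is a sketch under assumptions: `read_barcodes` and `BARCODE_RE` are hypothetical names, and header detection here is a heuristic (a first row that does not look like a barcode is skipped) that may differ from how the pipeline decides.

```python
import csv
import re

# Assumed barcode shape, e.g. "AAACGAAAGTAATGTG-1"; adjust if yours differ.
BARCODE_RE = re.compile(r"[ACGTN]+-\d+")


def read_barcodes(lines):
    """Collect barcodes from a plain list or from the first CSV column."""
    barcodes = []
    first_row = True
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue  # blank lines are ignored
        value = row[0].strip()
        if first_row:
            first_row = False
            if not BARCODE_RE.fullmatch(value):
                continue  # heuristic: treat a non-barcode first row as a header
        barcodes.append(value)
    return barcodes


# A CSV with a header row and extra columns, plus a blank line.
example = ["Barcode,Cluster\n", "AAACGAAAG-1,Cluster 1\n", "\n", "GGGTCCTTA-1,Cluster 2\n"]
print(read_barcodes(example))
```

This is only a pre-flight check for inspecting a file; the file itself is still passed unchanged to `--barcodes`.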

After determining input arguments and options, run cellranger-atac reanalyze. This example reanalyzes the results of an aggregation named AGG123:

cd /home/jdoe/runs
ls -1 AGG123/outs/*.gz # verify the input file exists
AGG123/outs/fragments.tsv.gz
cellranger-atac reanalyze --id=AGG123_reanalysis \
                            --peaks=AGG123/outs/peaks.bed \
                            --params=AGG123_reanalysis.csv \
                            --reference=/home/jdoe/refs/hg19 \
                            --fragments=/home/jdoe/runs/AGG123/outs/fragments.tsv.gz

The pipeline will begin to run, creating a new folder named with the reanalysis ID specified with the --id argument (e.g. /home/jdoe/runs/AGG123_reanalysis) for its output. If this output folder already exists, cellranger-atac will assume it is an existing pipestance and attempt to resume running it.
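
When reanalysis is launched from a script, it can help to assemble the argument list and check for an existing output folder first, since an existing folder is treated as a pipestance to resume. The sketch below uses only flags documented on this page. The helper names `build_reanalyze_cmd` and `will_resume` are hypothetical, and the ID check simply mirrors the documented pattern and 64-character limit.

```python
import re
from pathlib import Path

# Documented constraint on --id: [a-zA-Z0-9_-]+ with at most 64 characters.
ID_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def build_reanalyze_cmd(run_id, peaks, fragments, reference, params=None):
    """Return the argv list for a cellranger-atac reanalyze invocation."""
    if not ID_RE.fullmatch(run_id):
        raise ValueError(f"invalid run id: {run_id!r}")
    cmd = [
        "cellranger-atac", "reanalyze",
        f"--id={run_id}",
        f"--peaks={peaks}",
        f"--fragments={fragments}",
        f"--reference={reference}",
    ]
    if params is not None:
        cmd.append(f"--params={params}")
    return cmd


def will_resume(run_id, workdir="."):
    """True if the output folder already exists, so the run would be resumed."""
    return (Path(workdir) / run_id).exists()


cmd = build_reanalyze_cmd(
    "AGG123_reanalysis",
    peaks="AGG123/outs/peaks.bed",
    fragments="/home/jdoe/runs/AGG123/outs/fragments.tsv.gz",
    reference="/home/jdoe/refs/hg19",
    params="AGG123_reanalysis.csv",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the directory where the output folder should be created.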

Pipeline outputs

A successful run should conclude with a message similar to this:

Outputs:
- Summary of all data metrics:                 /home/jdoe/runs/AGG123_reanalysis/outs/summary.json
- Per-barcode fragment counts & metrics:       /home/jdoe/runs/AGG123_reanalysis/outs/singlecell.csv
- Raw peak barcode matrix in hdf5 format:      /home/jdoe/runs/AGG123_reanalysis/outs/raw_peak_bc_matrix.h5
- Raw peak barcode matrix in mex format:       /home/jdoe/runs/AGG123_reanalysis/outs/raw_peak_bc_matrix
- Filtered peak barcode matrix in hdf5 format: /home/jdoe/runs/AGG123_reanalysis/outs/filtered_peak_bc_matrix.h5
- Filtered peak barcode matrix in mex format:  /home/jdoe/runs/AGG123_reanalysis/outs/filtered_peak_bc_matrix
- Directory of analysis files:                 /home/jdoe/runs/AGG123_reanalysis/outs/analysis
- HTML file summarizing aggregation analysis : /home/jdoe/runs/AGG123_reanalysis/outs/web_summary.html
- Filtered tf barcode matrix in hdf5 format:   /home/jdoe/runs/AGG123_reanalysis/outs/filtered_tf_bc_matrix.h5
- Filtered tf barcode matrix in mex format:    /home/jdoe/runs/AGG123_reanalysis/outs/filtered_tf_bc_matrix
- Loupe Browser input file:                    /home/jdoe/runs/AGG123_reanalysis/outs/cloupe.cloupe
- Annotation of peaks with genes:              /home/jdoe/runs/AGG123_reanalysis/outs/peak_annotation.tsv
 
Pipestance completed successfully!

Refer to the Overview page for an explanation of the outputs.
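
A quick way to confirm that a run produced everything in the listing above is to test for each file or folder under `outs/`. This is a small sketch; the list `EXPECTED_OUTPUTS` is copied from the example message, and the function name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Names taken from the "Outputs:" message shown above.
EXPECTED_OUTPUTS = [
    "summary.json",
    "singlecell.csv",
    "raw_peak_bc_matrix.h5",
    "raw_peak_bc_matrix",
    "filtered_peak_bc_matrix.h5",
    "filtered_peak_bc_matrix",
    "analysis",
    "web_summary.html",
    "filtered_tf_bc_matrix.h5",
    "filtered_tf_bc_matrix",
    "cloupe.cloupe",
    "peak_annotation.tsv",
]


def missing_outputs(outs_dir):
    """Return the expected output names that are absent from outs_dir."""
    outs = Path(outs_dir)
    return [name for name in EXPECTED_OUTPUTS if not (outs / name).exists()]
```

For the example run, `missing_outputs("/home/jdoe/runs/AGG123_reanalysis/outs")` would return an empty list if every listed output is present.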

Parameters

The CSV file passed to --params should have one row for every parameter that you want to customize. There is no header row. If a parameter is not specified in your CSV, its default value will be used. See Common Use Cases for some examples.
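
The file format described above (one `name,value` row per parameter, no header) is simple enough to generate from a script. The following sketch writes such a file's contents and rejects names that do not appear in the parameter table on this page. The function name `params_csv_text` and the `KNOWN_PARAMS` set are hypothetical helpers, not part of Cell Ranger ATAC.

```python
import csv
import io

# Parameter names as listed in the table on this page.
KNOWN_PARAMS = {
    "dim_reduce", "num_analysis_bcs", "num_dr_bcs", "num_dr_features",
    "num_comps", "graphclust_neighbors", "neighbor_a", "neighbor_b",
    "max_clusters", "tsne_input_pcs", "tsne_perplexity", "tsne_theta",
    "tsne_max_dims", "tsne_max_iter", "tsne_stop_lying_iter",
    "tsne_mom_switch_iter", "random_seed",
}


def params_csv_text(params):
    """Render a {name: value} mapping as header-less CSV text."""
    unknown = sorted(set(params) - KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, value in params.items():
        writer.writerow([name, value])  # one row per customized parameter
    return buffer.getvalue()


print(params_csv_text({"num_comps": 50, "max_clusters": 30}), end="")
```

Writing the returned text to a file such as `AGG123_reanalysis.csv` gives something that can be passed to `--params`. Parameters left out of the mapping keep their default values.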

Here are detailed descriptions of each parameter. For parameters that subset the data, a default value of null indicates that no subsetting happens by default.

Parameter	Type	Default Value	Recommended Range	Description
`dim_reduce`	str	lsa	[lsa, pca, plsa]	Pick dimensionality reduction technique. Note: `plsa` has been temporarily restricted to run in single-threaded mode due to technical considerations. This could lead to a longer wall time for execution compared to v1.2. Multi-threading will be restored in a subsequent release.
`num_analysis_bcs`	int	null	Cannot be set higher than the available number of cells or lower than zero.	Randomly subset data to N barcodes for all analyses. Reduce this parameter if you want to improve performance or simulate results from lower cell counts. If set higher than the available number of cells, it is reset to that number.
`num_dr_bcs`	int	null	Cannot be set higher than the available number of cells.	Randomly subset data to N barcodes when computing the LSA/PCA/PLSA projection (the most memory-intensive step). The projection will still be applied to the full dataset, i.e. your final results will still reflect all the data. Try reducing this parameter if your analysis is running out of memory.
`num_dr_features`	int	null	Cannot be set higher than the number of peaks in the BED file.	Subset data to the top N features (that is, peaks, ranked by normalized dispersion) when computing LSA/PCA/PLSA projection (the most memory intensive step). The dimreduce projection will still be applied to the full dataset, i.e. your final results will still reflect all the data. Try reducing this parameter if your analysis is running out of memory.
`num_comps`	int	15	10-100 (20 for PLSA), depending on the number of cell populations / clusters you expect to see.	Compute N principal components for LSA/PCA/PLSA. Setting this too high may cause spurious clusters to be called.
`graphclust_neighbors`	int	0	10-500, depending on desired granularity.	Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering. The actual number of neighbors used is the maximum of this value and that determined by `neighbor_a` and `neighbor_b`. Set this value to zero to use those values instead.
`neighbor_a`	float	-230.0	Determines how clustering granularity scales with cell count.	The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and `graphclust_neighbors`.
`neighbor_b`	float	120.0	Determines how clustering granularity scales with cell count.	The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and `graphclust_neighbors`.
`max_clusters`	int	10	10-50, depending on the number of cell populations / clusters you expect to see.	Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called.
`tsne_input_pcs`	int	null	Cannot be set higher than the `num_comps` parameter.	Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs, independent of the clustering/differential expression. You may find that TSNE is faster and/or the output looks better when using fewer PCs.
`tsne_perplexity`	int	30	30-50	TSNE perplexity parameter (see the TSNE FAQ for more details). When analyzing 100k+ cells, increasing this parameter may improve TSNE results, but the algorithm will be slower.
`tsne_theta`	float	0.5	Must be between 0 and 1.	TSNE theta parameter (see the TSNE FAQ for more details). Higher values yield faster, more approximate results (and vice versa). Runtime and memory usage of TSNE will increase dramatically if this is set below 0.25.
`tsne_max_dims`	int	2	Must be 2 or 3.	Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections (note: runtime will increase significantly).
`tsne_max_iter`	int	1000	1000-10000	Number of total TSNE iterations. Runtime increases linearly with number of iterations.
`tsne_stop_lying_iter`	int	250	Cannot be set higher than `tsne_max_iter`.	Iteration at which TSNE learning rate is reduced.
`tsne_mom_switch_iter`	int	250	Cannot be set higher than `tsne_max_iter`.	Iteration at which TSNE momentum is reduced.
`random_seed`	int	0	any integer	Random seed. Due to the randomized nature of the algorithms, changing this will produce slightly different results.
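
The interaction between `graphclust_neighbors`, `neighbor_a` and `neighbor_b` is easier to see with numbers. The sketch below evaluates the formula from the table, k = neighbor_a + neighbor_b * log10(n_cells), and takes the maximum with `graphclust_neighbors`. The function name `graphclust_k` is hypothetical, and the table does not say how the pipeline rounds the result or handles very small cell counts, so the value is returned unrounded.

```python
import math


def graphclust_k(n_cells, graphclust_neighbors=0, neighbor_a=-230.0, neighbor_b=120.0):
    """Nearest-neighbor count implied by the documented formula (unrounded)."""
    scaled = neighbor_a + neighbor_b * math.log10(n_cells)
    # The actual number used is the larger of the two candidates.
    return max(graphclust_neighbors, scaled)


for n_cells in (1_000, 10_000, 100_000):
    print(n_cells, graphclust_k(n_cells))
```

With the defaults, 10,000 cells give k = -230 + 120 * 4 = 250. Setting `graphclust_neighbors` to 300 for the same dataset would raise k to 300, while setting it to 100 would have no effect because the scaled value is larger.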

Common use cases

These examples illustrate what could be included in the --params CSV file in some common situations.

1. More principal components and clusters

For very large or diverse cell populations, the defaults may not capture the full variation between cells. In that case, try increasing the number of principal components and/or clusters. To run dimensionality reduction with 50 components and k-means with up to 30 clusters, include this in the CSV:

num_comps,50
max_clusters,30

2. Less memory usage

You can limit the memory usage of the analysis by computing the LSA projection on a subset of cells and features. This is especially useful for large datasets (100k+ cells). If you have 100k cells, it is reasonable to use only 50% of them for LSA - the memory usage will be cut in half, but you will still be well equipped to detect rare subpopulations. Limiting the number of features will reduce memory even further. To compute the LSA projection using 50000 cells and 3000 peaks, include this in the CSV:

num_dr_bcs,50000
num_dr_features,3000

Note: To avoid bias, subsetting of cells is done randomly. Subsetting of features is done by binning features by their mean count across cells, then measuring the dispersion (a variance-like parameter) of each feature normalized to the other features in its bin.
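
The feature-subsetting idea in the note can be illustrated on a toy matrix. This is a rough sketch and not Cell Ranger ATAC's implementation: the function name `top_features_by_dispersion` is hypothetical, and the choices made here (dispersion as variance divided by mean, equal-size bins ordered by mean, normalization by the bin's median and median absolute deviation) are assumptions made only for illustration.

```python
import math
import statistics


def top_features_by_dispersion(counts, n_top, n_bins=2):
    """Rank features by bin-normalized dispersion and return the top indices.

    counts: one list of per-cell counts for each feature (peak).
    """
    stats = []
    for idx, row in enumerate(counts):
        mean = sum(row) / len(row)
        variance = statistics.pvariance(row)
        dispersion = variance / mean if mean > 0 else 0.0
        stats.append((idx, mean, dispersion))

    # Bin features by mean: sort by mean, then cut into equal-size groups.
    by_mean = sorted(stats, key=lambda s: s[1])
    bin_size = max(1, math.ceil(len(by_mean) / n_bins))

    scored = []
    for start in range(0, len(by_mean), bin_size):
        members = by_mean[start:start + bin_size]
        disps = [d for _, _, d in members]
        center = statistics.median(disps)
        # Fall back to 1.0 when all dispersions in the bin are identical.
        spread = statistics.median([abs(d - center) for d in disps]) or 1.0
        scored.extend((idx, (d - center) / spread) for idx, _, d in members)

    scored.sort(key=lambda s: s[1], reverse=True)
    return [idx for idx, _ in scored[:n_top]]


# Four peaks across four cells: peaks 1 and 3 vary most within their mean bin.
toy = [
    [1, 1, 1, 1],
    [0, 0, 0, 4],
    [5, 5, 5, 5],
    [0, 10, 0, 10],
]
print(top_features_by_dispersion(toy, n_top=2))
```

Normalizing within bins keeps high-count peaks from dominating the ranking simply because their raw variance is larger, which is the bias the note refers to.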
