
10x Genomics
Chromium Single Cell Multiome ATAC + Gene Exp.

Customized Secondary Analysis with Cell Ranger ARC Reanalyze


The cellranger-arc reanalyze command reruns the secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering, and visualization) using different parameter settings.

Command line interface

These are the required command line arguments:

--id=ID
    Required. A unique run ID and output folder name, matching [a-zA-Z0-9_-]+ and at most 64 characters long.
--matrix=H5
    Required. Path to a feature-barcode matrix H5 generated by cellranger-arc count or aggr. If you intend to subset to a set of barcodes, use the raw matrix; otherwise, use the filtered feature-barcode matrix.
--atac-fragments=TSV.GZ
    Required. Path to the atac_fragments.tsv.gz generated by cellranger-arc count or aggr. The tabix index file atac_fragments.tsv.gz.tbi is assumed to be present in the same directory.
--reference=PATH
    Required. Path to a folder containing a cellranger-arc-compatible reference. Reference packages can be downloaded from support.10xgenomics.com or constructed using the cellranger-arc mkref command. This reference must match the reference used for the initial cellranger-arc count run.
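A mistyped path or a missing index only surfaces once the pipeline starts, so it can help to check the required inputs first. The bash sketch below is not part of cellranger-arc: check_reanalyze_inputs is a hypothetical helper that checks the --id pattern and length, that the matrix and fragments files exist, that the tabix index sits next to the fragments file, and that the reference path is a directory. It does not verify that the reference matches the one used for cellranger-arc count.

```shell
#!/usr/bin/env bash
# check_reanalyze_inputs ID MATRIX_H5 FRAGMENTS_TSV_GZ REFERENCE_DIR
# Hypothetical pre-flight helper (not a cellranger-arc command).
# Prints one message per problem to stderr; returns non-zero if any were found.
check_reanalyze_inputs() {
  local id=$1 matrix=$2 fragments=$3 reference=$4
  local status=0
  # --id: only letters, digits, underscore, hyphen; 1 to 64 characters
  local id_re='^[a-zA-Z0-9_-]{1,64}$'

  if [[ ! $id =~ $id_re ]]; then
    echo "invalid --id: '$id'" >&2
    status=1
  fi
  if [[ ! -f $matrix ]]; then
    echo "matrix not found: $matrix" >&2
    status=1
  fi
  # the fragments file and its tabix index must be in the same directory
  if [[ ! -f $fragments ]]; then
    echo "fragments file not found: $fragments" >&2
    status=1
  elif [[ ! -f $fragments.tbi ]]; then
    echo "tabix index not found: $fragments.tbi" >&2
    status=1
  fi
  if [[ ! -d $reference ]]; then
    echo "reference folder not found: $reference" >&2
    status=1
  fi
  return "$status"
}
```

For the aggregation example on this page, the call would be check_reanalyze_inputs AGG123_reanalysis AGG123/outs/raw_feature_bc_matrix.h5 AGG123/outs/atac_fragments.tsv.gz /home/jdoe/refs/hg19, run from /home/jdoe/runs before invoking cellranger-arc reanalyze.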

Optional command line parameters are listed below (also available through cellranger-arc reanalyze --help):

--description=TXT
    Sample description to embed in output files [default: ].
--barcodes=LIST
    Specify barcodes to use in analysis. The barcodes can be given in a text file that contains one barcode per line (blank lines are ignored). CSV (with or without a header) is also accepted. Only the first column of the CSV is used; exports from Loupe Browser have this format. Required if neither --peaks nor --params has been specified.
--min-atac-count=NUM
    Cell caller override: define the minimum number of ATAC transposition events in peaks (ATAC counts) for a cell barcode. This option must be specified together with --min-gex-count. With --min-atac-count=X and --min-gex-count=Y, a barcode is defined as a cell if it contains at least X ATAC counts AND at least Y GEX UMI counts.
--min-gex-count=NUM
    Cell caller override: define the minimum number of GEX UMI counts for a cell barcode. This option must be specified together with --min-atac-count. With --min-atac-count=X and --min-gex-count=Y, a barcode is defined as a cell if it contains at least X ATAC counts AND at least Y GEX UMI counts.
--peaks=BED
    Peak caller override: specify peaks to use in secondary analyses from a supplied 3-column BED file. The supplied peaks file must be sorted by position and must not contain overlapping peaks; comment lines beginning with # are allowed. Required if neither --barcodes nor --params has been specified.
--params=CSV
    Specify key-value pairs in CSV format for analysis: any subset of random_seed, k_means_max_clusters, feature_linkage_max_dist_mb, num_gex_pcs, num_atac_pcs. For example, to set the number of GEX principal components to 15 and the distance threshold for feature linkage computation to 2.5 Mb (megabases), the CSV would take the form:

        num_gex_pcs,15
        feature_linkage_max_dist_mb,2.5

    Required if neither --peaks nor --barcodes has been specified.
--agg=AGGREGATION_CSV
    If the input matrix was produced by cellranger-arc aggr, you can pass the same aggregation CSV to retain per-library tag information in the resulting .cloupe file.
--jobmode=MODE
    Job manager to use. Valid options: local (default), sge, lsf, slurm, or path to a .template file. Search for help on "Cluster Mode" at support.10xgenomics.com for more details on configuring the pipeline to use a compute cluster [default: local].
--localcores=NUM
    Set the maximum number of cores the pipeline may request at one time. Only applies to local jobs.
--localmem=NUM
    Set the maximum memory in GB the pipeline may request at one time. Only applies to local jobs.
--localvmem=NUM
    Set the maximum virtual address space in GB for the pipeline. Only applies to local jobs.
--mempercore=NUM
    Reserve enough threads for each job to ensure enough memory will be available, assuming each core on your cluster has at least this much memory available. Only applies to cluster jobmodes.
--maxjobs=NUM
    Set the maximum number of jobs submitted to the cluster at one time. Only applies to cluster jobmodes.
--jobinterval=NUM
    Set the delay between submitting jobs to the cluster, in ms. Only applies to cluster jobmodes.
--overrides=PATH
    Path to a JSON file that specifies stage-level overrides for cores and memory. Finer-grained than --localcores, --mempercore, and --localmem.
--uiport=PORT
    Serve the web UI at http://localhost:PORT.
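The cell-caller override keeps a barcode only if it passes both thresholds. The bash/awk sketch below restates that AND rule on a per-barcode count table. The input format (tab-separated barcode, ATAC counts in peaks, GEX UMI counts, no header) and the helper name call_cells are assumptions for illustration; this is not how cellranger-arc reads its data internally.

```shell
#!/usr/bin/env bash
# call_cells MIN_ATAC MIN_GEX COUNTS_TSV
# Hypothetical illustration of --min-atac-count=X / --min-gex-count=Y:
# a barcode is a cell if ATAC counts >= X AND GEX UMI counts >= Y.
# COUNTS_TSV (assumed format): barcode<TAB>atac_counts<TAB>gex_umis, no header.
call_cells() {
  local min_atac=$1 min_gex=$2 counts=$3
  awk -F'\t' -v min_atac="$min_atac" -v min_gex="$min_gex" '
    # both thresholds must be met; failing either one excludes the barcode
    $2 + 0 >= min_atac + 0 && $3 + 0 >= min_gex + 0 { print $1 }
  ' "$counts"
}
```

A barcode with plenty of ATAC counts but too few GEX UMIs (or the reverse) is therefore not called a cell, which is why the two options must be given together.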
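The --peaks requirements (3 columns, sorted by position, no overlapping peaks, # comments allowed) can be checked before launching a run. check_peaks_bed below is a hypothetical bash/awk helper, not a cellranger-arc command. It assumes a tab-delimited file, treats BED intervals as half-open (a peak starting exactly where the previous one ends is not an overlap), and interprets "sorted" as: each chromosome forms one contiguous block and start positions never decrease within it. It does not check that the chromosome order matches the reference, which cellranger-arc may also expect.

```shell
#!/usr/bin/env bash
# check_peaks_bed FILE: hypothetical pre-flight check for a --peaks BED file.
# Prints one message per problem and returns non-zero if any were found.
# Assumes tab-delimited input and half-open BED intervals [start, end).
check_peaks_bed() {
  awk -F'\t' '
    /^#/ { next }                                  # comment lines are allowed
    NF != 3 {
      printf "line %d: expected 3 columns, found %d\n", NR, NF; bad = 1; next
    }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      printf "line %d: start and end must be non-negative integers\n", NR; bad = 1; next
    }
    {
      start = $2 + 0; end = $3 + 0
      if (start >= end) {
        printf "line %d: start must be smaller than end\n", NR; bad = 1
      }
      if ($1 != chrom) {
        # a new chromosome block; seeing the same name again means the file is unsorted
        if ($1 in seen) {
          printf "line %d: %s appears in more than one block\n", NR, $1; bad = 1
        }
        seen[$1] = 1; chrom = $1
      } else if (start < prev_start) {
        printf "line %d: not sorted by start position\n", NR; bad = 1
      } else if (start < prev_end) {
        printf "line %d: overlaps the previous peak\n", NR; bad = 1
      }
      prev_start = start; prev_end = end
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Running check_peaks_bed my_peaks.bed (a hypothetical file name) prints nothing and returns 0 when no problems are found.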

After determining input arguments and options, run cellranger-arc reanalyze. This example reanalyzes the results of an aggregation named AGG123 in order to filter out doublet barcodes:

$ cd /home/jdoe/runs
$ ls -1 AGG123/outs/*.gz # verify the input file exists
AGG123/outs/atac_fragments.tsv.gz
$ cellranger-arc reanalyze  --id=AGG123_reanalysis \
                            --barcodes=no_doublets.csv \
                            --matrix=/home/jdoe/runs/AGG123/outs/raw_feature_bc_matrix.h5 \
                            --reference=/home/jdoe/refs/hg19 \
                            --atac-fragments=/home/jdoe/runs/AGG123/outs/atac_fragments.tsv.gz
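The example assumes no_doublets.csv already exists. One way to build such a list, assuming you have a file of all barcodes to consider and a separate file of barcodes you have flagged as doublets (both hypothetical, one barcode per line), is to drop every exact match from the full list. make_barcode_list is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# make_barcode_list ALL_BARCODES EXCLUDE_BARCODES OUTPUT
# Hypothetical helper: writes every barcode in ALL_BARCODES that does not
# appear in EXCLUDE_BARCODES, one per line (a valid --barcodes input).
make_barcode_list() {
  local all=$1 exclude=$2 out=$3
  # -v: keep non-matching lines, -x: whole-line match only,
  # -F: fixed strings (no regex), -f: read patterns from a file
  grep -vxF -f "$exclude" "$all" > "$out"
}
```

Whole-line matching matters here: without -x, excluding AAAC-1 would also drop a barcode such as AAAC-10. Note that grep returns status 1 if every barcode was excluded.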

The pipeline will begin to run, creating a new folder named with the reanalysis ID specified with the --id argument (e.g. /home/jdoe/runs/AGG123_reanalysis) for its output. If this output folder already exists, cellranger-arc will assume it is an existing pipestance and attempt to resume running it.
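To avoid accidentally resuming an existing pipestance, you can pick an ID that is not yet in use before launching. next_free_id is a hypothetical bash helper: it returns the given ID if no folder of that name exists in the current directory, and otherwise the first free name of the form ID_2, ID_3, and so on. It does not enforce the 64-character limit on --id, so keep the base name short.

```shell
#!/usr/bin/env bash
# next_free_id BASE: hypothetical helper that prints BASE if ./BASE does not
# exist, otherwise the first unused BASE_2, BASE_3, ...
next_free_id() {
  local base=$1 id=$1 n=2
  while [[ -e $id ]]; do
    id="${base}_$n"
    n=$((n + 1))
  done
  printf '%s\n' "$id"
}
```

From /home/jdoe/runs, passing --id="$(next_free_id AGG123_reanalysis)" together with the other arguments from the example would start a fresh run instead of resuming AGG123_reanalysis.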

Pipeline outputs

A successful run should conclude with a message similar to this:

2021-04-26 03:30:46 [runtime] (update)          ID.AGG123_reanalysis.SC_ATAC_GEX_REANALYZER_CS.ATAC_GEX_CLOUPE_PREPROCESS.fork0 join_running
2021-04-26 03:36:05 [runtime] (join_complete)   ID.AGG123_reanalysis.SC_ATAC_GEX_REANALYZER_CS.ATAC_GEX_CLOUPE_PREPROCESS
 
Outputs:
- Secondary analysis outputs:
    clustering:
      atac: {
        ...
      }
      gex:  {
        ...
      }
    dimensionality_reduction:
      atac: {
        ...
      }
      gex:  {
        ...
      }
    feature_linkage:
      ...
    tf_analysis:
      ...
- Filtered feature barcode matrix HDF5:          /home/jdoe/runs/AGG123_reanalysis/outs/filtered_feature_bc_matrix.h5
- Loupe browser visualization file:              /home/jdoe/runs/AGG123_reanalysis/outs/cloupe.cloupe
- ATAC peak annotations based on proximal genes: /home/jdoe/runs/AGG123_reanalysis/outs/atac_peak_annotation.tsv
- Secondary analysis summary:                    /home/jdoe/runs/AGG123_reanalysis/outs/summary.json
 
Pipestance completed successfully!

Refer to the Analysis page for an explanation of the output.

Parameters

The CSV file passed to --params should have one row for every parameter that you want to customize. There is no header row. If a parameter is not specified in your CSV, its default value will be used.
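A minimal bash sketch of preparing and sanity-checking a --params file follows. It writes the two-row example from the --params option description (no header row) and then runs check_params_keys, a hypothetical helper that flags rows not of the form key,value, keys other than the five documented parameters, and keys that appear more than once. A header row such as parameter,value would be reported as an unknown parameter.

```shell
#!/usr/bin/env bash
# Write the example --params file: no header row, one key,value row per override.
cat > params.csv <<'EOF'
num_gex_pcs,15
feature_linkage_max_dist_mb,2.5
EOF

# check_params_keys FILE: hypothetical sanity check for a --params CSV.
check_params_keys() {
  awk -F',' '
    BEGIN {
      n = split("random_seed k_means_max_clusters feature_linkage_max_dist_mb num_gex_pcs num_atac_pcs", keys, " ")
      for (i = 1; i <= n; i++) allowed[keys[i]] = 1
    }
    NF != 2 { printf "line %d: expected key,value\n", NR; bad = 1; next }
    !($1 in allowed) { printf "line %d: unknown parameter %s\n", NR, $1; bad = 1 }
    # a repeated key is ambiguous, so flag every occurrence after the first
    seen[$1]++ { printf "line %d: %s is given more than once\n", NR, $1; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

check_params_keys params.csv && echo "params.csv looks OK"
```

The file can then be passed as --params=params.csv; any parameter left out keeps its default value.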

Here are detailed descriptions of each parameter. For parameters that subset the data, a default value of null indicates that no subsetting happens by default.

feature_linkage_max_dist_mb
    Type: float. Default value: 1. Recommended range: 0.1-5, depending on what is a biologically meaningful length scale for the organism.
    Change the distance (in megabases) over which pairs of features are considered for feature linkage estimation. Increasing this number will increase the number of linkages, but features that are very far apart on the genome are less likely to be causally linked.
num_atac_pcs
    Type: int. Default value: 15. Recommended range: 10-100, depending on the number of cell populations / clusters you expect to see.
    Compute N principal components for LSA, where N is the value of this parameter. Setting this too high may cause spurious clusters to be called.
num_gex_pcs
    Type: int. Default value: 10. Recommended range: 10-100, depending on the number of cell populations / clusters you expect to see.
    Compute N principal components for PCA, where N is the value of this parameter. Setting this too high may cause spurious clusters to be called.
k_means_max_clusters
    Type: int. Default value: 10. Recommended range: 2-10, depending on the number of cell populations / clusters you expect to see.
    Compute K-means clustering using K values from 2 to N, where N is the value of this parameter. Setting this too high may cause spurious clusters to be called.
random_seed
    Type: int. Default value: 0. Recommended range: any 64-bit integer.
    Due to the randomized nature of the algorithms, changing this will produce slightly different results.
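Values outside the recommended ranges are not necessarily wrong, but for the principal-component and clustering parameters they may lead to spurious clusters. check_params_ranges is a hypothetical bash/awk sketch that prints a warning for each value in a --params CSV that falls outside the ranges listed above. It only warns (it never fails) and skips random_seed, since any 64-bit integer is acceptable.

```shell
#!/usr/bin/env bash
# check_params_ranges FILE: hypothetical helper that warns when a --params value
# lies outside the recommended range; keys without a range are skipped.
check_params_ranges() {
  awk -F',' '
    BEGIN {
      lo["feature_linkage_max_dist_mb"] = 0.1; hi["feature_linkage_max_dist_mb"] = 5
      lo["num_atac_pcs"] = 10;                 hi["num_atac_pcs"] = 100
      lo["num_gex_pcs"] = 10;                  hi["num_gex_pcs"] = 100
      lo["k_means_max_clusters"] = 2;          hi["k_means_max_clusters"] = 10
    }
    ($1 in lo) && ($2 + 0 < lo[$1] || $2 + 0 > hi[$1]) {
      printf "warning: %s=%s is outside the recommended range %s-%s\n", $1, $2, lo[$1], hi[$1]
    }
  ' "$1"
}
```

For example, a file containing feature_linkage_max_dist_mb,8 would produce one warning, while the num_gex_pcs,15 row from the --params example would not.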