10x Genomics
Chromium Single Cell CNV

Customized Secondary Analysis using cellranger-dna reanalyze

The cellranger-dna reanalyze command re-runs copy number variation analysis on the read counts per bin per cell matrix, optionally with different parameters, a subset of cells, or structured by a user-provided Newick-formatted tree.

Command Line Interface

These are the most common command line arguments (run cellranger-dna reanalyze --help for a full list):

Argument                      Description
--id=ID                       A unique run ID string: e.g. AGG123_reanalysis
--cnv-data=H5                 Path of cnv_data.h5 from a previous cellranger-dna invocation (cellranger-dna cnv, cellranger-dna reanalyze, or cellranger-dna aggr).
--reference=PATH              Path to a Cell Ranger DNA reference.
--csv=CSV                     Path of CSV file containing barcode subset definitions (see Configuration).
--description=TEXT            (optional) More detailed sample description.
--tree=NEWICK                 (optional) Path to a Newick format tree file that defines the new tree structure. If this flag is not set, the data will be clustered as-is. Each leaf of this tree must correspond to a row in the CSV passed to --csv.
--soft-min-avg-ploidy=FLOAT   (optional) Use a known lower limit on the average ploidy of the sample.
--soft-max-avg-ploidy=FLOAT   (optional) Use a known upper limit on the average ploidy of the sample.

After specifying these input arguments, run cellranger-dna reanalyze. In this example, we're reanalyzing the results of an aggregation named AGG123:

$ cd /home/jdoe/runs
$ ls -1 AGG123/outs/*.h5 # verify the input file exists
AGG123/outs/cnv_data.h5
$ cellranger-dna reanalyze --id=AGG123_reanalysis \
                       --cnv-data=AGG123/outs/cnv_data.h5 \
                       --csv=AGG123_reanalysis.csv \
                       --reference=/home/jdoe/refs/GRCh37

The pipeline will begin to run, creating a new folder named with the specified reanalysis ID (e.g. /home/jdoe/runs/AGG123_reanalysis). If this folder already exists, cellranger-dna will assume it is an existing pipestance and attempt to resume running it.

Pipeline Outputs

A successful run should conclude with a message similar to this:

2019-05-06 21:40:29 [runtime] (run:local)       ID.AGG123_reanalysis.CNV_REANALYZER_CS.DLOUPE_PREPROCESS.fork0.join
2019-05-06 21:40:31 [runtime] (chunks_complete) ID.AGG123_reanalysis.CNV_REANALYZER_CS._POSTPROCESSING.MAKE_WEBSUMMARY
2019-05-06 21:40:37 [runtime] (join_complete)   ID.AGG123_reanalysis.CNV_REANALYZER_CS.DLOUPE_PREPROCESS
 
Outputs:
- Run alerts:                       /home/jdoe/runs/AGG123_reanalysis/outs/alarms_summary.txt
- HDF5 file with CNV data:          /home/jdoe/runs/AGG123_reanalysis/outs/cnv_data.h5
- Loupe visualization file:         /home/jdoe/runs/AGG123_reanalysis/outs/dloupe.dloupe
- CNV calls with imputation:        /home/jdoe/runs/AGG123_reanalysis/outs/node_cnv_calls.bed
- CNV calls without imputation:     /home/jdoe/runs/AGG123_reanalysis/outs/node_unmerged_cnv_calls.bed
- Per-cell summary metrics:         /home/jdoe/runs/AGG123_reanalysis/outs/per_cell_summary_metrics.csv
- Reanalyze specification:          /home/jdoe/runs/AGG123_reanalysis/outs/reanalyze.csv
- Analysis summary metrics:         /home/jdoe/runs/AGG123_reanalysis/outs/summary.csv
- Newick guide-tree for clustering: null
 
Pipestance completed successfully!

Refer to the Analysis page for an explanation of the output.

Configuration

Selecting Cells Using a List of Cell Barcodes

You may select your barcodes of interest for each group directly, using a separate barcodes file for each group.

A text editor or Excel may be used to construct the configuration CSV. Your spreadsheet may look something like this:

    A                 B
1   library_id        barcodes_csv
2   normal            /home/jdoe/normal_barcodes.csv
3   tumor_primary     /home/jdoe/tumor_primary_barcodes.csv
4   tumor_metastases  /home/jdoe/tumor_metastases_barcodes.csv

When you save this to CSV, it will look something like this:

library_id,barcodes_csv
normal,/home/jdoe/normal_barcodes.csv
tumor_primary,/home/jdoe/tumor_primary_barcodes.csv
tumor_metastases,/home/jdoe/tumor_metastases_barcodes.csv

The barcodes CSV files will each have one barcode entry per line, including the GEM well suffix (see GEM wells). Each such file will look something like:

AAACGGGTCAAAGTGA-1
AAAGATGCAATGGGAC-1
...
TTTGTCATCCGCACGA-1
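
If the barcode lists already exist in a script, the two kinds of file described above can also be written programmatically. The following is a minimal Python sketch, not part of cellranger-dna: the function name write_reanalyze_config, the default file name reanalysis.csv, and the <library_id>_barcodes.csv naming are assumptions made for illustration. It writes absolute paths, as in the example above.

```python
import csv
from pathlib import Path


def write_reanalyze_config(groups, out_dir, config_name="reanalysis.csv"):
    """Write one barcode file per group plus the configuration CSV.

    `groups` maps a library_id to a list of barcodes, each including its
    GEM well suffix. Returns the path of the configuration CSV.
    (Hypothetical helper; file names are illustrative.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for library_id, barcodes in groups.items():
        barcode_path = out_dir / f"{library_id}_barcodes.csv"
        # One barcode per line, no header row.
        barcode_path.write_text("".join(f"{bc}\n" for bc in barcodes))
        rows.append((library_id, str(barcode_path.resolve())))
    config_path = out_dir / config_name
    with config_path.open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["library_id", "barcodes_csv"])
        writer.writerows(rows)
    return config_path
```

The returned path is what you would pass to --csv.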

Selecting the Cells of a Group of Interest

You may use a Group ID, for example one identified by exploring the data in Loupe scDNA Browser, as a proxy for a list of barcodes. In this case, all constituent cells of that group will be included. The Group ID goes in the node_id column of the configuration CSV.

As before, a text editor or Excel may be used to construct the CSV. Your spreadsheet may look something like this:

    A                 B
1   library_id        node_id
2   normal            842
3   tumor_primary     912
4   tumor_metastases  919

When you save this to CSV, it will look something like this:

library_id,node_id
normal,842
tumor_primary,912
tumor_metastases,919

Guiding Clustering with a Custom Newick Tree

If you define more than one group in the configuration CSV, you must also guide clustering by providing a Newick-formatted tree. For instance, if you have defined three groups in the CSV (normal, tumor_primary, and tumor_metastases), you may force the pipeline to arrange them with the normal tissue as an outgroup:

(normal,(tumor_primary,tumor_metastases));
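
Because each leaf of the tree must correspond to a row in the configuration CSV, it can be worth checking the two files against each other before launching a run. The following Python sketch is one way to do that and is not part of cellranger-dna: the function names are hypothetical, and the leaf parser is deliberately simple (it expects at least one pair of parentheses and does not handle quoted labels).

```python
import csv
import re


def newick_leaf_names(newick):
    """Return the leaf labels of a parenthesized Newick string, in order.

    A leaf label directly follows "(" or ","; a label that follows ")"
    belongs to an internal node and is skipped. Branch lengths (":0.1")
    are ignored. Quoted labels are not supported in this sketch.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def check_tree_against_csv(newick_path, csv_path):
    """Compare tree leaves with the library_id column of the CSV.

    Returns two sorted lists: leaves with no matching CSV row, and
    CSV rows that do not appear as a leaf (reported for information).
    """
    with open(newick_path) as handle:
        leaves = set(newick_leaf_names(handle.read()))
    with open(csv_path, newline="") as handle:
        library_ids = {row["library_id"] for row in csv.DictReader(handle)}
    return sorted(leaves - library_ids), sorted(library_ids - leaves)
```

An empty first list means every leaf has a row in the CSV, which is the requirement stated for --tree.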

Common Use Cases

These examples illustrate how you may use cellranger-dna reanalyze in some common situations.

1. Impose Upper or Lower Limits on the Copy Number Variation Analysis

When the outputs of a cellranger-dna cnv run suggest imposing an upper or lower limit on the average ploidy, cellranger-dna reanalyze is the most suitable way to do so: it avoids the significant computational overhead of rerunning the read processing stages of cellranger-dna cnv.
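
As an illustration, the Python sketch below assembles such a command line from the arguments listed under Command Line Interface. The helper build_reanalyze_command is hypothetical, the paths are reused from the earlier AGG123 example, and the ploidy limits of 1.9 and 2.1 are made-up values chosen only to show the syntax.

```python
import shlex


def build_reanalyze_command(run_id, cnv_data, reference, csv_path,
                            soft_min_avg_ploidy=None,
                            soft_max_avg_ploidy=None):
    """Assemble the argument list for cellranger-dna reanalyze.

    Only flags documented on this page are used; the ploidy flags are
    added only when a limit is given. (Hypothetical helper.)
    """
    argv = [
        "cellranger-dna", "reanalyze",
        f"--id={run_id}",
        f"--cnv-data={cnv_data}",
        f"--csv={csv_path}",
        f"--reference={reference}",
    ]
    if soft_min_avg_ploidy is not None:
        argv.append(f"--soft-min-avg-ploidy={soft_min_avg_ploidy}")
    if soft_max_avg_ploidy is not None:
        argv.append(f"--soft-max-avg-ploidy={soft_max_avg_ploidy}")
    return argv


# Illustrative values only: the ploidy limits below are assumptions.
argv = build_reanalyze_command(
    run_id="AGG123_reanalysis",
    cnv_data="AGG123/outs/cnv_data.h5",
    reference="/home/jdoe/refs/GRCh37",
    csv_path="AGG123_reanalysis.csv",
    soft_min_avg_ploidy=1.9,
    soft_max_avg_ploidy=2.1,
)
# Print the command for inspection; it is not executed here.
print(shlex.join(argv))
```

To launch the run from Python rather than copying the printed command into a shell, the same list can be passed to subprocess.run(argv, check=True).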

2. Omit Replicating or Noisy Cells from the Analysis

You may wish to omit noisy or replicating cells from the outputs of the analysis. These cells can be identified from the per_cell_summary_metrics.csv or from exploration in the Loupe scDNA Browser. From these sources, you can construct a barcodes.csv file (with one cell barcode per line) containing only the cell barcodes of interest. Running cellranger-dna reanalyze with this file produces outputs with only those cells.
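
A minimal Python sketch of the first route follows. It assumes that per_cell_summary_metrics.csv has a barcode column and a 0/1 is_noisy column; these column names are assumptions, so check the header of your own file and adjust the arguments if they differ. The function name write_clean_barcodes is hypothetical, and the sketch filters on the noisy flag only; replicating cells would need to be excluded by other means.

```python
import csv


def write_clean_barcodes(metrics_csv, barcodes_out,
                         barcode_col="barcode", noisy_col="is_noisy"):
    """Write barcodes of cells not flagged as noisy, one per line.

    Column names are assumptions about the metrics file layout.
    Returns the number of barcodes written.
    """
    kept = 0
    with open(metrics_csv, newline="") as src, open(barcodes_out, "w") as dst:
        for row in csv.DictReader(src):
            # Keep a cell only when its flag is exactly "0" (not noisy).
            if row[noisy_col].strip() == "0":
                dst.write(row[barcode_col] + "\n")
                kept += 1
    return kept
```

The resulting file can then be listed in the barcodes_csv column of the configuration CSV described under Configuration.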

3. Impose a Structure on the Clustering

As shown in the example above, you may impose a tree structure of your own choosing on the clustering using a Newick-formatted tree.