Cell Ranger DNA1.1, printed on 11/23/2024
Analysis software for the 10x Genomics single cell DNA product is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.
The cellranger-dna reanalyze command re-runs copy number variation analysis on the read counts per bin per cell matrix, optionally with different parameters, a subset of cells, or structured by a user-provided Newick-formatted tree.
cellranger-dna reanalyze only works with cnv_data.h5 files generated by Cell Ranger DNA v1.1.
These are the most common command line arguments (run cellranger-dna reanalyze --help for a full list):
Argument | Description |
---|---|
--id=ID | A unique run ID string: e.g. AGG123_reanalysis |
--cnv-data=H5 | Path of cnv_data.h5 from a previous cellranger-dna invocation (cellranger-dna cnv, cellranger-dna reanalyze, or cellranger-dna aggr). |
--reference=PATH | Path to a Cell Ranger DNA reference. |
--csv=CSV | Path of CSV file containing barcode subset definitions (see Configuration). |
--description=TEXT | (optional) More detailed sample description. |
--tree=NEWICK | (optional) Path to a Newick format tree file that defines the new tree structure. If this flag is not set, the data will be clustered as-is. Each leaf of this tree must correspond to a row in the CSV passed to --csv. |
--soft-min-avg-ploidy=FLOAT | (optional) Use a known lower limit on the average ploidy of the sample. |
--soft-max-avg-ploidy=FLOAT | (optional) Use a known upper limit on the average ploidy of the sample. |
After specifying these input arguments, run cellranger-dna reanalyze. In this example, we're reanalyzing the results of an aggregation named AGG123:
$ cd /home/jdoe/runs
$ ls -1 AGG123/outs/*.h5 # verify the input file exists
AGG123/outs/cnv_data.h5
$ cellranger-dna reanalyze --id=AGG123_reanalysis \
                           --cnv-data=AGG123/outs/cnv_data.h5 \
                           --csv=AGG123_reanalysis.csv \
                           --reference=/home/jdoe/refs/GRCh37
The pipeline will begin to run, creating a new folder named with the specified reanalysis ID (e.g. /home/jdoe/runs/AGG123_reanalysis). If this folder already exists, cellranger-dna will assume it is an existing pipestance and attempt to resume running it.
A successful run should conclude with a message similar to this:
2019-05-06 21:40:29 [runtime] (run:local) ID.AGG123_reanalysis.CNV_REANALYZER_CS.DLOUPE_PREPROCESS.fork0.join
2019-05-06 21:40:31 [runtime] (chunks_complete) ID.AGG123_reanalysis.CNV_REANALYZER_CS._POSTPROCESSING.MAKE_WEBSUMMARY
2019-05-06 21:40:37 [runtime] (join_complete) ID.AGG123_reanalysis.CNV_REANALYZER_CS.DLOUPE_PREPROCESS

Outputs:
- Run alerts: /home/jdoe/runs/AGG123_reanalysis/outs/alarms_summary.txt
- HDF5 file with CNV data: /home/jdoe/runs/AGG123_reanalysis/outs/cnv_data.h5
- Loupe visualization file: /home/jdoe/runs/AGG123_reanalysis/outs/dloupe.dloupe
- CNV calls with imputation: /home/jdoe/runs/AGG123_reanalysis/outs/node_cnv_calls.bed
- CNV calls without imputation: /home/jdoe/runs/AGG123_reanalysis/outs/node_unmerged_cnv_calls.bed
- Per-cell summary metrics: /home/jdoe/runs/AGG123_reanalysis/outs/per_cell_summary_metrics.csv
- Reanalyze specification: /home/jdoe/runs/AGG123_reanalysis/outs/reanalyze.csv
- Analysis summary metrics: /home/jdoe/runs/AGG123_reanalysis/outs/summary.csv
- Newick guide-tree for clustering: null

Pipestance completed successfully!
Refer to the Analysis page for an explanation of the output.
You may select your barcodes of interest for each group directly, using a separate barcodes file for each group.
A text editor or Excel may be used to construct the configuration CSV. Your spreadsheet may look something like this:
| | A | B |
---|---|---|
1 | library_id | barcodes_csv |
2 | normal | /home/jdoe/normal_barcodes.csv |
3 | tumor_primary | /home/jdoe/tumor_primary_barcodes.csv |
4 | tumor_metastases | /home/jdoe/tumor_metastases_barcodes.csv |
When you save this to CSV, it will look something like this:
library_id,barcodes_csv
normal,/home/jdoe/normal_barcodes.csv
tumor_primary,/home/jdoe/tumor_primary_barcodes.csv
tumor_metastases,/home/jdoe/tumor_metastases_barcodes.csv
The barcodes CSV files will each have one barcode entry per line, including the GEM well suffix (see GEM wells). Each such file will look something like:
AAACGGGTCAAAGTGA-1
AAAGATGCAATGGGAC-1
...
TTTGTCATCCGCACGA-1
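The two-level layout described above (a configuration CSV that points to one barcodes file per group) can be sanity-checked before launching a run. The following is a minimal sketch and not part of Cell Ranger DNA: the temporary directory, the file names, and the barcode pattern (nucleotides followed by `-` and a numeric GEM well suffix) are assumptions drawn from the examples on this page.

```shell
#!/bin/sh
# Sketch: build a barcode-subset configuration and check it before a run.
# All paths are hypothetical; a temporary directory stands in for /home/jdoe.
workdir=$(mktemp -d)

# One barcode per line, including the GEM well suffix.
printf '%s\n' AAACGGGTCAAAGTGA-1 AAAGATGCAATGGGAC-1 > "$workdir/normal_barcodes.csv"
printf '%s\n' TTTGTCATCCGCACGA-1 > "$workdir/tumor_primary_barcodes.csv"

# Configuration CSV: header row, then one row per group.
config="$workdir/AGG123_reanalysis.csv"
{
  echo "library_id,barcodes_csv"
  echo "normal,$workdir/normal_barcodes.csv"
  echo "tumor_primary,$workdir/tumor_primary_barcodes.csv"
} > "$config"

# Check that every referenced file exists and holds only lines matching the
# assumed barcode pattern.
status=ok
while IFS=, read -r library_id barcodes_csv; do
  if [ ! -f "$barcodes_csv" ]; then
    echo "missing barcodes file for $library_id: $barcodes_csv"
    status=bad
  elif grep -Evq '^[ACGT]+-[0-9]+$' "$barcodes_csv"; then
    echo "malformed barcode line in $barcodes_csv"
    status=bad
  fi
done <<EOF
$(tail -n +2 "$config")
EOF
echo "config check: $status"
```

The here-document keeps the loop in the current shell, so `status` is still set after the loop finishes.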
You may use a Group ID, determined for example by exploring the data in Loupe scDNA Browser, as a proxy for a list of barcodes. In this case, all constituent cells of that group will be included.
As before, a text editor or Excel may be used to construct the CSV. Your spreadsheet may look something like this:
| | A | B |
---|---|---|
1 | library_id | node_id |
2 | normal | 842 |
3 | tumor_primary | 912 |
4 | tumor_metastases | 919 |
When you save this to CSV, it will look something like this:
library_id,node_id
normal,842
tumor_primary,912
tumor_metastases,919
If you define more than one group in the configuration CSV, you must also guide clustering by providing a Newick-formatted tree. For instance, if you have defined three groups in the CSV: normal, tumor_primary, and tumor_metastases, you may force the pipeline to arrange them with the normal tissue as an outgroup:
(normal,(tumor_primary,tumor_metastases));
cellranger-dna reanalyze requires a binary tree structure for any Newick input file. Polytomies are not supported.
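Both requirements on the tree (strictly binary, and every leaf corresponding to a row of the configuration CSV) can be checked before a run. This is a sketch under stated assumptions, not a tool shipped with Cell Ranger DNA: the helper name `is_binary_newick` and the file names are hypothetical, and the tree is assumed to have unquoted labels and no internal node labels.

```shell
#!/bin/sh
# Sketch: check that a Newick tree is strictly binary and that its leaves
# match the library_id column of the configuration CSV.
workdir=$(mktemp -d)
printf '%s\n' 'library_id,node_id' 'normal,842' 'tumor_primary,912' \
  'tumor_metastases,919' > "$workdir/config.csv"
echo '(normal,(tumor_primary,tumor_metastases));' > "$workdir/tree.nwk"

# Hypothetical helper: prints "yes" if every parenthesised node has exactly
# two children (exactly one comma at its own nesting level), else "no".
is_binary_newick() {
  awk '
    {
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "(") { depth++; commas[depth] = 0 }
        else if (c == "," && depth > 0) { commas[depth]++ }
        else if (c == ")") { if (commas[depth] != 1) bad = 1; depth-- }
      }
    }
    END { print ((bad || depth != 0) ? "no" : "yes") }
  ' "$1"
}

binary=$(is_binary_newick "$workdir/tree.nwk")

# Leaf names: strip branch lengths, split on Newick punctuation, drop blanks.
tree_leaves=$(sed 's/:[0-9.eE+-]*//g' "$workdir/tree.nwk" \
  | tr '(),;' '\n\n\n\n' | sed '/^[[:space:]]*$/d' | sort)
csv_groups=$(tail -n +2 "$workdir/config.csv" | cut -d, -f1 | sort)

if [ "$binary" = yes ] && [ "$tree_leaves" = "$csv_groups" ]; then
  echo "tree OK"
else
  echo "tree is not binary or does not match the configuration CSV"
fi
```

A polytomy such as `(normal,tumor_primary,tumor_metastases);` has two commas at the same nesting level, so the helper reports `no` for it.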
These examples illustrate how you may use cellranger-dna reanalyze in some common situations.
When the outputs of a cellranger-dna cnv run suggest that upper or lower limits on the average ploidy should be imposed, cellranger-dna reanalyze is the most suitable way to apply them, because it avoids the significant computational overhead of the read processing stages of cellranger-dna cnv.
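As a sketch of such a run, the command below combines the flags from the argument table with both ploidy limits. The run ID `AGG123_ploidy` and the values 1.9 and 2.1 are purely illustrative, not recommendations. Omitting --csv is an assumption: the argument table does not mark it optional, so add it if your installation requires it. By default the script only prints the command for review.

```shell
#!/bin/sh
# Sketch: re-run CNV calling with soft bounds on the average ploidy.
# The run ID, paths, and ploidy values are illustrative assumptions.
set -- cellranger-dna reanalyze \
  --id=AGG123_ploidy \
  --cnv-data=AGG123/outs/cnv_data.h5 \
  --reference=/home/jdoe/refs/GRCh37 \
  --soft-min-avg-ploidy=1.9 \
  --soft-max-avg-ploidy=2.1

# Keep a printable copy of the assembled command.
preview="$*"

# Print the command for review; set RUN=1 in the environment to execute it.
if [ "${RUN:-0}" = 1 ]; then
  "$@"
else
  echo "$preview"
fi
```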
You may wish to omit noisy or replicating cells from the outputs of the analysis.
These cells can be identified from the per_cell_summary_metrics.csv or from exploration in the Loupe scDNA Browser.
From these sources, a barcodes.csv file (with one cell barcode per line) may be constructed containing only the cell barcodes of interest, which when employed with cellranger-dna reanalyze will produce outputs with only those cells.
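One way to build such a barcodes.csv is to filter the per-cell metrics by a flag column. The sketch below assumes the metrics file has columns named `barcode` and `is_noisy` (with 0 meaning "not noisy"). These names are assumptions, so check the header of your own per_cell_summary_metrics.csv and adjust. The small inline table is made-up stand-in data, used only to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: keep only the barcodes of cells that are not flagged as noisy.
# ASSUMPTION: columns "barcode" and "is_noisy" (0/1) exist in the metrics CSV.
workdir=$(mktemp -d)

# Stand-in for <run>/outs/per_cell_summary_metrics.csv (made-up rows).
metrics="$workdir/per_cell_summary_metrics.csv"
printf '%s\n' \
  'barcode,cell_id,is_noisy' \
  'AAACGGGTCAAAGTGA-1,0,0' \
  'AAAGATGCAATGGGAC-1,1,1' \
  'TTTGTCATCCGCACGA-1,2,0' > "$metrics"

# Locate both columns by name in the header, then print the barcode of
# every row whose flag is 0.
awk -F, -v keep_col=barcode -v flag_col=is_noisy '
  NR == 1 {
    for (i = 1; i <= NF; i++) {
      if ($i == keep_col) k = i
      if ($i == flag_col) f = i
    }
    if (!k || !f) { print "required column not found" > "/dev/stderr"; exit 1 }
    next
  }
  $f == 0 { print $k }
' "$metrics" > "$workdir/barcodes.csv"

cat "$workdir/barcodes.csv"
```

The resulting file can then be referenced from the barcodes_csv column of the configuration CSV passed to --csv.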
As shown in the example above, you may impose a self-chosen tree structure on the clustering using a Newick-formatted tree.