Cell Ranger DNA 1.0, printed on 11/20/2024
Cell Ranger DNA's pipelines analyze sequencing data produced from Chromium Single Cell DNA sequencing libraries. This involves the following steps:
1. Run cellranger-dna mkfastq on the Illumina BCL output folder to generate FASTQ files.
2. Run cellranger-dna cnv on the FASTQ files produced by cellranger-dna mkfastq.
For the following example, assume that the Illumina BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.
First, follow the instructions on running cellranger-dna mkfastq to generate FASTQ files. For example, if the flowcell serial number was HAWT7ADXX, then cellranger-dna mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
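As a minimal sketch of how the FASTQ location in this example relates to the run folder name, the snippet below guesses the flowcell serial from the folder. The helper `flowcell_from_run_folder` is hypothetical and not part of Cell Ranger DNA. It assumes the last underscore-separated field of the folder name is a one-letter flowcell side followed by the serial, which matches this example but may not hold for every instrument.

```shell
# Hypothetical helper (not part of cellranger-dna): guess the flowcell serial
# from an Illumina run folder name.
# Assumption: the last underscore-separated field is a one-letter side (A/B)
# followed by the serial, as in 140101_D00123_0111_AHAWT7ADXX.
flowcell_from_run_folder() {
    folder=$(basename "$1")
    last_field=${folder##*_}        # e.g. AHAWT7ADXX
    printf '%s\n' "${last_field#?}" # drop the leading side letter
}

run_folder=/sequencing/140101_D00123_0111_AHAWT7ADXX
flowcell=$(flowcell_from_run_folder "$run_folder")
echo "Expected FASTQ folder: ${flowcell}/outs/fastq_path"
```

If the guessed serial does not match the folder that cellranger-dna mkfastq actually created, use the real path instead.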
Argument | Description |
---|---|
--id | (Required) A unique run ID string: e.g. sample345 |
--fastqs | (Required) Path of the FASTQ folder generated by cellranger-dna mkfastq, e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path. Can take multiple comma-separated paths, which is helpful if the same library was sequenced on multiple flowcells. Doing this will treat all reads from the library, across flowcells, as one sample. |
--reference | (Required) Path to the Cell Ranger DNA compatible reference. |
--sample | (Optional) Sample name as specified in the sample sheet supplied to mkfastq. Can take multiple comma-separated values, which is helpful if the sample was sequenced on multiple flowcells and the sample name used (and therefore FASTQ file prefix) is not identical between them. Doing this will treat all reads from the library, across flowcells, as one sample. |
--description | (Optional) Detailed sample description. |
--downsample | (Optional, float) Downsample input FASTQs to approximately this many gigabases of input sequence. |
--maxreads | (Optional, int) Downsample input FASTQs to approximately this many single-ended reads. Cannot be used if --downsample is specified. |
--force-cells | (Optional, int) Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger DNA is not consistent with the barcode rank plot. |
--soft-min-avg-ploidy | (Optional, float) Encourage Cell Ranger DNA to prefer calls that put the cell's mean ploidy above this number. Refer to the Determining absolute copy numbers section on the Interpreting output page for when this option should be used. |
--soft-max-avg-ploidy | (Optional, float) Encourage Cell Ranger DNA to prefer calls that put the cell's mean ploidy below this number. Refer to the Determining absolute copy numbers section on the Interpreting output page for when this option should be used. |
--lanes | (Optional) Lanes associated with this sample. |
--localcores | (Optional) Restricts cellranger-dna to use specified number of cores to execute pipeline stages. By default, cellranger-dna will use all of the cores available on your system. |
--localmem | (Optional) Restricts cellranger-dna to use specified amount of memory (in GB) to execute pipeline stages. By default, cellranger-dna will use 90% of the memory available on your system. Please note that cellranger-dna requires at least 16 GB of memory to run all pipeline stages. |
--indices | (Deprecated. Optional. Only used for output from cellranger-dna demux) Sample indices associated with this sample. Comma-separated list of: |
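The comma-separated form of --fastqs can be assembled in the shell when a library was sequenced on more than one flowcell. The following is a sketch only: `join_by_comma` is a hypothetical helper, the second flowcell path (`HBBBBBBXX`) is made up for illustration, and the command is printed rather than executed.

```shell
# Hypothetical helper (not part of cellranger-dna): join arguments with commas.
join_by_comma() {
    joined=""
    sep=""
    for item in "$@"; do
        joined="${joined}${sep}${item}"
        sep=","
    done
    printf '%s\n' "$joined"
}

# Two flowcells for the same library; the second path is an invented example.
fastqs=$(join_by_comma \
    /home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
    /home/jdoe/runs/HBBBBBBXX/outs/fastq_path)

# Dry run: print the command instead of running it.
echo cellranger-dna cnv --id=sample345 \
    --reference=/opt/path/to/ref \
    --fastqs="$fastqs" \
    --sample=mysample
```

Printing the command first makes it easy to confirm that every flowcell path is included before starting a long run.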
After determining these input arguments, run cellranger-dna:
    $ cd /home/jdoe/runs
    $ cellranger-dna cnv --id=sample345 \
                         --reference=/opt/path/to/ref \
                         --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                         --sample=mysample
Following a set of preflight checks to validate input arguments, cellranger-dna cnv pipeline stages will begin to run:
    Martian Runtime - 3.0.0
    Running preflight checks (please wait)...
    2018-04-26 14:12:20 [runtime] (ready) ID.sample345.CNV_CALLER_SINGLECELL_CS.CNV_CALLER_SINGLECELL._ALIGNER.SETUP_CHUNKS
    2018-04-26 14:12:20 [runtime] (run:local) ID.sample345.CNV_CALLER_SINGLECELL_CS.CNV_CALLER_SINGLECELL._ALIGNER.SETUP_CHUNKS.fork0.chnk0.main
    ...
By default, cellranger-dna will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger-dna to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger-dna.
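A small sketch of composing these two options follows. `resource_opts` is a hypothetical helper, and the values 16 cores and 64 GB are illustrative. The only rule it enforces is the 16 GB minimum stated in the arguments table, and the final command is printed, not executed.

```shell
# Hypothetical helper (not part of cellranger-dna): build the resource-limit
# options, rejecting a memory value below the documented 16 GB minimum.
resource_opts() {
    cores=$1
    mem_gb=$2
    if [ "$mem_gb" -lt 16 ]; then
        echo "localmem must be at least 16 GB, got $mem_gb" >&2
        return 1
    fi
    printf '%s\n' "--localcores=$cores --localmem=$mem_gb"
}

# Illustrative values: 16 cores and 64 GB of memory.
# Dry run: print the command instead of running it.
if opts=$(resource_opts 16 64); then
    echo "cellranger-dna cnv --id=sample345 --reference=/opt/path/to/ref --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path --sample=mysample $opts"
fi
```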
The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger-dna will assume it is an existing pipestance and attempt to resume running it.
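Because an existing folder triggers a resume, it can help to check for it before launching. The snippet below is a sketch under that assumption. `pipestance_state` is a hypothetical helper that only tests whether the directory exists. It does not inspect whether the folder is a valid pipestance.

```shell
# Hypothetical helper (not part of cellranger-dna): report whether the output
# folder for a given run ID already exists.
pipestance_state() {
    if [ -d "$1" ]; then
        echo "exists"
    else
        echo "new"
    fi
}

run_dir=/home/jdoe/runs/sample345
case $(pipestance_state "$run_dir") in
    exists) echo "$run_dir already exists: cellranger-dna would try to resume it" ;;
    new)    echo "$run_dir not found: a fresh run would be started" ;;
esac
```

To start over rather than resume, choose a different --id or move the old folder aside first.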
A successful cellranger-dna cnv run should conclude with a message similar to this:
    2018-04-26 14:23:25 [runtime] (join_complete) ID.sample345.CNV_CALLER_SINGLECELL_CS.CNV_CALLER_SINGLECELL.POSTPROCESSING.COMPILE_CNV_DATA
    Outputs:
    - Position-sorted BAM: /home/jdoe/runs/sample345/outs/possorted_bam.bam
    - Position-sorted BAM index: /home/jdoe/runs/sample345/outs/possorted_bam.bam.bai
    - CNV calls with imputation: /home/jdoe/runs/sample345/outs/node_cnv_calls.bed
    - CNV calls without imputation: /home/jdoe/runs/sample345/outs/node_unmerged_cnv_calls.bed
    - Highly mappable regions: /home/jdoe/runs/sample345/outs/mappable_regions.bed
    - Per-cell summary metrics: /home/jdoe/runs/sample345/outs/per_cell_summary_metrics.csv
    - Analysis summary metrics: /home/jdoe/runs/sample345/outs/summary.csv
    - Run summary HTML: /home/jdoe/runs/sample345/outs/web_summary.html
    - HDF5 file with CNV data: /home/jdoe/runs/sample345/outs/cnv_data.h5
    - Loupe visualization file: /home/jdoe/runs/sample345/outs/dloupe.dloupe
    - Run alerts: /home/jdoe/runs/sample345/outs/alarms_summary.txt
    Pipestance completed successfully!
The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files.
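One way to confirm that a run produced its main outputs is to look for the files named in the example completion message. This is a sketch, not an official check: `missing_outputs` is a hypothetical helper, and the file list is copied from that example message, so it may differ for other versions.

```shell
# Hypothetical helper (not part of cellranger-dna): print each expected output
# file that is missing from <run_dir>/outs.
# The file names come from the example completion message in this document.
missing_outputs() {
    outs="$1/outs"
    for name in possorted_bam.bam possorted_bam.bam.bai node_cnv_calls.bed \
        node_unmerged_cnv_calls.bed mappable_regions.bed \
        per_cell_summary_metrics.csv summary.csv web_summary.html \
        cnv_data.h5 dloupe.dloupe alarms_summary.txt; do
        [ -e "$outs/$name" ] || printf '%s\n' "$name"
    done
}

missing=$(missing_outputs /home/jdoe/runs/sample345)
if [ -z "$missing" ]; then
    echo "All expected outputs are present"
else
    echo "Missing outputs:"
    echo "$missing"
fi
```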
Once cellranger-dna cnv has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .dloupe file in Loupe scDNA Browser, or refer to the Understanding Output section to explore the data by hand.