
10x Genomics
Chromium Single Cell Gene Expression

Single-Library Analysis with Cell Ranger

Cell Ranger's pipelines analyze sequencing data produced from Chromium Single Cell Gene Expression libraries. They also process data generated with Feature Barcode technology and/or Single Cell Targeted Gene Expression. The analysis involves the following steps:

  1. Run cellranger mkfastq on the Illumina BCL output folder to generate FASTQ files.

  2. Run cellranger count on each GEM well that was demultiplexed by cellranger mkfastq. For Targeted Gene Expression libraries, see Targeted Gene Expression Analysis for instructions on how to provide the target gene panel information. If you created a Feature Barcode library alongside the Gene Expression library, you will pass them both to cellranger count at this point. See Feature Barcode Analysis for details.

  3. Optionally, run cellranger aggr to aggregate multiple GEM wells from a single experiment that were analyzed by cellranger count.

  4. Optionally, run cellranger reanalyze to re-run the secondary analysis (PCA, t-SNE, and clustering) on a library or aggregated set of libraries with fine-tuned parameters.
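The four steps above can be sketched as a dry run that only prints the commands in order and executes nothing. This is a minimal illustration, not a definitive recipe: the run IDs and the CSV file names (samplesheet.csv, aggregation.csv, params.csv) are hypothetical placeholders, and the flags shown for mkfastq, aggr, and reanalyze should be checked against each command's --help output.

```shell
# Dry-run sketch of the four-step workflow: prints commands, runs nothing.
# All run IDs and CSV file names below are hypothetical placeholders.
print_workflow() {
    local bcl_dir=$1 fastq_dir=$2 transcriptome=$3
    # Step 1: demultiplex the Illumina BCL output into FASTQ files.
    echo "cellranger mkfastq --run=${bcl_dir} --csv=samplesheet.csv"
    # Step 2: generate feature counts for one GEM well.
    echo "cellranger count --id=sample345 --transcriptome=${transcriptome} --fastqs=${fastq_dir} --sample=mysample"
    # Step 3 (optional): aggregate several cellranger count runs.
    echo "cellranger aggr --id=aggregated --csv=aggregation.csv"
    # Step 4 (optional): re-run secondary analysis with tuned parameters.
    echo "cellranger reanalyze --id=reanalysis --matrix=sample345/outs/filtered_feature_bc_matrix.h5 --params=params.csv"
}

print_workflow /sequencing/140101_D00123_0111_AHAWT7ADXX \
    /home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
    /opt/refdata-gex-GRCh38-2020-A
```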

For the following example, assume that the Illumina BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.

Run cellranger mkfastq

First, follow the instructions on running cellranger mkfastq to generate FASTQ files. For example, if the flowcell serial number was HAWT7ADXX, then cellranger mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
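As a small illustration of how the FASTQ location follows from the run folder name, the sketch below derives the relative output path from the example BCL folder. It assumes the last underscore-separated field of the folder name is a one-letter flowcell position followed by the flowcell serial number, and that cellranger mkfastq was run without overriding its output folder name. The function name fastq_path_for_run is hypothetical.

```shell
# Derive the mkfastq output folder from an Illumina run folder name.
# Assumption: the last underscore-separated field is a one-letter flowcell
# position (for example A) followed by the flowcell serial number.
fastq_path_for_run() {
    local run_dir=${1%/}               # drop a trailing slash, if any
    local run_name=${run_dir##*/}      # e.g. 140101_D00123_0111_AHAWT7ADXX
    local last_field=${run_name##*_}   # e.g. AHAWT7ADXX
    local flowcell=${last_field#?}     # e.g. HAWT7ADXX
    printf '%s/outs/fastq_path\n' "$flowcell"
}

fastq_path_for_run /sequencing/140101_D00123_0111_AHAWT7ADXX
# prints HAWT7ADXX/outs/fastq_path
```

The printed path is relative to the directory in which cellranger mkfastq was launched.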

Run cellranger count

To generate single cell feature counts for a single library, run cellranger count with the arguments shown in the example below. For a complete listing of the accepted arguments, see the Command-Line Argument Reference below, or run cellranger count --help.

After determining these input arguments, run cellranger:

$ cd /home/jdoe/runs
$ cellranger count --id=sample345 \
                   --transcriptome=/opt/refdata-gex-GRCh38-2020-A \
                   --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                   --sample=mysample \
                   --expect-cells=1000 \
                   --localcores=8 \
                   --localmem=64

Following a series of checks to validate input arguments, cellranger count pipeline stages will begin to run:

Martian Runtime - v4.0.0
 
Running preflight checks (please wait)...
Checking sample info...
Checking FASTQ folder...
Checking reference...

Checking optional arguments...
...

By default, cellranger will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger.
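If you prefer to pass explicit limits rather than rely on the defaults, values for --localcores and --localmem can be computed from the host. The sketch below is one possible approach and makes several assumptions: a Linux host with /proc/meminfo, the coreutils nproc command, and a 90% memory fraction chosen here only for illustration. The helper name ninety_percent_gb is hypothetical.

```shell
# Compute explicit resource limits to pass to cellranger count.
# Assumptions: Linux host with /proc/meminfo and the coreutils nproc command.

# Convert a memory size in kB to 90% of that amount in whole GB (rounded down).
ninety_percent_gb() {
    local total_kb=$1
    echo $(( total_kb * 9 / 10 / 1024 / 1024 ))
}

cores=$(nproc 2>/dev/null || echo 1)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_gb=$(ninety_percent_gb "${total_kb:-0}")

# Print the two options so they can be appended to a cellranger command.
echo "--localcores=${cores} --localmem=${mem_gb}"
```

On a shared machine you would typically lower both numbers by hand instead of taking everything the host reports.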

The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger will assume it is an existing pipestance and attempt to resume running it.
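Because an existing folder changes what cellranger does, it can be worth checking for one before launching. The following sketch reports whether a given run ID would start a new pipestance or resume an existing one. It assumes it is executed from the directory where cellranger count will be launched; the function name pipestance_status is hypothetical.

```shell
# Report whether a run ID would start a new pipestance or resume one.
# Assumption: executed from the directory where cellranger count is launched.
pipestance_status() {
    local run_id=$1
    if [ -d "$run_id" ]; then
        echo "existing"   # cellranger would attempt to resume this pipestance
    else
        echo "new"        # cellranger would create a folder named $run_id
    fi
}

pipestance_status sample345
```

If the result is "existing" and a fresh run is intended, choose a different --id or move the old folder aside first.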

Output Files

A successful cellranger count run should conclude with a message similar to this:

Outputs:
- Run summary HTML:                         /home/jdoe/runs/sample345/outs/web_summary.html
- Run summary CSV:                          /home/jdoe/runs/sample345/outs/metrics_summary.csv
- BAM:                                      /home/jdoe/runs/sample345/outs/possorted_genome_bam.bam
- BAM index:                                /home/jdoe/runs/sample345/outs/possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX:    /home/jdoe/runs/sample345/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5:   /home/jdoe/runs/sample345/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX:  /home/jdoe/runs/sample345/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /home/jdoe/runs/sample345/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV:            /home/jdoe/runs/sample345/outs/analysis
- Per-molecule read information:            /home/jdoe/runs/sample345/outs/molecule_info.h5
- CRISPR-specific analysis:                 null
- Loupe Browser file:                       /home/jdoe/runs/sample345/outs/cloupe.cloupe
- Feature Reference:                        null
- Target Panel File:                        null

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

yyyy-mm-dd hh:mm:ss Shutting down. Saving pipestance info to "sample345/sample345.mri.tgz"

The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files:

| File Name | Description |
| --- | --- |
| web_summary.html | Run summary metrics and charts in HTML format |
| metrics_summary.csv | Run summary metrics in CSV format |
| possorted_genome_bam.bam | Reads aligned to the genome and transcriptome, annotated with barcode information |
| possorted_genome_bam.bam.bai | Index for possorted_genome_bam.bam |
| filtered_feature_bc_matrix | Filtered feature-barcode matrices containing only cellular barcodes in MEX format. (In Targeted Gene Expression samples, the non-targeted genes are not present.) |
| filtered_feature_bc_matrix.h5 | Filtered feature-barcode matrices containing only cellular barcodes in HDF5 format. (In Targeted Gene Expression samples, the non-targeted genes are not present.) |
| raw_feature_bc_matrix | Unfiltered feature-barcode matrices containing all barcodes in MEX format |
| raw_feature_bc_matrix.h5 | Unfiltered feature-barcode matrices containing all barcodes in HDF5 format |
| analysis | Secondary analysis data including dimensionality reduction, cell clustering, and differential expression |
| molecule_info.h5 | Molecule-level information used by cellranger aggr to aggregate samples into larger datasets |
| cloupe.cloupe | Loupe Browser visualization and analysis file |
| feature_reference.csv | (Feature Barcode only) Feature Reference CSV file |
| target_panel.csv | (Targeted GEX only) Target panel CSV file |
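A quick way to confirm that a run produced its main outputs is to check the outs folder against the names in the Outputs listing above. The sketch below prints any expected entry that is missing. It is only an illustration: the function name missing_outputs is hypothetical, the optional feature_reference.csv and target_panel.csv are deliberately not checked, and runs that skip secondary analysis may legitimately lack some entries.

```shell
# List expected cellranger count outputs that are missing from an outs folder.
# The names come from the Outputs listing above; the optional
# feature_reference.csv and target_panel.csv are deliberately not checked.
missing_outputs() {
    local outs=$1 name
    for name in web_summary.html metrics_summary.csv \
                possorted_genome_bam.bam possorted_genome_bam.bam.bai \
                filtered_feature_bc_matrix filtered_feature_bc_matrix.h5 \
                raw_feature_bc_matrix raw_feature_bc_matrix.h5 \
                analysis molecule_info.h5 cloupe.cloupe; do
        # Print the name only when neither a file nor a folder exists.
        [ -e "${outs}/${name}" ] || echo "$name"
    done
}

missing_outputs sample345/outs
```

An empty result means every listed file and folder is present.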

Once cellranger count has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .cloupe file in Loupe Browser, or refer to the Understanding Output section to explore the data by hand.
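As one example of exploring the data by hand, the number of cell-associated barcodes can be read directly from the filtered MEX folder. This sketch assumes the folder contains a gzip-compressed barcodes.tsv.gz file with one barcode per line, which is not stated on this page and should be checked against your own output. The function name count_barcodes is hypothetical.

```shell
# Count barcodes in a MEX matrix folder.
# Assumption: the folder holds a gzip-compressed barcodes.tsv.gz file
# with one barcode per line.
count_barcodes() {
    local mex_dir=$1
    gzip -dc "${mex_dir}/barcodes.tsv.gz" | wc -l | tr -d ' '
}

# Only attempt the count when the file is actually present.
if [ -f sample345/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ]; then
    count_barcodes sample345/outs/filtered_feature_bc_matrix
fi
```

Running the same function on raw_feature_bc_matrix gives the count of all barcodes rather than only the cellular ones.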

Command-Line Argument Reference

| Argument | Description |
| --- | --- |
| --id | A unique run ID string, e.g. sample345. |
| --fastqs | Either the path of the fastq_path folder generated by cellranger mkfastq (e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path), which contains a directory hierarchy that cellranger count will automatically traverse, or any folder containing FASTQ files, for example if the FASTQ files were generated by a service provider and delivered outside the context of the mkfastq output directory structure. Can take multiple comma-separated paths, which is helpful if the same library was sequenced on multiple flowcells; doing this will treat all reads from the library, across flowcells, as one sample. If you have multiple libraries for the sample, you will need to run cellranger count on them individually, and then combine them with cellranger aggr. This argument cannot be used when performing Feature Barcode analysis; use --libraries instead. |
| --libraries | Path to a libraries.csv file declaring FASTQ paths and library types of input libraries. Required for Feature Barcode analysis. See Feature Barcode Analysis for details. When using this argument, --fastqs and --sample must not be passed. |
| --sample | Sample name as specified in the sample sheet supplied to cellranger mkfastq. Can take multiple comma-separated values, which is helpful if the same library was sequenced on multiple flowcells and the sample name used (and therefore the FASTQ file prefix) is not identical between them; doing this will treat all reads from the library, across flowcells, as one sample. If you have multiple libraries for the sample, you will need to run cellranger count on them individually, and then combine them with cellranger aggr. Allowable characters in sample names are letters, numbers, hyphens, and underscores. |
| --transcriptome | Path to the Cell Ranger compatible transcriptome reference. For a human-only sample, use /opt/refdata-gex-GRCh38-2020-A; for a human and mouse mixture sample, use /opt/refdata-gex-GRCh38-and-mm10-2020-A. |
| --feature-ref | Path to a Feature Reference CSV file declaring the Feature Barcode reagents in use in the experiment. Required for Feature Barcode analysis. See Feature Barcode Reference for details on how to construct the feature reference. |
| --target-panel | Path to a Target Panel CSV file declaring the target panel used, if any. Required for Targeted Gene Expression analysis. See Targeted Gene Expression Analysis for details. |
| --no-target-umi-filter | (optional) Add this flag to disable targeted UMI filtering. See Targeted Algorithms for details. |
| --expect-cells | (optional) Expected number of recovered cells. Default: 3,000 cells. |
| --force-cells | (optional) Force the pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot. |
| --nosecondary | (optional) Add this flag to skip secondary analysis of the feature-barcode matrix (dimensionality reduction, clustering, and visualization). Set this if you plan to use cellranger reanalyze or your own custom analysis. |
| --no-libraries | Proceed with processing using a --feature-ref but no Feature Barcode data specified with the --libraries flag. |
| --chemistry | (optional) Assay configuration. NOTE: by default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection. Select one of: auto for auto-detection (default); threeprime for Single Cell 3′; fiveprime for Single Cell 5′; SC3Pv2 for Single Cell 3′ v2; SC3Pv3 for Single Cell 3′ v3; SC5P-PE for Single Cell 5′ paired-end (both R1 and R2 are used for alignment); SC5P-R2 for Single Cell 5′ R2-only (only R2 is used for alignment); SC3Pv1 for Single Cell 3′ v1 (NOTE: this mode cannot be auto-detected and must be set explicitly with this option). |
| --r1-length | (optional) Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences, so do not set this below 26 for Single Cell 3′ v2 or Single Cell 5′. This and --r2-length are useful for determining the optimal read length for sequencing. |
| --r2-length | (optional) Hard-trim the input R2 sequence to this length. |
| --lanes | (optional) Lanes associated with this sample. |
| --localcores | Restricts cellranger to using the specified number of cores to execute pipeline stages. By default, cellranger will use all of the cores available on your system. |
| --localmem | Restricts cellranger to using the specified amount of memory (in GB) to execute pipeline stages. By default, cellranger will use 90% of the memory available on your system. |
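The rules for --sample and --fastqs described in the reference above can be sketched as a small helper that validates a sample name and joins several FASTQ folders with commas before printing the resulting command. This is a hedged illustration only: the helper names (valid_sample_name, join_commas, build_count_cmd) and the second flowcell folder FLOWCELL2 are hypothetical, and the command is printed rather than executed.

```shell
# Sketch: validate inputs and assemble a cellranger count command line.
# Helper names are hypothetical; only flags described above are used.

# Succeed if a sample name uses only letters, numbers, hyphens, underscores.
valid_sample_name() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Join all arguments with commas, e.g. for --fastqs or --sample.
join_commas() {
    local IFS=,
    echo "$*"
}

# Print a cellranger count command for one library sequenced on one or
# more flowcells. Usage: build_count_cmd ID REF SAMPLE FASTQ_DIR...
build_count_cmd() {
    local id=$1 ref=$2 sample=$3
    shift 3
    if ! valid_sample_name "$sample"; then
        echo "invalid sample name: $sample" >&2
        return 1
    fi
    echo "cellranger count --id=${id} --transcriptome=${ref} --fastqs=$(join_commas "$@") --sample=${sample}"
}

# FLOWCELL2 is a hypothetical second flowcell folder.
build_count_cmd sample345 /opt/refdata-gex-GRCh38-2020-A mysample \
    /home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
    /home/jdoe/runs/FLOWCELL2/outs/fastq_path
```

Printing the command first makes it easy to review the joined --fastqs value before starting a long run.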