Cell Ranger 3.0 (latest), printed on 01/19/2019
Cell Ranger processes all Feature Barcoding data through a basic counting pipeline that determines the count of each Feature in each cell. This analysis is done by the cellranger count pipeline. The pipeline outputs a unified feature-barcode matrix that contains gene expression counts alongside Feature Barcode counts for each cell barcode. The feature-barcode matrix replaces the gene-barcode matrix emitted by older versions of Cell Ranger.
The pipeline first extracts and corrects the cell barcode and UMI from the feature library using the same methods as gene expression read processing. It then matches the Feature Barcoding read against the list of features declared in the Feature Barcode Reference. The counts for each feature are available in the feature-barcode matrix output files and in the Loupe Cell Browser output file.
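The counting step described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: it assumes reads have already been reduced to corrected (cell barcode, UMI, feature) triples, and it assumes a count is the number of distinct UMIs seen for a feature in a cell. The function name `count_features` is hypothetical.

```python
from collections import defaultdict


def count_features(reads):
    """Count distinct UMIs per (cell barcode, feature id) pair.

    `reads` is an iterable of (cell_barcode, umi, feature_id) tuples.
    feature_id is None when the read matched no declared feature.
    Assumption: one count per distinct UMI, which the text does not spell out.
    """
    umis = defaultdict(set)
    for cell_barcode, umi, feature_id in reads:
        if feature_id is None:
            continue  # unmatched reads contribute no counts
        umis[(cell_barcode, feature_id)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}
```

The resulting dictionary is a sparse stand-in for the Feature Barcode rows of the feature-barcode matrix.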
To enable Feature Barcoding analysis, cellranger count needs two new inputs:
The Libraries CSV File is passed with the --libraries flag and declares the FASTQ files and library type for each input dataset. In a typical Feature Barcoding analysis there will be two input libraries: one for the normal single-cell gene expression reads, and one for the Feature Barcoding reads. This argument replaces the
The Feature Reference CSV File is passed with the --feature-ref flag and declares the Feature Barcodes in use in the experiment. For each unique Feature Barcode used, this file declares a feature name and identifier, the barcode sequence of this feature, and a pattern indicating how to extract the barcode from the read sequence. See Feature Barcode Reference for details on how to construct the feature reference.
The complete set of arguments to cellranger count are covered in Single-Sample Analysis.
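As a rough sketch of how the two new inputs fit into an invocation, the helper below assembles an argument list for cellranger count. The `--libraries` and `--feature-ref` flags come from the text above; `--id` and `--transcriptome` are assumed here as the usual run name and reference arguments, and every path and name in the example is hypothetical. The helper only builds the list and does not launch anything.

```python
def build_count_command(run_id, transcriptome, libraries_csv, feature_ref_csv):
    """Return an argv list for a Feature Barcoding `cellranger count` run."""
    return [
        "cellranger", "count",
        f"--id={run_id}",                    # assumed: name of the run
        f"--transcriptome={transcriptome}",  # assumed: reference transcriptome path
        f"--libraries={libraries_csv}",      # Libraries CSV File
        f"--feature-ref={feature_ref_csv}",  # Feature Reference CSV File
    ]


# Hypothetical names and paths, for illustration only.
cmd = build_count_command("example_run", "/opt/refdata", "libraries.csv", "feature_ref.csv")
print(" ".join(cmd))
```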
When inputting Feature Barcode data to Cell Ranger via the Libraries CSV File, you must declare the library_type of each library. Specific values for library_type will enable additional downstream processing, specifically for CRISPR Guide Capture and Antibody Capture. The following table outlines the types of libraries that can be specified and what they mean for the downstream processing.
|Antibody Capture|For use with experiments measuring cell surface protein expression levels via an antibody staining assay. Enables a t-SNE projection of the cells using only the Antibody Capture / Cell Surface Protein feature counts. This projection is available in an output file and in Loupe Cell Browser. See the Antibody Algorithms page for more details.|
|CRISPR Guide Capture|Enables an analysis of gene expression changes caused by the presence of CRISPR perturbations, in a Perturb-Seq style assay. See the CRISPR Overview page for more details. This mode also creates a t-SNE projection using only the CRISPR guide counts. This projection is available in an output file and in Loupe Cell Browser.|
|Custom|Provides processing of the Feature Barcoding reads and a basic summary of the sequencing quality and library quality, but performs no special processing of the Feature Barcoding counts.|
The Libraries CSV File declares the input FASTQ data for the libraries that make up a Feature Barcoding experiment. This will include one library containing Single Cell Gene Expression reads, and one or more libraries containing Feature Barcoding reads. To use cellranger count in Feature Barcoding mode, you must create a Libraries CSV File and pass it with the --libraries flag. The following table describes the expected contents of the Libraries CSV File.
|A fully qualified path to the directory containing the demultiplexed FASTQ files for this sample. Analogous to the |
|Same as the |
|The FASTQ data will be interpreted using the rows from the feature reference file that have a ‘feature_type’ that matches this library_type. This field is case-sensitive, and must match a valid library type as described in the Library / Feature Types section. Must be Gene Expression for the gene expression libraries. Must be one of Custom, Antibody Capture, or CRISPR Guide Capture for Feature Barcoding libraries. At least one Gene Expression entry must be present.|
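The library_type rules in the table above can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not part of Cell Ranger. It assumes the CSV header is `fastqs,sample,library_type` (only `library_type` is named in the text here), and the sample name `GEX_example` is made up for illustration.

```python
import csv
import io

VALID_LIBRARY_TYPES = {
    "Gene Expression",
    "Custom",
    "Antibody Capture",
    "CRISPR Guide Capture",
}


def check_libraries_csv(text):
    """Return a list of problems found in Libraries CSV text (empty if none)."""
    problems = []
    rows = list(csv.DictReader(io.StringIO(text)))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        library_type = row.get("library_type")
        if library_type not in VALID_LIBRARY_TYPES:  # exact, case-sensitive match
            problems.append(f"line {line_no}: invalid library_type {library_type!r}")
    if not any(row.get("library_type") == "Gene Expression" for row in rows):
        problems.append("no Gene Expression library declared")
    return problems


# Hypothetical file contents; the header names are an assumption.
example = (
    "fastqs,sample,library_type\n"
    "/opt/foo/,GEX_example,Gene Expression\n"
    "/opt/foo/,CRISPR_sample1,CRISPR Guide Capture\n"
)
print(check_libraries_csv(example))
```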
Gene expression + CRISPR libraries.
In this example we've demultiplexed the sequencing data from two libraries, including one named CRISPR_sample1, on the bcl2fastq / mkfastq sample sheet. This generated FASTQ files such as CRISPR_sample1_S0_L001_001.fastq.gz in the path /opt/foo. We pass the FASTQ sample names and paths to Cell Ranger with the appropriate library types:
|/opt/foo/||CRISPR_sample1||CRISPR Guide Capture|
Gene expression + Antibody libraries.
In this example we've demultiplexed the sequencing data from two libraries, including one named Ab_sample2, on the bcl2fastq / mkfastq sample sheet. This generated FASTQ files such as Ab_sample2_S0_L001_001.fastq.gz in the path /opt/foo. We pass the FASTQ sample names and paths to Cell Ranger with the appropriate library types:
If your assay scheme creates a library containing multiple library_types, for example if you're using CRISPR Guide Capture and Antibody Capture features, you will need to select a single library_type for the library when inputting it into the Libraries CSV File. This will provide only one kind of specialized library analysis. To get multiple specialized analyses, you will need to run Cell Ranger multiple times, passing different library_type values in the Libraries CSV File. This is a limitation of Cell Ranger 3.0 that will be lifted in future releases. Regardless of the library_type specified, the feature-barcode matrix outputs will contain counts for all specified features.
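One way to organize the repeated runs described above is to generate one Libraries CSV per library_type for the same mixed library. The sketch below is illustrative only: the header names `fastqs` and `sample` are assumed, and the sample names `GEX_example` and `mixed_example` are hypothetical.

```python
import csv
import io


def libraries_csv_variants(gex_library, mixed_library, library_types):
    """Return {library_type: CSV text}, one Libraries CSV per specialized analysis.

    gex_library and mixed_library are (fastqs_path, sample_name) tuples.
    """
    variants = {}
    for library_type in library_types:
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["fastqs", "sample", "library_type"])  # assumed header
        writer.writerow([*gex_library, "Gene Expression"])
        writer.writerow([*mixed_library, library_type])
        variants[library_type] = buffer.getvalue()
    return variants


variants = libraries_csv_variants(
    ("/opt/foo/", "GEX_example"),    # hypothetical gene expression library
    ("/opt/foo/", "mixed_example"),  # hypothetical mixed Feature Barcoding library
    ["CRISPR Guide Capture", "Antibody Capture"],
)
print(variants["Antibody Capture"])
```

Each variant would then be passed to its own cellranger count run.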
A Feature Reference CSV File is required when processing Feature Barcoding data. It declares the molecule structure and unique Feature Barcode sequence of each feature present in your experiment. Each line of the CSV declares one unique Feature Barcode.
The Feature Reference CSV File is passed to cellranger count with the --feature-ref flag. Please note that the CSV may not contain characters outside of the ASCII range.
This table describes the columns in the Feature Reference CSV File. Example files can be found below.
||Unique ID for this feature. Must not contain whitespace, quote or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome.|
||Human-readable name for this feature. Must not contain whitespace. This name will be displayed in Loupe Cell Browser.|
||Specifies which RNA sequencing read contains the Feature Barcode sequence. Must be R1 or R2. Note: in most cases R2 is the correct read.|
||Specifies how to extract the Feature Barcode sequence from the read. See the [Barcode Extraction Pattern](#pattern) section below for details.|
||Nucleotide barcode sequence associated with this feature. E.g., antibody barcode or sgRNA protospacer sequence.|
||Type of the feature. See the [Library/Feature Types](#feature-types) section for details on allowed values of this field. FASTQ data specified in the Library CSV File with a
||(Optional) Reference gene identifier of the target gene of a CRISPR guide RNA. A gene with this id must exist in the reference transcriptome. Providing target_gene_id and target_gene_name will enable the pipeline to perform differential expression analysis, assuming that control ("Non-Targeting") guides are also specified. Non-targeting guides must contain the value "Non-Targeting" in the "target_gene_id" and "target_gene_name" fields. See the CRISPR Overview section for more details.|
||(Optional) Gene name of target gene of a CRISPR guide RNA. The gene name corresponding to the gene referenced in the target_gene_id field must match the gene name given here. See the CRISPR Overview section for more details.|
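Several of the constraints in this table are mechanical and can be checked with a short script. The sketch below is a hypothetical helper, not part of Cell Ranger. It assumes the column headers are `id`, `name`, `read`, `pattern`, `sequence`, and `feature_type`, treats both `'` and `"` as quote characters, and does not check for collisions with gene identifiers in the transcriptome. The example row, including its pattern and sequence, is made up.

```python
import csv
import io

FORBIDDEN_IN_ID = "\"',"  # quote and comma characters; whitespace is checked separately


def check_feature_reference(text):
    """Return a list of problems found in Feature Reference CSV text."""
    problems = []
    if not text.isascii():
        problems.append("file contains non-ASCII characters")
    seen_ids = set()
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        feature_id = row.get("id") or ""
        name = row.get("name") or ""
        if not feature_id or any(c.isspace() or c in FORBIDDEN_IN_ID for c in feature_id):
            problems.append(f"line {line_no}: invalid id {feature_id!r}")
        elif feature_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {feature_id!r}")
        seen_ids.add(feature_id)
        if not name or any(c.isspace() for c in name):
            problems.append(f"line {line_no}: invalid name {name!r}")
        if row.get("read") not in ("R1", "R2"):
            problems.append(f"line {line_no}: read must be R1 or R2")
    return problems


# Hypothetical feature row; header names are an assumption.
good = (
    "id,name,read,pattern,sequence,feature_type\n"
    "feat1,Feature1,R2,5P(BC),ACGTACGTACGTACG,Antibody Capture\n"
)
print(check_feature_reference(good))
```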
The pattern field of the feature reference defines how to locate the Feature Barcode within a read. The Feature Barcode may appear at a known offset with respect to the start or end of the read or may appear at a fixed position relative to a known anchor sequence. The pattern column can be made up of a combination of these elements:
Any constant sequences made up of A, C, G and T in the pattern must match exactly in the read sequence. Any N in the pattern is allowed to match a single arbitrary base. A modest number of fixed bases should be used to minimize the chance of a sequencing error disrupting the match. The fixed sequence should also be long enough to uniquely identify the position of the Feature Barcode. For feature types that require a non-N anchor, we recommend 12bp-20bp of constant sequence. The extracted Feature Barcode sequences are corrected up to a Hamming distance of 1 using the 10x barcode correction algorithm that is used to correct cell barcodes.
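The matching rules above can be illustrated with a regular expression. The sketch below makes several assumptions: it takes `5P` and `3P` to mark the start and end of the read and `(BC)` to mark the Feature Barcode position, and it requires the caller to supply the barcode length. The correction step is a plain Hamming-distance-1 lookup that returns nothing when the match is ambiguous; it stands in for, and does not reproduce, the 10x barcode correction algorithm. All function names and sequences are hypothetical.

```python
import re

BASES = set("ACGTN")


def _fixed_to_regex(fixed):
    """Fixed bases must match exactly; N matches any single base."""
    if set(fixed) - BASES:
        raise ValueError(f"unexpected characters in pattern: {fixed!r}")
    return fixed.replace("N", "[ACGTN]")


def compile_pattern(pattern, barcode_length):
    """Compile a pattern such as '5PNN(BC)' or '(BC)GTTT' into a regex."""
    start = end = ""
    body = pattern
    if body.startswith("5P"):  # assumed token: anchor at start of read
        start, body = "^", body[2:]
    if body.endswith("3P"):    # assumed token: anchor at end of read
        end, body = "$", body[:-2]
    before, marker, after = body.partition("(BC)")
    if not marker:
        raise ValueError("pattern must contain (BC)")
    capture = "([ACGTN]{%d})" % barcode_length
    return re.compile(start + _fixed_to_regex(before) + capture + _fixed_to_regex(after) + end)


def extract_barcode(read, regex):
    """Return the candidate Feature Barcode from a read, or None."""
    match = regex.search(read)
    return match.group(1) if match else None


def correct_barcode(observed, known_barcodes):
    """Return the unique known barcode within Hamming distance 1, else None."""
    if observed in known_barcodes:
        return observed
    hits = [
        bc for bc in known_barcodes
        if len(bc) == len(observed) and sum(a != b for a, b in zip(bc, observed)) == 1
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 12-base anchor following a 10-base barcode.
anchored = compile_pattern("(BC)GTTTAAGAGCTA", 10)
print(extract_barcode("TTGGCCAATTGGGTTTAAGAGCTACC", anchored))
```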
TotalSeq™-B is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v3 assay. The Feature Barcode sequence appears at a fixed position (base 10) in the R2 read.
Example TotalSeq™-B Feature Reference CSV. Please note: this is a pre-release set of TotalSeq-B antibodies, and the barcode sequences have since changed. Please refer to https://www.biolegend.com/totalseq for the latest conjugated barcode information.
TotalSeq™-C is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 5' assay. The Feature Barcode sequence appears at a fixed position (base 10) in the R2 read.
TotalSeq™-A is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v2 and Single Cell 3' v3 kits. The Feature Barcode sequence appears at the start of the R2 read.
In CRISPR Guide Capture assays, the barcode is the CRISPR protospacer sequence. The protospacer is followed by a downstream constant sequence in the guide RNA which is used as an anchor to identify the location of the protospacer. We recommend using a 12bp-20bp constant sequence that can be uniquely identified, but is short enough that it is unlikely to be disrupted by a sequencing error. In the example Feature Reference CSV file we declare six guide RNA features with six distinct barcode / protospacer sequences. We use the target_gene_id and target_gene_name columns to declare the target gene of each guide RNA, for use in downstream CRISPR perturbation analysis. Two guides are declared as Non-Targeting. Cells containing Non-Targeting guides will be used as controls for CRISPR perturbation analysis. The four remaining guides target two genes.
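The layout described above (two control guides plus four guides targeting two genes) can be summarized by grouping features on their target columns. The sketch below is illustrative only: the guide ids and the gene ids and names (`GENE_A`, `GENE_B`) are hypothetical placeholders, and the `id` key is an assumed column name.

```python
from collections import defaultdict


def group_guides(feature_rows):
    """Split guide features into controls and per-gene groups.

    Returns (control_ids, {target_gene_name: [guide ids]}).
    """
    controls = []
    by_gene = defaultdict(list)
    for row in feature_rows:
        if row["target_gene_id"] == "Non-Targeting":
            controls.append(row["id"])
        else:
            by_gene[row["target_gene_name"]].append(row["id"])
    return controls, dict(by_gene)


# Hypothetical guides mirroring the layout described in the text.
guides = [
    {"id": "ctrl_1", "target_gene_id": "Non-Targeting", "target_gene_name": "Non-Targeting"},
    {"id": "ctrl_2", "target_gene_id": "Non-Targeting", "target_gene_name": "Non-Targeting"},
    {"id": "geneA_g1", "target_gene_id": "ID_A", "target_gene_name": "GENE_A"},
    {"id": "geneA_g2", "target_gene_id": "ID_A", "target_gene_name": "GENE_A"},
    {"id": "geneB_g1", "target_gene_id": "ID_B", "target_gene_name": "GENE_B"},
    {"id": "geneB_g2", "target_gene_id": "ID_B", "target_gene_name": "GENE_B"},
]
controls, by_gene = group_guides(guides)
print(controls, by_gene)
```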