10x Genomics
Chromium Single Cell Immune Profiling

Cell Ranger 7.1, printed on 04/02/2025

CRISPR Algorithms Overview

Pooled CRISPR screens
Experimental design
Protospacer calling
Perturbation efficiency and transcriptome-wide effects

Pooled CRISPR screens

Feature Barcode technology may be used to perform pooled CRISPR screens in an efficient and scalable fashion. For an introduction to pooled CRISPR screens, see: Perturb-Seq (Adamson et al., 2016, and Dixit et al., 2016), CRISP-seq (Jaitin et al., 2016), CROP-seq (Datlinger et al., 2017), or CRISPR-QTL (Gasperini et al., 2019). See the Glossary for definitions of terms.

The goal of a pooled CRISPR screen is to use CRISPR to perturb the expression of a list of pre-identified genes and quantitatively measure the effects of those perturbations on the transcriptome of the cells of interest. The cells are typically transfected in a pooled fashion with a number of plasmids that code for guide RNAs that target the pre-selected genes of interest. Since the assay captures both transcripts and transfected guide RNAs from each cell, one can correlate the changes in the transcriptome with the perturbations received by each sub-group of cells.

Experimental design

A wide variety of experimental designs are used in pooled CRISPR screens, depending on the nature of the biological questions being investigated and the scope of the experiment. We emphasize three general principles commonly employed in such experiments.

Multiple guide RNAs per target gene. In general, it is hard to predict the functional efficacy of a guide RNA construct purely from its in silico design. In order to mitigate the risk of non-functional guide RNA molecules that do not perturb the expression of their target genes significantly, pooled CRISPR screens typically employ 2-5 guide RNA constructs per target gene.
Non-targeting guide RNAs that function as negative controls. In order to measure the effectiveness of a particular guide RNA construct in perturbing the expression of its target gene, or the effects of such a perturbation on the rest of the transcriptome, one would need to perform a differential expression analysis where the cells expressing the relevant guide RNA(s) are compared with control cells. The experimental design typically includes control guide RNA constructs that are explicitly designed not to target any annotated genes in the reference transcriptome; these guide RNAs are called "non-targeting" guides. The control cells used in the differential expression analyses are typically cells identified as containing only (some combination of) non-targeting guides. In order to account for possible error in the design or transfection of these non-targeting guide RNA constructs, more than one such construct (usually 2-5) is typically used in the experiment.
Carefully designed and validated transfection protocol. Depending on the particular transfection protocol used in the assay, the distribution of guide RNA constructs among cells can vary widely, from a median of as few as 1 guide per cell to as many as 15 per cell. The transfection protocol is usually carefully designed based on the requirements imposed by the biological questions of interest, such as the median number of guide RNA constructs per cell or the number of cells required per perturbation of interest. In addition, the transfection protocol is typically validated by some combination of PCR-based techniques and next-generation sequencing (see Methods sections of the References).

Protospacer calling

Pooled CRISPR screens may have low levels of ambient guide RNA in solution, resulting in a small number of "background" UMI counts in cells that do not express any guide RNA constructs. For each guide RNA construct specified in the Feature Reference CSV File, Cell Ranger assumes two populations of cells: one that expresses the guide and another that does not. The latter population may have background guide RNA UMI counts.

To identify the subpopulation of cells that significantly express a particular guide RNA above the background, Cell Ranger fits a Gaussian Mixture Model to the log-transformed molecules/cell distribution. This model calculates the probability of a given cell belonging to the population expressing the guide RNA instead of the background population and uses that probability to identify cells expressing the guide RNA. The algorithm runs independently for each guide RNA specified in the Feature Reference CSV.
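The idea of the paragraph above can be illustrated with a minimal sketch. It is not Cell Ranger's implementation: the log10(1 + count) transform, the initialization, the variance floor, and the 0.5 posterior cutoff are all assumptions made for illustration, and the function names are hypothetical. It fits a one-dimensional, two-component Gaussian mixture by expectation-maximization to the log-transformed UMI counts of a single guide, then calls a cell as expressing the guide when its posterior probability for the high-mean component exceeds the cutoff.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200, tol=1e-8):
    """Fit a 1-D, two-component Gaussian mixture with EM.

    Returns (weights, means, variances); index 0 is the low-mean
    (background) component and index 1 the high-mean component.
    """
    n = len(values)
    grand_mean = sum(values) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / n, 1e-6)
    weights = [0.5, 0.5]
    means = [min(values), max(values)]  # assumption: start one component at each extreme
    variances = [grand_var, grand_var]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of the high-mean component for every value.
        resp_high, ll = [], 0.0
        for v in values:
            d0 = weights[0] * normal_pdf(v, means[0], variances[0])
            d1 = weights[1] * normal_pdf(v, means[1], variances[1])
            total = max(d0 + d1, 1e-300)  # guard against underflow
            resp_high.append(d1 / total)
            ll += math.log(total)
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            r = [rh if k == 1 else 1.0 - rh for rh in resp_high]
            nk = max(sum(r), 1e-12)
            means[k] = sum(ri * v for ri, v in zip(r, values)) / nk
            var_k = sum(ri * (v - means[k]) ** 2 for ri, v in zip(r, values)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
            weights[k] = nk / n
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    if means[0] > means[1]:  # keep index 1 as the high-mean component
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def call_guide(umi_counts, min_posterior=0.5):
    """Return one boolean per cell: True if the guide is called in that cell."""
    logged = [math.log10(1.0 + c) for c in umi_counts]  # assumed transform
    weights, means, variances = fit_two_component_gmm(logged)
    calls = []
    for v in logged:
        d0 = weights[0] * normal_pdf(v, means[0], variances[0])
        d1 = weights[1] * normal_pdf(v, means[1], variances[1])
        calls.append(d1 / max(d0 + d1, 1e-300) > min_posterior)
    return calls
```

In this sketch, `call_guide` would be run once per guide RNA, on the vector of that guide's UMI counts across all cells, mirroring the per-guide independence described above.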

Please note that the protospacer calling step fails when there are fewer than two guide RNAs per target gene. Refer to the Experimental design section for details.

Cell Ranger v7.0 and later allow CRISPR Guide Capture datasets to be aggregated. Protospacer calling is performed again on the aggregated data.

Perturbation efficiency and transcriptome-wide effects

In pooled CRISPR screens, two central questions arise. First, to what extent did the expression of the target genes change amongst those cells expressing the guide RNAs that targeted those genes ("Perturbation Efficiency")? Second, what effects did these perturbations have on the transcriptome of those cells ("Perturbation Effects")?

Both questions rely on differential expression analyses. As with Gene Expression, Cell Ranger uses sSeq, a quick and simple method (Yu, Huber, & Vitek, 2013), to find differentially expressed genes between the perturbed cells and the control cells (cells that contain only guide RNAs designed specifically to be non-targeting). For details on the implementation of sSeq within Cell Ranger, see Gene Expression.
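The definition of control cells given above can be sketched as a simple set test. This is an illustration only: the function name, the input layout (a mapping from cell barcode to the set of guide IDs called in that cell), and the choice to skip cells with no called guide are assumptions, not details taken from Cell Ranger.

```python
def split_control_and_perturbed(cell_guides, non_targeting):
    """Partition cells into control and perturbed lists.

    cell_guides: dict mapping cell barcode -> set of guide IDs called in the cell.
    non_targeting: set of guide IDs designed to be non-targeting.
    """
    control, perturbed = [], []
    for cell, guides in cell_guides.items():
        if not guides:
            # Assumption: cells with no called guide belong to neither group.
            continue
        if guides <= non_targeting:
            # Every called guide is non-targeting: a control cell.
            control.append(cell)
        else:
            # At least one targeting guide is present: a perturbed cell.
            perturbed.append(cell)
    return control, perturbed
```

Note that a cell carrying both a targeting and a non-targeting guide counts as perturbed here, since it does not contain only non-targeting guides.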

To quantify Perturbation Efficiency, we report the log2-fold-change in the expression of each target gene. To address transcriptome-wide Perturbation Effects, we provide a list of top perturbed genes for each perturbation, in addition to a list of how every gene in the reference transcriptome changed under each perturbation.
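As a rough illustration of a log2-fold-change, the sketch below compares the depth-normalized expression of one target gene between perturbed and control cells. It is a simplified stand-in, not the sSeq-based calculation Cell Ranger performs: the normalization by total UMIs per group and the pseudocount of 1 are assumptions, and the function name is hypothetical.

```python
import math


def log2_fold_change(perturbed, control, pseudocount=1.0):
    """Log2 ratio of a gene's normalized expression, perturbed vs. control.

    Each argument is a list of (gene_umis, total_umis) pairs, one per cell.
    """
    def rate(cells):
        gene_sum = sum(gene for gene, _ in cells)
        depth_sum = sum(total for _, total in cells)
        # The pseudocount keeps the ratio finite when the gene has zero counts.
        return (gene_sum + pseudocount) / depth_sum

    return math.log2(rate(perturbed) / rate(control))
```

A negative value indicates lower expression of the target gene in the perturbed cells than in the controls, which is the expected direction for a knockdown or knockout.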

Each of the above results is calculated "by feature," where the cells are grouped based on the combinations of guide RNAs they contain, or "by target," where they are grouped based on the combinations of genes targeted by those guide RNAs. (The latter can lead to increased statistical power in cases where each gene is targeted by multiple guides, since cells where the same combinations of genes are perturbed may be grouped together.)
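The difference between the two groupings can be sketched as follows. The function and argument names are hypothetical, and the input layout (called guide IDs per cell, plus a guide-to-target-gene mapping) is an assumption made for illustration.

```python
from collections import defaultdict


def group_cells(cell_guides, guide_to_target, by="feature"):
    """Group cell barcodes by guide combination or by target-gene combination.

    cell_guides: dict mapping cell barcode -> set of guide IDs called in the cell.
    guide_to_target: dict mapping guide ID -> target gene name.
    by: "feature" (combination of guides) or "target" (combination of genes).
    """
    groups = defaultdict(list)
    for cell, guides in cell_guides.items():
        if not guides:
            continue  # assumption: cells without a called guide are not grouped
        if by == "feature":
            key = tuple(sorted(guides))
        elif by == "target":
            # Different guides against the same gene collapse to one key.
            key = tuple(sorted({guide_to_target[g] for g in guides}))
        else:
            raise ValueError("by must be 'feature' or 'target'")
        groups[key].append(cell)
    return dict(groups)
```

With two guides against the same gene, two cells that each carry a different one of those guides fall into separate groups "by feature" but into the same group "by target", which is where the gain in statistical power comes from.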

CRISPR output files are described in detail, with examples, here: CRISPR output files.