Cell Ranger 7.1 (latest), printed on 01/31/2023
Feature Barcode technology may be used to perform pooled CRISPR screens in an efficient and scalable fashion. For an introduction to pooled CRISPR screens, see: Perturb-Seq (Adamson et al., 2016, and Dixit et al., 2016), CRISP-seq (Jaitin et al., 2016), CROP-seq (Datlinger et al., 2017), or CRISPR-QTL (Gasperini et al., 2018). See the Glossary for definitions of terms.
The goal of a pooled CRISPR screen is to use CRISPR to perturb the expression of a list of pre-identified genes and quantitatively measure the effects of those perturbations on the transcriptome of the cells of interest. The cells are typically transfected in a pooled fashion with a number of plasmids that code for guide RNAs that target the pre-selected genes of interest. Since the assay captures both transcripts and transfected guide RNAs from each cell, one can correlate the changes in the transcriptome with the perturbations received by each sub-group of cells.
A wide variety of experimental designs are used in pooled CRISPR screens, depending on the nature of the biological questions being investigated and the scope of the experiment. We emphasize three general principles commonly employed in such experiments.
Multiple guide RNAs per target gene. In general, it is hard to predict the functional efficacy of a guide RNA construct purely from its in silico design. In order to mitigate the risk of non-functional guide RNA molecules that do not perturb the expression of their target genes significantly, pooled CRISPR screens typically employ 2-5 guide RNA constructs per target gene.
Non-targeting guide RNAs that function as negative controls. In order to measure the effectiveness of a particular guide RNA construct in perturbing the expression of its target gene, or the effects of such a perturbation on the rest of the transcriptome, one would need to perform a differential expression analysis where the cells expressing the relevant guide RNA(s) are compared with control cells. The experimental design typically includes control guide RNA constructs that are explicitly designed not to target any annotated genes in the reference transcriptome; these guide RNAs are called "non-targeting" guides. The control cells used in the differential expression analyses are typically cells identified as containing only (some combination of) non-targeting guides. In order to account for possible errors in the design or transfection of these non-targeting guide RNA constructs, more than one such construct (usually 2-5) is typically used in the experiment.
Carefully designed and validated transfection protocol. Depending on the particular transfection protocol used in the assay, the distribution of guide RNA constructs among cells can vary widely, from a median of as few as 1 guide per cell to as many as 15 per cell. The transfection protocol is usually carefully designed based on the requirements imposed by the biological questions of interest, such as the median number of guide RNA constructs per cell or the number of cells required per perturbation of interest. In addition, the transfection protocol is typically validated by some combination of PCR-based techniques and next-generation sequencing (see Methods sections of the References).
In pooled CRISPR screens, the presence of low levels of ambient guide RNA in solution typically leads to a small number of "background" UMI counts even in cells that do not express any guide RNA constructs. In the Protospacer Calling step, Cell Ranger identifies, for each guide RNA construct specified in the Feature Reference CSV File, the sub-population of cells that express that particular guide RNA significantly above background.
For each guide RNA, Cell Ranger assumes that there are two populations of cells: one that expresses the guide and one that does not. (This latter population only has UMI counts due to ambient guide RNA molecules.) To distinguish these two populations from each other, Cell Ranger fits a Gaussian Mixture Model to the log-transformed Molecules/Cell distribution for the guide RNA in question. This model calculates the probability that a given cell belongs to the population expressing the guide RNA rather than the background population, and uses that probability to identify cells expressing the guide RNA. The algorithm runs independently for each guide RNA specified in the Feature Reference CSV File.
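The two-population idea described above can be sketched in a few lines of Python. This is only an illustration, not Cell Ranger's implementation: the log10(1 + count) transform, the 0.5 posterior threshold, the variance floor, the initialisation from the sorted halves of the data, and all function names are assumptions made for this example, and the UMI counts are invented.

```python
import math

MIN_VAR = 1e-3  # variance floor (assumption) to keep a component from collapsing


def log_normal_pdf(v, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def posterior_expressing(v, weights, means, variances):
    """Posterior probability that v belongs to component 1 (guide-expressing)."""
    log_bg = math.log(weights[0]) + log_normal_pdf(v, means[0], variances[0])
    log_ex = math.log(weights[1]) + log_normal_pdf(v, means[1], variances[1])
    diff = log_bg - log_ex
    if diff > 700:  # exp() would overflow; the posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def fit_two_component_gmm(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (needs >= 2 values)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in xs) / n, MIN_VAR)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the expressing component for each value.
        resp = [posterior_expressing(v, weights, means, variances) for v in xs]
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:  # one component is empty; stop updating
            break
        # M-step: responsibility-weighted means, variances, and mixing weights.
        m0 = sum((1 - r) * v for r, v in zip(resp, xs)) / n0
        m1 = sum(r * v for r, v in zip(resp, xs)) / n1
        v0 = max(sum((1 - r) * (v - m0) ** 2 for r, v in zip(resp, xs)) / n0, MIN_VAR)
        v1 = max(sum(r * (v - m1) ** 2 for r, v in zip(resp, xs)) / n1, MIN_VAR)
        weights, means, variances = [n0 / n, n1 / n], [m0, m1], [v0, v1]
    if means[0] > means[1]:  # keep component 1 as the higher-count population
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def call_guide(umi_counts, threshold=0.5):
    """Return one boolean per cell: True if the guide is called as expressed."""
    logs = [math.log10(1 + c) for c in umi_counts]  # assumed log transform
    params = fit_two_component_gmm(logs)
    return [posterior_expressing(v, *params) > threshold for v in logs]


# Invented UMI counts for one guide: 20 background cells, 10 expressing cells.
background = [0, 1, 0, 2, 1, 0, 1, 3, 0, 1, 2, 0, 1, 1, 0, 2, 0, 1, 3, 1]
expressing = [80, 120, 150, 95, 200, 60, 110, 170, 130, 90]
calls = call_guide(background + expressing)
print(sum(calls), "of", len(calls), "cells called as expressing the guide")
```

Because the algorithm runs independently for each guide RNA, a sketch like this would be applied once per guide, each time to that guide's own per-cell UMI counts.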
Cell Ranger v7.0 and later allow CRISPR Guide Capture datasets to be aggregated. Protospacer calling is performed again on the aggregated data.
In pooled CRISPR screens, two central questions arise. First, to what extent did the expression of the target genes change amongst those cells expressing the guide RNAs that targeted those genes ("Perturbation Efficiency")? Second, what effects did these perturbations have on the transcriptome of those cells ("Perturbation Effects")?
Both questions rely on differential expression analyses. As with Gene Expression, Cell Ranger uses sSeq (Yu, Huber, & Vitek, 2013), a quick and simple method, to find differentially expressed genes between the perturbed cells and the control cells (cells that contain only guide RNAs designed specifically to be non-targeting). For details on the implementation of sSeq within Cell Ranger, see Gene Expression.
To quantify Perturbation Efficiency, we report the log2-fold-change in the expression of each target gene. To address transcriptome-wide Perturbation Effects, we provide a list of top perturbed genes for each perturbation, in addition to a list of how every gene in the reference transcriptome changed under each perturbation.
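As a rough illustration of what a log2-fold-change expresses, the sketch below compares the mean expression of one target gene in perturbed cells with that in control cells. It is a naive calculation, not the sSeq-based estimate that Cell Ranger reports; the function name, the pseudocount of 1, and the expression values are assumptions for this example, and the inputs are assumed to be already normalised for sequencing depth.

```python
import math


def log2_fold_change(perturbed, control, pseudocount=1.0):
    """Naive log2 ratio of mean expression (perturbed over control).

    The pseudocount (an assumption) avoids division by zero when a gene
    is not detected in one of the two groups.
    """
    mean_perturbed = sum(perturbed) / len(perturbed)
    mean_control = sum(control) / len(control)
    return math.log2((mean_perturbed + pseudocount) / (mean_control + pseudocount))


# Invented normalised counts of the target gene in each group of cells.
perturbed_cells = [0, 1, 2, 1]  # cells expressing a guide against the gene
control_cells = [6, 8, 7, 7]    # cells containing only non-targeting guides
lfc = log2_fold_change(perturbed_cells, control_cells)
print(f"log2 fold change: {lfc:.2f}")
```

A negative value indicates lower expression of the target gene in the perturbed cells than in the controls, which is the expected direction for a knockdown.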
Each of the above results is calculated "by feature," where the cells are grouped based on the combinations of guide RNAs they contain, or "by target," where they are grouped based on the combinations of genes targeted by those guide RNAs. (The latter can lead to increased statistical power in cases where each gene is targeted by multiple guides, since cells where the same combinations of genes are perturbed may be grouped together.)
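The difference between the two groupings can be seen in a small sketch. The guide names, target names, cell identifiers, and helper functions below are all hypothetical and chosen only for illustration; they do not reflect Cell Ranger's internal data structures or output format.

```python
from collections import defaultdict

# Hypothetical mapping from guide RNA to the gene it targets.
guide_to_target = {
    "GENE_A_sg1": "GENE_A",
    "GENE_A_sg2": "GENE_A",
    "GENE_B_sg1": "GENE_B",
    "NT_sg1": "Non-Targeting",
}

# Hypothetical protospacer calls: the set of guides detected in each cell.
cell_guides = {
    "cell1": {"GENE_A_sg1"},
    "cell2": {"GENE_A_sg2"},
    "cell3": {"GENE_A_sg1", "GENE_B_sg1"},
    "cell4": {"GENE_A_sg2", "GENE_B_sg1"},
    "cell5": {"NT_sg1"},
}


def group_by_feature(cell_guides):
    """Group cells by the exact combination of guide RNAs they contain."""
    groups = defaultdict(list)
    for cell, guides in sorted(cell_guides.items()):
        groups[frozenset(guides)].append(cell)
    return dict(groups)


def group_by_target(cell_guides, guide_to_target):
    """Group cells by the combination of genes targeted by their guides."""
    groups = defaultdict(list)
    for cell, guides in sorted(cell_guides.items()):
        targets = frozenset(guide_to_target[g] for g in guides)
        groups[targets].append(cell)
    return dict(groups)


by_feature = group_by_feature(cell_guides)
by_target = group_by_target(cell_guides, guide_to_target)
print(len(by_feature), "groups by feature;", len(by_target), "groups by target")
```

In this toy data every cell forms its own group by feature, whereas grouping by target pools cell1 with cell2 and cell3 with cell4, so each comparison against the controls has more cells behind it.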
CRISPR output files are described in detail, with examples, here: CRISPR output files.
Yu, D., Huber, W., and Vitek, O. Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics 29, 1275–1282 (2013).
Adamson, B., Norman, T., Jost, M., Cho, M., Nunez, J., Chen, Y., Villalta, J., Gilbert, L., Horlbeck, M., Hein, M., Pak, R., Gray, A., Gross, C., Dixit, A., Parnas, O., Regev, A., and Weissman, J. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 (2016).
Dixit, A., Parnas, O., Li, B., Chen, J., Fulco, C., Jerby-Arnon, L., Marjanovic, N., Dionne, D., Burks, T., Raychowdhury, R., Adamson, B., Norman, T., Lander, E., Weissman, J., Friedman, N., and Regev, A. Perturb-Seq: Dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).
Jaitin, D., Weiner, A., Yofe, I., Lara-Astiaso, D., Keren-Shaul, H., David, E., Meir Salame, T., Tanay, A., van Oudenaarden, A., and Amit, I. Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq. Cell 167, 1883–1896 (2016).
Datlinger, P., Rendeiro, A., Schmidl, C., Krausgruber, T., Traxler, P., Klughammer, J., Schuster, L., Kuchler, A., Alpar, D., and Bock, C. Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods 14, 297–301 (2017).
Gasperini, M., Hill, A., McFaline-Figueroa, J., Martin, B., Trapnell, C., Ahituv, N., and Shendure, J. crisprQTL mapping as a genome-wide association framework for cellular genetic screens. bioRxiv 314344, doi: https://doi.org/10.1101/314344 (2018).