
# VDJ Cell Calling Algorithm

In the 10x system, there is a large population of droplets (GEMs). Only some of these contain a cell, and only some of those cells are T or B cells.

Detection of T or B cells depends on identification and counting of V(D)J transcripts from them. Some T and B cells have very low expression levels for these transcripts, and thus these may not be detected. Conversely, sufficiently high levels of extracellular mRNA may result in some barcodes being misidentified as T or B cells. Thus the goal of the VDJ cell calling algorithm is to approximate the set of barcodes that contain a T or B cell.

The algorithm is executed as part of the assembly algorithm. To be identified as a T or B cell, a barcode must satisfy the following three requirements:

1. There must be a productive, confident contig, and if there is only one such contig, there must be more than one UMI supporting its junction region. (In the denovo case, we require only that there is a contig.) Although other cell types can exhibit transcription within the TCR and BCR loci, only T and B cells produce fully rearranged transcripts that contain both a V and a C segment. Therefore having a productive contig is good evidence that a transcript from a T or B cell was present in the GEM. However, it is possible that the transcript was background -- present in the fluid between cells rather than in an intact cell. Requiring more than one UMI provides some protection against this case.

2. There must be at least three filtered UMIs having at least two read pairs each (see Assembly Algorithm). This reduces the likelihood of mis-identifying a cell as a T or B cell based solely on background transcripts.

3. Compute the N50 value of the number of read pairs per UMI, across all barcodes. If, for a given barcode, the maximum read pair count across filtered UMIs is less than 3% of this N50, do not call the barcode a cell. This provides some protection against transcripts arising from index hopping on an Illumina flowcell, and from other forms of cross-library contamination.
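The three requirements above can be sketched in Python as follows. This is a minimal illustration, not the Cell Ranger implementation: the `Contig` and `Barcode` types, their field names, and the function names are hypothetical, and `n50` uses the standard N50 definition (the largest value such that entries at least that large account for half of the total), which is assumed here rather than confirmed by the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    # Hypothetical record; field names are illustrative only.
    productive: bool
    high_confidence: bool
    junction_umis: int  # number of UMIs supporting the junction region


@dataclass
class Barcode:
    contigs: List[Contig] = field(default_factory=list)
    # Read-pair count for each filtered UMI in this barcode.
    filtered_umi_read_pairs: List[int] = field(default_factory=list)


def n50(values: List[int]) -> int:
    """Standard N50 (assumed definition): walk values from largest to
    smallest and return the value at which the running sum first
    reaches half of the total."""
    ordered = sorted(values, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for v in ordered:
        running += v
        if running >= half:
            return v
    return 0


def passes_cell_requirements(bc: Barcode, n50_reads_per_umi: float,
                             denovo: bool = False) -> bool:
    # Requirement 1: a productive, confident contig; a lone such contig
    # needs more than one UMI supporting its junction region.
    if denovo:
        if not bc.contigs:
            return False
    else:
        good = [c for c in bc.contigs if c.productive and c.high_confidence]
        if not good:
            return False
        if len(good) == 1 and good[0].junction_umis <= 1:
            return False

    # Requirement 2: at least three filtered UMIs with >= 2 read pairs each.
    supported = [n for n in bc.filtered_umi_read_pairs if n >= 2]
    if len(supported) < 3:
        return False

    # Requirement 3: the best-supported filtered UMI must reach at least
    # 3% of the N50 computed across all barcodes.
    if max(bc.filtered_umi_read_pairs) < 0.03 * n50_reads_per_umi:
        return False

    return True
```

In this sketch the N50 is computed once over the read-pair counts of UMIs from every barcode and then passed to each per-barcode check, so the third requirement is relative to the library as a whole rather than to any single barcode.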

In addition to the three requirements listed above, Cell Ranger 3.1 introduced a filter to account for noise from plasma cells and B cells containing large amounts of RNA (as documented in the Cell Ranger 3.1 release notes). This filter (1) tightens the is_cell filter for low-frequency clones that share a chain with a higher-frequency or large clone, and (2) shrinks high-frequency clones to remove noise from mRNA leakage caused by sample processing (i.e. apparent expansion that is not due to genuine biological clonal expansion).

Additional cell filters are imposed during clonotype grouping.