Chromium Single Cell Immune Profiling
Cell Ranger 3.0, printed on 09/21/2023
Cell Calling Algorithm
Even though we see many putative cell barcodes in the data, only a fraction of them correspond to droplets that truly contained a cell. The remaining droplets generate background reads. The goal of this algorithm is to select the barcodes corresponding to droplets that contained cells.
Cell calling is performed independently of V(D)J read filtering and assembly. It proceeds in the following steps:
- Take all UMIs with Read Pairs per UMI (RPU) > 1.
- Compute the N50 RPU per barcode: the RPU value such that half of the barcode's read pairs come from UMIs with at least that many read pairs. The N50 statistic is an estimate of centrality that is robust to contamination by large numbers of low-value elements.
- Among the highest-read-pair barcodes that comprise 50% of the read pairs, compute the 99th percentile (P99) of the N50 RPU. Remove barcodes whose N50 RPU exceeds this P99.
- Fit a 2-component Gaussian mixture model (GMM) to the log(N50 RPU per barcode).
- Set the RPU threshold to the smallest N50 RPU among the barcodes in the higher-mean component.
- Take all UMIs whose RPU exceeds the RPU threshold.
- Fit a 2-component Gaussian mixture model to the log(Filtered UMIs per barcode).
- Call as cells the barcodes with posterior probability > 0.5 of belonging to the higher-mean component.
- Report as "Cell Count Confidence" the fraction of barcodes in the higher-mean component with posterior probability > 0.99. This indicates the quality of the separation between background and cell-associated barcodes.
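The N50 statistic and the final mixture-model step above can be illustrated with a minimal sketch. This is not Cell Ranger's implementation: the function names (`n50`, `fit_gmm2`, `call_cells`), the plain EM fit, its initialization, and the toy counts are all assumptions made for illustration. The sketch also assumes that "barcodes in the higher-mean component" means the barcodes called as cells, so Cell Count Confidence is computed as the share of called cells with posterior probability > 0.99.

```python
import math

def n50(values):
    """Largest v such that elements >= v account for at least half of the total."""
    total = sum(values)
    running = 0
    for v in sorted(values, reverse=True):
        running += v
        if 2 * running >= total:
            return v
    raise ValueError("n50 of an empty list is undefined")

def _normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm2(xs, iterations=200, min_var=1e-6):
    """Fit a 1-D, 2-component Gaussian mixture by EM (illustrative, not optimized).

    Returns, for each x, the posterior probability of the higher-mean component.
    """
    n = len(xs)
    ordered = sorted(xs)
    half = n // 2
    # Assumed initialization: means of the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    spread = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * _normal_pdf(x, means[k], variances[k]) for k in (0, 1)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsed component
    hi = 0 if means[0] > means[1] else 1
    return [r[hi] for r in resp]

def call_cells(umis_per_barcode):
    """umis_per_barcode: dict of barcode -> filtered UMI count (must be > 0)."""
    barcodes = list(umis_per_barcode)
    post = fit_gmm2([math.log(umis_per_barcode[b]) for b in barcodes])
    cells = [b for b, p in zip(barcodes, post) if p > 0.5]
    confident = sum(1 for p in post if p > 0.99)
    confidence = confident / len(cells) if cells else 0.0
    return cells, confidence

# Toy data: five background-like barcodes and four cell-like barcodes.
toy_counts = {"bg1": 1, "bg2": 2, "bg3": 1, "bg4": 2, "bg5": 1,
              "c1": 200, "c2": 250, "c3": 180, "c4": 220}
cells, confidence = call_cells(toy_counts)
print(sorted(cells), confidence)
print(n50([1, 1, 1, 10, 10]))  # many low values barely move the N50
```

In the `n50` example, the three low-value elements contribute little to the total, so the statistic stays at the high value; this is the robustness property the steps rely on. The RPU threshold step could reuse `fit_gmm2` on log(N50 RPU per barcode) in the same way.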