Cell Ranger 1.0, printed on 12/14/2019
The cellranger pipeline outputs several CSV files containing automated secondary analysis results. A subset of these results is used to render the Analysis View in the run summary.
Before clustering the cells, Principal Component Analysis (PCA) is run on the normalized filtered gene-barcode matrix to reduce the number of feature (gene) dimensions. This produces a projection of each cell onto the first 10 principal components.
    $ cd /home/jdoe/runs/sample345/outs
    $ head -2 analysis/pca/projection.csv
    Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
    AAACATACAACGAA-1,-0.2765,-5.7056,6.5324,-12.2736,-1.4390,-1.1656,-0.1754,-2.9748,3.3785,1.6539
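The projection file is a plain CSV keyed by barcode, so Python's standard csv module is enough to load it. The sketch below assumes only the layout shown in the excerpt (a Barcode column followed by numeric columns); `load_projection` is a hypothetical helper name, not part of cellranger. Because of that minimal assumption, the same function should also read analysis/tsne/projection.csv.

```python
import csv
import io


def load_projection(handle):
    """Hypothetical helper (not part of cellranger).

    Reads a projection CSV whose first column is 'Barcode' and whose
    remaining columns are numeric, returning (axis names, {barcode: coords}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    axes = header[1:]  # e.g. ['PC-1', ..., 'PC-10']
    coords = {}
    for row in reader:
        coords[row[0]] = [float(value) for value in row[1:]]
    return axes, coords


# The two lines shown above. For the real file, pass
# open("analysis/pca/projection.csv", newline="") instead.
sample = (
    "Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10\n"
    "AAACATACAACGAA-1,-0.2765,-5.7056,6.5324,-12.2736,-1.4390,"
    "-1.1656,-0.1754,-2.9748,3.3785,1.6539\n"
)
axes, pca = load_projection(io.StringIO(sample))
print(len(axes), pca["AAACATACAACGAA-1"][:2])  # 10 [-0.2765, -5.7056]
```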
This also produces a components matrix, which indicates how much each gene contributes to each principal component.
    $ head -2 analysis/pca/components.csv
    PC,ENSG00000228327,ENSG00000237491,ENSG00000177757,ENSG00000225880,...,ENSG00000160310
    1,-0.0044,0.0039,-0.0024,-0.0016,...,-0.0104
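One common use of the components matrix is to list the genes that drive a given principal component. The sketch below ranks genes by the absolute value of their loading, on the assumption that the sign only indicates direction along the component; `top_genes_for_pc` is a hypothetical helper, and the sample uses only the gene columns visible in the excerpt above (the elided columns are left out).

```python
import csv
import io


def top_genes_for_pc(handle, pc, n=3):
    """Hypothetical helper: the n genes with the largest |loading| on component `pc`."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[1:]  # one column per gene ID
    for row in reader:
        if int(row[0]) == pc:  # first column holds the PC number
            loadings = [float(v) for v in row[1:]]
            ranked = sorted(zip(genes, loadings), key=lambda gl: abs(gl[1]), reverse=True)
            return ranked[:n]
    raise ValueError(f"PC {pc} not found")


# Visible columns from the excerpt above; the elided '...' columns are omitted.
sample = (
    "PC,ENSG00000228327,ENSG00000237491,ENSG00000177757,ENSG00000225880,ENSG00000160310\n"
    "1,-0.0044,0.0039,-0.0024,-0.0016,-0.0104\n"
)
top = top_genes_for_pc(io.StringIO(sample), pc=1)
for gene, loading in top:
    print(gene, loading)
```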
After running PCA, t-distributed Stochastic Neighbor Embedding (t-SNE) is run to visualize cells in a 2-D space.
    $ head -5 analysis/tsne/projection.csv
    Barcode,TSNE-1,TSNE-2
    AAACATACAACGAA-1,-13.5494,1.4674
    AAACATACTACGCA-1,-2.7325,-10.6347
    AAACCGTGTCTCGC-1,12.9590,-1.6369
    AAACGCACAACCAC-1,-9.3585,-6.7300
K-means clustering is then run to group cells with similar expression profiles. K-means is run for each value of K from 2 to 10, where K is the number of clusters, and the results for each K are written to a separate directory.
    $ ls analysis/kmeans
    10_clusters  3_clusters  5_clusters  7_clusters  9_clusters
    2_clusters   4_clusters  6_clusters  8_clusters
    $ ls analysis/kmeans/3_clusters
    clusters.csv  differential_expression.csv
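Note that ls sorts the K directories as strings, which is why 10_clusters is listed before 2_clusters. When looping over every K in a script, it helps to sort by the numeric prefix instead. This is a minimal sketch assuming the directory names follow the `K_clusters` pattern shown above; `sort_kmeans_dirs` is a hypothetical helper.

```python
import re


def sort_kmeans_dirs(names):
    """Hypothetical helper: sort 'K_clusters' names by K, ignoring other entries."""
    pattern = re.compile(r"^(\d+)_clusters$")
    found = []
    for name in names:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


# The listing shown above; for a real run, pass os.listdir("analysis/kmeans").
listing = ["10_clusters", "3_clusters", "5_clusters", "7_clusters", "9_clusters",
           "2_clusters", "4_clusters", "6_clusters", "8_clusters"]
ordered = sort_kmeans_dirs(listing)
for name in ordered:
    print(f"analysis/kmeans/{name}/clusters.csv")
```

On systems with GNU coreutils, `ls -v analysis/kmeans` gives a similar numeric ordering from the shell.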
For each K, cellranger produces cluster assignments for each cell.
    $ head -5 analysis/kmeans/3_clusters/clusters.csv
    Barcode,Cluster
    AAACATACAACGAA-1,2
    AAACATACTACGCA-1,2
    AAACCGTGTCTCGC-1,1
    AAACGCACAACCAC-1,3
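The t-SNE projection and the cluster assignments share the Barcode column, so joining them gives each cell's 2-D coordinates together with its cluster label, which is the data a t-SNE plot colored by cluster needs. The sketch below is a minimal example assuming the column layouts shown in the two excerpts; `read_rows` and `join_tsne_with_clusters` are hypothetical helpers, and the sample rows are the ones printed above for the same four barcodes.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Hypothetical helper: yield (barcode, other fields) from a CSV with a header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for row in reader:
        yield row[0], row[1:]


def join_tsne_with_clusters(tsne_handle, clusters_handle):
    """Return {barcode: (tsne1, tsne2, cluster)} for barcodes present in both files."""
    clusters = {bc: int(fields[0]) for bc, fields in read_rows(clusters_handle)}
    joined = {}
    for bc, fields in read_rows(tsne_handle):
        if bc in clusters:
            x, y = (float(v) for v in fields)
            joined[bc] = (x, y, clusters[bc])
    return joined


tsne_csv = (
    "Barcode,TSNE-1,TSNE-2\n"
    "AAACATACAACGAA-1,-13.5494,1.4674\n"
    "AAACATACTACGCA-1,-2.7325,-10.6347\n"
    "AAACCGTGTCTCGC-1,12.9590,-1.6369\n"
    "AAACGCACAACCAC-1,-9.3585,-6.7300\n"
)
clusters_csv = (
    "Barcode,Cluster\n"
    "AAACATACAACGAA-1,2\n"
    "AAACATACTACGCA-1,2\n"
    "AAACCGTGTCTCGC-1,1\n"
    "AAACGCACAACCAC-1,3\n"
)
joined = join_tsne_with_clusters(io.StringIO(tsne_csv), io.StringIO(clusters_csv))
sizes = Counter(cluster for _, _, cluster in joined.values())
print(sorted(sizes.items()))  # [(1, 1), (2, 2), (3, 1)]
```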
cellranger also produces a table indicating which genes are differentially expressed in each cluster relative to the other clusters. Each gene is assigned a weight within each cluster; this weight is computed by the pipeline's differential expression algorithm and is used to determine the most differentially expressed genes within each cluster. Next to each weight is a column giving the number of UMI counts per cell for that gene within that cluster.
    $ head -5 analysis/kmeans/3_clusters/differential_expression.csv
    Gene ID,Gene Name,Cluster 1 Weight,Cluster 1 UMI counts/cell,Cluster 2 Weight,Cluster 2 UMI counts/cell,Cluster 3 Weight,Cluster 3 UMI counts/cell
    ENSG00000228327,RP11-206L10.2,28.8050,0.01028,-25.3453,0.0,-3.4597,0.00346
    ENSG00000237491,RP11-206L10.9,2.7509,0.01234,12.4303,0.0107,-15.1813,0.0069
    ENSG00000177757,FAM87B,25.0959,0.0020,-9.8430,0.0,-15.2528,0.0
    ENSG00000225880,LINC00115,29.2860,0.0205,1.1274,0.01340,-30.4135,0.0069
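To pull out the top genes for each cluster, the "Cluster N Weight" columns can be located by name and sorted. This sketch assumes that a larger weight means a gene is more strongly up-regulated in that cluster, which the text above implies but does not state outright; `top_genes_per_cluster` is a hypothetical helper, and the sample is the four-row excerpt above.

```python
import csv
import io
import re


def top_genes_per_cluster(handle, n=2):
    """Hypothetical helper: {cluster: [(gene_name, weight), ...]}, highest weight first."""
    reader = csv.DictReader(handle)
    weight_cols = {}
    for col in reader.fieldnames:
        match = re.fullmatch(r"Cluster (\d+) Weight", col)
        if match:
            weight_cols[int(match.group(1))] = col
    per_cluster = {k: [] for k in weight_cols}
    for row in reader:
        for k, col in weight_cols.items():
            per_cluster[k].append((row["Gene Name"], float(row[col])))
    return {k: sorted(genes, key=lambda g: g[1], reverse=True)[:n]
            for k, genes in per_cluster.items()}


sample = (
    "Gene ID,Gene Name,Cluster 1 Weight,Cluster 1 UMI counts/cell,"
    "Cluster 2 Weight,Cluster 2 UMI counts/cell,Cluster 3 Weight,Cluster 3 UMI counts/cell\n"
    "ENSG00000228327,RP11-206L10.2,28.8050,0.01028,-25.3453,0.0,-3.4597,0.00346\n"
    "ENSG00000237491,RP11-206L10.9,2.7509,0.01234,12.4303,0.0107,-15.1813,0.0069\n"
    "ENSG00000177757,FAM87B,25.0959,0.0020,-9.8430,0.0,-15.2528,0.0\n"
    "ENSG00000225880,LINC00115,29.2860,0.0205,1.1274,0.01340,-30.4135,0.0069\n"
)
top = top_genes_per_cluster(io.StringIO(sample))
for cluster, genes in sorted(top.items()):
    print(cluster, [name for name, _ in genes])
```

On the real file, n would typically be larger; with only four genes in the excerpt, n=2 keeps the output readable.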
If you analyzed a multi-species experiment, the analysis output will look different. For example, the human-mouse mixing experiment is run to verify system functionality. It consists of mixing approximately 600 human (HEK293T) cells and 600 mouse (3T3) cells in a 1:1 ratio.
cellranger produces a single analysis CSV file indicating whether each GEM contains only a single human cell (hg19), a single mouse cell (mm10) or multiple mouse and human cells (Multiplet).
    $ cd /home/jdoe/runs/sample345/outs
    $ head -5 analysis/gem_classification.csv
    barcode,hg19,mm10,call
    AAACATACACCTCC-1,3,815,mm10
    AAACATACACCTGA-1,14,780,mm10
    AAACATACACGTGT-1,2,439,mm10
    AAACATACAGACTC-1,700,776,Multiplet
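A quick way to summarize a species-mixing run is to count how many barcodes received each call. The sketch below assumes the column names shown in the excerpt; `summarize_gem_calls` is a hypothetical helper, and the four sample rows are only the excerpt above, so the resulting fraction says nothing about a real run.

```python
import csv
import io
from collections import Counter


def summarize_gem_calls(handle):
    """Hypothetical helper: count barcodes per call (hg19, mm10, Multiplet)."""
    return Counter(row["call"] for row in csv.DictReader(handle))


sample = (
    "barcode,hg19,mm10,call\n"
    "AAACATACACCTCC-1,3,815,mm10\n"
    "AAACATACACCTGA-1,14,780,mm10\n"
    "AAACATACACGTGT-1,2,439,mm10\n"
    "AAACATACAGACTC-1,700,776,Multiplet\n"
)
calls = summarize_gem_calls(io.StringIO(sample))
total = sum(calls.values())
print(dict(calls), f"observed multiplet fraction: {calls['Multiplet'] / total:.2f}")
```

Because a GEM holding two human cells or two mouse cells is still called hg19 or mm10, the Multiplet fraction only counts cross-species multiplets and should be read as a lower bound on the true multiplet rate.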