10x Genomics
Chromium Single Cell Gene Expression

Run Analysis

The cellranger pipeline outputs several CSV files that contain automated secondary analysis results. A subset of these results is used to render the Analysis View in the run summary.

Dimensionality Reduction

Before clustering the cells, Principal Component Analysis (PCA) is run on the normalized filtered gene-barcode matrix to reduce the number of feature (gene) dimensions. This produces a projection of each cell onto the first 10 principal components.

$ cd /home/jdoe/runs/sample345/outs
$ head -2 analysis/pca/projection.csv
Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
AAACATACAACGAA-1,-0.2765,-5.7056,6.5324,-12.2736,-1.4390,-1.1656,-0.1754,-2.9748,3.3785,1.6539

PCA also produces a components matrix, which indicates how much each gene contributed to each principal component.

$ head -2 analysis/pca/components.csv
PC,ENSG00000228327,ENSG00000237491,ENSG00000177757,ENSG00000225880,...,ENSG00000160310
1,-0.0044,0.0039,-0.0024,-0.0016,...,-0.0104
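
One way to read the components matrix is to rank genes by the absolute value of their contribution to a given principal component. The following is a minimal Python sketch, not part of cellranger: the function name top_genes is hypothetical, and the sample text contains only the columns visible in the listing above (the elided columns are left out).

```python
import csv
import io

# Excerpt of analysis/pca/components.csv.
# Only the columns shown in the listing are included; elided ones are omitted.
SAMPLE = """PC,ENSG00000228327,ENSG00000237491,ENSG00000177757,ENSG00000225880,ENSG00000160310
1,-0.0044,0.0039,-0.0024,-0.0016,-0.0104
"""


def top_genes(handle, pc, n=3):
    """Return the n gene IDs with the largest absolute value on component pc."""
    for row in csv.DictReader(handle):
        if row["PC"] == str(pc):
            loadings = {gene: float(v) for gene, v in row.items() if gene != "PC"}
            return sorted(loadings, key=lambda g: abs(loadings[g]), reverse=True)[:n]
    raise KeyError(f"component {pc} not found")


top = top_genes(io.StringIO(SAMPLE), pc=1)
print(top)
```

To run this against a real file, pass open("analysis/pca/components.csv", newline="") in place of the io.StringIO object.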

After running PCA, t-distributed Stochastic Neighbor Embedding (t-SNE) is run to visualize cells in a 2-D space.

$ head -5 analysis/tsne/projection.csv
Barcode,TSNE-1,TSNE-2
AAACATACAACGAA-1,-13.5494,1.4674
AAACATACTACGCA-1,-2.7325,-10.6347
AAACCGTGTCTCGC-1,12.9590,-1.6369
AAACGCACAACCAC-1,-9.3585,-6.7300

Clustering and Differential Expression

K-means clustering is then run to group together cells that have similar expression profiles. K-means is run once for each K from 2 to 10, where K is the number of clusters. The results for each K are written to their own directory.

$ ls analysis/kmeans
10_clusters  3_clusters  5_clusters  7_clusters  9_clusters
2_clusters   4_clusters  6_clusters  8_clusters
$ ls analysis/kmeans/3_clusters
clusters.csv  differential_expression.csv

For each K, cellranger produces cluster assignments for each cell.

$ head -5 analysis/kmeans/3_clusters/clusters.csv
Barcode,Cluster
AAACATACAACGAA-1,2
AAACATACTACGCA-1,2
AAACCGTGTCTCGC-1,1
AAACGCACAACCAC-1,3
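
A quick check on any clustering is how many cells landed in each cluster. The sketch below tallies the Cluster column using only the Python standard library. The function name cluster_sizes is an illustrative choice, and the sample text is the four rows shown above rather than a full file.

```python
import csv
import io
from collections import Counter

# The first rows of analysis/kmeans/3_clusters/clusters.csv, as listed above.
SAMPLE = """Barcode,Cluster
AAACATACAACGAA-1,2
AAACATACTACGCA-1,2
AAACCGTGTCTCGC-1,1
AAACGCACAACCAC-1,3
"""


def cluster_sizes(handle):
    """Count how many barcodes were assigned to each cluster."""
    return Counter(int(row["Cluster"]) for row in csv.DictReader(handle))


sizes = cluster_sizes(io.StringIO(SAMPLE))
for cluster in sorted(sizes):
    print(cluster, sizes[cluster])
```

Running the same function over each K directory gives a rough view of how cells are redistributed as K grows.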

cellranger also produces a table indicating which genes are differentially expressed in each cluster relative to the other clusters. Each gene is assigned a weight within each cluster. This weight is computed by the pipeline's differential expression algorithm and is used to determine the most differentially expressed genes within each cluster. Next to each weight column, a second column reports the UMI counts per cell for that gene within that cluster.

$ head -5 analysis/kmeans/3_clusters/differential_expression.csv
Gene ID,Gene Name,Cluster 1 Weight,Cluster 1 UMI counts/cell,Cluster 2 Weight,Cluster 2 UMI counts/cell,Cluster 3 Weight,Cluster 3 UMI counts/cell
ENSG00000228327,RP11-206L10.2,28.8050,0.01028,-25.3453,0.0,-3.4597,0.00346
ENSG00000237491,RP11-206L10.9,2.7509,0.01234,12.4303,0.0107,-15.1813,0.0069
ENSG00000177757,FAM87B,25.0959,0.0020,-9.8430,0.0,-15.2528,0.0
ENSG00000225880,LINC00115,29.2860,0.0205,1.1274,0.01340,-30.4135,0.0069
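
The weight columns can be used to pull out the top-ranked genes for one cluster. The sketch below sorts by the "Cluster K Weight" column in descending order. Treating a larger weight as a stronger ranking is an assumption made for illustration, and top_genes_for_cluster is a hypothetical helper, not a cellranger function.

```python
import csv
import io

# The first rows of analysis/kmeans/3_clusters/differential_expression.csv, as listed above.
SAMPLE = """Gene ID,Gene Name,Cluster 1 Weight,Cluster 1 UMI counts/cell,Cluster 2 Weight,Cluster 2 UMI counts/cell,Cluster 3 Weight,Cluster 3 UMI counts/cell
ENSG00000228327,RP11-206L10.2,28.8050,0.01028,-25.3453,0.0,-3.4597,0.00346
ENSG00000237491,RP11-206L10.9,2.7509,0.01234,12.4303,0.0107,-15.1813,0.0069
ENSG00000177757,FAM87B,25.0959,0.0020,-9.8430,0.0,-15.2528,0.0
ENSG00000225880,LINC00115,29.2860,0.0205,1.1274,0.01340,-30.4135,0.0069
"""


def top_genes_for_cluster(handle, cluster, n=2):
    """Return (gene name, weight) pairs for the n highest weights in one cluster.

    Assumption: a larger weight ranks a gene higher for that cluster.
    """
    column = f"Cluster {cluster} Weight"
    rows = sorted(csv.DictReader(handle), key=lambda r: float(r[column]), reverse=True)
    return [(r["Gene Name"], float(r[column])) for r in rows[:n]]


for name, weight in top_genes_for_cluster(io.StringIO(SAMPLE), cluster=1):
    print(name, weight)
```

The column name is built from the cluster number, so the same helper works for any K directory as long as the header follows the pattern shown above.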

Multiple Species

If you analyzed a multi-species experiment, the analysis output differs from the single-species output described above. For example, the human-mouse mixing experiment is run to verify system functionality. It consists of mixing approximately 600 human (HEK293T) cells and 600 mouse (3T3) cells in a 1:1 ratio.

cellranger produces a single analysis CSV file indicating whether each GEM contains only a single human cell (hg19), a single mouse cell (mm10) or multiple mouse and human cells (Multiplet).

$ cd /home/jdoe/runs/sample345/outs
$ head -5 analysis/gem_classification.csv
barcode,hg19,mm10,call
AAACATACACCTCC-1,3,815,mm10
AAACATACACCTGA-1,14,780,mm10
AAACATACACGTGT-1,2,439,mm10
AAACATACAGACTC-1,700,776,Multiplet
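
To summarize a mixing experiment, the call column can be tallied and the share of barcodes called Multiplet computed. This is a minimal sketch using the four rows shown above. The names call_counts and multiplet_fraction are illustrative, and a fraction computed from four rows says nothing about a real run.

```python
import csv
import io
from collections import Counter

# The first rows of analysis/gem_classification.csv, as listed above.
SAMPLE = """barcode,hg19,mm10,call
AAACATACACCTCC-1,3,815,mm10
AAACATACACCTGA-1,14,780,mm10
AAACATACACGTGT-1,2,439,mm10
AAACATACAGACTC-1,700,776,Multiplet
"""


def call_counts(handle):
    """Count barcodes per value of the call column (hg19, mm10, Multiplet)."""
    return Counter(row["call"] for row in csv.DictReader(handle))


counts = call_counts(io.StringIO(SAMPLE))
total = sum(counts.values())
# Share of classified barcodes whose call is Multiplet.
multiplet_fraction = counts["Multiplet"] / total
print(dict(counts), multiplet_fraction)
```

To run this on a real file, pass open("analysis/gem_classification.csv", newline="") in place of the io.StringIO object.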