
# Run Analysis

The count pipeline outputs several CSV files that contain automated secondary analysis results. A subset of these results is used to render the Analysis View in the run summary.

## Dimensionality Reduction

Before clustering, Principal Component Analysis (PCA) is run on the normalized filtered feature-barcode matrix to reduce the number of feature (gene) dimensions. Only gene expression features are used as PCA features. PCA produces four output files. The first is a projection of each spot onto the first N principal components. By default N=10 (N=100 when chemistry batch correction is enabled).

$ cd /home/jdoe/runs/sample345/outs
$ head -2 analysis/pca/10_components/projection.csv
Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
AAACATACAACGAA-1,-0.2765,-5.7056,6.5324,-12.2736,-1.4390,-1.1656,-0.1754,-2.9748,3.3785,1.6539

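To work with the projection outside the shell, each row can be read into a numeric vector keyed by barcode. The following is a minimal Python sketch using only the standard library. The helper name `load_projection` is hypothetical, and the sample text reuses the two lines from the `head -2` output above.

```python
import csv
import io


def load_projection(handle):
    """Return (component_names, {barcode: [float, ...]}) from a projection CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    components = header[1:]  # every column after "Barcode"
    vectors = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        vectors[row[0]] = [float(value) for value in row[1:]]
    return components, vectors


# Sample text copied from the listing above. For a real run, pass
# open("analysis/pca/10_components/projection.csv", newline="") instead.
sample = io.StringIO(
    "Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10\n"
    "AAACATACAACGAA-1,-0.2765,-5.7056,6.5324,-12.2736,-1.4390,"
    "-1.1656,-0.1754,-2.9748,3.3785,1.6539\n"
)
components, vectors = load_projection(sample)
print(len(components), vectors["AAACATACAACGAA-1"][:2])
```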

The second file is a components matrix that records the loadings, that is, how much each feature contributed to each principal component. Features that were not included in PCA have all of their loading values set to zero.

$ head -2 analysis/pca/10_components/components.csv
PC,ENSG00000228327,ENSG00000237491,ENSG00000177757,ENSG00000225880,...,ENSG00000160310
1,-0.0044,0.0039,-0.0024,-0.0016,...,-0.0104


The third file records the proportion of total variance explained by each principal component. When choosing the number of principal components that are significant, it is useful to look at the plot of variance explained as a function of PC rank - when the numbers start to flatten out, subsequent PCs are unlikely to represent meaningful variation in the data.

$ head -5 analysis/pca/10_components/variance.csv
PC,Proportion.Variance.Explained
1,0.0056404970744118104
2,0.0038897311237809061
3,0.0028803714818085419
4,0.0020830581822081206

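One way to see where the variance curve flattens without plotting it is to tabulate the cumulative proportion and the drop from one PC to the next. This is a rough sketch, not the pipeline's own procedure. The function name `variance_profile` is hypothetical, and the sample values are the four rows listed above.

```python
import csv
import io


def variance_profile(handle):
    """Return a list of (pc, proportion, cumulative, drop_from_previous)."""
    rows = []
    cumulative = 0.0
    previous = None
    for record in csv.DictReader(handle):
        proportion = float(record["Proportion.Variance.Explained"])
        cumulative += proportion
        # The drop shrinks toward zero once the curve flattens out.
        drop = None if previous is None else previous - proportion
        rows.append((int(record["PC"]), proportion, cumulative, drop))
        previous = proportion
    return rows


# Rows copied from the variance.csv listing above.
sample = io.StringIO(
    "PC,Proportion.Variance.Explained\n"
    "1,0.0056404970744118104\n"
    "2,0.0038897311237809061\n"
    "3,0.0028803714818085419\n"
    "4,0.0020830581822081206\n"
)
profile = variance_profile(sample)
for pc, proportion, cumulative, drop in profile:
    print(pc, proportion, cumulative, drop)
```

Where to cut off is a judgment call; the drop column only makes the flattening easier to spot.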

The final file lists the normalized dispersion of each feature, after binning features by their mean expression across the dataset. This provides a useful measure of variability of each feature.

$ head -5 analysis/pca/10_components/dispersion.csv
Feature,Normalized.Dispersion
ENSG00000228327,2.0138970131886671
ENSG00000237491,1.3773662040549017
ENSG00000177757,-0.28102027567224191
ENSG00000225880,1.9887312950109921


## t-SNE

After running PCA, t-distributed Stochastic Neighbor Embedding (t-SNE) is run to visualize spots in a 2-D space.

$ head -5 analysis/tsne/2_components/projection.csv
Barcode,TSNE-1,TSNE-2
AAACATACAACGAA-1,-13.5494,1.4674
AAACATACTACGCA-1,-2.7325,-10.6347
AAACCGTGTCTCGC-1,12.9590,-1.6369
AAACGCACAACCAC-1,-9.3585,-6.7300


## Clustering

Clustering is then run to group spots with similar expression profiles, based on their projection into PCA space. Graph-based clustering (under graphclust) is run once, as it does not require a pre-specified number of clusters. K-means (under kmeans) is run for each K from 2 to N, where K is the number of clusters and N=10 by default. The results for each K are written to a separate directory.

$ ls analysis/clustering
graphclust  kmeans_10_clusters  kmeans_2_clusters  kmeans_3_clusters  kmeans_4_clusters  kmeans_5_clusters  kmeans_6_clusters  kmeans_7_clusters  kmeans_8_clusters  kmeans_9_clusters


For each clustering, spaceranger produces cluster assignments for each spot.

$ head -5 analysis/clustering/kmeans_3_clusters/clusters.csv
Barcode,Cluster
AAACATACAACGAA-1,2
AAACATACTACGCA-1,2
AAACCGTGTCTCGC-1,1
AAACGCACAACCAC-1,3

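Because the cluster assignments and the t-SNE projection share the Barcode column, the two files can be joined to label each 2-D point with its cluster, for example before coloring a scatter plot. This is a minimal sketch under that assumption. The function name `points_by_cluster` is hypothetical, and the sample rows are the ones shown in the t-SNE and clustering listings above.

```python
import csv
import io
from collections import defaultdict


def points_by_cluster(tsne_handle, clusters_handle):
    """Group t-SNE coordinates by cluster label, joining the files on Barcode."""
    cluster_of = {
        record["Barcode"]: int(record["Cluster"])
        for record in csv.DictReader(clusters_handle)
    }
    grouped = defaultdict(list)
    for record in csv.DictReader(tsne_handle):
        cluster = cluster_of.get(record["Barcode"])
        if cluster is None:
            continue  # barcode absent from the clustering file
        grouped[cluster].append((float(record["TSNE-1"]), float(record["TSNE-2"])))
    return dict(grouped)


# Rows copied from the two listings above.
tsne = io.StringIO(
    "Barcode,TSNE-1,TSNE-2\n"
    "AAACATACAACGAA-1,-13.5494,1.4674\n"
    "AAACATACTACGCA-1,-2.7325,-10.6347\n"
    "AAACCGTGTCTCGC-1,12.9590,-1.6369\n"
    "AAACGCACAACCAC-1,-9.3585,-6.7300\n"
)
clusters = io.StringIO(
    "Barcode,Cluster\n"
    "AAACATACAACGAA-1,2\n"
    "AAACATACTACGCA-1,2\n"
    "AAACCGTGTCTCGC-1,1\n"
    "AAACGCACAACCAC-1,3\n"
)
grouped = points_by_cluster(tsne, clusters)
for cluster in sorted(grouped):
    print(cluster, len(grouped[cluster]))
```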

## Differential Expression

spaceranger also produces a table indicating which features are differentially expressed in each cluster relative to all other clusters. For each feature we compute three values per cluster:

• The mean UMI counts per spot of this feature in cluster i
• The log2 fold-change of this feature's expression in cluster i relative to all other clusters
• The p-value denoting significance of this feature's expression in cluster i relative to other clusters, adjusted to account for the number of hypotheses (i.e. number of features) being tested.

This table is stored in a different directory from the clustering results but follows the same structure, with each clustering separated into its own directory.

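The three per-cluster values can be combined to shortlist features that are enriched in one cluster. The sketch below assumes the column naming pattern used by differential_expression.csv ("Cluster K Log2 fold change", "Cluster K Adjusted p value"). The function name `upregulated_features` and both cutoffs are illustrative choices, not pipeline defaults. The two sample rows are taken from the kmeans_3_clusters table.

```python
import csv
import io


def upregulated_features(handle, cluster, max_adjusted_p=0.05, min_log2_fc=1.0):
    """Return (feature name, log2 fold change) pairs passing both cutoffs."""
    fc_column = f"Cluster {cluster} Log2 fold change"
    p_column = f"Cluster {cluster} Adjusted p value"
    hits = []
    for record in csv.DictReader(handle):
        log2_fc = float(record[fc_column])
        adjusted_p = float(record[p_column])
        if adjusted_p <= max_adjusted_p and log2_fc >= min_log2_fc:
            hits.append((record["Feature Name"], log2_fc))
    # Strongest fold change first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Header rebuilt from the documented column pattern for three clusters.
header = ["Feature ID", "Feature Name"] + [
    f"Cluster {k} {field}"
    for k in (1, 2, 3)
    for field in ("Mean UMI Counts", "Log2 fold change", "Adjusted p value")
]
SAMPLE = (
    ",".join(header) + "\n"
    "ENSG00000228327,RP11-206L10.2,0.0056858989363338264,2.6207666981569986,"
    "0.00052155805898912184,0.0,-0.75299726644507814,0.64066099091888962,"
    "0.00071455453829430329,-2.3725403666493312,0.0043023680184636837\n"
    "ENSG00000225880,LINC00115,0.0003790599290889218,-5.71015017995762,"
    "8.4751637615375386e-28,0.20790015775229512,7.965820981010868,"
    "1.3374521290889345e-46,0.0017863863457357582,-2.2065304152104019,"
    "0.00059189960914085744\n"
)
for k in (1, 2, 3):
    print(k, upregulated_features(io.StringIO(SAMPLE), k))
```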
$ head -5 analysis/diffexp/kmeans_3_clusters/differential_expression.csv
Feature ID,Feature Name,Cluster 1 Mean UMI Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean UMI Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean UMI Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value
ENSG00000228327,RP11-206L10.2,0.0056858989363338264,2.6207666981569986,0.00052155805898912184,0.0,-0.75299726644507814,0.64066099091888962,0.00071455453829430329,-2.3725403666493312,0.0043023680184636837
ENSG00000237491,RP11-206L10.9,0.00012635330969630726,-0.31783275717885928,0.40959138980118809,0.0,3.8319652342760779,0.11986963938734894,0.0,0.56605908868652577,0.39910771338768203
ENSG00000177757,FAM87B,0.0,-2.9027952579000154,0.0,0.0,3.2470027335549219,0.19129034227967889,0.00071455453829430329,3.1510215894076818,0.0
ENSG00000225880,LINC00115,0.0003790599290889218,-5.71015017995762,8.4751637615375386e-28,0.20790015775229512,7.965820981010868,1.3374521290889345e-46,0.0017863863457357582,-2.2065304152104019,0.00059189960914085744


## Spatial enrichment

spaceranger produces a table of Moran's I values for each feature when specific conditions are met:

• The tissue must cover at least 30 spots
• The feature must be detected in at least 10 spots
• The feature must have a total UMI count of at least 20

Moran's I ranges from -1 (perfectly dispersed) to 1 (perfectly enriched), but in biological samples values significantly below 0 are unexpected. A p-value is provided, as well as an adjusted p-value which is corrected using the Benjamini-Hochberg method for multiple comparisons.

$ head -5 outs/spatial_enrichment.csv
Feature ID,Feature Name,I,P value,Adjusted p value,Feature Counts in Spots Under Tissue,Median Normalized Average Counts,Barcodes Detected per Feature
ENSG00000108821,COL1A1,0.7537685359890554,0.0,0.0,233252,81.82037652661849,2436
ENSG00000164692,COL1A2,0.7465835682088932,0.0,0.0,178333,64.92945395285535,2437
ENSG00000168542,COL3A1,0.6872801298764767,0.0,0.0,77581,27.93455688536169,2405
ENSG00000113140,SPARC,0.6519299969668487,0.0,0.0,84647,30.836280395414637,2428

The spatial_enrichment.csv file is located in the outs directory of every spaceranger count run.

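For readers unfamiliar with the Benjamini-Hochberg correction behind the "Adjusted p value" column, the textbook procedure can be sketched in a few lines of Python. This illustrates the standard method only and is not taken from the spaceranger source. The function name `benjamini_hochberg` is hypothetical, and the input p-values are made-up numbers chosen for the demonstration, not pipeline output.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical features.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print([round(value, 4) for value in adjusted])
```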
## Downstream analysis in R

Data structures produced by Visium can be analyzed and visualized in R. See Secondary Analysis in R for instructions.