
10x Genomics
Chromium Single Cell ATAC

Run Analysis

The count pipeline outputs several CSV files that contain automated secondary analysis results. A subset of these results is used to render the Cell Clustering view in the run summary.

Dimensionality Reduction

Before clustering the cells, Latent Semantic Analysis (LSA) is run on the normalized filtered peak-barcode matrix to reduce the number of feature (peak) dimensions. This produces a projection of each cell onto the first N components (default N=15). One may alternatively choose Principal Component Analysis (PCA) or Probabilistic Latent Semantic Analysis (PLSA) to perform dimensionality reduction in the pipeline. All of these methods provide the following basic CSV output files.

$ cd /home/jdoe/runs/sample345/outs
$ head -2 analysis/lsa/15_components/projection.csv
Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10,PC-11,PC-12,PC-13,PC-14,PC-15
AAATGAGCAATCAGGG-1,-2.0256188855237585,-20.464971914743963,2.4066208658862194,-0.9789882112497361,-0.09345960806751374,-8.483300343102174,-5.672765504454421,18.312955842984056,5.6927438340737195,-3.0378744705134992,0.3959335790734238,-4.93326991505897,9.485264727952154,0.2107363858043646,0.948135821430962

This also produces a components matrix which indicates how much each peak contributed to each component.

$ head -2 analysis/lsa/15_components/components.csv
PC,chr1:9695143-9697582,chr1:9698212-9701041,...
1,-0.5482991923678618,-0.6374211593177428,...

The pipeline also reports the proportion of total variance explained by each component. When choosing how many components are significant, it is useful to plot the variance explained as a function of component rank: once the curve starts to flatten out, subsequent components are unlikely to represent meaningful variation in the data.

$ head -5 analysis/lsa/15_components/variance.csv
PC,Proportion.Variance.Explained
1,0.8452609977911579
2,0.032765042936590785
3,0.026127180307558735
4,0.01667142686188944
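
As a rough illustration of reading the variance table, the sketch below accumulates the proportion of variance explained and lists the components that individually exceed a cutoff. It is a minimal example and not part of the pipeline: the function names and the 0.02 cutoff are assumptions chosen for illustration, and the sample rows are copied from the excerpt above rather than read from a real run.

```python
import csv
import io

# Sample rows copied from the variance.csv excerpt above. A real run would
# open analysis/lsa/15_components/variance.csv instead of this string.
SAMPLE = """PC,Proportion.Variance.Explained
1,0.8452609977911579
2,0.032765042936590785
3,0.026127180307558735
4,0.01667142686188944
"""


def read_variance(handle):
    """Return (component, proportion) pairs in file order."""
    reader = csv.DictReader(handle)
    return [(int(row["PC"]), float(row["Proportion.Variance.Explained"]))
            for row in reader]


def cumulative(pairs):
    """Return (component, running total of variance explained) pairs."""
    total = 0.0
    result = []
    for component, proportion in pairs:
        total += proportion
        result.append((component, total))
    return result


def components_above(pairs, min_proportion):
    """Hypothetical cutoff: components that each explain at least min_proportion."""
    return [component for component, proportion in pairs
            if proportion >= min_proportion]


pairs = read_variance(io.StringIO(SAMPLE))
for component, running in cumulative(pairs):
    print(f"component {component}: cumulative proportion {running:.4f}")

# The 0.02 cutoff is an arbitrary illustration, not a pipeline default.
print(components_above(pairs, 0.02))
```

A fixed cutoff is only a stand-in for inspecting the plot by eye; the appropriate number of components depends on the dataset.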

We also compute the normalized dispersion of each peak, after binning peaks by their mean expression across the dataset. This provides a useful measure of variability of each peak.

$ head -5 analysis/lsa/15_components/dispersion.csv
Feature,Normalized.Dispersion
chr1:9695143-9697582,0.02029960904777695
chr1:9698212-9701041,0.10379770925583033
chr1:9825253-9827762,-1.0
chr1:9829746-9830116,25.528012093307737
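
One way to use this table is to rank peaks by normalized dispersion and keep the most variable ones. The sketch below does this with the standard library; the function name `top_variable_peaks` and the choice of `n` are assumptions made for illustration, and the sample rows are copied from the excerpt above.

```python
import csv
import io

# Sample rows copied from the dispersion.csv excerpt above. A real run would
# open analysis/lsa/15_components/dispersion.csv instead of this string.
SAMPLE = """Feature,Normalized.Dispersion
chr1:9695143-9697582,0.02029960904777695
chr1:9698212-9701041,0.10379770925583033
chr1:9825253-9827762,-1.0
chr1:9829746-9830116,25.528012093307737
"""


def top_variable_peaks(handle, n):
    """Return the n (peak, dispersion) pairs with the highest dispersion."""
    reader = csv.DictReader(handle)
    rows = [(row["Feature"], float(row["Normalized.Dispersion"]))
            for row in reader]
    # Sort from most to least variable.
    rows.sort(key=lambda item: item[1], reverse=True)
    return rows[:n]


top = top_variable_peaks(io.StringIO(SAMPLE), 2)
for peak, dispersion in top:
    print(peak, dispersion)
```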

Visualization

After dimensionality reduction, t-distributed Stochastic Neighbor Embedding (t-SNE) is run to visualize cells in a 2-D space.

$ head -5 analysis/tsne/2_components/projection.csv
Barcode,TSNE-1,TSNE-2
AAATGAGCAATCAGGG-1,1.552159628302055,4.434829693735686
AACAAAGCACCTATTT-1,3.2188609791527667,0.03569781940043248
AACCTTTCAATGATGA-1,-3.8319704788291475,-1.092848944953291
AACTTGGCATGGCCGT-1,-4.226692514189564,0.3351938808086092

Clustering

Clustering is then run to group together cells that have similar accessibility profiles, based on their projection into the lower-dimensional space. Graph-based clustering (under graphclust) is run once, as it does not require prespecifying the number of clusters. For PCA, K-means (under kmeans) is run for K=2,...,N, where K is the number of clusters. For LSA or PLSA, spherical k-means (under kmeans) is run over the same range of K. By default N=10. The results for each K are separated into their own directories.

$ ls analysis/clustering
graphclust          kmeans_4_clusters   kmeans_7_clusters   kmeans_10_clusters
kmeans_2_clusters   kmeans_5_clusters   kmeans_8_clusters
kmeans_3_clusters   kmeans_6_clusters   kmeans_9_clusters

For each clustering, cellranger-atac produces cluster assignments for each cell.

$ head -5 analysis/clustering/kmeans_3_clusters/clusters.csv
Barcode,Cluster
AAATGAGCAATCAGGG-1,2
AACAAAGCACCTATTT-1,1
AACCTTTCAATGATGA-1,3
AACTTGGCATGGCCGT-1,3
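
A quick sanity check on any clustering is to count how many barcodes fall into each cluster. The sketch below is a minimal example using only the standard library; the function name `cluster_sizes` is an assumption for illustration, and the sample rows are the four barcodes from the excerpt above rather than a full file.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the clusters.csv excerpt above. A real run would
# open analysis/clustering/kmeans_3_clusters/clusters.csv instead.
SAMPLE = """Barcode,Cluster
AAATGAGCAATCAGGG-1,2
AACAAAGCACCTATTT-1,1
AACCTTTCAATGATGA-1,3
AACTTGGCATGGCCGT-1,3
"""


def cluster_sizes(handle):
    """Return a Counter mapping cluster number to the number of barcodes."""
    reader = csv.DictReader(handle)
    return Counter(int(row["Cluster"]) for row in reader)


sizes = cluster_sizes(io.StringIO(SAMPLE))
for cluster in sorted(sizes):
    print(f"cluster {cluster}: {sizes[cluster]} barcode(s)")
```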

Differential Enrichment Analysis

Prior to differential analysis, cellranger-atac produces a peak-barcode matrix and a transcription factor-barcode matrix of counts as described in Matrices. cellranger-atac then produces a table indicating which peaks and transcription factor motifs are differentially accessible in each cluster relative to all other clusters, as per the algorithms described here. For each feature, whether a peak or a transcription factor motif, three values are computed per cluster: the mean counts, the log2 fold change, and the adjusted p-value. These correspond to the per-cluster columns of the table.

The table is located in a different directory from the clustering results but follows the same structure, with each clustering separated into its own directory.

$ head -5 analysis/enrichment/kmeans_3_clusters/differential_expression.csv
Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value
chr1:9695129-9697582,chr1:9695129-9697582,0.014098403818774368,-5.823451487250574,2.2659671842098193e-06,4.185745651762137e-09,-1.3874516676069444,0.5918812904596457,1.9512762483589925,7.238430090771634,5.00258305609651e-09
chr1:9698210-9701041,chr1:9698210-9701041,0.013761153212430422,-6.1502095503083165,7.855686702156565e-07,0.046489553517204636,-3.0232327143356246,0.01647646310191049,2.2844378973176838,6.5025499776936115,4.703658999567952e-13
.
.
.
AHR_HUMAN.H11MO.0.B,AHR_HUMAN.H11MO.0.B,1.5229979744677225e-09,-0.558490289359965,1.0,1.5229979744575502e-09,1.41990325445066,1.0,1.5229979744838465e-09,2.5360529002402097,1.0
AIRE_HUMAN.H11MO.0.C,AIRE_HUMAN.H11MO.0.C,382.4895824324451,-1.366896997726535,0.007214824200990991,4098.191143669588,0.031632664734601475,1.0,124.22927255017468,2.136369782757689,0.0015585067057439586

Notice that the table for any specific clustering includes differential analysis results for both peaks and transcription factor motifs.
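
Because the table is wide, with three columns per cluster, it can help to pull out the features enriched in a single cluster. The sketch below selects features with a positive log2 fold change and an adjusted p-value below a threshold. The function name `enriched_features` and the 0.05 threshold are assumptions for illustration, not pipeline settings; the sample rows are copied from the excerpt above.

```python
import csv
import io

# Sample rows copied from the differential_expression.csv excerpt above.
# A real run would open
# analysis/enrichment/kmeans_3_clusters/differential_expression.csv instead.
SAMPLE = """Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value
chr1:9695129-9697582,chr1:9695129-9697582,0.014098403818774368,-5.823451487250574,2.2659671842098193e-06,4.185745651762137e-09,-1.3874516676069444,0.5918812904596457,1.9512762483589925,7.238430090771634,5.00258305609651e-09
chr1:9698210-9701041,chr1:9698210-9701041,0.013761153212430422,-6.1502095503083165,7.855686702156565e-07,0.046489553517204636,-3.0232327143356246,0.01647646310191049,2.2844378973176838,6.5025499776936115,4.703658999567952e-13
AIRE_HUMAN.H11MO.0.C,AIRE_HUMAN.H11MO.0.C,382.4895824324451,-1.366896997726535,0.007214824200990991,4098.191143669588,0.031632664734601475,1.0,124.22927255017468,2.136369782757689,0.0015585067057439586
"""


def enriched_features(handle, cluster, max_adjusted_p=0.05):
    """Return (feature ID, log2 fold change, adjusted p) for one cluster.

    Keeps features with a positive log2 fold change and an adjusted p-value
    below max_adjusted_p, sorted by ascending adjusted p-value. The 0.05
    default is an illustrative assumption.
    """
    fold_change_column = f"Cluster {cluster} Log2 fold change"
    p_value_column = f"Cluster {cluster} Adjusted p value"
    hits = []
    for row in csv.DictReader(handle):
        fold_change = float(row[fold_change_column])
        adjusted_p = float(row[p_value_column])
        if fold_change > 0 and adjusted_p < max_adjusted_p:
            hits.append((row["Feature ID"], fold_change, adjusted_p))
    hits.sort(key=lambda hit: hit[2])
    return hits


cluster_3_hits = enriched_features(io.StringIO(SAMPLE), 3)
for feature_id, fold_change, adjusted_p in cluster_3_hits:
    print(feature_id, fold_change, adjusted_p)
```

Peaks and transcription factor motifs share the same table, so the result can mix both feature types, as it does for cluster 3 in this sample.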