HOME  ›   visualization
If your question is not answered here, please email us at:  ${email.software}

10x Genomics
Chromium Single Cell ATAC

Identifying Significant Features

NOTE: This section presumes you've created cell type clusters in the Identifying Cell Types tutorial section. You can follow along roughly by opening the ATAC Tutorial dataset, and choosing the K-Medoids (LSA) Cluster, with K=4.

Finding T Cell Subgroups

Let's return to the barcode plot, and look at the region of cells with T cell markers. Dimensionality reduction via LSA and subsequent clustering separates the T cells into a few separate groups, as seen on the t-SNE plot. What might be driving differences between those subgroups? We can use the significant features tool to find out.

First, let's create a second T cell group by using the lasso tool to highlight the rightmost cluster of T cells and label it "T Cells 2", as shown below:

To get a hint of what is distinguishing these clusters, we can use the Significant Features tool, located below the active cluster list. With this tool, you can compute distinguishing motifs, individual peaks, or promoter sums between currently selected clusters. The tool computes significant features by one of two ways, selectable via the Significant Feature Comparison selector:

The Feature Type selector allows you to select between available feature types. You can only look at one feature type at a time, as the sSEQ significant feature calculation relies on a common denominator of feature counts; there are different total sums across the dataset for peaks, promoter sums, and motifs.

Let's start by doing a sanity check and finding distinguishing promoter sums for each cell type. First, make the Feature Table visible by clicking on the list icon to the left of the bottom panel. Next, with the Cell Types category visible in Categories, mode, choose Globally Distinguishing from the comparison selector, and Promoter Sum as the Feature Type. Press the calculator icon, and wait for Loupe Cell Browser to compute significantly enriched promoter sums. This operation should take around 30 seconds, depending on your machine's performance. This calculation will take longer for datasets with more cells or higher depth, and longer when finding significant peaks.

Once the calculation completes, you see the most significantly enriched promoter sums for the B cell group, with accompanying log2 fold change against the other cells, sorted by p-value:

By default, the most comparatively enriched motifs, peaks or promoter sums are calculated. However, that is configurable in the Feature Table Options menu, which is accessible on the feature table's menu bar.

Not surprisingly, the most significantly enriched promoter sum is MS4A1 (CD20), which is used to identify the B cell cluster. Click on the cell type column headers to see the feature lists for the other cell types. Once a cell type is selected, clicking on the cell header again sorts the features by log2 fold change.

Next, let's explore the differences between the two T cell clusters. In the Categories panel, uncheck the B Cells and Monocytes clusters, and then select Locally Distinguishing from the comparison selector. Let's also choose Motif as the desired Feature Type, and compute the significant motifs.

After computation, the feature table now shows the motifs that are most differentially accessible between the two T cell groups. To verify the differences in accessibility, you can highlight the significant motifs in the barcode plot. First, click the Split View button at the bottom of the toolbar, and select Current Category (Cell Types). Split View segments the cells in the barcode plot into distinct regions by cluster, even if the cells overlap in the parent t-SNE plot. Next, click on BATF::JUN in the feature table. Clicking on a feature in the table allows you to add it to a feature list, copy the name to the clipboard, or to view the accessibility of the feature in the barcode plot. Click Set as Active Feature to see the BATF::JUN motif z-score in the barcode plot:

The cells in the T Cell 2 group are bluer, indicating that BATF-associated peaks are less accessible on average in that group than in the T Cell group. BATF accessibility has been shown to increase with cell differentiation and age^1^, so this may indicate that the T Cell 2 group is comprised of younger, or more naive T cells. It should be noted that the T Cell 2 group contains both cells with accessible CD8A and CD4 promoter regions, indicating a more diverse mix than could be explained by neat separation between helper and cytotoxic T cells.

Finally, let's look for significant peaks between the two T cell clusters. Return to the Categories mode, and change the Feature Type selector to Peaks. Press the Calculator icon, and wait for a while for the computation to finish.

Let's add a few significant peaks in the primary T cell cluster to a feature list. Click on the first five significantly enriched peaks, one at a time, and add them to a new T Cell 1 Drivers list. To create the new list, type T Cell 1 Drivers in the Add to Feature List input field, and select the create new list option. With the list selected, click on the plus icon to add the features to the list.

We can now learn more about these individual peaks. With the T Cell 1 Drivers list selected in Accessibility mode, hover over a peak, and click on the Peak Viewer icon that appears next to the feature. Let's select the first peak in the list, "chr1:159046026-159047751":

This highlights the peak in the peak viewer. Zooming out in the peak viewer shows additional context. You can see from the gene annotation track that the peak is located right at the transcription start site for the AIM2 gene.

In this manner, you should be able to start teasing out additional information about the different cell groups in your data. Once you've analyzed your dataset, saved lists of significant features, and identified cell types, you can share your findings in a variety of ways with collaborators.
Let's move onto Sharing to find out how.

[1] https://www.ncbi.nlm.nih.gov/pubmed/28439570