HOME  ›   visualization

# Analyzing Public PBMC scRNA-Seq Data Using Loupe Browser

This tutorial demonstrates the basic workflow for Single Cell Gene Expression data analysis with Loupe Browser, including quality control (QC), clustering, cell annotation, cell abundance analysis, and differential gene expression. The tutorial uses 5’ Single Cell Gene Expression data, however, all the steps also apply to 3' Single Cell Gene Expression data.

## Data overview

The 5' Single Cell Gene Expression datasets (v2) used in this tutorial were obtained from the Asian Immune Diversity Atlas (AIDA) project. These eight datasets represent four samples (JP_RIK_H040, JP_RIK_H041, KR_SGI_H128, and KR_SGI_H131) from two healthy Japanese donors and two healthy South Korean donors (each sample has two libraries):

1. JP_RIK_B003_L001_5GEX_H040 (Japan_RIKEN_Batch3_Library1_HealthyDonor040)
2. JP_RIK_B003_L002_5GEX_H040 (Japan_RIKEN_Batch3_Library2_HealthyDonor040)
3. JP_RIK_B003_L001_5GEX_H041 (Japan_RIKEN_Batch3_Library1_HealthyDonor041)
4. JP_RIK_B003_L002_5GEX_H041 (Japan_RIKEN_Batch3_Library2_HealthyDonor041)
5. KR_SGI_B010_L001_5GEX_H128 (S.Korea_SGI_Batch10_Library1_HealthyDonor128)
6. KR_SGI_B010_L002_5GEX_H128 (S.Korea_SGI_Batch10_Library2_HealthyDonor128)
7. KR_SGI_B010_L001_5GEX_H131 (S.Korea_SGI_Batch10_Library1_HealthyDonor131)
8. KR_SGI_B010_L002_5GEX_H131 (S.Korea_SGI_Batch10_Library2_HealthyDonor131)


The raw FASTQ files were analyzed with the cellranger count pipeline, followed by cellranger aggr. The steps in this tutorial are performed in Loupe Browser, after loading the cloupe.cloupe output file from cellranger aggr (see appendix for the cellranger commands).

## Single cell gene expression QC

First, use the filtering and reclustering workflow in Loupe Browser to perform quality control (QC) based on three metrics: 1) UMI counts, 2) the number of detected genes, and 3) the percentage of mitochondrial genes.

After opening the tutorial .cloupe file, click the Recluster button in the Categories Mode panel:

A new window will open with the Recluster module.

1. Click Next to set thresholds by UMI counts.

### QC based on UMI counts

The main goal of this step is to remove outliers, which could potentially be low-quality cells (very low UMI counts) or doublets (very high UMI counts). There are no universal criteria to filter UMI counts since different sample and cell types have distinct UMI levels (e.g., tumor samples may have higher RNA content compared to PBMCs; in PBMC samples, some disease states may result in greater abundance of higher RNA content plasma B cells or lower RNA content platelets (see Yamawaki et al. 2021)). Selection of the UMI range should be based on biological knowledge about the sample and may be an iterative process. UMI counts can be visualized on the linear or log2 scale.

We will select the barcodes containing UMI counts between 210 and 214 (most of the cells are within this range).

1. Enter 10 and 14 as the Min and Max values on the Log2 scale. In total, 147 (1.6%) of barcodes are removed. The affected clusters in this QC step are shown on the right panel (e.g., two barcodes were removed from Cluster 4).
2. Click Next to set thresholds by features.

### QC based on the number of detected genes

The main purpose of this step is to remove potential low-quality cells and multiplets. Similar to UMI counts, there are no universal QC rules on the number of detected genes. Here, select the barcodes with gene numbers between 29.5 and 213 (most of the cells are within this range).

1. Enter 9.5 and 13 as the Min and Max values on the Log2 scale. In total, an additional 0.1% (1.7% - 1.6% from the UMI count filter) of barcodes are removed. The affected clusters are updated (e.g., now four barcodes have been removed from Cluster 4).
2. Click Next to set thresholds by mitochondrial UMI counts.

### QC based on the percentage of mitochondrial gene

A high level of mitochondrial gene expression could indicate dying cells (see this knowledge base article, Ilicic et al. 2016). QC filtering based on mitochondrial gene levels could remove these low-quality cells. To filter barcodes based on mitochondrial gene levels, the first step is to provide a list of mitochondrial genes. The gene lists for human and mouse are available in the pre-built references drop-down menu. Users can also upload custom mitochondrial gene set lists for different species or other gene sets of interest, such as the ribosomal genes.

1. Since the datasets here were generated using human samples, select the Human (GRCh38/hg19) reference. Then click Submit.

Similar to the QC on UMI counts and the number of detected genes, there are no universal thresholds for the percentage of mitochondrial genes. If a review of available scientific literature suggests the study sample is known to have high mitochondrial gene expression (e.g., tumor biopsies), you may want to select less stringent filtering values.

1. Enter 0 and 15% for the percentage of mitochondrial genes with barcodes. In total, 2% of barcodes have now been removed from the dataset.
2. Click Next to perform the reclustering analysis.

### Cell reclustering and embedding after QC

The last step is to perform reclustering and projection, selecting both t-SNE and UMAP embeddings. Loupe Browser also provides features to Adjust reanalyze parameters (for advanced users) for reclustering and projection. This tutorial uses default parameters.

1. Enter a name for the recluster analysis (here, “postQC”).
2. Click the Recluster button to perform the analysis.

It takes several minutes to complete the reclustering. When complete, click Done to return to the Loupe Browser window.

There are now new t-SNE and UMAP plots ("postQC" Category label) generated from the reclustering results for our post-QC cells.

## Cell annotation

After obtaining clusters based on gene expression similarity, identify the corresponding cell type for each cluster. This cell annotation process can be challenging. For more information, see this cell type annotation tutorial in Loupe Browser and this Analysis Guide that describes several additional web resources for cell annotation.

This tutorial uses marker-gene-based manual annotation to identify the major peripheral blood mononuclear cell (PBMC) cell types (the Split on Category view and Lasso Selection tool can also be used to annotate cell clusters). The Gene/Feature Expression top-right panel of Loupe Browser allows researchers to profile gene expression across cells. The Feature Plot function enables visualization of the expression levels for key PBMC marker genes across all cells in the dataset. Finally, the gene expression levels in different clusters can be visualized with Loupe Browser's violin plot function.

• In Categories, set the cluster to "postQC"
• Switch to Gene/Feature Expression
• Enter the PBMC marker gene into the Search for a feature... box and set the scale to LogNorm (refer to this article for help choosing which scale value to use in Loupe Browser) to visualize the gene expression levels for each marker gene.
• Set the feature plot to "postQC UMAP"
• Select the violin plot option in the Data panel selector

Based on the expression levels of key PBMC marker genes across clusters, these clusters can be annotated into major immune cell types: T cells, NK cells, B cells, and monocytes. In particular for monocytes, based on the relative expression levels between CD14 and FCGR3A (CD16), two monocyte subtypes, i.e. CD14 monocyte (classical) and CD16 monocyte (non-classical), can be further characterized. Click the tabs below for each cell type:

Here, observe that CD3D, a marker for T cells, is expressed in Clusters 2-5, 7, 9, 10, and 12-14 in the violin plots. We will use these marker genes to annotate clusters by cell type.

The marker gene for NK cells, NCAM1, is expressed in Cluster 1. While less NK cell-specific, NKG7 and FCGR3A can also be used to support NK cell annotation as they characterize NK cell cytotoxic functions (the violin plot gene expression patterns will shift as you select each gene from the Active Feature List).

The marker gene for B cells (MS4A1) is expressed in Cluster 8.

The marker genes for CD14 monocytes (CD14high, FCGR3Alow) are expressed in Clusters 6 and 11.

The marker genes for CD16 monocytes (CD14low, FCGR3Ahigh) are expressed in the remaining cluster, Cluster 15.

The Filters function on the top-right panel of Loupe Browser can now be utilized to annotate each cluster. Begin by annotating Clusters 2-5, 7, 9, 10, and 12-14 as T cells. Note that these different clusters in T cells could represent different T cell subtypes (not discussed in this tutorial, for cell subtype annotation, see this tutorial).

1. Switch to Filters
2. Click Create new rule
3. Select IN[cluster name]

1. Scroll down in the Cluster menu to "postQC Cluster 2" (make sure it is labeled postQC, as the other cluster types in this dataset are named similarly). Click Create new rule and add the remaining clusters for T cell annotation.

1. Switch the AND/OR toggle to OR
2. Click Assign 5946 barcodes

1. Enter "cell_annotation" for the Category, "T cells" for the Cluster, and click Save. (Switch to violin plots to view the expression levels of these clusters for CD3D along with this T cell type annotation)

The T cell annotation is now added to the Categories panel under "cell_annotation".

Assign the remaining clusters using the Filters option (delete previous cluster rules using the trash bin icon) - click the tabs below for each cell type:

Annotate Cluster 1 as NK cells:

Annotate Cluster 8 as B cells:

Based on the expression levels of CD14, the unannotated clusters 6 and 11 are identified as CD14 monocytes:

Now that we have annotated several clusters, use the Split on Category option to display the UMAP projection split by cell annotation. Based on the expression levels of CD16, the unannotated cluster 15 is CD16 monocytes:

Now all the major PBMC cell types are annotated:

## Export cell annotations and import metadata

Based on the marker gene expression profiles, clusters are annotated in major PBMC cell types. The cell annotation results can be exported as a CSV file with the annotation for each barcode (export with unlabeled or only labeled barcodes).

Loupe Browser supports the import of metadata as new categories, such as donor and country in this case study. Accordingly, differentially expressed gene analysis can be performed based on the new metadata. Below, two methods are illustrated to import new metadata either manually from the exported SampleID category or by using Filters to annotate samples.

Here, since the information of donor and country is based on sampleID:

1. Click the three vertical dots
2. Select Export SampleID to export all barcodes in the SampleID category

Next, based on sampleID, add the information of donor and country for each barcode in a text editor such as Excel:

• For example, the samples JP_RIK_B003_L001_5GEX_H040 and JP_RIK_B003_L002_5GEX_H040 correspond to donor JP_RIK_H040. Add the remaining donor labels: JK_RIK_H041, KR_SGI_H128, KR_SGI_H131.
• Add the country label as JP or KR.

Save the table as a comma-separated value (CSV) file (metadata.csv) (left: Excel, right: text editor view of the CSV format):

Import the metadata file into Loupe Browser as a new category (this file is also provided for download in Data overview):

The split view by sampleID enables another way to select samples for category assignment. In the Filters panel, cells in JP_RIK_B001_L001_5GEX_H040 and JP_RIK_B001_L002_5GEX_H040 are assigned to cluster JP_RIK_H040 in the donor category. A similar strategy is applied to the other three donors, as illustrated in this video:

Similarly, the Filters function is utilized to group cells into different countries, as illustrated in this video:

After either manually adding the donor/country categories or using the Filters function, two new categories are generated:

## Cell abundance analysis

We will now compare the cell abundance of T cells across the four donors. The split view by donors shows their cell profiles in the UMAP plot (“donor” should be selected in “Categories”). From the donor category in the right panel, we can obtain the total cell number for each donor (e.g., 2,162 cells for donor JP_RIK_H040).

For the next step, the total T cell number will be counted for each donor in order to calculate the relative T cell ratio. The Filters function can be used to get the T cell number for each donor. Create a filter rule to select T cells for each donor as illustrated below (in this case, the AND/OR toggle should be set to AND). After setting the rules, T cell number for each donor will be shown at the bottom (e.g., 1,394 barcodes assigned to T cells for donor JP_RIK_H040).

Now, T cell abundance for the four donors can be calculated as below (the numerator comes from the filter for T cells by donor, the denominator from total cells per donor):

JP_RIK_H040 – 1394/2162 = 0.64
JP_RIK_H041 – 1479/2239 = 0.66
KR_SGI_H128 – 1609/2333 = 0.69
KR_SGI_H131 – 1464/2142 = 0.68


## Analysis of differentially expressed genes

Loupe Browser provides the functionality to call differentially expressed genes (DEG) in globally and locally distinguishing modes. In the global mode, the DEG of each cluster vs. all the other clusters will be identified. In the local mode, the DEG will be called in a similar way to the global mode but only for the selected clusters. Here, DEG calling in T cells between the two South Korean donors, KR_SGI_H128 and KR_SGI_H131, will be performed to demonstrate this function in Loupe Browser.

1. Switch to Filters
2. Create a new rule and select T cells and KR_SGI_H128.
3. The AND/OR toggle should be set to AND.
4. Assign the barcodes by entering "T cell group" as Category and "KR_SGI_H128" as the Cluster, and Save.

Create a second filter rule for donor KR_SGI_H131:

Now call differentially expressed genes: 5) Switch to Categories 6) Now T cells have been assigned to “KR_SGI_H128” and “KR_SGI_H131” clusters of new Category “T cell groups”. 7) Select Locally Distinguishing 8) DEGs in T cells between these two clusters can be measured by clicking the calculator button.

After DEG calling, the results are displayed in the feature table. The DEG table can be exported as a CSV file. 9) Click the Feature Table Options button 10) Select All Significant Genes Per Cluster for Filter Features 11) Select All Gene Expression for CSV Export Count 12) Click the Export Table to CSV button

The DEG list can be used for downstream gene ontology and pathway analyses using third-party tools. The top pathways associated with the significantly up-regulated genes in KG_SGI_H128 (DAVID, log2 fold change>1 and adjusted p-value<0.05) include "Apoptosis" and "Graft-versus-host disease".

## Visualization of DEGs

From the above DEG calling, GADD45B is one of the up-regulated genes in T cells from KR_SGI_H128 compared to KR_SGI_H131. GADD45B is a member of a group of genes that respond to environmental stresses by mediating activation of the p38/JNK pathway.

The expression levels of GADD45B in T cells between these two donors can be visualized in a “T cell group” split view stratified by donors.

1. Enter GADD45B in the Search for a feature... box
2. Select the “LogNorm” scale for visualization, since this scale normalizes RNA contents in each cell and can be used to compare gene expression levels between two groups (see details in this article).
3. Select violin plot to visualize gene expression levels between these two groups. The violin plots can be saved using the Export button in the top-right corner of the violin plot box.
4. The UMAP plot can be saved as an image file using the camera button.

More analyses can be performed in Loupe Browser, such as sub-clustering of major cell types into cell subtypes. In addition, third-party tools developed in the single cell community are also very useful to address more advanced data analyses, such as RNA velocity (read more in this Analysis Guide about using third-party tools to perform RNA velocity analysis).

## Appendix: How was the aggregated dataset generated?

The datasets used in the tutorial are part of the Asian Immune Diversity Atlas (AIDA) project. Raw FASTQ files of eight datasets were downloaded from the Human Cell Atlas data portal. These eight datasets represent four samples from two healthy Japanese donors and two healthy South Korean donors (each sample has two libraries). Two Japanese samples were multiplexed in the same batch and then the multiplexed pool was split into two 10x Genomics reactions/libraries. The same strategy was applied to the two South Korean samples. After downloading the FASTQ files, Cell Ranger v7.0 was used to process the sequencing data. One example command line is shown below for the cellranger count pipeline (replace text in red with path to files):

cellranger count \
--id=KR_SGI_B010_L001_5GEX_H134 \
--fastqs=/PATH_TO_raw_data/ \
--sample=KR_SGI_B010_L001_5GEX_H134 \
--transcriptome=/PATH_TO_cellranger_reference/refdata-gex-GRCh38-2020-A/ \
--localcores=8 --localmem=64


After running cellranger count on each of the eight datasets (each dataset is from a different GEM well), the cellranger aggr pipeline was used to aggregate them:

cellranger aggr \
--id=aida_aggr \
--csv=/PATH_TO_aggregation_csv/sample.csv \
--localcores=16 \
--localmem=128


The aggregation CSV file, sample.csv, should contain the following:

sample_id,molecule_h5
JP_RIK_B003_L001_5GEX_H040,path/to/molecule_info.h5
JP_RIK_B003_L002_5GEX_H040,path/to/molecule_info.h5
JP_RIK_B003_L001_5GEX_H041,path/to/molecule_info.h5
JP_RIK_B003_L002_5GEX_H041,path/to/molecule_info.h5
KR_SGI_B010_L001_5GEX_H128,path/to/molecule_info.h5
KR_SGI_B010_L002_5GEX_H128,path/to/molecule_info.h5
KR_SGI_B010_L001_5GEX_H131,path/to/molecule_info.h5
KR_SGI_B010_L002_5GEX_H131,path/to/molecule_info.h5


The cloupe.cloupe in the aida_aggr/outs/count/ directory is the aggregated cloupe file imported into Loupe Browser used for this tutorial.