Cell Ranger ARC 1.0, printed on 05/18/2022
Peaks are mapped to one or more genes based on genomic proximity. The general principle is as follows:
tag "basic". If a gene has no such transcripts then all transcripts are considered when computing the TSS.
The annotation procedure is as follows:
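The output file of peak annotation is atac_peak_annotation.tsv, a tab-separated file with one row per peak. When a peak is mapped to more than one gene, the gene, distance and peak_type fields each hold a semicolon-separated list in matching order. The file has the following columns: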
|Column|Name|Description|
|---|---|---|
|1|peak|Location of peak, denoted as "contig_start_end".|
|2|gene|Gene symbol based on the gene annotation in the reference.|
|3|distance|Distance of peak from TSS of gene. Positive distance means the start of the peak is downstream of the position of the TSS, whereas negative distance means the end of the peak is upstream of the TSS. Zero distance means the peak overlaps with the TSS or the peak overlaps with the transcript body of the gene.|
|4|peak_type|Can be "promoter", "distal" or "intergenic".|
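Because the peak column packs three fields into one string, downstream code often needs to split it back into contig, start and end. The following is a minimal R sketch, not part of Cell Ranger ARC: `parse_peak_id` is a hypothetical helper name, and the second peak ID is a made-up example chosen to show a contig name that itself contains an underscore.

```r
# Hypothetical helper (not part of Cell Ranger ARC): split "contig_start_end"
# peak IDs into their three components.
parse_peak_id <- function(peak) {
  # Anchor on the two trailing integer fields so that contig names
  # containing "_" are kept intact.
  pattern <- "^(.+)_([0-9]+)_([0-9]+)$"
  stopifnot(all(grepl(pattern, peak)))
  data.frame(
    peak   = peak,
    contig = sub(pattern, "\\1", peak),
    start  = as.integer(sub(pattern, "\\2", peak)),
    end    = as.integer(sub(pattern, "\\3", peak)),
    stringsAsFactors = FALSE
  )
}

# First ID is taken from the example output; the second is made up.
peaks <- parse_peak_id(c("chr1_144529748_144530140", "chrUn_KI270742v1_100_250"))
print(peaks)
```

Anchoring the pattern on the two trailing integer fields, rather than splitting on every underscore, is what keeps such contig names intact.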
    peak                      gene                    distance  peak_type
    chr1_144529748_144530140  PPIAL4B                 -165503   distal
    chr1_144533459_144534494  PPIAL4B                 -169214   distal
    chr1_144535921_144536510  PPIAL4B                 -171676   distal
    chr1_144593494_144593902                                    intergenic
    chr1_144594466_144594608                                    intergenic
    chr1_144907978_144908683  PDE4DIP                 23465     distal
    chr1_144917975_144918388  PDE4DIP                 13760     distal
    chr1_144930729_144932952  PDE4DIP                 0         promoter
    chr1_144935233_144935903  PDE4DIP                 -2682     distal
    chr1_145021465_145021812  PDE4DIP                 17959     distal
    chr1_145029934_145030407  PDE4DIP                 9364      distal
    chr1_145039179_145040353  PDE4DIP                 0         promoter
    chr1_145042909_145043074  PDE4DIP                 -2908     distal
    chr1_145058730_145059385  PDE4DIP                 16497     distal
    chr1_145075570_145075775  PDE4DIP                 107       distal
    chr1_145090390_145090497  SEC22B                  -5916     distal
    chr1_145096097_145096897  SEC22B                  0         promoter
    chr1_145114341_145114882  SEC22B                  17929     distal
    chr1_145129664_145130160  CH17-478G19.1           9453      distal
    chr1_145138857_145139783  CH17-478G19.1           0         promoter
    chr1_145208754_145210260  NOTCH2NL;RP11-458D21.5  0;0       promoter;promoter
    chr1_145253531_145253882  NOTCH2NL;RP11-458D21.5  28933;0   distal;distal
    chr1_145293232_145293711  NBPF10;RP11-458D21.5    0;0       promoter;distal
    chr1_145382220_145383308  HFE2                    -14942    distal
    chr1_145395396_145399610  HFE2                    0         promoter
    chr1_145421600_145421979  HFE2                    8322      distal
The peak annotation file can be used for custom analyses, such as plotting peak-gene relationships or generating gene activity scores from peaks. Here we provide some examples, written in R, of loading the peak annotation file and converting it to various data structures.
    library(tidyverse)
    library(Matrix)

    peak_annotation_file <- "/opt/sample345/outs/atac_peak_annotation.tsv"

    # direct loading of the original format
    df_peakanno <- readr::read_tsv(peak_annotation_file)

    # separate each row into a single peak-gene-type combination, i.e. split by ";"
    df_peakanno <- readr::read_tsv(peak_annotation_file) %>%
      tidyr::separate_rows(gene, distance, peak_type, sep = ';')

    # Convert to a sparse binary matrix of peak-gene mapping relationship
    # the order of the peaks is the same as the peak-barcode matrix in the pipeline output
    # the order of the genes is alphanumeric
    sparseMatrix_peakanno <- readr::read_tsv(peak_annotation_file) %>%
      # fix the peak order to the order of rows in the file
      dplyr::mutate(peak = factor(peak, levels = peak)) %>%
      tidyr::separate_rows(gene, distance, peak_type, sep = ';') %>%
      # drop intergenic peaks, which have no gene assigned
      dplyr::filter(!is.na(gene)) %>%
      # can also add extra filter here using dplyr::filter()
      # such as restricting peaks to promoters only or within a certain distance to TSS
      dplyr::mutate(gene = factor(gene)) %>%
      dplyr::group_by(peak, gene) %>%
      dplyr::summarise(value = as.integer(n() > 0), .groups = "drop") %>%
      stats::xtabs(value ~ peak + gene, data = ., sparse = TRUE)
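As one illustration of the gene activity use case mentioned above, a peak-by-gene matrix like `sparseMatrix_peakanno` can be combined with a peak-by-barcode count matrix by matrix multiplication. The sketch below uses small made-up matrices (the peak, gene and barcode names are hypothetical) and assumes the simplest possible score, the sum of counts over all peaks linked to a gene. This is an illustrative assumption, not the scoring used by Cell Ranger ARC.

```r
library(Matrix)

# Toy, made-up binary peak-gene mapping: peaks in rows, genes in columns.
peak_gene <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 1, 2), x = c(1, 1, 1),
  dims = c(3, 2),
  dimnames = list(c("peakA", "peakB", "peakC"), c("GENE1", "GENE2"))
)

# Toy, made-up peak-barcode counts: peaks in rows, barcodes in columns.
peak_barcode <- sparseMatrix(
  i = c(1, 2, 3, 3), j = c(1, 1, 1, 2), x = c(2, 1, 4, 3),
  dims = c(3, 2),
  dimnames = list(c("peakA", "peakB", "peakC"), c("BC1", "BC2"))
)

# Both matrices must list the same peaks in the same order.
stopifnot(identical(rownames(peak_gene), rownames(peak_barcode)))

# Assumed score: for each gene and barcode, sum the counts of all linked peaks.
gene_activity <- t(peak_gene) %*% peak_barcode
print(gene_activity)
```

With real data, the rows of both matrices must refer to the same peaks in the same order, which is why the check on row names comes before the multiplication.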