10x Genomics
Chromium Single Cell ATAC

Peak Annotations

How a peak is annotated to genes

Peaks are mapped to genes based on the genomic locations of nearby genes. The annotation procedure is as follows:

  1. If a peak overlaps the promoter region (-1000 bp, +100 bp) of any transcription start site (TSS), it is annotated as a promoter peak of that gene.
  2. If a peak is within 200 kb of the closest TSS and is not a promoter peak of the gene of that TSS, it is annotated as a distal peak of that gene.
  3. If a peak overlaps the body of a transcript and is neither a promoter peak nor a distal peak of that gene, it is annotated as a distal peak of that gene with the distance set to zero.
  4. If a peak has not been mapped to any gene in the preceding steps, it is annotated as an intergenic peak with no gene symbol assigned.
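
The four steps above can be sketched in Python. This is an illustrative reading of the rules, not the Cell Ranger ATAC implementation: the `Transcript` class, the function names, and the toy genes are all hypothetical. It assumes a single contig, closed integer intervals, a strand-aware promoter window and distance sign, that ties for the closest TSS are all reported, and that an intergenic peak carries an empty gene and no distance.

```python
from dataclasses import dataclass

PROMOTER_UPSTREAM = 1000    # bp upstream of the TSS (step 1)
PROMOTER_DOWNSTREAM = 100   # bp downstream of the TSS (step 1)
MAX_DISTAL = 200_000        # 200 kb limit for distal peaks (step 2)


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record on a single contig."""
    gene: str
    tss: int      # position of the transcription start site
    start: int    # leftmost coordinate of the transcript body
    end: int      # rightmost coordinate of the transcript body
    strand: str   # "+" or "-"


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def promoter_window(t):
    """Promoter region around the TSS, oriented by strand (an assumption)."""
    if t.strand == "+":
        return t.tss - PROMOTER_UPSTREAM, t.tss + PROMOTER_DOWNSTREAM
    return t.tss - PROMOTER_DOWNSTREAM, t.tss + PROMOTER_UPSTREAM


def signed_distance(start, end, t):
    """0 if the peak covers the TSS, negative if upstream, positive if downstream."""
    if start <= t.tss <= end:
        return 0
    peak_is_left = end < t.tss
    gap = t.tss - end if peak_is_left else start - t.tss
    upstream = peak_is_left == (t.strand == "+")
    return -gap if upstream else gap


def annotate_peak(start, end, transcripts):
    """Return a list of (gene, distance, peak_type) rows for one peak."""
    rows, seen = [], set()

    def add(t, distance, peak_type):
        # A gene receives at most one annotation per peak; earlier steps win.
        if t.gene not in seen:
            seen.add(t.gene)
            rows.append((t.gene, distance, peak_type))

    # Step 1: promoter peaks.
    for t in transcripts:
        if overlaps(start, end, *promoter_window(t)):
            add(t, signed_distance(start, end, t), "promoter")

    # Step 2: distal peak of the closest TSS, if within 200 kb.
    dists = [(abs(signed_distance(start, end, t)), t) for t in transcripts]
    if dists:
        best = min(d for d, _ in dists)
        if best <= MAX_DISTAL:
            for d, t in dists:
                if d == best:
                    add(t, signed_distance(start, end, t), "distal")

    # Step 3: peaks inside a transcript body get distance zero.
    for t in transcripts:
        if overlaps(start, end, t.start, t.end):
            add(t, 0, "distal")

    # Step 4: nothing matched, so the peak is intergenic.
    return rows or [("", None, "intergenic")]


# Toy genes with made-up coordinates: GENE_A and GENE_B share a TSS position
# on opposite strands, GENE_C is a long gene far away.
genes = [
    Transcript("GENE_A", tss=10_000, start=10_000, end=30_000, strand="+"),
    Transcript("GENE_B", tss=10_000, start=2_000, end=10_000, strand="-"),
    Transcript("GENE_C", tss=1_000_000, start=1_000_000, end=1_500_000, strand="+"),
]

print(annotate_peak(9_500, 9_750, genes))
print(annotate_peak(1_400_000, 1_400_500, genes))
print(annotate_peak(5_000_000, 5_000_500, genes))
```

In this toy setup the first peak is a promoter peak of GENE_A and a distal peak of GENE_B, the second falls only under step 3, and the third is intergenic.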

Annotation output file

The output file of peak annotation is peak_annotation.tsv. It has the following format:

Column Number	Name	Description
1	chrom	Contig that contains the peak.
2	start	Peak start location.
3	end	Peak end location.
4	gene	Gene symbol based on the gene annotation in the reference.
5	distance	Distance of the peak from the TSS of the gene. Positive distance means the start of the peak is downstream of the position of the TSS, whereas negative distance means the end of the peak is upstream of the TSS. Zero distance means the peak overlaps with the TSS or the peak overlaps with the transcript body of the gene.
6	peak_type	Can be "promoter", "distal" or "intergenic".

Below is an excerpt from a peak_annotation.tsv file. Each row represents one annotation assigned to one peak. Note that the same peak can be annotated with multiple genes, in which case the entries appear on successive lines. This happens when a peak is proximal to multiple genes and Cell Ranger ATAC has no way to disambiguate them. For example, the peak chr14:77786487-77786973 is annotated both as a candidate promoter peak for GSTZ1 and as a candidate distal regulator of POMT2.

chrom	start	end	gene	distance	peak_type
...
chr14	77769877	77770568	POMT2	16659	distal
chr14	77781976	77782953	POMT2	4274	distal
chr14	77781976	77782953	GSTZ1	-4274	distal
chr14	77786487	77786973	POMT2	254	distal
chr14	77786487	77786973	GSTZ1	-254	promoter
chr14	77787130	77787963	POMT2	0	promoter
chr14	77787130	77787963	GSTZ1	0	promoter
chr14	77843033	77843952	TMED8	0	promoter
...
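
To find peaks that carry more than one annotation, the rows can be grouped by peak coordinates. The sketch below uses only the Python standard library; `group_by_peak` is a hypothetical helper, and it assumes the file is tab-separated with the header row shown above. Because the source does not show what the distance field contains for intergenic peaks, a non-integer value is stored as `None`.

```python
import csv
import io
from collections import defaultdict


def group_by_peak(handle):
    """Map (chrom, start, end) to a list of (gene, distance, peak_type) rows."""
    peaks = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["chrom"], int(row["start"]), int(row["end"]))
        try:
            distance = int(row["distance"])
        except (TypeError, ValueError):
            distance = None  # assumption: field may be empty for intergenic peaks
        peaks[key].append((row["gene"], distance, row["peak_type"]))
    return dict(peaks)


# Rows copied from the excerpt above; a real run would pass an open file instead.
example = (
    "chrom\tstart\tend\tgene\tdistance\tpeak_type\n"
    "chr14\t77769877\t77770568\tPOMT2\t16659\tdistal\n"
    "chr14\t77786487\t77786973\tPOMT2\t254\tdistal\n"
    "chr14\t77786487\t77786973\tGSTZ1\t-254\tpromoter\n"
    "chr14\t77843033\t77843952\tTMED8\t0\tpromoter\n"
)

peaks = group_by_peak(io.StringIO(example))

# Keep only peaks annotated with more than one gene.
multi_gene = {peak: rows for peak, rows in peaks.items() if len(rows) > 1}
for (chrom, start, end), rows in multi_gene.items():
    print(f"{chrom}:{start}-{end}", rows)
```

For an actual output file, `group_by_peak(open("peak_annotation.tsv", newline=""))` would replace the in-memory example.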