10x Genomics
Chromium Single Cell ATAC

Cell Ranger ATAC Reference packages

Overview

The reference data consists of the reference genome sequence and its associated genome annotation, which includes gene and transcript coordinates, regulatory regions and transcription factor motifs. Both the genome sequence and the annotation packaged with the software are derived from well-established consortia such as NCBI, GENCODE, Ensembl and ENCODE. The files in the reference directory have undergone minimal processing from the source files downloaded directly from each consortium (details below).

Versions

The provided single-species reference packages are:

Note that for GRCh38, we do not use the decoy and alternate contigs in any analysis step of the pipeline.
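As an illustration of what excluding decoy and alternate contigs means in practice, the minimal sketch below filters a GRCh38 contig list by name. It assumes UCSC-style naming, where alternate haplotypes end in "_alt" and decoys end in "_decoy" (as in common GRCh38 analysis sets). The function names are hypothetical, and the pipeline's own filtering logic is not described on this page.

```python
# Hypothetical sketch: drop GRCh38 alternate-haplotype and decoy contigs by name.
# ASSUMPTION: UCSC-style names, e.g. "chr6_GL000250v2_alt" (alternate haplotype)
# and "chrUn_JTFH01000001v1_decoy" (decoy). This is not Cell Ranger ATAC code.

EXCLUDED_SUFFIXES = ("_alt", "_decoy")


def is_primary_contig(name: str) -> bool:
    """Return False for contigs whose name marks them as alternate or decoy."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return not name.endswith(EXCLUDED_SUFFIXES)


def primary_contigs(names):
    """Keep only the contigs that would be used in analysis steps."""
    return [n for n in names if is_primary_contig(n)]


if __name__ == "__main__":
    contigs = ["chr1", "chr2", "chr6_GL000250v2_alt",
               "chrUn_JTFH01000001v1_decoy", "chrM"]
    print(primary_contigs(contigs))  # ['chr1', 'chr2', 'chrM']
```

Matching on name suffixes keeps the sketch self-contained; a production filter would more likely rely on an explicit list of primary contigs shipped with the reference.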

For multi-species experiments, we provide the following reference packages, which combine some of the single-species builds above. They are made by taking the union of the reference sequences and annotations.

Note that contig names are prefixed with the species build. For example, chr1 from hg19 is labelled hg19_chr1 in the hg19_and_mm10 build.
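The union-and-prefix scheme described above can be sketched in a few lines. This is a minimal illustration, not the code used to build hg19_and_mm10; prefix_contig, split_contig and merge_contig_lists are hypothetical helper names. Recovering the species matches against known build names rather than splitting on the first underscore, because original contig names can themselves contain underscores (for example chr1_gl000191_random in hg19).

```python
# Hypothetical helpers illustrating species-prefixed contig names in a
# combined multi-species reference (e.g. hg19_and_mm10). Not Cell Ranger ATAC code.


def prefix_contig(genome: str, contig: str) -> str:
    """Label a contig with its species build: ('hg19', 'chr1') -> 'hg19_chr1'."""
    return f"{genome}_{contig}"


def merge_contig_lists(per_genome: dict[str, list[str]]) -> list[str]:
    """Union of contigs from several builds, each prefixed by its build name."""
    return [prefix_contig(g, c) for g, contigs in per_genome.items() for c in contigs]


def split_contig(name: str, genomes: list[str]) -> tuple[str, str]:
    """Inverse of prefix_contig: return (genome, original contig name).

    Longer build names are tried first so that one build name that is a
    prefix of another cannot cause a wrong match.
    """
    for genome in sorted(genomes, key=len, reverse=True):
        prefix = genome + "_"
        if name.startswith(prefix):
            return genome, name[len(prefix):]
    raise ValueError(f"contig {name!r} has no known genome prefix")


if __name__ == "__main__":
    merged = merge_contig_lists({"hg19": ["chr1", "chr1_gl000191_random"],
                                 "mm10": ["chr1"]})
    print(merged)  # ['hg19_chr1', 'hg19_chr1_gl000191_random', 'mm10_chr1']
    print(split_contig("hg19_chr1_gl000191_random", ["hg19", "mm10"]))
    # ('hg19', 'chr1_gl000191_random')
```

Because every contig carries its build prefix, reads aligned to the combined reference can be assigned to a species by contig name alone.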

Genome sequences

All genome sequences are in the "fasta" directory; the raw FASTA data is downloaded from NCBI. Genome index files are created with samtools faidx, BWA and pysam. Finally, a contig definition JSON file is created, which the pipeline reads to parse the contents of the reference package.
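To show what a samtools faidx index records and why it makes random access to the genome cheap, here is a minimal sketch that computes the five .fai columns (name, length, byte offset of the first base, bases per line, bytes per line) and uses them to fetch a single base. It assumes Unix "\n" line endings and equal-length sequence lines within each record, which faidx also requires. build_fai and fetch_base are hypothetical names; real references should be indexed with samtools faidx itself.

```python
# Illustrative sketch of the samtools-faidx (.fai) index layout.
# ASSUMPTIONS: '\n' line endings; every sequence line in a record except the
# last has the same length. Not a replacement for `samtools faidx`.


def build_fai(fasta: bytes) -> list[tuple[str, int, int, int, int]]:
    """Return (name, length, offset, linebases, linewidth) for each record."""
    records = []
    offset = 0      # byte position where the current line starts
    current = None  # [name, length, first_base_offset, linebases, linewidth]
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                records.append(tuple(current))
            # The sequence name is the first whitespace-delimited word after '>'.
            name = line[1:].split()[0].decode()
            current = [name, 0, offset + len(line), 0, 0]
        elif current is not None:
            bases = len(line.rstrip(b"\n"))
            if current[3] == 0:  # first sequence line fixes the line geometry
                current[3] = bases
                current[4] = len(line)
            current[1] += bases
        offset += len(line)
    if current is not None:
        records.append(tuple(current))
    return records


def fetch_base(fasta: bytes, entry: tuple, pos: int) -> str:
    """Return the base at 0-based position pos using only index arithmetic."""
    name, length, first, linebases, linewidth = entry
    if not 0 <= pos < length:
        raise IndexError(f"{pos} outside {name} (length {length})")
    # Skip whole lines (including their newlines), then move within the line.
    return chr(fasta[first + (pos // linebases) * linewidth + pos % linebases])


if __name__ == "__main__":
    fa = b">chr1 example\nACGT\nAC\n>chr2\nGGGG\n"
    for entry in build_fai(fa):
        print("\t".join(map(str, entry)))
    print(fetch_base(fa, build_fai(fa)[0], 5))  # C
```

The fixed line geometry is what allows a base to be located with arithmetic instead of scanning the file, which matters for multi-gigabyte genome FASTA files.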

Gene annotation

Gene annotations are downloaded from GENCODE, using the "basic" annotation set rather than the "comprehensive" one (links are at hg19, GRCh38 and mm10).
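In GENCODE GTF files, transcripts belonging to the "basic" set carry a tag "basic" attribute. The sketch below shows how to identify those transcripts from GTF lines. It only illustrates what "basic" means; the packaged references use GENCODE's pre-built basic annotation files rather than filtering. Attribute parsing is simplified (no quoted semicolons), the function names are hypothetical, and the IDs in the example are made up.

```python
# Illustrative sketch: find transcripts tagged "basic" in GENCODE GTF lines.
# Simplified parser: assumes values contain no ';' and no escaped quotes.


def parse_attributes(field: str) -> dict[str, list[str]]:
    """Parse a GTF attribute column; keys such as 'tag' may repeat."""
    attrs: dict[str, list[str]] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs.setdefault(key, []).append(value.strip().strip('"'))
    return attrs


def basic_transcript_ids(gtf_lines) -> set[str]:
    """Return the transcript_id of every transcript record tagged 'basic'."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if "basic" in attrs.get("tag", []):
            ids.update(attrs.get("transcript_id", []))
    return ids


if __name__ == "__main__":
    gtf = [
        "##description: made-up example\n",
        'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; tag "basic"; tag "CCDS";\n',
        'chr1\tHAVANA\ttranscript\t150\t700\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    ]
    print(basic_transcript_ids(gtf))  # {'T1'}
```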

Regulatory regions

Regulatory regions are downloaded from the following sources: