Cell Ranger ATAC 1.0, printed on 12/22/2024
The reference data consists of the reference genome sequence and its associated genome annotation, which includes gene and transcript coordinates, regulatory regions and transcription factor motifs. Both the genome sequence and annotation packaged with the software are derived from reputable, well-established consortia such as NCBI, GENCODE, Ensembl and ENCODE. The exact files in the reference directory have undergone minimal processing from the source files directly downloaded from each consortium (details below).
The provided single-species reference packages are:
Please note that Cell Ranger ATAC 1.0.0 does not currently support building custom references.
Note that for GRCh38, we do not use the decoy and alternate contigs in any analysis steps in the pipeline.
For multi-species experiments, we provide the following reference packages, which combine some of the single-species builds above. They are made by taking the union of the reference sequences and annotations.
Note that contig names are prefixed by species build. For example, chr1 from hg19 is labelled hg19_chr1 inside the hg19_and_mm10 build.
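The naming convention above can be sketched in a few lines of Python. This is only an illustration of the prefixing rule, not the code used to build the packaged references; the function names `prefix_contig` and `merge_contig_names` are hypothetical.

```python
from typing import Dict, List


def prefix_contig(build: str, contig: str) -> str:
    """Return the contig name prefixed by its species build, e.g. hg19_chr1."""
    return f"{build}_{contig}"


def merge_contig_names(builds: Dict[str, List[str]]) -> List[str]:
    """Take the union of contigs across builds, prefixing each name by its build.

    Prefixing keeps names unique even when both species have a contig
    with the same name (for example, chr1 in hg19 and in mm10).
    """
    merged = []
    for build, contigs in builds.items():
        for contig in contigs:
            merged.append(prefix_contig(build, contig))
    return merged


if __name__ == "__main__":
    names = merge_contig_names({"hg19": ["chr1", "chrX"], "mm10": ["chr1"]})
    print(names)  # ['hg19_chr1', 'hg19_chrX', 'mm10_chr1']
```

Without the prefix, the two chr1 contigs would collide in the combined build.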
All genome sequences are in the "fasta" directory, which holds the raw FASTA data downloaded from NCBI. Genome index files are also created by samtools faidx, bwa and pysam. Finally, a contig definition JSON file is created, which the pipeline reads to parse the contents of the reference package.
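To make the role of the FASTA index concrete, here is a minimal pure-Python sketch of the five columns stored in a `.fai` file of the kind `samtools faidx` writes: contig name, sequence length, byte offset of the first base, bases per line, and bytes per line. It is an illustration only, assuming well-formed input with uniform line lengths; the helper `build_fai` is hypothetical and is not part of samtools, pysam or the pipeline.

```python
from typing import List, Tuple

# (name, length, offset, linebases, linewidth), the five .fai columns
FaiRecord = Tuple[str, int, int, int, int]


def build_fai(fasta_bytes: bytes) -> List[FaiRecord]:
    """Compute .fai-style index records for an in-memory FASTA file."""
    records: List[FaiRecord] = []
    pos = 0  # byte position of the start of the current line
    name = None
    length = offset = linebases = linewidth = 0
    for line in fasta_bytes.splitlines(keepends=True):
        if line.startswith(b">"):
            # A new header closes the previous contig, if there was one.
            if name is not None:
                records.append((name, length, offset, linebases, linewidth))
            # The contig name is the first word after '>'.
            name = line[1:].split()[0].decode()
            length = 0
            offset = pos + len(line)  # sequence starts right after the header
            linebases = linewidth = 0
        elif name is not None:
            bases = len(line.rstrip(b"\r\n"))
            if linebases == 0:
                # The first sequence line defines the line layout.
                linebases, linewidth = bases, len(line)
            length += bases
        pos += len(line)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


if __name__ == "__main__":
    demo = b">chr1 demo\nACGT\nAC\n>chr2\nGG\n"
    for record in build_fai(demo):
        print("\t".join(str(field) for field in record))
```

With these columns, a tool can seek directly to any base of any contig without scanning the whole file, which is why the index ships alongside the sequence.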
Gene annotations are downloaded from GENCODE; the "basic" annotation set is used rather than the "comprehensive" set (links are at hg19, GRCh38 and mm10).
Regulatory regions are downloaded from the following sources: