10x Genomics
Chromium Single Cell ATAC

Cell Ranger ATAC 2.1, printed on 03/29/2025

Creating a Reference Package with cellranger-atac mkref

Cell Ranger ATAC provides pre-built human (GRCh38) and mouse (mm10) reference packages for use with cellranger-atac count. To create and use a custom reference package, Cell Ranger ATAC requires a reference genome sequence (FASTA file) and gene annotations (GTF file). Optionally, transcription factor motifs can be specified in JASPAR format.

Backwards Compatibility. Cell Ranger ATAC 2.1 can only be run on references generated by cellranger-atac mkref version 2.1 or 2.0. Cell Ranger ATAC 2.0 can only be run on references that are constructed using cellranger-atac mkref version 2.0.

Requirements for reference construction

Cell Ranger ATAC supports the use of customer-generated references under the following conditions:

The FASTA and GTF files must contain the same contig names. This can be ensured by downloading them from the same source.
All genes in the GTF must have annotations with feature type 'exon' in column 3.
The GTF must have annotations with feature type 'transcript' for each transcript feature.
The GTF must have gene feature rows for all chromosomes. Removing gene feature rows results in empty gene tracks in Loupe Browser.
Due to limitations of the BAM index format, a contig in the reference FASTA file cannot exceed 536.8 Mb (2^29 bases). If a contig exceeds that size, you will have to split it into smaller contigs and make corresponding modifications to the GTF file.
Prior versions of Cell Ranger ATAC imposed restrictions on the contig count for references. All such restrictions have been removed.
Earlier versions of Cell Ranger ATAC (v2.0 and earlier) required non-overlapping gene annotations in the GTF. This restriction was removed in Cell Ranger ATAC v2.1.
For single-species reference construction, contig names in the FASTA and GTF files cannot include underscores (_). The aggr pipeline interprets any characters after an _ as a species identifier, so an underscore in a single-species contig name causes the aggr run to fail.
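The contig-level requirements above can be checked before running mkref. The following is a minimal sketch, assuming uncompressed FASTA and GTF files read as lines of text; the function names are hypothetical and not part of Cell Ranger ATAC. It checks that every GTF contig appears in the FASTA, that 'exon' and 'transcript' rows exist somewhere in the GTF (a coarse check, not a per-gene one), that no contig exceeds 2^29 bases, and, for single-species references, that no contig name contains an underscore.

```python
# Largest contig the BAM index (BAI) format can address: 2**29 = 536,870,912 bases
MAX_CONTIG_LEN = 2**29


def fasta_contig_lengths(fasta_lines):
    """Return {contig_name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_records(gtf_lines):
    """Yield the 9 tab-delimited fields of each non-comment GTF line."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            yield fields


def check_reference(fasta_lines, gtf_lines, single_species=True):
    """Return a list of problems found (an empty list means none were found)."""
    problems = []
    lengths = fasta_contig_lengths(fasta_lines)
    # Stream the GTF once, keeping only contig names and feature types
    gtf_contigs, features = set(), set()
    for fields in gtf_records(gtf_lines):
        gtf_contigs.add(fields[0])
        features.add(fields[2])
    for contig in sorted(gtf_contigs - set(lengths)):
        problems.append(f"GTF contig {contig!r} is missing from the FASTA")
    for required in ("exon", "transcript"):
        if required not in features:
            problems.append(f"GTF has no {required!r} rows")
    for contig, length in lengths.items():
        if length > MAX_CONTIG_LEN:
            problems.append(f"contig {contig!r} is {length} bases, over 2**29")
        if single_species and "_" in contig:
            problems.append(f"contig {contig!r} contains an underscore")
    return problems
```

For real files you could pass open handles, e.g. `check_reference(open("genome.fa"), open("genes.gtf"))`. Both inputs are streamed line by line, so memory stays small, although a pure-Python pass over a 3 Gb genome is slow.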

Note that reference packages generated using cellranger-arc mkref are compatible with Cell Ranger ATAC pipelines.

Frequently asked questions

How do I create a reference package for use with Cell Ranger ATAC?
Can I use a Cell Ranger ARC reference with Cell Ranger ATAC?
How do I add a gene to an existing reference package?
How were the pre-built human (GRCh38) and mouse (mm10) reference packages created?

Making a reference package

We outline the steps to create a reference package starting from a reference genome and a set of gene annotations.

1. Download FASTA and GTF

FASTA and GTF files can be downloaded from sites like Ensembl and UCSC. The downloaded files are typically compressed and must be uncompressed before they can be processed in subsequent steps. The most comprehensive genome sequence and annotations are recommended:

For the genome sequence, include all major chromosomes and unplaced and unlocalized scaffolds, but exclude patches and alternative haplotypes.
- In Ensembl, the recommended genome file to download is annotated as "primary assembly."
- In NCBI, it is "no alternative - analysis set."
For the GTF file, genes must be annotated with feature type 'exon' in column 3.
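Downloaded `.gz` files must be decompressed before use; on the command line, `gunzip` does this. A Python equivalent is sketched below. `decompress_gz` is a hypothetical helper name, and it assumes the output file should simply drop the `.gz` suffix.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dest=None):
    """Decompress a .gz file (e.g. a downloaded FASTA or GTF); return the output path."""
    src = Path(src)
    # Default output: same name without the final suffix, genome.fa.gz -> genome.fa
    dest = Path(dest) if dest else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Copies in chunks, so a multi-gigabyte genome is never held in memory at once
        shutil.copyfileobj(fin, fout)
    return dest
```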

2. (Optional) Filter annotations

This step is optional. Any gene contained in the GTF file will end up in the final count matrix and analysis. When many low-confidence gene annotations are present, peaks become harder to interpret. For example, if a GTF contains a low-confidence gene annotation that overlaps a high-confidence protein-coding gene, a nearby peak cannot automatically be interpreted as a promoter. GTF files downloaded from sites like Ensembl and UCSC often contain transcripts and genes that should be filtered out of your final annotation. Some example filters include:

Restricting to one or more classes of genes: GTF files often contain a field like gene_biotype or gene_type that labels each gene's class, such as protein-coding or lincRNA.
Removing genes from the pseudo-autosomal region
Removing low-confidence transcripts

See the filters used for the pre-built GRCh38 and mm10 references.
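The first filter (restricting gene classes) can be sketched as a line filter over the GTF. This is a minimal sketch, not the filter used for the pre-built references: `KEEP_BIOTYPES` is a hypothetical allow-list, and it assumes each record carries a `gene_biotype` (Ensembl style) or `gene_type` (GENCODE style) attribute. Records without either attribute are dropped. The pseudo-autosomal and low-confidence filters would need additional logic.

```python
import re

# Hypothetical allow-list of gene classes to keep; adjust for your analysis
KEEP_BIOTYPES = {"protein_coding", "lncRNA", "lincRNA"}

# Matches gene_biotype "..." or gene_type "..." (but not transcript_type)
BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def filter_gtf(lines, keep=KEEP_BIOTYPES):
    """Yield GTF lines whose gene class is in `keep`; '#' header lines pass through."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        match = BIOTYPE_RE.search(line)
        # Gene, transcript and exon rows all carry the gene class, so a kept
        # gene keeps all of its rows
        if match and match.group(1) in keep:
            yield line
```

Usage: `open("filtered.gtf", "w").writelines(filter_gtf(open("annotation.gtf")))`.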

3. (Optional) Add transcription factor motifs

This step is optional. When a motifs file in JASPAR format is supplied the pipeline generates additional transcription factor analyses as described here. If these analyses are of interest, you can download transcription factor motif position-weight matrices in JASPAR format, for example, from JASPAR 2022. The JASPAR format specifies a motif name using a FASTA-style header line followed by the position-weight matrix. Here is an example:

>Arnt_MA0004.1
A  [     4     19      0      0      0      0 ]
C  [    16      0     20      0      0      0 ]
G  [     0      1      0     20      0     20 ]
T  [     0      0      0      0     20      0 ]
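To make the structure of this format concrete, here is a minimal parser for the bracketed layout shown above, plus a helper that reads off the most frequent base at each position. Both function names are hypothetical, and the sketch assumes exactly the header-plus-four-rows layout of the example; other motif formats (e.g. bare count rows without brackets) are not handled.

```python
import re

# One matrix row, e.g. "A  [ 4 19 0 0 0 0 ]"
ROW_RE = re.compile(r"^([ACGT])\s*\[([^\]]*)\]")


def parse_jaspar(text):
    """Parse JASPAR-format motifs into {name: {base: [count per position]}}."""
    motifs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            motifs[name] = {}
        elif name is not None:
            match = ROW_RE.match(line)
            if match:
                motifs[name][match.group(1)] = [float(x) for x in match.group(2).split()]
    return motifs


def consensus(matrix):
    """Most frequent base at each position (ties go to the earlier base in ACGT)."""
    length = len(matrix["A"])
    return "".join(max("ACGT", key=lambda b: matrix[b][i]) for i in range(length))
```

Applied to the Arnt_MA0004.1 example, `consensus` returns `"CACGTG"`, the E-box that this motif describes.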

The pre-built GRCh38 and mm10 references utilize JASPAR vertebrate, non-redundant motifs that can be downloaded from JASPAR 2022. Note: the motif headers are modified such that >MOTIF_ID\tMOTIF_NAME is turned into >MOTIF_NAME_MOTIF_ID. This modification allows for better readability of the motif analysis results.
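The header modification can be reproduced on a downloaded motif file with a line-by-line rewrite. This is a sketch that assumes, as described above, that each header has the form `>MOTIF_ID<TAB>MOTIF_NAME`; the function names are hypothetical. Lines that are not tab-separated headers are copied unchanged.

```python
def rename_header(line):
    """Turn '>MOTIF_ID\tMOTIF_NAME' into '>MOTIF_NAME_MOTIF_ID'; other lines unchanged."""
    if not line.startswith(">") or "\t" not in line:
        return line
    newline = "\n" if line.endswith("\n") else ""
    motif_id, motif_name = line[1:].rstrip("\n").split("\t", 1)
    return f">{motif_name}_{motif_id}{newline}"


def rename_motif_file(src, dest):
    """Write a copy of a JASPAR motif file with every header renamed."""
    with open(src) as fin, open(dest, "w") as fout:
        for line in fin:
            fout.write(rename_header(line))
```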

4. Create a configuration file with reference information

cellranger-atac mkref takes as input a configuration file that bundles various inputs to the tool. We explain how to construct a configuration file using the example of GRCh38:

{
    organism: "human"
    genome: ["GRCh38"]
    input_fasta: ["/path/to/GRCh38/assembly.fa"]
    input_gtf: ["/path/to/gencode/annotation.gtf"]
    non_nuclear_contigs: ["chrM"]
    input_motifs: "/path/to/jaspar/motifs.pfm"
}

Each line consists of a key: value pair. Plain strings and filesystem paths are both enclosed in double quotes (""), and lists of strings or paths are enclosed in square brackets ([]). The individual parameter fields are described below:

Parameter	Function
`organism`	Optional; string. Name of the organism. This is displayed in the web summary but is otherwise not used in the analysis.
`genome`	Required; list of strings. Name(s) of the genome(s) that comprise the organism. Note: Cell Ranger ATAC only supports single-species references so this list should be of length 1. The reference package is constructed in the current working directory where the directory name is the name of the genome. In the example above the reference package would be constructed in `$(pwd)/GRCh38`.
`input_fasta`	Required; list of paths. Path(s) to the assembly FASTA file(s) for each genome in uncompressed FASTA format. Note: Cell Ranger ATAC only supports single-species references so this list should be of length 1.
`input_gtf`	Required; list of paths. Path(s) to the gene annotation GTF file(s) for each genome in GTF format. Note: Cell Ranger ATAC only supports single-species references so this list should be of length 1.
`non_nuclear_contigs`	Optional; list of strings. Name(s) of contig(s) that do not have any chromatin structure, for example, mitochondria or plastids. For the GRCh38 assembly this would be `["chrM"]`. These contigs are excluded from peak calling since the entire contig will be "open" due to a lack of chromatin structure.
`input_motifs`	Optional; path. Path to file containing transcription factor motifs in JASPAR format (see above). Note: any spaces in the header name are converted to a single underscore. For ease of use in Loupe Browser, we recommend using a header that begins with a human-readable name rather than a motif identifier.
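When references are built from scripts, it can help to generate the configuration file rather than edit it by hand. The sketch below renders the key: value layout of the GRCh38 example; `make_mkref_config` is a hypothetical helper, and it assumes that JSON-style double-quoted strings (as produced by `json.dumps`) are accepted for every string and path value. Because Cell Ranger ATAC supports only single-species references, `genome`, `input_fasta` and `input_gtf` are always emitted as one-element lists.

```python
import json


def make_mkref_config(genome, fasta, gtf, organism=None,
                      non_nuclear_contigs=("chrM",), motifs=None):
    """Render a cellranger-atac mkref config in the layout of the GRCh38 example."""
    q = json.dumps  # wraps a string in double quotes, escaping quotes and backslashes
    lines = ["{"]
    if organism:
        lines.append(f"    organism: {q(organism)}")
    lines.append(f"    genome: [{q(genome)}]")
    lines.append(f"    input_fasta: [{q(fasta)}]")
    lines.append(f"    input_gtf: [{q(gtf)}]")
    if non_nuclear_contigs:
        contigs = ", ".join(q(c) for c in non_nuclear_contigs)
        lines.append(f"    non_nuclear_contigs: [{contigs}]")
    if motifs:
        lines.append(f"    input_motifs: {q(motifs)}")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Optional fields are omitted when not supplied, so the same helper covers builds with and without motifs.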

5. Run cellranger-atac mkref

To create the reference package, run the cellranger-atac mkref command with the configuration file described above. This utility copies your FASTA and GTF, indexes them in several formats, and outputs a folder named after the genome value in the config file.

Argument	Description
`--config`	Required; path. Path to configuration file containing additional information about the reference. See above for more details.
`--ref-version`	Optional; string. Reference version string to include with reference.
`--help` or `-h`	Optional. Show list of all arguments and options.
`--version`	Optional. Show version.
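If mkref is launched from a Python workflow, the two documented build arguments can be assembled into an argument list as sketched below. The helper names are hypothetical; the sketch assumes `cellranger-atac` is on `PATH` and that `--ref-version` accepts the same `--flag=value` form used for `--config` in the example that follows. The reference folder is created in the working directory, so `run_mkref` passes `cwd` explicitly.

```python
import shutil
import subprocess


def mkref_command(config_path, ref_version=None):
    """Build the cellranger-atac mkref argument list from the documented options."""
    cmd = ["cellranger-atac", "mkref", f"--config={config_path}"]
    if ref_version is not None:
        cmd.append(f"--ref-version={ref_version}")
    return cmd


def run_mkref(config_path, ref_version=None, workdir="."):
    """Run mkref; the reference folder is written inside `workdir`."""
    if shutil.which("cellranger-atac") is None:
        raise FileNotFoundError("cellranger-atac is not on PATH")
    # check=True raises CalledProcessError if mkref exits with a non-zero status
    subprocess.run(mkref_command(config_path, ref_version), cwd=workdir, check=True)
```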

To build a reference, run mkref as illustrated below:

$ cellranger-atac mkref --config=/home/jdoe/10x_references/GRCh38.config
 
>>> Creating reference for GRCh38 <<<

Creating new reference folder at /home/jdoe/10x_references/GRCh38
...done

Writing genome FASTA file into reference folder...
...done

Indexing genome FASTA file...
...done

Writing genes GTF file into reference folder...
...done

Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
...done

Computing hash of genes GTF file...
...done

...done

Generating bwa index (may take over an hour for a 3Gb genome)...
[bwa_index] Pack FASTA... 0.13 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 4.05 seconds elapse.
[bwa_index] Update BWT... 0.06 sec
[bwa_index] Pack forward-only FASTA... 0.09 sec
[bwa_index] Construct SA from BWT and Occ... 1.80 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index /home/jdoe/10x_references/GRCh38/fasta/genome.fa
[main] Real time: 6.206 sec; CPU: 6.137 sec
done

Writing TSS and transcripts bed file...
...done

Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
...done

Computing hash of genes GTF file...
...done

...done

>>> Reference successfully created at GRCh38 <<<

The created reference package will contain the following files:

GRCh38
├── fasta
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.pac
│   └── genome.fa.sa
├── genes
│   └── genes.gtf.gz
├── reference.json
└── regions
    ├── motifs.pfm      # present if motifs file was supplied
    ├── transcripts.bed
    └── tss.bed
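A quick sanity check after mkref finishes is to confirm that every file in the tree above exists. The sketch below hard-codes that listing; `EXPECTED` and `missing_reference_files` are hypothetical names, and `regions/motifs.pfm` is only expected when `input_motifs` was set in the config.

```python
from pathlib import Path

# Files from the directory tree above (motifs.pfm is handled separately)
EXPECTED = [
    "fasta/genome.fa", "fasta/genome.fa.amb", "fasta/genome.fa.ann",
    "fasta/genome.fa.bwt", "fasta/genome.fa.fai", "fasta/genome.fa.pac",
    "fasta/genome.fa.sa", "genes/genes.gtf.gz", "reference.json",
    "regions/transcripts.bed", "regions/tss.bed",
]


def missing_reference_files(ref_dir, expect_motifs=False):
    """Return the expected relative paths that are not regular files in ref_dir."""
    ref_dir = Path(ref_dir)
    expected = EXPECTED + (["regions/motifs.pfm"] if expect_motifs else [])
    return [rel for rel in expected if not (ref_dir / rel).is_file()]
```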

System Requirements

Indexing a typical human 3Gb FASTA file often takes up to 8 core hours and requires 32 GB of memory.

Can I use a Cell Ranger ARC reference with Cell Ranger ATAC?

The only difference between a reference constructed using cellranger-atac mkref and one constructed using cellranger-arc mkref is that Cell Ranger ARC references also contain a genome index for the splice-aware STAR aligner, which is used to align gene expression reads. An ARC reference can therefore be used with Cell Ranger ATAC, but the reverse is not true: an ATAC reference cannot be used with Cell Ranger ARC.

Adding one or more genes to your reference

Provided that you follow the format described above, it is fairly simple to add custom gene definitions to an existing reference package constructed using cellranger-atac mkref. Assuming the reference package is located in REF_DIR, the FASTA sequence records are stored in REF_DIR/fasta/genome.fa and the gene annotations are compressed and stored in REF_DIR/genes/genes.gtf.gz. First, if the new genes need additional contigs, create a new FASTA reference file by adding those contigs to REF_DIR/fasta/genome.fa. Next, create a new GTF file by uncompressing REF_DIR/genes/genes.gtf.gz and appending records for each new gene. Note that the new genes must have GTF features of type 'exon' for each exon and 'transcript' for each transcript.
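The two file-preparation steps can be sketched as follows. `extend_reference_inputs` and `write_fasta_record` are hypothetical helpers; the sketch assumes the existing genome.fa and genes.gtf.gz end with a newline, that the new contig names are not already present in genome.fa, and that the new GTF lines are complete tab-delimited records. It writes new copies into a separate directory rather than modifying REF_DIR in place.

```python
import gzip
import shutil
from pathlib import Path


def write_fasta_record(handle, name, sequence, width=60):
    """Write one FASTA record, wrapping the sequence at `width` characters per line."""
    handle.write(f">{name}\n")
    for i in range(0, len(sequence), width):
        handle.write(sequence[i:i + width] + "\n")


def extend_reference_inputs(ref_dir, out_dir, new_contigs, new_gtf_lines):
    """Copy REF_DIR's FASTA and GTF into out_dir, then append new contigs and records.

    new_contigs: {name: sequence} for contigs to add (may be empty).
    new_gtf_lines: GTF lines for the new genes, each ending in a newline.
    Returns (fasta_path, gtf_path) for use as input_fasta and input_gtf.
    """
    ref_dir, out_dir = Path(ref_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fasta_out, gtf_out = out_dir / "genome.fa", out_dir / "genes.gtf"
    shutil.copyfile(ref_dir / "fasta" / "genome.fa", fasta_out)
    with open(fasta_out, "a") as fa:  # append mode keeps the copied records
        for name, seq in new_contigs.items():
            write_fasta_record(fa, name, seq)
    # Uncompress the packaged GTF, then append the new gene records
    with gzip.open(ref_dir / "genes" / "genes.gtf.gz", "rt") as fin, \
            open(gtf_out, "w") as fout:
        shutil.copyfileobj(fin, fout)
        fout.writelines(new_gtf_lines)
    return fasta_out, gtf_out
```

The returned paths go into `input_fasta` and `input_gtf` of a new configuration file.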

The GTF file format is essentially a list of records, one per line, each comprising nine tab-delimited non-empty fields.

Column	Name	Description
1	Chromosome	Must refer to a chromosome/contig in the genome FASTA.
2	Source	Unused.
3	Feature	`cellranger-atac count` requires the presence of an 'exon' row for each exon and a 'transcript' row for each transcript.
4	Start	Start position on the reference (1-based inclusive).
5	End	End position on the reference (1-based inclusive).
6	Score	Unused.
7	Strand	Strandedness of this feature on the reference: `+` or `-`.
8	Frame	Unused.
9	Attributes	A semicolon-delimited list of key-value pairs of the form `key "value"`. The attribute keys `transcript_id`, `gene_name` and `gene_id` are required. Only ASCII characters `a-z`, `A-Z`, and `_` are supported as attribute keys. Genes with values outside this limitation (e.g., numbers) will not appear in Loupe Browser.
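Following the column table, the records for a new gene can be generated rather than typed by hand. This is a minimal sketch for a hypothetical one-exon, one-transcript gene; the helper names, the `custom` source value and the `.t1` transcript ID scheme are illustrative assumptions. It emits a gene row (so the gene track is not empty in Loupe Browser), a transcript row and an exon row, uses `.` for the unused score and frame columns, and rejects attribute keys outside `a-z`, `A-Z` and `_`.

```python
import re

# Attribute keys may contain only ASCII letters and underscores
KEY_RE = re.compile(r"[A-Za-z_]+")


def gtf_line(contig, feature, start, end, strand, attrs, source="custom"):
    """Format one 9-column GTF record; start and end are 1-based inclusive."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    for key in attrs:
        if not KEY_RE.fullmatch(key):
            raise ValueError(f"unsupported attribute key: {key!r}")
    attr_text = " ".join(f'{k} "{v}";' for k, v in attrs.items())
    fields = [contig, source, feature, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(fields) + "\n"


def single_exon_gene(contig, start, end, strand, gene_id, gene_name):
    """Gene, transcript and exon rows for a gene with one transcript and one exon."""
    gene_attrs = {"gene_id": gene_id, "gene_name": gene_name}
    tx_attrs = {**gene_attrs, "transcript_id": f"{gene_id}.t1"}  # hypothetical ID scheme
    return [
        gtf_line(contig, "gene", start, end, strand, gene_attrs),
        gtf_line(contig, "transcript", start, end, strand, tx_attrs),
        gtf_line(contig, "exon", start, end, strand, tx_attrs),
    ]
```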

After adding the necessary records to your FASTA file and the additional lines to your GTF file, run cellranger-atac mkref as described above.
