
10x Genomics
Chromium Single Cell Gene Expression

Reference Support

Cell Ranger provides pre-built hg19, mm10 and ercc92 reference packages for use with the pipeline. If you would like to use your own genome FASTA or gene annotation GTF, Cell Ranger also supports customer-generated references.

Compatible Use Cases

Cell Ranger supports the use of customer-generated references for the following scenarios:

Making a Reference Package

There are 2 steps to construct a Cell Ranger-compatible reference.

1. Filter GTF

GTF files downloaded from sites like ENSEMBL and UCSC often contain many transcripts and genes that you may want to exclude from your final annotation. A convenient way to do this is to filter genes by the key-value pairs in the GTF attribute column. For example, to keep only protein-coding genes, run the following command on your GTF.

$ cellranger mkgtf hg19-ensembl.gtf hg19-filtered-ensembl.gtf --attribute=gene_biotype:protein_coding

This will generate a filtered GTF file hg19-filtered-ensembl.gtf from the original unfiltered GTF file hg19-ensembl.gtf.
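Before choosing which attributes to filter on, it can help to see which values your GTF actually contains. The shell sketch below tallies the gene_biotype values in a small, hypothetical toy GTF and then approximates a key-value filter with awk. It only illustrates the idea of filtering on the attribute column; it is not a description of how cellranger mkgtf works internally, so use mkgtf to build real references.

```shell
#!/usr/bin/env bash
# Sketch only: approximates attribute-based GTF filtering with grep/awk.
# The toy GTF below is hypothetical; cellranger mkgtf is the supported tool.
set -euo pipefail

# Write a tiny tab-separated GTF (9 columns per line) to a temporary file.
gtf=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 ensembl gene 100 200 . + . 'gene_id "G1"; gene_biotype "protein_coding";' \
  1 ensembl gene 300 400 . - . 'gene_id "G2"; gene_biotype "pseudogene";' \
  > "$gtf"

# Tally the gene_biotype values present, to decide which ones to keep.
grep -o 'gene_biotype "[^"]*"' "$gtf" | sort | uniq -c

# Keep comment lines and features whose attribute column (field 9)
# contains gene_biotype "protein_coding".
filtered=$(mktemp)
awk -F'\t' '/^#/ || $9 ~ /gene_biotype "protein_coding"/' "$gtf" > "$filtered"
cat "$filtered"
```

On a real Ensembl GTF, running the grep/sort/uniq line alone gives a quick inventory of biotypes to pass to --attribute.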

2. Index your FASTA and GTF

Single Species

To create a reference for only one species, run the cellranger mkref command on your FASTA and GTF files. Your FASTA and GTF files must meet the compatibility requirements above.

$ cellranger mkref --genome=hg19 --fasta=hg19.fa --genes=hg19-filtered-ensembl.gtf
...
$ ls hg19
fasta/  genes/  pickle/  reference.json  star/

This utility copies your FASTA and GTF, indexes them in several formats, and writes the results to a folder named after the --genome value (here, hg19).
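A quick sanity check after mkref finishes is to confirm that the output folder contains the entries shown in the listing above. The helper check_ref_dir below is hypothetical; its list of expected names is taken from the example ls output in this document and may differ between Cell Ranger versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a reference folder produced by
# cellranger mkref contains the entries listed in this document.
# The expected names come from the example `ls` output and may vary by version.
set -euo pipefail

check_ref_dir() {
  local dir=$1 missing=0 entry
  for entry in fasta genes pickle star reference.json; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $dir/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after running cellranger mkref --genome=hg19 ...):
#   check_ref_dir hg19 && echo "hg19 looks complete"
```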

Multiple Species

To create a reference for multiple species, run cellranger mkref with the FASTA and GTF files for each species, as in the single species case above. The order of the --genome, --fasta and --genes options matters: the first --genome option is paired with the first --fasta and --genes options, the second --genome with the second --fasta and --genes, and so on.

$ cellranger mkref --genome=hg19 --fasta=hg19.fa --genes=hg19-filtered-ensembl.gtf \
                   --genome=mm10 --fasta=mm10.fa --genes=mm10-filtered-ensembl.gtf
...
$ ls hg19_and_mm10
fasta/  genes/  pickle/  reference.json  star/
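Because each --genome is paired positionally with its --fasta and --genes, one way to avoid ordering mistakes in a script is to keep the three inputs in parallel arrays and emit them as triplets. The sketch below is a dry run that only prints the resulting command; the array contents reuse the file names from the example above and are otherwise assumptions.

```shell
#!/usr/bin/env bash
# Sketch: build a multi-species cellranger mkref command from parallel arrays,
# so the i-th --genome always sits next to the i-th --fasta and --genes.
set -euo pipefail

genomes=(hg19 mm10)
fastas=(hg19.fa mm10.fa)
gtfs=(hg19-filtered-ensembl.gtf mm10-filtered-ensembl.gtf)

# Refuse to continue if the arrays do not line up one-to-one.
if [ "${#fastas[@]}" -ne "${#genomes[@]}" ] || [ "${#gtfs[@]}" -ne "${#genomes[@]}" ]; then
  echo "genomes, fastas and gtfs must have the same length" >&2
  exit 1
fi

args=()
for i in "${!genomes[@]}"; do
  args+=("--genome=${genomes[$i]}" "--fasta=${fastas[$i]}" "--genes=${gtfs[$i]}")
done

# Dry run: print the command instead of executing it.
echo cellranger mkref "${args[@]}"
```

Adding a third species then means appending one entry to each array, rather than hand-editing a long command line.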

System Requirements

Indexing a typical human 3 Gb FASTA file takes up to 8 core hours and requires 32 GB of memory. We recommend running cellranger mkref with --nthreads set to the number of cores available on your system.

You can also specify, with the --memgb option, the amount of memory (in GB) that cellranger should use during alignment via STAR. The default is 16 GB. Note that the amount of memory you specify must be greater than the number of gigabases in the input FASTA file.
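The two sizing rules above can be checked before starting a long indexing run: count the online cores for --nthreads, and confirm that the memory value you intend to pass to mkref's --memgb option is greater than the FASTA size in gigabases. This is a hedged sketch: memgb_ok is a hypothetical helper, getconf _NPROCESSORS_ONLN is assumed to be available (it is on common Linux and macOS systems), and it only enforces the stated lower bound, not the 32 GB guidance for a human genome.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for cellranger mkref resource options.
# memgb_ok is a hypothetical helper; nothing here runs cellranger itself.
set -euo pipefail

# One thread per online core, as recommended for --nthreads.
nthreads=$(getconf _NPROCESSORS_ONLN)

# Succeed if MEMGB (in GB) is greater than the FASTA size in gigabases.
# Only sequence lines are counted; ">" header lines are skipped.
memgb_ok() {
  local memgb=$1 fasta=$2
  awk -v mem="$memgb" '
    !/^>/ { n += length($0) }
    END { if (mem + 0 > n / 1e9) exit 0; exit 1 }' "$fasta"
}

# Usage (dry run; prints the command only if the memory rule holds):
#   memgb_ok 32 hg19.fa && echo cellranger mkref --genome=hg19 --fasta=hg19.fa \
#     --genes=hg19-filtered-ensembl.gtf --nthreads="$nthreads" --memgb=32
```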

Generating the Cell Ranger reference package

The references in the Cell Ranger reference package were generated with the steps described above. To create the Cell Ranger hg19 reference, the GTF file downloaded from ENSEMBL was filtered with the following cellranger mkgtf command.

$ cellranger mkgtf hg19-ensembl.gtf hg19-filtered-ensembl.gtf \
                   --attribute=gene_biotype:protein_coding \
                   --attribute=gene_biotype:lincRNA \
                   --attribute=gene_biotype:antisense
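After filtering with several --attribute options, it can be worth confirming that only the requested biotypes remain. The sketch below assumes the filtered file carries gene_biotype attributes, as the ENSEMBL GTF used here does; the helper name check_biotypes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: report gene_biotype values in a GTF that are not
# in an allowed list. Returns non-zero if any unexpected value is found.
set -euo pipefail

check_biotypes() {
  local gtf=$1
  shift
  local allowed unexpected
  # One allowed biotype per line, used as fixed-string, whole-line patterns.
  allowed=$(printf '%s\n' "$@")
  unexpected=$(grep -o 'gene_biotype "[^"]*"' "$gtf" \
    | sed 's/^gene_biotype "\(.*\)"$/\1/' \
    | sort -u \
    | grep -vxF -e "$allowed" || true)
  if [ -n "$unexpected" ]; then
    echo "unexpected gene_biotype values:" >&2
    echo "$unexpected" >&2
    return 1
  fi
}

# Usage:
#   check_biotypes hg19-filtered-ensembl.gtf protein_coding lincRNA antisense
```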

Additionally, "chr" was prepended to the chromosome names in the GTF so that they match the UCSC-style names (chr1, chr2, ...) used in the FASTA.
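ENSEMBL GTFs name chromosomes without a prefix (1, 2, ..., X), whereas the UCSC FASTA uses chr1, chr2 and so on. A minimal awk sketch of the prepending step, followed by a check that every GTF sequence name has a matching FASTA header, is shown below. It is an illustration under assumptions, not the exact procedure used for the Cell Ranger reference: in particular, plain prepending turns the ENSEMBL mitochondrial name MT into chrMT, while UCSC hg19 calls it chrM, so such names need separate handling. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: prepend "chr" to GTF sequence names, then check them against a FASTA.
# Not the exact 10x procedure; names like MT -> chrMT (UCSC: chrM) need extra handling.
set -euo pipefail

# Prepend "chr" to column 1 of non-comment GTF lines that lack it.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       $1 !~ /^chr/ { $1 = "chr" $1 }
       { print }' "$1"
}

# Print each GTF sequence name (column 1) that has no matching FASTA header.
# A FASTA name is the header text after ">" up to the first whitespace.
missing_seqnames() {
  local gtf=$1 fasta=$2
  awk 'NR == FNR { if (/^>/) seen[substr($1, 2)] = 1; next }
       !/^#/ && !($1 in seen) && !($1 in reported) { print $1; reported[$1] = 1 }' \
    "$fasta" FS='\t' "$gtf"
}

# Usage:
#   add_chr_prefix hg19-filtered-ensembl.gtf > hg19-filtered-ensembl.chr.gtf
#   missing_seqnames hg19-filtered-ensembl.chr.gtf hg19.fa
```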

The hg19 FASTA was then downloaded from UCSC, and alternate haplotype chromosomes were removed (any chromosome whose name contains hap, e.g. chr4_ctg9_hap1). Running cellranger mkref as described above then produced the Cell Ranger hg19 reference.
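The haplotype-removal step can be sketched with awk: drop every FASTA record whose sequence name contains hap, as in chr4_ctg9_hap1, and keep everything else unchanged. The function name drop_hap_sequences is hypothetical, and matching on the substring hap follows the rule stated above rather than any official list of alternate haplotypes.

```shell
#!/usr/bin/env bash
# Sketch: remove FASTA records whose sequence name contains "hap"
# (e.g. chr4_ctg9_hap1), keeping all other records unchanged.
set -euo pipefail

drop_hap_sequences() {
  # On each header, decide whether the following sequence lines are kept;
  # the bare "keep" pattern prints the current line when keep is 1.
  awk '/^>/ { name = substr($1, 2); keep = (name !~ /hap/) } keep' "$1"
}

# Usage:
#   drop_hap_sequences hg19.fa > hg19.nohap.fa
#   grep '^>' hg19.nohap.fa   # list the remaining sequence names
```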

$ cellranger mkref --genome=hg19 --fasta=hg19.fa --genes=hg19-filtered-ensembl.gtf

The Cell Ranger mm10 reference was generated similarly using filtered ENSEMBL GTF and UCSC FASTA files.