
# Reference Support

Cell Ranger provides pre-built hg19, mm10 and ercc92 reference packages for use with the pipeline. If you would like to use your own genome FASTA or gene GTF annotations, Cell Ranger also supports customer-generated references.

## Compatible Use Cases

Cell Ranger supports customer-generated references that meet the following conditions:

• The annotation contains only a small number of overlapping genes. Otherwise, the pipeline will likely detect very few molecules because reads align non-uniquely to multiple genes.
• The FASTA and GTF files are compatible with the open source RNA-seq aligner STAR.

## Making a Reference Package

There are two steps to construct a Cell Ranger-compatible reference.

### 1. Filter GTF

GTF files downloaded from sites like ENSEMBL and UCSC often contain many transcripts and genes that need to be filtered out of your final annotation. It is usually helpful to filter genes based on the key-value pairs in the GTF attribute column. For example, to keep only protein-coding genes, run the following command on your GTF.

...
$ ls hg19
fasta/ genes/ pickle/ reference.json star/

This utility copies your FASTA and GTF, indexes them in several formats, and outputs a folder named <genome>.

#### Multiple Species

To create a reference for multiple species, run the cellranger mkref command with your FASTA and GTF files as in the single-species case above. However, the order of the --genome, --fasta and --genes options is important: the first --genome option listed corresponds to the first --fasta and --genes options listed, and so on for each additional genome.

$ cellranger mkref --genome=hg19 --fasta=hg19.fa --genes=hg19-filtered-ensembl.gtf \
--genome=mm10 --fasta=mm10.fa --genes=mm10-filtered-ensembl.gtf
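Because the options are matched by position, it can help to build the command from one entry per genome so that the three options for each species always stay together. The following is only a sketch: `build_mkref_cmd` and its `name:fasta:gtf` argument format are hypothetical conveniences, not part of Cell Ranger, and the function only prints the command rather than running it.

```shell
# Hypothetical helper (not part of Cell Ranger): prints a cellranger mkref
# command with --genome, --fasta and --genes grouped per genome, in order.
# Each argument has the assumed form "name:fasta:gtf".
build_mkref_cmd() {
    cmd="cellranger mkref"
    for spec in "$@"; do
        name=${spec%%:*}      # text before the first ':'
        rest=${spec#*:}       # text after the first ':'
        fasta=${rest%%:*}
        gtf=${rest#*:}
        cmd="$cmd --genome=$name --fasta=$fasta --genes=$gtf"
    done
    printf '%s\n' "$cmd"
}

# Print the two-species command from the example above.
build_mkref_cmd \
    "hg19:hg19.fa:hg19-filtered-ensembl.gtf" \
    "mm10:mm10.fa:mm10-filtered-ensembl.gtf"
```

Review the printed command before running it; the helper does not check that the files exist.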
...
$ ls hg19_and_mm10
fasta/ genes/ pickle/ reference.json star/

#### System Requirements

Indexing a typical human 3 Gb FASTA file often takes up to 8 core hours and requires 32 GB of memory. We recommend running the cellranger mkref command with --nthreads equal to the number of cores available on your system. You can also specify the amount of memory (in GB) that cellranger should use during alignment via STAR; the default is 16 GB. Note that the amount of memory your reference uses during alignment must be greater than the number of gigabases in the input FASTA file.

## Generating the Cell Ranger reference package

The references in the Cell Ranger reference package were generated with the steps described above. When creating the Cell Ranger hg19 reference, the GTF file downloaded from ENSEMBL was filtered using the following cellranger mkgtf command.

$ cellranger mkgtf hg19-ensembl.gtf hg19-filtered-ensembl.gtf \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lincRNA \
--attribute=gene_biotype:antisense
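To make the effect of these three --attribute options concrete, the sketch below keeps GTF records whose gene_biotype is one of the listed values. It is an approximation written with awk for illustration only; it is not how cellranger mkgtf is implemented, and it assumes ENSEMBL-style attributes of the form `gene_biotype "protein_coding";` in the ninth column. The function name `filter_biotypes` is hypothetical.

```shell
# Illustrative only: approximate the attribute filter with awk.
# Reads a GTF on stdin and writes the filtered GTF to stdout.
filter_biotypes() {
    awk -F '\t' '
        # Pass header/comment lines through unchanged.
        /^#/ { print; next }
        # Keep records whose attribute column names one of the three biotypes.
        $9 ~ /gene_biotype "(protein_coding|lincRNA|antisense)"/ { print }
    '
}

# Example usage (file names taken from the command above):
# filter_biotypes < hg19-ensembl.gtf > hg19-filtered-ensembl.gtf
```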


Additionally, "chr" was prepended to the chromosome entries in the GTF so that they match the chromosome names in the UCSC FASTA.
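One way to prepend "chr" to the chromosome column is sketched below. This is an assumption about how such a step could be done, not the exact procedure used for the released reference. It assumes a tab-separated GTF whose first column is the chromosome name, and it does not reconcile other naming differences between sources (for example, ENSEMBL's MT versus UCSC's chrM). The function name `add_chr_prefix` is hypothetical.

```shell
# Illustrative only: prepend "chr" to the first column of every GTF record.
# Reads a GTF on stdin and writes the modified GTF to stdout.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Leave header/comment lines untouched.
         /^#/ { print; next }
         { $1 = "chr" $1; print }'
}

# Example usage (file names are placeholders):
# add_chr_prefix < filtered.gtf > filtered.chr.gtf
```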

The hg19 FASTA was then downloaded from UCSC. After alternate haplotype chromosomes were removed (any chromosome whose name contains hap, e.g. chr4_ctg9_hap1), running cellranger mkref as described above produced the Cell Ranger hg19 reference.

$ cellranger mkref --genome=hg19 --fasta=hg19.fa --genes=hg19-filtered-ensembl.gtf
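The haplotype-removal step mentioned above could be done along the following lines before running mkref. This is a sketch under the assumption that every alternate haplotype sequence has "hap" in its FASTA header line, as in chr4_ctg9_hap1; it is not necessarily the tool used to build the released reference, and the function name `drop_hap_contigs` is hypothetical.

```shell
# Illustrative only: drop every FASTA record whose header contains "hap".
# Reads a FASTA on stdin and writes the filtered FASTA to stdout.
drop_hap_contigs() {
    awk '
        # On each header line, decide whether to keep the record that follows.
        /^>/ { keep = ($0 !~ /hap/) }
        # Print header and sequence lines only for kept records.
        keep { print }
    '
}

# Example usage (input file name is a placeholder):
# drop_hap_contigs < hg19.ucsc.fa > hg19.fa
```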


The Cell Ranger mm10 reference was generated similarly using filtered ENSEMBL GTF and UCSC FASTA files.