
10x Genomics
Chromium Genome & Exome

Reference Support

Long Ranger algorithms are tuned and optimized for human haplotype phasing and structural variant calling, and 10x Genomics provides pre-built reference packages for use with the pipeline. The pre-built references have the following characteristics:

All our pre-built reference packages include ENSEMBL gene annotations. Use of the pre-built references is strongly recommended unless you have specific requirements that match one of the compatible use cases below.

Compatible Use Cases

Long Ranger supports user-generated references that meet the following criteria:

Example scenarios for user-generated references:

Making a Reference Package

There are 3 steps to construct a Long Ranger-compatible reference.

1. Index your FASTA

To create a reference, run the longranger mkref command on your FASTA file. The contigs in your FASTA must meet the compatibility requirements above.

$ longranger mkref hsapiens-asm19.fasta
... indexing may take over an hour ...
$ ls refdata-hsapiens-asm19
fasta/  genes/  genome  regions/  snps/

This utility copies your FASTA, indexes it in several formats, and outputs a folder named refdata-<fasta_name>. Note: to use GATK with your reference, you will also need to create the sequence dictionary (.dict) file that GATK requires. You can use Picard or GATK4 to create this file; the longranger mkref tool will tell you how to create it.
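Before starting a long indexing run, it can help to check the FASTA first. The sketch below counts the entries in a FASTA and compares the count against a limit. The default of 500 is taken from the "at most 500 FASTA entries" guidance in the Primary Contigs File section. The function names count_fasta_entries and check_contig_limit are hypothetical helpers, not part of Long Ranger.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of longranger.

# Print the number of FASTA entries (every entry starts with a '>' header line).
count_fasta_entries() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^>' "$1" || true
}

# Succeed if the FASTA has at most MAX entries (default 500, an assumption
# based on the limit mentioned in the Primary Contigs File section).
check_contig_limit() {
    local fasta=$1 max=${2:-500} n
    if [ ! -r "$fasta" ]; then
        echo "cannot read $fasta" >&2
        return 2
    fi
    n=$(count_fasta_entries "$fasta")
    if [ "$n" -gt "$max" ]; then
        echo "$fasta has $n entries; limit is $max" >&2
        return 1
    fi
    echo "$fasta has $n entries (limit $max)"
}
```

After sourcing the file that holds these functions, check_contig_limit hsapiens-asm19.fasta reports the entry count and returns a non-zero status if the limit is exceeded.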

2. Add Optional Reference Files

See the Optional Reference Files section below for additional files that you should consider including in a custom reference.

3. Confirm Contents

If you have followed the steps above correctly, your reference folder should now contain the following files:

$ tree refdata-hsapiens-asm19
refdata-hsapiens-asm19/
├── fasta
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.flat
│   ├── genome.fa.gdx
│   ├── genome.fa.pac
│   └── genome.fa.sa
├── genes
│   └── gene_annotations.gtf.gz
├── genome
├── regions
│   ├── segdups.bedpe
│   └── sv_blacklist.bed
└── snps
4 directories, 13 files
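The same check can be scripted instead of reading the tree output by eye. The sketch below looks for the index files and folders named in this document and reports anything missing. The function name check_reference is hypothetical, and treating the three downloadable files as optional warnings rather than errors is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of longranger.

# Report missing pieces of a reference folder. Returns 0 only if every
# required item is present; optional files only produce a notice.
check_reference() {
    local ref=$1 status=0 name
    # Index files that appear under fasta/ in the listing above.
    for name in genome.fa genome.fa.amb genome.fa.ann genome.fa.bwt \
                genome.fa.fai genome.fa.flat genome.fa.gdx genome.fa.pac \
                genome.fa.sa; do
        if [ ! -f "$ref/fasta/$name" ]; then
            echo "MISSING  fasta/$name"
            status=1
        fi
    done
    for name in genes regions snps; do
        if [ ! -d "$ref/$name" ]; then
            echo "MISSING  $name/"
            status=1
        fi
    done
    if [ ! -e "$ref/genome" ]; then
        echo "MISSING  genome"
        status=1
    fi
    # Optional files described in the Optional Reference Files section.
    for name in regions/sv_blacklist.bed regions/segdups.bedpe \
                genes/gene_annotations.gtf.gz; do
        [ -f "$ref/$name" ] || echo "optional, not present: $name"
    done
    return "$status"
}
```

For example, check_reference refdata-hsapiens-asm19 prints nothing beyond optional-file notices when the folder is complete.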

Running Long Ranger

To run Long Ranger with your new reference, set the --reference argument of longranger to your new reference:

$ longranger wgs --reference=/path/to/refdata-hsapiens-asm19 ...

Optional Reference Files

A number of extra reference files are recognized by Long Ranger and can be used to customize some behavior of the pipeline. Refer to this documentation and the files in the 10x-supplied references for details on how to supply these files for your custom reference.

1. SV Calling Filter File

At this point, the reference folder created by longranger mkref is usable by Long Ranger, but it is strongly recommended that you also include a region filter for structural variant calling.

The filter file is used by the SV algorithm to reduce false positives due to gaps in the reference, known or putative assembly issues such as unplaced contigs, and highly polymorphic regions.

For custom references that are based on hg19, b37, or GRCh38, we provide pre-built filter files that you can simply copy into your reference. Follow the instructions below, depending on the assembly and naming convention of your reference:

hg19 Convention ("chr1")

$ cd refdata-hsapiens-asm19
$ cd regions
$ wget http://cf.10xgenomics.com/supp/genome/hg19/sv_blacklist.bed
$ wget http://cf.10xgenomics.com/supp/genome/hg19/segdups.bedpe

b37 Convention ("1")

$ cd refdata-hsapiens-asm19
$ cd regions
$ wget http://cf.10xgenomics.com/supp/genome/b37/sv_blacklist.bed
$ wget http://cf.10xgenomics.com/supp/genome/b37/segdups.bedpe

GRCh38

$ cd refdata-hsapiens-GRCh38
$ cd regions
$ wget http://cf.10xgenomics.com/supp/genome/GRCh38/sv_blacklist.bed
$ wget http://cf.10xgenomics.com/supp/genome/GRCh38/segdups.bedpe

The sv_blacklist.bed file should be placed in refdata-folder/regions/sv_blacklist.bed, where refdata-folder is the reference folder created by longranger mkref. The segdups.bedpe file should be placed in refdata-folder/regions/segdups.bedpe.
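If you are unsure which naming convention your reference uses, the FASTA index can tell you. The sketch below reads the first contig name from genome.fa.fai and reports whether it follows the "chr1" style or the "1" style. The function name contig_naming is hypothetical. The check only looks at the first entry and does not identify which assembly the reference is based on.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of longranger.

# Print "chr-prefixed" or "unprefixed" based on the first contig name in a
# .fai index (the contig name is the first whitespace-separated column).
contig_naming() {
    awk 'NR == 1 {
        if ($1 ~ /^chr/) print "chr-prefixed"; else print "unprefixed"
        exit
    }' "$1"
}
```

For example, contig_naming refdata-hsapiens-asm19/fasta/genome.fa.fai prints chr-prefixed for a reference that names its first contig chr1.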

For all other references, follow these instructions to create custom filter files.

2. Genes/Exons File for Loupe

To enable the display of the genes and exons tracks in the Loupe genome browser, download our gene annotations file into your reference. The annotation source can be found at ENSEMBL. This file will work regardless of the naming convention of your reference.

$ cd refdata-hsapiens-asm19
$ cd genes
$ wget http://cf.10xgenomics.com/supp/genome/gene_annotations.gtf.gz

This step is optional, but if you omit this file, you will not be able to search by gene name in Loupe, or see the genes and exons tracks in the Loupe Haplotype view. Loupe will accept any GTF subject to the following requirements:

The gene_annotations.gtf.gz file should be placed in refdata-folder/genes/gene_annotations.gtf.gz, where refdata-folder is the reference folder created by longranger mkref.

3. Primary Contigs File

To disable variant calling, phasing, and SV calling on non-standard contigs (e.g. unplaced or alternate contigs), you can supply refdata-folder/fasta/primary_contigs.txt with a newline-separated list of the 'primary' contigs in the assembly. If this file is supplied, variant calling, phasing, and SV calling are performed only on the primary contigs.

If you are creating a reference from an assembly with a large number of small contigs, you can concatenate the smallest assembly contigs into a single reference entry so that your reference has at most 500 FASTA entries. We highly recommend omitting the concatenated entry from primary_contigs.txt: analyzing it for SVs can cause Long Ranger to run extremely slowly and is likely to generate many spurious SV calls.
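One way to draft primary_contigs.txt is to start from the FASTA index and keep only contigs above a length threshold, then edit the list by hand. The sketch below does this. The function name write_primary_contigs and the idea of selecting by minimum length are assumptions for illustration; Long Ranger only requires the newline-separated list itself, and a concatenated entry would still need to be removed manually if it is longer than the threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of longranger.

# Write fasta/primary_contigs.txt containing every contig whose length
# (column 2 of genome.fa.fai) is at least MIN_LEN bases.
write_primary_contigs() {
    local ref=$1 min_len=$2
    awk -F '\t' -v min="$min_len" '$2 + 0 >= min + 0 { print $1 }' \
        "$ref/fasta/genome.fa.fai" > "$ref/fasta/primary_contigs.txt"
}
```

For example, write_primary_contigs refdata-hsapiens-asm19 1000000 keeps contigs of at least one megabase; review the resulting file before running the pipeline.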

4. Sex Chromosome File

Long Ranger can automatically determine the sex of a sample by comparing the coverage on a male-specific chromosome to the coverage on an autosomal chromosome. The file fasta/sex_chromosomes.tsv is used to indicate which chromosomes to use for this purpose. Create a two-line, tab-delimited file with the following format, indicating the name of the male-specific and autosomal chromosomes to use for sex determination:

male	chrY
autosomal	chr1
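Because the file must be tab-delimited, writing it with printf avoids editors that silently convert tabs to spaces. The sketch below writes fasta/sex_chromosomes.tsv and first confirms that both contig names exist in the FASTA index. The function name write_sex_chromosomes is hypothetical, and the name check is an added safeguard rather than something Long Ranger requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of longranger.

# Write REF/fasta/sex_chromosomes.tsv with tab-separated fields.
# Fails without writing if either contig is absent from genome.fa.fai.
write_sex_chromosomes() {
    local ref=$1 male=$2 autosomal=$3 name
    local fai="$ref/fasta/genome.fa.fai"
    for name in "$male" "$autosomal"; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! cut -f1 "$fai" | grep -qxF -- "$name"; then
            echo "contig '$name' not found in $fai" >&2
            return 1
        fi
    done
    printf 'male\t%s\nautosomal\t%s\n' "$male" "$autosomal" \
        > "$ref/fasta/sex_chromosomes.tsv"
}
```

For example, write_sex_chromosomes refdata-hsapiens-asm19 chrY chr1 produces the two-line file shown above.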