# Reference Support

Long Ranger algorithms are tuned and optimized for human haplotype phasing and structural variant calling, and 10x Genomics provides pre-built reference packages for use with the pipeline. The pre-built references have the following characteristics:

• Human GRCh37 build in two variants:
  • hg19/UCSC-style chromosome naming convention ("chr1")
  • b37/1000 Genomes-style chromosome naming convention ("1")
• Human GRCh38 build
  • Our GRCh38 build contains decoy contigs, but not alternate haplotypes.

All our pre-built reference packages include ENSEMBL gene annotations. Use of the pre-built references is strongly recommended unless you have specific requirements that match one of the compatible use cases below.

## Compatible Use Cases

Long Ranger supports user-generated references that meet the following criteria:

• Diploid genome — The phasing algorithm assumes 2 haplotypes.
• 500 FASTA entries or fewer — if your assembly has more than 500 FASTA entries, concatenate smaller contigs together with 500 N's separating each original contig, until there are fewer than 500 FASTA entries total. Note: when creating a concatenated reference contig, you must create a primary_contigs.txt file, and omit the concatenated contig from primary_contigs.txt. See below for details.
• All contigs must be no more than 2^29-1 bp (536,870,911 bp, roughly 537 Mb) in length; this is a limitation of the BAM index file format.
• All contigs must have no colons or spaces in their names.
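Before running mkref, it can help to check a FASTA against the criteria above. The following is a minimal sketch, not a 10x-supplied tool: the function name `check_reference_fasta` is hypothetical, and it assumes the entire header line after `>` is the contig name, so a header with a description after a space is flagged.

```shell
# check_reference_fasta FASTA
# Hypothetical helper (not part of Long Ranger): prints each violation of the
# contig criteria above and returns non-zero if any were found.
check_reference_fasta() {
    local fasta="$1"
    awk -v max_entries=500 -v max_len=536870911 '
        # Report the contig that just ended if it exceeds 2^29-1 bp.
        function finish_contig() {
            if (entries > 0 && len > max_len) {
                printf "contig %s is %d bp (limit %d)\n", name, len, max_len
                bad = 1
            }
        }
        /^>/ {
            finish_contig()
            entries++
            name = substr($0, 2)   # whole header line, assumed to be the name
            len = 0
            if (name ~ /[: ]/) {
                printf "contig name \"%s\" contains a colon or space\n", name
                bad = 1
            }
            next
        }
        { len += length($0) }
        END {
            finish_contig()
            if (entries > max_entries) {
                printf "%d FASTA entries (limit %d)\n", entries, max_entries
                bad = 1
            }
            exit bad + 0
        }
    ' "$fasta"
}
```

For example, `check_reference_fasta hsapiens-asm19.fasta && longranger mkref hsapiens-asm19.fasta` would only start indexing if no violations were reported. The diploid-genome criterion cannot be checked from the FASTA alone.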

Example scenarios for user-generated references:

• Small numbers of additional contigs (e.g. decoy and viral sequences)
• Relatively complete assemblies of human and non-human genomes

## Making a Reference Package

There are 3 steps to construct a Long Ranger-compatible reference.

### 1. Run mkref

To create a reference, run the longranger mkref command on your FASTA file. The contigs in your FASTA must meet the compatibility requirements above.

    $ longranger mkref hsapiens-asm19.fasta
    ... indexing may take over an hour ...
    $ ls refdata-hsapiens-asm19
    fasta/  genes/  genome  regions/  snps/


This utility copies your FASTA, indexes it in several formats, and outputs a folder named refdata-<fasta_name>. Note: to use GATK with your reference, you will also need to create the sequence dictionary (.dict) file that GATK requires. You can use Picard or GATK4 to create this file; the longranger mkref tool will instruct you on how to create it.

### 2. Add Optional Reference Files

See the Optional Reference Files section below for additional files that you should consider including in a custom reference.

### 3. Confirm Contents

If you have followed the steps above correctly, your reference folder should now contain the following files:



## Optional Reference Files

A number of extra reference files are recognized by Long Ranger and can be used to customize some of the pipeline's behavior. Refer to this documentation and the files in the 10x-supplied references for details on how to supply these files for your custom reference.

### 1. SV Calling Filter File

At this point, the reference folder created by longranger mkref is usable by Long Ranger, but it is strongly recommended that you also include a region filter for structural variant calling.

The filter file is used by the SV algorithm to reduce false positives due to gaps in the reference, known or putative assembly issues such as unplaced contigs, and highly polymorphic regions.

For custom references that are based on hg19 or GRCh38, we provide pre-built filter files that you can simply copy into your reference. Follow the instructions below, depending on the naming convention of your reference:

hg19 Convention ("chr1")

    $ cd refdata-hsapiens-asm19
    $ cd regions
    $ wget http://cf.10xgenomics.com/supp/genome/hg19/sv_blacklist.bed
    $ wget http://cf.10xgenomics.com/supp/genome/hg19/segdups.bedpe


b37 Convention ("1")

    $ cd refdata-hsapiens-asm19
    $ cd regions
    $ wget http://cf.10xgenomics.com/supp/genome/b37/sv_blacklist.bed
    $ wget http://cf.10xgenomics.com/supp/genome/b37/segdups.bedpe


GRCh38

    $ cd refdata-hsapiens-GRCh38
    $ cd regions
    $ wget http://cf.10xgenomics.com/supp/genome/GRCh38/sv_blacklist.bed
    $ wget http://cf.10xgenomics.com/supp/genome/GRCh38/segdups.bedpe


The sv_blacklist.bed file should be placed in refdata-folder/regions/sv_blacklist.bed, where refdata-folder is the reference folder created by longranger mkref. The segdups.bedpe file should be placed in refdata-folder/regions/segdups.bedpe.

For all other references, follow these instructions to create custom filter files.

### 2. Genes/Exons File for Loupe

To enable the display of the genes and exons tracks in the Loupe genome browser, download our gene annotations file into your reference. The annotation source can be found at ENSEMBL. This file will work regardless of the naming convention of your reference.

    $ cd refdata-hsapiens-asm19
    $ cd genes
    $ wget http://cf.10xgenomics.com/supp/genome/gene_annotations.gtf.gz


This step is optional, but if you omit this file, you will not be able to search by gene name in Loupe, or see the genes and exons tracks in the Loupe Haplotype view. Loupe will accept any GTF subject to the following requirements:

• Loupe only considers records with a "feature" of "gene" or "exon"; other feature types will be ignored.
• Loupe requires all records to have a "gene_id" and "gene_name" tag.
• Exons must have the same gene_id tag as their respective gene.
• The file must be compressed in gzip format.

The gene_annotations.gtf.gz file should be placed in refdata-folder/genes/gene_annotations.gtf.gz, where refdata-folder is the reference folder created by longranger mkref.
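If you supply your own GTF instead of the 10x file, a quick pre-check against the requirements above may save a failed load in Loupe. This is a rough sketch under stated assumptions: the function name `check_loupe_gtf` is hypothetical, attributes are assumed to use the usual `key "value";` GTF syntax in column 9, and "same gene_id as their respective gene" is interpreted as "every exon's gene_id also appears on some gene record".

```shell
# check_loupe_gtf GTF_GZ
# Hypothetical helper (not part of Long Ranger or Loupe): returns non-zero if
# the file is not gzip-compressed or a gene/exon record breaks the rules above.
check_loupe_gtf() {
    local gtf_gz="$1"
    # The file must be gzip-compressed; gzip -t fails on anything else.
    gzip -t "$gtf_gz" 2>/dev/null || { echo "not a valid gzip file: $gtf_gz"; return 1; }
    gzip -cd "$gtf_gz" | awk -F '\t' '
        /^#/ { next }
        $3 != "gene" && $3 != "exon" { next }   # Loupe ignores other features
        {
            if ($9 !~ /gene_id "[^"]+"/ || $9 !~ /gene_name "[^"]+"/) {
                printf "line %d: missing gene_id or gene_name\n", NR
                bad = 1
                next
            }
            # Extract the gene_id value from the attribute column.
            id = $9
            sub(/.*gene_id "/, "", id)
            sub(/".*/, "", id)
            if ($3 == "gene") genes[id] = 1
            else exon_line[id] = NR
        }
        END {
            # Every exon gene_id should also appear on a gene record.
            for (id in exon_line)
                if (!(id in genes)) {
                    printf "line %d: exon gene_id %s has no gene record\n", exon_line[id], id
                    bad = 1
                }
            exit bad + 0
        }
    '
}
```

A call such as `check_loupe_gtf refdata-hsapiens-asm19/genes/gene_annotations.gtf.gz` prints nothing and returns zero when no problems are detected.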

### 3. Primary Contigs File

To disable variant calling, phasing, and SV calling on non-standard contigs (e.g. unplaced or alternate contigs), you can supply refdata-folder/fasta/primary_contigs.txt with a newline-separated list of the 'primary' contigs in the assembly. If this file is supplied, then variant calling, phasing, and SV calling will only be performed on the primary contigs.

If you are creating a reference from an assembly with a large number of small contigs, you can concatenate the smallest assembly contigs into a single reference entry so that your reference has at most 500 FASTA entries. We highly recommend omitting the concatenated contigs from primary_contigs.txt. Analyzing this entry for SVs can cause Long Ranger to run extremely slowly, and is likely to generate many spurious SV calls.
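One way to carry out the concatenation described above is sketched here. It is an illustration only, not a 10x-supplied tool: the function name `concat_small_contigs`, the length cutoff argument, and the merged entry name `concatenated_small_contigs` are all assumptions, and header lines are assumed to contain only the contig name. Contigs shorter than the cutoff are joined with 500 N's between them, and only the untouched contigs are written to the primary contigs list.

```shell
# concat_small_contigs IN_FASTA MIN_LEN OUT_FASTA PRIMARY_TXT
# Hypothetical helper: contigs shorter than MIN_LEN are merged into one entry,
# separated by 500 N's; all other contigs are copied unchanged and listed in
# PRIMARY_TXT. Assumes header lines hold only the contig name.
concat_small_contigs() {
    local in_fa="$1" min_len="$2" out_fa="$3" primary="$4"
    : > "$primary"
    awk -v min_len="$min_len" -v primary="$primary" \
        -v merged_name="concatenated_small_contigs" '
        BEGIN { for (i = 0; i < 500; i++) spacer = spacer "N" }
        # Pass 1: record the length of every contig.
        NR == FNR {
            if (/^>/) name = substr($0, 2)
            else len[name] += length($0)
            next
        }
        # Pass 2: copy large contigs through, buffer the small ones.
        /^>/ {
            name = substr($0, 2)
            small = (len[name] < min_len)
            if (small) {
                if (n_small++ > 0) merged = merged spacer
            } else {
                print
                print name >> primary
            }
            next
        }
        { if (small) merged = merged $0; else print }
        END {
            # The merged entry is written last, as a single sequence line.
            if (n_small > 0) { print ">" merged_name; print merged }
        }
    ' "$in_fa" "$in_fa" > "$out_fa"
}
```

The input file is read twice, so it must be a regular file rather than a pipe. Raise the cutoff until the output has 500 or fewer entries, run longranger mkref on the output FASTA, and then copy the primary contigs list to refdata-folder/fasta/primary_contigs.txt. The merged sequence is emitted on one long line; rewrap it if your other tools expect fixed-width FASTA lines.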

### 4. Sex Chromosome File

Long Ranger can automatically determine the sex of a sample by comparing the coverage on a male-specific chromosome to the coverage on an autosomal chromosome. The file fasta/sex_chromosomes.tsv is used to indicate which chromosomes to use for this purpose. Create a two-line, tab-delimited file with the following format, indicating the name of the male-specific and autosomal chromosomes to use for sex determination:

male chrY
autosomal chr1
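Because the two fields must be separated by a tab rather than spaces, it may be safer to generate the file than to type it by hand. A minimal sketch follows; the function name `write_sex_chromosomes` is hypothetical, the row labels are taken from the example above, and the fasta/ folder is assumed to already exist in the reference created by longranger mkref.

```shell
# write_sex_chromosomes REFDATA_FOLDER MALE_CONTIG AUTOSOMAL_CONTIG
# Hypothetical helper: writes the two-line, tab-delimited file described above
# into REFDATA_FOLDER/fasta/sex_chromosomes.tsv.
write_sex_chromosomes() {
    local refdata="$1" male="$2" autosomal="$3"
    printf 'male\t%s\nautosomal\t%s\n' "$male" "$autosomal" \
        > "$refdata/fasta/sex_chromosomes.tsv"
}
```

For an hg19-style reference this could be invoked as `write_sex_chromosomes refdata-hsapiens-asm19 chrY chr1`; for a b37-style reference the contig names would be `Y` and `1`.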