Long Ranger 2.2, printed on 09/11/2024
Analysis software for 10x Genomics linked read products is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.
Long Ranger algorithms are tuned and optimized for human haplotype phasing and structural variant calling, and 10x Genomics provides pre-built reference packages for use with the pipeline. The pre-built references have the following characteristics:
All our pre-built reference packages include ENSEMBL gene annotations. Use of the pre-built references is strongly recommended unless you have specific requirements that match one of the compatible use cases below.
Long Ranger supports user-generated references that meet the following criteria:
Example scenarios for user-generated references:
There are 3 steps to construct a Long Ranger-compatible reference.
To create a reference, run the longranger mkref command on your FASTA file. The contigs in your FASTA must meet the compatibility requirements above.
$ longranger mkref hsapiens-asm19.fasta
... indexing may take over an hour ...
$ ls refdata-hsapiens-asm19
fasta/ genes/ genome regions/ snps/
This utility copies your FASTA, indexes it in several formats, and outputs a folder named refdata-<fasta_name>. Note: to use GATK with your reference, you will need to create a genome index .dict file, which GATK requires. You can use Picard or GATK4 to create this file. The longranger mkref tool will instruct you on how to create it.
See the Optional Reference Files section below for additional files that you should consider including in a custom reference.
If you have followed the steps above correctly, your reference folder should now contain the following files:
$ tree refdata-hsapiens-asm19
refdata-hsapiens-asm19/
├── fasta
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.flat
│   ├── genome.fa.gdx
│   ├── genome.fa.pac
│   └── genome.fa.sa
├── genes
├── genome
├── regions
└── snps

4 directories, 13 files
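The listing above can be checked mechanically. The following is a minimal sketch, not part of Long Ranger: `check_reference` is a hypothetical helper name, and the file list is copied from the tree output above, so adjust it if your mkref output differs.

```shell
# check_reference REF_DIR
# Hypothetical helper (not a longranger command): reports each FASTA index
# file from the listing above that is missing under REF_DIR/fasta.
# Returns 0 when all nine files exist, 1 otherwise.
check_reference() {
    ref_dir=$1
    rc=0
    for ext in fa fa.amb fa.ann fa.bwt fa.fai fa.flat fa.gdx fa.pac fa.sa; do
        if [ ! -e "$ref_dir/fasta/genome.$ext" ]; then
            echo "missing: fasta/genome.$ext" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, after defining the function, `check_reference refdata-hsapiens-asm19` prints nothing and returns success when every file is present.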
To run Long Ranger with your new reference, set the --reference argument of longranger to your new reference:
$ longranger wgs --reference=/path/to/refdata-hsapiens-asm19 ...
A number of extra reference files are recognized by Long Ranger and can be used to customize the behavior of the pipeline. Refer to this documentation and the files in the 10x-supplied references for details on how to supply these files for your custom reference.
At this point, the reference folder created by longranger mkref is usable by Long Ranger, but it is strongly recommended that you also include a region filter for structural variant calling.
For custom references that are based on hg19 or GRCh38, we provide pre-built filter files that you can simply copy into your reference. Follow the instructions below, depending on the naming convention of your reference:
hg19 Convention ("chr1")
$ cd refdata-hsapiens-asm19
$ cd regions
$ wget https://cf.10xgenomics.com/supp/genome/hg19/sv_blacklist.bed
$ wget https://cf.10xgenomics.com/supp/genome/hg19/segdups.bedpe
b37 Convention ("1")
$ cd refdata-hsapiens-asm19
$ cd regions
$ wget https://cf.10xgenomics.com/supp/genome/b37/sv_blacklist.bed
$ wget https://cf.10xgenomics.com/supp/genome/b37/segdups.bedpe
GRCh38
$ cd refdata-hsapiens-GRCh38
$ cd regions
$ wget https://cf.10xgenomics.com/supp/genome/GRCh38/sv_blacklist.bed
$ wget https://cf.10xgenomics.com/supp/genome/GRCh38/segdups.bedpe
The sv_blacklist.bed file should be placed in refdata-folder/regions/sv_blacklist.bed, where refdata-folder is the reference folder created by longranger mkref. The segdups.bedpe file should be placed in refdata-folder/regions/segdups.bedpe.
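Because the filter files come in different contig-naming conventions, it can be worth confirming that the file you copied uses the same names as your reference. The sketch below is an illustration, not a Long Ranger feature: `unknown_bed_contigs` is a hypothetical helper that compares the first column of a BED file against the first column of the fasta/genome.fa.fai index. It handles plain BED only; segdups.bedpe names a second contig in a later column and would need a variant of the same check.

```shell
# unknown_bed_contigs BED FAI
# Hypothetical helper: prints each contig name used in BED (first column)
# that does not appear in the first column of FAI (the genome.fa.fai index).
# Prints nothing when every name matches.
unknown_bed_contigs() {
    bed=$1
    fai=$2
    awk '
        NR == FNR { known[$1] = 1; next }           # first file: remember reference contigs
        /^#/ || NF == 0 { next }                    # second file: skip comments and blank lines
        !($1 in known) && !seen[$1]++ { print $1 }  # report each unknown name once
    ' "$fai" "$bed"
}
```

Run from the reference folder, `unknown_bed_contigs regions/sv_blacklist.bed fasta/genome.fa.fai` producing any output suggests that the wrong convention was downloaded.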
For all other references, follow these instructions to create custom filter files.
To enable the display of the genes and exons tracks in the Loupe genome browser, download our gene annotations file into your reference. The annotation source can be found at ENSEMBL. This file will work regardless of the naming convention of your reference.
$ cd refdata-hsapiens-asm19
$ cd genes
$ wget https://cf.10xgenomics.com/supp/genome/gene_annotations.gtf.gz
This step is optional, but if you omit this file, you will not be able to search by gene name in Loupe, or see the genes and exons tracks in the Loupe Haplotype view. Loupe will accept any GTF subject to the following requirements:
The gene_annotations.gtf.gz file should be placed in refdata-folder/genes/gene_annotations.gtf.gz, where refdata-folder is the reference folder created by longranger mkref.
To disable variant calling, phasing, and SV calling on non-standard contigs (e.g. unplaced or alternate contigs), you can supply refdata-folder/fasta/primary_contigs.txt with a newline-separated list of the 'primary' contigs in the assembly. If this file is supplied, then variant calling, phasing, and SV calling will only be performed on the primary contigs.
If you are creating a reference from an assembly with a large number of small contigs, you can concatenate the smallest assembly contigs into a single reference entry so that your reference has at most 500 FASTA entries. We highly recommend omitting the concatenated entry from primary_contigs.txt. Analyzing this entry for SVs can cause Long Ranger to run extremely slowly, and is likely to generate many spurious SV calls.
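One way to build primary_contigs.txt is to filter the contig names already recorded in the FASTA index. This is a sketch under assumptions: `write_primary_contigs` is a hypothetical helper, and the pattern assumes hg19-style names with the numbered chromosomes plus chrX and chrY treated as primary. Which contigs count as primary is your decision, so adjust the pattern for your assembly.

```shell
# write_primary_contigs REF_DIR
# Hypothetical helper: keeps numbered chromosomes (chr1, chr2, ...), chrX and
# chrY from the first column of the FASTA index and writes them, one per
# line, to REF_DIR/fasta/primary_contigs.txt.
# The pattern assumes hg19-style names; adjust it for other conventions.
write_primary_contigs() {
    ref_dir=$1
    cut -f1 "$ref_dir/fasta/genome.fa.fai" \
        | grep -E '^chr([0-9]+|X|Y)$' \
        > "$ref_dir/fasta/primary_contigs.txt"
}
```

Unplaced and alternate contigs, as well as any concatenated entry with a different name, fall outside the pattern and are therefore left out of the list.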
Long Ranger can automatically determine the sex of a sample by comparing the coverage on a male-specific chromosome to the coverage on an autosomal chromosome. The file fasta/sex_chromosomes.tsv is used to indicate which chromosomes to use for this purpose. Create a two-line, tab-delimited file with the following format, indicating the names of the male-specific and autosomal chromosomes to use for sex determination:
male	chrY
autosomal	chr1
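Tabs are easy to lose when typing this file by hand, so writing it with printf is a safer option. The sketch below assumes a hypothetical helper name, `write_sex_chromosomes`; the chromosome names are passed as arguments so that they can match your reference's naming convention.

```shell
# write_sex_chromosomes REF_DIR MALE_CONTIG AUTOSOMAL_CONTIG
# Hypothetical helper: writes the two-line, tab-delimited file described
# above to REF_DIR/fasta/sex_chromosomes.tsv.
write_sex_chromosomes() {
    ref_dir=$1
    male=$2
    autosomal=$3
    # \t emits a literal tab between the label and the contig name
    printf 'male\t%s\nautosomal\t%s\n' "$male" "$autosomal" \
        > "$ref_dir/fasta/sex_chromosomes.tsv"
}
```

For a reference that uses the hg19 convention, this would be called as `write_sex_chromosomes refdata-hsapiens-asm19 chrY chr1`.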