10x Genomics
Chromium Single Cell Immune Profiling

Cell Ranger 6.0, printed on 03/11/2025

Reference Support

Cell Ranger provides prebuilt human and mouse reference packages for use with the pipeline, available for download. The human and mouse reference packages are based on the T cell receptor (TRA, TRB) and B cell immunoglobulin (IGH, IGL, IGK) gene annotations in Ensembl version 94. The references also include multiple corrections to various V, D, J, and C genes. These corrections are based on empirical observations and fix clear errors such as frameshifts, leader peptide truncations, and nucleotides that are never observed in rearrangements. The changes are documented in the release notes of each version of Cell Ranger. See also Prebuilt References for more information about how these references were created.

If you would like to use your own genome FASTA or gene GTF annotations, Cell Ranger supports the use of customer-generated Ensembl-based references. Cell Ranger also includes support for generating a V(D)J reference from the IMGT database.

There are two ways to generate a V(D)J reference:

Making a Genome-based Reference Package (e.g., using Ensembl)
Making a V(D)J Segment-based Reference Package (e.g., using IMGT)

Making a Genome-based Reference Package

The cellranger mkvdjref tool can be used to generate a custom reference package from a genome sequence FASTA file and a gene annotation GTF file.

$ cellranger mkvdjref --genome=my_vdj_ref \
                      --fasta=GRCh38_ensembl.fasta \
                      --genes=GRCh38_ensembl.gtf

A Cell Ranger V(D)J reference consists of germline gene segment sequences. It assumes that these sequences are contained within a genome reference FASTA, and that an Ensembl-formatted gene annotation GTF points to the relevant gene segments.

Input Genome FASTA File

cellranger mkvdjref expects a FASTA file containing genomic reference sequences whose names are consistent with the names used in the GTF file.
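
The naming requirement above can be checked before running cellranger mkvdjref. The following is a minimal sketch, not part of Cell Ranger: the helper names (`fasta_names`, `gtf_contigs`, `missing_contigs`) are hypothetical, and it assumes that a FASTA sequence name is the first whitespace-delimited token after `>`.

```python
import io

def fasta_names(handle):
    """Collect sequence names (first token after '>') from a FASTA stream."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names

def gtf_contigs(handle):
    """Collect the column-1 values of a GTF stream, skipping comments and blank lines."""
    contigs = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        contigs.add(line.split("\t", 1)[0])
    return contigs

def missing_contigs(fasta_handle, gtf_handle):
    """Return GTF contig names that have no matching FASTA record, sorted."""
    return sorted(gtf_contigs(gtf_handle) - fasta_names(fasta_handle))

# Tiny in-memory example: the GTF uses "chr7" but the FASTA names the contig "7".
fasta = io.StringIO(">14 dna:chromosome\nACGT\n>7 dna:chromosome\nACGT\n")
gtf = io.StringIO(
    '#!genome-build GRCh38\n'
    '14\thavana\tCDS\t1\t4\t.\t+\t0\tgene_name "TRAV1-1";\n'
    'chr7\thavana\tCDS\t1\t4\t.\t+\t0\tgene_name "TRBV1";\n'
)
print(missing_contigs(fasta, gtf))
```

With real files, the same functions can be called on handles returned by `open(...)`. An empty result means that every contig named in the GTF is present in the FASTA.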

Input Gene GTF File

Cell Ranger V(D)J expects a GTF file in an Ensembl-like format that contains information about V(D)J gene segments.

GTF Columns

Column	Name	Description
1	Chromosome	Must refer to a chromosome/contig in the genome fasta.
2	Source	Unused.
3	Feature	Cell Ranger `vdj` only uses rows where this column is equal to one of `CDS` or `five_prime_utr`.
4	Start	Start position on the reference (1-based inclusive).
5	End	End position on the reference (1-based inclusive).
6	Score	Unused.
7	Strand	Strandedness of this feature on the reference: `+` or `-`.
8	Frame	Unused.
9	Attributes	A semicolon-delimited list of key-value pairs of the form `key "value"`. The attribute keys used by Cell Ranger V(D)J are detailed below.

GTF Attributes

GTF Attribute	Description
transcript_id	Becomes the `record_id` in the Cell Ranger V(D)J reference entry format.
transcript_biotype	The value is used to infer the V(D)J segment type. Either `transcript_biotype` or `gene_biotype` must be a value in the "Accepted Biotypes" list below. If `transcript_biotype` is not on the accepted list, then `gene_biotype` is used.
gene_biotype	See `transcript_biotype`.
gene_name	Must be specified. Becomes the `gene_name` in the Cell Ranger V(D)J reference entry format.

Accepted Biotypes

TR_C_gene
TR_D_gene
TR_J_gene
TR_V_gene
IG_C_gene
IG_D_gene
IG_J_gene
IG_V_gene
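
The fallback rule for `transcript_biotype` and `gene_biotype` can be sketched as a small lookup. This is one reading of the rule described in the attributes table, not Cell Ranger's actual implementation; the function name `segment_biotype` is hypothetical.

```python
ACCEPTED_BIOTYPES = {
    "TR_C_gene", "TR_D_gene", "TR_J_gene", "TR_V_gene",
    "IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_V_gene",
}

def segment_biotype(attributes):
    """Return the accepted biotype for a GTF row, or None if neither key qualifies.

    transcript_biotype is checked first; gene_biotype is the fallback.
    """
    for key in ("transcript_biotype", "gene_biotype"):
        value = attributes.get(key)
        if value in ACCEPTED_BIOTYPES:
            return value
    return None

# transcript_biotype is not on the accepted list, so gene_biotype is used.
print(segment_biotype({"transcript_biotype": "protein_coding",
                       "gene_biotype": "TR_V_gene"}))
# Neither attribute is on the accepted list.
print(segment_biotype({"gene_biotype": "lncRNA"}))
```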

Example Minimal GTF Row Used by Cell Ranger V(D)J

14      havana  CDS     21621904        21621946        .       +       0       transcript_id "ENST00000542354"; gene_name "TRAV1-1"; transcript_biotype "TR_V_gene";
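
To make the column and attribute layout concrete, the row above can be split into its nine columns and its attribute pairs. This is an illustrative sketch with hypothetical names (`parse_gtf_row`, `USED_FEATURES`). GTF files are tab-delimited, but the sketch splits on any run of whitespace for the first eight columns so that the row as displayed here also parses; that leniency is an assumption of the sketch.

```python
import re

USED_FEATURES = {"CDS", "five_prime_utr"}
# Matches attribute pairs of the form: key "value"
ATTRIBUTE_RE = re.compile(r'(\w+)\s+"([^"]*)"')

def parse_gtf_row(line):
    """Split one GTF row into the columns and attributes described above."""
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields
    return {
        "chromosome": chrom,
        "feature": feature,
        "start": int(start),   # 1-based inclusive
        "end": int(end),       # 1-based inclusive
        "strand": strand,
        "attributes": dict(ATTRIBUTE_RE.findall(attrs)),
    }

row = parse_gtf_row(
    '14      havana  CDS     21621904        21621946        .       +       0       '
    'transcript_id "ENST00000542354"; gene_name "TRAV1-1"; transcript_biotype "TR_V_gene";'
)
print(row["feature"] in USED_FEATURES)        # whether the row is used at all
print(row["attributes"]["gene_name"])
print(row["end"] - row["start"] + 1)          # feature length with inclusive coordinates
```

Because both coordinates are 1-based and inclusive, the feature length is `end - start + 1`.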

Reference Package Format

cellranger mkvdjref creates a directory whose name is specified by the --genome argument.

$ tree my_vdj_ref
my_vdj_ref
├── fasta
│   └── regions.fa
└── reference.json
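
A quick way to confirm that a reference directory has the layout shown above is to check for the two files. This is a minimal sketch; the helper name `missing_reference_files` is hypothetical, and the check only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path
import tempfile

# Relative paths taken from the tree listing above.
EXPECTED_FILES = ("fasta/regions.fa", "reference.json")

def missing_reference_files(ref_dir):
    """Return the expected files that are absent from a V(D)J reference directory."""
    root = Path(ref_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

# Demonstration on a throwaway directory that lacks fasta/regions.fa.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "my_vdj_ref"
    (ref / "fasta").mkdir(parents=True)
    (ref / "reference.json").write_text("{}")
    print(missing_reference_files(ref))
```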

V(D)J Segment FASTA Format

This is a FASTA file where the description line contains V(D)J-specific metadata.

>id|display_name record_id|gene_name|region_type|chain_type|chain|isotype|allele_name
SEQUENCE

Field	Description
id	Unique integer ID for this feature.
display_name	This is used when displaying the segment in, e.g., Loupe V(D)J Browser.
record_id	Describes the accession ID of the source molecule. Unused.
gene_name	The name of the V(D)J gene, e.g. TRBV2-1.
region_type	The only used values are `L-REGION+V-REGION`, `D-REGION`, `J-REGION`, and `C-REGION`.
chain_type	Specifies whether this is a T- or B- cell receptor chain. The only used values are TR and IG.
chain	The specific receptor chain, e.g. TRA or IGH.
isotype	Specifies the class of heavy chain constant region; set to `None` if not applicable.
allele_name	The identifier for the allele, e.g. 01 for TRBV2-1*01, or `None` if no allele is to be specified.

Examples

>1|TRAV1*01 AF259072|TRAV1|L-REGION+V-REGION|TR|TRA|None|01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>979|IGHA*01 J00475|IGHA|C-REGION|IG|IGH|A|01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
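
The description-line format can be read back into its named fields as follows. This is a sketch under two assumptions: `display_name` contains no spaces, so the first space separates it from `record_id`, and the literal string `None` marks an unset isotype or allele. The names `SegmentHeader` and `parse_segment_header` are hypothetical.

```python
from typing import NamedTuple, Optional

class SegmentHeader(NamedTuple):
    id: int
    display_name: str
    record_id: str
    gene_name: str
    region_type: str
    chain_type: str
    chain: str
    isotype: Optional[str]
    allele_name: Optional[str]

def _none_if_unset(value):
    """Map the literal string 'None' to Python None."""
    return None if value == "None" else value

def parse_segment_header(line):
    """Parse one '>' description line of a V(D)J segment FASTA file."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA description line")
    first, _, rest = line[1:].strip().partition(" ")
    seg_id, _, display_name = first.partition("|")
    fields = rest.split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields after the space, found {len(fields)}")
    record_id, gene_name, region_type, chain_type, chain, isotype, allele = fields
    return SegmentHeader(int(seg_id), display_name, record_id, gene_name,
                         region_type, chain_type, chain,
                         _none_if_unset(isotype), _none_if_unset(allele))

header = parse_segment_header(">979|IGHA*01 J00475|IGHA|C-REGION|IG|IGH|A|01")
print(header.gene_name, header.region_type, header.isotype)
```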

You can directly generate a FASTA file in the segment format and create a V(D)J reference by passing this file to cellranger mkvdjref with the --seqs argument, for example:

$ cellranger mkvdjref --genome=my_vdj_ref \
                      --seqs=custom_segments.fasta
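
Before passing a hand-written segment FASTA file to --seqs, it can help to check the description lines against the field table above. The sketch below is not a Cell Ranger tool, and `check_segment_fasta` is a hypothetical name. It checks only three things stated in this document: IDs are unique integers, `region_type` is one of the four used values, and `chain_type` is TR or IG.

```python
import io

REGION_TYPES = {"L-REGION+V-REGION", "D-REGION", "J-REGION", "C-REGION"}
CHAIN_TYPES = {"TR", "IG"}

def check_segment_fasta(handle):
    """Return a list of human-readable problems found in the description lines."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(handle, start=1):
        if not line.startswith(">"):
            continue  # sequence lines are not checked in this sketch
        first, _, rest = line[1:].strip().partition(" ")
        seg_id = first.split("|", 1)[0]
        fields = rest.split("|")
        if len(fields) != 7:
            problems.append(f"line {lineno}: expected 7 fields after the space, found {len(fields)}")
            continue
        region_type, chain_type = fields[2], fields[3]
        if not seg_id.isdigit():
            problems.append(f"line {lineno}: id '{seg_id}' is not an integer")
        elif seg_id in seen_ids:
            problems.append(f"line {lineno}: duplicate id {seg_id}")
        seen_ids.add(seg_id)
        if region_type not in REGION_TYPES:
            problems.append(f"line {lineno}: unexpected region_type '{region_type}'")
        if chain_type not in CHAIN_TYPES:
            problems.append(f"line {lineno}: unexpected chain_type '{chain_type}'")
    return problems

# The second record reuses id 1 and uses a region_type outside the four listed values.
example = io.StringIO(
    ">1|TRAV1*01 AF259072|TRAV1|L-REGION+V-REGION|TR|TRA|None|01\nACGT\n"
    ">1|TRAV2*01 AF259073|TRAV2|V-REGION|TR|TRA|None|01\nACGT\n"
)
for problem in check_segment_fasta(example):
    print(problem)
```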

Using IMGT

Cell Ranger comes with a script called fetch-imgt which downloads the relevant sequences from IMGT and generates a V(D)J segment FASTA file. This is then used to generate a V(D)J reference package.

This example generates a mouse V(D)J reference based on IMGT.

# Source the Cell Ranger 6.0.2 environment for your shell (bash or csh)
# (for bash shell)
source path/to/cellranger-6.0.2/sourceme.bash
# OR (for C shell)
source path/to/cellranger-6.0.2/sourceme.csh

echo "Check pip..."
python -m ensurepip

echo "version checks"
python -m pip --version
python --version

echo "Installing fetch-imgt packages"
python -m pip install lxml
python -m pip install 'biopython==1.74'
python -m pip install requests

# Using a script that comes with Cell Ranger, get data from IMGT and create a FASTA suitable for use by mkvdjref
# The option --species is the name of the species for which the data is to be downloaded.
# The option --genome provides the prefix used to name the 2 output files. Only the file with suffix -mkvdjref-input.fasta is used as input to the mkvdjref utility.
path/to/cellranger-6.0.2/cellranger-cs/6.0.2/lib/bin/fetch-imgt --genome vdj_IMGT_mouse --species "Mus musculus"
 
# Build the Cell Ranger reference. You could also add Cell Ranger to your PATH to avoid specifying the full path to cellranger.
# The option --genome is a single identifier with no special symbols aside from hyphen or underscore. The reference will be placed in a directory created with that name.
# The option --seqs is the mkvdjref-input.fasta file generated by the fetch-imgt command.
path/to/cellranger-6.0.2/cellranger mkvdjref --genome=vdj_IMGT_mouse --seqs=vdj_IMGT_mouse-mkvdjref-input.fasta
