10x Genomics
Chromium Single Cell Immune Profiling

Cell Ranger 7.1, printed on 04/02/2025

V(D)J Annotations

Structure of V(D)J transcript
File format overview
Annotation files overview
Clonotype CSV file
Contig annotation CSV files
Contig annotation BED files
Contig annotation JSON files
Consensus annotation CSV files
AIRR rearrangements TSV file
Next steps

Structure of V(D)J transcript

The structure of a typical V(D)J transcript:

UTR: Untranslated region; FWR: Framework region; CDR: Complementarity determining region

The cellranger vdj pipeline provides amino acid and nucleotide sequences for framework and complementarity determining regions (CDRs). The V(D)J annotations on the assembled contigs and on the clonotype consensus sequences are produced in multiple formats.

Learn more about productive contigs on the Annotation Algorithm page.

File format overview

CSV: High-level annotations with one contig, consensus, or clonotype per row.
JSON: Detailed annotations, including alignment coordinates and amino acid translations.
BED: Germline V(D)J segments as features for use with tools like IGV.
TSV: Used for the AIRR rearrangement format of V(D)J contigs and consensus sequences.

Annotation files overview

clonotypes.csv: High-level descriptions of each clonotype.
consensus_annotations.csv: High-level and detailed annotations of each clonotype consensus sequence.
filtered_contig_annotations.csv: High-level annotations of each high-confidence contig from cell-associated barcodes. This is a subset of all_contig_annotations.csv.
all_contig_annotations.csv: High-level and detailed annotations of all contigs (from cell and background barcodes) in CSV format.
all_contig_annotations.bed: High-level and detailed annotations of all contigs (from cell and background barcodes) in BED format.
all_contig_annotations.json: High-level and detailed annotations of all contigs (from cell and background barcodes) in JSON format.
airr_rearrangement.tsv: Annotated contigs and consensus sequences of V(D)J rearrangements in the AIRR format.
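
As a quick sanity check before downstream analysis, the file names listed above can be looked up in an output directory. The sketch below is illustrative only: the function name `missing_annotation_files` and the `outs` path are assumptions, not part of Cell Ranger.

```python
from pathlib import Path

# File names as listed in the overview above.
ANNOTATION_FILES = [
    "clonotypes.csv",
    "consensus_annotations.csv",
    "filtered_contig_annotations.csv",
    "all_contig_annotations.csv",
    "all_contig_annotations.bed",
    "all_contig_annotations.json",
    "airr_rearrangement.tsv",
]


def missing_annotation_files(outs_dir):
    """Return the annotation file names that are not present in outs_dir."""
    outs = Path(outs_dir)
    return [name for name in ANNOTATION_FILES if not (outs / name).is_file()]


# "outs" is a hypothetical path; point it at your own output directory.
print(missing_annotation_files("outs"))
```

An empty list means every annotation file was found.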

Clonotype CSV file

The clonotypes.csv file provides high-level descriptions of each clonotype.

Column	Description
`clonotype_id`	The ID of this clonotype.
`frequency`	The observed number of cell barcodes with this clonotype.
`proportion`	The observed fraction of cell barcodes with this clonotype.
`cdr3s_aa`	A semicolon-delimited list of chain:sequence pairs, where chain is TRA, TRB, TRG, TRD, IGK, IGL, or IGH and sequence is the CDR3 amino acid sequence for that chain.
`cdr3s_nt`	A semicolon-delimited list of chain:sequence pairs, where chain is TRA, TRB, TRG, TRD, IGK, IGL, or IGH and sequence is the CDR3 nucleotide sequence for that chain.
`inkt_evidence`	For T cells, this column indicates whether the clonotype is a group of iNKT cells. The evidence is a semicolon-delimited list of `chain:matches`, where chain is one of TRA or TRB and matches is one of `genes`, `junction` or `genes+junction`. See iNKT/MAIT for more information.
`mait_evidence`	For T cells, this column indicates whether the clonotype is a group of MAIT cells. The evidence is a semicolon-delimited list of `chain:matches`, where chain is one of TRA or TRB and matches is one of `genes`, `junction` or `genes+junction`. See iNKT/MAIT for more information.
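
The `cdr3s_aa`, `cdr3s_nt`, `inkt_evidence`, and `mait_evidence` columns share the same semicolon-delimited `key:value` layout, so one small helper can split any of them. This is a minimal sketch; the function name `parse_pairs` and the example sequences are made up for illustration.

```python
def parse_pairs(field):
    """Split a semicolon-delimited list of chain:value items into tuples."""
    pairs = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields and trailing semicolons
        chain, _, value = item.partition(":")
        pairs.append((chain, value))
    return pairs


# Hypothetical cdr3s_aa value, not taken from a real dataset.
example = "TRA:CAVSDLEPNSSASKIIF;TRB:CASSLGQGNTEAFF"
for chain, sequence in parse_pairs(example):
    print(chain, sequence)
```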

Consensus annotation CSV files

The consensus_annotations.csv file provides high-level and detailed annotations of each clonotype consensus sequence.

Column	Description
`clonotype_id`	The ID of the clonotype to which this consensus sequence was assigned.
`consensus_id`	The ID of this consensus sequence.
`v_start`	0-based index of the V region start position on the consensus sequence.
`v_end`	0-based index of the V region end position on the consensus sequence.
`v_end_ref`	0-based index of the V gene end position on the reference.
`j_start`	0-based index of the J region start position on the consensus sequence.
`j_start_ref`	0-based index of the J gene start position on the reference.
`j_end`	0-based index of the J region end position on the consensus sequence.
`cdr3_start`	0-based index of the CDR3 region start position on the consensus sequence.
`cdr3_end`	0-based index of the CDR3 region end position on the consensus sequence.

The remaining columns are the same as those described in the Contig annotation CSV files section.

Contig annotation CSV files

The all_contig_annotations.csv file contains high-level and detailed annotations of all contigs (from cell and background barcodes). The filtered_contig_annotations.csv file contains high-level annotations of each high-confidence contig from cell-associated barcodes, and is a subset of the contigs in all_contig_annotations.csv. Both files have these columns:

Column	Description
`barcode`	Cell barcode for this contig.
`is_cell`	True or False value indicating whether the barcode was called as a cell.
`contig_id`	Unique identifier for this contig.
`high_confidence`	True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or other artifact).
`length`	The contig sequence length in nucleotides.
`chain`	The chain associated with this contig: TRA, TRB, IGK, IGL, or IGH.
`v_gene`	The highest-scoring V segment, e.g., TRAV1-1.
`d_gene`	The highest-scoring D segment, e.g., TRBD1.
`j_gene`	The highest-scoring J segment, e.g., TRAJ1-1.
`c_gene`	The highest-scoring C segment, e.g., TRAC.
`full_length`	True or False value indicating if the contig was declared as full-length.
`productive`	True or False value indicating if the contig was declared as productive.
`fwr1`	The predicted FWR1 amino acid sequence.
`fwr1_nt`	The predicted FWR1 nucleotide sequence.
`cdr1`	The predicted CDR1 amino acid sequence.
`cdr1_nt`	The predicted CDR1 nucleotide sequence.
`fwr2`	The predicted FWR2 amino acid sequence.
`fwr2_nt`	The predicted FWR2 nucleotide sequence.
`cdr2`	The predicted CDR2 amino acid sequence.
`cdr2_nt`	The predicted CDR2 nucleotide sequence.
`fwr3`	The predicted FWR3 amino acid sequence.
`fwr3_nt`	The predicted FWR3 nucleotide sequence.
`cdr3`	The predicted CDR3 amino acid sequence.
`cdr3_nt`	The predicted CDR3 nucleotide sequence.
`fwr4`	The predicted FWR4 amino acid sequence.
`fwr4_nt`	The predicted FWR4 nucleotide sequence.
`reads`	The number of reads aligned to this contig.
`umis`	The number of distinct UMIs aligned to this contig.
`raw_clonotype_id`	The ID of the clonotype to which this cell barcode was assigned.
`raw_consensus_id`	The ID of the consensus sequence to which this contig was assigned.
`exact_subclonotype_id`	The ID of the exact subclonotype to which this cell barcode was assigned.
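
The relationship between the two files can be sketched by selecting rows of all_contig_annotations.csv where both `is_cell` and `high_confidence` are true. This is a rough approximation based only on the descriptions above, not a reimplementation of Cell Ranger's filtering; the sample rows are made up and carry only a subset of the columns.

```python
import csv
import io

# Made-up rows with a subset of the documented columns.
SAMPLE_CSV = """barcode,is_cell,contig_id,high_confidence,chain,productive,umis
AAACCTGAGACAGGCT-1,True,AAACCTGAGACAGGCT-1_contig_1,True,IGK,True,12
AAACCTGAGACAGGCT-1,True,AAACCTGAGACAGGCT-1_contig_2,False,IGH,False,1
"""


def is_true(value):
    """Interpret a True/False column value, ignoring letter case."""
    return value.strip().lower() == "true"


def select_cell_high_confidence(handle):
    """Keep contigs from cell-associated barcodes flagged as high-confidence."""
    return [
        row
        for row in csv.DictReader(handle)
        if is_true(row["is_cell"]) and is_true(row["high_confidence"])
    ]


kept = select_cell_high_confidence(io.StringIO(SAMPLE_CSV))
print([row["contig_id"] for row in kept])
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample.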

Details on how the Cell Ranger algorithm delimits CDRs (Complementarity Determining Regions) and FWRs (Framework Regions) are provided on the enclone features page.

Contig annotation BED files

The all_contig_annotations.bed file provides high-level and detailed annotations of all contigs (from cell and background barcodes) in BED format. The columns are not named but correspond to:

Contig name
Nucleotide position at which the contig annotation starts
Nucleotide position at which the contig annotation ends
Annotation

The all_contig_annotations.bed provides information about the structure of each assembled contig and allows further investigation into why some contigs were filtered out. An example all_contig_annotations.bed is shown here:

AAACCTGAGACAGGCT-1_contig_1	0	36	IGKV3-11_5'UTR
AAACCTGAGACAGGCT-1_contig_1	36	381	IGKV3-11_L-REGION+V-REGION
AAACCTGAGACAGGCT-1_contig_1	376	415	IGKJ2_J-REGION
AAACCTGAGACAGGCT-1_contig_1	415	551	IGKC_C-REGION
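
The four columns can be read with a few lines of code. The sketch below parses the example rows shown above and reports the length of each annotated feature. It assumes the standard BED convention of 0-based, half-open coordinates, which this page does not state explicitly; the names `Feature` and `parse_bed` are illustrative.

```python
from collections import namedtuple

Feature = namedtuple("Feature", "contig start end annotation")

# The example rows shown above, tab-separated.
EXAMPLE_BED = (
    "AAACCTGAGACAGGCT-1_contig_1\t0\t36\tIGKV3-11_5'UTR\n"
    "AAACCTGAGACAGGCT-1_contig_1\t36\t381\tIGKV3-11_L-REGION+V-REGION\n"
    "AAACCTGAGACAGGCT-1_contig_1\t376\t415\tIGKJ2_J-REGION\n"
    "AAACCTGAGACAGGCT-1_contig_1\t415\t551\tIGKC_C-REGION\n"
)


def parse_bed(text):
    """Parse four-column BED text into a list of Feature tuples."""
    features = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        contig, start, end, annotation = line.split("\t")[:4]
        features.append(Feature(contig, int(start), int(end), annotation))
    return features


for feature in parse_bed(EXAMPLE_BED):
    # Under the half-open assumption, length is simply end minus start.
    print(feature.annotation, feature.end - feature.start)
```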

Contig annotation JSON files

The all_contig_annotations.json file provides high-level and detailed annotations of all contigs (from cell and background barcodes) in JSON format. This file can be used to learn more about each assembled contig, and investigate why some contigs were filtered out. The all_contig_annotations.json file is the input required to run enclone.

Field	Description
`barcode`	Barcode sequence
`contig_name`	Name of the contig
`sequence`	Nucleotide sequence of the contig
`quals`	Contig quality score
`fraction_of_reads_for_this_barcode_provided_as_input_to_assembly`	Fraction of reads for this barcode that were provided as input to the assembly algorithm
`read_count`	Number of reads assigned to this contig
`umi_count`	Number of UMIs assigned to this contig
`start_codon_pos`	Starting nucleotide base position of the start codon on the contig
`stop_codon_pos`	Last nucleotide base position of stop codon on the contig
`aa_sequence`	Amino acid sequence of the contig
`frame`	Unused field. Ignored by the algorithm.
`cdr3`	Amino acid sequence of the contig's CDR3
`cdr3_seq`	Nucleotide sequence of the contig's CDR3
`cdr3_start`	Starting base of the contig's CDR3
`cdr3_stop`	Last base of the contig's CDR3
`fwr1`-`fwr4`	Optional; Start and stop positions of the contig's FWR1-FWR4 regions
`cdr1`-`cdr2`	Optional; Start and stop positions of the contig's CDR1-CDR2 regions
`annotations`	The annotations for the contig from the reference file
`clonotype`	Null; filled in after clonotyping
`high_confidence`	TRUE or FALSE statement of whether the contig has high confidence
`validated_umis`	A list of UMIs that have been validated
`non_validated_umis`	A list of UMIs that have not been validated
`invalidated_umis`	A list of invalidated UMIs
`is_cell`	TRUE or FALSE statement about whether the barcode was declared a cell
`productive`	TRUE or FALSE statement about whether the contig was productive based on five criteria. NULL=not full length.
`filtered`	Always TRUE
`is_gex_cell`	TRUE or FALSE statement about whether the barcode was declared a cell by Gene expression data. Null=Data not available
`is_asm_cell`	TRUE or FALSE statement about whether the barcode was declared a cell by the VDJ assembler. Null=Data not available
`full_length`	TRUE or FALSE statement about whether the contig is full length.
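
To investigate why a contig was filtered out, the boolean fields above can be inspected per contig. The sketch below assumes the file's top level is a JSON array with one object per contig; that layout, the helper name `failed_checks`, and the sample records are assumptions made for illustration.

```python
import json

# Made-up records using a subset of the fields described above.
SAMPLE_JSON = """
[
  {"contig_name": "example_contig_1", "is_cell": true, "high_confidence": true,
   "full_length": true, "productive": true},
  {"contig_name": "example_contig_2", "is_cell": true, "high_confidence": false,
   "full_length": false, "productive": null}
]
"""

CHECKS = ("is_cell", "high_confidence", "full_length", "productive")


def failed_checks(contig):
    """List the flags that are not true (false, null, or absent) for a contig."""
    return [name for name in CHECKS if contig.get(name) is not True]


contigs = json.loads(SAMPLE_JSON)
for contig in contigs:
    print(contig["contig_name"], failed_checks(contig))
```

For a real file, replace `json.loads(SAMPLE_JSON)` with `json.load` on an open file handle.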

AIRR rearrangements TSV file

The airr_rearrangement.tsv file provides the annotated contigs and consensus sequences of V(D)J rearrangements in the AIRR format.

Column	Description
`cell_id`	Cell barcode defining the cell for the query sequence.
`clone_id`	Clonotype ID/clonotype assignment.
`sequence_id`	The name of the contig associated with the rearrangement.
`sequence`	The nucleotide sequence of the rearrangement.
`sequence_aa`	The amino acid sequence of the rearrangement.
`productive`	Whether or not the rearrangement is productive.
`rev_comp`	Set to `false` by default (10x Genomics V(D)J sequences are not reverse complemented).
`v_call`	The name of the aligned V gene for the rearrangement.
`v_cigar`	The CIGAR string of the V gene alignment.
`d_call`	The name of the aligned D gene for the rearrangement.
`d_cigar`	The CIGAR string of the D gene alignment.
`j_call`	The name of the aligned J gene for the rearrangement.
`j_cigar`	The CIGAR string of the J gene alignment.
`c_call`	The name of the aligned C gene for the rearrangement.
`c_cigar`	The CIGAR string of the C gene alignment.
`sequence_alignment`	The aligned sequence of the VDJ rearrangement.
`germline_alignment`	The assembled, aligned, full-length inferred germline sequence of the aligned sequence.
`junction`	The nucleotide sequence of the rearrangement's junction (CDR3).
`junction_aa`	The amino acid sequence of the rearrangement's junction (CDR3).
`junction_length`	The length of the rearrangement's junction nucleotide sequence.
`junction_aa_length`	The length of the rearrangement's junction amino acid sequence.
`v_sequence_start`	1-based index on the contig of the V region start position.
`v_sequence_end`	1-based index on the contig of the V region end position.
`d_sequence_start`	1-based index on the contig of the D region start position.
`d_sequence_end`	1-based index on the contig of the D region end position.
`j_sequence_start`	1-based index on the contig of the J region start position.
`j_sequence_end`	1-based index on the contig of the J region end position.
`c_sequence_start`	1-based index on the contig of the C region start position.
`c_sequence_end`	1-based index on the contig of the C region end position.
`consensus_count`	The number of reads associated with this rearrangement.
`duplicate_count`	The number of unique molecular identifiers associated with this rearrangement.
`is_cell`	Is this rearrangement cell-associated?

The AIRR rearrangement file includes all mandatory AIRR fields and several optional variables to enhance reproducibility and guide analyses.
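
Because the `*_sequence_start` and `*_sequence_end` columns are 1-based, they need a small adjustment before being used as Python slice indices. The sketch below assumes the end position is inclusive, following the AIRR closed-interval convention; the helper name `region` and the sample row are made up for illustration.

```python
import csv
import io

# Made-up sample with a subset of columns; real files contain all columns listed above.
SAMPLE_TSV = (
    "sequence_id\tsequence\tv_sequence_start\tv_sequence_end\n"
    "example_contig_1\tACGTACGTAC\t2\t5\n"
)


def region(row, name):
    """Return the part of the sequence covered by a region (v, d, j, or c).

    Converts 1-based, inclusive coordinates to a Python slice.
    Returns None when either coordinate is missing or empty.
    """
    start = row.get(f"{name}_sequence_start", "")
    end = row.get(f"{name}_sequence_end", "")
    if not start or not end:
        return None
    return row["sequence"][int(start) - 1:int(end)]


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
print(region(rows[0], "v"))
```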

Next steps

Visit the V(D)J outputs Overview page for a list of output files
Learn more about the V(D)J algorithm
Learn more about BAM and FASTA/FASTQ output files
Explore the Loupe V(D)J Browser to visualize your data
