# V(D)J Annotations

## Structure of V(D)J transcript

The structure of a typical V(D)J transcript:

UTR: Untranslated region; FWR: Framework region; CDR: Complementarity determining region

The cellranger vdj pipeline provides amino acid and nucleotide sequences for framework and complementarity determining regions (CDRs). The V(D)J annotations on the assembled contigs and on the clonotype consensus sequences are produced in multiple formats.

## File format overview

• CSV: High-level annotations with one contig, consensus, or clonotype per row.
• JSON: Detailed annotations, including alignment coordinates and amino acid translations.
• BED: Germline V(D)J segments as features for use with tools like IGV.
• TSV: Used for the AIRR rearrangement format of V(D)J contigs and consensus sequences.

## Clonotype CSV file

The clonotypes.csv file provides high-level descriptions of each clonotype.

| Column | Description |
| --- | --- |
| clonotype_id | The ID of this clonotype. |
| frequency | The observed number of cell barcodes with this clonotype. |
| proportion | The observed fraction of cell barcodes with this clonotype. |
| cdr3s_aa | A semicolon-delimited list of chain:sequence pairs, where chain is TRA, TRB, TRG, TRD, IGK, IGL, or IGH and sequence is the CDR3 amino acid sequence for that chain. |
| cdr3s_nt | A semicolon-delimited list of chain:sequence pairs, where chain is TRA, TRB, TRG, TRD, IGK, IGL, or IGH and sequence is the CDR3 nucleotide sequence for that chain. |
| inkt_evidence | For T cells, this column indicates whether the clonotype is a group of iNKT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information. |
| mait_evidence | For T cells, this column indicates whether the clonotype is a group of MAIT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information. |
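
The semicolon-delimited chain:sequence layout of cdr3s_aa and cdr3s_nt can be unpacked with a few lines of Python. The following is a minimal sketch rather than part of Cell Ranger: the function name parse_cdr3s and the example value are made up for illustration.

```python
from collections import defaultdict


def parse_cdr3s(field):
    """Split a cdr3s_aa or cdr3s_nt value into {chain: [sequence, ...]}."""
    chains = defaultdict(list)
    for pair in field.split(";"):
        if not pair:
            continue  # tolerate an empty field or a trailing semicolon
        chain, sequence = pair.split(":", 1)
        chains[chain].append(sequence)
    return dict(chains)


# Hypothetical value, invented for this example (not taken from real output).
example = "TRA:CAVSDGGSQGNLIF;TRB:CASSLGTGELFF;TRB:CASSPDRGYEQYF"
parsed = parse_cdr3s(example)
print(parsed)
```

A list is kept per chain because a clonotype can carry more than one sequence for the same chain.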

## Consensus annotation CSV files

The consensus_annotations.csv file provides high-level and detailed annotations of each clonotype consensus sequence.

| Column | Description |
| --- | --- |
| clonotype_id | The ID of the clonotype to which this consensus sequence was assigned. |
| consensus_id | The ID of this consensus sequence. |
| v_start | 0-based index of the V region start position on the consensus sequence. |
| v_end | 0-based index of the V region end position on the consensus sequence. |
| v_end_ref | 0-based index of the V gene end position on the reference. |
| j_start | 0-based index of the J region start position on the consensus sequence. |
| j_start_ref | 0-based index of the J gene start position on the reference. |
| j_end | 0-based index of the J region end position on the consensus sequence. |
| cdr3_start | 0-based index of the CDR3 region start position on the consensus sequence. |
| cdr3_end | 0-based index of the CDR3 region end position on the consensus sequence. |

The remaining columns are the same as those described in the Contig annotation CSV files section.
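
The 0-based coordinates can be used to cut a region out of a consensus sequence. The sketch below rests on an assumption that this page does not state: that each end position is exclusive, so a region spans start up to but not including end (Python slice semantics). The sequence, the row values, and the helper name region are all hypothetical; verify the convention against your own output before relying on it.

```python
def region(sequence, start, end):
    """Return sequence[start:end]; start is 0-based, end is ASSUMED exclusive."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError(
            f"coordinates {start}..{end} fall outside a sequence of length {len(sequence)}"
        )
    return sequence[start:end]


# Hypothetical placeholder sequence and coordinates, for illustration only.
consensus_seq = "ACGT" * 10  # 40 nucleotides
row = {"cdr3_start": "18", "cdr3_end": "33"}  # CSV values arrive as strings

cdr3_nt = region(consensus_seq, int(row["cdr3_start"]), int(row["cdr3_end"]))
print(cdr3_nt, len(cdr3_nt))
```

One way to check the assumption is to compare the sliced region with the cdr3_nt column for the same consensus sequence.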

## Contig annotation CSV files

The all_contig_annotations.csv file contains high-level and detailed annotations of all contigs (from cell and background barcodes) in CSV format. The filtered_contig_annotations.csv file contains high-level annotations of each high-confidence contig from cell-associated barcodes, so its contigs are a subset of those in all_contig_annotations.csv. Both files have these columns:

| Column | Description |
| --- | --- |
| barcode | Cell barcode for this contig. |
| is_cell | True or False value indicating whether the barcode was called as a cell. |
| contig_id | Unique identifier for this contig. |
| high_confidence | True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or other artifact). |
| length | The contig sequence length in nucleotides. |
| chain | The chain associated with this contig: TRA, TRB, IGK, IGL, or IGH. |
| v_gene | The highest-scoring V segment, e.g., TRAV1-1. |
| d_gene | The highest-scoring D segment, e.g., TRBD1. |
| j_gene | The highest-scoring J segment, e.g., TRAJ1-1. |
| c_gene | The highest-scoring C segment, e.g., TRAC. |
| full_length | True or False value indicating whether the contig was declared full-length. |
| productive | True or False value indicating whether the contig was declared productive. |
| fwr1 | The predicted FWR1 amino acid sequence. |
| fwr1_nt | The predicted FWR1 nucleotide sequence. |
| cdr1 | The predicted CDR1 amino acid sequence. |
| cdr1_nt | The predicted CDR1 nucleotide sequence. |
| fwr2 | The predicted FWR2 amino acid sequence. |
| fwr2_nt | The predicted FWR2 nucleotide sequence. |
| cdr2 | The predicted CDR2 amino acid sequence. |
| cdr2_nt | The predicted CDR2 nucleotide sequence. |
| fwr3 | The predicted FWR3 amino acid sequence. |
| fwr3_nt | The predicted FWR3 nucleotide sequence. |
| cdr3 | The predicted CDR3 amino acid sequence. |
| cdr3_nt | The predicted CDR3 nucleotide sequence. |
| fwr4 | The predicted FWR4 amino acid sequence. |
| fwr4_nt | The predicted FWR4 nucleotide sequence. |
| reads | The number of reads aligned to this contig. |
| umis | The number of distinct UMIs aligned to this contig. |
| raw_clonotype_id | The ID of the clonotype to which this cell barcode was assigned. |
| raw_consensus_id | The ID of the consensus sequence to which this contig was assigned. |
| exact_subclonotype_id | The ID of the exact subclonotype to which this cell barcode was assigned. |

Details on how the Cell Ranger algorithm delimits CDRs (complementarity determining regions) and FWRs (framework regions) are provided on the enclone features page.
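
As an illustration of working with these columns, the sketch below collects the chains of productive, high-confidence contigs per cell barcode using only the Python standard library. The sample rows are invented and carry only a few of the columns; the helper names are hypothetical. The True/False values are compared case-insensitively because the exact capitalization in a given file is not assumed here.

```python
import csv
import io
from collections import defaultdict


def is_true(value):
    """Interpret a True/False column value regardless of capitalization."""
    return value.strip().lower() == "true"


def productive_chains_by_barcode(handle):
    """Map each cell barcode to the chains of its productive, high-confidence contigs."""
    chains = defaultdict(list)
    for row in csv.DictReader(handle):
        if (is_true(row["is_cell"])
                and is_true(row["high_confidence"])
                and is_true(row["productive"])):
            chains[row["barcode"]].append(row["chain"])
    return dict(chains)


# Invented rows with a reduced set of columns, for illustration only.
sample = io.StringIO(
    "barcode,is_cell,contig_id,high_confidence,chain,productive\n"
    "AAACCTGAGACAGGCT-1,True,AAACCTGAGACAGGCT-1_contig_1,True,IGK,True\n"
    "AAACCTGAGACAGGCT-1,True,AAACCTGAGACAGGCT-1_contig_2,True,IGH,True\n"
    "AAACCTGAGACAGGCT-1,True,AAACCTGAGACAGGCT-1_contig_3,True,IGK,False\n"
    "AAACCTGAGCGATAGC-1,False,AAACCTGAGCGATAGC-1_contig_1,False,IGL,False\n"
)
result = productive_chains_by_barcode(sample)
print(result)
```

For a real file, the StringIO object would be replaced by a handle such as open("all_contig_annotations.csv", newline="").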

## Contig annotation BED files

The all_contig_annotations.bed file provides high-level and detailed annotations of all contigs (from cell and background barcodes) in BED format. The columns are not named but correspond to:

• Contig name
• Nucleotide position at which the contig annotation starts
• Nucleotide position at which the contig annotation ends
• Annotation

The all_contig_annotations.bed provides information about the structure of each assembled contig and allows further investigation into why some contigs were filtered out. An example all_contig_annotations.bed is shown here:

AAACCTGAGACAGGCT-1_contig_1	0	36	IGKV3-11_5'UTR
AAACCTGAGACAGGCT-1_contig_1	36	381	IGKV3-11_L-REGION+V-REGION
AAACCTGAGACAGGCT-1_contig_1	376	415	IGKJ2_J-REGION
AAACCTGAGACAGGCT-1_contig_1	415	551	IGKC_C-REGION
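
The example above can be parsed with a short script to see how the annotated features tile the contig. This is a sketch under one assumption: that the file follows the usual BED convention of 0-based start and exclusive end positions, which is consistent with the adjacent 0-36 and 36-381 features in the example. The function name read_bed is hypothetical.

```python
import io

# The four example lines shown above, tab-separated.
BED_TEXT = (
    "AAACCTGAGACAGGCT-1_contig_1\t0\t36\tIGKV3-11_5'UTR\n"
    "AAACCTGAGACAGGCT-1_contig_1\t36\t381\tIGKV3-11_L-REGION+V-REGION\n"
    "AAACCTGAGACAGGCT-1_contig_1\t376\t415\tIGKJ2_J-REGION\n"
    "AAACCTGAGACAGGCT-1_contig_1\t415\t551\tIGKC_C-REGION\n"
)


def read_bed(lines):
    """Parse four-column BED lines into (contig, start, end, annotation) tuples."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        contig, start, end, annotation = line.split("\t")[:4]
        features.append((contig, int(start), int(end), annotation))
    return features


features = read_bed(io.StringIO(BED_TEXT))

# With an exclusive end, a feature's length is simply end - start.
lengths = [end - start for _, start, end, _ in features]

# Overlap between consecutive features (all on the same contig in this sample).
overlaps = [max(0, prev[2] - nxt[1]) for prev, nxt in zip(features, features[1:])]

print(lengths)
print(overlaps)
```

In this sample the V-region annotation ends at 381 while the J-region annotation starts at 376, so under the stated assumption the two annotations overlap by 5 nucleotides.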


## Contig annotation JSON files

The all_contig_annotations.json file provides high-level and detailed annotations of all contigs (from cell and background barcodes) in JSON format. This file can be used to learn more about each assembled contig, and investigate why some contigs were filtered out. The all_contig_annotations.json file is the input required to run enclone.

| Field | Description |
| --- | --- |
| barcode | Barcode sequence |
| contig_name | Name of the contig |
| sequence | Nucleotide sequence of the contig |
| quals | Contig quality score |
| fraction_of_reads_for_this_barcode_provided_as_input_to_assembly | Fraction of reads for this barcode that were provided as input to the assembly algorithm |
| read_count | Number of reads assigned to this contig |
| umi_count | Number of UMIs assigned to this contig |
| start_codon_pos | Starting nucleotide base position of the start codon on the contig |
| stop_codon_pos | Last nucleotide base position of the stop codon on the contig |
| aa_sequence | Amino acid sequence of the contig |
| frame | Unused field. Ignored by the algorithm. |
| cdr3 | Amino acid sequence of the contig's CDR3 |
| cdr3_seq | Nucleotide sequence of the contig's CDR3 |
| cdr3_start | Starting base of the contig's CDR3 |
| cdr3_stop | Last base of the contig's CDR3 |
| fwr1-fwr4 | Optional; start and stop positions of the contig's FWR1-FWR4 regions |
| cdr1-cdr2 | Optional; start and stop positions of the contig's CDR1-CDR2 regions |
| annotations | The annotations for the contig from the reference file |
| clonotype | Null; filled in after clonotyping |
| high_confidence | TRUE or FALSE statement of whether the contig has high confidence |
| validated_umis | A list of UMIs that have been validated |
| non_validated_umis | A list of UMIs that have not been validated |
| invalidated_umis | A list of invalidated UMIs |
| is_cell | TRUE or FALSE statement about whether the barcode was declared a cell |
| productive | TRUE or FALSE statement about whether the contig was productive based on five criteria. NULL=not full length. |
| filtered | Always TRUE |
| is_gex_cell | TRUE or FALSE statement about whether the barcode was declared a cell by Gene expression data. Null=Data not available |
| is_asm_cell | TRUE or FALSE statement about whether the barcode was declared a cell by the VDJ assembler. Null=Data not available |
| full_length | TRUE or FALSE statement about whether the contig is full length. |
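
To investigate which contigs pass or fail the main flags, the file can be loaded with Python's json module. The sketch below makes two assumptions that this page does not spell out: that the top level of the file is a JSON array with one object per contig, and that the TRUE/FALSE/NULL values are stored as JSON true, false, and null. The sample records are invented and heavily truncated, and select_contigs is a hypothetical helper.

```python
import json

# Invented, truncated records for illustration; real records carry many more fields.
SAMPLE = """
[
  {"barcode": "AAACCTGAGACAGGCT-1", "contig_name": "AAACCTGAGACAGGCT-1_contig_1",
   "is_cell": true, "high_confidence": true, "productive": true, "umi_count": 15},
  {"barcode": "AAACCTGAGACAGGCT-1", "contig_name": "AAACCTGAGACAGGCT-1_contig_2",
   "is_cell": true, "high_confidence": true, "productive": null, "umi_count": 2},
  {"barcode": "AAACCTGAGCGATAGC-1", "contig_name": "AAACCTGAGCGATAGC-1_contig_1",
   "is_cell": false, "high_confidence": false, "productive": false, "umi_count": 1}
]
"""


def select_contigs(records):
    """Keep contigs from cell barcodes that are high-confidence and productive."""
    return [
        record for record in records
        if record.get("is_cell") is True
        and record.get("high_confidence") is True
        and record.get("productive") is True  # null (not full length) is excluded
    ]


records = json.loads(SAMPLE)
selected = select_contigs(records)
rejected = [r["contig_name"] for r in records if r not in selected]

print([r["contig_name"] for r in selected])
print(rejected)
```

Comparing with "is True" keeps a null productive value from being treated the same as an explicit false when the rejected contigs are inspected afterwards.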

## AIRR rearrangements TSV file

The airr_rearrangement.tsv file provides the annotated contigs and consensus sequences of V(D)J rearrangements in the AIRR format.

| Column | Description |
| --- | --- |
| cell_id | Cell barcode defining the cell for the query sequence. |
| clone_id | Clonotype ID/clonotype assignment. |
| rev_comp | Set to false by default (10x Genomics V(D)J sequences are not reverse complemented). |
| sequence_id | The name of the contig associated with the rearrangement. |
| sequence | The nucleotide sequence of the rearrangement. |
| sequence_aa | The amino acid sequence of the rearrangement. |
| productive | Whether or not the rearrangement is productive. |
| v_call | The name of the aligned V gene for the rearrangement. |
| v_cigar | The CIGAR string of the V gene alignment. |
| v_sequence_start | 1-based index on the contig of the V region start position. |
| v_sequence_end | 1-based index on the contig of the V region end position. |
| d_call | The name of the aligned D gene for the rearrangement. |
| d_cigar | The CIGAR string of the D gene alignment. |
| d_sequence_start | 1-based index on the contig of the D region start position. |
| d_sequence_end | 1-based index on the contig of the D region end position. |
| j_call | The name of the aligned J gene for the rearrangement. |
| j_cigar | The CIGAR string of the J gene alignment. |
| j_sequence_start | 1-based index on the contig of the J region start position. |
| j_sequence_end | 1-based index on the contig of the J region end position. |
| c_call | The name of the aligned C gene for the rearrangement. |
| c_cigar | The CIGAR string of the C gene alignment. |
| c_sequence_start | 1-based index on the contig of the C region start position. |
| c_sequence_end | 1-based index on the contig of the C region end position. |
| sequence_alignment | The aligned sequence of the VDJ rearrangement. |
| germline_alignment | The assembled, aligned, full-length inferred germline sequence of the aligned sequence. |
| junction | The nucleotide sequence of the rearrangement's junction (CDR3). |
| junction_aa | The amino acid sequence of the rearrangement's junction (CDR3). |
| duplicate_count | The number of unique molecular identifiers associated with this rearrangement. |
| consensus_count | The number of reads associated with this rearrangement. |
| junction_length | The length of the rearrangement's junction nucleotide sequence. |
| junction_aa_length | The length of the rearrangement's junction amino acid sequence. |
| is_cell | Is this rearrangement cell-associated? |

The AIRR rearrangement file includes all mandatory AIRR fields and several optional variables to enhance reproducibility and guide analyses.
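
Because the AIRR coordinates are 1-based while Python slices are 0-based, a small conversion is needed when cutting a region out of the sequence column. The sketch below reads tab-separated rows with the standard csv module and extracts the V region, treating v_sequence_end as inclusive, which is the closed-interval convention of the AIRR Rearrangement schema. The sample row is invented, carries only a few columns, and v_region is a hypothetical helper.

```python
import csv
import io


def v_region(row):
    """Slice the V region from the contig using 1-based, inclusive coordinates."""
    start = int(row["v_sequence_start"])
    end = int(row["v_sequence_end"])
    # 1-based inclusive [start, end] maps to the 0-based slice [start - 1, end).
    return row["sequence"][start - 1:end]


# Invented row with a reduced set of columns, for illustration only.
sample = io.StringIO(
    "cell_id\tsequence_id\tsequence\tv_call\tv_sequence_start\tv_sequence_end\n"
    "AAACCTGAGACAGGCT-1\tAAACCTGAGACAGGCT-1_contig_1\tGGGACGTACGTTTT\tIGKV3-11\t4\t10\n"
)
rows = list(csv.DictReader(sample, delimiter="\t"))
v_seq = v_region(rows[0])
print(v_seq, len(v_seq))
```

The same conversion applies to the d, j, and c coordinate pairs. For a real file, the StringIO object would be replaced by open("airr_rearrangement.tsv", newline="").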
