
10x Genomics
Chromium Single Cell Immune Profiling

V(D)J Annotations

The cellranger vdj pipeline produces V(D)J annotations, in several file formats, for both the assembled contigs and the clonotype consensus sequences.

File Type Overview

File type | Description
CSV | High-level annotations with one contig, consensus, or clonotype per row.
JSON | Detailed annotations, including alignment coordinates and amino acid translations.
BED | Germline V(D)J segments as features, for use with a tool like IGV.
TSV | Used for the AIRR rearrangement format of VDJ contigs and consensus sequences.


Annotation Files

File | Description
clonotypes.csv | High-level descriptions of each clonotype.
consensus_annotations.csv | High-level and detailed annotations of each clonotype consensus sequence.
filtered_contig_annotations.csv | High-level annotations of each high-confidence, cellular contig. This is a subset of all_contig_annotations.csv.
all_contig_annotations.{csv,bed,json} | High-level and detailed annotations of each contig.
airr_rearrangement.tsv | Annotated contigs and consensus sequences of VDJ rearrangements in the AIRR format.


Clonotype CSV File (clonotypes.csv)

Column Description
clonotype_id The ID of this clonotype.
frequency The observed number of cell barcodes with this clonotype.
proportion The observed fraction of cell barcodes with this clonotype.
cdr3s_aa A semicolon-delimited list of chain:sequence pairs, where chain is for example TRA, TRB, IGK, IGL, or IGH and sequence is the CDR3 amino acid sequence for that chain.
cdr3s_nt A semicolon-delimited list of chain:sequence pairs, where chain is for example TRA, TRB, IGK, IGL, or IGH and sequence is the CDR3 nucleotide sequence for that chain.
inkt_evidence For T cells, the evidence, if any, that this clonotype is a group of iNKT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information.
mait_evidence For T cells, the evidence, if any, that this clonotype is a group of MAIT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information.
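
The cdr3s_aa, cdr3s_nt, inkt_evidence, and mait_evidence columns share the same semicolon-delimited chain:value layout, so one small parser can cover all of them. The following is a minimal Python sketch; the function name parse_chain_pairs and the example CDR3 sequences are hypothetical and are not part of Cell Ranger.

```python
from collections import defaultdict


def parse_chain_pairs(field):
    """Split a 'chain:value;chain:value' field into {chain: [values]}.

    A clonotype can carry more than one sequence for the same chain,
    so each chain maps to a list rather than a single value.
    """
    pairs = defaultdict(list)
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields and trailing semicolons
        chain, _, value = item.partition(":")
        pairs[chain].append(value)
    return dict(pairs)


# Hypothetical cdr3s_aa value, for illustration only.
example = "TRA:CAVSDLEPNSSASKIIF;TRB:CASSLGQGNTEAFF"
parsed = parse_chain_pairs(example)
print(parsed)
```

The same function can be applied to an inkt_evidence or mait_evidence value such as TRA:genes;TRB:genes+junction.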

Contig Annotation CSV Files (*contig_annotations.csv)

Column Description
barcode Cell-barcode for this contig.
is_cell True or False value indicating whether the barcode was called as a cell.
contig_id Unique identifier for this contig.
high_confidence True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or some other artifact).
length The contig sequence length in nucleotides.
chain The chain associated with this contig; for example, TRA, TRB, IGK, IGL, or IGH. A value of "Multi" indicates that segments from multiple chains were present.
v_gene The highest-scoring V segment, for example, TRAV1-1.
d_gene The highest-scoring D segment, for example, TRBD1.
j_gene The highest-scoring J segment, for example, TRAJ1-1.
c_gene The highest-scoring C segment, for example, TRAC.
full_length Whether the contig was declared as full-length.
productive Whether the contig was declared as productive.
cdr3 The predicted CDR3 amino acid sequence.
cdr3_nt The predicted CDR3 nucleotide sequence.
reads The number of reads aligned to this contig.
umis The number of distinct UMIs aligned to this contig.
raw_clonotype_id The ID of the clonotype to which this cell barcode was assigned.
raw_consensus_id The ID of the consensus sequence to which this contig was assigned.
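
As an illustration of how these columns combine, the sketch below keeps only contigs that are cell-associated, high-confidence, and productive, then lists the chains found for each barcode. It uses only the Python standard library. The sample rows are hypothetical and show a subset of the columns; the case-insensitive handling of True/False is an assumption made so the filter tolerates different capitalizations.

```python
import csv
import io
from collections import defaultdict


def is_true(value):
    """Interpret a True/False column case-insensitively; anything else is False."""
    return value.strip().lower() == "true"


def select_contigs(rows):
    """Keep contigs that are cell-associated, high-confidence, and productive."""
    return [
        row for row in rows
        if is_true(row["is_cell"])
        and is_true(row["high_confidence"])
        and is_true(row["productive"])
    ]


def chains_by_barcode(rows):
    """Group the chain of each contig under its cell barcode."""
    chains = defaultdict(list)
    for row in rows:
        chains[row["barcode"]].append(row["chain"])
    return dict(chains)


# Hypothetical rows with a subset of the documented columns.
SAMPLE = (
    "barcode,is_cell,contig_id,high_confidence,chain,productive\n"
    "AAAC-1,True,AAAC-1_contig_1,True,TRA,True\n"
    "AAAC-1,True,AAAC-1_contig_2,True,TRB,True\n"
    "AAAC-1,True,AAAC-1_contig_3,True,Multi,False\n"
    "CCCG-1,False,CCCG-1_contig_1,True,TRB,True\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = select_contigs(rows)
print(chains_by_barcode(kept))
```

To run this against real output, replace the io.StringIO(SAMPLE) argument with an open file handle for a contig annotation CSV.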

Consensus Annotation CSV Files (consensus_annotations.csv)

Column Description
clonotype_id The ID of the clonotype to which this consensus sequence was assigned.
consensus_id The ID of this consensus sequence.
v_start 0-based index of the V region start position on the consensus sequence.
v_end 0-based index of the V region end position on the consensus sequence.
v_end_ref 0-based index of the V gene end position on the reference.
j_start 0-based index of the J region start position on the consensus sequence.
j_start_ref 0-based index of the J gene start position on the reference.
j_end 0-based index of the J region end position on the consensus sequence.
cdr3_start 0-based index of the CDR3 region start position on the consensus sequence.
cdr3_end 0-based index of the CDR3 region end position on the consensus sequence.

The remaining columns are shared with those under the Contig Annotation CSV Files section.
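
Because these positions are 0-based, they need adjusting before they are compared with 1-based coordinates. The Python sketch below shows one way to do that. It assumes each start/end pair is a half-open interval (start inclusive, end exclusive), which the column descriptions above do not state, so verify this against your own data before relying on it. The helper names and the example sequence are hypothetical.

```python
def region(sequence, start, end):
    """Extract a region given 0-based coordinates.

    ASSUMPTION: the interval is half-open (end exclusive), which matches
    Python slicing directly.
    """
    return sequence[start:end]


def to_one_based_closed(start, end):
    """Convert a 0-based half-open interval to a 1-based closed interval.

    ASSUMPTION: same half-open convention as in region().
    """
    if start < 0 or end < start:
        raise ValueError("invalid interval: start=%d, end=%d" % (start, end))
    return start + 1, end


# Hypothetical sequence and coordinates, for illustration only.
consensus = "ACGTACGT"
cdr3 = region(consensus, 2, 5)
print(cdr3, to_one_based_closed(2, 5))
```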

AIRR Rearrangements TSV File (airr_rearrangement.tsv)

Column Description
cell_id Cell barcode defining the cell for the query sequence.
clone_id Clonotype ID/clonotype assignment.
rev_comp Set to false by default (10x Genomics VDJ sequences are not reverse complemented).
sequence_id The name of the contig associated with the rearrangement.
sequence The nucleotide sequence of the rearrangement.
sequence_aa The amino acid sequence of the rearrangement.
productive Whether or not the rearrangement is productive.
v_call The name of the aligned V gene for the rearrangement.
v_cigar The CIGAR string of the V gene alignment.
v_sequence_start 1-based index on the contig of the V region start position.
v_sequence_end 1-based index on the contig of the V region end position.
d_call The name of the aligned D gene for the rearrangement.
d_cigar The CIGAR string of the D gene alignment.
d_sequence_start 1-based index on the contig of the D region start position.
d_sequence_end 1-based index on the contig of the D region end position.
j_call The name of the aligned J gene for the rearrangement.
j_cigar The CIGAR string of the J gene alignment.
j_sequence_start 1-based index on the contig of the J region start position.
j_sequence_end 1-based index on the contig of the J region end position.
c_call The name of the aligned C gene for the rearrangement.
c_cigar The CIGAR string of the C gene alignment.
c_sequence_start 1-based index on the contig of the C region start position.
c_sequence_end 1-based index on the contig of the C region end position.
sequence_alignment The aligned sequence of the VDJ rearrangement.
germline_alignment The assembled, aligned, full-length inferred germline sequence of the aligned sequence.
junction The nucleotide sequence of the rearrangement's junction (CDR3).
junction_aa The amino acid sequence of the rearrangement's junction (CDR3).
duplicate_count The number of unique molecular identifiers associated with this rearrangement.
consensus_count The number of reads associated with this rearrangement.
junction_length The length of the rearrangement's junction nucleotide sequence.
junction_aa_length The length of the rearrangement's junction amino acid sequence.
is_cell Whether the rearrangement is cell-associated.

The AIRR rearrangement file includes all mandatory AIRR fields and several optional variables to enhance reproducibility and guide analyses.
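
The sketch below reads a tab-separated AIRR rearrangement table with the Python standard library, extracts the V region from the 1-based coordinates, and checks that junction_length matches the junction sequence. It assumes the start and end positions form a closed interval (both ends inclusive), as described in the AIRR Rearrangement schema. The sample row is hypothetical and contains only a few of the columns.

```python
import csv
import io


def slice_one_based(sequence, start, end):
    """Extract a region given 1-based coordinates.

    ASSUMPTION: the interval is closed (both ends inclusive).
    """
    return sequence[start - 1:end]


# Hypothetical row with a subset of the documented columns.
SAMPLE = (
    "sequence_id\tsequence\tv_sequence_start\tv_sequence_end\tjunction\tjunction_length\n"
    "contig_1\tACGTACGTAC\t2\t5\tGTAC\t4\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
row = rows[0]

v_region = slice_one_based(
    row["sequence"],
    int(row["v_sequence_start"]),
    int(row["v_sequence_end"]),
)
junction_ok = len(row["junction"]) == int(row["junction_length"])
print(row["sequence_id"], v_region, junction_ok)
```

To read a real file, pass an open handle for airr_rearrangement.tsv to csv.DictReader in place of io.StringIO(SAMPLE).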