10x Genomics
Chromium Single Cell Immune Profiling

Annotation Algorithm

Annotation algorithm overview

The three goals of V(D)J contig annotation are to 1) define the alignments of V, D, and J segments to a contig, 2) identify CDR3 sequences, and 3) from these data determine if a contig is productive, meaning that it is likely to correspond to a functional T or B cell receptor.

Alignment to the V(D)J reference

Cell Ranger first determines if the data are TCR or BCR. Then it aligns all contigs to the corresponding (TCR or BCR) reference sequences. Occasionally, contigs are aligned to both references. Alignment is seeded on 12-mer perfect matches, followed by heuristic extension. Cell Ranger also searches backwards from C segment alignments for J segment alignments that do not have 12-mer perfect matches, as these will arise occasionally from somatic hypermutation.
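The seeding step described above can be illustrated with a minimal sketch. This is not Cell Ranger's implementation: the function names `build_kmer_index` and `find_seeds` and the toy sequences are hypothetical, and the heuristic extension and the backwards search from C segments are left out. It only shows how perfect 12-mer matches between a contig and a set of reference segments might be located.

```python
from collections import defaultdict

K = 12  # seed length stated in the text


def build_kmer_index(references, k=K):
    """Map every k-mer in each reference to its (reference name, offset) hits.

    `references` is a hypothetical dict of segment name -> nucleotide sequence.
    """
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def find_seeds(contig, index, k=K):
    """Return (contig offset, reference name, reference offset) for each
    perfect k-mer match between the contig and the indexed references."""
    seeds = []
    for i in range(len(contig) - k + 1):
        for name, j in index.get(contig[i:i + k], ()):
            seeds.append((i, name, j))
    return seeds


# Toy data, invented for illustration only.
references = {"V1": "ACGTACGTTAGCCGATAGGCT"}
contig = "TTTT" + "GTTAGCCGATAG" + "AAAA"
seeds = find_seeds(contig, build_kmer_index(references))
```

Each seed would then be a starting point for extension in both directions. A mutated J segment with no surviving 12-mer produces no seed at all, which is why a separate search from the C segment is needed.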

The choice of V(D)J reference sequences in an alignment can be arbitrary, depending on how similar the reference sequences are to each other. For D segments, which are short and highly mutated, it may not be possible to find a confident alignment.

Productive Contigs

A contig is termed productive if the following conditions are met:

CDR3

For each contig, Cell Ranger searches for a CDR3 sequence using the conserved sequence that flanks the CDR3 region. Then the CDR3 sequence and its flanking regions are compared to motifs derived from V and J reference segments for human and mouse, as shown below. A letter represents a specific amino acid and a dot represents any amino acid:

left flank   CDR3   right flank
LQPEDSAVYY   C...   LTFG.GTRVTV
VEASQTGTYF          LIWG.GSKLSI
ATSGQASLYL

Cell Ranger requires that a CDR3 sequence have at least 5 amino acids, start with a C, and not contain a stop codon. The flanking sequences for a candidate CDR3 are matched against the above motifs, and scored +1 for each position that matches one of the entries in a column.

For example, LTY.... scores 2 for the first three amino acids in the right flank. L matches an entry in the first column, contributing 1 to the score. T matches an entry in the second column, contributing 1 to the score. Y does not match the third column, and does not contribute to the score.
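The column-wise scoring can be sketched as follows, using the motifs from the table above. The function name `score_flank` is hypothetical, and the sketch assumes that a dot column in the motifs contributes nothing to the score, which the text does not state explicitly.

```python
LEFT_FLANK_MOTIFS = ["LQPEDSAVYY", "VEASQTGTYF", "ATSGQASLYL"]
RIGHT_FLANK_MOTIFS = ["LTFG.GTRVTV", "LIWG.GSKLSI"]


def score_flank(flank, motifs):
    """Score +1 for each flank position whose amino acid appears in the
    same column of at least one motif.

    Assumption: wildcard ('.') motif columns never add to the score.
    """
    score = 0
    for pos, aa in enumerate(flank):
        # Collect the amino acids allowed at this column across all motifs.
        column = {m[pos] for m in motifs if pos < len(m)}
        column.discard(".")
        if aa in column:
            score += 1
    return score


# The worked example from the text: L and T match, Y does not.
example_score = score_flank("LTY", RIGHT_FLANK_MOTIFS)  # 2
```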

For a candidate CDR3 to be declared a CDR3 sequence, it must score at least 10. In addition, the left flank must contribute at least 3 to that score and the right flank must contribute at least 4.
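As a minimal sketch, the three thresholds can be combined into one check. The function name is hypothetical, and the sketch assumes the overall score is the sum of the left-flank and right-flank contributions.

```python
MIN_TOTAL_SCORE = 10  # overall threshold from the text
MIN_LEFT_SCORE = 3    # minimum contribution from the left flank
MIN_RIGHT_SCORE = 4   # minimum contribution from the right flank


def passes_flank_thresholds(left_score, right_score):
    """Return True if a candidate meets all three score thresholds.

    Assumption: the total score is left_score + right_score.
    """
    return (
        left_score + right_score >= MIN_TOTAL_SCORE
        and left_score >= MIN_LEFT_SCORE
        and right_score >= MIN_RIGHT_SCORE
    )
```

For instance, a candidate scoring 8 on the left and 2 on the right reaches a total of 10 but would still be rejected, because the right flank contributes fewer than 4.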

Next, Cell Ranger finds the implied stop position of the V segment on the contig, defined as the start position of the V segment on the contig plus the length of the V segment. The CDR3 sequence must start no more than 10 bases before and no more than 20 bases after this implied stop. These implied-stop conditions are not applied in the denovo case.
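The implied-stop window can be written out as simple arithmetic. The sketch below uses hypothetical function names and assumes 0-based contig coordinates with inclusive window bounds, neither of which is specified in the text.

```python
def implied_v_stop(v_start_on_contig, v_segment_length):
    """Implied stop = V start position on the contig + V segment length."""
    return v_start_on_contig + v_segment_length


def cdr3_start_in_window(cdr3_start, v_start_on_contig, v_segment_length,
                         max_before=10, max_after=20):
    """Check that the CDR3 starts at most `max_before` bases before and at
    most `max_after` bases after the implied V stop.

    Assumption: both bounds are inclusive.
    """
    stop = implied_v_stop(v_start_on_contig, v_segment_length)
    return stop - max_before <= cdr3_start <= stop + max_after
```

With a V segment of length 300 starting at position 50, the implied stop is 350, so under these assumptions a CDR3 could start anywhere from position 340 through 370.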

If there is more than one CDR3 sequence, Cell Ranger chooses the one with the highest score. If there is a tie, the one with the later start position on the contig is chosen. If a tie remains, the longer CDR3 is chosen.
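The selection order above maps naturally onto a tuple comparison. This is an illustrative sketch only; the `Cdr3Candidate` type and `choose_cdr3` function are hypothetical names, and the example candidates are invented.

```python
from typing import NamedTuple


class Cdr3Candidate(NamedTuple):
    sequence: str  # amino acid sequence of the candidate CDR3
    start: int     # start position on the contig
    score: int     # flank-motif score


def choose_cdr3(candidates):
    """Pick the highest score, then the later start, then the longer CDR3."""
    return max(candidates, key=lambda c: (c.score, c.start, len(c.sequence)))


# Invented candidates: the first two tie on score, so the later start wins.
candidates = [
    Cdr3Candidate("CASSLGF", start=300, score=12),
    Cdr3Candidate("CASSF", start=310, score=12),
    Cdr3Candidate("CAVRDSNYQLIW", start=320, score=11),
]
best = choose_cdr3(candidates)
```

Python compares tuples element by element, so the later criteria are consulted only when the earlier ones tie.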

iNKT and MAIT cell annotation

Cell Ranger 5.0+ annotates T cells as likely iNKT or MAIT cells based on the TCR V genes, J genes, and CDR3 sequences of the TCR alpha and TCR beta. For more information about iNKT and MAIT cells and how the annotation is performed, please see the iNKT and MAIT cell Algorithms documentation.

Next steps