
10x Genomics
Chromium Single Cell Immune Profiling

Glossary: Single Cell Immune Profiling

CDR3 (Complementarity-Determining Region 3)

The three complementarity-determining regions are the portions of the amino acid sequence of a T or B cell receptor which are predicted to bind to an antigen. The nucleotide region encoding CDR3 spans the V(D)J junction, making it more diverse than the regions encoding the other CDRs. This diversity makes the CDR3 sequence a useful way to identify unique chains. See here.

Cell Barcode (10x Barcode)

This is a known nucleotide sequence which serves as a unique identifier for a single GEM droplet. The reads carrying a given barcode usually come from a single cell.

Clonotype

Collection of cells that share a set of productive CDR3 sequences by exact nucleotide match.
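
As an illustration of the definition above, here is a minimal sketch that groups cells into clonotypes by exact match on their set of productive CDR3 nucleotide sequences. The function name `group_clonotypes`, the input layout, and the barcodes and sequences are all hypothetical and made up for the example; this is not the pipeline's actual implementation.

```python
from collections import defaultdict

def group_clonotypes(cells):
    """Group cell barcodes by their exact set of productive CDR3 nucleotide sequences.

    `cells` maps a cell barcode to the list of productive CDR3 nucleotide
    sequences found in that cell (hypothetical input layout).
    """
    groups = defaultdict(list)
    for barcode, cdr3s in cells.items():
        if not cdr3s:
            continue  # a cell with no productive CDR3 is left ungrouped in this sketch
        # frozenset makes the key independent of the order the chains were listed in
        groups[frozenset(cdr3s)].append(barcode)
    return dict(groups)

# Made-up barcodes and sequences, for illustration only
cells = {
    "AAACCTGAGA-1": ["TGTGCCAGC", "TGTGCTGTG"],
    "AAACGGGTCA-1": ["TGTGCTGTG", "TGTGCCAGC"],  # same set, different order
    "AAAGATGCAT-1": ["TGTGCCAGT"],               # differs by one nucleotide
}
clonotypes = group_clonotypes(cells)
for cdr3_set, barcodes in clonotypes.items():
    print(sorted(cdr3_set), barcodes)
```

Because the match is exact, the third cell forms its own clonotype even though its CDR3 differs from one of the others by a single base.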

Consensus

For a single clonotype and chain, the consensus is the sequence built across all of the cells with that clonotype for that chain. It is built by reassembling the corresponding contigs from all cells with the clonotype.

Contig

Contiguous sequence of bases produced by assembly.

Full-length

A contig is full-length if it matches the initial part of a V gene, continues on, and ultimately matches the terminal part of a J gene.

Productive

See here.

GEM Group

When combining libraries made from different groups of GEMs into one analysis, we append, in silico, a small integer to the barcode of each read; the integer identifies which library the read came from. This prevents barcode collisions, which would otherwise create confusion in the form of virtual doublets.
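
The effect of the appended integer can be sketched as follows. The helper `tag_barcode`, the hyphen separator, and the barcode sequences are assumptions chosen for illustration, not a description of the pipeline's internals.

```python
def tag_barcode(barcode, gem_group):
    """Append the GEM group integer to a barcode (the hyphen separator is an assumption)."""
    return f"{barcode}-{gem_group}"

# Two libraries that happen to share one barcode sequence (made-up sequences)
library_1 = ["AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC"]
library_2 = ["AAACCTGAGAAACCAT", "AAACGGGTCAATCACG"]

# Without tagging, the shared barcode merges two different cells into one
untagged = set(library_1) | set(library_2)

# With tagging, the same sequence from different libraries stays distinct
tagged = {tag_barcode(b, 1) for b in library_1} | {tag_barcode(b, 2) for b in library_2}

print(len(untagged), len(tagged))
```

In this sketch the untagged set holds three barcodes while the tagged set holds four, so the cell that would have become a virtual doublet is kept separate.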

N50

The N50 of a sorted list of numbers is the midway point by weight. Example:

There are implementation differences in exactly how this is computed, but they matter little when the list is long. Unlike the mean and median, the N50 discounts the contribution of many small numbers. That is why people use it!

N-statistic

The N-statistics, such as N50 or N99, are measures of centrality often used in genomics because they are somewhat robust to contamination by large numbers of low-value elements. In particular, the NXX is the value of the smallest element in the subset comprising the fewest, largest members such that the sum of the values of the subset is at least XX% of the total sum of the values of the data set. A larger value of an N-statistic indicates that a larger proportion of the total can be accounted for by large individual values, and for a given data set and YY greater than XX, the value for NYY will be less than or equal to the value for NXX. Thus, the N50 is essentially a weighted median.
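
The NXX definition above can be sketched in a few lines. This is one possible implementation among the variants that exist; the function name `n_statistic` and the sample values are hypothetical, and the values are assumed to be non-negative integers.

```python
def n_statistic(values, percent):
    """Return the NXX of `values`, with XX given by `percent`.

    Walk the values from largest to smallest and return the value at which
    the running sum first reaches `percent` % of the total.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(values)
    running = 0
    for v in sorted(values, reverse=True):
        running += v
        # Compare in integers to avoid floating-point rounding at the threshold
        if running * 100 >= total * percent:
            return v

# Made-up data: many small values plus a few large ones
contig_lengths = [1] * 10 + [50, 60, 90]

print(n_statistic(contig_lengths, 50))  # N50
print(n_statistic(contig_lengths, 99))  # N99
```

For this list the total is 210, so the N50 is reached after the two largest values (90 + 60 = 150) and equals 60, while the median of the same list is 1. The N99 needs almost every element and drops to 1, consistent with NYY being no greater than NXX when YY is greater than XX.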

UMI (Unique Molecular Identifier)

Each first-strand cDNA synthesis from a transcript molecule incorporates a random 10 bp nucleotide sequence, called the UMI, next to the cell barcode. The UMI sequence in each read allows the pipeline to determine which reads came from the same transcript molecule. In other words, the cell barcode distinguishes between cells, and the UMI distinguishes between molecules (for example, RNA fragments) within a cell.
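
The division of labor between cell barcode and UMI can be sketched as below. The function `count_molecules`, the `(barcode, umi)` read layout, and the sequences are hypothetical; the sketch also assumes error-free UMIs, which real sequencing data would not guarantee.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count distinct UMIs per cell barcode.

    `reads` is an iterable of (cell_barcode, umi) pairs (hypothetical layout).
    Reads sharing both barcode and UMI are treated as copies of one molecule.
    """
    umis_per_cell = defaultdict(set)
    for barcode, umi in reads:
        umis_per_cell[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in umis_per_cell.items()}

# Made-up reads, for illustration only
reads = [
    ("AAACCTGAGAAACCAT-1", "ACGTACGTAC"),
    ("AAACCTGAGAAACCAT-1", "ACGTACGTAC"),  # same cell, same UMI: same molecule
    ("AAACCTGAGAAACCAT-1", "TTGCAGGCTA"),  # same cell, new UMI: second molecule
    ("AAACGGGTCAATCACG-1", "ACGTACGTAC"),  # same UMI in another cell: separate molecule
]
molecule_counts = count_molecules(reads)
print(molecule_counts)
```

Four reads collapse to two molecules in the first cell and one in the second, since a UMI is only meaningful within its own cell barcode.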