Cell Ranger 2.2, printed on 11/13/2024
The three complementarity-determining regions (CDRs) are the portions of the amino acid sequence of a T or B cell receptor chain that are predicted to bind to an antigen. The nucleotide region encoding CDR3 spans the V(D)J junction, making it more diverse than the regions encoding the other CDRs. CDR3 therefore serves as a useful way to identify unique chains.
This is a known nucleotide sequence which serves as a unique identifier for a single GEM droplet. Each barcode usually contains reads from a single cell.
Collection of cells that share a set of productive CDR3 sequences by exact match.
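The exact-match grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline's implementation: the function name `group_clonotypes` and the input layout (a dict from cell barcode to its productive CDR3 sequences) are assumptions made for the example, and cells with no productive CDR3 are simply skipped.

```python
from collections import defaultdict

def group_clonotypes(cells):
    """Group cell barcodes by their exact set of productive CDR3 sequences.

    cells: dict mapping barcode -> iterable of productive CDR3 sequences
           (hypothetical input layout, chosen for illustration).
    Returns a dict mapping frozenset of CDR3s -> list of barcodes.
    """
    groups = defaultdict(list)
    for barcode, cdr3s in cells.items():
        key = frozenset(cdr3s)   # order-independent, exact-match key
        if not key:              # assumption: skip cells with no productive CDR3
            continue
        groups[key].append(barcode)
    return dict(groups)
```

Two cells land in the same group only if their CDR3 sets are identical; a cell that shares one chain but not the other forms a separate group.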
For a single clonotype and chain, a consensus built among all of the cells with that clonotype for that chain. This consensus is built by reassembling the corresponding contigs from all cells with the clonotype.
Contiguous sequence of bases produced by de novo assembly.
A contig annotation is termed full-length if it has a valid V annotation (the contig aligns well with at least 50% of the length of any V gene in the reference) and a J gene annotation that extends to within one codon of the 3′ end of the J region.
When combining libraries made from different groups of GEMs into one analysis, we append in silico a small integer to the barcode of each read that identifies which library that read came from. This prevents barcode collisions, which otherwise create confusion in the form of virtual doublets.
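As a rough illustration of how such a suffix avoids collisions, the sketch below tags each barcode with an integer library identifier. The function name `tag_barcode` and the `-N` suffix format are assumptions for the example; the text above only says that a small integer is appended.

```python
def tag_barcode(barcode, gem_group):
    """Append a library identifier to a barcode sequence.

    The '-N' suffix format is an assumption made for this sketch.
    """
    if gem_group < 1:
        raise ValueError("gem_group must be a positive integer")
    return f"{barcode}-{gem_group}"

# The same barcode sequence observed in two libraries stays distinct.
lib1 = tag_barcode("AAACCTGAGAAACCAT", 1)
lib2 = tag_barcode("AAACCTGAGAAACCAT", 2)
```

Without the suffix, reads from two different cells that happen to share a barcode sequence would be pooled as if they came from one cell.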
The N-statistics, such as N50 or N99, are measures of centrality often used in genomics because they are somewhat robust to contamination by large numbers of low-value elements. In particular, the NXX is the value of the smallest element in the subset comprising the fewest, largest members such that the sum of the values of the subset is at least XX% of the total sum of the values of the data set. A larger value of an N-statistic indicates that a larger proportion of the total can be accounted for by large individual values, and for a given data set and YY greater than XX, the value for NYY will be less than or equal to the value for NXX. Thus, the N50 is essentially a weighted median.
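The definition above translates directly into a short computation: sort the values from largest to smallest and accumulate them until the running sum reaches XX% of the total. The following is a minimal sketch under that definition; the function name `n_statistic` is chosen for illustration.

```python
def n_statistic(values, xx):
    """Return the NXX of a collection of non-negative values.

    xx is a percentage, 0 < xx <= 100 (for example 50 for the N50).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < xx <= 100:
        raise ValueError("xx must be in (0, 100]")
    ordered = sorted(values, reverse=True)   # largest members first
    total = sum(ordered)
    running = 0
    for v in ordered:
        running += v
        # Stop at the smallest element of the fewest, largest members
        # whose sum is at least xx% of the total.
        if running * 100 >= total * xx:
            return v
    return ordered[-1]
```

For contig lengths 2, 3, 4, 5, 6, 7, 8, 9, 10 (total 54), the three largest sum to 27, which is exactly 50% of the total, so the N50 is 8. The N99 of the same data is 2, consistent with NYY being no larger than NXX when YY is greater than XX.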
Each first-strand cDNA synthesis from a transcript molecule incorporates a random 10 bp nucleotide sequence called the UMI. The UMI sequence in each read allows the pipeline to determine which reads came from the same transcript molecule. In other words, the cell barcode distinguishes between cells, and the UMI distinguishes between molecules (for example, RNA fragments) within a cell.
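To make the division of labor between the two tags concrete, the sketch below counts distinct molecules per cell from a list of reads. It is a simplified illustration, not the pipeline's method: the function name `count_molecules` and the `(cell_barcode, umi)` input layout are assumptions, and sequencing errors in the UMI are ignored.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count distinct UMIs per cell barcode.

    reads: iterable of (cell_barcode, umi) pairs, one per read
           (hypothetical input layout, chosen for illustration).
    Returns a dict mapping cell barcode -> number of distinct UMIs.
    """
    umis_by_cell = defaultdict(set)
    for barcode, umi in reads:
        # Reads with the same barcode and UMI are treated as copies
        # of the same transcript molecule.
        umis_by_cell[barcode].add(umi)
    return {bc: len(umis) for bc, umis in umis_by_cell.items()}
```

Three reads carrying the same barcode and UMI therefore count as one molecule, while the same UMI seen under two different barcodes counts once in each cell.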