Cell Ranger 5.0, printed on 09/20/2024
The three complementarity-determining regions (CDRs) are the portions of the amino acid sequence of a T or B cell receptor that are predicted to bind to an antigen. The nucleotide region encoding CDR3 spans the V(D)J junction, making it more diverse than the regions encoding the other CDRs. CDR3 therefore serves as a useful way to identify unique chains.
A barcode is a known nucleotide sequence that serves as a unique identifier for a single GEM droplet. Each barcode usually contains reads from a single cell.
A set of adaptive immune cells that are clonal progeny of a fully recombined, unmutated common ancestor. T cell clonotypes are generally distinguished by the nucleotide sequence of the rearranged TCR, which does not undergo somatic hypermutation (SHM) in the majority of vertebrate species. In contrast, B cell receptors can undergo SHM, so B cells within the same clonotype commonly diverge from each other at the nucleotide level. For this reason, B cell clonotypes frequently contain multiple exact subclonotypes (see below).
A subset of cells within a clonotype that share identical immune receptor sequences at the nucleotide level, spanning the entirety of the V, D, and J genes and the V(D)J junction. Exact subclonotypes share the same V, D, J, and C gene annotations (e.g. cells that have identical V(D)J sequences but different C genes or isotypes are split into distinct exact subclonotypes).
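The grouping rule above can be sketched as follows. This is a simplified illustration, not Cell Ranger's implementation: the `Chain` record, `chain_key`, and `exact_subclonotypes` are hypothetical names, and the sketch assumes each cell's chains have already been assembled and annotated. Because the C gene is part of the matching key, two cells with identical V(D)J sequences but different constant regions end up in different groups.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chain:
    """One annotated receptor chain in a cell (hypothetical record layout)."""
    v_gene: str
    d_gene: Optional[str]  # None for chains without a D segment
    j_gene: str
    c_gene: str
    vdj_nt: str  # nucleotide sequence from the start of V through the end of J


def chain_key(chain: Chain) -> tuple:
    """Sort key covering every field that must match exactly."""
    return (chain.v_gene, chain.d_gene or "", chain.j_gene, chain.c_gene, chain.vdj_nt)


def exact_subclonotypes(cells: dict[str, tuple[Chain, ...]]) -> list[list[str]]:
    """Group cell barcodes whose chains share V, D, J and C annotations and
    identical V(D)J nucleotide sequences. Largest groups come first."""
    groups: dict[tuple[Chain, ...], list[str]] = defaultdict(list)
    for barcode, chains in cells.items():
        # Sort chains so their order within a cell does not affect the key.
        key = tuple(sorted(chains, key=chain_key))
        groups[key].append(barcode)
    return sorted(groups.values(), key=len, reverse=True)
```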
The consensus sequence for a given clonotype chain is the sequence of that chain in the first exact subclonotype.
A contiguous sequence of bases produced by assembly.
A contig is full-length if it matches the initial part of a V gene, continues on, and ultimately matches the terminal part of a J gene.
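The full-length criterion can be illustrated with an exact-match simplification. The function `is_full_length` and its `k` parameter are hypothetical; real contig annotation aligns the contig against reference gene segments and tolerates mismatches, which this sketch does not. It only checks that the first `k` bases of the V gene occur in the contig and that the last `k` bases of the J gene occur further downstream.

```python
def is_full_length(contig: str, v_ref: str, j_ref: str, k: int = 20) -> bool:
    """Return True if the contig contains the first k bases of the V gene
    followed, downstream, by the last k bases of the J gene (exact matches only)."""
    v_start = v_ref[:k]  # initial part of the V gene
    j_end = j_ref[-k:]   # terminal part of the J gene
    v_pos = contig.find(v_start)
    if v_pos == -1:
        return False  # the contig does not begin a V gene
    # Look for the J-gene end only after the matched V-gene start.
    return contig.find(j_end, v_pos + len(v_start)) != -1
```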
The set of inferred germline sequences of one or more V, D, J, or C genes, based on common mutations shared among individual T and B cells from a single donor.
A set of partitioned cells (Gel Beads-in-emulsion, or GEMs) from a single 10x Chromium™ Chip channel. One or more sequencing libraries can be derived from a GEM well.
When combining libraries made from different GEM wells into one analysis, we append an integer in silico to the barcode of each read to identify which library that read came from. This prevents barcode collisions, which would otherwise cause confusion in the form of virtual doublets: cells from different libraries that happen to share a barcode sequence would appear to be a single cell.
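The suffixing step can be sketched as below. The `-N` form mirrors the barcodes Cell Ranger reports (for example `AAACCTGAGAAGGCCT-1`), but `add_gem_well_suffix` and `combine_libraries` are hypothetical helpers, and numbering GEM wells from 1 in input order is an assumption of this sketch.

```python
from __future__ import annotations


def add_gem_well_suffix(barcode: str, gem_well: int) -> str:
    """Tag a barcode with the integer identifying its GEM well (library)."""
    return f"{barcode}-{gem_well}"


def combine_libraries(libraries: list[list[str]]) -> list[str]:
    """Merge barcodes from several libraries, numbering GEM wells from 1."""
    merged: list[str] = []
    for gem_well, barcodes in enumerate(libraries, start=1):
        merged.extend(add_gem_well_suffix(bc, gem_well) for bc in barcodes)
    return merged
```

The same barcode sequence seen in two libraries stays as two distinct entries after suffixing, instead of collapsing into one virtual doublet.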
The N50 of a sorted list of numbers is the midway point by weight: the elements at least as large as the N50 account for at least half of the total sum.
There are implementation differences in exactly how this is computed, but they matter little when the list is long. Unlike the mean and median, the N50 discounts the contribution of many small numbers. That is why people use it!
The N-statistics, such as N50 or N99, are measures of centrality often used in genomics because they are somewhat robust to contamination by large numbers of low-value elements. In particular, the NXX is the value of the smallest element in the subset comprising the fewest, largest members such that the sum of the values of the subset is at least XX% of the total sum of the values of the data set. A larger value of an N-statistic indicates that a larger proportion of the total can be accounted for by large individual values, and for a given data set and YY greater than XX, the value for NYY will be less than or equal to the value for NXX. Thus, the N50 is essentially a weighted median.
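The NXX definition above translates directly into a short function. This is a minimal sketch, not Cell Ranger's own code: `n_statistic` is a hypothetical name, it assumes non-negative values, and it uses the "at least XX% of the total" convention from the definition (other tools may handle the boundary slightly differently).

```python
from __future__ import annotations


def n_statistic(values: list[float], xx: float = 50.0) -> float:
    """Return NXX: the smallest element among the fewest, largest elements
    whose sum is at least XX% of the total sum."""
    if not values:
        raise ValueError("N-statistic of an empty list is undefined")
    if not 0 < xx <= 100:
        raise ValueError("xx must be in (0, 100]")
    threshold = sum(values) * xx / 100.0
    running = 0.0
    # Take elements from largest to smallest until the threshold is reached.
    for v in sorted(values, reverse=True):
        running += v
        if running >= threshold:
            return v
    # Only reached if float rounding leaves the running sum just below the total.
    return min(values)
```

For the list `[1, 1, 1, 1, 1, 1, 10, 10]`, the N50 is 10 while the median is 1: the two large values already account for 20 of the total 26, so the six small values do not pull the statistic down.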
During first-strand cDNA synthesis, each transcript molecule is tagged with a random 10 bp nucleotide sequence, the unique molecular identifier (UMI), located next to the cell barcode. The UMI sequence in each read allows the pipeline to determine which reads came from the same transcript molecule. In other words, the cell barcode distinguishes between cells, and the UMI distinguishes between molecules (for example, RNA fragments) within a cell.
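The barcode/UMI division of labor can be sketched as a molecule count per cell. This assumes a hypothetical read layout in which each Read 1 sequence starts with a 16 bp cell barcode immediately followed by the 10 bp UMI; `count_molecules` and the length constants are illustrative, and the sketch performs no barcode or UMI error correction.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

BARCODE_LEN = 16  # assumed cell-barcode length
UMI_LEN = 10      # UMI length stated in the text


def count_molecules(read1_seqs: Iterable[str]) -> dict[str, int]:
    """Count distinct UMIs per cell barcode. Reads sharing both barcode and
    UMI are treated as copies of the same molecule."""
    umis_per_cell: dict[str, set[str]] = defaultdict(set)
    for seq in read1_seqs:
        if len(seq) < BARCODE_LEN + UMI_LEN:
            continue  # too short to contain a barcode and a UMI
        barcode = seq[:BARCODE_LEN]
        umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        umis_per_cell[barcode].add(umi)
    return {bc: len(umis) for bc, umis in umis_per_cell.items()}
```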
An individual from whom adaptive immune cells (T cells, B cells) are collected (e.g. a sister and a brother would each be considered unique donors for the purposes of V(D)J aggregation).
The specific source from which a dataset of cells is derived. This could be a timepoint (pre- or post-treatment or vaccination or time A/B/C), a tissue (PBMC, tumor, lung), or other metadata (healthy, diseased, condition). Origins must be unique to each donor. Replicates (e.g. multiple libraries from the same population of cells) may share origins within a donor, which triggers additional replicate-based filtering.
A set of Cell Ranger outputs belonging to a set of single cells originating from the same GEM well.