Cell Ranger ARC 2.0, printed on 10/13/2024
Entries are ordered alphabetically. These definitions may be specific to the usage of these terms in the context of Chromium Next GEM Single Cell Multiome ATAC + Gene Expression.
GEM: Gel Bead-in-emulsion. A droplet containing some sample volume and a 10x Genomics-barcoded Gel Bead, forming an isolated reaction volume. When referring to the subset of the sample contained in the droplet, the term 'partition' may also be used.
GEM well (or GEM group): A set of partitioned cells (Gel Bead-in-emulsion) from a single 10x Genomics Chromium Chip channel. One or more sequencing libraries can be derived from a single GEM well.
Library (or Sequencing Library): A 10x Genomics-barcoded sequencing library prepared from a single GEM well. The Single Cell Multiome ATAC + Gene Expression assay generates two single cell library types, an ATAC (Assay for Transposase Accessible Chromatin) library and a 3' Gene Expression library, from the same GEM well.
Sample: A nuclei suspension extracted from a single biological source (blood, tissue, etc.).
Sequencing Run (or Flow cell): A flow cell containing data from one sequencing instrument run. The sequencing data can be further demultiplexed by lane or by sample indices.
Barcode: A DNA sequence that can be used to uniquely identify a partition containing a Gel Bead, i.e., a GEM. We use the term "barcode" to refer to both the sequence and the GEM itself. Distinct GEMs are very likely to be associated with distinct barcode sequences.
Cell Barcode: Any barcode that has been determined by the Cell Calling step of the pipeline to be associated with a cell.
Feature: Can refer to a gene or a peak. Each feature is either a gene declared in the transcriptome reference or a peak determined either by the Peak Calling step of the pipeline or by a custom bed file input by the user. Corresponds to a row in the Count Matrix.
Feature Linkage: Refers to the extent of covariation between two features across cells in the sample and is characterized by a correlation and a significance. Feature linkage is calculated between a gene and proximal peaks and between pairs of proximal peaks across the genome. Note that feature linkage is a correlation measure and does not automatically imply a functional regulatory relationship between the linked features. See Feature linkage for more details on the algorithm.
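To make the idea of covariation across cells concrete, the sketch below computes a plain Pearson correlation between the per-cell counts of two features. This is an illustrative stand-in only: the pipeline's actual linkage algorithm, including how significance is computed, is not reproduced here, and the function name `pearson` and the example counts are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length count vectors (one value per cell)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 cells")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts for one gene and one proximal peak across five cells.
gene_counts = [0, 1, 2, 3, 4]
peak_counts = [0, 2, 4, 6, 8]
print(pearson(gene_counts, peak_counts))
```

A value close to 1 indicates that the two features rise and fall together across cells; as the definition notes, this alone does not establish a regulatory relationship.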
Count Matrix (or Feature-Barcode Matrix): A matrix of counts representing the number of unique observations of each Feature within each Cell Barcode. Each feature (gene or peak) is a row in the matrix, while each barcode is a column. For a gene feature the count represents the number of UMIs observed while for a peak the count represents the number of transposition sites within the peak region.
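The row/column layout can be illustrated with a small sketch. It assumes the observations have already been reduced to unique events (one entry per UMI for a gene, one entry per transposition site for a peak); the function name `build_count_matrix` and the feature and barcode labels are hypothetical, and a dense list of lists is used only for readability.

```python
from typing import List, Sequence, Tuple


def build_count_matrix(
    features: Sequence[str],
    barcodes: Sequence[str],
    observations: Sequence[Tuple[str, str]],
) -> List[List[int]]:
    """Return a features x barcodes matrix of counts.

    Each observation is a (feature, barcode) pair that is assumed to be
    a unique event already (a distinct UMI or a distinct transposition site).
    """
    row = {f: i for i, f in enumerate(features)}
    col = {b: j for j, b in enumerate(barcodes)}
    matrix = [[0] * len(barcodes) for _ in features]
    for feature, barcode in observations:
        matrix[row[feature]][col[barcode]] += 1
    return matrix


features = ["GENE_A", "chr1:1000-1500"]   # one gene row, one peak row
barcodes = ["AAAC-1", "AAAG-1"]           # one column per cell barcode
observations = [
    ("GENE_A", "AAAC-1"),
    ("GENE_A", "AAAC-1"),
    ("chr1:1000-1500", "AAAG-1"),
]
print(build_count_matrix(features, barcodes, observations))  # [[2, 0], [0, 1]]
```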
Adapters: Custom short nucleotide sequences, including sequencing primers, that are introduced upon transposition.
ATAC: Assay for Transposase Accessible Chromatin.
Chromatin: Macromolecular complex formed by DNA, nucleosomes, and other proteins that bind DNA (for example transcription factors).
Cut-site: Genome location where transposase enzyme cuts the DNA and inserts adapters.
Duplicates: Two read pairs that originate from the same template molecule are called duplicates. Duplicates arise during the library preparation workflow when template molecules are amplified via PCR or linear amplification. Additionally, duplicate reads could also arise during the sequencing process and are generally referred to as optical duplicates. Duplicate reads provide redundant information and are identified computationally and collapsed into a single fragment record for downstream analysis.
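Collapsing duplicates into a single fragment record can be sketched as below. The sketch assumes, purely for illustration, that read pairs sharing the same barcode and the same alignment interval count as duplicates; the criteria the pipeline actually uses may differ. The name `collapse_duplicates` and the tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (contig, start, end, barcode): an assumed, simplified read-pair record.
ReadPair = Tuple[str, int, int, str]


def collapse_duplicates(read_pairs: Iterable[ReadPair]) -> List[Tuple[str, int, int, str, int]]:
    """Collapse identical read pairs into one record each, keeping a duplicate count."""
    counts = Counter(read_pairs)
    # Append the number of supporting read pairs to each unique record.
    return sorted(key + (n,) for key, n in counts.items())


read_pairs = [
    ("chr1", 1004, 1195, "AAAC-1"),
    ("chr1", 1004, 1195, "AAAC-1"),  # duplicate of the first read pair
    ("chr1", 1004, 1195, "AAAG-1"),  # same interval, different barcode
    ("chr2", 500, 740, "AAAC-1"),
]
for record in collapse_duplicates(read_pairs):
    print(record)
```

Four read pairs collapse into three fragment records here; the last field of each record keeps how many read pairs supported it.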
Enhancer: An enhancer is a short (50–1,500 bp) region of DNA that can be bound by transcription factors to increase the likelihood that transcription of a particular gene will occur.
Fragment: A piece of genomic DNA, bounded by two adjacent cut-sites, that has been converted into a sequencer-compatible molecule with an attached 10x Barcode. The alignment interval of the fragment is obtained by correcting the alignment interval of the sequenced fragment by +4 bp on the left end of the fragment, and -5 bp on the right end (where left and right are relative to genomic coordinates). This is to account for the 9 bp of DNA that the transposase enzyme occupies when it cuts the DNA (accessibility is recorded around the center of this 9 bp stretch; see figure in Algorithms). Most fragment-based metrics computed by the pipeline are based on fragments that pass various quality filters.
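The +4 bp / -5 bp adjustment can be written out as a short sketch. It is not the pipeline's code: the function name `adjust_fragment_interval` is hypothetical, and the coordinates are taken as plain integers without committing to a particular coordinate convention.

```python
from typing import Tuple


def adjust_fragment_interval(start: int, end: int) -> Tuple[int, int]:
    """Apply the transposase offset: +4 bp on the left end, -5 bp on the right end."""
    new_start = start + 4
    new_end = end - 5
    if new_end <= new_start:
        raise ValueError("interval is too short to adjust")
    return new_start, new_end


print(adjust_fragment_interval(1000, 1200))  # (1004, 1195)
```

The two shifts together remove 9 bp from the interval, matching the stretch of DNA occupied by the enzyme.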
High-Quality Fragment: A read-pair with mapping quality > 30, that is not chimerically mapped, has a valid 10x Barcode, and maps to any nuclear contig (not mitochondria) that contains at least one gene.
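The four conditions in this definition can be combined into a single predicate, sketched below. The `ReadPair` record, its field names, and the two contig sets are hypothetical stand-ins for information that would come from the alignments and the reference; only the conditions themselves are taken from the definition.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ReadPair:
    mapq: int                # mapping quality of the read pair
    is_chimeric: bool        # True if the pair is chimerically mapped
    has_valid_barcode: bool  # True if the 10x Barcode is valid
    contig: str              # contig the pair maps to


def is_high_quality(rp: ReadPair, mito_contigs: Set[str], contigs_with_genes: Set[str]) -> bool:
    """Check the high-quality fragment conditions from the definition."""
    return (
        rp.mapq > 30
        and not rp.is_chimeric
        and rp.has_valid_barcode
        and rp.contig not in mito_contigs       # nuclear contig only
        and rp.contig in contigs_with_genes     # contig has at least one gene
    )


mito = {"chrM"}
with_genes = {"chr1", "chr2", "chrM"}
print(is_high_quality(ReadPair(60, False, True, "chr1"), mito, with_genes))  # True
print(is_high_quality(ReadPair(60, False, True, "chrM"), mito, with_genes))  # False
print(is_high_quality(ReadPair(30, False, True, "chr1"), mito, with_genes))  # False
```

Note that the threshold is strict: a mapping quality of exactly 30 does not qualify.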
Nucleosome: Structural units formed by histones that help package the eukaryotic DNA into well organized chromosomes.
Peak: A compact region of the genome identified as having 'open chromatin' due to an enrichment of cut-sites inside the region (see Peak Calling for details).
Promoter: A promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA.
Transcription Factor (TF): A protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to specific DNA sequences (such as promoters or enhancers) that are commonly located in the vicinity of the gene they control.
TSS (or Transcription Start Site): The transcription start site is the location where transcription starts at the 5' end of a gene sequence.
Spacer: An 8 bp sequence on the Gel Bead ATAC Barcode oligo that enables barcode attachment to transposed DNA fragments.
Transposase Enzyme: Cuts open chromatin and ligates adapters to the 3' end of each strand.
Transposition: Reaction carried out by the transposase enzyme.
Wavelet transform: A method to transform a one-dimensional signal (which can be thought of as a time series) into a sum of linearly independent basis functions (wavelets) that are localized in time. It can be thought of as a generalization of the familiar Fourier transform that decomposes a time-varying signal into single-frequency sinusoidal waveforms.
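As a minimal illustration of a wavelet decomposition, the sketch below performs one level of the Haar transform, the simplest wavelet. The choice of the Haar wavelet is an assumption made for brevity; this glossary does not state which wavelet the pipeline uses. The function names `haar_step` and `haar_inverse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail): scaled sums and differences of
    neighboring samples, each localized to one pair of positions.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Rebuild the signal from one level of Haar coefficients."""
    s = math.sqrt(2)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


signal = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_step(signal)
print(approx, detail)
print(haar_inverse(approx, detail))  # recovers the original signal up to rounding
```

The second detail coefficient is zero because the last two samples are equal: unlike a Fourier coefficient, each wavelet coefficient describes the signal at a specific position.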
UMI (or Unique Molecular Identifier): Each first-strand cDNA synthesis from a transcript molecule incorporates a random 12 bp nucleotide sequence adjacent to the cell barcode called the UMI. The UMI sequence in each read allows the pipeline to determine which reads came from the same transcript molecule. In other words, the cell barcode distinguishes between cells, and the UMI distinguishes between molecules within a cell.
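The division of labor between cell barcode and UMI can be sketched as follows: reads are grouped by barcode and gene, and the count is the number of distinct UMIs in each group. This is a simplified illustration that ignores sequencing errors in UMIs and any other filtering the pipeline may apply; the function name `count_umis` and the example sequences are hypothetical (real UMIs are 12 bp, shortened here for readability).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_umis(reads: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (barcode, gene) from (barcode, umi, gene) reads."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for barcode, umi, gene in reads:
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("AAAC-1", "TTTT", "GENE_A"),
    ("AAAC-1", "TTTT", "GENE_A"),  # same molecule as the first read
    ("AAAC-1", "GGGG", "GENE_A"),  # second molecule in the same cell
    ("AAAG-1", "TTTT", "GENE_A"),  # same UMI, but a different cell
]
print(count_umis(reads))
# {('AAAC-1', 'GENE_A'): 2, ('AAAG-1', 'GENE_A'): 1}
```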
Transcriptomic Read: A read that uniquely maps to one gene and is considered during the UMI counting step. In the default counting mode, a transcriptomic read is any read that maps to the gene body in the sense orientation (intronic or exonic). When cellranger-arc count is run with the flag --gex-exclude-introns a transcriptomic read is one that maps to the exons of a gene in the sense orientation and is compatible with annotated splice junctions.
Confidently Mapped to the Transcriptome: When a transcriptomic read has mapping quality 255 and uniquely maps to only one gene in the transcriptome it is said to be confidently mapped.
TSO (or Template Switch Oligo): An oligonucleotide primer that hybridizes to untemplated C nucleotides added by the reverse transcriptase enzyme during reverse transcription in GEMs. The TSO sequence can be observed in a small fraction of reads and generally arises from shorter cDNA molecules that are not fragmented.