Cell Ranger 6.1, printed on 11/11/2024
Cell Ranger pipelines output key metrics in text format. Below are the definitions of the reported metrics.
The cellranger count and multi pipelines output metrics_summary.csv, which contains a number of key metrics about the barcoding and sequencing process.
Metric | Description |
---|---|
Estimated Number of Cells | The number of barcodes associated with cell-containing partitions, estimated from the barcode UMI count distribution. |
Median reads per cell | Median number of read pairs sequenced from the cells assigned to this sample. In case of multiplexing, only cell-associated barcodes assigned exactly one CMO can be assigned to a sample. |
Mean Reads per Cell | The total number of reads divided by the estimated number of cells. |
Median Genes per Cell | The median number of genes detected (with nonzero UMI counts) across all cell-associated barcodes. |
Number of Reads | Total number of sequenced reads. |
Number of Reads Assigned to the Sample | Total number of sequenced reads assigned to a given sample. |
Valid Barcodes | Fraction of reads with cell-barcodes that match the whitelist. |
Valid UMIs | Fraction of reads with valid UMIs; i.e. UMI sequences that do not contain Ns and that are not homopolymers. |
Reads Mapped to Genome | Fraction of reads that mapped to the genome. |
Reads Confidently Mapped to Genome | Fraction of reads that mapped uniquely to the genome. If a read mapped to exonic loci from a single gene and also to non-exonic loci, it is considered uniquely mapped to one of the exonic loci. |
Reads Mapped Confidently to Transcriptome | Fraction of reads that mapped to a unique gene in the transcriptome with a high mapping quality score as reported by the aligner. |
Reads Mapped Confidently to Exonic Regions | Fraction of reads that mapped to the exonic regions of the genome with a high mapping quality score as reported by the aligner. |
Reads Mapped Confidently to Intronic Regions | Fraction of reads that mapped to the intronic regions of the genome with a high mapping quality score as reported by the aligner. |
Reads Mapped Confidently to Intergenic Regions | Fraction of reads that mapped to the intergenic regions of the genome with a high mapping quality score as reported by the aligner. |
Reads Confidently Mapped Antisense | Fraction of reads confidently mapped to the transcriptome, but on the opposite strand of their annotated gene. A read is counted as antisense if it has any alignments that are consistent with an exon of a transcript but antisense to it, and has no sense alignments. |
Sequencing Saturation | Fraction of reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is a ratio where: the denominator is the number of confidently-mapped reads with a valid cell-barcode and valid UMI, and the numerator is the subset of those reads that had a non-unique combination of (cell-barcode, UMI, gene). This metric was called "cDNA PCR Duplication" in versions of Cell Ranger prior to 1.2. |
Q30 Bases in Barcode | Fraction of bases with Q-score at least 30 in the cell barcode sequences. This is the i7 index (I1) read for the Single Cell 3' v1 chemistry and the R1 read for the Single Cell 3' v2 chemistry. |
Q30 Bases in RNA Read | Fraction of bases with Q-score at least 30 in the RNA read sequences. This is Illumina R1 for the Single Cell 3' v1 chemistry and Illumina R2 for the Single Cell 3' v2 chemistry. |
Q30 Bases in Sample Index | Fraction of bases with Q-score at least 30 in the sample index sequences. This is the i5 index (I2) read for the Single Cell 3' v1 chemistry and the i7 index (I1) read for the Single Cell 3' v2 chemistry. |
Q30 Bases in UMI | Fraction of bases with Q-score at least 30 in the UMI sequences. This is the R2 read for the Single Cell 3' v1 chemistry and the R1 read for the Single Cell 3' v2 chemistry. |
Fraction Reads in Cells | The fraction of cell-barcoded, confidently mapped reads with cell-associated barcodes. |
Total Genes Detected | The number of genes with at least one UMI count in any cell. |
Median UMI Counts per Cell | The median number of total UMI counts across all cell-associated barcodes. |
Number of Short Reads Skipped | Total number of read pairs that were ignored by the pipeline because they do not satisfy the minimum length requirements (for example, Read 1 shorter than 26 bases in 3' v2, 3' v3, or 5'). |
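
The Sequencing Saturation and Mean Reads per Cell definitions above can be illustrated with a small sketch. It is not Cell Ranger code: the function names and the toy reads are hypothetical, and it assumes that "reads originating from an already-observed UMI" means every read beyond the first for each distinct (cell-barcode, UMI, gene) combination.

```python
from collections import Counter


def sequencing_saturation(reads):
    """Fraction of reads that repeat an already-seen (barcode, UMI, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    confidently mapped read with a valid cell barcode and a valid UMI.
    Assumption: the first read of each combination is "new", every
    later read of the same combination counts toward the numerator.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = total - len(counts)
    return duplicates / total


def mean_reads_per_cell(total_reads, estimated_cells):
    """Total number of reads divided by the estimated number of cells."""
    return total_reads / estimated_cells


# Hypothetical toy data: four reads, three distinct combinations.
reads = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),  # repeat of the first read
    ("AAAC", "UMI2", "GeneA"),
    ("TTTG", "UMI1", "GeneB"),
]
saturation = sequencing_saturation(reads)   # 1 repeat out of 4 reads
reads_per_cell = mean_reads_per_cell(1000, 4)
print(saturation, reads_per_cell)
```

A saturation near 0 means most reads reveal a new UMI, while a value near 1 means additional sequencing mostly re-reads molecules that were already observed.
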
If the sample was a Targeted Gene Expression sample, these additional metrics will also appear:
Metric | Description |
---|---|
Mean Targeted Reads per Cell | The total number of targeted reads in cells divided by the number of barcodes associated with cell-containing partitions. |
Number of Targeted Genes | Number of targeted genes as specified in the input target panel file. |
Median Targeted Genes per Cell | The median number of targeted genes detected per cell-associated barcode. Detection is defined as the presence of at least 1 UMI count. This metric will appear instead of the metric Median Genes per Cell above. |
Median Targeted UMI Counts per Cell | The median number of Targeted UMI counts per cell-associated barcode. This metric will appear instead of the metric Median UMI Counts per Cell above. |
Targeted Sequencing Saturation | The fraction of targeted reads originating from an already-observed targeted UMI. This is a function of library complexity and sequencing depth. More specifically, this is the fraction of confidently mapped, valid cell-barcode, valid targeted UMI reads that had a non-unique (cell-barcode, UMI, gene). This metric will appear instead of the metric Sequencing Saturation above. |
Total Targeted Genes Detected | The number of targeted genes with at least one UMI count in any cell. This metric will appear instead of the metric Total Genes Detected above. |
Number of Non-Targeted Genes | Number of genes in the reference genome that are not targeted. |
Number of Targeted Genes >= 10 UMIs | Number of targeted genes with at least 10 UMIs in cell-associated barcodes. These genes are then considered in the calculation of per-gene enrichments. |
Number of Non-Targeted Genes >= 10 UMIs | Number of non-targeted genes with at least 10 UMIs in cell-associated barcodes. These genes are then considered in the calculation of per-gene enrichments. |
Number of Enriched Targeted Genes | Number of targeted genes that are classified as enriched as a result of having high mean reads per UMI. Only genes with at least 10 cell-associated UMIs can be enriched. See Targeted Gene Expression Algorithms. |
Number of Enriched Non-Targeted Genes | Number of non-targeted genes that are classified as enriched as a result of having high mean reads per UMI. Only genes with at least 10 cell-associated UMIs can be enriched. See Targeted Gene Expression Algorithms. |
Mean Reads per UMI per Targeted Gene | Mean number of reads per UMI per targeted gene, averaged across all targeted genes with at least 10 cell-associated UMIs. |
Mean Reads per UMI per Non-Targeted Gene | Mean number of reads per UMI per non-targeted gene, averaged across all non-targeted genes with at least 10 cell-associated UMIs. |
Fraction of Reads Removed by UMI Filtering | Fraction of reads confidently mapped to the targeted transcriptome and removed by targeted UMI filtering. |
Reads per UMI threshold for UMI filtering | The minimum number of reads a UMI must have to be kept by targeted UMI filtering. UMIs in targeted genes whose read support is strictly lower than this threshold are filtered out. |
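
The threshold rule and the "at least 10 cell-associated UMIs" condition above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's implementation: the function names, the dictionary layout, and the toy data are hypothetical, and the input is assumed to be already restricted to cell-associated barcodes and to the gene set of interest (targeted or non-targeted).

```python
from collections import defaultdict


def filter_low_support_umis(umi_reads, threshold):
    """Keep UMIs whose read count is not strictly below `threshold`.

    `umi_reads` maps (cell_barcode, umi, gene) -> number of reads.
    """
    return {key: n for key, n in umi_reads.items() if n >= threshold}


def mean_reads_per_umi_per_gene(umi_reads, min_umis=10):
    """Average the per-gene reads/UMI ratio over genes with >= min_umis UMIs."""
    reads_by_gene = defaultdict(int)
    umis_by_gene = defaultdict(int)
    for (_barcode, _umi, gene), n_reads in umi_reads.items():
        reads_by_gene[gene] += n_reads
        umis_by_gene[gene] += 1
    ratios = [
        reads_by_gene[gene] / umis_by_gene[gene]
        for gene in umis_by_gene
        if umis_by_gene[gene] >= min_umis
    ]
    # No gene passes the UMI cutoff: the average is undefined.
    return sum(ratios) / len(ratios) if ratios else None


# Hypothetical toy data: GENE_A has 10 UMIs with 4 reads each,
# GENE_B has 3 UMIs with 1 read each.
umi_reads = {("BC1", f"U{i}", "GENE_A"): 4 for i in range(10)}
umi_reads.update({("BC1", f"V{i}", "GENE_B"): 1 for i in range(3)})

kept = filter_low_support_umis(umi_reads, threshold=2)  # drops GENE_B UMIs
mean_ratio = mean_reads_per_umi_per_gene(umi_reads)     # only GENE_A qualifies
print(len(kept), mean_ratio)
```

In this toy input, GENE_B is excluded from the average because it has fewer than 10 UMIs, which mirrors why such genes cannot be classified as enriched.
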
The cellranger aggr pipeline outputs summary.json, which contains metrics relating to the aggregated datasets. Note: square brackets denote a variable that depends on the pipeline input; e.g. [library_id]_frac_reads_kept means that if your aggregation contains two libraries with IDs sample123 and sample456, there will be two output metrics, sample123_frac_reads_kept and sample456_frac_reads_kept.
Metric | Description |
---|---|
filtered_bcs_transcriptome_union | The estimated number of barcodes associated with cell-containing partitions, summed across all input libraries. |
total_reads | Total number of sequenced reads, summed across all input libraries. |
multi_transcriptome_total_raw_reads_per_filtered_bc | total_reads divided by filtered_bcs_transcriptome_union. |
[library_id]_pre_normalization_raw_reads_per_filtered_bc | The mean total reads per cell prior to depth normalization, for the library denoted by library_id. |
[library_id]_pre_normalization_cmb_reads_per_filtered_bc | The mean confidently mapped and barcoded (CMB) reads per cell prior to depth normalization, for the library denoted by library_id. |
[library_id]_frac_reads_kept | The fraction of reads that were retained after depth normalization for the library denoted by library_id. |
lowest_frac_reads_kept | The lowest fraction of reads retained, corresponding to the library which lost the most reads during normalization. A low value may indicate a large disparity in the initial depth of the input libraries. |
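
As a rough illustration of how these aggr metrics relate to one another, the sketch below builds the per-library key names from the [library_id] pattern and recomputes two of the derived values. The JSON content is made up for the example, and the code assumes the metrics are top-level keys of summary.json; check your own file before relying on this layout.

```python
import json

# Hypothetical summary.json content for two libraries (values are invented).
summary_text = """
{
  "filtered_bcs_transcriptome_union": 4000,
  "total_reads": 1000000,
  "multi_transcriptome_total_raw_reads_per_filtered_bc": 250.0,
  "sample123_frac_reads_kept": 1.0,
  "sample456_frac_reads_kept": 0.8,
  "lowest_frac_reads_kept": 0.8
}
"""
summary = json.loads(summary_text)

# Library IDs as given in the aggregation input.
library_ids = ["sample123", "sample456"]

# Expand the [library_id]_frac_reads_kept pattern into concrete keys.
frac_kept = {lib: summary[f"{lib}_frac_reads_kept"] for lib in library_ids}

# lowest_frac_reads_kept corresponds to the library that lost the most reads.
lowest = min(frac_kept.values())
most_downsampled = min(frac_kept, key=frac_kept.get)

# total_reads divided by filtered_bcs_transcriptome_union.
reads_per_cell = summary["total_reads"] / summary["filtered_bcs_transcriptome_union"]

print(frac_kept, lowest, most_downsampled, reads_per_cell)
```

Looking up keys from a known list of library IDs, rather than matching on the `_frac_reads_kept` suffix, avoids accidentally treating lowest_frac_reads_kept as a library named "lowest".
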
If one or more of the aggregated samples was a Targeted Gene Expression sample, these additional metrics will also appear:
Metric | Description |
---|---|
[library_id]_pre_normalization_targeted_reads_per_filtered_bc | The mean targeted reads per cell prior to depth normalization, for the library denoted by library_id. |
[library_id]_frac_targeted_reads_kept | The fraction of reads mapped uniquely and confidently to targeted genes that were retained after depth normalization for the library denoted by library_id. This field will be shown instead of the metric [library_id]_frac_reads_kept above. |