10x Genomics
Chromium Single Cell CNV

Cell Ranger DNA 1.0, printed on 04/02/2025

Per cell summary metrics

Each row in the per_cell_summary_metrics.csv file corresponds to a cell, and each column corresponds to a metric. The columns are as follows:

barcode

16-base 10x barcode labeling the partition containing the cell, followed by the GEM group suffix -1.
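As an illustration of the barcode layout described above, the following minimal sketch splits a barcode string into its 16-base sequence and its GEM group. The function name split_barcode and the example barcode are hypothetical and are not part of Cell Ranger DNA.

```python
def split_barcode(barcode):
    """Split 'SEQUENCE-GROUP' into (sequence, gem_group).

    Assumes the layout described above: a 16-base sequence, a hyphen,
    and a numeric GEM group.
    """
    sequence, _, gem_group = barcode.rpartition("-")
    if len(sequence) != 16 or not gem_group.isdigit():
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    return sequence, int(gem_group)


# Hypothetical barcode, used only to show the call.
sequence, gem_group = split_barcode("AAACCTGAGAAACCAT-1")
```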

cell_id

Numerical index for cells in the experiment ranging from 0 to N-1, where N is the number of cells.

total_num_reads

Total number of sequencing reads associated with the cell.

num_unmapped_reads

Total number of sequencing reads associated with the cell that cannot be mapped to the reference genome.

num_lowmapq_reads

Total number of sequencing reads that map to the genome with mapping quality less than 30.

num_duplicate_reads

Total number of sequencing reads with mapping quality at least 30 that are duplicates.

num_mapped_dedup_reads

Total number of sequencing reads that have mapping quality at least 30 and are not duplicates. These reads are used for CNV calling.

frac_mapped_duplicates

num_duplicate_reads divided by total_num_reads.

effective_depth_of_coverage

Effective sequencing depth from non-duplicate reads with mapping quality at least 30, expressed relative to the genome size. Equals num_mapped_dedup_reads multiplied by the average read length, divided by the genome size.

effective_reads_per_1Mbp

num_mapped_dedup_reads divided by the genome size in megabases.
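The three derived columns above follow directly from the read counts. The sketch below recomputes them for one row, assuming the row is a dict keyed by column name (as csv.DictReader would produce). The genome size and mean read length are not columns in the file, so the parameters genome_size_bp and mean_read_length_bp, and the numbers in the example, are assumptions chosen for illustration.

```python
def derived_read_metrics(row, genome_size_bp, mean_read_length_bp):
    """Recompute the derived per-cell read metrics from the raw counts."""
    total = int(row["total_num_reads"])
    duplicates = int(row["num_duplicate_reads"])
    dedup = int(row["num_mapped_dedup_reads"])
    return {
        # num_duplicate_reads divided by total_num_reads
        "frac_mapped_duplicates": duplicates / total if total else 0.0,
        # dedup reads times average read length, divided by genome size
        "effective_depth_of_coverage": dedup * mean_read_length_bp / genome_size_bp,
        # dedup reads divided by genome size in megabases
        "effective_reads_per_1Mbp": dedup / (genome_size_bp / 1e6),
    }


# Hypothetical cell: values are illustrative, not real output.
example_row = {
    "total_num_reads": "1000000",
    "num_duplicate_reads": "100000",
    "num_mapped_dedup_reads": "750000",
}
metrics = derived_read_metrics(
    example_row, genome_size_bp=3_000_000_000, mean_read_length_bp=100
)
```

With these illustrative inputs the cell has 250 effective reads per megabase, which shows why effective_depth_of_coverage is typically far below 1 for single-cell data.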

raw_mapd

MAPD of the number of non-duplicate read-pairs with mapping quality at least 30 per 500 kb bin. MAPD measures the unevenness of the per-bin coverage distribution and is robust to the presence of copy number events. It includes unevenness caused by low sequencing depth. See the interpreting metrics page for more information about MAPD.

normalized_mapd

MAPD of the GC-corrected number of non-duplicate read-pairs with mapping quality at least 30 per 500 kb bin. See the interpreting metrics page for more information about MAPD.

raw_dimapd

DIMAPD of the number of non-duplicate read-pairs per 500 kb bin. DIMAPD measures the residual unevenness of the per-bin coverage distribution after subtracting the unevenness caused by random fluctuations due to finite sequencing depth. See the interpreting metrics page for more information about DIMAPD.

normalized_dimapd

DIMAPD of the GC-corrected number of non-duplicate read-pairs per 500 kb bin. See the interpreting metrics page for more information about DIMAPD.

mean_ploidy

Average ploidy, or average copy number, of the cell; approximately 2 for a diploid genome.

ploidy_confidence

A score measuring the overall confidence of the copy number estimation algorithm. The copy number is determined by minimizing an objective function, as described in the CNV calling section. The ploidy confidence is calculated as the difference in objective function values between the second-lowest minimum and the absolute minimum. Scores greater than 2 are considered confident. Negative values are special signal values:

-1 : CNV calling was performed with the options --soft-min-avg-ploidy or --soft-max-avg-ploidy and the confidence score is not estimated in these cases.
-2 : copy number was determined by picking the solution with average ploidy closest to 2. This is the case when most of the genome has the same copy number.
-3 : only a single minimum to the objective function was found and therefore the score cannot be calculated.
-4 : the ploidy confidence score of this cell was <= 2 and the average ploidy of the best-fit solution was significantly different from other cells in the sample with highly similar read count profiles. This occurs when the cell is degraded or the DNA was inaccessible due to other reasons. In this case, we override the solution chosen by minimizing the objective function and instead pick a solution that is closer in average ploidy to other similar cells.
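The scoring rules and signal values above can be summarized in a small lookup. This is a minimal sketch for interpreting a value read from the ploidy_confidence column; the function name describe_ploidy_confidence and the wording of the returned labels are hypothetical.

```python
# Short labels for the special signal values listed above.
SPECIAL_CODES = {
    -1: "not estimated (soft average ploidy bounds were used)",
    -2: "solution chosen with average ploidy closest to 2",
    -3: "single minimum found, score cannot be calculated",
    -4: "solution overridden to match similar cells",
}


def describe_ploidy_confidence(score):
    """Return a short label for a ploidy_confidence value."""
    score = float(score)
    if score < 0:
        # Only the four integer codes above are documented.
        code = int(score) if score.is_integer() else None
        return SPECIAL_CODES.get(code, "unrecognized negative value")
    # Scores greater than 2 are considered confident.
    return "confident" if score > 2 else "not confident"
```

For example, describe_ploidy_confidence("-4") returns the override label, and describe_ploidy_confidence(5.3) returns "confident".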

is_high_dimapd

1 when the cell has a DIMAPD value that is an outlier relative to the other cells in the sample, and 0 otherwise. We fit a Gaussian distribution to the per-cell DIMAPD distribution and define outliers as cells whose DIMAPD deviates from the Gaussian at a significance threshold of 0.01.
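The outlier test described above can be illustrated with a simple Gaussian fit. This is only a sketch of the general idea, not the pipeline's procedure: it assumes a plain mean and standard deviation fit over all cells and a one-sided upper-tail test, neither of which is specified here. The function name flag_high_dimapd and the example values are hypothetical.

```python
from statistics import NormalDist


def flag_high_dimapd(dimapd_values, alpha=0.01):
    """Flag values in the upper tail of a Gaussian fitted to all values.

    Assumptions (not confirmed by the documentation): the Gaussian is
    fitted with the sample mean and standard deviation, and only
    unusually high values count as outliers.
    """
    fit = NormalDist.from_samples(dimapd_values)
    # Upper-tail probability of each value under the fitted Gaussian.
    return [int(1.0 - fit.cdf(v) < alpha) for v in dimapd_values]


# Hypothetical per-cell DIMAPD values with one clearly high cell.
values = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.03, 0.97, 3.0]
flags = flag_high_dimapd(values)
```

Note that in this naive version the outlier itself inflates the fitted standard deviation; a more robust fit would be less sensitive to that.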

is_noisy

1 if the cell is noisy, and 0 otherwise. A cell is deemed noisy if is_high_dimapd is 1, if ploidy_confidence is -4, or if ploidy_confidence is between 0 and 2.
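The rule above is a three-way OR, which can be written as a short predicate. In this sketch the range "between 0 and 2" is treated as inclusive at both ends; that boundary handling is an assumption, and the function name is_noisy simply mirrors the column name.

```python
def is_noisy(is_high_dimapd, ploidy_confidence):
    """Return 1 if the cell is noisy under the rule above, else 0.

    Assumption: 'between 0 and 2' includes both 0 and 2.
    """
    return int(
        is_high_dimapd == 1
        or ploidy_confidence == -4
        or 0 <= ploidy_confidence <= 2
    )
```

Under this rule, a cell with ploidy_confidence of -1, -2, or -3 is noisy only if is_high_dimapd is 1.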

Analysis summary metrics

The summary.csv file contains sample-level metrics, in CSV format, aggregated over all reads or cells.

total_num_reads: total number of sequencing reads.
frac_bases_R1_Q30: fraction of read 1 bases with base quality at least 30.
frac_bases_R2_Q30: fraction of read 2 bases with base quality at least 30.
correct_bc_rate: fraction of total sequencing reads that can be associated with a valid 10x barcode.
frac_non_cell_barcode: fraction of total sequencing reads that are associated with barcodes that do not correspond to cells, i.e., they label empty partitions.
shortest_primary_contig: the shortest primary contig in the reference genome on which CNV calling was performed.
frac_mappable_bins: the fraction of 20 kb bins in the reference genome that have high mappability. See the preprocessing section for more details.
num_cells: the number of barcodes that label partitions containing cells. See the preprocessing section for more details.
total_num_reads_in_cells: total number of sequencing reads with barcodes associated with cells.
total_num_mapped_dedup_reads_in_cells: total number of sequencing reads associated with cells that are not duplicates and have mapping quality at least 30.
median_frac_mapped_duplicates_per_cell: median over cells of the fraction of total sequencing reads per cell that have mapping quality at least 30 and are duplicates.
mean_mapped_dedup_reads_per_cell: mean over cells of the number of sequencing reads per cell that are not duplicates with mapping quality at least 30.
median_effective_reads_per_1Mbp: median over cells of the number of sequencing reads per cell that are not duplicates with mapping quality at least 30 divided by the genome size in megabases.
median_unmapped_frac: median over the cells of the fraction of total reads per cell that cannot be mapped to the genome.
mean_ploidy_p25, mean_ploidy_p50, mean_ploidy_p75: quartiles of the per-cell distribution of average ploidy.
raw_mapd_p25, raw_mapd_p50, raw_mapd_p75: quartiles of the per-cell distribution of MAPD computed from read counts per 500 kb bin. See the interpreting metrics page for more information about MAPD.
normalized_mapd_p25, normalized_mapd_p50, normalized_mapd_p75: quartiles of the per-cell distribution of MAPD computed from GC-corrected read counts per 500 kb bin. See the interpreting metrics page for more information about MAPD.
normalized_dimapd_p25, normalized_dimapd_p50, normalized_dimapd_p75: quartiles of the per-cell distribution of DIMAPD computed from GC-corrected read counts per 500 kb bin. See the interpreting metrics page for more information about DIMAPD.
raw_dimapd_p25, raw_dimapd_p50, raw_dimapd_p75: quartiles of the per-cell distribution of DIMAPD computed from read counts per 500 kb bin. See the interpreting metrics page for more information about DIMAPD.
frac_noisy_cells: fraction of cells in the sample that are considered noisy as described in the previous section. See the interpreting data page for more information.
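Several of the sample-level metrics above are simple aggregates of per-cell columns. The sketch below recomputes a few of them from per-cell rows, assuming each row is a dict keyed by column name. The function name summarize_cells and the example rows are hypothetical, and the quartile interpolation method (method="inclusive") is an assumption, so values may differ slightly from those in summary.csv.

```python
import statistics


def summarize_cells(rows):
    """Aggregate per-cell rows into a few sample-level metrics."""
    dup_fracs = [float(r["frac_mapped_duplicates"]) for r in rows]
    dedup_reads = [int(r["num_mapped_dedup_reads"]) for r in rows]
    ploidies = [float(r["mean_ploidy"]) for r in rows]
    noisy = [int(r["is_noisy"]) for r in rows]
    # Quartiles by linear interpolation (assumed method).
    p25, p50, p75 = statistics.quantiles(ploidies, n=4, method="inclusive")
    return {
        "median_frac_mapped_duplicates_per_cell": statistics.median(dup_fracs),
        "mean_mapped_dedup_reads_per_cell": statistics.fmean(dedup_reads),
        "mean_ploidy_p25": p25,
        "mean_ploidy_p50": p50,
        "mean_ploidy_p75": p75,
        "frac_noisy_cells": sum(noisy) / len(noisy),
    }


# Four hypothetical cells, for illustration only.
cells = [
    {"frac_mapped_duplicates": "0.10", "num_mapped_dedup_reads": "700000",
     "mean_ploidy": "1.9", "is_noisy": "0"},
    {"frac_mapped_duplicates": "0.12", "num_mapped_dedup_reads": "800000",
     "mean_ploidy": "2.0", "is_noisy": "0"},
    {"frac_mapped_duplicates": "0.08", "num_mapped_dedup_reads": "750000",
     "mean_ploidy": "2.1", "is_noisy": "0"},
    {"frac_mapped_duplicates": "0.20", "num_mapped_dedup_reads": "350000",
     "mean_ploidy": "3.9", "is_noisy": "1"},
]
summary = summarize_cells(cells)
```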