10x Genomics
Chromium De Novo Assembly

Supernova 1.2, printed on 04/05/2025

Assembly Statistics

On successful completion of a Supernova pipeline run, statistics about the input data and the assembly are logged in outs/summary.csv. The statistics contained there are defined below.
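As a minimal sketch of how these statistics might be read programmatically, the snippet below parses a CSV with one header row and one row of values. That layout is an assumption, as this page does not describe the internal format of outs/summary.csv. The helper name `load_summary` is hypothetical and the example values are made up for illustration.

```python
import csv
import io


def load_summary(handle):
    """Return the first data row of a summary CSV as a dict of strings.

    Assumes (not confirmed by the documentation) that the file has a
    header row of abbreviations followed by a single row of values.
    """
    reader = csv.DictReader(handle)
    row = next(reader, None)
    if row is None:
        raise ValueError("summary file has no data row")
    return row


# Illustrative content only; these values are invented for the example.
example = io.StringIO("sample_id,nreads,contig_N50\ndemo,1000,25000\n")
stats = load_summary(example)
print(stats["sample_id"], int(stats["nreads"]))
```

For a real run, the same function could be called on `open("outs/summary.csv", newline="")`, provided the file matches the assumed layout.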

Input statistics

abbreviation	name	definition
`sample_id`	sample identifier	identifier of the sample
`nreads`	number of reads	number of reads provided as input, after downsampling if requested
`bases_per_read`	mean read length	mean read length after removing the first 23 bases from the beginning of read one of each pair (the 16-base 10x barcode plus 7 additional bases)
`dup_perc`	read duplication percent	percentage of read pairs that are duplicated, as determined by identical start and stop positions on the assembly graph
`hetdist`	distance between het sites	mean distance between heterozygous sites
`lw_mean_mol_len`	LWM molecule length	estimated length-weighted mean of molecule lengths
`median_ins_sz`	median insert size	estimated median insert size of the library, as determined by read positions on the assembly graph
`placed_frac`	fraction of reads placed	fraction of reads uniquely placed on final (phased) assembly
`proper_pairs_perc`	proper pairs percent	of read pairs for which both reads are placed on the assembly, inferred fraction for which the reads have the correct orientation and separation
`q30_r2_perc`	read two q30 percent	fraction of bases assigned quality score ≥ 30 on read two
`rpb_N50`	N50 reads per barcode	N50 number of reads per 10x barcode
`valid_bc_perc`	valid barcode percent	percent of reads assigned a valid 10x barcode
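To illustrate what "length-weighted mean" means for `lw_mean_mol_len`, the sketch below applies the standard definition, in which each molecule is weighted by its own length. This is not Supernova's estimator, which infers molecule lengths from the data. The function name and the sample lengths are hypothetical.

```python
def length_weighted_mean(lengths):
    """Length-weighted mean: sum(L^2) / sum(L).

    Each molecule contributes in proportion to its length, so long
    molecules influence the result more than in a plain average.
    """
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one molecule of nonzero length")
    return sum(x * x for x in lengths) / total


# Two hypothetical molecules of 10 kb and 30 kb.
print(length_weighted_mean([10_000, 30_000]))  # plain mean would be 20000
```

The weighted value exceeds the plain mean whenever lengths vary, which is why the two should not be compared directly.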

Output statistics

Note that assembly size and N50 values are computed after removing scaffolds ≤ 10 kb and do not count N's.

abbreviation	name	definition
`assembly_size`	assembly size	size of assembly in bases, counting only one allele
`edge_N50`	N50 edge size	N50 size of raw graph assembly edges in bases
`contig_N50`	N50 contig size	N50 size of contigs in bases
`phase_block_N50`	N50 phase block size	N50 size of phase blocks in bases
`scaffold_N50`	N50 scaffold size	N50 size of scaffolds in bases
`scaffolds_1kb_plus`	number of scaffolds ≥ 1 kb	number of scaffolds that are at least 1 kb long
`scaffolds_10kb_plus`	number of scaffolds ≥ 10 kb	number of scaffolds that are at least 10 kb long
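The N50 values above can be illustrated with a short sketch. It assumes the common convention that N50 is the largest length such that sequences of at least that length contain half or more of the total bases; Supernova's exact tie-breaking is not stated here. The input lengths are assumed to already exclude N's, and the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def scaffold_n50(lengths, min_len=10_000):
    """N50 after dropping scaffolds of min_len bases or shorter,
    mirroring the note that scaffolds <= 10 kb are removed first."""
    return n50([x for x in lengths if x > min_len])


# Hypothetical scaffold lengths in bases.
print(scaffold_n50([5_000, 10_000, 20_000, 30_000, 40_000]))
```

In this example the 5 kb and 10 kb scaffolds are discarded before the N50 is computed, so they contribute neither to the total nor to the result.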

Auxiliary statistics

In addition to the metrics contained in the outs/summary.csv file, the outs/assembly/stats/ folder contains more fine-grained information about the input data and the assembly as discussed below.

File	Content
`histogram_reads_per_barcode.json`	histogram of the number of reads that share a common 10x barcode (bin size = 10)
`histogram_kmer_count.json`	histogram of the frequency of kmers (K=48) amongst the reads, after removing potentially erroneous kmers based on quality scores, low multiplicity, or occurrence in only one barcode (histogram uses bin size = 1)
`kmer_spectrum.pdf`	plot of the histogram in histogram_kmer_count.json, truncated to kmer frequencies in the range 0 to 100
`histogram_molecules.json`	histogram of the inferred length of input DNA that was used to generate Linked-Reads (in 1 kb bins, with minimum molecule length threshold of 1 kb)
`molecule_lengths.pdf`	plot of the percentage of input DNA mass in 1 kb molecule length bins in the window of 1 - 300 kb. This plot is a length-weighted version of the histogram in the above file that has been smoothed using the LOWESS algorithm.
`histogram_edge.json`	histogram of assembly graph edge lengths (in 1 kb bins)
`histogram_contig.json`	histogram of contig lengths (in 1 kb bins)
`histogram_phase_block.json`	histogram of phase block lengths in the assembly (in 1 kb bins)
`histogram_scaffold.json`	histogram of scaffold lengths (in 10 kb bins)
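The relationship between a count histogram and a length-weighted (mass) plot, as described for molecule_lengths.pdf, can be sketched as follows. The JSON layout of the histogram files is not documented on this page, so the example works on a plain list of per-bin counts, which is an assumption. It also approximates each molecule's length by its bin midpoint and omits the LOWESS smoothing step. The function name is hypothetical.

```python
def mass_percent_by_bin(counts, bin_size_kb=1):
    """Convert per-bin molecule counts into percent of DNA mass per bin.

    Bin i is assumed to cover [i, i + 1) * bin_size_kb, and every
    molecule in it is approximated by the bin midpoint.
    """
    masses = [count * (i + 0.5) * bin_size_kb for i, count in enumerate(counts)]
    total = sum(masses)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * mass / total for mass in masses]


# Two hypothetical bins with equal counts: the longer bin holds more mass.
print(mass_percent_by_bin([2, 2]))
```

Equal counts in both bins yield unequal mass percentages, which is the effect that distinguishes the length-weighted plot from the raw histogram.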
