10x Genomics
Chromium De Novo Assembly

Supernova 2.0, printed on 03/11/2025

Supernova 2.0 performance

The Supernova 2.0 release includes a number of significant changes to the code, with corresponding changes in performance. We compared the performance of versions 1.2 and 2.0 with respect to 20 different datasets, including ones that had been run previously, customer datasets, and novel datasets created just for this purpose. Most of these, along with their assemblies, are available for download.

Synopsis of Supernova's current performance

Supernova has been tested across a broad range of sample types, including vertebrates, plants, and insects; performance on all three categories is markedly improved.
Assembly quality is greatly improved. For example, for human samples, the typical N50 contig length is now about 160 kb and the typical N50 scaffold length is now about 40 Mb.
We demonstrate end-to-end, practical assembly of single insects, avoiding the problems of standard approaches that inbreed or mix wild individuals.
The user experience is streamlined, with many critical metrics added or improved, including accurate measurement of molecule length and genome size.
Assemblies are produced by a turnkey laboratory and computational process based on one library and at markedly low cost. A new simplified workflow requires only standard depth sequencing, and is compatible with the NovaSeq platform.
Supernova runs on a single server. It uses 256 GB of memory for most genomes (18 of the 20 tested) and 512 GB for most others (the remaining two tested, maize and chili). Run time varies from a few hours for small genomes to a few days for human and other similarly sized genomes. The longest observed run time was about 8 days (7.9), for maize. Several computational bottlenecks have been removed.

Samples

We have tested Supernova on a wide range of samples, from controls to wild-caught specimens. For each sample, we created a single Chromium Linked-Read library, which we sequenced and then assembled using both Supernova 1.2 and 2.0.0, without any tuning or specification of parameters, except for varying the number of input reads in a few cases (see below).

#	Sample	Description	Material	DNA Prep	Notes

1	hgp	Human Genome project, male [1]	blood	MagAttract	control
2	chm	equimolar mix of CHM1/CHM13	cell line	MagAttract	control
3	wfu	NA12878, European, female	cell line	MagAttract	control
4	chi	HG00512, Chinese, male	cell line	MagAttract
5	yor	NA19240, Yoruba, female	cell line	MagAttract
6	yorm	NA19238, Yoruba, female	cell line	MagAttract
7	ash	NA24385, Ashkenazi, male	cell line	MagAttract
8	pr	HG00733, Puerto Rican, female	cell line	MagAttract

9	hummer	hummingbird [2]	tissue	KingFisher
10	fish	zebrafish SAT from ZIRC	tissue	Amplicon Express	control
11	ruby	dog named Ruby	blood	MagAttract

12	grape	flame seedless grape [3]	leaves	grape protocol
13	maize	maize B73	leaves	Amplicon Express	control
14	chili	chili pepper [4]	leaves	modified CTAB

15	fly	fruit fly iso-1 x Canton-S [5]	one insect	salting out	control
16	omoth	one moth collected in Pleasanton	one insect	salting out
17	pmoth	second moth collected in Pleasanton	one insect	salting out
18	cater	caterpillar collected in Pleasanton	one insect	salting out
19	aphid	aphid collected in Pleasanton	one insect	salting out
20	aedes	Aedes aegypti F1 ref cross [6]	one insect	salting out	control


	1. Anonymous donor
	2. Erich Jarvis, HHMI, bioRxiv
	3. Doreen Ware, CSHL
	4. Allen van Deynze, UCD, Hort Res 5
	5. Bloomington Stock Center
	6. Ben Matthews, Rockefeller Institute

The number of reads provided as input was generally our best guess based on the estimated genome size, so as to yield about 56x coverage. In those cases for which we did not know the genome size, a preliminary run with a guess was sufficient to get an estimate from Supernova. In a few cases, it was advantageous to raise the coverage above 56x; the actual number of reads used is available.
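
The read-count guess described above is simple arithmetic, sketched here in Python. The function name and the 150-base read length are assumptions made for illustration and are not stated in this document; coverage is taken as raw coverage (total read bases divided by estimated genome size), and the result counts individual reads, not read pairs.

```python
def reads_for_coverage(genome_size_mb, target_coverage=56.0, read_length=150):
    """Estimate how many reads give the target raw coverage.

    genome_size_mb  -- estimated genome size in megabases
    target_coverage -- desired raw coverage (56x is the default used in the text)
    read_length     -- bases per read (150 is an assumption, not from the source)
    """
    if genome_size_mb <= 0 or read_length <= 0:
        raise ValueError("genome size and read length must be positive")
    genome_size_bases = genome_size_mb * 1_000_000
    # coverage = reads * read_length / genome_size, solved for reads
    return round(target_coverage * genome_size_bases / read_length)


if __name__ == "__main__":
    # A 3274 Mb genome (the hgp estimate) at 56x with assumed 150-base reads
    print(reads_for_coverage(3274))
```

When the genome size is unknown, the same function can be applied twice: once with a guess, and again with the size estimate that the preliminary run reports.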

Genome and data characteristics

Supernova calculates various metrics on different aspects of the input data and the genome it represents, such as genome size, repetitivity, heterozygosity, and molecule length. For these datasets, they vary widely, as shown below; brief descriptions of the metrics follow the table.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	raw_cov

hgp	3274	8.1	1.42	0.09	139	234	X	55.0
chm	3212	6.5	1.30	0.10	79	139	X	56.0
wfu	3391	8.1	1.38	0.09	95	146	X	53.1
chi	3247	8.1	1.61	0.11	103	125	X	55.4
yor	3156	6.4	1.03	0.10	122	156	X	57.0
yorm	3288	7.4	1.13	0.10	119	132	X	54.7
ash	3124	7.2	1.39	0.11	119	140	X	57.6
pr	3399	8.5	1.44	0.11	103	146	X	53.0

hummer	1102	4.2	0.36	0.06	66	230	2500	61.2
fish	1680	12.6	0.30	0.47	89	93	Nova	54.3
ruby	2407	4.5	0.86	0.22	81	180	2500	54.0

grape	602	20.5	0.21	1.03	74	247	2500	47.0
maize	2219	35.8	7.01	0.03	81	175	X	64.0
chili	3215	6.7	0.29	0.25	45	88	X	61.1

fly	143	8.3	0.23	0.12	68	455	Nova	68.8
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4
pmoth	330	6.0	0.17	0.20	22	40	Nova	72.8
cater	458	13.0	0.16	0.08	20	18	Nova	72.1
aphid	512	15.7	0.41	0.99	30	78	Nova	57.1
aedes	1323	17.6	0.41	0.04	70	69	Nova	62.9

Sample: A nickname for the sample, used in these charts.
gsize (est_genome_size): The estimated genome size, in megabases (Mb).
%rep (repfrac): Repeat Content Index (%): the percent of read kmers having depth ≥ twice the expected depth.
het (hetdist): The estimated mean separation between heterozygous sites, in kilobases (kb).
%hat (high_AT_index): High AT Index (%): the percent of kmers in the reads having ≥ 90% AT content. Locally extreme AT content is correlated with assembly gaps.
mol_len (lw_mean_mol_len): The length-weighted mean molecule length, in kilobases (kb).
p10 (p10): For an average point on the genome, the estimated number of molecules that extend 10 kb in both directions from that point, counting both alleles.
seq (likely_sequencers): Abbreviated name of the sequencing instrument used.
raw_cov (raw_coverage): Total number of bases in all of the sequence reads, before trimming off barcodes, divided by the estimated genome size.
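
Two of the definitions above reduce to short formulas, sketched below in Python. This is only the arithmetic: how Supernova infers molecule lengths and genome size from the reads is not described here, and the function names are hypothetical. The length-weighted mean is taken in its usual sense, the sum of squared lengths divided by the sum of lengths, which is an assumption about the exact estimator.

```python
def length_weighted_mean(lengths):
    """Mean length in which each molecule is weighted by its own length.

    Equivalent to the expected length of the molecule covering a
    randomly chosen base: sum(L^2) / sum(L).
    """
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one molecule of positive length")
    return sum(length * length for length in lengths) / total


def raw_coverage(total_read_bases, genome_size_bases):
    """Total bases in all reads (before trimming) over estimated genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bases


if __name__ == "__main__":
    # Two short molecules and one long one, in kb: the arithmetic mean is 40,
    # but the length-weighted mean is pulled toward the long molecule.
    print(length_weighted_mean([10, 10, 100]))
    print(raw_coverage(180_070_000_000, 3_274_000_000))
```

The weighting matters because long molecules carry most of the bases, so they dominate the linking information even when short molecules are more numerous.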

Contigs, phase blocks, and scaffolds are all longer

The following table adds N50 contig, phase block, and scaffold sizes. (In the metrics description table, these metrics are contig_N50, phase_block_N50 and scaffold_N50, respectively.) These all show a notable improvement in almost every case. For example, for the hgp sample, the N50 contig size rose from 120.9 to 162.0 kb, the N50 phase block size rose from 4.30 to 5.83 Mb, and the N50 scaffold size rose from 17.18 to 45.60 Mb, a greater than two-fold improvement.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	raw_cov	1.2 contig	2.0 contig	1.2 phase	2.0 phase	1.2 scaff	2.0 scaff

hgp	3274	8.1	1.42	0.09	139	234	X	55.0	120.9	162.0	4.30	5.83	17.18	45.60
chm	3212	6.5	1.30	0.10	79	139	X	56.0	116.4	175.2	2.65	3.21	14.78	39.53
wfu	3391	8.1	1.38	0.09	95	146	X	53.1	120.1	165.5	2.79	3.15	18.31	39.92
chi	3247	8.1	1.61	0.11	103	125	X	55.4	113.7	156.1	2.60	3.12	15.51	38.17
yor	3156	6.4	1.03	0.10	122	156	X	57.0	119.2	167.4	9.76	14.15	15.23	47.78
yorm	3288	7.4	1.13	0.10	119	132	X	54.7	113.4	159.0	8.68	12.55	19.42	49.47
ash	3124	7.2	1.39	0.11	119	140	X	57.6	106.1	153.8	4.02	5.26	16.71	36.11
pr	3399	8.5	1.44	0.11	103	146	X	53.0	122.3	169.0	3.29	3.96	18.16	46.30

hummer	1102	4.2	0.36	0.06	66	230	2500	61.2	100.5	175.0	11.38	17.48	12.42	31.86
fish	1680	12.6	0.30	0.47	89	93	Nova	54.3	17.1	20.5	0.17	1.70	0.68	4.04
ruby	2407	4.5	0.86	0.22	81	180	2500	54.0	77.5	100.4	2.91	3.69	13.05	36.24

grape	602	20.5	0.21	1.03	74	247	2500	47.0	38.3	55.7	0.48	1.70	0.58	2.29
maize	2219	35.8	7.01	0.03	81	175	X	64.0	20.9	31.0	0.04	0.04	0.27	1.78
chili	3215	6.7	0.29	0.25	45	88	X	61.1	105.7	167.2	1.72	3.91	3.09	13.60

fly	143	8.3	0.23	0.12	68	455	Nova	68.8	113.7	166.5	5.00	13.68	9.12	20.49
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4	37.8	63.2	0.23	0.68	0.24	0.69
pmoth	330	6.0	0.17	0.20	22	40	Nova	72.8	63.7	107.8	0.97	2.23	1.71	6.68
cater	458	13.0	0.16	0.08	20	18	Nova	72.1	21.7	32.8	0.07	0.17	0.06	0.06
aphid	512	15.7	0.41	0.99	30	78	Nova	57.1	75.4	104.7	0.98	4.48	1.04	5.00
aedes	1323	17.6	0.41	0.04	70	69	Nova	62.9	20.3	29.7	0.09	0.35	0.07	0.15
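
For readers unfamiliar with the N50 statistic used in the table above, the following Python sketch shows the standard definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the combined length. The function name is hypothetical, and any conventions Supernova applies when reporting these metrics (for example, which sequences are included) are not covered here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running total to half of the
        # combined length defines the N50.
        if running >= half:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    # Fold change in N50 scaffold size for hgp, using the values in the table
    print(45.60 / 17.18)
```

The same function applies unchanged to contig, phase block, or scaffold lengths; only the input list differs.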

Assembly accuracy and organization are also improved

The following table adds assembly accuracy and organization measures. Because the perfect stretch and misassembly estimate rely on a reference sequence, these two metrics are not generally available; we have calculated them here where we can. All three metrics are described below the table.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	raw_cov	1.2 perf	2.0 perf	1.2 mis	2.0 mis	1.2 m10	2.0 m10

hgp	3274	8.1	1.42	0.09	139	234	X	55.0	22.77	26.79	1.13	0.44	2.41	1.89
chm	3212	6.5	1.30	0.10	79	139	X	56.0	.	.	0.32	0.12	2.05	1.57
wfu	3391	8.1	1.38	0.09	95	146	X	53.1	20.04	21.74	1.02	0.69	2.08	1.59
chi	3247	8.1	1.61	0.11	103	125	X	55.4	.	.	0.75	0.42	2.38	1.93
yor	3156	6.4	1.03	0.10	122	156	X	57.0	.	.	0.41	0.26	2.18	1.73
yorm	3288	7.4	1.13	0.10	119	132	X	54.7	.	.	0.38	0.18	2.51	1.99
ash	3124	7.2	1.39	0.11	119	140	X	57.6	.	.	0.62	0.35	2.46	1.90
pr	3399	8.5	1.44	0.11	103	146	X	53.0	.	.	0.46	0.24	2.24	1.84

hummer	1102	4.2	0.36	0.06	66	230	2500	61.2	.	.	.	.	6.02	5.40
fish	1680	12.6	0.30	0.47	89	93	Nova	54.3	.	.	.	.	31.62	25.18
ruby	2407	4.5	0.86	0.22	81	180	2500	54.0	.	.	.	.	2.89	2.11

grape	602	20.5	0.21	1.03	74	247	2500	47.0	.	.	.	.	26.67	15.26
maize	2219	35.8	7.01	0.03	81	175	X	64.0	15.82	30.55	2.14	1.40	26.38	9.85
chili	3215	6.7	0.29	0.25	45	88	X	61.1	.	.	.	.	6.79	4.48

fly	143	8.3	0.23	0.12	68	455	Nova	68.8	29.27	37.10	0.62	0.09	7.06	5.66
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4	.	.	.	.	26.39	14.16
pmoth	330	6.0	0.17	0.20	22	40	Nova	72.8	.	.	.	.	6.88	3.29
cater	458	13.0	0.16	0.08	20	18	Nova	72.1	.	.	.	.	36.93	20.14
aphid	512	15.7	0.41	0.99	30	78	Nova	57.1	.	.	.	.	12.28	6.92
aedes	1323	17.6	0.41	0.04	70	69	Nova	62.9	10.30	13.22	3.76	3.32	44.41	22.27

perf: This column provides the N50 perfect stretch, in kb. This can only be computed and is only shown for samples having a reference sequence from the same sample. It measures the N50 size of sequences in the reference that are perfectly mirrored in the assembly. Such ‘perfect stretches’ are terminated either by errors or gaps. In this context, transitioning from one allele to the other is an error. The N50 perfect stretch increased for all samples where it could be measured.
mis: This column shows the percent of the assembly that is misassembled. This can only be computed for assemblies having a reference sequence. As an example of the accounting, if a scaffold connects 5 Mb of one chromosome to 10 Mb of another chromosome, that counts as a 5 Mb error, which gets converted into a fraction by dividing by the assembly size. Errors of order and orientation are also included. The misassembly rate declined for all assemblies for which it could be measured.
m10 (m10): This metric estimates the percent of genomic kmers that are either missing from the assembly entirely or present only in scaffolds shorter than 10 kb. Each kmer counts once regardless of its multiplicity in the genome and thus this measure discounts repeats. It measures assembly disorganization. Assembly disorganization is less in all cases and markedly less in some.
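
The accounting example given for the mis column can be sketched in Python as follows. This covers only cross-chromosome joins, under the assumption that everything outside the largest single-chromosome portion of a scaffold counts as error; that generalization beyond the two-chromosome example, the function names, and the assembly size used below are illustrative assumptions. Errors of order and orientation are not modeled.

```python
def misjoined_bases(bases_by_chromosome):
    """Bases in one scaffold that lie outside its dominant chromosome.

    bases_by_chromosome maps a chromosome name to the number of scaffold
    bases aligned to it. Assumption: the largest portion is treated as
    correct and the rest as misassembled.
    """
    sizes = list(bases_by_chromosome.values())
    if not sizes:
        return 0
    return sum(sizes) - max(sizes)


def misassembly_percent(scaffolds, assembly_size):
    """Percent of the assembly that is misjoined, summed over scaffolds."""
    if assembly_size <= 0:
        raise ValueError("assembly size must be positive")
    errors = sum(misjoined_bases(scaffold) for scaffold in scaffolds)
    return 100 * errors / assembly_size


if __name__ == "__main__":
    # The example from the text: 5 Mb of one chromosome joined to 10 Mb of
    # another counts as a 5 Mb error.
    scaffold = {"chrA": 5_000_000, "chrB": 10_000_000}
    print(misjoined_bases(scaffold))
    # Hypothetical 3,000 Mb assembly containing only that one misjoin
    print(misassembly_percent([scaffold], 3_000_000_000))
```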

Computational performance of Supernova 2.0

The following table adds computational performance statistics. As shown, all but two assemblies were run on 256 GB servers. Memory use for Supernova has increased about 10% since version 1.2, and run times have increased on average by 60%. However, because of targeted optimizations, the likelihood of the extreme run times experienced by some users of 1.2 should now be much lower.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	raw_cov	mem	days

hgp	3274	8.1	1.42	0.09	139	234	X	55.0	256	3.2
chm	3212	6.5	1.30	0.10	79	139	X	56.0	256	2.8
wfu	3391	8.1	1.38	0.09	95	146	X	53.1	256	3.4
chi	3247	8.1	1.61	0.11	103	125	X	55.4	256	3.3
yor	3156	6.4	1.03	0.10	122	156	X	57.0	256	2.9
yorm	3288	7.4	1.13	0.10	119	132	X	54.7	256	3.1
ash	3124	7.2	1.39	0.11	119	140	X	57.6	256	3.2
pr	3399	8.5	1.44	0.11	103	146	X	53.0	256	3.4

hummer	1102	4.2	0.36	0.06	66	230	2500	61.2	256	1.1
fish	1680	12.6	0.30	0.47	89	93	Nova	54.3	256	2.3
ruby	2407	4.5	0.86	0.22	81	180	2500	54.0	256	1.9

grape	602	20.5	0.21	1.03	74	247	2500	47.0	256	0.7
maize	2219	35.8	7.01	0.03	81	175	X	64.0	512	7.9
chili	3215	6.7	0.29	0.25	45	88	X	61.1	512	3.4

fly	143	8.3	0.23	0.12	68	455	Nova	68.8	256	0.1
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4	256	0.2
pmoth	330	6.0	0.17	0.20	22	40	Nova	72.8	256	0.3
cater	458	13.0	0.16	0.08	20	18	Nova	72.1	256	0.8
aphid	512	15.7	0.41	0.99	30	78	Nova	57.1	256	0.5
aedes	1323	17.6	0.41	0.04	70	69	Nova	62.9	256	2.4

mem (mem_peak): The amount of memory (RAM) in GB on the server used for the assembly.
days (etime_h): The total number of days elapsed during the assembly.

All assemblies were carried out on 28-core servers at 10x Genomics, with “Intel Xeon CPU E5-2697 v3 @ 2.6GHz” processors.

All assemblies were run twice to confirm exact reproducibility of results.
