10x Genomics
Chromium De Novo Assembly

Supernova 2.1, printed on 04/20/2025

Supernova version 2.x performance

Analysis software for 10x Genomics linked read products is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.

The Supernova 2.x releases include a number of significant changes to the code, with corresponding changes in performance. We compared the performance of versions 1.2 and 2.1 with respect to 20 different datasets, including ones that had been run previously, customer datasets, and novel datasets created just for this purpose. Most of these, along with their assemblies, are available for download.

Synopsis of Supernova's current performance

Supernova has been tested across a broad range of sample types, including vertebrates, plants, and insects; performance on all three categories is markedly improved.
Assembly quality is greatly improved. For example, for human samples, the typical contig length is now 160 kb and the typical scaffold length is now 40 Mb.
We demonstrate end-to-end, practical assembly of single insects, avoiding the problems of standard approaches that inbreed or mix wild individuals.
Streamlined user experience, with many critical metrics added and improved, including accurate measurement of molecule length and genome size.
Assemblies are produced by a turnkey laboratory and computational process based on one library and at markedly low cost. A new simplified workflow requires only standard depth sequencing, and is compatible with the NovaSeq platform.
Supernova runs on a single server, uses 256 GB memory for 18 of the 20 genomes tested, and 512 GB for the other two. Run time varies from a few hours for small genomes to a few days for human and other similarly sized genomes. The longest observed run time was 8 days, for maize. Several computational bottlenecks have been removed. However, genomes larger than human are expected to have longer run times and higher memory usage, especially if repeat content is high. Any genome above approximately 4 Gb (gigabases) should be considered experimental and is not supported.

Before beginning any experiment you should consult our current guidance for increasing the likelihood of obtaining a successful Supernova assembly. If you have performed an assembly using an older version of Supernova, please update to the most recent version of the software.

If you have followed our guidance and still encounter performance issues with Supernova, please follow the instructions in How To Get Help.

Even if our guidance is closely followed, genome properties, sample quality, and data quality vary widely, so you may encounter substantially greater variability in run time and assembly metrics than we encountered in our test datasets described here.

Samples

We tested Supernova on a wide variety of samples, from controls to wild-caught specimens. For each sample, we created a single Chromium Linked-Read library, which we sequenced and then assembled using both Supernova 1.2 and 2.1.0, without any tuning or specification of parameters, except varying the number of input reads in a few cases (see below).

#	Sample	Description	Material	DNA Prep	Notes

1	hgp	Human Genome project, male [1]	blood	MagAttract	control
2	chm	equimolar mix of CHM1/CHM13	cell line	MagAttract	control
3	wfu	NA12878, European, female	cell line	MagAttract	control
4	chi	HG00512, Chinese, male	cell line	MagAttract
5	yor	NA19240, Yoruba, female	cell line	MagAttract
6	yorm	NA19238, Yoruba, female	cell line	MagAttract
7	ash	NA24385, Ashkenazi, male	cell line	MagAttract
8	pr	HG00733, Puerto Rican, female	cell line	MagAttract

9	hummer	hummingbird [2]	tissue	KingFisher
10	fish	zebrafish SAT from ZIRC	tissue	Amplicon Express	control
11	ruby	dog named Ruby	blood	MagAttract

12	grape	flame seedless grape [3]	leaves	grape protocol
13	maize	maize B73	leaves	Amplicon Express	control
14	chili	chili pepper [4]	leaves	modified CTAB

15	fly	fruit fly iso-1 x Canton-S [5]	one insect	salting out	control
16	omoth	one moth collected in Pleasanton	one insect	salting out
17	pmoth	second moth collected in Pleasanton	one insect	salting out
18	cater	caterpillar collected in Pleasanton	one insect	salting out
19	aphid	aphid collected in Pleasanton	one insect	salting out
20	aedes	Aedes aegypti F1 ref cross [6]	one insect	salting out	control


	1. Anonymous donor
	2. Erich Jarvis, HHMI, bioRxiv
	3. Doreen Ware, CSHL
	4. Allen van Deynze, UCD, Hort Res 5
	5. Bloomington Stock Center
	6. Ben Matthews, Rockefeller Institute

The number of reads provided as input was generally our best guess based on the
estimated genome size, so as to yield about 56x coverage. In those cases for
which we did not know the genome size, a preliminary run with a guess was
sufficient to get an estimate from Supernova. In a few cases, it was
advantageous to raise the coverage above 56x; the actual number of reads used
is available.
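The read budget described above follows from the raw coverage definition used in this document (total read bases divided by estimated genome size). A minimal sketch, assuming a fixed read length of 150 bases (an assumption; the read length is not stated here) and counting individual reads rather than pairs. The helper name reads_for_coverage is hypothetical:

```python
import math

def reads_for_coverage(genome_size_mb, target_cov=56.0, read_len=150):
    """Number of individual reads needed to reach target_cov raw coverage.

    genome_size_mb: estimated genome size in megabases.
    read_len: bases per read (assumed value, not taken from the text).
    """
    total_bases = target_cov * genome_size_mb * 1_000_000
    return math.ceil(total_bases / read_len)

# Example: the hgp sample has an estimated genome size of 3274 Mb.
n_reads = reads_for_coverage(3274)
print(f"{n_reads / 1e6:.0f} million reads")
```

When the genome size is unknown, the same calculation can be repeated with the estimate reported by a preliminary run.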

Genome and data characteristics

Supernova calculates metrics on different aspects of the input data and the genome it represents, such as genome size, repetitivity, heterozygosity, and molecule length. These metrics vary widely across the datasets, as shown below; brief descriptions of the metrics follow the table.

Sample	gsize	%rep	het	%gc	%hat	%di	mol_len	p10	seq	rawcov

hgp	3274	8.1	1.40	40.9	0.09	0.21	137	233	X	55.0
chm	3212	6.5	1.30	40.9	0.10	0.16	82	139	X	56.0
wfu	3390	8.1	1.39	40.9	0.09	0.18	96	143	X	53.1
chi	3247	8.1	1.59	40.9	0.11	0.16	104	125	X	55.4
yor	3156	6.4	1.04	40.9	0.10	0.16	123	156	X	57.0
yorm	3288	7.4	1.08	40.9	0.10	0.17	118	132	X	54.7
ash	3124	7.2	1.39	40.9	0.11	0.19	118	140	X	57.6
pr	3399	8.5	1.47	40.9	0.11	0.24	109	146	X	53.0

hummer	1102	4.2	0.33	41.5	0.06	0.13	70	230	2500	61.2
fish	1682	12.6	0.32	36.8	0.47	1.98	89	94	Nova	54.2
ruby	2407	4.5	0.88	41.1	0.22	0.24	79	181	2500	54.0

grape	602	20.5	0.19	34.7	1.03	0.07	77	247	2500	47.0
maize	2219	35.8	6.55	46.8	0.03	0.01	81	176	X	64.0
chili	3215	6.7	0.29	34.9	0.25	0.06	46	88	X	61.1

fly	143	8.3	0.23	42.4	0.12	0.07	68	455	Nova	68.8
omoth	199	7.7	0.08	31.6	0.51	0.03	20	34	Nova	56.4
pmoth	330	6.0	0.17	35.1	0.20	0.07	21	40	Nova	72.8
cater	458	13.0	0.16	36.6	0.08	0.03	20	17	Nova	72.1
aphid	512	15.7	0.38	30.2	0.99	0.11	28	78	Nova	57.1
aedes	1323	17.6	0.37	37.9	0.04	0.04	70	68	Nova	62.9

Sample: A nickname for the sample, used in these tables.
gsize (est_genome_size): The estimated genome size, in megabases (Mb).
%rep (repfrac): Repeat Content Index (%): the percent of read kmers having depth ≥ twice the expected depth.
het (hetdist): The estimated mean separation between heterozygous sites, in kilobases (kb).
%gc (gc_percent): Estimated gc content of genome (%).
%hat (high_AT_index): The percent of kmers in the reads having ≥ 90% AT content. Locally extreme AT content is correlated with assembly gaps.
%di (dinucleotide_percent): Estimated dinucleotide content of genome (%).
mol_len (lw_mean_mol_len): The length-weighted mean molecule length, in kilobases (kb).
p10 (p10): For an average point on the genome, the estimated number of molecules that extend 10 kb in both directions from that point, counting both alleles.
seq (likely_sequencers): Abbreviated name of the sequencing instrument used.
rawcov (raw_coverage): Total number of bases in all of the sequence reads, before trimming off barcodes, divided by the estimated genome size.
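To illustrate why mol_len is reported as a length-weighted mean, here is a small sketch assuming the usual definition, in which each molecule is weighted by its own length (sum of squared lengths divided by sum of lengths). How Supernova infers molecule lengths from the reads is not described here, and the input values below are made up for illustration:

```python
def length_weighted_mean(lengths):
    """Length-weighted mean: each molecule is weighted by its own length."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one molecule of nonzero length")
    return sum(l * l for l in lengths) / total

# Hypothetical molecule lengths in kb (illustrative values, not real data).
molecules_kb = [10, 10, 10, 90]
plain_mean = sum(molecules_kb) / len(molecules_kb)
weighted_mean = length_weighted_mean(molecules_kb)
print(plain_mean, weighted_mean)
```

The weighted value answers the question "how long is the molecule covering a typical base", so a few long molecules dominate it even when short molecules are more numerous.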

Contigs, phase blocks, and scaffolds are all longer

The following table adds N50 contig, phase block, and scaffold sizes. (In the metrics description table, these metrics are contig_N50, phase_block_N50 and scaffold_N50, respectively.) These all show a notable improvement in almost every case. For example, for the hgp sample, the N50 contig size rose from 120.9 to 160.7 kb, the N50 phase block size rose from 4.30 to 6.07 Mb, and the N50 scaffold size rose from 17.18 to 45.63 Mb, a greater than two-fold improvement.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	rawcov	1.2 contig	2.1 contig	1.2 phase	2.1 phase	1.2 scaff	2.1 scaff

hgp	3274	8.1	1.40	0.09	137	233	X	55.0	120.9	160.7	4.30	6.07	17.18	45.63
chm	3212	6.5	1.30	0.10	82	139	X	56.0	116.4	174.3	2.65	3.33	14.78	39.56
wfu	3390	8.1	1.39	0.09	96	143	X	53.1	120.1	163.0	2.79	3.29	18.31	44.06
chi	3247	8.1	1.59	0.11	104	125	X	55.4	113.7	154.8	2.60	3.14	15.51	38.17
yor	3156	6.4	1.04	0.10	123	156	X	57.0	119.2	166.1	9.76	14.77	15.23	47.42
yorm	3288	7.4	1.08	0.10	118	132	X	54.7	113.4	157.2	8.68	12.64	19.42	48.49
ash	3124	7.2	1.39	0.11	118	140	X	57.6	106.1	152.5	4.02	5.50	16.71	38.17
pr	3399	8.5	1.47	0.11	109	146	X	53.0	122.3	166.5	3.29	4.14	18.16	46.74

hummer	1102	4.2	0.33	0.06	70	230	2500	61.2	100.5	168.5	11.38	21.55	12.42	31.88
fish	1682	12.6	0.32	0.47	89	94	Nova	54.2	17.1	19.5	0.17	2.24	0.68	3.69
ruby	2407	4.5	0.88	0.22	79	181	2500	54.0	77.5	99.0	2.91	3.76	13.05	36.22

grape	602	20.5	0.19	1.03	77	247	2500	47.0	38.3	52.6	0.48	1.93	0.58	2.04
maize	2219	35.8	6.55	0.03	81	176	X	64.0	20.9	29.3	0.04	0.19	0.27	1.59
chili	3215	6.7	0.29	0.25	46	88	X	61.1	105.7	161.3	1.72	3.96	3.09	12.76

fly	143	8.3	0.23	0.12	68	455	Nova	68.8	113.7	165.4	5.00	13.88	9.12	20.45
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4	37.8	56.9	0.23	0.72	0.24	0.65
pmoth	330	6.0	0.17	0.20	21	40	Nova	72.8	63.7	105.4	0.97	2.48	1.71	7.09
cater	458	13.0	0.16	0.08	20	17	Nova	72.1	21.7	30.4	0.07	0.40	0.06	0.06
aphid	512	15.7	0.38	0.99	28	78	Nova	57.1	75.4	101.0	0.98	4.68	1.04	5.00
aedes	1323	17.6	0.37	0.04	70	68	Nova	62.9	20.3	27.0	0.09	0.51	0.07	0.14
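For readers unfamiliar with the N50 statistic used in the table above, the following sketch computes it under the conventional definition: the largest length L such that pieces of length at least L together cover at least half of the total. The function name and the example lengths are illustrative only, and any filtering applied before Supernova computes contig_N50, phase_block_N50, or scaffold_N50 is not modeled:

```python
def n50(lengths):
    """Largest length L such that pieces of length >= L cover half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

# Illustrative contig lengths in kb (not real data).
example_kb = [80, 70, 50, 40, 30, 20]
print(n50(example_kb))
```

Because the statistic is driven by the longest pieces, joining a few large scaffolds can raise N50 substantially even if many small pieces remain unchanged.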

Assembly accuracy and organization are also improved

The following table adds assembly accuracy and organization measures. Because the perfect stretch and misassembly estimate rely on a reference sequence, these two metrics are not generally available; we have calculated them here where we can. All three metrics are described below the table.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	rawcov	1.2 perf	2.1 perf	1.2 mis	2.1 mis	1.2 m10	2.1 m10

hgp	3274	8.1	1.40	0.09	137	233	X	55.0	22.77	26.89	1.13	0.45	2.41	1.90
chm	3212	6.5	1.30	0.10	82	139	X	56.0	.	.	0.32	0.11	2.05	1.57
wfu	3390	8.1	1.39	0.09	96	143	X	53.1	20.04	22.97	1.02	0.52	2.08	1.68
chi	3247	8.1	1.59	0.11	104	125	X	55.4	.	.	0.75	0.42	2.38	1.94
yor	3156	6.4	1.04	0.10	123	156	X	57.0	.	.	0.41	0.29	2.18	1.74
yorm	3288	7.4	1.08	0.10	118	132	X	54.7	.	.	0.38	0.19	2.51	1.99
ash	3124	7.2	1.39	0.11	118	140	X	57.6	.	.	0.62	0.41	2.46	1.90
pr	3399	8.5	1.47	0.11	109	146	X	53.0	.	.	0.46	0.24	2.24	1.84

hummer	1102	4.2	0.33	0.06	70	230	2500	61.2	.	.	.	.	6.02	5.39
fish	1682	12.6	0.32	0.47	89	94	Nova	54.2	.	.	.	.	31.62	25.30
ruby	2407	4.5	0.88	0.22	79	181	2500	54.0	.	.	.	.	2.89	2.12

grape	602	20.5	0.19	1.03	77	247	2500	47.0	.	.	.	.	26.67	15.26
maize	2219	35.8	6.55	0.03	81	176	X	64.0	15.82	30.34	2.14	1.70	26.38	9.87
chili	3215	6.7	0.29	0.25	46	88	X	61.1	.	.	.	.	6.79	4.83

fly	143	8.3	0.23	0.12	68	455	Nova	68.8	29.27	37.05	0.62	0.10	7.06	5.67
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4	.	.	.	.	26.39	14.13
pmoth	330	6.0	0.17	0.20	21	40	Nova	72.8	.	.	.	.	6.88	3.30
cater	458	13.0	0.16	0.08	20	17	Nova	72.1	.	.	.	.	36.93	20.18
aphid	512	15.7	0.38	0.99	28	78	Nova	57.1	.	.	.	.	12.28	6.94
aedes	1323	17.6	0.37	0.04	70	68	Nova	62.9	10.30	13.22	3.76	3.29	44.41	22.22

perf: This column provides the N50 perfect stretch, in kb. This can only be computed and is only shown for samples having a reference sequence from the same sample. It measures the N50 size of sequences in the reference that are perfectly mirrored in the assembly. Such ‘perfect stretches’ are terminated either by errors or gaps. In this context, transitioning from one allele to the other is an error. The N50 perfect stretch increased for all samples where it could be measured.
mis: This column shows the percent of the assembly that is misassembled. This can only be computed for assemblies having a reference sequence. As an example of the accounting, if a scaffold connects 5 Mb of one chromosome to 10 Mb of another chromosome, that counts as a 5 Mb error, which gets converted into a fraction by dividing by the assembly size. Errors of order and orientation are also included. The misassembly rate declined for all assemblies for which it could be measured.
m10 (m10): This metric estimates the percent of genomic kmers that are either missing from the assembly entirely or present only in scaffolds shorter than 10 kb. Each kmer counts once regardless of its multiplicity in the genome and thus this measure discounts repeats. It measures assembly disorganization. Assembly disorganization is less in all cases and markedly less in some.
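The accounting example given for the mis column can be sketched as follows. This is a simplified illustration, not the actual evaluation code: it handles only joins between chromosomes, ignores the order and orientation errors that the metric also includes, and assumes that for each scaffold everything outside its best-supported chromosome counts as error. The data layout and function name are hypothetical:

```python
def misassembled_percent(scaffolds, assembly_size_mb):
    """scaffolds: list of dicts mapping chromosome name -> Mb placed on it."""
    error_mb = 0.0
    for placement in scaffolds:
        if placement:
            # Everything outside the best-supported chromosome counts as error.
            error_mb += sum(placement.values()) - max(placement.values())
    return 100.0 * error_mb / assembly_size_mb

# The example from the text: 5 Mb of one chromosome joined to 10 Mb of another,
# plus one correctly assembled scaffold, in a hypothetical 100 Mb assembly.
scaffolds = [{"chrA": 5.0, "chrB": 10.0}, {"chrC": 85.0}]
pct = misassembled_percent(scaffolds, assembly_size_mb=100.0)
print(pct)
```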

Computational performance

The following table adds computational performance statistics. As shown, all but two assemblies were run on 256 GB servers. Memory use for Supernova has increased about 10% since version 1.2, and run times have increased on average by 60%. However, because of targeted optimizations, the likelihood of the extreme run times experienced by some users of 1.2 should now be much lower.

Sample	gsize	%rep	het	%hat	mol_len	p10	seq	rawcov	1.2 mem	2.1 mem	1.2 days	2.1 days

hgp	3274	8.1	1.40	0.09	137	233	X	55.0	256	256	1.9	3.3
chm	3212	6.5	1.30	0.10	82	139	X	56.0	256	256	1.7	3.2
wfu	3390	8.1	1.39	0.09	96	143	X	53.1	256	256	1.7	3.4
chi	3247	8.1	1.59	0.11	104	125	X	55.4	256	256	1.6	3.4
yor	3156	6.4	1.04	0.10	123	156	X	57.0	256	256	1.6	2.9
yorm	3288	7.4	1.08	0.10	118	132	X	54.7	256	256	1.6	3.1
ash	3124	7.2	1.39	0.11	118	140	X	57.6	256	256	1.8	3.2
pr	3399	8.5	1.47	0.11	109	146	X	53.0	256	256	1.7	3.1

hummer	1102	4.2	0.33	0.06	70	230	2500	61.2	256	256	0.6	1.2
fish	1682	12.6	0.32	0.47	89	94	Nova	54.2	256	256	1.5	2.3
ruby	2407	4.5	0.88	0.22	79	181	2500	54.0	256	256	1.0	1.9

grape	602	20.5	0.19	1.03	77	247	2500	47.0	256	256	0.5	0.8
maize	2219	35.8	6.55	0.03	81	176	X	64.0	512	512	8.1	7.8
chili	3215	6.7	0.29	0.25	46	88	X	61.1	512	512	1.8	3.6

fly	143	8.3	0.23	0.12	68	455	Nova	68.8	256	256	0.1	0.2
omoth	199	7.7	0.08	0.51	20	34	Nova	56.4	256	256	0.2	0.2
pmoth	330	6.0	0.17	0.20	21	40	Nova	72.8	256	256	0.1	0.3
cater	458	13.0	0.16	0.08	20	17	Nova	72.1	256	256	0.2	0.7
aphid	512	15.7	0.38	0.99	28	78	Nova	57.1	256	256	0.3	0.5
aedes	1323	17.6	0.37	0.04	70	68	Nova	62.9	256	256	1.2	2.4

mem (mem_peak): The amount of memory (RAM) in GB on the server used for the assembly.
days (etime_h): The total number of days elapsed during the assembly.
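The text does not say how the average run-time increase was computed. As a sketch, two common conventions can be compared on the days columns of the table above: the change in total elapsed days pooled over all samples, and the mean of the per-sample ratios. Variable names are illustrative:

```python
# Elapsed days per sample, copied from the table above (same row order).
days_v12 = [1.9, 1.7, 1.7, 1.6, 1.6, 1.6, 1.8, 1.7,   # human samples
            0.6, 1.5, 1.0,                            # hummer, fish, ruby
            0.5, 8.1, 1.8,                            # grape, maize, chili
            0.1, 0.2, 0.1, 0.2, 0.3, 1.2]             # insects
days_v21 = [3.3, 3.2, 3.4, 3.4, 2.9, 3.1, 3.2, 3.1,
            1.2, 2.3, 1.9,
            0.8, 7.8, 3.6,
            0.2, 0.2, 0.3, 0.7, 0.5, 2.4]

# Convention 1: percent change in total elapsed days across all samples.
pooled_increase = 100.0 * (sum(days_v21) / sum(days_v12) - 1.0)

# Convention 2: percent change implied by the mean of per-sample ratios.
ratios = [new / old for old, new in zip(days_v12, days_v21)]
mean_ratio_increase = 100.0 * (sum(ratios) / len(ratios) - 1.0)

print(f"pooled: {pooled_increase:.0f}%  mean of ratios: {mean_ratio_increase:.0f}%")
```

On these values the pooled change is roughly 63%, close to the figure quoted above, while the mean of per-sample ratios is noticeably higher because short runs that doubled or tripled count as much as multi-day runs.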

All assemblies were carried out on 28-core servers at 10x Genomics with the processor “Intel Xeon CPU E5-2697 v3 @ 2.6GHz”.

All assemblies were run twice to confirm exact reproducibility of results.
