Supernova 2.0, printed on 12/14/2019
Before generating Supernova data, please carefully read Achieving Success with De Novo Assembly. Please also review Supernova performance on twenty human and nonhuman datasets.
Supernova is a software package for de novo assembly from Chromium Linked-Reads that are made from a single whole-genome library from an individual DNA source. A key feature of Supernova is that it creates diploid assemblies, thus separately representing maternal and paternal chromosomes over very long distances. Almost all other methods instead merge homologous chromosomes into single incorrect 'consensus' sequences. Supernova is the only practical method for creating diploid assemblies of large genomes.
The Supernova software package includes two processing pipelines and one post-processing pipeline:
supernova mkfastq wraps Illumina's bcl2fastq to correctly demultiplex Chromium-prepared sequencing samples and to convert barcode and read data to FASTQ files.
supernova run takes FASTQ files containing barcoded reads from supernova mkfastq and builds a graph-based assembly. The approach is to first build an assembly using read k-mers (K = 48), then resolve this assembly using read pairs (to K = 200), then use barcodes to effectively resolve this assembly to K ≈ 100,000. The final step pulls apart homologous chromosomes into phase blocks, which are often several megabases in length.
supernova mkoutput takes Supernova's graph-based assemblies and produces several styles of FASTA suitable for downstream processing and analysis.
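To make the first stage of supernova run more concrete, the following is a minimal toy sketch of building a graph from read k-mers and walking an unambiguous path through it. It is not Supernova's implementation: the function names build_de_bruijn and extend_unambiguous, the example reads, and the tiny k = 5 (Supernova uses K = 48) are all assumptions made for illustration, and the sketch ignores sequencing errors, reverse complements, read pairs, and barcodes.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix (both of length k-1).
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while each node has exactly one successor.

    Stops at a branch, a dead end, or when the walk would revisit a node.
    """
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return path


# Hypothetical overlapping reads; real input would be barcoded FASTQ reads.
reads = ["ACGTTGCA", "GTTGCATC", "GCATCGGA"]
graph = build_de_bruijn(reads, k=5)
contig = extend_unambiguous(graph, "ACGT")
print(contig)
```

With a small k, repeats in the genome create branches that stop the walk early. This is the motivation for the later stages described above, in which read pairs and then barcodes are used to resolve the graph at effectively larger values of K.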
Please refer to our 2017 paper "Direct determination of diploid genome sequences" for broad algorithmic details and assessment of computational performance and assembly quality for Supernova 1.2. There have been changes to algorithms and results since then.
For the Linked-Read laboratory technology that Supernova exploits, please refer to