Supernova 1.1, printed on 11/21/2024
Supernova is a software package for de novo assembly from Chromium Linked-Reads that are made from a single whole-genome library from an individual DNA source. A key feature of Supernova is that it creates diploid assemblies, thus separately representing maternal and paternal chromosomes over very long distances. Almost all other methods instead merge homologous chromosomes into single incorrect 'consensus' sequences. Supernova is the only practical method for creating diploid assemblies of large genomes.
The Supernova software package includes two processing pipelines:
supernova mkfastq wraps Illumina's bcl2fastq to correctly demultiplex Chromium-prepared sequencing samples and to convert barcode and read data to FASTQ files.
supernova run takes FASTQ files containing barcoded reads from supernova mkfastq and builds a graph-based assembly. The approach is to first build an assembly using read kmers (K = 48), then resolve this assembly using read pairs (to K = 200), then use barcodes to effectively resolve this assembly to K ≈ 100,000. The final step pulls apart homologous chromosomes into phase blocks, which are typically multi-megabase for human genomes.
and for post-processing:
Supernova is tested and validated on human germline data. Please see the Supernova guidance for non-human considerations.
See the manuscript "Direct determination of diploid genome sequences" for broad algorithmic details and assessment of performance and assembly quality for Supernova.
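The first stage of supernova run described above, building an assembly from read kmers, can be illustrated with a toy de Bruijn graph. This is a minimal sketch and not Supernova's implementation: it uses K = 4 instead of K = 48 so the example stays readable, it ignores reverse complements, sequencing errors, read pairs and barcodes, and the function names (kmers, build_de_bruijn, extend_unambiguous) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every length-k substring (kmer) of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while the current node has exactly one successor.

    Returns the sequence spelled by the walk. The walk stops at a dead end,
    at a branch (two or more successors), or when it would revisit a node.
    """
    seq = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


# Toy data (assumption): three overlapping reads from one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))  # ATGGCGTGCAA

# A read carrying a variant creates a branch, so the walk stops there.
branched = build_de_bruijn(reads + ["GCGTTCA"], k=4)
print(extend_unambiguous(branched, "ATG"))  # ATGGCGT
```

The second call shows why kmers alone are not enough: where two sequences diverge, the graph branches and the walk cannot choose a path. In the approach described above, longer-range information (read pairs, then barcodes) is what resolves such branches, and the final step separates homologous chromosomes into phase blocks.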