10x Genomics
Chromium De Novo Assembly
Supernova, printed on 03/29/2024
Release Notes for 2.0
Supernova 2.0.1
Enhancements
- Add new genome metric: ploidy_histogram
- Truncate large metadata files when generating a tarball for upload to 10x, rather than omitting them.
Bug Fixes
- Fix an issue where supernova mkoutput would emit both reverse-complement and forward versions of records corresponding to the expansion of cycle gap edges. This affects the megabubbles, pseudohap and pseudohap2 output styles. It is safe to use 2.0.1 to generate new FASTA files from 2.0.0 assemblies. The new files show “ver=1.10” in the header.
- Fix an issue where unzipped FASTQ files were no longer accepted as input.
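To confirm that a FASTA file was regenerated with 2.0.1, one option is to scan its headers for the “ver=1.10” tag mentioned above. The following is a minimal sketch, not part of Supernova: it assumes the tag appears as a whitespace-separated field on record header lines beginning with ">", and the helper names open_text and header_versions are hypothetical.

```python
import gzip
from typing import Iterable, Set


def open_text(path: str):
    """Open a FASTA file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def header_versions(lines: Iterable[str]) -> Set[str]:
    """Collect the values of all 'ver=' fields found on '>' header lines.

    Assumption: the version tag is a whitespace-separated field of the
    form ver=X.Y on the record header line.
    """
    versions = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        for field in line[1:].split():
            if field.startswith("ver="):
                versions.add(field[len("ver="):])
    return versions
```

For a file written by 2.0.1, header_versions(open_text(path)) would be expected to contain only "1.10"; any other value suggests the file predates the fix and can be regenerated from the existing assembly.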
Failures, Crashes, and Forensics
- Fix a number of failures in ASSEMBLER_M2 (viz. error messages regarding “TrimAdapter”) related to unexpected read lengths. We still recommend against trimming or otherwise pre-processing Linked-Read data prior to running Supernova.
- Fix a number of crashes in the ASSEMBLER_DM and ASSEMBLER_ML stages (viz. “remove duplicate edges”, “computing division points” or “translating pairs to matches”) related to runs with very high coverage depth. We still recommend running Supernova with coverage between 38x and 56x for your genome.
- Fix a bug that caused the ASSEMBLER_ACP stage to crash.
- Fix a potential pipeline failure in ASSEMBLER_DF (viz. “Map/Reduce operation has failed at pass 0”) related to runs using a different number of cores than we tested. Note that many parts of the Supernova pipeline are not capable of using more than 32 cores.
- Fix a condition where ASSEMBLER_PR could exit prematurely (viz. “unneeded vertices”).
- Fix a potential infinite loop in ASSEMBLER_CL (viz. “identifying redundant edges”).
- Certain failures to memory map files now print extensive diagnostic information.
- Some users who run the Supernova executables from a Lustre filesystem experience exec() failures (viz. “Re-exec to adjust stack size failed”). The software is now more robust and, in case of failure, provides remedial guidance.
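The 38x to 56x coverage recommendation above can be checked before starting a run with simple arithmetic: raw coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes that reads are counted individually (not as pairs) and that all reads have the same length. It computes raw coverage only, which may differ from the coverage Supernova reports; the function names are hypothetical.

```python
def raw_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Raw coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def in_recommended_range(coverage: float, low: float = 38.0, high: float = 56.0) -> bool:
    """True if coverage lies within the range recommended in these notes."""
    return low <= coverage <= high


# Illustrative numbers only: 1 billion reads of 150 bases, 3.2 Gb genome.
cov = raw_coverage(1_000_000_000, 150, 3_200_000_000)
print(f"{cov:.1f}x, within range: {in_recommended_range(cov)}")
```

With these illustrative inputs the raw coverage is 46.875x, which falls inside the recommended range.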
Resource Utilization
- Fix an issue that caused unnecessary virtual memory use in the ASSEMBLER_DF stage. Note that Supernova uses memory-mapped files and therefore needs virtual address space (VMEM) that is generally larger than the maximum resident set size (RSS) of the process.
- Fix an issue that caused ASSEMBLER_PR to run very slowly on certain genomes (viz. “indexing closure paths”).
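The distinction between virtual address space and resident set size noted above can be observed on Linux with a small experiment: mapping a file reserves address space immediately, while resident memory grows only as pages are touched. The sketch below is illustrative and unrelated to Supernova's own code. It assumes a Linux system that exposes /proc/self/status, and the names parse_status_kb and demo are hypothetical.

```python
import mmap
import tempfile


def parse_status_kb(status_text, keys=("VmSize", "VmRSS")):
    """Extract selected fields (in kB) from the text of /proc/<pid>/status."""
    values = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])  # value precedes the "kB" unit
    return values


def demo(size_bytes=256 * 1024 * 1024):
    """Map a sparse file; return VmSize/VmRSS before and during the mapping.

    Linux only: reads /proc/self/status.
    """
    def snapshot():
        with open("/proc/self/status") as fh:
            return parse_status_kb(fh.read())

    with tempfile.TemporaryFile() as fh:
        fh.truncate(size_bytes)  # sparse file: no data is written
        before = snapshot()
        with mmap.mmap(fh.fileno(), size_bytes, access=mmap.ACCESS_READ):
            during = snapshot()  # taken while the mapping is live
    return before, during
```

Calling demo() should show VmSize growing by roughly the mapped size while VmRSS stays nearly unchanged, which is why limits on virtual memory (for example, scheduler or ulimit settings) may need to be higher than the expected RSS.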
Supernova 2.0.0
Documentation
Data generation
- Barcode subsampling is now deprecated. This also simplifies the workflow and reduces the amount of sequencing that is required.
- We provide a new 'optimized salting out' protocol that can be used to easily prepare DNA from a wide range of sample types, and that we demonstrate on single insects.
Assembly quality
Bug fixes
Resource utilization
- Memory usage has increased on average by about 10%. Nevertheless, of 20 test assemblies, 18 ran on a server having 256 GB RAM, and the remaining 2 ran on a server having 512 GB RAM. The 8 human assemblies in the set (all at about 56x coverage) ran on a server having 256 GB RAM; however, it is possible that, for stochastic reasons, some human datasets may require somewhat more memory.
- The mean run time for Supernova has increased; however, the variance is lower. Several extreme run time phenotypes are gone.
Usability
- Molecule length is now more accurately computed. A plot is now provided showing the inferred distribution of molecule lengths in comparison to control samples. This replaces the previous estimation, histogram_molecules.json.
- A kmer histogram and a PDF plot are reintroduced.
- Several new metrics about the genome and data (including genome size) are now computed.
- Total wallclock time for assembly is now shown in the text summary file.
- An alert is now issued if the estimated genome size seems too low or too high.
- An alert is now issued if coverage seems too low or too high.
- Alerts are now shown in the text summary file.
- The representation of cycles in FASTA output has been improved.