Supernova 1.2, printed on 10/27/2020
If the supernova mkfastq or supernova run pipelines fail, they will automatically generate a "debug tarball" named sample_id.mri.tgz that contains the logs and metadata generated by the pipestance leading up to the failure. This file makes it much easier to diagnose an assembly problem. It should not contain confidential information about a particular sample. To send this tarball to 10x, you can use the supernova upload command:
$ supernova upload [email protected] sample_id.mri.tgz
If you are unable to use the mechanism above, you can email the debug tarball directly to the 10x software team to help resolve any issues with using Supernova.
Memory. Supernova can run out of memory. When this happens it may exit with a diagnostic message that suggests possible causes and solutions, or it may exit prematurely with no message at all. We are working to improve this messaging. If you are unsure what has happened, or need help, please contact us. Although Supernova 1.1 was designed to assemble a human genome using less than 384 GB of memory, it will attempt to use more memory if a larger system is detected. To limit this behavior, see the --localmem option to supernova run.
Threads/Cores. Supernova is designed to use up to 28 cores installed in a system. You can request that Supernova use fewer cores with the --localcores option to supernova run.
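As a rough sketch of how the two limits above might be combined, the helper below prints (but does not execute) a supernova run command line that caps both cores and memory. The function name, the sample ID, and the FASTQ path are hypothetical, and the sketch assumes that --localmem takes a value in GB and that the reads are supplied with --fastqs.

```shell
#!/bin/sh
# Hypothetical helper: print a supernova run command that caps resource use.
# --localcores and --localmem are the options named in the text above;
# the --fastqs path is a placeholder, not a real location.
build_supernova_cmd() {
    id=$1        # value for --id (also the output directory name)
    cores=$2     # maximum number of cores to use
    mem_gb=$3    # maximum memory, assumed to be given in GB
    printf 'supernova run --id=%s --fastqs=%s --localcores=%s --localmem=%s\n' \
        "$id" "/path/to/fastqs" "$cores" "$mem_gb"
}

# Example: limit a run to 16 cores and 256 GB.
build_supernova_cmd sample345 16 256
```

Printing the command first makes it easy to review the limits before starting a long run.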
File space. If the filesystem fills up, Supernova may terminate prematurely without explanation, so check the available disk space whenever a run stops unexpectedly. Again, if you are unsure what has happened, please contact us.
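One way to check free space before or after a run is sketched below. The function name and the 500 GB threshold are illustrative assumptions, not values recommended by 10x; choose a threshold that suits your data set.

```shell
#!/bin/sh
# Print the free space, in whole GB, on the filesystem holding a directory.
# df -P gives POSIX-format output; -k reports sizes in 1024-byte blocks,
# and the fourth column of the second line is the available space.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Hypothetical threshold, in GB.
min_gb=500
avail=$(free_gb .)
if [ "$avail" -lt "$min_gb" ]; then
    echo "warning: only ${avail} GB free in $(pwd)" >&2
fi
```

Run it from the directory where the pipeline output will be written, since that is the filesystem that matters.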
DNA quality. By far the most common cause of subpar assembly results is poor input DNA quality. Input DNA might be short, or nicked, and subsequently broken at nick sites during the 10x library construction process. If you have access to fresh blood or a cell line, then the protocols we recommend should consistently yield very high quality DNA. Protocols for creating DNA from tissue are less universal, and at this time we have limited information on assembly results obtained from specific protocols. We are highly interested in learning about your experiences, and would like to share successful protocols with the community.
Algorithmic improvements. Some hard regions of the genome are at present not assembled optimally by Supernova (or other methods), and some genomes are enriched for hard regions. Much of our current R&D is focused on these regions, and improvements will be made available in subsequent versions of Supernova.
If you wish to troubleshoot a pipeline failure yourself, it is important to identify whether you are experiencing a preflight failure, an in-flight failure, or an alert.
The remainder of this guide uses the term pipestance to refer to a specific instance of a pipeline running.
Preflight failures are the most common and are the result of invalid input data or runtime parameters. Because they occur before the pipeline actually runs, there will be no pipeline output and the error is reported directly to your terminal.
supernova mkfastq will generate the following error if Illumina's bcl2fastq software is not installed:
[error] No bcl2fastq found on path. demux requires bcl2fastq v2.17 or greater for RTA version: 2.7.3
In-flight failures may be the result of factors external to the pipeline such as running out of system memory or disk space, or may be due to issues with the data that could not be detected prior to de novo assembly. Different stages may fail in different ways, so the specific error messages vary widely.
There are a few important files that are saved to your pipeline output directory which, by default, is named according to the flowcell serial number for supernova mkfastq (e.g., HAWT7ADXX) and your --id name for supernova run.
The pipeline execution log that is output to your terminal during pipeline execution is also saved to output_dir/_log.
Stages that experience a hard failure generate an _errors file containing the precise error that caused the stage to halt. You can view these error logs, if they exist, using find output_dir -name _errors | xargs cat
Each stage also logs its stdout and stderr streams to _stdout and _stderr files.
These logs can be listed using find output_dir -name _stderr and may contain informative error messages in certain stages that call separate binaries, such as ASSEMBLER_DF and ASSEMBLER_CP.
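When several stages have failed, the output of find piped to xargs cat runs together with no indication of which file each message came from. A minimal sketch of an alternative, using a hypothetical function name, prints each _errors file under a heading that shows its path:

```shell
#!/bin/sh
# Print every _errors file under a pipestance directory, each preceded by its
# path, so a message can be traced back to the stage that produced it.
show_stage_errors() {
    find "$1" -type f -name _errors | sort | while IFS= read -r f; do
        printf '== %s ==\n' "$f"
        cat "$f"
    done
}

# Usage: show_stage_errors output_dir
```

The same loop works for _stderr files if the name passed to find is changed.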
A more detailed description of the pipeline output directory and its contents is given in the Pipestance Structure page.
If you are unable to diagnose a failure yourself, you can always contact the 10x software support team for help.
Once you have determined the reason for failure and are ready to continue running the pipeline, you can typically issue the same supernova run or supernova mkfastq command to continue execution of the pipestance from the stage that originally failed.
When supernova run or supernova mkfastq is run, it detects whether its intended output directory already exists. If it does, the existing directory is treated as an incomplete pipestance and execution resumes from it. This feature allows pipelines to be stopped and resumed with great flexibility, but it can also result in errors such as:
RuntimeError: /home/jdoe/runs/sample345 is not a pipestance directory
which indicates that you specified a --id that corresponds to an existing directory that was not created by supernova run.
The following error:
RuntimeError: pipestance 'HAWT7ADXX' already exists and is locked by another Martian instance. If you are sure no other Martian instance is running, delete the _lock file in /home/jdoe/runs/HAWT7ADXX and start Martian again.
indicates that you may already have a copy of supernova run or supernova mkfastq running that is using the same output directory. If you are sure that no pipestance is running in the given output directory, you can either move that output directory aside (mv HAWT7ADXX HAWT7ADXX.old) to restart the pipestance from the beginning, or remove the pipestance's lock file (rm HAWT7ADXX/_lock) and re-run the same supernova command to resume pipeline execution.
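The two recovery options above can be wrapped in a small helper, sketched here under the assumption that you have already confirmed nothing else is using the directory. The function name and the resume/restart mode names are hypothetical; the underlying rm and mv commands are the ones given in the text.

```shell
#!/bin/sh
# Hypothetical helper: clear a stale lock, or set the whole directory aside.
# Only use this once you are sure no pipestance is running in the directory.
recover_pipestance() {
    dir=$1
    mode=$2   # "resume" removes the lock; "restart" moves the directory aside
    case "$mode" in
        resume)
            rm -f "$dir/_lock"
            ;;
        restart)
            # Refuse to overwrite or nest inside an earlier backup.
            if [ -e "$dir.old" ]; then
                echo "error: $dir.old already exists" >&2
                return 1
            fi
            mv "$dir" "$dir.old"
            ;;
        *)
            echo "usage: recover_pipestance DIR resume|restart" >&2
            return 2
            ;;
    esac
}

# Usage: recover_pipestance HAWT7ADXX resume
```

After either mode, re-run the original supernova command: with resume it continues the existing pipestance, and with restart it begins a new one in a fresh directory.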
If you encounter the following error when attempting to resume a pipestance:
RuntimeError: pipestance 'sample345' already exists with different invocation file /home/jdoe/runs/sample345/_invocation
you are attempting to resume a pipestance using command-line arguments that differ from those used to first run it. You can view the parameters passed to the existing pipeline by examining the _log file located in the output directory (e.g., head -n20 /home/jdoe/runs/sample345/_log).
During de novo assembly supernova run collects metrics that characterize the quality of input data. These metrics capture the quality of various stages of the data preparation workflow, including library preparation and sequencing. When the data are less than ideal we raise alerts that are displayed in the pipeline output after pipeline execution completes.
For example, if the user runs Supernova with paired-end reads of length 140 bases, the following alert is displayed:
Alerts: We observe many reads shorter than 150 bases. The ideal read length for Supernova is 150 bases. Reads shorter than the ideal length are likely to yield a lower quality assembly.
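To check read lengths yourself before starting an assembly, something like the following sketch could be used. It is not how Supernova computes the alert; it simply counts sequence lines below a threshold, assuming the standard four-line FASTQ record layout. The function name is hypothetical.

```shell
#!/bin/sh
# Count reads shorter than a minimum length in a FASTQ file.
# Assumes four lines per record, with the sequence on the second line.
# Gzipped input is detected by the .gz file name suffix.
count_short_reads() {
    fastq=$1
    min_len=${2:-150}   # default threshold matches the 150-base ideal
    case "$fastq" in
        *.gz) gzip -cd "$fastq" ;;
        *)    cat "$fastq" ;;
    esac | awk -v min="$min_len" '
        NR % 4 == 2 { total++; if (length($0) < min) short++ }
        END { printf "%d of %d reads shorter than %d bases\n", short, total, min }
    '
}

# Usage: count_short_reads reads.fastq.gz 150
```

A large fraction of short reads here suggests the alert above is likely to appear once the pipeline completes.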
In rare cases, when we detect serious issues with the input data that render the output of supernova run completely unreliable, we terminate execution. For example, if we find that a large majority of the reads do not have valid 10x barcodes, we exit with the following message:
[error] The fraction of input reads having valid barcodes is 20.3 percent, whereas the ideal is at least 80 percent. This condition could have multiple causes including wrong library type, failed library construction and low sequence quality on the barcode bases. This could have a severe effect on assembly performance, and Supernova has not been tested on data with these properties, so execution will be terminated.