10x Genomics
Chromium De Novo Assembly

Supernova 1.2, printed on 04/16/2025

Assembly Process

Supernova generates highly-contiguous, phased, whole-genome de novo assemblies from a Chromium-prepared library.

Supernova should be run with at most 1.2 billion reads, and at 38-56x coverage of the genome. Please see Sample Requirements and System Requirements before creating your Chromium libraries for assembly.
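
As a rough check before sequencing, expected coverage can be estimated as reads × read length ÷ genome size. The sketch below is illustrative only: the 150 bp read length and 3.2 Gb genome size are assumed values that do not come from this page, and `estimate_coverage` is a hypothetical helper, not a Supernova command. It ignores barcode trimming and duplicate reads, so treat the result as approximate.

```shell
#!/usr/bin/env bash
# Estimate raw sequencing coverage: reads * read_length / genome_size.
# All three inputs are assumptions supplied by the user.
estimate_coverage() {
  local reads="$1" read_len="$2" genome_size="$3"
  awk -v r="$reads" -v l="$read_len" -v g="$genome_size" \
    'BEGIN { printf "%.2f\n", r * l / g }'
}

# Example: 1.2 billion reads of 150 bp on a 3.2 Gb genome (illustrative values).
estimate_coverage 1200000000 150 3200000000
```

With these assumed inputs the estimate is about 56x, the upper end of the recommended range.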

The assembly process involves the following steps:

Run supernova mkfastq on the Illumina BCL output folder to generate FASTQ files.
Run supernova run separately for each sample to generate a whole genome de novo assembly for each.
Run supernova mkoutput to generate various styles of FASTA output for your assemblies.

For the following example, assume that the Illumina BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.

Run supernova mkfastq

First, follow the instructions on running supernova mkfastq to generate FASTQ files. For example, if the flowcell serial number was HAWT7ADXX, then supernova mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.

Run supernova run for de novo assembly

To run Supernova, use the supernova run command with the parameters listed in the table below. For help on which arguments to use to target a particular set of FASTQs, consult Running 10x Pipelines on FASTQ Files.

Argument	Description
`--id`	A unique run ID string: e.g. `sample345`
`--fastqs`	Path of the FASTQ folder generated by `supernova mkfastq` e.g. `/home/jdoe/runs/HAWT7ADXX/outs/fastq_path`
`--sample`	(optional) Can be used to select only a single sample of those specified in the sample sheet supplied to `mkfastq`. By default, all samples are used.
`--description`	(optional) Description of the data set. This will be included, along with the run ID string, in various output files.
`--maxreads`	(optional) Downsample if more than the specified number of reads is provided. (default: 1.2B reads)
`--bcfrac`	(optional) Fraction of barcodes in the sample to use. This is intended to aid in the assembly of small genomes. Randomly chooses the specified fraction of all barcodes and retains only reads belonging to the chosen barcodes. Unbarcoded reads are selected randomly at the same rate. If `--maxreads` is specified as well, the data are examined after barcode subsampling and reads are randomly chosen to achieve the desired number.
`--localcores`	(optional) Limits parallel sections of Supernova to the specified number of cores.
`--localmem`	(optional) Limits memory use on shared systems where Supernova may attempt to use more resources than a user is allowed. Note that this is not a hard limit; it is used as a hint for high-memory portions of the assembly process that deliberately scale to the amount of memory installed in a system.
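
The interaction between `--bcfrac` and `--maxreads` described in the table can be illustrated with a little arithmetic. This is a minimal sketch that assumes reads are spread evenly across barcodes, which real data will only approximate. `expected_reads` is a hypothetical helper and not part of Supernova.

```shell
#!/usr/bin/env bash
# Approximate reads retained: barcode subsampling first, then the --maxreads cap.
# Assumes reads are evenly distributed across barcodes (a simplification).
expected_reads() {
  local total="$1" bcfrac="$2" maxreads="$3"
  awk -v t="$total" -v f="$bcfrac" -v m="$maxreads" \
    'BEGIN { kept = t * f; if (kept > m) kept = m; printf "%.0f\n", kept }'
}

# Example: 2 billion input reads, half the barcodes, capped at 600 million reads.
expected_reads 2000000000 0.5 600000000
```

In this example, barcode subsampling leaves roughly 1 billion reads, and the cap then reduces that to 600 million.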

The following options are deprecated, but are preserved for processing data created using the older supernova demux data preparation step:

Argument	Description
`--indices`	[deprecated and optional; demux only] Sample indices associated with this sample. Comma-separated list of index set plate wells (e.g. `SI-GA-A1,SI-GA-H12`) or index sequences (e.g. `TCGCCATA,GTATACAC`).
`--lanes`	[deprecated and optional; demux only] Lanes associated with this sample.

After determining these input arguments, call supernova run:

$ cd /home/jdoe/runs
$ supernova run --id=sample345 \
                --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path

Note that Supernova has been designed for stand-alone operation on a single, large system. Portions of the assembly process will scale to use all of the installed memory on a system. If you need to limit memory use by Supernova, e.g. on a shared system, please see the --localmem command line option. Likewise, parallel sections of code will use all cores on a system and this behavior can be limited with --localcores.
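
On a shared system, it can help to assemble the command line with explicit limits and review it before launching. The sketch below only prints the command; `supernova_cmd` is a hypothetical helper, the values 16 and 256 are arbitrary examples, and the unit of `--localmem` is assumed here to be gigabytes, which you should confirm against the documentation for your Supernova version.

```shell
#!/usr/bin/env bash
# Assemble a `supernova run` command line with resource limits and print it.
# Printing first lets you review the command before executing it.
supernova_cmd() {
  local id="$1" fastqs="$2" cores="$3" mem="$4"
  printf 'supernova run --id=%s --fastqs=%s --localcores=%s --localmem=%s\n' \
    "$id" "$fastqs" "$cores" "$mem"
}

# Example values: 16 cores and a memory hint of 256 (unit assumed to be GB).
supernova_cmd sample345 /home/jdoe/runs/HAWT7ADXX/outs/fastq_path 16 256
```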

Following a set of preflight checks to validate input arguments, Supernova pipeline stages will begin to run:

supernova run
Copyright (c) 2016 10x Genomics, Inc.  All rights reserved.
-----------------------------------------------------------------------------
Martian Runtime - 2.2.2
 
Running preflight checks (please wait)...
2016-01-01 00:00:01 [runtime] (ready)           ID.sample345.ASSEMBLER_CS._ASSEMBLER_PREP.SETUP_CHUNKS
2016-01-01 00:00:01 [runtime] (split_complete)  ID.sample345.ASSEMBLER_CS._ASSEMBLER_PREP.SETUP_CHUNKS
...

supernova run will use all of the sequence data available in the FASTQ folder, up to a default of 1.2B reads. To use fewer (or more) reads, specify the number that Supernova should assemble with the --maxreads option. Using more than 1.2B reads is not recommended for human samples, but may be appropriate for larger genomes. Please see the Supernova Guidance tech note for more information.

If you are processing data prepared with the older, deprecated supernova demux process, you can also specify --indices and --lanes to further select the data to be processed. For new datasets, this selection is performed in the sample sheet provided to supernova mkfastq.

supernova run assumes that all of the cores and all of the memory on your system are available for its use. Use the --localcores option to limit the number of cores. Use --localmem to suggest a memory limit; however, memory utilization in certain sections of the code scales with the size of the genome, the number of input reads, and the quality of the data, and may exceed this limit.

The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, supernova run will assume it is an existing pipestance and attempt to resume running it.
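
Because an existing folder with the same name causes a resume rather than a fresh run, it can be worth checking for one before launching. A minimal sketch follows; `pipestance_status` is a hypothetical helper that only tests whether the folder exists and does not inspect its contents.

```shell
#!/usr/bin/env bash
# Report whether a run ID would start a new pipestance or resume an existing one,
# based only on whether a folder with that name exists in the given directory.
pipestance_status() {
  local runs_dir="$1" id="$2"
  if [ -d "$runs_dir/$id" ]; then
    echo "resume"
  else
    echo "new"
  fi
}

# Example: check the run ID from this page in the current directory.
pipestance_status . sample345
```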

Watching Supernova Progress

The standard output from supernova run displays lines that indicate the progress through pipeline stages. The core Supernova assembly algorithm comprises two long-running pipeline stages called ASSEMBLER_DF and ASSEMBLER_CP. The standard output will pause during these long-running stages with a message such as:

...
2016-01-03 00:00:01 [runtime] (run:local)       ID.sample345.ASSEMBLER_CS._ASSEMBLER_DF.fork0.chnk0

The pipeline may appear to have stalled at this point. To monitor the progress of one of these stages, view the stage-specific standard output, e.g.:

$ cd /home/jdoe/runs
$ tail sample345/ASSEMBLER_CS/ASSEMBLER_DF/fork0/chnk0/_stdout

The same approach applies to ASSEMBLER_CP.
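
To check both long-running stages in one pass, the log path can be built from the sample ID and the stage name. This sketch copies the path layout from the ASSEMBLER_DF example above and assumes ASSEMBLER_CP follows the same layout; `stage_stdout` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the stage-specific _stdout path for a long-running stage.
# The layout is taken from the ASSEMBLER_DF example and is assumed
# to be the same for ASSEMBLER_CP.
stage_stdout() {
  local id="$1" stage="$2"
  echo "$id/ASSEMBLER_CS/$stage/fork0/chnk0/_stdout"
}

# Example: show the last lines of each stage log if it exists yet.
for stage in ASSEMBLER_DF ASSEMBLER_CP; do
  log="$(stage_stdout sample345 "$stage")"
  if [ -f "$log" ]; then tail -n 5 "$log"; else echo "not started: $log"; fi
done
```

Run this from the folder that contains the pipestance (e.g. /home/jdoe/runs).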

Output Files

A successful supernova run execution should conclude with a message that looks similar to this:

...
2016-01-03 00:00:01 [runtime] (chunks_complete) ID.sample345.ASSEMBLER_CS._ASSEMBLER_CP
2016-01-03 00:00:01 [runtime] (run:local)       ID.sample345.ASSEMBLER_CS._ASSEMBLER_CP.fork0.join
2016-01-03 00:00:03 [runtime] (join_complete)   ID.sample345.ASSEMBLER_CS._ASSEMBLER_CP
 
Outputs:
- Run summary:        /home/jdoe/runs/sample345/outs/summary.csv
- Run report:         /home/jdoe/runs/sample345/outs/report.txt
- Raw assembly files: /home/jdoe/runs/sample345/outs/assembly
 
Pipestance completed successfully!
Saving pipestance info to sample345/sample345.mri.tgz

The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files that are described in more detail in Output Overview.
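
After a run finishes, a quick check that the three top-level outputs named in the completion message are present can catch an incomplete pipestance early. A minimal sketch, where `missing_outputs` is a hypothetical helper and the file names are taken from the "Outputs:" listing above:

```shell
#!/usr/bin/env bash
# List which of the expected top-level outputs are missing from a pipestance.
# The three names come from the "Outputs:" summary printed by supernova run.
missing_outputs() {
  local outs="$1/outs" name
  for name in summary.csv report.txt assembly; do
    [ -e "$outs/$name" ] || echo "$name"
  done
}

# Example: prints nothing when all three outputs are present.
missing_outputs /home/jdoe/runs/sample345
```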

Run supernova mkoutput for FASTA output

First, familiarize yourself with the representation of a genome assembly as a graph structure. Next, follow the instructions on running supernova mkoutput to generate FASTA files.
