10x Genomics
Chromium De Novo Assembly

Generating Output

Once your assembly has completed (yielding binary data structures), use the supernova mkoutput command to generate a FASTA file representing your assembly. The full syntax is:

supernova mkoutput \
        --asmdir=/path/to/outs/assembly \
        --outprefix=output_filename_prefix \
        --style=raw|megabubbles|pseudohap|pseudohap2 \
        [ --minsize=n ] \
        [ --headers=short|full ]
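
When mkoutput is called from a script, it can help to assemble the argument list programmatically so that the style and header values are checked before the command runs. The following is a minimal sketch, not part of Supernova: the function name build_mkoutput_cmd and the example prefix my_asm are hypothetical, and only the options listed in the syntax above are used.

```python
import shlex

# Values taken from the syntax shown above.
VALID_STYLES = {"raw", "megabubbles", "pseudohap", "pseudohap2"}
VALID_HEADERS = {"short", "full"}


def build_mkoutput_cmd(asmdir, outprefix, style, minsize=None, headers=None):
    """Return the argument list for a `supernova mkoutput` call (hypothetical helper)."""
    if style not in VALID_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    if headers is not None and headers not in VALID_HEADERS:
        raise ValueError(f"unknown headers mode: {headers!r}")
    cmd = [
        "supernova", "mkoutput",
        f"--asmdir={asmdir}",
        f"--outprefix={outprefix}",
        f"--style={style}",
    ]
    # Optional parameters are appended only when the caller sets them.
    if minsize is not None:
        cmd.append(f"--minsize={int(minsize)}")
    if headers is not None:
        cmd.append(f"--headers={headers}")
    return cmd


cmd = build_mkoutput_cmd("/path/to/outs/assembly", "my_asm", "pseudohap", minsize=1000)
print(shlex.join(cmd))
# To run it (requires Supernova on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems when paths contain spaces.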

Required Style Option

There are four styles of FASTA output:

--style=raw

The raw style represents every edge in the assembly as a FASTA record (seen as red segments in the above cartoon). These include microbubble arms and gaps. Gaps that are captured by read pairs are represented by 100 Ns. Gaps that are not captured by read pairs are represented by a stretch of Ns of length equal to the estimated gap size (but always more than 100). In addition, where cycles are present in the graph, an arbitrary path is chosen through the cycle, and the sequence for that path is suffixed by 10 Ns. Bubbles and gaps generally appear once per 10-20 kb. Raw graph records are roughly two orders of magnitude shorter than megabubble arms. For each edge in the raw graph, there is also an edge written to the FASTA file representing the reverse complement sequence.
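
The N conventions described above can be checked on a sequence with a short script. This is a rough sketch under the rules stated in this section (100 Ns for a gap captured by read pairs, more than 100 Ns for an estimated gap size, 10 trailing Ns after a path through a cycle); the function name classify_n_runs is hypothetical, and a run length alone cannot prove how a given stretch of Ns arose.

```python
import re


def classify_n_runs(seq):
    """Return (start, length, label) for each run of Ns in seq (hypothetical helper)."""
    runs = []
    for m in re.finditer(r"N+", seq.upper()):
        length = m.end() - m.start()
        at_end = m.end() == len(seq)
        if length == 10 and at_end:
            label = "cycle marker"                 # path through a cycle, suffixed by 10 Ns
        elif length == 100:
            label = "gap captured by read pairs"   # fixed 100 Ns
        elif length > 100:
            label = "gap with estimated size"      # length equals the size estimate
        else:
            label = "unclassified"                 # not covered by the rules above
        runs.append((m.start(), length, label))
    return runs


example = "ACGT" + "N" * 100 + "ACGT" + "N" * 250 + "GG" + "N" * 10
for start, length, label in classify_n_runs(example):
    print(start, length, label)
```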

For the remaining output styles, we flatten each bubble by selecting the branch with the highest coverage, merge gaps with adjacent sequences (leaving 100 Ns), and drop reverse complement edges.

--style=megabubbles

In this style each megabubble arm corresponds to a FASTA record, as does each intervening sequence.

--style=pseudohap

The pseudohap style generates a single record per scaffold. For example, in the cartoon for style two, the seven red edges on top (corresponding to seven FASTA records) are combined into a single FASTA record. Megabubble arms are chosen arbitrarily, so many records will mix maternal and paternal alleles.

--style=pseudohap2

This style is like the pseudohap option, except that for each scaffold, two ‘parallel’ pseudohaplotypes are created and placed in separate FASTA files. Records in these files are parallel to each other.

Required Non-Style Options

--asmdir should be set to the path of the assembly output directory created by Supernova. This is the outs/assembly directory underneath your pipeline output directory.

--outprefix is a filename prefix for the assembly output. It can be a relative or absolute pathname. For instance, specifying --outprefix=/x/y/z will create a FASTA file called z.fasta in the directory /x/y. Note that for the pseudohap2 style, which creates two files, these would be called z.1.fasta and z.2.fasta.
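
The naming rule above can be expressed as a small helper, which is handy when a downstream script needs to know which files to expect. This is only a sketch of the rule as described here; the function name expected_fasta_paths is hypothetical.

```python
def expected_fasta_paths(outprefix, style):
    """Return the FASTA file names expected for a given --outprefix and --style."""
    if style == "pseudohap2":
        # pseudohap2 writes two files, one per pseudohaplotype.
        return [f"{outprefix}.1.fasta", f"{outprefix}.2.fasta"]
    return [f"{outprefix}.fasta"]


print(expected_fasta_paths("/x/y/z", "pseudohap"))
print(expected_fasta_paths("/x/y/z", "pseudohap2"))
```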

Optional Parameters

--minsize=n

For output styles other than raw, you may choose to print only those FASTA records longer than a given size. In raw mode all FASTA records are printed.

--minsize=n [specify minimum FASTA record size in bases, default: 1000]
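
A similar length cutoff can be applied after the fact, for example to raw output, where all records are printed. The sketch below is an illustration only: read_fasta and filter_by_minsize are hypothetical helpers, and keeping records whose length is greater than or equal to the threshold is an assumption about how the boundary case is treated.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_minsize(records, minsize=1000):
    """Keep records with at least `minsize` bases (inclusive bound is an assumption)."""
    return [(h, s) for h, s in records if len(s) >= minsize]


sample = [">a", "ACGT", "ACGT", ">b", "AC"]
kept = filter_by_minsize(read_fasta(sample), minsize=4)
print([header for header, _ in kept])
```

To filter a real file, pass an open file handle to read_fasta in place of the sample list.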

--headers=<mode>

By default, FASTA header lines show only the start and end edges of the path associated with the record. Optionally, the entire path associated with each record may be displayed. This can produce very long header lines, which may break other software.

--headers=full  [verbose output: all edge ids written]
--headers=short [only first and last edge ids shown; default]

A short record might look like this:

>55 edges=4..6 left=15 right=88 ver=1.7 style=1
ACTTTAGACGGGGACCCTAGACTTACTTGAGAAAACGTTTTTACACTTACCAACCATATATATCCCCAGAGGAGGGATTT
TTAGGACATTAGCCCACCAAATTTACACACTTATATATATTTTATCGGAGCTCCAGTCCCGCCCAAAAACTTTACGTTTT

The same record with --headers=full might have the following header line:

>55 edges=4,15,33,7,6 left=15 right=88 ver=1.7 style=1
ACTTTAGACGGGGACCCTAGACTTACTTGAGAAAACGTTTTTACACTTACCAACCATATATATCCCCAGAGGAGGGATTT
TTAGGACATTAGCCCACCAAATTTACACACTTATATATATTTTATCGGAGCTCCAGTCCCGCCCAAAAACTTTACGTTTT
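
Both header forms shown above can be read with the same small parser. This is a minimal sketch based only on the two example headers in this section: the function name parse_header is hypothetical, the "a..b" form is treated as short and the comma-separated form as full, and the left, right, ver, and style fields are kept as raw strings because their meaning is not described here.

```python
def parse_header(line):
    """Split a FASTA header line into the record id and a dict of key=value fields."""
    tokens = line.lstrip(">").split()
    record_id = tokens[0]
    fields = dict(tok.split("=", 1) for tok in tokens[1:])
    edges = fields.get("edges", "")
    if ".." in edges:
        # Short form: only the first and last edge ids are given.
        first, last = edges.split("..")
        fields["edges"] = {"mode": "short", "first": int(first), "last": int(last)}
    elif edges:
        # Full form: every edge id on the path is listed.
        fields["edges"] = {"mode": "full", "path": [int(e) for e in edges.split(",")]}
    return record_id, fields


short_id, short_fields = parse_header(">55 edges=4..6 left=15 right=88 ver=1.7 style=1")
full_id, full_fields = parse_header(">55 edges=4,15,33,7,6 left=15 right=88 ver=1.7 style=1")
print(short_fields["edges"])
print(full_fields["edges"])
```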