10x Genomics
Chromium De Novo Assembly

Supernova2.1, printed on 03/30/2025

Generating FASTQs with supernova mkfastq

Analysis software for 10x Genomics linked read products is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.

Overview
Example Workflows
Arguments and Options
Example Data
Running mkfastq with a Simple CSV Samplesheet
Running mkfastq with an Illumina Experiment Manager Sample Sheet
Checking FASTQ Output
Assessing Quality Control Metrics
Troubleshooting

Overview

The supernova workflow starts by demultiplexing the Illumina sequencer's base call files (BCLs) for each flowcell directory into FASTQ files. 10x recommends supernova mkfastq, a pipeline that wraps Illumina's bcl2fastq and adds the following conveniences on top of bcl2fastq's own features:

Translates 10x sample index set names into the corresponding list of four sample index oligonucleotides. For example, well A1 can be specified in the samplesheet as SI-GA-A1, and supernova mkfastq will recognize the four oligos GGTTTACT, CTAAACGG, TCGGCGTC, and AACCGTAA and merge the resulting FASTQ files.
Supports a simplified CSV samplesheet format to handle 10x use cases.
Generates sequencing and 10x-specific quality control metrics, including barcode quality, accuracy, and diversity.
Supports most bcl2fastq arguments, such as --use-bases-mask.
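To make the first point concrete, here is a minimal Python sketch of how an index value could be turned into oligos. It is not supernova's own code: the function name expand_index and the one-entry lookup table are hypothetical, and only the SI-GA-A1 oligos come from the text above. A real table would need every 10x sample index set name.

```python
import re
from typing import List

# Hypothetical lookup table: only the SI-GA-A1 entry is taken from the text above.
SAMPLE_INDEX_SETS = {
    "SI-GA-A1": ["GGTTTACT", "CTAAACGG", "TCGGCGTC", "AACCGTAA"],
}

_OLIGO = re.compile(r"[ACGT]+")


def expand_index(value: str) -> List[str]:
    """Return the oligos to demultiplex for one sample sheet index value."""
    value = value.strip()
    if value in SAMPLE_INDEX_SETS:
        # A 10x set name stands for four oligos whose FASTQs are merged later.
        return list(SAMPLE_INDEX_SETS[value])
    if _OLIGO.fullmatch(value):
        # An explicitly listed oligo sequence is passed through unchanged.
        return [value]
    raise ValueError(f"unknown sample index: {value!r}")


print(expand_index("SI-GA-A1"))
print(expand_index("ACGTACGT"))
```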

Example Workflows

In this example, we have two 10x libraries (each processed through a separate Chromium chip channel) that are multiplexed on a single flowcell. Note that after running supernova mkfastq, we run a separate instance of the pipeline on each library:

two libraries, one flowcell

In this example, we have one 10x library sequenced on two flowcells. Note that after running supernova mkfastq, we run a single instance of the pipeline on all the FASTQ files generated:

one library, two flowcells

Arguments and Options

Because supernova mkfastq is a wrapper around bcl2fastq, it accepts additional options beyond those shown in the table below. Please consult the User Guide for Illumina's bcl2fastq for more information.

Parameter	Function
`--run`	(Required) The path to the Illumina BCL run folder.
`--id`	(Optional; defaults to the name of the flowcell referred to by `--run`) Name of the folder created by mkfastq.
`--samplesheet`	(Optional) Path to an Illumina Experiment Manager-compatible sample sheet which contains 10x sample index set names (e.g., SI-GA-A12) in the sample index column. All other information, such as sample names and lanes, should be in the sample sheet.
`--sample-sheet`	(Optional) Equivalent to `--samplesheet` above.
`--csv`	(Optional) Path to a simple CSV with Lane, Sample, and Index columns that describe how to demultiplex the flowcell. The Index column should contain either a 10x sample index set name (e.g., SI-GA-A12) or the oligo sequence used. This is an alternative to the Illumina IEM sample sheet and is ignored if `--samplesheet` is specified.
`--simple-csv`	(Optional) Equivalent to `--csv` above.
`--ignore-dual-index`	(Optional) Ignores the second index on a dual-indexed flowcell.
`--qc`	(Optional) Calculate both sequencing and 10x-specific metrics, including per-sample barcode matching rate. Will not be performed unless this flag is specified. Not supported for NovaSeq flow cells.
`--lanes`	(bcl2fastq option) Comma-delimited list of lanes to demultiplex (e.g., 1,3). Use this if you have a sample sheet for an entire flowcell but only want to demultiplex a few lanes for further 10x analysis.
`--use-bases-mask`	(bcl2fastq option) Same meaning as for `bcl2fastq`. Use to clip extra bases off a read if you ran extra cycles for QC.
`--delete-undetermined`	(bcl2fastq option) Delete the `Undetermined` FASTQs generated by `bcl2fastq`. Useful if you are demultiplexing a small number of samples from a large flowcell.
`--output-dir`	(bcl2fastq option) Generate FASTQ output in a path of your own choosing, instead of `flowcell_id/outs/fastq_path`.
`--project`	(bcl2fastq option) Custom project name, to override the samplesheet or to use in conjunction with the `--csv` argument.
`--jobmode`	(Martian option) Job manager to use. Valid options: `local` (default), `sge`, `lsf`, or a .template file.
`--localcores`	(Martian option) Maximum number of cores the pipeline may request at one time. Only applies when `--jobmode=local`.
`--localmem`	(Martian option) Maximum memory, in GB, the pipeline may request at one time. Only applies when `--jobmode=local`.
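Because these options are ordinary command-line flags, a wrapper script can assemble them. The sketch below is a hypothetical helper (build_mkfastq_args is not part of supernova) that uses only flags from the table and encodes two of its rules: `--csv` is ignored when `--samplesheet` is given, and `--localcores`/`--localmem` only apply to local job mode. Launching the result assumes supernova is on your PATH.

```python
from typing import List, Optional, Sequence


def build_mkfastq_args(
    run: str,
    run_id: Optional[str] = None,
    csv: Optional[str] = None,
    samplesheet: Optional[str] = None,
    lanes: Optional[Sequence[int]] = None,
    qc: bool = False,
    jobmode: str = "local",
    localcores: Optional[int] = None,
    localmem: Optional[int] = None,
) -> List[str]:
    """Assemble a supernova mkfastq argument list from options in the table."""
    if csv and samplesheet:
        # mkfastq would silently ignore --csv here, so surface the conflict.
        raise ValueError("pass either csv or samplesheet, not both")
    if jobmode != "local" and (localcores is not None or localmem is not None):
        raise ValueError("--localcores/--localmem only apply to --jobmode=local")
    args = ["supernova", "mkfastq", f"--run={run}"]
    if run_id:
        args.append(f"--id={run_id}")
    if csv:
        args.append(f"--csv={csv}")
    if samplesheet:
        args.append(f"--samplesheet={samplesheet}")
    if lanes:
        # bcl2fastq expects a comma-delimited list such as 1,3.
        args.append("--lanes=" + ",".join(str(lane) for lane in lanes))
    if qc:
        args.append("--qc")  # not supported on NovaSeq flow cells
    if jobmode != "local":
        args.append(f"--jobmode={jobmode}")  # local is the default
    if localcores is not None:
        args.append(f"--localcores={localcores}")
    if localmem is not None:
        args.append(f"--localmem={localmem}")
    return args


cmd = build_mkfastq_args(
    "/path/to/tiny_bcl", run_id="tiny-bcl", csv="tiny-bcl-simple-2.1.0.csv"
)
print(" ".join(cmd))
# To launch it (requires supernova on PATH): subprocess.run(cmd, check=True)
```

Raising on conflicting options is a design choice for the sketch; mkfastq itself simply ignores `--csv` in that case.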

Example Data

supernova mkfastq recognizes two file formats for describing samples: a simple, three-column CSV format, and the Illumina Experiment Manager (IEM) sample sheet format used by bcl2fastq. There is an example below for running mkfastq with each format.

To follow along, please do the following:

Download the tiny-bcl tar file.
Untar the tiny-bcl tar file in a convenient location. This will create a new tiny-bcl subdirectory.
Download the simple CSV layout file: tiny-bcl-simple-2.1.0.csv.
Download the Illumina Experiment Manager sample sheet: tiny-bcl-samplesheet-2.1.0.csv.

Running mkfastq with a Simple CSV Samplesheet

We recommend the simple CSV samplesheet for most sequencing experiments. The simple CSV format has only three columns (Lane, Sample, Index), and is thus less prone to formatting errors. You can see an example of this in tiny-bcl-simple-2.1.0.csv:

Lane,Sample,Index
1,test_sample,SI-GA-A3
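If you generate sample layouts from a script rather than by hand, Python's csv module can produce the same three-column file. This is a minimal sketch; the function name simple_csv_text and the output file name in the comment are hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence


def simple_csv_text(rows: Iterable[Sequence[str]]) -> str:
    """Render (lane, sample, index) rows as a mkfastq simple CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Lane", "Sample", "Index"])  # the three required columns
    writer.writerows(rows)
    return buffer.getvalue()


text = simple_csv_text([("1", "test_sample", "SI-GA-A3")])
print(text, end="")
# To save it: open("my-samples.csv", "w").write(text)  (hypothetical file name)
```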

Here are the options for each column:

Lane	Which lane(s) of the flowcell to process. Can be either a single lane, a range (e.g., 2-4) or '*' for all lanes in the flowcell.
Sample	The name of the sample. This name is the prefix of all the generated FASTQs and corresponds to the `--sample` argument in all downstream 10x pipelines. Sample names must conform to the Illumina `bcl2fastq` naming requirements: only letters, numbers, underscores, and hyphens are allowed; no other symbols, including dots ("."), are allowed.
Index	The 10x sample index set that was used in library construction, e.g., SI-GA-A12.
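The column rules above can be checked before launching mkfastq. The following sketch validates a simple CSV against them. The function names are hypothetical, the strict header check is an assumption, and the caller must supply the flowcell's lane numbers so that '*' can be expanded. mkfastq runs its own preflight checks, so treat this only as an early warning.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple

# Letters, numbers, underscores and hyphens only, per the Sample rule above.
SAMPLE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def parse_lanes(spec: str, flowcell_lanes: Iterable[int]) -> List[int]:
    """Expand a Lane value: a single lane ('3'), a range ('2-4'), or '*'."""
    spec = spec.strip()
    if spec == "*":
        return list(flowcell_lanes)
    if "-" in spec:
        start, end = (int(part) for part in spec.split("-", 1))
        if start > end:
            raise ValueError(f"bad lane range: {spec!r}")
        return list(range(start, end + 1))
    return [int(spec)]


def check_simple_csv(
    text: str, flowcell_lanes: Iterable[int]
) -> List[Tuple[List[int], str, str]]:
    """Return (lanes, sample, index) for each row, or raise on a rule violation."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != ["Lane", "Sample", "Index"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        sample = row["Sample"].strip()
        if not SAMPLE_NAME.fullmatch(sample):
            raise ValueError(f"invalid sample name: {sample!r}")
        lanes = parse_lanes(row["Lane"], flowcell_lanes)
        rows.append((lanes, sample, row["Index"].strip()))
    return rows


print(check_simple_csv("Lane,Sample,Index\n1,test_sample,SI-GA-A3\n", range(1, 3)))
```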

To run mkfastq with a simple layout CSV, use the --csv argument. Here's how to run mkfastq on the tiny-bcl sequencing run with the simple layout:

$ supernova mkfastq --id=tiny-bcl \
                     --run=/path/to/tiny_bcl \
                     --csv=tiny-bcl-simple-2.1.0.csv
 
supernova mkfastq
Copyright (c) 2017 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Martian Runtime - 2.1.1-v2.3.3
Running preflight checks (please wait)...
2017-08-09 16:33:54 [runtime] (ready)           ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2017-08-09 16:33:57 [runtime] (split_complete)  ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2017-08-09 16:33:57 [runtime] (run:local)       ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET.fork0.chnk0.main
2017-08-09 16:34:00 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
...

Running mkfastq with an Illumina Experiment Manager Sample Sheet

The supernova mkfastq pipeline can also be run with a samplesheet in the Illumina Experiment Manager (IEM) format. If you didn't sequence with an i7 index, you'll need to use this format. Let's briefly look at tiny-bcl-samplesheet-2.1.0.csv before running the pipeline. You will see a number of fields specific to running on Illumina platforms, and then a [Data] section.

The [Data] section is where you put your sample, lane, and index information. Here's an example:

[Data]
Lane,Sample_ID,index,Sample_Project
1,Sample1,SI-GA-A3,tiny-bcl

Here, SI-GA-A3 refers to a 10x sample index, a set of four oligo sequences. supernova mkfastq also supports listing oligo sequences explicitly.

In this example, only reads from lane 1 will be used. To demultiplex the given sample index across all lanes, omit the Lane column entirely.

Sample names must conform to the Illumina bcl2fastq naming requirements. Specifically, only letters, numbers, underscores, and hyphens are allowed. No other symbols, including dots (.), are allowed.

Also note that while an authentic IEM sample sheet will contain other sections above the [Data] section, these are optional for demultiplexing. For demultiplexing an existing run with supernova mkfastq, only the [Data] section is required.
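Because only the [Data] section matters for demultiplexing, a script that inspects a full IEM sample sheet can skip everything above it. A minimal sketch, assuming section headers start with '[' and that a missing Lane column means all lanes; read_iem_data_section is a hypothetical name and the sheet below is a made-up example.

```python
import csv
import io
from typing import Dict, List


def read_iem_data_section(text: str) -> List[Dict[str, str]]:
    """Return the rows of an IEM sample sheet's [Data] section as dicts."""
    lines = text.splitlines()
    try:
        # IEM sheets may pad the header line with commas, e.g. "[Data],,,".
        start = next(
            i for i, line in enumerate(lines)
            if line.strip().lower().startswith("[data]")
        )
    except StopIteration:
        raise ValueError("sample sheet has no [Data] section") from None
    body = []
    for line in lines[start + 1:]:
        if line.strip().startswith("["):
            break  # a new section begins
        if line.strip(", \t"):
            body.append(line)  # skip blank or comma-only lines
    return list(csv.DictReader(io.StringIO("\n".join(body))))


sheet = """[Header]
IEMFileVersion,4

[Data]
Lane,Sample_ID,index,Sample_Project
1,Sample1,SI-GA-A3,tiny-bcl
"""
for row in read_iem_data_section(sheet):
    # No "Lane" key would mean the index is demultiplexed across all lanes.
    print(row.get("Lane", "all lanes"), row["Sample_ID"], row["index"])
```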

Next, run the supernova mkfastq pipeline, using the --samplesheet argument:

$ supernova mkfastq --id=tiny-bcl \
                     --run=/path/to/tiny_bcl \
                     --samplesheet=tiny-bcl-samplesheet-2.1.0.csv
 
supernova mkfastq
Copyright (c) 2017 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Martian Runtime - 2.1.1-v2.3.3
Running preflight checks (please wait)...
2017-08-09 16:25:49 [runtime] (ready)           ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2017-08-09 16:25:52 [runtime] (split_complete)  ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2017-08-09 16:25:52 [runtime] (run:local)       ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET.fork0.chnk0.main
2017-08-09 16:25:58 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
...

If you encounter any preflight errors, please refer to the Troubleshooting page.

Checking FASTQ Output

Once the supernova mkfastq pipeline has successfully completed, the output can be found in a new folder named with the value you provided to supernova mkfastq in the --id option (if not specified, defaults to the name of the flowcell):

$ ls -l
drwxr-xr-x 4 jdoe  jdoe     4096 Sep 13 12:05 tiny-bcl

The key output files are in outs/fastq_path, which is organized in the same manner as a conventional bcl2fastq run:

$ ls -l tiny-bcl/outs/fastq_path/
drwxr-xr-x 3 jdoe jdoe         3 Aug  9 12:26 Reports
drwxr-xr-x 2 jdoe jdoe         8 Aug  9 12:26 Stats
drwxr-xr-x 3 jdoe jdoe         3 Aug  9 12:26 tiny-bcl
-rw-r--r-- 1 jdoe jdoe  20615106 Aug  9 12:26 Undetermined_S0_L001_I1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe  51499694 Aug  9 12:26 Undetermined_S0_L001_R1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 152692701 Aug  9 12:26 Undetermined_S0_L001_R2_001.fastq.gz
 
$ tree tiny-bcl/outs/fastq_path/tiny-bcl/
tiny-bcl/outs/fastq_path/tiny-bcl/
  Sample1
    Sample1_S1_L001_I1_001.fastq.gz
    Sample1_S1_L001_R1_001.fastq.gz
    Sample1_S1_L001_R2_001.fastq.gz

This example was produced with a sample sheet that included "tiny-bcl" as the Sample_Project, so the directory containing the sample folders is named tiny-bcl. If a Sample_Project wasn't specified, or if a simple layout CSV file was used (which does not have a Sample_Project column), the directory containing the sample folders would be named according to the flow cell ID instead.
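The FASTQ names follow the bcl2fastq convention visible in the listings above: sample name, S number, three-digit lane, read type (I1, R1, R2), and 001. The sketch below, with hypothetical function names, parses that pattern and collects each sample's FASTQs while skipping the Undetermined files. It assumes the standard .fastq.gz naming shown above and searches the project subdirectory whatever it is named.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple

FASTQ_NAME = re.compile(
    r"(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})_(?P<read>[RI]\d)_001\.fastq\.gz"
)


def parse_fastq_name(name: str) -> Optional[Tuple[str, int, int, str]]:
    """Split a bcl2fastq-style FASTQ name into (sample, S number, lane, read)."""
    match = FASTQ_NAME.fullmatch(name)
    if match is None:
        return None
    return match["sample"], int(match["number"]), int(match["lane"]), match["read"]


def fastqs_by_sample(fastq_path) -> Dict[str, List[Path]]:
    """Map sample name to its sorted FASTQ paths, ignoring Undetermined files."""
    groups = defaultdict(list)
    for path in Path(fastq_path).rglob("*.fastq.gz"):
        fields = parse_fastq_name(path.name)
        if fields is None or fields[0] == "Undetermined":
            continue
        groups[fields[0]].append(path)
    return {sample: sorted(paths) for sample, paths in groups.items()}


print(parse_fastq_name("Sample1_S1_L001_R2_001.fastq.gz"))
# On a real run: fastqs_by_sample("tiny-bcl/outs/fastq_path")
```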

If you want to remove the Undetermined FASTQs from the output to save space, you can run mkfastq with the --delete-undetermined flag. To see all supernova mkfastq options, run supernova mkfastq --help.
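To judge whether `--delete-undetermined` is worth using, you can total the Undetermined FASTQ sizes. In the listing above they come to 224,807,501 bytes, roughly 225 MB. A minimal sketch; undetermined_bytes is a hypothetical helper that looks only at files directly inside fastq_path.

```python
from pathlib import Path


def undetermined_bytes(fastq_path: str) -> int:
    """Total size of the Undetermined FASTQs directly inside fastq_path."""
    return sum(
        path.stat().st_size
        for path in Path(fastq_path).glob("Undetermined_*.fastq.gz")
    )


# Sizes (bytes) of the I1, R1 and R2 Undetermined files in the listing above.
listing = [20615106, 51499694, 152692701]
print(sum(listing))  # 224807501
# On a real run: undetermined_bytes("tiny-bcl/outs/fastq_path")
```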

Assessing Quality Control Metrics

When the --qc flag is specified, the supernova mkfastq pipeline writes both sequencing and 10x-specific quality control metrics to the JSON file outs/qc_summary.json.

The --qc flag is not supported on NovaSeq flow cells.

The qc_summary.json file contains a number of useful metrics. The sample_qc key is a good place to start exploring your data.

"sample_qc": {
  "Sample1": {
    "5": {
      "barcode_exact_match_ratio": 0.9336158258904611,
      "barcode_q30_base_ratio": 0.9611993091728814,
      "bc_on_whitelist": 0.9447542078230667,
      "mean_barcode_qscore": 37.770630795934,
      "number_reads": 2748155,
      "read1_q30_base_ratio": 0.8947676653366835,
      "read2_q30_base_ratio": 0.7771883245304577
    },
    "all": {
      "barcode_exact_match_ratio": 0.9336158258904611,
      "barcode_q30_base_ratio": 0.9611993091728814,
      "bc_on_whitelist": 0.9447542078230667,
      "mean_barcode_qscore": 37.770630795934,
      "number_reads": 2748155,
      "read1_q30_base_ratio": 0.8947676653366835,
      "read2_q30_base_ratio": 0.7771883245304577
    }
  }
}

The sample_qc value maps each sample in the sample sheet to a set of metrics structures: one per lane in which the sample appears (keyed by lane number), plus an 'all' structure that covers the sample across all of its lanes, which matters when a sample spans multiple lanes.
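This nesting (sample, then lane number or 'all', then metric name) can be walked with a few lines of Python. This is a sketch: the function names are hypothetical, the default path assumes the run used --id=tiny-bcl, and the inline example is a trimmed copy of the output above so the code runs without a real run.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def load_qc_summary(path: str = "tiny-bcl/outs/qc_summary.json") -> Dict[str, Any]:
    """Read qc_summary.json (the default path assumes --id=tiny-bcl)."""
    with open(path) as handle:
        return json.load(handle)


def sample_qc_rows(summary: Dict[str, Any]) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (sample, lane_or_all, metrics) for every entry under sample_qc."""
    for sample, per_lane in summary["sample_qc"].items():
        for lane, metrics in per_lane.items():
            yield sample, lane, metrics


example = {
    "sample_qc": {
        "Sample1": {
            "5": {"number_reads": 2748155, "bc_on_whitelist": 0.9447542078230667},
            "all": {"number_reads": 2748155, "bc_on_whitelist": 0.9447542078230667},
        }
    }
}
for sample, lane, metrics in sample_qc_rows(example):
    print(sample, lane, metrics["number_reads"], round(metrics["bc_on_whitelist"], 3))
```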

The metrics are as follows:

Key	Meaning
`barcode_exact_match_ratio`	The fraction of barcode sequences that exactly match a whitelisted 10x barcode.
`barcode_q30_base_ratio`	The fraction of barcode bases at or above Q30.
`bc_on_whitelist`	The fraction of barcode sequences that match a 10x barcode on the whitelist after error correction. Corresponds to the "Valid Barcodes" value in `supernova` output metrics.
`mean_barcode_qscore`	Mean quality score of barcode bases.
`number_reads`	Number of reads in the lane that match the sample's sample index (or across all lanes, for 'all').
`read1_q30_base_ratio`	The fraction of R1 bases at or above Q30.
`read2_q30_base_ratio`	The fraction of R2 bases at or above Q30.

By looking at this output, you can diagnose low barcode mapping rates and read quality before running a supernova pipeline.
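A quick way to act on these metrics is to flag any ratio that falls below a cut-off. The thresholds in this sketch are placeholders chosen for illustration, not 10x recommendations; substitute values that suit your data. flag_low_metrics is a hypothetical helper that takes the sample_qc object from qc_summary.json.

```python
from typing import Any, Dict, List, Tuple

# Placeholder cut-offs for illustration only; they are NOT 10x recommendations.
MIN_RATIOS = {
    "bc_on_whitelist": 0.80,
    "barcode_q30_base_ratio": 0.80,
    "read1_q30_base_ratio": 0.75,
    "read2_q30_base_ratio": 0.75,
}


def flag_low_metrics(
    sample_qc: Dict[str, Dict[str, Dict[str, Any]]],
    thresholds: Dict[str, float] = MIN_RATIOS,
) -> List[Tuple[str, str, str, float]]:
    """Return (sample, lane, metric, value) for every ratio below its cut-off."""
    flagged = []
    for sample, per_lane in sample_qc.items():
        for lane, metrics in per_lane.items():
            for key, minimum in thresholds.items():
                value = metrics.get(key)
                if value is not None and value < minimum:
                    flagged.append((sample, lane, key, value))
    return flagged


# Values copied from the "all" entry in the example output above.
sample_qc = {
    "Sample1": {
        "all": {
            "bc_on_whitelist": 0.9447542078230667,
            "barcode_q30_base_ratio": 0.9611993091728814,
            "read1_q30_base_ratio": 0.8947676653366835,
            "read2_q30_base_ratio": 0.7771883245304577,
        }
    }
}
print(flag_low_metrics(sample_qc))  # []
print(flag_low_metrics(sample_qc, {"read2_q30_base_ratio": 0.80}))
```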

Additional metrics in outs/qc_summary.json include per-cycle quality metrics, yield, cluster density, percentage of clusters passing filter, and both supernova and bcl2fastq version information.

Troubleshooting

If you encounter a crash while running supernova mkfastq, please upload the tarball (with the extension .mri.tgz) in your output directory:

supernova upload youremail@institution.edu jobid.mri.tgz

...where jobid is the value you passed to the --id option of mkfastq (if not specified, it defaults to the ID of the flowcell). This tarball contains numerous diagnostic logs that we can use for debugging.

You should then receive an automated email from 10x Genomics (if not, please email support@10xgenomics.com). For the fastest service, please respond with the following:

The exact command line you used.
The sample sheet that you used.
The RunInfo.xml and runParameters.xml files from your BCL directory.
The kind of libraries you are demultiplexing (including chemistry).
