HOME  ›   pipelines
If your question is not answered here, please email us at:  ${email.software}

10x Genomics
Visium Spatial Gene Expression

Generating FASTQs with spaceranger mkfastq

Table of Contents

Overview

The spaceranger workflow starts by demultiplexing the Illumina sequencer's base call files (BCLs) for each flow cell directory into FASTQ files. 10x has developed spaceranger mkfastq, a pipeline that wraps
Illumina's bcl2fastq and provides a number of convenient features in addition to the features of bcl2fastq:

Example workflows

In this example, there are two 10x libraries (each processed through a separate capture area) that are multiplexed on a single flow cell. Note that after running spaceranger mkfastq, we run a separate instance of the spaceranger pipeline on each library.

two libraries, one flowcell

In this example, one 10x library is sequenced on two flow cells. Note that after running spaceranger mkfastq, we run a single instance of the spaceranger pipeline on all the FASTQ files generated.

one library, two flowcells

Command line arguments

spaceranger mkfastq accepts additional options beyond those shown in the table below because it is a wrapper around bcl2fastq. Consult the User Guide for Illumina's bcl2fastq for more information.

ParameterFunction
--runRequired. The path of Illumina BCL run folder.
--idOptional. Name of the folder created by mkfastq. If this is not specified the output folder name defaults to the name of the flow cell referred to by --run.
--samplesheetOptional. Path to an Illumina Experiment Manager-compatible sample sheet which contains 10x sample dual-index names (e.g., SI-TT-A12) in the sample index column. All other information, such as sample names and lanes, should be in the sample sheet.
--sample-sheetOptional. Equivalent to --samplesheet above.
--csvOptional. Path to a simple CSV with lane, sample, and index columns, which describe the way to demultiplex the flow cell. The index column should contain a 10x sample dual-index name (e.g., SI-TT-A12). This is an alternative to the Illumina IEM sample sheet, and will be ignored if --samplesheet is specified.
--simple-csvOptional. Equivalent to --csv above.
--filter-dual-indexOptional. Only demultiplex samples identified by i7/i5 dual-indices (e.g., SI-TT-A6), ignoring single-index samples. Single-index samples will not be demultiplexed. Also notice that spaceranger will run single-index data, but it is not supported.
--lanesbcl2fastq option. Comma-delimited series of lanes to demultiplex (e.g. 1,3). Use this if you have a sample sheet for an entire flow cell but only want to generate a few lanes for further 10x analysis.
--use-bases-maskbcl2fastq option. Same meaning as for bcl2fastq. Use to clip extra bases off a read if you ran extra cycles for QC.
--delete-undeterminedbcl2fastq option. Delete the Undetermined FASTQs generated by bcl2fastq. Useful if you are demultiplexing a small number of samples from a large flow cell.
--barcode-mismatchesbcl2fastq option. Same meaning as for bcl2fastq. Use this option to change the number of allowed mismatches per index adapter (0, 1, 2). Default: 1.
--output-dirbcl2fastq option. Generate FASTQ output in a path of your own choosing, instead of flowcell_id/outs/fastq_path.
--projectbcl2fastq option. Custom project name, to override the sample sheet or to use in conjunction with the --csv argument.
--jobmodeOptional. Job manager to use. Valid options: local (default), sge, lsf, or a .template file.
--localcoresRecommended when run in localmode. Set max cores the pipeline may request at one time. Only applies when --jobmode=local.
--localmemRecommended when run in localmode. Set max GB the pipeline may request at one time. Only applies when --jobmode=local.

Example data

spaceranger mkfastq recognizes two file formats for describing samples: a simple, three-column CSV format, or the Illumina Experiment Manager (IEM) sample sheet format used by bcl2fastq. There is an example below for running mkfastq with each format.

The example (tiny-bcl) dataset is solely designed to demo the spaceranger mkfastq pipeline. It cannot be used to run downstream pipelines (e.g. spaceranger count).

To follow along, do the following:

  1. Download the tiny-bcl tar file.
  2. Untar the tiny-bcl tar file in a convenient location. This will create a new tiny-bcl subdirectory.
  3. Download the simple CSV layout file: spaceranger-tiny-bcl-simple-1.0.0.csv.
  4. Download the Illumina Experiment Manager sample sheet: spaceranger-tiny-bcl-samplesheet-1.0.0.csv.

Running mkfastq with a Simple CSV sample sheet

A simple CSV sample sheet is recommended for most sequencing experiments. The simple CSV format has only three columns (Lane, Sample, Index), and is thus less prone to formatting errors. You can see an example of this in spaceranger-tiny-bcl-simple-1.0.0.csv:

Lane,Sample,Index
1,test_sample,SI-TT-D9

Here are the options for each column:

LaneWhich lane(s) of the flow cell to process. Can be either a single lane, a range (e.g., 2-4) or '*' for all lanes in the flow cell.
SampleThe name of the sample. This name is the prefix to all the generated FASTQs, and corresponds to the --sample argument in all downstream 10x pipelines.
Sample names must conform to the Illumina bcl2fastq naming requirements. Only letters, numbers, underscores, and hyphens are allowed; no other symbols, including dots ("."), are allowed.
IndexThe 10x sample dual-index that was used in library construction, e.g., SI-TT-D9.

To run mkfastq with a simple layout CSV, use the --csv argument. Here's how to run mkfastq on the tiny-bcl sequencing run with the simple layout (replace code in red with the path to tiny_bcl on your system):

$ spaceranger mkfastq --id=tiny-bcl \
                     --run=/path/to/tiny_bcl \
                     --csv=spaceranger-tiny-bcl-simple-1.0.0.csv

Running mkfastq with an Illumina Experiment Manager sample sheet

The spaceranger mkfastq pipeline can also be run with a sample sheet in the Illumina Experiment Manager (IEM) format (example: spaceranger-tiny-bcl-samplesheet-1.0.0.csv). An IEM sample sheet has several fields specific to running on Illumina platforms, including a [Data] section where sample and index information is specified. spaceranger mkfastq supports listing either index set names or the oligo sequences.

Dual-indexing example

Version 1: "SI-TT-D9" refers to a 10x Genomics dual-index sample index, so mkfastq auto-detects that this is a dual-index sample. In this example, only reads from lane 1 will be used. To demultiplex the given sample index across all lanes, omit the Lane column entirely.

[Data]
Lane,Sample_ID,index
1,test_sample,SI-TT-D9

Version 2: The index sequences for "SI-TT-D9" are specified in the two index and index2 columns.

[Data]
Lane,Sample_ID,index,index2
1,test_sample,TGGTCCCAAG,ACGCCAGAGG

Sample names must conform to the Illumina bcl2fastq naming requirements. Specifically only letters, numbers, underscores, and hyphens are allowed. No other symbols, including dots ("."), are allowed.

Also note that while an authentic IEM sample sheet will contain other sections above the [Data] section, these are optional for demultiplexing. To avoid data loss from trimming, we do not recommend including adapter sequences in the [Settings] section of the sample sheet (see this article for details). For demultiplexing an existing run with spaceranger mkfastq, only the [Data] section is required.

Next, run the spaceranger mkfastq pipeline, using the --samplesheet argument (replace code in red with the path to tiny_bcl on your system):

$ spaceranger mkfastq --id=tiny-bcl \
                     --run=/path/to/tiny_bcl \
                     --samplesheet=spaceranger-tiny-bcl-samplesheet-1.0.0.csv

If you encounter any preflight errors, refer to the Troubleshooting page.

Checking FASTQ output

Once the spaceranger mkfastq pipeline has successfully completed, the output can be found in a new folder named with the value you provided to spaceranger mkfastq in the --id option (if not specified, defaults to the name of the flow cell):

$ ls -l
drwxr-xr-x 4 jdoe  jdoe     4096 Nov 14 12:05 tiny-bcl

The key output files can be found in outs/fastq_path, and are organized in the same manner as a conventional bcl2fastq run:

$ ls -l tiny-bcl/outs/fastq_path/
drwxr-xr-x 3 jdoe jdoe         3 Nov  14 12:26 Reports
drwxr-xr-x 2 jdoe jdoe         8 Nov  14 12:26 Stats
drwxr-xr-x 3 jdoe jdoe         3 Nov  14 12:26 tiny-bcl
-rw-r--r-- 1 jdoe jdoe  20615106 Nov  14 12:26 Undetermined_S0_L001_I1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe  51499694 Nov  14 12:26 Undetermined_S0_L001_R1_001.fastq.gz
-rw-r--r-- 1 jdoe jdoe 152692701 Nov  14 12:26 Undetermined_S0_L001_R2_001.fastq.gz
 
$ tree tiny-bcl/outs/fastq_path/tiny_bcl/
tiny-bcl/outs/fastq_path/tiny_bcl/
  Sample1
    Sample1_S1_L001_I1_001.fastq.gz
    Sample1_S1_L001_R1_001.fastq.gz
    Sample1_S1_L001_R2_001.fastq.gz

This example was produced with a sample sheet that included tiny-bcl as the Sample_Project, so the directory containing the sample folders is called tiny-bcl/. If a Sample_Project was not specified, or if a simple layout CSV file was used (which does not have a Sample_Project column), the directory containing the sample folders would be named according to the flow cell ID instead.

If you want to remove the Undetermined FASTQs from the output to save space, you can run mkfastq with the --delete-undetermined flag. To see all spaceranger mkfastq options, run spaceranger mkfastq --help.

Troubleshooting

If you encounter a crash while running spaceranger mkfastq, upload the tarball
(with the extension .mri.tgz) in your output directory. Replace the code in red with your email:

$ spaceranger upload [email protected] jobid.mri.tgz

where jobid is what you input into the --id option of mkfastq (if not specified, defaults to the ID of the flow cell). This tarball contains numerous diagnostic logs that we can use for debugging.

You will receive an automated email from 10x. If not, email [email protected]. For the fastest service, respond with the following: