10x Genomics
Chromium Single Cell Immune Profiling

Cell Ranger 7.1, printed on 03/27/2025

Specifying Input FASTQ Files for cellranger multi

The cellranger pipeline requires FASTQ files as input, which typically come from running cellranger mkfastq, a 10x-aware convenience wrapper for bcl2fastq. However, it is possible to use FASTQ files from other sources, such as Illumina's bcl2fastq, Illumina's BCL Convert, a published dataset, or our bamtofastq.

Here are the columns available in the [libraries] section of the multi config CSV for specifying which FASTQ files cellranger multi should use:

Column	Brief Description
`fastq_id`	(Required) The Illumina sample name to analyze. This will be as specified in the sample sheet supplied to `mkfastq` or `bcl2fastq`. Multiple names may be supplied as a comma-separated list, in which case they will be treated as one sample.
`fastqs`	(Required) The folder containing the FASTQ files to be analyzed. Generally, this will be the `fastq_path` folder generated by `cellranger mkfastq`.
`feature_types`	(Required) The underlying feature type of the library, which must be one of 'Gene Expression', 'VDJ', 'VDJ-T', 'VDJ-B', 'Antibody Capture', 'CRISPR Guide Capture', or 'Antigen Capture'.
`lanes`	(Optional) Lanes associated with this sample. Defaults to using all lanes.

FASTQ file naming convention

bcl2fastq and mkfastq can be invoked in many ways, so the output file names and locations vary widely.

To serve as inputs for cellranger, FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq:

[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz

Where Read Type is one of:

I1: Sample index read (optional)
I2: Sample index read (optional)
R1: Read 1
R2: Read 2

The FASTQ files are specified by providing the path to the folder containing them (via the fastqs column) and their Illumina sample name (via the fastq_id column) and optionally restricting the selection further by specifying the lanes of interest.
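
To check whether a file name follows this convention before pointing cellranger multi at a folder, a small parser can help. The following is a minimal sketch, not part of Cell Ranger: the function name `parse_fastq_name` and the regular expression are assumptions written from the naming pattern above, and the sketch assumes the number after `_S` may be any integer, as in the `_S2_` files that bcl2fastq writes for a second sample.

```python
import re

# Regular expression for the bcl2fastq/mkfastq naming convention:
# [Sample Name]_S[n]_L00[Lane Number]_[Read Type]_001.fastq.gz
# Assumption: the sample number and lane are plain digits.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>I1|I2|R1|R2)_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["sample_number"] = int(parts["sample_number"])
    parts["lane"] = int(parts["lane"])
    return parts


print(parse_fastq_name("test_sample1_S1_L001_R1_001.fastq.gz"))
# {'sample': 'test_sample1', 'sample_number': 1, 'lane': 1, 'read': 'R1'}
print(parse_fastq_name("read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz"))
# None
```

The `sample` value is what you would enter in the fastq_id column, and `lane` is the number you would list in the lanes column.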

Finding the right FASTQ files to process and the right arguments to process those files as desired can be confusing. To assist users, this page illustrates examples of how to handle common scenarios involving different FASTQ file folder hierarchies or naming conventions.

Quick Start:

Where are your FASTQ files?

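In an output folder from mkfastq or bcl2fastq (fastq_path) and:
- In a subdirectory next to Reports and Stats folders, with expected sample name prefixes
- There are multiple folders per sample index, like "SI-GA-A1_1" and "SI-GA-A1_2"
- In the same directory as the Reports and Stats folders
In a different folder:
- I don't see Reports or Stats anywhere. The files are named like "MySample_S1_L001_I1_001.fastq.gz"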

How are they named?

Consistent with bcl2fastq/mkfastq, e.g. "mysample_S1_L001_I1_001.fastq.gz" (see above).
Like "read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz".
Unlike any of the above examples.

Scenario: My FASTQs are in an output folder from mkfastq or bcl2fastq, in a subdirectory next to Reports and Stats folders, with expected sample name prefixes

How did I get here?

By running mkfastq with a simple CSV layout file or Illumina Experiment Manager samplesheet, or by running bcl2fastq directly (with an IEM samplesheet) on a flowcell. If you ran mkfastq, your files will be in a (MKFASTQ_ID)/outs/fastq_path folder, and your file hierarchy probably looks something like this:

MKFASTQ_ID
|-- MAKE_FASTQS_CS
|-- outs
    |-- fastq_path
        |-- HFLC5BBXX
            |-- test_sample1
            |   |-- test_sample1_S1_L001_I1_001.fastq.gz
            |   |-- test_sample1_S1_L001_R1_001.fastq.gz
            |   |-- test_sample1_S1_L001_R2_001.fastq.gz
            |   |-- test_sample1_S1_L002_I1_001.fastq.gz
            |   |-- test_sample1_S1_L002_R1_001.fastq.gz
            |   |-- test_sample1_S1_L002_R2_001.fastq.gz
            |   |-- test_sample1_S1_L003_I1_001.fastq.gz
            |   |-- test_sample1_S1_L003_R1_001.fastq.gz
            |   |-- test_sample1_S1_L003_R2_001.fastq.gz
            |-- test_sample2
            |   |-- test_sample2_S2_L001_I1_001.fastq.gz
            |   |-- test_sample2_S2_L001_R1_001.fastq.gz
            |   |-- test_sample2_S2_L001_R2_001.fastq.gz
            |   |-- test_sample2_S2_L002_I1_001.fastq.gz
            |   |-- test_sample2_S2_L002_R1_001.fastq.gz
            |   |-- test_sample2_S2_L002_R2_001.fastq.gz
            |   |-- test_sample2_S2_L003_I1_001.fastq.gz
            |   |-- test_sample2_S2_L003_R1_001.fastq.gz
            |   |-- test_sample2_S2_L003_R2_001.fastq.gz
        |-- Reports
        |-- Stats
        |-- Undetermined_S0_L001_I1_001.fastq.gz
        ...
        |-- Undetermined_S0_L003_R2_001.fastq.gz

If you ran bcl2fastq directly, then the output root folder would be where fastq_path is in the hierarchy above.

"Expected sample name prefixes" means you have one set of FASTQ files per sample, prefixed with the name of the sample as it appears in the simple CSV layout file or IEM samplesheet. Other situations described later on this page deal with the presence of four separate sets of files (four "samples" from bcl2fastq's point of view) per single biological sample/library.

For more information on the naming conventions, please visit Illumina's support site or refer to the bcl2fastq User Guide. The scenario where your files do not conform to the naming convention is described in a different section later on this page.

The table below describes the arguments you would pass into any analysis pipeline to target the right fastq files in this scenario. Be sure to substitute the capitalized text as appropriate. Also note that in most cases you will be passing a single sample into any given pipeline. Exceptions to this are described in the documentation for the individual pipelines.

Situation	[libraries] section of multi config CSV
Gene Expression and V(D)J (mkfastq), one flowcell	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`test_sample1,MKFASTQ_ID/outs/fastq_path,Gene Expression`
	`test_sample2,MKFASTQ_ID/outs/fastq_path,VDJ`
Gene Expression and V(D)J (mkfastq), multiple flowcells	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`test_sample1,MKFASTQ_ID/outs/fastq_path1,Gene Expression`
	`test_sample2,MKFASTQ_ID/outs/fastq_path2,VDJ`
Gene Expression and V(D)J (bcl2fastq direct)	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`test_sample1,/PATH/TO/bcl2fastq_output,Gene Expression`
	`test_sample2,/PATH/TO/bcl2fastq_output,VDJ`
Gene Expression and V(D)J from lanes 1 and 3 only (mkfastq)	`[libraries]`
	`fastq_id,fastqs,lanes,feature_types`
	`test_sample1,MKFASTQ_ID/outs/fastq_path,1|3,Gene Expression`
	`test_sample2,MKFASTQ_ID/outs/fastq_path,1|3,VDJ`
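
In the config file itself, the `[libraries]` header, the column names, and each library go on separate lines. If you assemble this section from a script, a minimal sketch could look like the one below. The helper name `libraries_section` and the dictionary layout are assumptions made for illustration, and the sketch assumes that multiple lanes are joined with a `|` character, as in the lanes 1 and 3 example above.

```python
def libraries_section(rows):
    """Build the text of a [libraries] section from a list of dicts.

    Each dict needs "fastq_id", "fastqs" and "feature_types". "lanes" is an
    optional list of lane numbers; give it for every row or for none.
    """
    with_lanes = [bool(row.get("lanes")) for row in rows]
    if any(with_lanes) and not all(with_lanes):
        raise ValueError("give lanes for every row or for none")
    use_lanes = any(with_lanes)

    header = ["fastq_id", "fastqs"] + (["lanes"] if use_lanes else []) + ["feature_types"]
    lines = ["[libraries]", ",".join(header)]
    for row in rows:
        fields = [row["fastq_id"], row["fastqs"]]
        if use_lanes:
            # Assumption: lanes are separated by "|" within the lanes column.
            fields.append("|".join(str(lane) for lane in row["lanes"]))
        fields.append(row["feature_types"])
        lines.append(",".join(fields))
    return "\n".join(lines) + "\n"


print(libraries_section([
    {"fastq_id": "test_sample1", "fastqs": "MKFASTQ_ID/outs/fastq_path",
     "lanes": [1, 3], "feature_types": "Gene Expression"},
    {"fastq_id": "test_sample2", "fastqs": "MKFASTQ_ID/outs/fastq_path",
     "lanes": [1, 3], "feature_types": "VDJ"},
]))
# [libraries]
# fastq_id,fastqs,lanes,feature_types
# test_sample1,MKFASTQ_ID/outs/fastq_path,1|3,Gene Expression
# test_sample2,MKFASTQ_ID/outs/fastq_path,1|3,VDJ
```

The sketch only produces the `[libraries]` section; the other sections of a multi config CSV are outside its scope.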

Scenario: My FASTQs are in an output folder from mkfastq or bcl2fastq, but there are multiple folders per sample index, like "SI-GA-A1_1" and "SI-GA-A1_2"

How did I get here?

It is likely that an input samplesheet was used that explicitly separated the four oligos in a 10x sample index set into four separate sample names. You may see a file hierarchy like this:

bcl2fastq_output
|-- HFLC5BBXX
    |-- SI-GA-A1_1
    |   |-- SI-GA-A1_1_S1_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_1_S1_L001_R1_001.fastq.gz
    |   |-- SI-GA-A1_1_S1_L001_R2_001.fastq.gz
    |-- SI-GA-A1_2
    |   |-- SI-GA-A1_2_S2_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_2_S2_L001_R1_001.fastq.gz
    |   |-- SI-GA-A1_2_S2_L001_R2_001.fastq.gz
    |-- SI-GA-A1_3
    |   |-- SI-GA-A1_3_S3_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_3_S3_L001_R1_001.fastq.gz
    |   |-- SI-GA-A1_3_S3_L001_R2_001.fastq.gz
    |-- SI-GA-A1_4
    |   |-- SI-GA-A1_4_S4_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_4_S4_L001_R1_001.fastq.gz
    |   |-- SI-GA-A1_4_S4_L001_R2_001.fastq.gz
|-- Reports
|-- Stats
|-- Undetermined_S0_L001_I1_001.fastq.gz
|-- Undetermined_S0_L001_R1_001.fastq.gz
|-- Undetermined_S0_L001_R2_001.fastq.gz

You probably want to merge all samples from the SI-GA-A1 index into a single analysis. If you only run one index at a time, you will see fewer reads than expected, which may translate to lower coverage or a lower cell count than you expect for your experiment.

Situation	[libraries] section of multi config CSV
Process all `SI-GA-A1` reads in a single analysis	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`SI-GA-A1_1,MKFASTQ_ID/outs/fastq_path,Gene Expression`
	`SI-GA-A1_2,MKFASTQ_ID/outs/fastq_path,Gene Expression`
	`SI-GA-A1_3,MKFASTQ_ID/outs/fastq_path,Gene Expression`
	`SI-GA-A1_4,MKFASTQ_ID/outs/fastq_path,Gene Expression`
Only process first sample index	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`SI-GA-A1_1,MKFASTQ_ID/outs/fastq_path,Gene Expression`
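
To avoid missing one of the four sample names, you can list the names that actually appear under the output folder. The following is a rough sketch under stated assumptions: `sample_names_from` and `sample_names_in` are hypothetical helpers, not Cell Ranger functions, and they only read file names according to the naming convention described on this page.

```python
import re
from pathlib import Path

# Sample name is everything before "_S<number>_L<lane>_<read>_001.fastq.gz".
NAME = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_(?:I1|I2|R1|R2)_001\.fastq\.gz$")


def sample_names_from(filenames, prefix=""):
    """Return the sorted sample names found in a list of FASTQ file names."""
    names = set()
    for filename in filenames:
        match = NAME.match(filename)
        if match is None:
            continue  # not named in the bcl2fastq convention
        sample = match["sample"]
        # "Undetermined" files hold reads that were not assigned to a sample.
        if sample != "Undetermined" and sample.startswith(prefix):
            names.add(sample)
    return sorted(names)


def sample_names_in(fastq_dir, prefix=""):
    """Same as sample_names_from, but walks a folder and its subfolders."""
    paths = Path(fastq_dir).rglob("*.fastq.gz")
    return sample_names_from([path.name for path in paths], prefix)


files = [
    "SI-GA-A1_1_S1_L001_R1_001.fastq.gz",
    "SI-GA-A1_2_S2_L001_R1_001.fastq.gz",
    "SI-GA-A1_3_S3_L001_R1_001.fastq.gz",
    "SI-GA-A1_4_S4_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
]
print(sample_names_from(files, prefix="SI-GA-A1_"))
# ['SI-GA-A1_1', 'SI-GA-A1_2', 'SI-GA-A1_3', 'SI-GA-A1_4']
```

Each name in the result would get its own row in the `[libraries]` section, all with the same fastqs path and feature type.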

Scenario: My FASTQs are in an output folder from mkfastq or bcl2fastq, in the same directory as the Reports and Stats folders

How did I get here?

An Illumina Experiment Manager-formatted samplesheet was used with either no entry or a blank entry for the Sample_Project column. Your hierarchy likely looks something like this:

fastq_path
|-- Reports
|-- Stats
|-- test_sample_S1_L001_I1_001.fastq.gz
|-- test_sample_S1_L001_R1_001.fastq.gz
|-- test_sample_S1_L001_R2_001.fastq.gz
|-- test_sample_S1_L002_I1_001.fastq.gz
|-- test_sample_S1_L002_R1_001.fastq.gz
|-- test_sample_S1_L002_R2_001.fastq.gz
|-- test_sample_S1_L003_I1_001.fastq.gz
|-- test_sample_S1_L003_R1_001.fastq.gz
|-- test_sample_S1_L003_R2_001.fastq.gz
|-- Undetermined_S0_L001_I1_001.fastq.gz
...
|-- Undetermined_S0_L003_R2_001.fastq.gz

This is fine; you would use the same arguments as if the FASTQs were organized into subfolders within the output folder.

Situation	[libraries] section of multi config CSV
Process `test_sample` from all lanes (mkfastq)	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`test_sample,MKFASTQ_ID/outs/fastq_path,Gene Expression`
Process `test_sample` from lane 1 only (mkfastq)	`[libraries]`
	`fastq_id,fastqs,lanes,feature_types`
	`test_sample,MKFASTQ_ID/outs/fastq_path,1,Gene Expression`

Scenario: My FASTQs are in a different folder; I don't see Reports or Stats anywhere. The files are named like "MySample_S1_L001_I1_001.fastq.gz"

How did I get here?

It is likely that the FASTQ files were transferred from a mkfastq or bcl2fastq run into another folder. They still retain the names assigned by bcl2fastq, which are a combination of sample name, sample order, lane, read type, and chunk. Your file hierarchy may look like this:

PROJECT_FOLDER
|-- MySample_S1_L001_I1_001.fastq.gz
|-- MySample_S1_L001_R1_001.fastq.gz
|-- MySample_S1_L001_R2_001.fastq.gz
|-- MySample_S1_L002_I1_001.fastq.gz
|-- MySample_S1_L002_R1_001.fastq.gz
|-- MySample_S1_L002_R2_001.fastq.gz

This is fine; since the files are named according to the bcl2fastq standard, you would use the same arguments as if the FASTQs were organized into a flowcell folder or mkfastq output folder.

Situation	[libraries] section of multi config CSV
Process `MySample` from all lanes	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`MySample,/PATH/TO/PROJECT_FOLDER,Gene Expression`
Process `MySample` from lane 1 only	`[libraries]`
	`fastq_id,fastqs,lanes,feature_types`
	`MySample,/PATH/TO/PROJECT_FOLDER,1,Gene Expression`
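
Before launching a run, it can be worth confirming that the fastq_id and lanes you plan to enter actually match files in the folder. The sketch below is one way to do that, under the assumption that the fastq_id must equal the sample name part of the file name exactly and that each selected lane needs both an R1 and an R2 file. The function names are hypothetical.

```python
import re
from collections import defaultdict

NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>I1|I2|R1|R2)_001\.fastq\.gz$"
)


def reads_by_lane(filenames, fastq_id):
    """Map each lane number to the read types present for one sample name."""
    lanes = defaultdict(set)
    for filename in filenames:
        match = NAME.match(filename)
        if match and match["sample"] == fastq_id:
            lanes[int(match["lane"])].add(match["read"])
    return dict(lanes)


def missing_reads(filenames, fastq_id, lanes=None):
    """Return {lane: [missing read types]} for the selected lanes.

    With lanes=None, every lane found for the sample is checked. An empty
    result means R1 and R2 were found for every checked lane.
    """
    found = reads_by_lane(filenames, fastq_id)
    wanted = sorted(found) if lanes is None else lanes
    problems = {}
    for lane in wanted:
        missing = {"R1", "R2"} - found.get(lane, set())
        if missing:
            problems[lane] = sorted(missing)
    return problems


project_folder = [
    "MySample_S1_L001_I1_001.fastq.gz",
    "MySample_S1_L001_R1_001.fastq.gz",
    "MySample_S1_L001_R2_001.fastq.gz",
    "MySample_S1_L002_I1_001.fastq.gz",
    "MySample_S1_L002_R1_001.fastq.gz",
    "MySample_S1_L002_R2_001.fastq.gz",
]
print(missing_reads(project_folder, "MySample"))               # {}
print(missing_reads(project_folder, "MySample", lanes=[1, 3]))  # {3: ['R1', 'R2']}
print(missing_reads(project_folder, "test_sample", lanes=[1]))  # {1: ['R1', 'R2']}
```

The last call shows what a mistyped sample name looks like: no file matches, so every requested lane is reported as missing both reads.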

My FASTQs are named like "read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz"

How did I get here?

The 10x demux pipeline was used to demultiplex the flowcell instead of mkfastq. This pipeline has been deprecated and cellranger no longer directly supports using FASTQ files in this layout. Please contact support@10xgenomics.com for assistance.

My FASTQs are not named like any of the above examples.

How did I get here?

It is likely that you received files that were processed through a proprietary LIMS, which employs its own naming conventions.

10x pipelines need files named in the bcl2fastq convention in order to run properly. You will need to determine which file corresponds to which sample and which read type, likely by consulting your sequencing core or the individual who demultiplexed your flowcell.

It is highly likely that these files were initially processed with bcl2fastq, so you will need to rename the files in the following format, once you track down their origin:

[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz

Where Read Type is one of:

I1: Sample index read (optional)
I2: Sample index read (optional)
R1: Read 1
R2: Read 2
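
Once you know which sample, lane, and read type each file holds, building the new names is mechanical. Here is a minimal sketch of that step. The helpers `conforming_name` and `rename_plan` are hypothetical, and the LIMS-style file names in the example are invented purely for illustration. The sketch only computes the new names; it does not touch any files.

```python
READ_TYPES = {"I1", "I2", "R1", "R2"}


def conforming_name(sample, lane, read_type, sample_number=1):
    """Build a file name in the bcl2fastq convention described above."""
    if read_type not in READ_TYPES:
        raise ValueError(f"unknown read type: {read_type!r}")
    if not isinstance(lane, int) or lane < 1:
        raise ValueError(f"lane must be a positive integer, got {lane!r}")
    return f"{sample}_S{sample_number}_L{lane:03d}_{read_type}_001.fastq.gz"


def rename_plan(assignments):
    """Map each existing file name to its conforming name.

    assignments maps old file name -> (sample, lane, read_type), which you
    fill in after consulting whoever demultiplexed the flowcell.
    """
    plan = {old: conforming_name(*parts) for old, parts in assignments.items()}
    if len(set(plan.values())) != len(plan):
        raise ValueError("two files would be renamed to the same name")
    return plan


# Hypothetical LIMS-style names, invented for this example.
plan = rename_plan({
    "LIMS0001_lane1_read1.fq.gz": ("SAMPLENAME", 1, "R1"),
    "LIMS0001_lane1_read2.fq.gz": ("SAMPLENAME", 1, "R2"),
})
for old, new in plan.items():
    print(old, "->", new)
# LIMS0001_lane1_read1.fq.gz -> SAMPLENAME_S1_L001_R1_001.fastq.gz
# LIMS0001_lane1_read2.fq.gz -> SAMPLENAME_S1_L001_R2_001.fastq.gz
```

Reviewing the plan before renaming anything makes it easier to catch a swapped R1 and R2 or a duplicated lane.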

After you have renamed those files into that format, you'll use the following arguments:

Situation	[libraries] section of multi config CSV
Process `SAMPLENAME` from all lanes	`[libraries]`
	`fastq_id,fastqs,feature_types`
	`SAMPLENAME,/PATH/TO/PROJECT_FOLDER,Gene Expression`
Process `SAMPLENAME` from lane 1 only	`[libraries]`
	`fastq_id,fastqs,lanes,feature_types`
	`SAMPLENAME,/PATH/TO/PROJECT_FOLDER,1,Gene Expression`
