# Running Multi-Library Samples

One scenario not covered by the built-in cellranger count FASTQ input options is that of multiple libraries from the same sample that need to be pooled and analyzed as a single sample. This is different from the usual case of combining different libraries representing different samples into a single analysis using cellranger aggr.

The cellranger count pipeline can combine multiple libraries from the same sample into a single cellranger run. To do so, you will have to write your own MRO file for the cellranger pipeline. MRO is the language used to define pipelines for the Martian pipeline framework, which is responsible for managing pipeline execution. The cellranger command is simply a shell script that converts command-line arguments into an MRO file, which it passes to the Martian pipeline execution command, cellranger mrp. Writing MROs directly gives you access to the full range of options available for each pipeline.

In the example below we describe how to construct an MRO file to specify multiple libraries as well as multiple flow cells (since most often the multiple libraries will have been sequenced on different runs).

## Understanding the Pipeline Invocation MRO

The easiest way to write your own MRO is to start with the MRO from a previous pipeline. Assuming you have already run a single-flowcell sample (e.g., sample345), examine the _invocation file contained in its output directory.

Note: this example assumes that the input flowcells were processed with cellranger mkfastq.

$ cat sample345/_invocation

@include "sc_rna_counter_cs.mro"

call SC_RNA_COUNTER_CS(
sample_id = "sample345",
sample_def = [
{
"fastq_mode": "ILMN_BCL2FASTQ",
"gem_group": null,
"lanes": null,
"read_path": "/home/jdoe/runs/HBA2TADXX",
"sample_indices": [ "any" ],
"sample_names": [ "Sample1" ]
}
],
sample_desc = "",
reference_path = "/opt/refdata-cellranger-GRCh38-1.2.0",
recovered_cells = null,
force_cells = null,
no_secondary_analysis = false,
)

The sample_def argument controls the parameters used to define this sample. It is a JSON-encoded list of maps, each of which defines:

• fastq_mode - set this to "ILMN_BCL2FASTQ"
• gem_group - indicates the GEM chip channel corresponding to a single sample across multiple flowcells. This field is described in more detail in the next section.
• lanes - a list of lanes from this flowcell to be included in this sample (e.g., [ 1, 2 ], [ 2 ], etc.) or null to use all lanes
• read_path - a directory containing FASTQs from a single flowcell
• sample_indices - set this to "any" when working with mkfastq output
• sample_names - a list of names associated with particular sample indices (as specified in the mkfastq sample sheet for this flowcell)

Make a copy of this _invocation file; this will be the MRO from which we will build our multi-library invocation MRO.

## Analyzing Multiple Libraries From A Single Sample Across Multiple Flowcells

Continuing with the example MRO above, we would make the following changes:

1. Give the sample an appropriate sample_id. This corresponds to the --id option used in a normal cellranger count analysis.
2. Duplicate the map contained in the sample_def definition as a second item in the sample_def list. Make sure that sample_names reflects the actual sample names encoded in the FASTQ filenames for the respective flow cells. This corresponds to the --sample option used in a normal cellranger count analysis.
3. Change the read_path for each of these sample_def objects to point to the locations of their respective FASTQ output directories. This corresponds to the --fastqs option used in a normal cellranger count analysis.
4. Change lanes and/or sample_names to reflect the flowcell configuration used in sequencing, if necessary.
5. Change gem_group to incrementally increasing integers, starting with 1. This is the part that the normal command-line options don't handle.

$ cp sample345/_invocation sample345-multi.mro
$ nano sample345-multi.mro
...
$ cat sample345-multi.mro

@include "sc_rna_counter_cs.mro"

call SC_RNA_COUNTER_CS(
sample_id = "sample345-multi",
sample_def = [
{
"fastq_mode": "ILMN_BCL2FASTQ",
"gem_group": 1,
"lanes": null,
"sample_indices": [ "any" ],
"sample_names": [ "Sample1" ]
},
{
"fastq_mode": "ILMN_BCL2FASTQ",
"gem_group": 2,
"lanes": null,
"sample_indices": [ "any" ],
"sample_names": [ "Sample1" ]
}
],
sample_desc = "",
reference_path = "/opt/refdata-cellranger-GRCh38-1.2.0",
recovered_cells = null,
force_cells = null,
no_secondary_analysis = false,
)
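
Steps 2 through 5 amount to building one sample_def map per library and numbering gem_group from 1. As a minimal sketch, assuming you would rather generate that list with a script than edit it by hand, the Python below builds it from a list of FASTQ directories and prints it as JSON that can be pasted into the MRO. The function name build_sample_def and the directory paths are hypothetical and are not part of Cell Ranger.

```python
import json


def build_sample_def(libraries):
    """Return a sample_def list with one map per library.

    `libraries` is a list of (read_path, sample_names) pairs, one per
    library. gem_group is assigned as 1, 2, 3, ... in list order.
    """
    sample_def = []
    for gem_group, (read_path, sample_names) in enumerate(libraries, start=1):
        sample_def.append({
            "fastq_mode": "ILMN_BCL2FASTQ",
            "gem_group": gem_group,
            "lanes": None,  # serialized as null: use all lanes
            "read_path": read_path,
            "sample_indices": ["any"],
            "sample_names": list(sample_names),
        })
    return sample_def


# Hypothetical FASTQ directories, for illustration only.
libraries = [
    ("/path/to/flowcell_A/fastqs", ["Sample1"]),
    ("/path/to/flowcell_B/fastqs", ["Sample1"]),
]

sample_def = build_sample_def(libraries)
print(json.dumps(sample_def, indent=4))
```

The printed JSON uses null for unset lanes, matching the form shown in the _invocation file. It is still worth checking the resulting MRO with cellranger mrc before running it.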


The cellular barcode sequences will include suffixes from the different gem groups, i.e. libraries.

AGAATGGTCTGCAT-1
CTGATCGATATCGA-1
GTAGCAACGTCGTA-2
AGAATGGTCTGCAT-2


This is how cellranger prevents the same barcode from different cells in different libraries from being erroneously combined into a single cell based only on the barcode sequence.
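
To illustrate why the suffix matters, here is a small sketch using the four example barcodes above. It splits each barcode into its sequence and gem group, then reports sequences that occur in more than one library. The helper name split_barcode is hypothetical, and the sketch assumes only the SEQUENCE-N format shown above.

```python
from collections import defaultdict


def split_barcode(barcode):
    """Split 'SEQUENCE-N' into (sequence, gem_group)."""
    sequence, _, suffix = barcode.rpartition("-")
    return sequence, int(suffix)


barcodes = [
    "AGAATGGTCTGCAT-1",
    "CTGATCGATATCGA-1",
    "GTAGCAACGTCGTA-2",
    "AGAATGGTCTGCAT-2",
]

# Collect the gem groups in which each raw sequence appears.
gem_groups_by_sequence = defaultdict(set)
for barcode in barcodes:
    sequence, gem_group = split_barcode(barcode)
    gem_groups_by_sequence[sequence].add(gem_group)

# Sequences seen in more than one library would collide without the suffix.
shared = {
    seq: sorted(groups)
    for seq, groups in gem_groups_by_sequence.items()
    if len(groups) > 1
}
print(shared)              # {'AGAATGGTCTGCAT': [1, 2]}
print(len(set(barcodes)))  # 4
```

The sequence AGAATGGTCTGCAT appears in both libraries, yet the suffixed barcodes remain four distinct identifiers.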

## Running Cell Ranger

Once you have this single-sample, multi-library, multi-flowcell MRO, confirm that its syntax is valid with cellranger mrc, the MRO compiler included with Cell Ranger:

$ cellranger mrc sample345-multi.mro
Successfully compiled 1 mro files.

Then run the MRO file using cellranger's alternate MRO-mode syntax:

$ cellranger count sample345-multi sample345-multi.mro --uiport=3600
Martian Runtime - 2.2.2
Serving UI at http://localhost:3600
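
If you run several multi-library samples, the two commands above can be assembled by a script. The following is a minimal sketch that only builds and prints the command lines shown in this section; it does not execute them. The helper name build_commands is hypothetical.

```python
import shlex


def build_commands(run_id, mro_path, ui_port=None):
    """Return (compile_cmd, run_cmd) as argument lists."""
    compile_cmd = ["cellranger", "mrc", mro_path]
    run_cmd = ["cellranger", "count", run_id, mro_path]
    if ui_port is not None:
        run_cmd.append(f"--uiport={ui_port}")
    return compile_cmd, run_cmd


compile_cmd, run_cmd = build_commands(
    "sample345-multi", "sample345-multi.mro", ui_port=3600
)
print(shlex.join(compile_cmd))
print(shlex.join(run_cmd))
```

Each list can be passed to subprocess.run(..., check=True), assuming cellranger is on your PATH, so that a failed syntax check stops the script before the pipeline is launched.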