10x Genomics
Chromium Genome & Exome

Running Multi-Library Samples

Aggregation and Custom Configuration

Long Ranger can be used to analyze multiple libraries. Depending on your exact scenario, the approach may be different. Here are two different circumstances:

  1. You created one library, but sequenced it more than once, for example to increase coverage depth. You might have sequenced the library across different lanes of the same flowcell, or across multiple flowcells. In any of these cases, provided that all of the data comes from a single library, you can analyze the runs as a single sample using longranger wgs or longranger targeted, as described in Specifying Input FASTQs.

  2. You created multiple libraries from the same sample. If it is necessary to pool multiple libraries and analyze them as a single sample, you can do that. It just requires learning a little more about exactly how Long Ranger works, and writing a customized pipeline configuration file, called an MRO. This page covers this final scenario.

What is an MRO file?

Long Ranger uses a pipeline management framework called Martian. The pipeline and each stage in it are specified by a configuration file called an MRO file. Usually, the longranger commands create the appropriate MRO files for you, but if you want to do something outside the normal workflows, you can write a custom MRO file to directly exercise the full range of features.

In the example below we describe how to construct an MRO file to specify multiple libraries as well as multiple flow cells (since most often the multiple libraries will have been sequenced on different runs).

Understanding the Pipeline Invocation MRO

The easiest way to write your own MRO is to start with the MRO from a previous pipeline run. Assuming you have already run an analysis with the same pipeline (wgs or targeted, as appropriate) on a single-flowcell sample (e.g., sample345), examine the _invocation file contained in its output directory.

Note: this example assumes that the input flowcells were processed with longranger mkfastq.

$ cat sample345/_invocation
 
@include "phaser_svcaller_cs.mro"
 
call PHASER_SVCALLER_CS(
    fastq_mode = "BCL_PROCESSOR",
    sample_id = "sample345",
    sample_def = [
        {
            "bc_in_read": 1,
            "bc_length": 16,
            "gem_group": null,
            "lanes": null,
            "read_path": "/home/jdoe/runs/HBA2TADXX",
            "sample_indices": [ "any" ]
        }
    ],
    reference_path = "/opt/refdata-hg19-2.1.0",
    sample_desc = "",
    sex = "f",
    targets = null,
    vc_mode = "freebayes",
    vc_ground_truth = null,
    restrict_locus = null
)

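The sample_def argument controls the parameters used to define this sample. It is a JSON-encoded list of maps; each map describes one set of input data using the fields shown in the example above, such as read_path, lanes, sample_indices, and gem_group.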

Make a copy of this _invocation file; this will be the MRO from which we will build our multi-library invocation MRO.

Analyzing Multiple Libraries From A Single Sample Across Multiple Flowcells

Continuing with the example MRO above, we would make the following changes:

  1. Give the sample an appropriate sample_id. This corresponds to the --id option used in a normal longranger wgs analysis.
  2. Duplicate the dict contained in the sample_def definition as a second item in the sample_def list. Make sure that sample_names reflects the actual sample names encoded in the FASTQ filenames for the respective flow cells. This corresponds to the --sample option used in a normal longranger wgs analysis.
  3. Change the read_path for each of these sample_def objects to point to the locations of their respective FASTQ output directories. This corresponds to the --fastqs option used in a normal longranger wgs analysis.
  4. Change lanes and/or sample_names to reflect the flowcell configuration used in sequencing, if necessary.
  5. Change the gem_group values to incrementally increasing integers, starting with 1. This is the part that command line options don't handle.

$ cp sample345/_invocation sample345-multi.mro
$ nano sample345-multi.mro
...
 
$ cat sample345-multi.mro
 
@include "phaser_svcaller_cs.mro"
 
call PHASER_SVCALLER_CS(
    fastq_mode = "BCL_PROCESSOR",
    sample_id = "sample345-multi",
    sample_def = [
        {
            "bc_in_read": 1,
            "bc_length": 16,
            "gem_group": 1,
            "lanes": null,
            "read_path": "/home/jdoe/runs/HAWT7ADXX",
            "sample_indices": [ "any" ]
        },
        {
            "bc_in_read": 1,
            "bc_length": 16,
            "gem_group": 2,
            "lanes": null,
            "read_path": "/home/jdoe/runs/HAWPUADXX",
            "sample_indices": [ "any" ]
        }
    ],
    sample_desc = "",
    reference_path = "/opt/refdata-hg19-2.1.0",
    sex = "f",
    targets = null,
    vc_mode = "freebayes",
    vc_ground_truth = null,
    restrict_locus = null
)
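
The edits in steps 2 through 5 can also be generated rather than typed by hand. The following is a minimal sketch, not part of Long Ranger: build_sample_def is a hypothetical helper, and the template is simply the single sample_def entry from the original _invocation file copied into Python. It duplicates the entry once per FASTQ directory, sets read_path, and numbers gem_group from 1. It assumes each directory holds a different library.

```python
import copy
import json


def build_sample_def(read_paths, template):
    """Return one sample_def entry per read path, with gem_group 1, 2, ...

    Hypothetical helper: assumes every path in read_paths belongs to a
    different library, so each entry gets its own gem_group.
    """
    entries = []
    for gem_group, read_path in enumerate(read_paths, start=1):
        entry = copy.deepcopy(template)  # leave the template untouched
        entry["gem_group"] = gem_group
        entry["read_path"] = read_path
        entries.append(entry)
    return entries


# The single entry from sample345/_invocation, written as a Python dict.
template = {
    "bc_in_read": 1,
    "bc_length": 16,
    "gem_group": None,
    "lanes": None,
    "read_path": "/home/jdoe/runs/HBA2TADXX",
    "sample_indices": ["any"],
}

sample_def = build_sample_def(
    ["/home/jdoe/runs/HAWT7ADXX", "/home/jdoe/runs/HAWPUADXX"],
    template,
)

# json.dumps writes None as null, matching the JSON-encoded form in the MRO.
sample_def_text = json.dumps(sample_def, indent=4)
print(sample_def_text)
```

The printed list can be pasted in as the value of sample_def in the copied MRO file; sample_id and the other arguments still need to be edited by hand.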

The barcode sequences will carry a suffix that identifies the gem group, i.e. the library, they came from:

AGAATGGTCTGCAT-1
CTGATCGATATCGA-1
GTAGCAACGTCGTA-2
AGAATGGTCTGCAT-2

This is how Long Ranger prevents reads that share a barcode sequence, but come from different sets of molecules in different libraries, from being erroneously combined based on the barcode sequence alone.
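
The effect of the suffix can be illustrated with a small sketch. This is not Long Ranger's internal code; it only shows the idea. The barcode sequences and gem groups are the four listed above, while the read names and the group_reads function are hypothetical.

```python
from collections import defaultdict

# (barcode sequence, gem group, read name); read names are made up.
reads = [
    ("AGAATGGTCTGCAT", 1, "read_a"),
    ("CTGATCGATATCGA", 1, "read_b"),
    ("GTAGCAACGTCGTA", 2, "read_c"),
    ("AGAATGGTCTGCAT", 2, "read_d"),
]


def group_reads(reads, use_suffix):
    """Group read names by barcode, with or without the gem group suffix."""
    groups = defaultdict(list)
    for sequence, gem_group, name in reads:
        key = f"{sequence}-{gem_group}" if use_suffix else sequence
        groups[key].append(name)
    return dict(groups)


by_sequence = group_reads(reads, use_suffix=False)
by_suffixed = group_reads(reads, use_suffix=True)

# Without the suffix, read_a and read_d fall into the same group even
# though they come from different libraries.
print(by_sequence["AGAATGGTCTGCAT"])
# With the suffix, they stay separate.
print(by_suffixed["AGAATGGTCTGCAT-1"], by_suffixed["AGAATGGTCTGCAT-2"])
```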

Running Long Ranger

Once you have this single-sample, multi-library, multi-flowcell MRO, confirm that its syntax is valid with longranger mrc, the MRO compiler included with Long Ranger:

$ longranger mrc sample345-multi.mro
Successfully compiled 1 mro files.
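
This page describes longranger mrc as a syntax check. The gem_group convention from step 5 can be checked separately before launching a long run. The sketch below is one way to do that, under stated assumptions: check_gem_groups is a hypothetical helper, the sample_def list has been copied into Python as a list of dicts, and the rule it enforces (integer values whose distinct set is 1, 2, ... with no gaps) is one reading of step 5, not a documented Long Ranger requirement.

```python
def check_gem_groups(sample_def):
    """Return a list of problems; an empty list means the checks passed.

    Assumed rule: every gem_group is an integer and the distinct values
    are 1, 2, ..., k with no gaps.
    """
    problems = []
    gem_groups = [entry.get("gem_group") for entry in sample_def]
    if not all(isinstance(g, int) for g in gem_groups):
        problems.append(f"every gem_group must be an integer, got {gem_groups}")
        return problems
    distinct = sorted(set(gem_groups))
    expected = list(range(1, len(distinct) + 1))
    if distinct != expected:
        problems.append(f"gem_group values {distinct} should be {expected}")
    return problems


# The untouched single-library entry still has gem_group set to null (None).
single = [{"gem_group": None, "read_path": "/home/jdoe/runs/HBA2TADXX"}]
# The edited multi-library entries from sample345-multi.mro.
multi = [
    {"gem_group": 1, "read_path": "/home/jdoe/runs/HAWT7ADXX"},
    {"gem_group": 2, "read_path": "/home/jdoe/runs/HAWPUADXX"},
]

print(check_gem_groups(single))
print(check_gem_groups(multi))
```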

Then run the MRO file using longranger's alternate MRO-mode syntax:

$ longranger wgs sample345-multi sample345-multi.mro --uiport=3600
Martian Runtime - 2.2.2
Serving UI at http://localhost:3600                                             
 
Running preflight checks (please wait)...
2016-05-01 12:00:00 [runtime] (ready)           ID.sample345-multi.PHASER_SVCALLER_CS.PHASER_SVCALLER._ALIGNER.SETUP_CHUNKS
2016-05-01 12:00:00 [runtime] (run:local)       ID.sample345-multi.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH
2016-05-01 12:00:00 [runtime] (run:local)       ID.sample345-multi.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH.fork0.chnk0.main

where