

Using Long Ranger with SGE and LSF

The Long Ranger pipelines support launching stages on SGE- and LSF-based clusters. This cluster mode allows highly parallelizable stages to utilize hundreds or thousands of cores concurrently, dramatically reducing time to solution.

Running pipelines in cluster mode requires the following:

  1. Long Ranger is installed in the same location on all nodes of the cluster (e.g., /opt/longranger-1.3.1 or /net/apps/longranger-1.3.1)
  2. Long Ranger pipelines will be run on a shared file system that is accessible to all nodes of the cluster. NFS-mounted directories are the most common way to satisfy this requirement (a quick check covering this and the previous requirement is shown after this list).
  3. The cluster will accept both single-core and multithreaded (shared-memory) jobs.
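
A quick way to confirm the first two requirements is to verify, from a compute node, that the installation path and your intended run directory resolve to the same locations as on the head node. The node name and paths below are placeholders for illustration; substitute your own:

$ ssh node001 ls /opt/longranger-1.3.1
$ ssh node001 df -h /path/to/shared/run/directory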

Configuring Cluster Integration

Installing the Long Ranger software on a cluster is identical to the installation procedure for local-mode (non-cluster) operation. After you have confirmed that the longranger pipelines can run in local mode, you must configure the job submission template that Long Ranger will use to submit jobs to your cluster. Assuming you installed Long Ranger to /opt/longranger-1.3.1, the process is as follows.

Step 1. Navigate to the Martian runtime's jobmanagers/ directory which contains example jobmanager templates.

$ cd /opt/longranger-1.3.1/martian-cs/1.3.1/jobmanagers
$ ls
bsub.template.example  config.json  sge.template.example

Step 2. Make a copy of your cluster's example template to either sge.template or lsf.template in this jobmanagers/ directory (for SGE, copy sge.template.example to sge.template; for LSF, copy bsub.template.example to lsf.template).

$ cp -v sge.template.example sge.template
`sge.template.example' -> `sge.template'
$ ls
bsub.template.example  config.json  sge.template  sge.template.example

Step 3. Edit this template file and make the necessary modifications that may be required by your specific cluster.

$ nano sge.template
...
 
$ cat sge.template
#$ -N __MRO_JOB_NAME__
#$ -V
#$ -pe threads __MRO_THREADS__
#$ -l mem_free=__MRO_MEM_GB__G
#$ -cwd
#$ -o __MRO_STDOUT__
#$ -e __MRO_STDERR__
 
__MRO_CMD__

If you are using an SGE cluster, you MUST modify the #$ -pe <pe_name> line of the example template to reflect the name of your cluster's multithreaded parallel environment (e.g., threads in the above example). You can view a list of your cluster's parallel environments using the qconf -spl command.
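
For example, you can list the defined parallel environments and inspect the one you intend to use. The names shown here (make, smp, threads) are only illustrative; your cluster's will differ:

$ qconf -spl
make
smp
threads
$ qconf -sp threads
pe_name            threads
slots              9999
...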

The most common modifications to the job submission template are additional lines for site-specific options, such as the queue to submit to or the project or account to charge.

These job submission templates contain a number of special variables, contained within double underscores, that are substituted by the Martian runtime when each stage is being submitted. Specifically, the following variables will be expanded when a pipeline is submitting jobs to the cluster:

Variable                                                Must be present?   Description
__MRO_JOB_NAME__                                        Yes                Job name composed of the sample ID and stage being executed
__MRO_THREADS__                                         No                 Number of threads required by the stage
__MRO_MEM_GB__ / __MRO_MEM_MB__                         No                 Amount of memory (in GB or MB) required by the stage
__MRO_MEM_GB_PER_THREAD__ / __MRO_MEM_MB_PER_THREAD__   No                 Amount of memory (in GB or MB) required per thread in multithreaded stages
__MRO_STDOUT__ / __MRO_STDERR__                         Yes                Paths to the _stdout and _stderr metadata files for the stage
__MRO_CMD__                                             Yes                Bourne shell command to run the stage code

It is critical that the special variables listed as required are present in the final template you create. If you are unsure of how this template should appear for your cluster, consult your cluster's administrator or help desk.
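
To illustrate, after substitution a submitted SGE job script might look roughly like the following; the stage name, thread and memory values, and paths here are invented for illustration only:

#$ -N ID.sample123.SOME_STAGE
#$ -V
#$ -pe threads 4
#$ -l mem_free=16G
#$ -cwd
#$ -o /shared/runs/sample123/SOME_STAGE/fork0/chnk0/_stdout
#$ -e /shared/runs/sample123/SOME_STAGE/fork0/chnk0/_stderr

<stage command generated by the Martian runtime>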

Validating Template Configuration

To run a Long Ranger pipeline in cluster mode, simply add the --jobmode=sge or --jobmode=lsf command-line option when using the longranger commands. The pipeline orchestration will still occur on your local machine, but individual stages will be submitted to your cluster as they become eligible to execute.

To validate that cluster mode is properly configured, you can follow the same validation instructions given for longranger in the Installation page but add --jobmode=sge or --jobmode=lsf.

$ longranger demux --run=./tiny-bcl --jobmode=sge
 
Martian Runtime - 1.3.1
 
Running preflight checks (please wait)...
2015-04-11 16:44:16 [runtime] (ready)           ID.HAWT7ADXX.BCL_PROCESSOR_CS.BCL_PROCESSOR.BARCODE_AWARE_BCL2FASTQ
2015-04-11 16:44:16 [runtime] (ready)           ID.HAWT7ADXX.BCL_PROCESSOR_CS.BCL_PROCESSOR.ANALYZE_RUN
...

If you check your job queue, you will begin to see stages queuing up:

$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
8675309 0.56000 ID.HAWT7AD jdoe         qw    01/01/2012 12:00:00 all.q@node001.example.com          1
8675310 0.55500 ID.HAWT7AD jdoe         qw    01/01/2012 12:00:00 all.q@node002.example.com          1

If you encounter a pipeline failure:

[error] Pipestance failed. Please see log at:
HAWT7ADXX/BCL_PROCESSOR_CS/BCL_PROCESSOR/BCL_PROCESSOR_PREFLIGHT/fork0/chnk0/_errors
 
Saving diagnostics to HAWT7ADXX/HAWT7ADXX.debug.tgz
For assistance, upload this file to 10x by running:
 
uploadto10x <your_email> HAWT7ADXX/HAWT7ADXX.debug.tgz

And the _errors file contains a jobcmd error:

$ cat HAWT7ADXX/BCL_PROCESSOR_CS/BCL_PROCESSOR/BCL_PROCESSOR_PREFLIGHT/fork0/chnk0/_errors
 
jobcmd error:
exit status 1

You likely have an invalid job submission template. This jobcmd error occurs when job submission via the qsub or bsub command fails.
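
One way to isolate the problem is to fill in the template variables by hand and submit the resulting script directly; the scheduler's error message will usually identify the offending option. The file name and substitutions below are placeholders:

$ cp sge.template test_submit.sh
$ # edit test_submit.sh: replace __MRO_JOB_NAME__, __MRO_THREADS__, __MRO_MEM_GB__,
$ # __MRO_STDOUT__, and __MRO_STDERR__ with literal values, and __MRO_CMD__ with: echo hello
$ qsub test_submit.sh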

Cluster Mode Mechanics

After configuring Long Ranger for cluster mode, the longranger pipelines can be run with --jobmode=sge or --jobmode=lsf. This will make the underlying Martian pipeline framework launch each stage through the qsub or bsub commands when running in SGE or LSF modes, respectively. As stages' jobs are queued, launched, and completed, the pipeline framework will track their states using the metadata files that each stage maintains in the pipeline output directory.

Just as with local-mode pipelines, cluster-mode pipelines can be restarted after failure and maintain the same order of execution for dependent subsections of the pipeline. All of the stage code that is executed is identical to that of local mode, and the quantitative results will be identical to the limit of each stage's reproducibility.

In addition, the Long Ranger UI can still be used with cluster mode. Because the Martian pipeline framework runs on the node from which the command was issued, the UI will also run from that node.

Memory Requests and Consumption

Stages in the Long Ranger pipelines each request a specific number of cores and amount of memory to aid with resource management. These values are used to prevent oversubscription of the computing system when running pipelines in local (non-cluster) mode. In cluster mode, how CPU and memory requests are handled is determined by (1) how the __MRO_THREADS__ and __MRO_MEM_GB__ variables are used within the job template and (2) how your specific cluster's job manager schedules resources.

SGE / Grid Engine

SGE supports requesting memory via the mem_free resource natively, although your cluster may have another mechanism for requesting memory. To pass each stage's memory request through to SGE, add an additional line to your sge.template that requests mem_free, h_vmem, h_rss, or the custom memory resource defined by your cluster:

$ cat sge.template
#$ -N __MRO_JOB_NAME__
#$ -V
#$ -pe threads __MRO_THREADS__
#$ -l mem_free=__MRO_MEM_GB__G
#$ -cwd
#$ -o __MRO_STDOUT__
#$ -e __MRO_STDERR__
 
__MRO_CMD__

Note that h_vmem (virtual memory) and mem_free/h_rss (physical memory) represent two different quantities, and Long Ranger stages' __MRO_MEM_GB__ requests are expressed as physical memory. Using h_vmem in your job template may therefore cause stages to be killed unnecessarily if their virtual memory consumption is substantially larger than their physical memory consumption, so we do not recommend using h_vmem.
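
If your cluster treats the memory resource as a per-slot request (so that the effective limit is multiplied by the number of slots granted by -pe), the per-thread variables from the table above may be a better fit. A minimal sketch, assuming a parallel environment named threads and a mem_free resource:

#$ -pe threads __MRO_THREADS__
#$ -l mem_free=__MRO_MEM_GB_PER_THREAD__G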

Platform LSF

LSF supports job memory requests through the -M option and -R "rusage[mem=...]" resource strings, but these requests generally must be expressed in MB, not GB. As such, your LSF job template should use the __MRO_MEM_MB__ variable rather than __MRO_MEM_GB__. For example,

$ cat bsub.template
#BSUB -J __MRO_JOB_NAME__
#BSUB -n __MRO_THREADS__
#BSUB -o __MRO_STDOUT__
#BSUB -e __MRO_STDERR__
#BSUB -R "rusage[mem=__MRO_MEM_MB__]"
#BSUB -R span[hosts=1]
 
__MRO_CMD__

Requesting Memory via Cores

For clusters whose job managers do not support memory requests, it is possible to request memory in the form of cores via the --mempercore command-line option. This option scales up the number of cores requested via the __MRO_THREADS__ variable so that each stage's memory request is covered at the specified GB-per-core ratio of your nodes.

For example, given a cluster whose nodes have 16 cores and 128 GB of memory (8 GB per core), the following pipeline invocation command

$ longranger demux --run=./tiny-bcl --jobmode=sge --mempercore=8

will request enough cores to cover each stage's memory at 8 GB per core. For example, a stage requesting 1 thread and 4 GB of memory will still request 1 core; a stage requesting 1 thread and 24 GB of memory will request 3 cores; and a stage requesting 4 threads and 64 GB of memory will request 8 cores, even though it cannot use more than 4 of them.

As this last case illustrates, this mode can result in wasted CPU cycles and is only provided for clusters that cannot allocate memory as an independent resource.
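
A minimal sketch of the arithmetic as we read it (an illustration, not code from Long Ranger itself): the core request becomes the larger of the stage's thread count and its memory request divided by the --mempercore value, rounded up.

$ threads=1; mem_gb=24; mempercore=8
$ cores=$(( (mem_gb + mempercore - 1) / mempercore ))
$ [ "$cores" -lt "$threads" ] && cores=$threads
$ echo "$cores"
3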

Every cluster configuration is different, so if you are unsure of how your cluster resource management is configured, please contact your cluster administrator or help desk.

Rate Limiting Job Submissions

Some Long Ranger pipeline stages are divided into hundreds of jobs. By default, the rate at which these jobs are submitted to the cluster is throttled to at most 64 at a time, with at least 100 ms between submissions, to avoid running into limits on clusters that impose quotas on the total number of pending jobs a user can submit.

If your cluster does not have such limits or is not shared with other users, you can control how the Martian pipeline runner sends job submissions to your cluster by using the --maxjobs and --jobinterval parameters.

You can increase the cap on the number of concurrent jobs to 200 with the --maxjobs parameter:

$ longranger run --id=sample ... --jobmode=sge --maxjobs=200

You may also change the rate limit on how often the Martian pipeline runner sends submissions to the cluster. To add a five-second pause between job submissions, use the --jobinterval parameter:

$ longranger run --id=sample ... --jobmode=sge --jobinterval=5000

The job interval parameter is in milliseconds. The minimum allowable value is 1.
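
The two options can be combined. For example, to allow up to 500 concurrent jobs with a two-second pause between submissions (the values here are only illustrative):

$ longranger run --id=sample ... --jobmode=sge --maxjobs=500 --jobinterval=2000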

Overriding Default Stage Memory and Thread Requests

Each stage requests a specific number of threads and amount of memory. These values are hardcoded into each stage and were determined empirically from in-house data runs as well as reports from our customers. You may find that on your data, certain stages do not need as much memory as requested, or need more memory than our defaults. The latter case is more serious, as clusters may impose strict memory limits and kill a job if those limits are exceeded.

You can override the defaults of a stage by supplying an override JSON file and passing it to your pipeline via the --override argument. Here is an example override file for Long Ranger that overrides the memory requests of the LOUPE_PREPROCESS stage for the wgs and targeted pipelines:

{
  "PHASER_SVCALLER_CS.PHASER_SVCALLER.LOUPE_PREPROCESS": {
    "split.mem_gb": 2,
    "chunk.mem_gb": 24,
    "join.mem_gb": 2,
    "chunk.threads": 2
  },
  "PHASER_SVCALLER_EXOME_CS.PHASER_SVCALLER_EXOME.LOUPE_PREPROCESS": {
    "split.mem_gb": 2,
    "chunk.mem_gb": 24,
    "join.mem_gb": 2,
    "chunk.threads": 2
  }
}

This configuration will reduce the amount of memory requested for the Loupe stage's split and join substages (default 6 GB), but increase the memory and threads requested for the main chunk (originally 16 GB and 1 thread from the Loupe stage definition).

To run a pipeline with the above configuration, supply the JSON file as the --override parameter:

$ longranger wgs --id=sample ... --jobmode=sge --override=./loupe_override.json

Overrides apply to pipelines executed both on the cluster and in local mode.

Common Override Settings

Below are some overrides that have been helpful when processing samples outside of the normal Long Ranger operating window. Long Ranger issues may be caused by extreme genomes (large total size or many contigs), very high depth, very short input molecules, or a reference that is very distant from the sample. Some or all of the following overrides may help problematic samples run successfully:

{
  "PHASER_SVCALLER_CS.PHASER_SVCALLER.LOUPE_PREPROCESS": { "chunk.mem_gb": 64 },

  "PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.PHASE_SNPINDELS": { "chunk.mem_gb": 6, "join.threads": 4 },

  "PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.ANALYZE_SNPINDEL_CALLS": { "join.mem_gb": 32, "join.threads": 2 },

  "PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER._SNPINDEL_CALLER.POPULATE_INFO_FIELDS": { "chunk.mem_gb": 8 },

  "PHASER_SVCALLER_CS.PHASER_SVCALLER._LINKED_READS_ALIGNER.MERGE_POS_BAM": { "chunk.mem_gb": 10 },

  "PHASER_SVCALLER_CS.PHASER_SVCALLER._REPORTER.FILTER_BARCODES": { "join.mem_gb": 30 }
}
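
To apply these settings, save the JSON to a file (the name below is arbitrary) and pass it with --override as described above:

$ longranger wgs --id=sample ... --jobmode=sge --override=./extreme_sample_overrides.json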