Long Ranger 1.3, printed on 11/24/2024
When the longranger demux or longranger run pipelines fail, they will automatically generate a "debug tarball" that contains the logs and metadata generated by the pipestance leading up to failure. This file, named sampleid.mri.tgz, can be e-mailed to the 10x software team to help resolve any issues with using Long Ranger. You may also use the longranger upload command to send the tarball to 10x:
$ longranger upload [email protected] sample_id.mri.tgz
If you wish to troubleshoot a pipeline failure yourself, first identify whether you are experiencing a preflight failure, an in-flight failure, or an alert.
The remainder of this guide uses the term pipestance to refer to a specific instance of a pipeline running.
Preflight failures are the most common and are the result of invalid input data or runtime parameters. Because they occur before the pipeline actually runs, there will be no pipeline output and the error is reported directly to your terminal.
Common preflight failures include failing to specify a reference or set a TENX_REFDATA environment variable, which generates the following error:
[error] Please set $TENX_REFDATA on workstation.university.edu to a directory with the contents of the 10x refdata-hg19 tarball (fasta, genes, regions, snps).
The TENX_REFDATA variable must be set to the directory that was created when untarring the refdata-hg19-1.2.0.tar.gz package, typically named refdata-hg19-1.2.0. You can also specify this reference with the --reference argument to the longranger run or longranger align pipelines.
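Before exporting the variable, it can help to confirm that the directory looks like an unpacked reference. The following is a minimal sketch and not part of Long Ranger: check_refdata is a hypothetical helper, and it assumes that the four names listed in the error message (fasta, genes, regions, snps) appear as entries directly inside the reference directory.

```shell
# check_refdata DIR: hypothetical helper, not a Long Ranger command.
# Succeeds only if DIR exists and contains the four entries named in the
# preflight error message. Their location directly under DIR is an assumption.
check_refdata() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    for sub in fasta genes regions snps; do
        if [ ! -e "$dir/$sub" ]; then
            echo "missing $sub in $dir" >&2
            return 1
        fi
    done
    return 0
}
```

For example, check_refdata /opt/refdata-hg19-1.2.0 && export TENX_REFDATA=/opt/refdata-hg19-1.2.0, where /opt stands in for wherever you untarred the package.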
longranger demux will generate the following error if Illumina's bcl2fastq software is not installed:
[error] bcl2fastq or configureBclToFastq.pl not found on PATH on workstation.university.edu.
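You can test for this condition yourself before launching longranger demux. A minimal sketch, assuming a POSIX shell: have_bcl2fastq is a hypothetical helper name, and the two executable names are taken from the error message above.

```shell
# have_bcl2fastq: hypothetical helper, not a Long Ranger command.
# Succeeds if either executable named in the preflight error is on PATH.
have_bcl2fastq() {
    command -v bcl2fastq >/dev/null 2>&1 ||
        command -v configureBclToFastq.pl >/dev/null 2>&1
}
```

Running have_bcl2fastq || echo "add the bcl2fastq install directory to PATH" reports the problem without starting a pipestance.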
Whole Genome Mode of longranger run contains several pre-flight checks that validate the pre-called VCF you will be phasing. Common preflight failures include
[error] vcmode must be of the form 'freebayes', 'gatk:/path/to/GenomeAnalysisTK.jar', or 'precalled:path/to/vcf'
when a mis-named file is passed via the --vcmode=precalled:... command-line argument. Specifying a file that does not end in a .vcf suffix (including .vcf.gz, which is not a supported format for the pre-called VCF) will trigger this failure.
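The suffix rule can be checked before the pipeline is invoked. The sketch below is illustrative only: vcmode_for_vcf is a hypothetical helper that prints the --vcmode argument for an acceptable file name and refuses names that would trip the preflight check described above.

```shell
# vcmode_for_vcf FILE: hypothetical helper, not a Long Ranger command.
# Prints a precalled --vcmode argument for FILE, or fails with a hint
# when the file name does not end in .vcf.
vcmode_for_vcf() {
    case "$1" in
        *.vcf)
            printf '%s\n' "--vcmode=precalled:$1"
            ;;
        *.vcf.gz)
            echo "compressed VCF is not supported; decompress $1 first" >&2
            return 1
            ;;
        *)
            echo "expected a file ending in .vcf: $1" >&2
            return 1
            ;;
    esac
}
```

The check looks only at the file name; it does not confirm that the contents are valid VCF.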
If a malformed or invalid VCF is supplied in Whole Genome Mode, you may also trigger the following failure:
[error] /home/jdoe/runs/malformed.vcf failed on parsing with PyVCF. Approximate line number of failure occured at 1. Traceback:
Traceback (most recent call last):
  File "/home/jdoe/runs/longranger-1.3.1/longranger-cs/1.3.1/mro/stages/preflight/phaser_svcaller/__init__.py", line 134, in check_vcf
    record = vcf_iter.next()
  File "/home/jdoe/runs/longranger-1.3.1/anaconda/2.2.0/lib/python2.7/site-packages/vcf/parser.py", line 531, in next
    pos = int(row[1])
IndexError: list index out of range
This preflight check ensures that the VCF you supply can be parsed by the longranger run pipeline.
Exome Mode of longranger run contains several pre-flight checks that validate the targets BED file that you provide. Common preflight failures include
MRO TypeMismatchError: expected type 'bed' for 'targets' but got 'txt' instead at phaser_svcaller_cs.mro:12.
which enforces that the BED file you specify has the proper .bed suffix.
Long Ranger generally requires that files have the proper extension before accepting them as input. VCF files should be suffixed with .vcf, BED files with .bed, etc.
If the BED file you provide is malformed, you may see this error:
[error] Error in BED file /home/jdoe/runs/malformed_agilent_v5_targs.bed Line 0: Too few fields. chrom, start position, and end position are required. Maybe file is not tab-delimited.
in which case you should re-download the targets BED file from your pulldown kit manufacturer to ensure your local copy is not corrupt.
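To locate the offending lines before re-downloading, a rough approximation of this check can be run locally. This is a sketch under stated assumptions, not the pipeline's own validation: check_bed is a hypothetical helper, and skipping blank lines and lines that begin with #, track, or browser is an assumption about which lines count as headers.

```shell
# check_bed FILE: hypothetical helper approximating the preflight check.
# Reports every data line with fewer than three tab-delimited fields and
# exits non-zero if any such line is found.
check_bed() {
    awk -F '\t' '
        # Assumption: blank lines and these header prefixes are not data.
        NF == 0 || /^(#|track|browser)/ { next }
        NF < 3 {
            printf "line %d: fewer than 3 tab-delimited fields\n", NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

A file that was saved with spaces instead of tabs fails on every data line, which matches the hint at the end of the error message.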
In-flight failures are generally the result of factors external to the pipeline such as running out of system memory or disk space. Different stages may fail in different ways so the specific error messages vary widely.
There are a few important files that are saved to your pipeline output directory which, by default, is named according to the flowcell serial number for longranger demux (e.g., HAWT7ADXX) and your --id name for longranger run.
The pipeline execution log that is output to your terminal during pipeline execution is also saved to output_dir/_log.
Stages that experience a hard failure generate an _errors file containing the precise error that caused the stage to halt. You can view these error logs, if they exist, using find output_dir -name _errors | xargs cat
Each stage also logs its stdout and stderr streams to _stdout and _stderr files. These logs can be listed using find output_dir -name _stderr and may contain informative error messages in stages that execute third-party applications such as BWA and GATK.
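When several stages have failed, piping everything through cat loses track of which file each message came from. The following sketch prints each _errors file under its own path; show_stage_errors is a hypothetical helper name, and the only assumption about the output directory is that the files are named _errors as described above.

```shell
# show_stage_errors DIR: hypothetical helper, not a Long Ranger command.
# Prints every _errors file found under DIR, each preceded by its path,
# so a message can be traced back to the stage that wrote it.
show_stage_errors() {
    find "$1" -name _errors -type f | while IFS= read -r f; do
        printf '==> %s <==\n' "$f"
        cat "$f"
    done
}
```

Calling show_stage_errors output_dir prints nothing when no stage has written an _errors file.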
A more detailed description of the pipeline output directory and its contents is given in the Pipestance Structure page.
If you are unable to diagnose a failure yourself, you can always contact the 10x software support team for help.
Once you have determined the reason for failure and are ready to continue running the pipeline, you can typically issue the same longranger run or longranger demux command to continue execution of the pipestance from the stage that originally failed.
When longranger run or longranger demux starts, it checks whether its intended output directory already exists. If it does, the existing directory is treated as an incomplete pipestance and execution resumes from it. This feature allows pipelines to be stopped and resumed with great flexibility, but it can also result in errors such as:
RuntimeError: /home/jdoe/runs/sample345 is not a pipestance directory
which indicates that you specified a --id that corresponds to an existing directory that was not created by Long Ranger.
The following error:
RuntimeError: pipestance 'HAWT7ADXX' already exists and is locked by another Martian instance. If you are sure no other Martian instance is running, delete the _lock file in /home/jdoe/runs/HAWT7ADXX and start Martian again.
indicates that you may already have a copy of longranger run or longranger demux running that is using the same output directory. If you are sure that no pipestance is running in the given output directory, you can either move that output directory aside (mv HAWT7ADXX HAWT7ADXX.old) to restart the pipestance from the beginning, or remove the pipestance's lock file (rm HAWT7ADXX/_lock) and re-run the same longranger command to resume pipeline execution.
If you encounter the following error when attempting to resume a pipestance:
RuntimeError: pipestance 'sample345' already exists with different invocation file /home/jdoe/runs/sample345/_invocation
you are attempting to resume a pipestance using command-line arguments that are different from those used to first run it. You can view the parameters passed to the existing pipeline by examining the _log file located in the output directory (e.g., head -n20 /home/jdoe/runs/sample345/_log).
Alerts are generally the result of factors inherent in library preparation and sequencing rather than the software. Abnormal data (including common short-read sequencing metrics and GemCode-specific statistics) are raised as alerts, which are printed in the pipeline output log and written to an output file called alerts_summary.txt. Alerts do not affect the operation of the pipeline, but they do highlight potential causes for abnormal or missing data.
Alerts come in two severity levels:
WARN alerts indicate that some parameter is suboptimal, but there may still be useful data in the pipeline output.
ERROR alerts indicate a major issue, and there are unlikely to be usable results in the output.
For example, forgetting to specify the --targets parameter when running longranger run on an exome sample will run longranger run in Whole Genome Mode and result in the following alerts:
WARN [Low Mean Coverage Depth] -- A low sequencing depth was achieved. We recommend >20X coverage for whole genome samples.
WARN [High Fraction of Genome with Zero Coverage] -- A high fraction of the genome had zero coverage.
These WARN alerts are not indicative of a lost sequencing run, and re-running longranger run with the proper targets file would return no alerts.
Poorly constructed libraries typically display more severe ERROR alerts. For example, a low-quality exome library may result in the following alerts:
WARN [Unmapped Fraction] -- An elevated unmapped read fraction was observed: 0.142355. This can indicate the wrong reference genomes,high contamination rate, or poor read quality
WARN [Effective Barcode Diversity] -- A low barcode diversity was achieved: 35089.845815.
ERROR [Mean Coverage Depth] -- A very low depth of coverage was achieved: 1.047583.
ERROR [Fraction of Bases On Target] -- A very low fraction of bases were aligned to the target region: 0.163686
The presence of these ERROR alerts indicates that the results output by this pipestance are likely unreliable.
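When triaging many pipestances, a quick tally of alerts by severity can show which outputs deserve a closer look. The sketch below is an illustration, not a Long Ranger feature: count_alerts is a hypothetical helper, and it assumes that alerts_summary.txt holds one alert per line in the same "LEVEL [Name] -- message" form as the examples above, which this guide does not confirm.

```shell
# count_alerts FILE LEVEL: hypothetical helper, not a Long Ranger command.
# Prints how many lines of FILE start with "LEVEL [" (for example WARN or
# ERROR). The one-alert-per-line layout of FILE is an assumption.
count_alerts() {
    awk -v lvl="$2" '
        index($0, lvl " [") == 1 { n++ }
        END { print n + 0 }
    ' "$1"
}
```

For example, count_alerts path/to/alerts_summary.txt ERROR prints 0 for a pipestance that raised no ERROR alerts; the path is a placeholder for the file's location in your output directory.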