This is documentation for the GemCode System.

# Structural Variants in BEDPE Format

The principal output of the longranger run pipeline includes aligned reads with barcode and phasing information in BAM format, phased SNPs and indels in VCF format, and SV calls and candidates in BEDPE format. These are all standard file formats designed to interoperate with existing tools, and the additional information produced by the GemCode Platform is included as standards-compliant fields where appropriate.

The output of the SV calling code is BEDPE, a format similar to BED that describes pairs of genomic regions. Long Ranger uses this format to describe pairs of breakpoints that define a structural variant.

The BEDPE contains one SV per line with the following tab-delimited columns:

1. chrom1 - chromosome of the first breakpoint

2. start1 - start position of the first breakpoint

3. end1 - end position of the first breakpoint

4. chrom2 - chromosome of the second breakpoint

5. start2 - start position of the second breakpoint

6. end2 - end position of the second breakpoint

7. name - a unique string identifying the SV

8. quality - Phred-like quality score

9. strand1 - strand of the first breakpoint (not currently used; always '+')

10. strand2 - strand of the second breakpoint (not currently used; always '+')

11. filter - a semicolon-delimited list of filters that the SV failed to pass, or a single period (.) if the SV was not filtered out

12. info - extra information about the SV or a single period (.)
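The twelve columns above map naturally onto a small record type. The following is a minimal sketch, not part of Long Ranger: the names `SVRecord` and `parse_bedpe_line` are hypothetical, the quality score is assumed to parse as a number, and the sample line is invented for illustration.

```python
from typing import NamedTuple


class SVRecord(NamedTuple):
    """One BEDPE line, with fields in the column order listed above."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str
    quality: float  # assumption: the Phred-like score is numeric
    strand1: str
    strand2: str
    filter: str     # raw column 11, e.g. "." or "LOW_MAPQ;SEG_DUP"
    info: str       # raw column 12, e.g. "." or "KEY=value;KEY=value"


def parse_bedpe_line(line: str) -> SVRecord:
    """Split one tab-delimited BEDPE line into an SVRecord."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")
    return SVRecord(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        fields[6], float(fields[7]),
        fields[8], fields[9], fields[10], fields[11],
    )


# Invented example line; real names and scores will differ.
example = "chr1\t1000\t1100\tchr1\t50000\t50100\tcall_1\t25\t+\t+\t.\t."
record = parse_bedpe_line(example)
print(record.name, record.chrom1, record.start1, record.chrom2, record.start2)
```

When reading a whole file, it may be necessary to skip comment or header lines (for example, lines starting with `#`) before calling the parser; whether such lines are present is an assumption to check against your own output.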

## Filter Entries

The filter field (column 11) is a semicolon-delimited string of filters that the SV failed to pass. The following filters may have been applied:

| Filter | Description |
| --- | --- |
| BLACK_DIST | At least one breakpoint is within 10Kb of the blacklist (see also the BLACK_DIST1 and BLACK_DIST2 info fields below). |
| BLACK_FRAC | The SV has >10% of base pairs overlapping the blacklist (see also the BLACK_FRAC info field below). |
| SEG_DUP | The SV breakpoints are within 10Kb from copies of the same segmental duplication. |
| NMATES | Both breakpoints of the SV participate in multiple (>5) SVs. This is an indication of low-complexity regions or barcode coalescence. |
| LOW_MAPQ | Average MAPQ of reads in the call region < 40. Suggests potential alignment problems leading to a false positive call. |
| DEPTH_DROP | Depth drop that is inconsistent with the presence of a deletion. Suggests alignment problems or coverage unevenness. |
| HIGH_BC_COV | Barcode coverage on either breakpoint > 3 times the average barcode coverage genomewide. Suggests alignment problems leading to read pileups. |
| TOO_MANY_FILTERED_BCS | More than 30% of the barcodes supporting the call have been associated with calls filtered by one or more of the other filters. |

The SV blacklist and segmental duplication list are included in the refdata-hg19 package required by Long Ranger. These lists define gaps and other ambiguous regions of the reference genome that have been found to raise spurious SV candidates and calls.
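As a rough illustration of how the filter column might be consumed downstream, the sketch below splits the semicolon-delimited string into a list and treats a single period as "no filters failed". The function names `parse_filters` and `passed_all_filters` are hypothetical helpers, not part of Long Ranger.

```python
from typing import List


def parse_filters(filter_field: str) -> List[str]:
    """Return the filters an SV failed, or an empty list for '.'."""
    if filter_field == ".":
        return []
    # Drop empty pieces so a stray trailing ';' does not yield ''.
    return [name for name in filter_field.split(";") if name]


def passed_all_filters(filter_field: str) -> bool:
    """True when column 11 reports no failed filters."""
    return not parse_filters(filter_field)


print(parse_filters("LOW_MAPQ;SEG_DUP"))
print(passed_all_filters("."))
```

Keeping the failed filters as a list makes it straightforward to, for example, retain calls whose only failed filter is one you are willing to tolerate.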

## Info Entries

The info field (column 12) is a semicolon-delimited string of key=value pairs. A single period (.) as the value indicates that the value is missing (e.g., because the corresponding info key does not apply to this entry of the BEDPE file). The following keys may be defined for a given SV:

| Key | Description |
| --- | --- |
| BLACK1 | If the first breakpoint of the SV is too close to a blacklist element, this will be the type of the element (e.g., centromere, gap). |
| BLACK2 | If the second breakpoint of the SV is too close to a blacklist element, this will be the type of the element (e.g., centromere, gap). |
| BLACK_DIST1 | Distance between the first breakpoint and the blacklist. |
| BLACK_DIST2 | Distance between the second breakpoint and the blacklist. |
| BLACK_FRAC | Fraction of the SV length that overlaps the blacklist. |
| MATCHES | Comma-separated list of ground-truth SVs that match the BEDPE entry. Always missing (.), unless a ground-truth list of SV calls is provided to the longranger run pipeline. |
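The info column can be unpacked into a dictionary along the following lines. This is a sketch under stated assumptions: `parse_info` is a hypothetical helper, values are kept as strings (no numeric conversion is attempted), missing values written as a period are mapped to `None`, and the sample string is invented for illustration.

```python
from typing import Dict, Optional


def parse_info(info_field: str) -> Dict[str, Optional[str]]:
    """Turn 'KEY=value;KEY=value' into a dict; '.' values become None."""
    if info_field == ".":
        return {}
    info: Dict[str, Optional[str]] = {}
    for entry in info_field.split(";"):
        if not entry:
            continue  # tolerate a stray trailing ';'
        key, sep, value = entry.partition("=")
        # Assumption: a key with no '=' is treated as having no value.
        info[key] = None if (not sep or value == ".") else value
    return info


# Invented example; real files may carry different keys and values.
info = parse_info("BLACK_DIST1=5000;BLACK_DIST2=.;BLACK_FRAC=0.0;MATCHES=.")
print(info["BLACK_DIST1"], info["BLACK_DIST2"])
```

Fields such as MATCHES hold a comma-separated list, so a caller would typically apply a further `split(",")` to that value when it is not missing.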