10x Genomics
Chromium Genome & Exome

Structural Variants in BEDPE Format

The principal output of the longranger run pipeline includes aligned reads with barcode and phasing information in BAM format, phased SNPs and indels in VCF format, and SV calls and candidates in BEDPE format. These are all standard file formats designed to interoperate with existing tools, and the additional information produced by the Chromium Platform is included as standards-compliant fields where appropriate.

The output of the SV calling code is BEDPE, a format similar to BED that describes pairs of genomic regions. Long Ranger uses this format to describe pairs of breakpoints that define a structural variant.

The BEDPE contains one SV per line with the following tab-delimited columns:

  1. chrom1 - chromosome of the first breakpoint

  2. start1 - start position of the first breakpoint

  3. stop1 - end position of the first breakpoint

  4. chrom2 - chromosome of the second breakpoint

  5. start2 - start position of the second breakpoint

  6. stop2 - end position of the second breakpoint

  7. name - a unique string identifying the SV

  8. qual - Phred-like quality score

  9. strand1 - strand of the first breakpoint (not currently used; always '+')

  10. strand2 - strand of the second breakpoint (not currently used; always '+')

  11. filter - a semicolon-delimited list of filters that were applied to the SV, or a single period (.) if the SV was not filtered out

  12. info - extra information about the SV or a single period (.)
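
As an illustration of the column layout above, here is a minimal Python sketch that reads one BEDPE line into a named record. The names `SvRecord`, `parse_bedpe_line`, and `read_bedpe` are hypothetical helpers, not part of Long Ranger. The sketch assumes that coordinates are integers, that the quality score can be read as a number, and that any header line starts with `#`. The example line is made up for demonstration only.

```python
from typing import Iterable, Iterator, NamedTuple


class SvRecord(NamedTuple):
    """One BEDPE line, using the 12 column names listed above."""
    chrom1: str
    start1: int
    stop1: int
    chrom2: str
    start2: int
    stop2: int
    name: str
    qual: float   # assumption: the Phred-like score parses as a number
    strand1: str
    strand2: str
    filter: str   # raw column 11, e.g. "." or "LOW_MAPQ;SEG_DUP"
    info: str     # raw column 12, e.g. "." or "TYPE=DEL;..."


def parse_bedpe_line(line: str) -> SvRecord:
    """Split a tab-delimited BEDPE line into its 12 typed columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-delimited columns, got {len(fields)}")
    (chrom1, start1, stop1, chrom2, start2, stop2,
     name, qual, strand1, strand2, filt, info) = fields
    return SvRecord(chrom1, int(start1), int(stop1),
                    chrom2, int(start2), int(stop2),
                    name, float(qual), strand1, strand2, filt, info)


def read_bedpe(lines: Iterable[str]) -> Iterator[SvRecord]:
    """Yield one record per SV line, skipping blank lines and
    lines starting with '#' (assumed to be headers or comments)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_bedpe_line(line)


# Made-up example line, for illustration only.
example = "\t".join(["chr1", "1000", "1010", "chr1", "51000", "51010",
                     "call_1", "25", "+", "+", ".", "TYPE=DEL"])
record = parse_bedpe_line(example)
print(record.name, record.chrom1, record.start1, record.stop2, record.qual)
```

To read a whole file, pass an open file handle to `read_bedpe`, for example `with open(path) as fh: svs = list(read_bedpe(fh))`, where `path` points to a BEDPE file.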

Filter Entries

The filter field (column 11) is a semicolon-delimited string of filters that the SV failed to pass. The following filters may have been applied:

Filter - Description
BLACK_DIST - At least one breakpoint is within 10Kb of the blacklist (see also the BLACK_DIST1 and BLACK_DIST2 info fields below).
BLACK_FRAC - The SV has >10% of base pairs overlapping the blacklist (see also the BLACK_FRAC info field below).
SEG_DUP - The SV breakpoints are within 10Kb from copies of the same segmental duplication.
NMATES - Both breakpoints of the SV participate in multiple (>5) SVs. This is an indication of low-complexity regions or barcode coalescence.
LOW_MAPQ - Average MAPQ of reads in the call region < 40. Suggests potential alignment problems leading to a false positive call.
DEPTH_DROP - Depth drop that is inconsistent with the presence of a deletion. Suggests alignment problems or coverage unevenness.
HIGH_BC_COV - Barcode coverage on either breakpoint > 3 times the average barcode coverage genomewide. Suggests alignment problems leading to read pileups.
TOO_MANY_FILTERED_BCS - More than 30% of the barcodes supporting the call have been associated with calls filtered by one or more of the other filters.

The SV blacklist and segmental duplication list are included in the refdata-hg19 package required by Long Ranger. These lists define gaps and other ambiguous regions of the reference genome that have been found to raise spurious SV candidates and calls.
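
The filter field described above can be split into individual filter names with a few lines of Python. This is a minimal sketch: `parse_filters` and `passed_all_filters` are hypothetical helper names, and the input strings are made-up examples rather than real Long Ranger output.

```python
from typing import List


def parse_filters(filter_field: str) -> List[str]:
    """Return the filter names in column 11.

    A single period means the SV was not filtered out, so the
    result is an empty list in that case.
    """
    text = filter_field.strip()
    if text in ("", "."):
        return []
    return [name for name in text.split(";") if name]


def passed_all_filters(filter_field: str) -> bool:
    """True when column 11 lists no failed filters."""
    return not parse_filters(filter_field)


# Made-up filter strings, for illustration only.
print(parse_filters("LOW_MAPQ;SEG_DUP"))
print(passed_all_filters("."))
```

Keeping the result as a list preserves the order in which the filters appear in the file. Convert it to a `set` if only membership tests are needed.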

Info Entries

The info field (column 12) is a semicolon-delimited string of key=value pairs. A single period (.) as the value indicates that the value is missing (e.g. because the corresponding info key does not apply to this entry of the BEDPE file). The following keys may be defined for a given SV:

Key - Description
BCOV - Number of linked-read sets supporting the SV.
BLACK1 - If the first breakpoint of the SV is too close to a blacklist element, this will be the type of the element (e.g. centromere, gap).
BLACK2 - If the second breakpoint of the SV is too close to a blacklist element, this will be the type of the element (e.g. centromere, gap).
BLACK_DIST1 - Distance between the first breakpoint and the blacklist.
BLACK_DIST2 - Distance between the second breakpoint and the blacklist.
BLACK_FRAC - Fraction of the SV length that overlaps the blacklist.
FRAC_SUPPORT - Fraction of common barcodes between the two breakpoints that support the SV, weighted by their probability of belonging to the assigned haplotypes.
HAPS - A comma-separated list of two values (0, 1, or None) showing the haplotype to which each of the two breakpoints was assigned. HAPS=None,None means that the SV was called homozygous.
HAP_PROBS - A comma-separated list of 4 values, showing the confidence of the SV breakpoints being assigned to each of the following 4 sets of haplotypes: 00, 10, 01, 11. In other words, this is the confidence in the value of HAPS. This is only meaningful for heterozygous events (HAPS is not None,None).
MATCHES - Comma-separated list of ground-truth SVs that match the BEDPE entry. Always missing (.), unless a ground-truth list of SV calls is provided to the longranger run pipeline.
NBCS1 - Number of linked-read sets overlapping the first breakpoint.
NBCS2 - Number of linked-read sets overlapping the second breakpoint.
NMATES1 - Number of SVs involving the first breakpoint. A large number usually suggests a false positive.
NMATES2 - Number of SVs involving the second breakpoint. A large number usually suggests a false positive.
NOOV - Rough estimate of the number of linked-read sets that oppose the presence of the SV (e.g. linked-read sets from the haplotype that does not carry the SV).
NPAIRS - Number of read-pairs supporting the SV.
NSPLIT - Number of split reads supporting the SV.
ORIENT - For events of the DISTAL type, ORIENT shows how the breakpoints were rearranged. ORIENT=++ means that the region downstream of the first breakpoint is joined to the region downstream of the second breakpoint, ORIENT=+- means that the region downstream of the first breakpoint is joined to the region upstream of the second breakpoint, and so on. ORIENT is set to '..' if the orientation is unknown or the type is not DISTAL.
PS1/PS2 - Phase sets to which each of the breakpoints was assigned.
RP_LR - Log-likelihood ratio score of read-pair support. Higher values correspond to stronger read-pair evidence.
RP_TYPE - SV type suggested by the orientation of read-pairs around the breakpoints. This is similar to the TYPE/ORIENT fields but uses read-pair information instead of molecule/barcode-level information. Distal events (breakpoints >500Kb apart) are marked as TRANS_RR, TRANS_RF, TRANS_FF, and TRANS_FR, where the last two characters show the orientation of reads around each of the two breakpoints (F: forward, R: reverse). So RP_TYPE=TRANS_FR is equivalent to ORIENT=-+. However, RP_TYPE does not have to be compatible with TYPE/ORIENT if the signals at the molecule level and the read level are not in agreement.
SEG_DUP - Comma-separated list of segmental duplications that overlap the breakpoints of the SV.
SUPPORT - Number of barcodes that are more concordant with the presence than with the absence of an SV. This is usually less than BCOV.
TYPE - Type of SV. If the breakpoints are <500Kb apart, this will be one of: DEL (deletion), INV (inversion), DUP (tandem duplication), or UNK (unknown type). All events with breakpoints >500Kb apart are marked as DISTAL.
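
The key=value layout of the info field lends itself to a small dictionary parser. The sketch below is one possible approach, not Long Ranger code: `parse_info` and `parse_haps` are hypothetical helper names, values are kept as strings because numeric types are not specified here, and the example string is made up for illustration.

```python
from typing import Dict, Optional, Tuple


def parse_info(info_field: str) -> Dict[str, Optional[str]]:
    """Turn column 12 into a dict of key -> value.

    A value of "." (missing) is stored as None. A field that is
    itself a single period yields an empty dict.
    """
    info: Dict[str, Optional[str]] = {}
    text = info_field.strip()
    if text in ("", "."):
        return info
    for entry in text.split(";"):
        if not entry:
            continue
        key, _, value = entry.partition("=")
        info[key] = None if value in ("", ".") else value
    return info


def parse_haps(value: Optional[str]) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Convert a HAPS value such as "0,None" into (0, None).

    Returns None when the HAPS value itself is missing.
    """
    if value is None:
        return None
    first, second = (part.strip() for part in value.split(","))

    def to_hap(part: str) -> Optional[int]:
        return None if part == "None" else int(part)

    return (to_hap(first), to_hap(second))


# Made-up info string, for illustration only.
info = parse_info("TYPE=DEL;HAPS=0,None;MATCHES=.;NPAIRS=12")
print(info["TYPE"], info["MATCHES"], parse_haps(info["HAPS"]))
```

Because a missing value is stored as `None`, callers can tell "key present but not applicable" (for example `MATCHES=.`) apart from "key absent" by checking `key in info`.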