
10x Genomics
Chromium Genome & Exome

Phased Structural Variants in VCF Format

The VCF outputs of the Structural Variant and Copy Number Variant pipelines largely follow the VCF standard. Below we describe a few additional conventions we adopted to capture information provided by the 10x data and algorithms.

Structural Variants as breakends

Both breakpoints on the same phase set

SVs with both breakpoints on the same phase set are described using a single VCF record. In such cases, the SVTYPE info field gives the type of the SV: DEL, INV, DUP, or UNK, denoting a deletion, inversion, tandem duplication, or event of unknown type, respectively. The type is also encoded in the ALT field as <DEL>, <INV>, <DUP:TANDEM>, or <UNK>. The END info field gives the second breakpoint (e.g., the end of a deletion or the second breakpoint of an inversion).
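As an illustration, the following Python sketch turns a single-record SV into an interval and checks that its ALT symbol agrees with SVTYPE. It assumes a tab-separated VCF data line with the standard fixed columns; the helper names and the example record are hypothetical, and only the SVTYPE, ALT, and END conventions come from this page.

```python
# Minimal sketch (hypothetical helpers): interpret a single-record SV as an interval.

# ALT symbol documented for each SVTYPE of a single-record SV.
ALT_FOR_SVTYPE = {
    "DEL": "<DEL>",
    "INV": "<INV>",
    "DUP": "<DUP:TANDEM>",
    "UNK": "<UNK>",
}


def parse_info(info):
    """Split a VCF INFO column into a dict; flags (keys without '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def single_record_interval(line):
    """Return (chrom, first_breakpoint, second_breakpoint, svtype)."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, alt = cols[0], int(cols[1]), cols[4]
    info = parse_info(cols[7])
    svtype = info["SVTYPE"]
    if svtype not in ALT_FOR_SVTYPE:
        raise ValueError(f"not a single-record SV type: {svtype}")
    if alt != ALT_FOR_SVTYPE[svtype]:
        raise ValueError(f"ALT {alt} does not match SVTYPE {svtype}")
    # END carries the second breakpoint, e.g. the end of a deletion.
    return chrom, pos, int(info["END"]), svtype


# Hypothetical example record.
record = "chr1\t1000000\tsv_1\tN\t<DEL>\t50\tPASS\tSVTYPE=DEL;END=1050000"
print(single_record_interval(record))  # ('chr1', 1000000, 1050000, 'DEL')
```

Records with SVTYPE=BND are rejected here, since breakend records carry their partner location in ALT rather than in END.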

Breakpoints on different phase sets

If the breakpoints are on different phase sets, each breakpoint is placed in a separate VCF record; otherwise, it would be ambiguous which phase set the genotype field refers to. In this case, the ALT field describes the adjacency created by the breakpoint, using breakend notation. For details on describing adjacencies with breakends, see the VCF standard.
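A minimal Python sketch of decoding the four bracketed breakend forms that the VCF standard defines for the ALT field. The `Breakend` tuple and `parse_breakend` function are hypothetical helpers, not part of any 10x tool; the sketch does not handle single breakends or symbolic alleles, and any orientation it reports for a record flagged IMPRECISE_DIR should not be relied on.

```python
import re
from typing import NamedTuple

# Breakend ALT forms from the VCF standard (t = REF base(s), p = mate position):
#   t[p[  the piece extending right of p is joined after t
#   t]p]  the reverse complement of the piece extending left of p is joined after t
#   ]p]t  the piece extending left of p is joined before t
#   [p[t  the reverse complement of the piece extending right of p is joined before t
JOINED_AFTER = re.compile(r"^([A-Za-z]+)([\[\]])([^\[\]]+):(\d+)\2$")
JOINED_BEFORE = re.compile(r"^([\[\]])([^\[\]]+):(\d+)\1([A-Za-z]+)$")


class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    joined_after_t: bool        # True for t[p[ and t]p], False for ]p]t and [p[t
    mate_piece: str             # "right" or "left" of the mate position p
    reverse_complemented: bool  # True for t]p] and [p[t


def parse_breakend(alt):
    m = JOINED_AFTER.match(alt)
    if m:
        _, bracket, chrom, pos = m.groups()
        after = True
    else:
        m = JOINED_BEFORE.match(alt)
        if not m:
            raise ValueError(f"not a bracketed breakend ALT: {alt}")
        bracket, chrom, pos, _ = m.groups()
        after = False
    # '[' always refers to the piece right of p, ']' to the piece left of p.
    piece = "right" if bracket == "[" else "left"
    # Joined after t, a left piece is read reverse-complemented; joined before t,
    # a right piece is.
    rc = (after and piece == "left") or (not after and piece == "right")
    return Breakend(chrom, int(pos), after, piece, rc)


bnd = parse_breakend("G]17:198982]")
print(bnd.mate_chrom, bnd.mate_pos, bnd.mate_piece, bnd.reverse_complemented)
# 17 198982 left True
```

For example, `G]17:198982]` decodes as the reverse complement of the chromosome 17 sequence ending at position 198982, joined after `G`.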

The VCF standard specifies that the SVTYPE info field of breakends must be BND. Therefore, for SVs described as breakends, we use a custom info field SVTYPE2 to specify the predicted type of the SV.

All breakends referring to the same event have the same value of the EVENT info field. In addition, for each breakend of the event, the MATEID info field points to the other breakend of the same event.
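The Python sketch below shows one way to use EVENT, MATEID, and SVTYPE2 together: it collects breakend records, checks that each MATEID link is reciprocal, and reports the predicted SV type of each event. It assumes MATEID holds a single record ID; the function names, record IDs, and example lines are hypothetical.

```python
from collections import defaultdict


def parse_info(info):
    """Split a VCF INFO column into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def group_breakends(lines):
    """Map each EVENT to (predicted SV type, sorted breakend IDs).

    Assumes MATEID holds exactly one record ID.
    """
    records = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if info.get("SVTYPE") == "BND":  # single-record SVs have no mates
            records[cols[2]] = info
    events = defaultdict(list)
    for rec_id, info in records.items():
        mate = info.get("MATEID")
        # Each breakend must point to a mate that points back to it.
        if mate not in records or records[mate].get("MATEID") != rec_id:
            raise ValueError(f"breakend {rec_id} has no reciprocal mate")
        events[info["EVENT"]].append(rec_id)
    # SVTYPE is always BND here, so the predicted SV type comes from SVTYPE2.
    return {
        event: (records[ids[0]].get("SVTYPE2"), sorted(ids))
        for event, ids in events.items()
    }


# Hypothetical deletion split across phase sets, plus one single-record SV.
example = [
    "chr1\t100\tbnd_1a\tN\tN[chr1:50000[\t40\tPASS\tSVTYPE=BND;SVTYPE2=DEL;EVENT=ev1;MATEID=bnd_1b",
    "chr1\t50000\tbnd_1b\tN\t]chr1:100]N\t40\tPASS\tSVTYPE=BND;SVTYPE2=DEL;EVENT=ev1;MATEID=bnd_1a",
    "chr2\t10\tsv_2\tN\t<DEL>\t30\tPASS\tSVTYPE=DEL;END=900",
]
print(group_breakends(example))  # {'ev1': ('DEL', ['bnd_1a', 'bnd_1b'])}
```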

Special case: Inversions

Inversions whose two breakpoints are on different phase sets are split into four separate VCF records. This is because an inversion creates two new adjacencies, each involving both breakpoints, and each adjacency is described by a pair of breakend records. For examples, see the section on inversions in the VCF standard.
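To make the four-record layout concrete, this Python sketch derives the four breakend ALT strings for inverting reference positions a+1 through b, following the breakend semantics of the VCF standard. Placing the breakends at positions a, a+1, b, and b+1 is an assumption for illustration; the exact coordinates 10x reports may differ, and the toy reference and function name are hypothetical.

```python
def inversion_breakends(chrom, ref, a, b):
    """Return (POS, ALT) for the four breakends of inverting positions a+1..b.

    `ref` is the reference sequence of `chrom`; positions are 1-based.
    Assumed layout, for illustration only.
    """
    def base(pos):
        return ref[pos - 1]

    return [
        # After base a comes the reverse complement of the sequence ending at b.
        (a, f"{base(a)}]{chrom}:{b}]"),
        # Before base a+1 comes the reverse complement of the sequence starting at b+1.
        (a + 1, f"[{chrom}:{b + 1}[{base(a + 1)}"),
        # Mate of the first record: after base b comes the reverse complement
        # of the sequence ending at a.
        (b, f"{base(b)}]{chrom}:{a}]"),
        # Mate of the second record: before base b+1 comes the reverse complement
        # of the sequence starting at a+1.
        (b + 1, f"[{chrom}:{a + 1}[{base(b + 1)}"),
    ]


# Hypothetical toy reference; invert positions 3..5 (GTA becomes TAC).
for pos, alt in inversion_breakends("3", "ACGTACGTAC", 2, 5):
    print(pos, alt)
# 2 C]3:5]
# 3 [3:6[G
# 5 A]3:2]
# 6 [3:3[C
```

The records at a and b are mates of each other, as are the records at a+1 and b+1; together they describe the two new adjacencies created by the inversion.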

Filter fields

The possible values of the FILTER column in our SV VCF files are similar to the filters applied to the entries of the SV BEDPE output. An entry that passes all filters has the value PASS in the FILTER column. All other SV entries either have low support or lie in spurious regions of the genome.
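The following Python sketch separates PASS records from filtered ones and tallies the filters that the other records failed, assuming the standard VCF convention that multiple failed filters are separated by semicolons in the FILTER column. The filter names LOW_SUPPORT and SPURIOUS_REGION are hypothetical placeholders, not the names used in 10x output.

```python
from collections import Counter


def summarize_filters(lines):
    """Return (IDs of PASS records, Counter of filters failed by the other records)."""
    passing, failed = [], Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        rec_id, filt = cols[2], cols[6]
        if filt == "PASS":
            passing.append(rec_id)
        else:
            # A record may fail several filters, listed in FILTER separated by ';'.
            failed.update(filt.split(";"))
    return passing, failed


# Hypothetical records; filter names are placeholders.
example = [
    "##fileformat=VCFv4.2",
    "chr1\t10\tsv_1\tN\t<DEL>\t90\tPASS\tSVTYPE=DEL;END=50000",
    "chr1\t20\tsv_2\tN\t<DEL>\t5\tLOW_SUPPORT\tSVTYPE=DEL;END=60000",
    "chr2\t30\tsv_3\tN\t<INV>\t7\tLOW_SUPPORT;SPURIOUS_REGION\tSVTYPE=INV;END=90000",
]
passing, failed = summarize_filters(example)
print(passing)  # ['sv_1']
print(failed)   # Counter({'LOW_SUPPORT': 2, 'SPURIOUS_REGION': 1})
```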

Summary of info fields

Info field     Description
SVTYPE         Type of the SV (DEL, INV, DUP, or UNK), or BND for SVs described as sets of breakends.
SVTYPE2        Type of the SV, for SVs described as breakends.
IMPRECISE_DIR  Flag indicating that the orientation of the adjacency is unknown. This only applies to SVs described using breakends.
SVLEN          Length of the variant.
END            Second breakpoint of the SV, for SVs given in a single record.
EVENT          Unique name of the SV, which can be used to group together breakends referring to the same event.
MATEID         Name of the other breakend of the same event.
CIPOS/CIEND    Uncertainty around the predicted first and second breakpoints of the event. Each is a pair of values specifying the region of uncertainty around the POS (or END) value.

The remaining info fields in our SV VCFs are similar to the fields in the BEDPE output.
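As a sketch of reading these fields, the following Python code parses an INFO column (treating IMPRECISE_DIR as a flag) and converts CIPOS and CIEND into coordinate windows. It assumes CIPOS and CIEND follow the VCF-standard convention of two offsets relative to POS and END; the function names and example values are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def uncertainty_window(anchor, ci):
    """Assumes VCF-standard semantics: two comma-separated offsets from anchor."""
    low, high = (int(x) for x in ci.split(","))
    return anchor + low, anchor + high


def describe_breakpoints(pos, info_column):
    info = parse_info(info_column)
    # IMPRECISE_DIR is a flag: if present, the adjacency orientation is unknown.
    summary = {"orientation_known": "IMPRECISE_DIR" not in info}
    if "CIPOS" in info:
        summary["pos_window"] = uncertainty_window(pos, info["CIPOS"])
    if "END" in info and "CIEND" in info:
        summary["end_window"] = uncertainty_window(int(info["END"]), info["CIEND"])
    return summary


print(describe_breakpoints(1000, "SVTYPE=DEL;END=5000;CIPOS=-20,30;CIEND=-5,5"))
# {'orientation_known': True, 'pos_window': (980, 1030), 'end_window': (4995, 5005)}
```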

Quality scores

The quality scores in our Structural Variant VCFs are not Phred-scaled probabilities. Instead, the value in the QUAL field is an estimate of the barcode support for the corresponding event. Our SV-calling algorithms estimate the appropriate cutoff for this value from properties of the sample, such as depth and loaded mass. Consequently, quality scores are not necessarily comparable across samples.

Calls vs. low-quality candidates

Our large-scale SV calling pipeline, which identifies SVs larger than 30 kb, outputs two BEDPE files: one with high-quality calls and another with low-quality candidates. This is done for compatibility with earlier versions of the pipeline.

However, the VCF files output by our pipelines may contain both high-quality calls and low-quality candidates. High-quality calls that pass all filters have the value PASS in the FILTER column.