Long Ranger2.2, printed on 11/22/2024
The SV filter file should contain gaps and other regions of the genome that are likely to produce false-positive SVs, such as regions with known or putative assembly issues (e.g., unplaced contigs or highly polymorphic regions). The SV-calling algorithm still makes calls in these regions, but it marks them as filtered and writes them to the SV candidates file.
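The "call but mark as filtered" behavior can be illustrated with a minimal sketch. This is not Long Ranger's implementation. The `SVCall` record, the `overlaps` and `mark_filtered` helpers, and the `"BLACKLIST"` label are all hypothetical. The sketch assumes 0-based, half-open BED intervals and flags a call when either breakpoint falls inside a filter region, keeping the call rather than discarding it.

```python
from dataclasses import dataclass, field


@dataclass
class SVCall:
    # Hypothetical minimal SV record: two breakpoints, 0-based positions.
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    filters: list = field(default_factory=list)


def overlaps(chrom, pos, regions):
    """True if base `pos` lies in any half-open [start, end) region on `chrom`."""
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def mark_filtered(calls, regions, label="BLACKLIST"):
    """Flag (but keep) calls with a breakpoint inside a filter region.

    `label` is a hypothetical filter name, not Long Ranger's actual tag.
    """
    for call in calls:
        if overlaps(call.chrom1, call.pos1, regions) or overlaps(
            call.chrom2, call.pos2, regions
        ):
            call.filters.append(label)
    return calls
```

Flagging instead of deleting matches the behavior described above: calls in problematic regions remain available for inspection, but downstream tools can skip anything with a non-empty filter list.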
The SV filter file included in Long Ranger contains the following sets of regions:
An SV filter file for a custom reference should be in BED format; only the first four columns are used. Place the filter file at <refdata folder>/regions/sv_blacklist.bed, where <refdata folder> is the reference folder created by longranger mkref.
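As a rough aid for preparing a custom filter file, the sketch below reads tab-delimited BED lines, keeps at most the first four columns, performs basic sanity checks, and writes the result to regions/sv_blacklist.bed under a reference folder you supply. The helper names `read_filter_bed` and `write_filter_bed` are hypothetical. The check that start is strictly less than end is a conservative assumption for filter regions, not a documented Long Ranger requirement.

```python
from pathlib import Path


def read_filter_bed(lines):
    """Parse tab-delimited BED lines, keeping at most the first 4 columns."""
    records = []
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        # Skip blank lines and BED comment/header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: BED needs at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED is 0-based, half-open; require a non-empty interval (assumption).
        if not 0 <= start < end:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        # Keep the optional 4th column (name) if present; drop anything after it.
        records.append((chrom, start, end, *fields[3:4]))
    return records


def write_filter_bed(records, refdata_dir):
    """Write records to <refdata_dir>/regions/sv_blacklist.bed and return the path."""
    out = Path(refdata_dir) / "regions" / "sv_blacklist.bed"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w") as fh:
        for rec in records:
            fh.write("\t".join(map(str, rec)) + "\n")
    return out
```

Typical use would be `write_filter_bed(read_filter_bed(open("my_regions.bed")), "/path/to/refdata")`, where both paths are placeholders for your own files.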
Segmental duplications are large-scale (≥ 1 kb), recent duplications with high (≥ 90%) sequence identity between copies. Such regions are likely to be misassembled and can lead to incorrect alignments between the duplicated copies. It is therefore advisable to mark and filter out SV calls whose breakpoints overlap copies of the same segmental duplication. The UCSC Genome Browser provides segmental duplication tracks for some reference assemblies (hg19, GRCh38, mm9, mm10). Convert these to BEDPE format and place the result in <refdata folder>/regions/segdups.bedpe.
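A minimal sketch of the conversion and of the breakpoint-pair check follows. It assumes the UCSC `genomicSuperDups` table layout, with chrom, chromStart, chromEnd, and name in columns 1 to 4 and otherChrom, otherStart, and otherEnd in columns 7 to 9 (0-indexed). Verify this against the schema of your download. It also assumes a 7-column BEDPE output (both copies plus a name); the exact columns Long Ranger reads from segdups.bedpe are not stated here. All function names are hypothetical.

```python
import csv

# Assumed 0-based column indices in a UCSC genomicSuperDups dump (check your schema).
CHROM, START, END, NAME = 1, 2, 3, 4
OTHER_CHROM, OTHER_START, OTHER_END = 7, 8, 9


def superdups_to_bedpe(rows):
    """Map genomicSuperDups rows to (chrom1, start1, end1, chrom2, start2, end2, name)."""
    pairs = []
    for row in rows:
        # Skip empty rows and header lines such as "#bin ...".
        if not row or row[0].startswith("#"):
            continue
        pairs.append((
            row[CHROM], int(row[START]), int(row[END]),
            row[OTHER_CHROM], int(row[OTHER_START]), int(row[OTHER_END]),
            row[NAME],
        ))
    return pairs


def write_bedpe(pairs, path):
    """Write BEDPE records as tab-separated lines."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(pairs)


def hits(chrom, pos, c, s, e):
    """True if (chrom, pos) falls in the half-open interval [s, e) on c."""
    return chrom == c and s <= pos < e


def in_same_segdup(bp1, bp2, pairs):
    """True if breakpoints bp1 and bp2, each (chrom, pos), land in the two
    copies of one segmental duplication, in either order."""
    for c1, s1, e1, c2, s2, e2, _ in pairs:
        if (hits(*bp1, c1, s1, e1) and hits(*bp2, c2, s2, e2)) or (
            hits(*bp1, c2, s2, e2) and hits(*bp2, c1, s1, e1)
        ):
            return True
    return False
```

To convert a downloaded table, pass `csv.reader(fh, delimiter="\t")` over the uncompressed file to `superdups_to_bedpe`, then call `write_bedpe` with the path <refdata folder>/regions/segdups.bedpe. The pair check tests both orientations because an SV's two breakpoints may hit the copies in either order.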