bamtofastq is a tool for converting 10x BAMs produced by cellranger, cellranger-dna, cellranger-atac or longranger back to FASTQ files that can be used as inputs to re-run analysis. The FASTQs will be emitted into a directory structure that is compatible with the directories created by the mkfastq tool.
We created bamtofastq to help users who want to reanalyze 10x data but only have access to 10x BAM files (for example, some customers store only BAM files, and others have downloaded our BAM data from NCBI SRA). 10x pipelines require sequencer FASTQs (with embedded barcodes) as input. The location of the 10x barcode varies depending on product and reagent version. For the current Genome v2 and Single Cell v2 products, the 10x barcode is found in the first 16 bases of the R1 read. In earlier product versions, the 10x barcode was attached to the sample indices. The bamtofastq tool determines the appropriate way to reconstruct the original read sequence from the sequences and tags in the BAM file.
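To make the barcode layout concrete, here is a minimal sketch of splitting an R1 read into its leading 10x barcode and the remaining bases, assuming the first-16-bases layout described above for Genome v2 and Single Cell v2. The names `BARCODE_LENGTH` and `split_r1` are hypothetical and are not part of bamtofastq; this illustrates the layout only, not how the tool is implemented.

```python
# Assumption: the 10x barcode occupies the first 16 bases of R1
# (Genome v2 / Single Cell v2, as described in the text).
BARCODE_LENGTH = 16


def split_r1(sequence, qualities):
    """Split an R1 read into (barcode, barcode_qual, rest, rest_qual).

    `sequence` and `qualities` are the sequence and quality strings of a
    single FASTQ record and must have the same length.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality strings differ in length")
    if len(sequence) < BARCODE_LENGTH:
        raise ValueError("read is shorter than the barcode length")
    return (
        sequence[:BARCODE_LENGTH],
        qualities[:BARCODE_LENGTH],
        sequence[BARCODE_LENGTH:],
        qualities[BARCODE_LENGTH:],
    )
```

For example, `split_r1("ACGTACGTACGTACGTTTTT", "I" * 20)` returns the barcode `"ACGTACGTACGTACGT"` and the remaining bases `"TTTT"`, each with its matching quality substring.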
bamtofastq is available for Linux and is compatible with RedHat/CentOS 5.2 or later, and Ubuntu 8.04 or later.
We recommend upgrading to bamtofastq 1.1.2, which has improved error messages and fixes for some use cases that ran extremely slowly.
bamtofastq is a single executable that can be run directly and requires no compilation or installation. Place the executable file in a directory that is on your PATH, and make sure to run chmod 700 on it to make it executable.
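The installation step above can be sketched in Python as follows. This is an illustrative helper, not an official installer: the function name `install_executable` and the idea of copying the file into a chosen directory are assumptions. It copies the downloaded executable into a directory, sets mode 700 (read, write and execute for the owner only), and returns the installed path.

```python
import os
import shutil
import stat


def install_executable(source_path, bin_dir):
    """Copy an executable into bin_dir and set its mode to 700.

    bin_dir is assumed to be a directory that is (or will be) on PATH.
    """
    os.makedirs(bin_dir, exist_ok=True)
    target = os.path.join(bin_dir, os.path.basename(source_path))
    shutil.copyfile(source_path, target)
    os.chmod(target, 0o700)  # equivalent to `chmod 700 <target>`
    return target
```

After installing, `shutil.which("bamtofastq")` returns the path of the executable if the directory is on your PATH, and `None` otherwise.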
10x BAMs produced by Long Ranger v2.1+, Cell Ranger v1.2+, Cell Ranger DNA v1.0+ and Cell Ranger ATAC v1.0+ contain header fields that permit automatic conversion to the correct FASTQ sequences. BAMs produced by older 10x pipelines may require special arguments or have some caveats; see below for details. Run times for full-coverage WGS BAMs may be several hours.
The FASTQ files emitted by bamtofastq contain the same set of sequences that were input to the original pipeline run, although the original order will not be preserved. 10x pipelines are generally insensitive to the order of the input data, so you can expect nearly identical results when re-running with bamtofastq outputs.
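Because the output order differs from the original, a byte-for-byte comparison of FASTQ files is not meaningful. A minimal sketch of an order-insensitive check is shown below, assuming uncompressed FASTQ text with exactly four lines per record. The helper names `read_sequences` and `same_sequences` are hypothetical; a real comparison would also need to handle gzip-compressed files and keep paired reads together.

```python
from collections import Counter


def read_sequences(fastq_text):
    """Yield the sequence line of each 4-line FASTQ record."""
    lines = fastq_text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text is not a whole number of 4-line records")
    for start in range(0, len(lines), 4):
        if not lines[start].startswith("@"):
            raise ValueError("record header does not start with '@'")
        yield lines[start + 1]


def same_sequences(fastq_a, fastq_b):
    """Return True if both FASTQs hold the same sequences, ignoring order.

    A Counter is used so that duplicated sequences are counted, not merged.
    """
    return Counter(read_sequences(fastq_a)) == Counter(read_sequences(fastq_b))
```

Two files that contain the same records in a different order compare as equal, while a file with a missing or extra read does not.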
The latest versions of cellranger, cellranger-dna and longranger generate BAM files from which complete FASTQ files, representing all input reads, can be reconstructed automatically. BAMs produced by older versions of cellranger and longranger have some caveats, listed below:
|Package||Version||Pipelines||Extra Arguments||Complete FASTQs|
|Cell Ranger DNA||1.0.0+||cnv||none||Yes|
|Long Ranger||2.1.3+||wgs, targeted, align, basic||none||Yes|
|Long Ranger||2.1.0 - 2.1.2||wgs, targeted||none||Yes|
|Long Ranger||2.0||wgs, targeted||--lr20||Yes|
|Long Ranger||2.0.0 - 2.1.2||align, basic||Not Supported||N/A|
|Long Ranger||1.3 (GemCode)||wgs, targeted||--gemcode||Reads without a valid barcode will be absent from FASTQ. This will result in a ~5-10% loss of coverage.|
|Cell Ranger||1.2||count||none||Reads without a valid barcode will be absent from FASTQ. (These reads are ignored by Cell Ranger)|
|Cell Ranger||1.0-1.1||count||--cr11||Reads without a valid barcode will be absent from FASTQ. (These reads are ignored by Cell Ranger)|
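The "Extra Arguments" column of the table can be encoded as a small lookup, sketched below. The function names are hypothetical and this is not part of bamtofastq. The sketch assumes that a row listed as "2.0" or "1.3" covers every patch release of that minor version, and that Cell Ranger 1.2 and later need no extra argument (consistent with the note above that Cell Ranger v1.2+ BAMs carry the required header fields). Combinations that the table does not list, or marks as not supported, raise an error.

```python
def parse_version(text):
    """Turn '2.1' or '2.1.3' into a 3-part tuple such as (2, 1, 0)."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def extra_arguments(package, version, pipeline):
    """Return the extra bamtofastq arguments for a BAM, per the table."""
    v = parse_version(version)
    if package == "Cell Ranger DNA" and pipeline == "cnv":
        if v >= (1, 0, 0):
            return []
    elif package == "Long Ranger" and pipeline in ("wgs", "targeted"):
        if v >= (2, 1, 0):
            return []
        if v >= (2, 0, 0):
            return ["--lr20"]
        if v >= (1, 3, 0):
            return ["--gemcode"]  # GemCode BAMs
    elif package == "Long Ranger" and pipeline in ("align", "basic"):
        if v >= (2, 1, 3):
            return []
        if v >= (2, 0, 0):
            raise ValueError("align/basic BAMs from 2.0.0 - 2.1.2 are not supported")
    elif package == "Cell Ranger" and pipeline == "count":
        if v >= (1, 2, 0):
            return []  # assumption: 1.2 and later, see lead-in
        if v >= (1, 0, 0):
            return ["--cr11"]
    raise ValueError("combination is not covered by the table")
```

For example, `extra_arguments("Long Ranger", "2.0.1", "wgs")` returns `["--lr20"]`, and `extra_arguments("Cell Ranger", "1.1.0", "count")` returns `["--cr11"]`. The lookup says nothing about whether the resulting FASTQs are complete; see the last column of the table for that.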