# FASTA/FASTQ Files

The cellranger vdj pipeline outputs several indexed FASTA and FASTQ files.

| File type | Primary use cases |
|---|---|
| FASTA | Downstream tools such as the Integrative Genomics Viewer (IGV) or V(D)J annotation tools such as IgBLAST |
| FASTQ | Inspecting base quality scores of the assembled contigs |


| File | Records | Description |
|---|---|---|
| filtered_contig.fasta | Assembled contigs | High-confidence contig sequences in cell barcodes |
| filtered_contig.fastq | Assembled contigs | High-confidence contig sequences in cell barcodes |
| all_contig.fasta | Assembled contigs | All assembled contig sequences |
| all_contig.fastq | Assembled contigs | All assembled contig sequences |
| consensus.fasta | Clonotype consensus sequences | Clonotype consensus sequences |
| concat_ref.fasta | Concatenated reference segments | Concatenated V(D)J reference segments for the segments detected on each consensus sequence. These serve as an approximate reference for each consensus sequence. |
| donor_regions.fa | Inferred germline genes | See Donor reference below |
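
The contig FASTQ files pair each assembled sequence with a per-base quality string. For quick inspection, a minimal reader is enough. The Python sketch below assumes standard, unwrapped four-line records (header, sequence, `+` separator, quality string); the names `read_fastq` and `FastqRecord` and the example records are hypothetical, and a library parser such as Biopython's `SeqIO` may be preferable for production work.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (assumed four lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        # The record name is the header text after '@', up to the first whitespace.
        name = header[1:].split()[0]
        yield FastqRecord(name, seq, qual)


# Hypothetical two-record example; with real output, pass open("filtered_contig.fastq").
example = io.StringIO("@contig_1\nACGT\n+\nIIII\n@contig_2\nGGC\n+\n#5I\n")
for rec in read_fastq(example):
    print(rec.name, len(rec.seq))
```

The reader is a generator, so it streams records one at a time and never holds a whole file in memory.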

## Donor reference

Cell Ranger 5.0 infers the germline V genes used to rearrange T cell and B cell receptors (see Clonotype Grouping for more information). All cells with a given V gene, including cells in unrelated clonotypes, are inspected for mutations relative to the V(D)J reference. Because unrelated clonotypes arise from independent rearrangements, mutations shared across all of these cells are likely to be present in the donor's germline V gene rather than to be somatic mutations. D, J, and C germline genes are not inferred in Cell Ranger 5.0.

The inferred V gene germline sequences are exported as pipeline outputs (vdj_reference/fasta/donor_regions.fa). Each donor_regions.fa file contains a list of unique records, and each record corresponds to a unique, donor-specific V gene that differs from the V gene found in the V(D)J reference. The nucleotide sequence in each record spans the translated region, from the start of the leader peptide through the beginning of CDR3, and does not include the 5’ UTR.

The header of each record in donor_regions.fa contains four colon-separated elements. Consider the following example header:

```
>454:d1:1:TRAV1-2
```

Read from left to right, the elements are the reference record ID, donor name, allele number, and gene name:

1. The first element of the header, 454, corresponds to the entry of the closest V gene in the regions.fa V(D)J reference.
2. The second element, d1, is the donor name provided to or inferred by the pipeline.
3. The third element, 1, refers to the allele of the closest V gene in the V(D)J reference.
4. The fourth and final element of the header, TRAV1-2, is the name of the closest V gene in the V(D)J reference.
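
Because the four header elements are colon-separated, they can be split directly. The Python sketch below is a hypothetical helper (`parse_donor_header` and `DonorHeader` are illustrative names). It assumes that neither the donor name nor the gene name contains a colon, and that the reference record ID and allele number are integers, as in the example header.

```python
from typing import NamedTuple


class DonorHeader(NamedTuple):
    reference_record_id: int  # entry of the closest V gene in regions.fa
    donor: str                # donor name provided to or inferred by the pipeline
    allele: int               # allele number
    gene: str                 # name of the closest V gene in the reference


def parse_donor_header(line: str) -> DonorHeader:
    """Split a donor_regions.fa header such as '>454:d1:1:TRAV1-2' into its fields."""
    fields = line.strip().lstrip(">").split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(fields)}: {line!r}")
    ref_id, donor, allele, gene = fields
    # Assumption: the numeric fields are plain integers, as in the example.
    return DonorHeader(int(ref_id), donor, int(allele), gene)


print(parse_donor_header(">454:d1:1:TRAV1-2"))
```

Returning a named tuple keeps the fields addressable by name (for example, `.gene`) when grouping inferred alleles by their closest reference gene.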

## Quality Scores

Typically, quality scores in a FASTQ file encode the Phred-scaled probability that a base call is incorrect. When a FASTQ file contains records for sequencing reads, the quality scores usually indicate the confidence of the base caller at each base. Because cellranger vdj produces quality scores for assembled bases rather than for individual reads, their interpretation is slightly different.
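
A Phred score Q relates to an error probability P by Q = -10 log10(P), so P = 10^(-Q/10). In the FASTQ text, each score is stored as one ASCII character. The Python sketch below assumes the common Phred+33 (Sanger / Illumina 1.8+) offset. The function name `phred_to_error_prob` is hypothetical.

```python
from typing import List


def phred_to_error_prob(qual: str, offset: int = 33) -> List[float]:
    """Convert a FASTQ quality string into per-base error probabilities.

    Assumes Phred+33 encoding: Q = ord(char) - offset, P(error) = 10 ** (-Q / 10).
    """
    probs = []
    for ch in qual:
        q = ord(ch) - offset
        if q < 0:
            raise ValueError(f"character {ch!r} is below the Phred offset {offset}")
        probs.append(10 ** (-q / 10))
    return probs


# 'I' encodes Q40 (1 error in 10,000), '5' encodes Q20 (1 in 100), '+' encodes Q10 (1 in 10).
print(phred_to_error_prob("I5+"))
```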

| File | Interpretation |
|---|---|
| filtered_contig.fastq | Phred-scaled probability that the assembled base is a sequencing, PCR, or reverse-transcription (RT) error, so a high score means the base is unlikely to be such an error. The quality score is computed from the per-read sequencing Q-scores and an assumed RT error rate. |
| all_contig.fastq | Same as above. |
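
Each assembled-base score encodes an error probability, so per-contig summaries can help flag contigs for closer inspection. By linearity of expectation, the expected number of erroneous bases is the sum of the per-base error probabilities. The Python sketch below computes this sum and lists the positions below a quality cutoff. The Q20 cutoff, the Phred+33 offset, and the name `summarize_contig_quality` are illustrative assumptions, not Cell Ranger defaults.

```python
from typing import List, Tuple


def summarize_contig_quality(
    qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[float, List[int]]:
    """Return (expected number of erroneous bases, 0-based positions with Q < min_q).

    Assumes Phred+33 encoding; each base contributes 10 ** (-Q / 10) to the sum.
    """
    expected_errors = 0.0
    low_positions = []
    for i, ch in enumerate(qual):
        q = ord(ch) - offset
        expected_errors += 10 ** (-q / 10)
        if q < min_q:  # hypothetical cutoff for "low-confidence" bases
            low_positions.append(i)
    return expected_errors, low_positions


# Hypothetical quality string: seven Q40 bases and one Q10 base at position 4.
exp_err, low = summarize_contig_quality("IIII+III")
print(f"expected errors: {exp_err:.4f}; low-quality positions: {low}")
```

A single low-scoring base dominates the expected-error sum here. Reporting the positions as well makes such isolated weak spots easy to locate within the contig.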