# FASTA/FASTQ Files

## File formats and descriptions

The cellranger vdj pipeline outputs several indexed FASTA and FASTQ files. Refer to the V(D)J outputs Overview page for a list of all output files generated.

• FASTA files serve as inputs to downstream tools such as the Integrative Genomics Viewer (IGV) or V(D)J annotation tools like IgBLAST.
• FASTQ files are used to inspect assembly base quality scores.

| File | Records | Description |
| --- | --- | --- |
| all_contig.fasta | Assembled contigs | FASTA format sequence for each assembled contig in the V(D)J library. |
| all_contig.fasta.fai | Index | Companion file to all_contig.fasta that serves as an external index. |
| filtered_contig.fasta | Assembled contigs | Contig sequences from barcodes that passed the algorithm's filtering steps (described on the Assembly Algorithm page). These contigs are annotated in filtered_contig_annotations.csv. |
| consensus.fasta | Clonotype consensus sequences | The consensus sequence of each assembled contig. |
| consensus.fasta.fai | Index | Companion file to consensus.fasta that serves as an external index. |
| concat_ref.fasta | Concatenated reference segments | Concatenated V(D)J reference segments for the segments detected on each consensus sequence. These serve as an approximate reference for each consensus sequence. |
| concat_ref.fasta.fai | Index | Companion file to concat_ref.fasta that serves as an external index. |
| vdj_reference/fasta/donor_regions.fa | Inferred germline genes | List of records that each correspond to a unique, donor-specific V gene that differs from the V gene found in the V(D)J reference. |
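
The .fai companion files allow a single record to be read without scanning the whole FASTA file. The sketch below shows one way to do this in plain Python. It assumes the index follows the common five-column `samtools faidx` layout (name, length, byte offset, bases per line, bytes per line); that layout is an assumption here, not something stated on this page. The helper names `read_fai` and `fetch_sequence` and the demo file contents are hypothetical, and in practice the paths would point to a file such as all_contig.fasta and its .fai index.

```python
import os
import tempfile


def read_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            if not line.strip():
                continue
            name, length, offset, linebases, linewidth = line.split("\t")[:5]
            index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch_sequence(fasta_path, index, name):
    """Seek directly to one record and return its sequence without line breaks."""
    length, offset, linebases, linewidth = index[name]
    # Full lines carry their newline bytes; the final partial line may not.
    full_lines, remainder = divmod(length, linebases)
    n_bytes = full_lines * linewidth + remainder
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read(n_bytes)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


# Demo on a tiny hand-made FASTA (made-up records, not real pipeline output).
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "demo.fasta")
    with open(fasta_path, "wb") as out:
        out.write(b">contig_a\nACGTACGTAC\nGTAC\n>contig_b\nTTTTGGGG\n")
    with open(fasta_path + ".fai", "wb") as out:
        out.write(b"contig_a\t14\t10\t10\t11\ncontig_b\t8\t36\t8\t9\n")

    index = read_fai(fasta_path + ".fai")
    seq_a = fetch_sequence(fasta_path, index, "contig_a")
    seq_b = fetch_sequence(fasta_path, index, "contig_b")

print(seq_a)
print(seq_b)
```

Existing tools that understand .fai indexes are usually a better choice for real analyses; the sketch is only meant to show what the index makes possible.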

## Donor reference

Cell Ranger v5.0+ infers the germline V genes that were rearranged to form T cell and B cell receptors. See Clonotype Grouping for more information. All cells with a given V gene (including cells in unrelated clonotypes) are inspected for shared mutations relative to the V(D)J reference; mutations shared across all cells are likely to be variants present in the germline V gene of the donor rather than somatic mutations. These inferred germline V gene sequences are exported as a pipeline output (vdj_reference/fasta/donor_regions.fa).
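
The idea of "mutations shared across all cells" can be illustrated with a toy example. The sketch below is not Cell Ranger's algorithm; it is a simplified illustration that assumes all sequences are already aligned to the reference, have equal length, and contain no insertions or deletions. The function name `shared_variants` and the sequences are made up for the example.

```python
def shared_variants(reference, cell_sequences):
    """Return {position: base} where every cell carries the same non-reference base."""
    variants = {}
    for pos, ref_base in enumerate(reference):
        bases = {seq[pos] for seq in cell_sequences}
        # Keep a position only if all cells agree and differ from the reference.
        if len(bases) == 1:
            (base,) = bases
            if base != ref_base:
                variants[pos] = base
    return variants


# Toy data (hypothetical): position 3 differs from the reference in every cell,
# while position 6 differs in only one cell.
reference = "ACGTACGT"
cells = ["ACGAACGT", "ACGAACGT", "ACGAACTT"]

variants = shared_variants(reference, cells)
print(variants)
```

In this toy case only position 3 is reported, since the change at position 6 appears in a single cell and so looks more like a cell-specific event than a donor-wide variant.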

Each donor_regions.fa file contains a list of unique records. Each record corresponds to a unique, donor-specific V gene that differs from the V gene found in the V(D)J reference. The nucleotide sequence exported in the record spans the translated RNA sequence through the beginning of CDR3 (i.e., leader peptide to CDR3) and does not include the 5’ UTR.

>454:d1:1:TRAV1-2 (reference record id : donor name : allele number : gene name)


There are four elements in the donor_regions.fa header:

1. The first element of the header (454) corresponds to the entry of the closest V gene in the regions.fa V(D)J reference.
2. The second element (d1) is the donor name provided to or inferred by the pipeline.
3. The third element (1) refers to the allele of the closest V gene in the V(D)J reference.
4. The fourth element of the header (TRAV1-2) is the name of the closest V gene in the V(D)J reference.
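
The four header elements can be separated programmatically by splitting on colons. The following is a minimal sketch, assuming the header contains exactly the four colon-separated fields described above and that the record id and allele number are integers; the names `DonorHeader` and `parse_donor_header` are hypothetical.

```python
from typing import NamedTuple


class DonorHeader(NamedTuple):
    reference_record_id: int
    donor_name: str
    allele_number: int
    gene_name: str


def parse_donor_header(line: str) -> DonorHeader:
    """Split a donor_regions.fa header line into its four colon-separated fields."""
    # Drop the leading '>' and anything after the first whitespace.
    token = line.lstrip(">").split()[0]
    # maxsplit=3 keeps any later colons inside the gene name field.
    record_id, donor, allele, gene = token.split(":", 3)
    return DonorHeader(int(record_id), donor, int(allele), gene)


header = parse_donor_header(">454:d1:1:TRAV1-2")
print(header.reference_record_id, header.donor_name, header.allele_number, header.gene_name)
# prints: 454 d1 1 TRAV1-2
```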

D, J, and C germline genes are not inferred in Cell Ranger 5.0+.

## Quality scores

The cellranger vdj pipeline produces quality scores for assembled bases. In the filtered_contig.fastq and all_contig.fastq files, the quality score corresponds to the probability of the base not being a sequencing, PCR, or reverse-transcription (RT) error. It is computed using the per-read sequencing Q-scores and an assumed RT error rate.

The cellranger vdj pipeline's quality score differs from the quality scores in a typical FASTQ file. In a typical FASTQ file, the quality score is the Phred-scaled probability that the base call in a single read is a sequencing error.
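
For reference, the standard Phred relationship between a score Q and an error probability p is Q = -10 · log10(p). The sketch below converts a FASTQ quality string into scores and error probabilities. It assumes the quality strings use the common Phred+33 ASCII offset, which is not stated on this page and should be checked for your files. The function names and the example quality string are hypothetical. For the contig FASTQ files, the resulting probability would be read as the chance of any sequencing, PCR, or RT error at that base, as described above.

```python
from typing import List


def decode_quality_string(qual: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string to integer Phred scores (offset assumed to be 33)."""
    return [ord(char) - offset for char in qual]


def phred_to_error_prob(q: int) -> float:
    """Standard Phred conversion: p_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


scores = decode_quality_string("I5+!")
error_probs = [phred_to_error_prob(q) for q in scores]

print(scores)  # prints: [40, 20, 10, 0]
for q, p in zip(scores, error_probs):
    print(f"Q{q}: error probability about {p:.4g}")
```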