Chromium Single Cell Immune Profiling
Cell Ranger 5.0 (latest), printed on 01/19/2021
Release notes for Cell Ranger 5.0.1 (12/16/2020):
- Fixes an issue in aggr where files would fail to be copied on NFSv4 file systems.
- Fixes an issue in multi where r1-length and r2-length settings were ignored for vdj.
Release notes for Cell Ranger 5.0.0 (11/19/2020):
Changes that apply to V(D)J analysis
- Cell Ranger 5.0 introduces a new clonotype grouping algorithm that computationally approximates groups of cells descended from a single, fully rearranged common ancestor, and that infers the germline sequences of the V genes for each individual in the dataset. In previous versions (4.0 and earlier), the algorithm grouped cells based only on the set of productive CDR3 nucleotide sequences. As a consequence, whenever a true clonotype carried a CDR3 mutation, its exact subclonotypes were reported as multiple separate clonotypes. This made B cell clonotypes, which accumulate such mutations through somatic hypermutation, particularly inaccurate. Additionally, single-chain clonotypes were reported as separate clonotypes, which could lead to both over- and underestimation of the size of a given clonotype. The new algorithm improves specificity, sensitivity, and overall accuracy because it accounts for mutations in both the V(D)J transcript and the V(D)J junction. It also merges single-chain clonotypes into the correct fully paired clonotypes for both T cells and B cells, and it applies additional cell filters during clonotyping to improve data quality.
- Changes to V(D)J outputs:
- The following output files are removed in 5.0:
- The following output files are added in 5.0:
- Contig info binary file, which is used as an input when aggregating V(D)J samples
- Donor reference fasta
- Two new columns are added to the clonotypes.csv file to display iNKT/MAIT evidence.
- The files filtered_contig.fastq now contain only data from productive contigs in cell barcodes.
- A number of new fields are added to
v_start, v_end, v_end_ref, j_start, j_start_ref, j_end, cdr3_start, cdr3_end
- The recommended V(D)J reference packages for human and mouse have been updated from version 4.0 to 5.0. The changes to the V(D)J reference sequences are listed below:
- Replace IGKV2D-40, whose leader sequence appears to be truncated.
- Delete IGKV2-18, which is probably a pseudogene.
- Delete IGLV5-48, which is truncated on the right.
- Delete TRBV21-1, which has multiple frameshifts.
- Add IGHV4-30-4, which was missing.
- Add IGKV1-NL1, which was missing.
- Add IGHV4-38-2, which was missing.
- Delete TRAV23, which is frame-shifted.
- Delete the first base of the constant region gene IGHG2B.
- Make a six-base insertion in IGKV12-89, based on empirical data.
- Correct IGHV8-9, whose amino acid sequence showed an S in place of the canonical C at the end of FWR3. The correction is consistent with 10x data.
- Add an allele of IGKV2-109, which was missing.
- Add IGKV4-56, which was missing.
- Add IGHV1-2, which was missing.
- cellranger aggr now aggregates V(D)J data, allowing users to recompute V(D)J clonotype groupings across the combined data.
- Soft deprecation of
- Since Cell Ranger 3.1, because of filters in the VDJ assembler, --force-cells in VDJ pipelines has not behaved as users would expect. The value supplied to --force-cells is applied only to the barcodes that pass the combined assembler filters.
- This makes it effectively impossible for users to increase the number of recovered cells. In this context, --force-cells can only reduce the number of recovered cells, unlike its behavior in the cellranger count pipeline.
- Because this flag is likely to be misunderstood and is rarely requested, we are starting to deprecate it. In Cell Ranger 5.0, --force-cells remains available only as an undocumented, silent option, which gives users who rely on it in routine workflows time to prepare for its eventual removal.
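The difference between grouping by exact CDR3 sequence and grouping that tolerates junction mutations can be seen in a toy example. In the sketch below, the barcodes and sequences are hypothetical; group_by_exact_cdr3 mimics only the pre-5.0 grouping key described above (the set of productive CDR3 nucleotide sequences), and group_with_tolerance is a naive Hamming-distance merge used purely as a stand-in. Neither function is Cell Ranger code, and the 5.0 algorithm, which also infers donor germline V genes, is considerably more involved.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical cells: barcode -> (heavy-chain CDR3, light-chain CDR3) nucleotides.
cells: Dict[str, Tuple[str, str]] = {
    "AAAC-1": ("TGTGCGAGAGATTACTGG", "TGTCAGCAGTATAATAGT"),
    "AAAG-1": ("TGTGCGAGAGATTACTGG", "TGTCAGCAGTATAATAGT"),
    "AACC-1": ("TGTGCGAGAGATAACTGG", "TGTCAGCAGTATAATAGT"),  # one-base heavy CDR3 change
}


def group_by_exact_cdr3(cells: Dict[str, Tuple[str, ...]]) -> List[List[str]]:
    """Group barcodes by the exact set of CDR3 sequences (pre-5.0 style key)."""
    groups: Dict[frozenset, List[str]] = defaultdict(list)
    for barcode, cdr3s in cells.items():
        groups[frozenset(cdr3s)].append(barcode)
    return list(groups.values())


def hamming(a: str, b: str) -> int:
    """Mismatches between two sequences, counting any length difference as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def group_with_tolerance(cells: Dict[str, Tuple[str, ...]],
                         max_mismatches: int = 1) -> List[List[str]]:
    """Naive stand-in: put a cell in the first group whose representative
    chains each differ from it by at most max_mismatches bases."""
    groups: List[Tuple[Tuple[str, ...], List[str]]] = []
    for barcode, chains in cells.items():
        for representative, members in groups:
            if len(representative) == len(chains) and all(
                hamming(a, b) <= max_mismatches for a, b in zip(representative, chains)
            ):
                members.append(barcode)
                break
        else:
            groups.append((chains, [barcode]))
    return [members for _, members in groups]


if __name__ == "__main__":
    print(len(group_by_exact_cdr3(cells)))   # exact key splits the mutated cell off
    print(len(group_with_tolerance(cells)))  # tolerant merge keeps all three together
```

The exact-sequence key yields two groups for what is plausibly one clonotype, which is the over-splitting the release note describes for Cell Ranger 4.0 and earlier.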
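The --force-cells behavior described for V(D)J pipelines amounts to a cap. The sketch below models it under one stated assumption: the reported cell count is simply the smaller of the forced value and the number of barcodes that passed the assembler filters. The function name is hypothetical and this is not Cell Ranger code.

```python
from typing import Optional


def effective_vdj_cells(n_passing_filters: int, force_cells: Optional[int] = None) -> int:
    """Model of the V(D)J --force-cells behavior (Cell Ranger 3.1 to 5.0):
    the flag acts only on barcodes that already passed the assembler filters,
    so it can lower the cell count but never raise it."""
    if force_cells is None:
        return n_passing_filters
    if force_cells < 0:
        raise ValueError("force_cells must be non-negative")
    return min(force_cells, n_passing_filters)


if __name__ == "__main__":
    print(effective_vdj_cells(3000, force_cells=5000))  # asking for more has no effect
    print(effective_vdj_cells(3000, force_cells=2000))  # asking for fewer does
```

This is why the flag could only reduce the number of recovered cells in V(D)J runs, whereas users of cellranger count typically reach for it to raise that number.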
Changes that apply to Gene Expression and Feature Barcode analysis
- Cell Ranger 5.0 introduces a --no-bam option that disables the generation of aligned BAMs for gene expression and feature barcode datasets. If you have no need for these files, disabling their generation can significantly speed up the pipeline.
- Cell Ranger 5.0 introduces an upgraded algorithm for detecting and filtering protein aggregates. Because it uses the protein counts directly, it detects more aggregate-containing GEMs and filters them out before cell calling.
- Cell Ranger 5.0 introduces an --include-introns option for counting intronic reads with the 3’ Gene Expression and 5’ Gene Expression products. Using pre-mRNA references to count intronic reads is now deprecated.
The --include-introns option works by aligning reads to a normal reference transcriptome with STAR. After alignment, reads mapping to introns are annotated and counted in the same way as reads aligned to exons. By contrast, the pre-mRNA reference strategy used with Cell Ranger 4.0 and earlier involved alignment to a modified reference transcriptome in which intronic regions are annotated as exonic. STAR produces slightly different alignments against a pre-mRNA reference than against a normal reference used with --include-introns, and these differences lead to small overall differences in counted UMIs between intron mode and the pre-mRNA-reference approach.
- Ported a fix from upstream IRLBA that fixes incorrect behavior in rare circumstances.
- On some Linux distributions, NFS implementations would surface an improper error during file copy. We have implemented a workaround for our affected native code.
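The idea behind --include-introns, annotating reads after alignment to a normal reference and then counting intronic reads alongside exonic ones, can be illustrated with interval arithmetic. This is a toy sketch: the gene and exon coordinates are hypothetical, it counts reads rather than deduplicated UMIs, and the classification rule (exonic if at least 50% of the read overlaps exons, intronic if it otherwise overlaps the gene body, intergenic otherwise) is an assumption that ignores strand, splicing, and multimapping.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome and strand


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify_read(read: Interval, exons: List[Interval], gene: Interval) -> str:
    """Toy rule: exonic if >= 50% of the read lies in exons, intronic if it
    otherwise touches the gene body between exons, intergenic otherwise."""
    read_len = read[1] - read[0]
    exonic_bases = sum(overlap(read, e) for e in exons)  # exons assumed non-overlapping
    if 2 * exonic_bases >= read_len:
        return "exonic"
    if overlap(read, gene) - exonic_bases > 0:
        return "intronic"
    return "intergenic"


def count_gene_reads(reads: List[Interval], exons: List[Interval], gene: Interval,
                     include_introns: bool = False) -> int:
    """Count reads assigned to the gene; intronic reads count only when enabled."""
    counted = {"exonic", "intronic"} if include_introns else {"exonic"}
    return sum(classify_read(r, exons, gene) in counted for r in reads)


if __name__ == "__main__":
    gene = (100, 1000)
    exons = [(100, 200), (800, 1000)]
    reads = [(120, 170), (400, 450), (1500, 1550), (180, 230)]
    print([classify_read(r, exons, gene) for r in reads])
    print(count_gene_reads(reads, exons, gene))                        # exonic only
    print(count_gene_reads(reads, exons, gene, include_introns=True))  # exonic + intronic
```

A pre-mRNA reference reaches a similar result differently, by annotating the whole gene body as one exon before alignment, which is why the two approaches can disagree slightly in final counts.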
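A cellranger count invocation that uses both new 5.0 options can be assembled programmatically, for example when scripting many samples. The sketch assumes Cell Ranger 5.0 syntax, in which --include-introns and --no-bam are bare switches (later releases changed --include-introns to take a true/false value, so check cellranger count --help for your version). The helper name, sample ID, and paths are hypothetical.

```python
import shlex
from typing import List


def build_count_command(sample_id: str, transcriptome: str, fastqs: str,
                        include_introns: bool = False, no_bam: bool = False) -> List[str]:
    """Build an argument list for cellranger count (hypothetical helper)."""
    cmd = [
        "cellranger", "count",
        f"--id={sample_id}",
        f"--transcriptome={transcriptome}",  # a normal reference, not a pre-mRNA one
        f"--fastqs={fastqs}",
    ]
    if include_introns:
        cmd.append("--include-introns")  # also count reads that align to introns
    if no_bam:
        cmd.append("--no-bam")  # skip writing the aligned BAM to save time
    return cmd


if __name__ == "__main__":
    cmd = build_count_command("sample1", "/refs/GRCh38", "/data/fastqs",
                              include_introns=True, no_bam=True)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would launch the pipeline if cellranger is on PATH.
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.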
Changes that apply to Gene Expression, Feature Barcode, and V(D)J analysis
- Cell Ranger 5.0 introduces the multi pipeline, which can simultaneously process any combination of 5' Gene Expression, Feature Barcode (cell surface protein or antigen), and V(D)J libraries from a single GEM well. The multi pipeline uses the cell calls from the gene expression data to improve the cell calls inferred from the V(D)J library.
- A new metric, “Number of Short Reads Skipped”, is added to the web summary, indicating the total number of read pairs that were ignored by the pipeline because they do not satisfy the minimum length requirements.
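The "Number of Short Reads Skipped" metric counts read pairs that fail a minimum-length requirement. The sketch below shows that counting logic under stated assumptions: a pair is skipped if either mate is shorter than its minimum, and the thresholds used in the example (26 and 25) are hypothetical placeholders, since these notes do not state the actual minimums, which depend on the library type and chemistry.

```python
from typing import Iterable, Tuple


def count_short_pairs(pairs: Iterable[Tuple[str, str]], min_r1: int, min_r2: int) -> int:
    """Count read pairs in which either mate is shorter than its minimum length."""
    return sum(1 for r1, r2 in pairs if len(r1) < min_r1 or len(r2) < min_r2)


if __name__ == "__main__":
    pairs = [
        ("A" * 28, "C" * 90),  # both mates long enough: kept
        ("A" * 20, "C" * 90),  # Read 1 too short: skipped
        ("A" * 28, "C" * 10),  # Read 2 too short: skipped
    ]
    print(count_short_pairs(pairs, min_r1=26, min_r2=25))  # hypothetical thresholds
```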