
# Changes that apply to Gene Expression and Feature Barcode analysis

1. Targeted Gene Expression analysis is available in Cell Ranger 4.0 and is invoked by specifying the --target-panel option when running the cellranger count command.
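
As a hedged illustration, the snippet below assembles a cellranger count invocation that includes --target-panel. The other flags (--id, --transcriptome, --fastqs, --sample) are ordinary count options. The helper build_count_command, the run ID, and all paths are placeholders invented for this sketch. It only builds and prints the command and does not run it.

```python
import shlex
from typing import List, Optional


def build_count_command(run_id: str, transcriptome: str, fastqs: str,
                        sample: str, target_panel: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble argv for `cellranger count`."""
    argv = [
        "cellranger", "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
        f"--sample={sample}",
    ]
    if target_panel is not None:
        # Supplying a target panel CSV is what turns the run into a
        # Targeted Gene Expression analysis.
        argv.append(f"--target-panel={target_panel}")
    return argv


# Placeholder paths: substitute your own reference, FASTQ folder, and panel file.
cmd = build_count_command("sample1_targeted", "/refs/GRCh38-2020-A",
                          "/data/fastqs", "sample1", "/panels/my_panel.csv")
print(shlex.join(cmd))
```

If target_panel is omitted, the same helper yields an ordinary whole-transcriptome count command.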

2. Cell Ranger 4.0 introduces the new targeted-compare pipeline for direct comparative analysis of matched parent Whole Transcriptome Amplification (WTA) and Targeted Gene Expression datasets.

3. Cell Ranger 4.0 includes the new targeted-depth subcommand to estimate sequencing depths appropriate for Targeted Gene Expression experiments based on input WTA results and an associated target panel file.

4. Recommended reference packages for human and mouse have been updated from version 3.0.0 to 2020-A:

• Transcriptome annotations updated from Ensembl 93 to GENCODE v32 (human) and vM23 (mouse), which are equivalent to Ensembl 98.
• GRCh38 and mm10 sequences are not changed; chromosome names now follow the GENCODE/UCSC convention (e.g., chr1 and chrM) rather than the Ensembl convention (1 and MT).
• Additional filtering removes genes with unreliable annotations that often overlap better-annotated genes (see the build scripts for details), which improves overall sensitivity.
• 2020-A reference packages are backwards compatible with Cell Ranger 3.1.0 and earlier.
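
The chromosome renaming in the second bullet can be sketched as a small mapping function. This is an illustration under stated assumptions, not the official build script: the hypothetical helper ensembl_to_gencode renames only numbered chromosomes, X, Y, and MT, and leaves every other contig name unchanged.

```python
import re

# Assumption: only numbered chromosomes, X, Y, and MT are renamed.
_PRIMARY = re.compile(r"^(?:[0-9]+|X|Y)$")


def ensembl_to_gencode(name: str) -> str:
    """Convert an Ensembl-style chromosome name to the GENCODE/UCSC style."""
    if name == "MT":
        return "chrM"  # mitochondrial genome: MT -> chrM
    if _PRIMARY.match(name):
        return "chr" + name  # 1 -> chr1, X -> chrX
    return name  # other contigs pass through unchanged in this sketch


for old in ["1", "X", "MT", "KI270728.1"]:
    print(old, "->", ensembl_to_gencode(old))
```

Files keyed by Ensembl-style names, such as a BED file, would need the same conversion before their coordinates can be compared with outputs built on a 2020-A reference.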

Mapping rates and gene/UMI sensitivity increase as a result of the more comprehensive annotations and improved manual curation of genes.

1. When analyzing 3’ Gene Expression data, Cell Ranger 4.0 trims the template switch oligo (TSO) sequence from the 5’ end of Read-2 and the poly-A sequence from its 3’ end before aligning reads to the reference transcriptome. Cell Ranger 3.1 did not perform any trimming.

A full-length cDNA molecule is normally flanked by the 30-bp TSO sequence, AAGCAGTGGTATCAACGCAGAGTACATGGG, at the 5' end and by the poly-A sequence at the 3' end. Depending on the fragment size distribution of the library, some fraction of sequencing reads are expected to contain one or both of these sequences. Reads derived from short RNA molecules are more likely than reads from longer molecules to contain TSO sequence, poly-A sequence, or both.
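
To make the trimming concrete, here is a minimal sketch that uses exact matching only. It is not Cell Ranger's trimming algorithm, which these notes do not describe. The function trim_read and the thresholds min_tso and min_polya are assumptions for illustration, and a production trimmer would also tolerate sequencing errors. The sketch removes a TSO prefix (the full 30-bp TSO or a 3′ portion of it) from the start of Read-2 and a poly-A run from the end, and it returns counts analogous to the ts:i and pa:i BAM tags.

```python
from typing import Tuple

TSO = "AAGCAGTGGTATCAACGCAGAGTACATGGG"  # 30-bp TSO given in the text


def trim_read(read: str, tso: str = TSO, min_tso: int = 10,
              min_polya: int = 5) -> Tuple[str, int, int]:
    """Return (trimmed_read, ts, pa) using exact matching (illustrative only)."""
    # 5' end: the longest suffix of the TSO that the read starts with.
    ts = 0
    for k in range(min(len(tso), len(read)), min_tso - 1, -1):
        if read[:k] == tso[-k:]:
            ts = k
            break
    # 3' end: the run of trailing A's, ignored if shorter than min_polya.
    pa = len(read) - len(read.rstrip("A"))
    if pa < min_polya:
        pa = 0
    pa = min(pa, len(read) - ts)  # never let the two trims overlap
    return read[ts:len(read) - pa], ts, pa


insert = "CTTGACCGTAGCTG"
print(trim_read(TSO + insert + "A" * 10))  # ('CTTGACCGTAGCTG', 30, 10)
```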

Because non-template sequence (TSO or poly-A) confounds read mapping, trimming improves alignment: the fraction of reads mapped to a gene increases by up to 1.5%. Trimming improves both the sensitivity of the assay and the computational efficiency of the pipeline. In the output BAM files, the ts:i tag records the number of TSO nucleotides trimmed from the 5' end of Read-2, and the pa:i tag records the number of poly-A nucleotides trimmed from the 3' end. The trimmed bases remain in the sequence of the BAM record and are soft-clipped in the CIGAR string.
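
The last point can be illustrated with a short sketch. Because the trimmed bases stay in SEQ, they must appear as soft clips (S) in the CIGAR, and the SAM specification requires the query-consuming operations (M, I, S, =, X) to sum to the length of SEQ. The helpers soft_clip_cigar and query_length are hypothetical. The sketch also assumes a forward-strand alignment: a reverse-strand record stores the read reverse-complemented, so the two clips would swap ends.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_cigar(ts: int, core_cigar: str, pa: int) -> str:
    """Wrap the CIGAR of the aligned (trimmed) core in soft clips."""
    left = f"{ts}S" if ts else ""
    right = f"{pa}S" if pa else ""
    return left + core_cigar + right


def query_length(cigar: str) -> int:
    """Number of SEQ bases implied by a CIGAR (M, I, S, =, X consume the query)."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MIS=X")


cigar = soft_clip_cigar(30, "14M", 10)
print(cigar, query_length(cigar))  # 30S14M10S 54
```

Spliced cores work the same way, because N (skipped reference) does not consume query bases.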

Below, we illustrate how the fraction of reads mapped confidently to the transcriptome varies with read length, with and without trimming, for a variety of sample types.

2. Cell Ranger 4.0 adds support for an “untethered” Feature Barcode pattern, (BC), with no anchor, specified in the Feature Reference CSV. This lets the user specify the sequence of the Feature Barcode without specifying where on the read that sequence is expected to be found.
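
The difference between an anchored pattern and an untethered one can be sketched with regular expressions. This toy translator is an assumption, not Cell Ranger's parser. It understands only a leading ^ anchor, N placeholders, and a single (BC) token, and it does not handle any other pattern syntax. The barcode sequence is made up.

```python
import re
from typing import Optional


def pattern_to_regex(pattern: str, barcode: str) -> re.Pattern:
    """Translate a simplified pattern (^, N, (BC)) into a compiled regex."""
    anchored = pattern.startswith("^")
    body = pattern[1:] if anchored else pattern
    if body.count("(BC)") != 1:
        raise ValueError("pattern must contain exactly one (BC)")
    before, after = body.split("(BC)")
    if set(before + after) - {"N"}:
        raise ValueError("this sketch only supports N around (BC)")
    return re.compile(("^" if anchored else "")
                      + "." * len(before)          # each N matches any base
                      + f"({re.escape(barcode)})"  # capture the barcode itself
                      + "." * len(after))


def find_barcode(read: str, pattern: str, barcode: str) -> Optional[int]:
    """Return the 0-based start of the barcode in the read, or None."""
    m = pattern_to_regex(pattern, barcode).search(read)
    return m.start(1) if m else None


BC = "GTCAACTCTTTAGCG"  # made-up Feature Barcode
read = "ACGTACGTAC" + BC + "TTTTTTTTT"
shifted = "GG" + read   # same barcode, two bases further into the read
tethered = "^NNNNNNNNNN(BC)NNNNNNNNN"
print(find_barcode(read, tethered, BC), find_barcode(shifted, tethered, BC))  # 10 None
print(find_barcode(shifted, "(BC)", BC))  # 12
```

The anchored pattern misses the shifted barcode, while the untethered (BC) pattern still finds it.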

3. cellranger reanalyze now outputs the count matrix used in the analysis, so that the output reflects any barcode subsetting that was applied.

4. Bug fixes for GTF files output by mkref. These changes do not affect the pipeline results.

• GTF attributes with duplicate keys (e.g., tag "value1"; tag "value2";) are handled correctly. Previously, only the last such attribute was kept.
• GTF attributes with unquoted integer values (e.g., exon_number 1;) are kept. Previously, they were removed.
• GTF lines now end with a semicolon.
• Unix line endings are used rather than DOS line endings, consistent with other Cell Ranger outputs.
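
The first three fixes come down to treating the attribute column as an ordered list of key/value pairs instead of a dictionary. The sketch below assumes well-formed attributes with no semicolons inside quoted values. The functions parse_attributes and format_attributes are hypothetical names and are not mkref internals.

```python
import re
from typing import List, Tuple

# A key, then either a "quoted value" or a bare token (e.g. an integer), then ';'.
_ATTR = re.compile(r'\s*([^\s;]+)\s+("[^"]*"|[^\s;]+)\s*;')


def parse_attributes(field: str) -> List[Tuple[str, str]]:
    """Parse GTF column 9 into ordered (key, raw_value) pairs.

    A list (not a dict) keeps duplicate keys; raw values keep their quotes,
    so unquoted integers such as `exon_number 1` survive a round trip.
    """
    return _ATTR.findall(field)


def format_attributes(pairs: List[Tuple[str, str]]) -> str:
    """Serialize pairs so that every attribute, including the last, ends in ';'."""
    return " ".join(f"{key} {value};" for key, value in pairs)


attrs = 'gene_id "G1"; tag "basic"; tag "CCDS"; exon_number 1;'
pairs = parse_attributes(attrs)
print(dict(pairs)["tag"])                 # "CCDS"  (a dict keeps only the last tag)
print(format_attributes(pairs) == attrs)  # True
```

For the fourth fix, opening the output file in Python with newline="\n" writes Unix line endings on every platform.
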
5. Bug fixes for the BAM file

• The duplicate flag (0x400) is now set correctly on the secondary alignments (flag 0x100) of PCR duplicate reads and low-support UMI reads (xf:i:2).
• Low-support UMI reads (xf:i:2) now carry the corrected molecular barcode (UMI) in UB:Z. Previously, this tag contained the raw molecular barcode.
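
A consistency check for the low-support part of the first fix can be written with plain bit masks. In this sketch, records are modeled as hypothetical (name, flag, xf) tuples, and xf == 2 marks a low-support UMI read, as stated in the text. Detecting PCR duplicates would need more information than these tuples carry, so the sketch does not cover that case.

```python
from typing import Iterable, List, Tuple

SECONDARY = 0x100  # SAM flag bit: secondary alignment
DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate

Record = Tuple[str, int, int]  # (read name, SAM flag, xf tag value)


def unflagged_low_support_secondaries(records: Iterable[Record]) -> List[Record]:
    """Secondary alignments of low-support UMI reads that lack the duplicate bit."""
    return [
        (name, flag, xf)
        for name, flag, xf in records
        if (flag & SECONDARY) and xf == 2 and not (flag & DUPLICATE)
    ]


records = [
    ("r1", SECONDARY | DUPLICATE, 2),  # correctly flagged
    ("r2", SECONDARY, 2),              # missing 0x400
    ("r3", SECONDARY, 1),              # not a low-support read
    ("r4", 0, 2),                      # primary alignment, not checked here
]
print(unflagged_low_support_secondaries(records))  # [('r2', 256, 2)]
```

With pysam, the tuples could be built as (read.query_name, read.flag, read.get_tag("xf")) for each record where read.has_tag("xf") is true.
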
6. BAM file changes

• Cell Ranger 4.0 no longer outputs the li:i tag; the same information is available in the RG:Z tag.
• Cell Ranger 4.0 no longer outputs the BC:Z and QT:Z tags.
7. Cell Ranger 4.0 now relies on Orbit for transcriptome alignment. Orbit uses a modified STAR v2.7.2a, and the modifications make it compatible with “versionGenome 20201” references, such as those generated by STAR v2.5.1b. Cell Ranger 4.0 still ships STAR v2.5.1b and uses it for other purposes, such as cellranger mkref. In our testing, we did not observe any differences in transcriptome alignments between the STAR shipped in Cell Ranger 3.1 (STAR v2.5.1b), STAR v2.7.2a, and Orbit.

# Changes that apply to Gene Expression, Feature Barcode, and V(D)J analysis

1. mkfastq supports dual-indexed libraries for Gene Expression (both WTA and Targeted), V(D)J, and Feature Barcode datasets.
2. mkfastq supports a new sequencing configuration for NovaSeq in which the I2 index may need to be reverse-complemented before demultiplexing dual-indexed libraries.
3. count and vdj run approximately two to four times faster than in Cell Ranger 3.1, depending on the sequencing data, and use half as much disk I/O.
4. Cell Ranger 4.0 has a new command-line interface with improved error handling.
5. The Martian pipeline framework has been upgraded to version 4.0. mrp and mrjob now shut down if they detect that their log files were deleted or renamed. See the Martian release notes for more details.
6. The following features present in Cell Ranger 3.1 are no longer present in Cell Ranger 4.0:
• mkfastq no longer supports data from the Single Cell 3′ v1 chemistry.
• The cellranger demux subcommand has been removed.
• The command-line interface does not accept FASTQs created by the deprecated cellranger demux pipeline. If you need to process FASTQs in this layout, contact 10x Genomics support for assistance.
• cellranger count and cellranger vdj can no longer process data from multiple GEM wells through manual editing of MRO files.
• The Single Cell 3′ v1 and Single Cell 5′-R1 assay configurations are no longer autodetected in Cell Ranger 4.0. To analyze data from those chemistries, explicitly specify the chemistry (SC3Pv1 or SC5P-R1, respectively) with the --chemistry argument.
7. The --id argument used by the pipelines is limited to 64 characters in Cell Ranger 4.0.
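
Reverse-complementing an index, as mentioned in item 2, is a simple string transformation. The sketch below shows only that operation. Whether a given NovaSeq run needs it depends on the sequencing configuration, not on this code. The index sequence is made up.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base (N stays N), then reverse the sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


i2 = "GTAACATC"  # made-up 8-bp I2 index
print(reverse_complement(i2))  # GATGTTAC
```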

# Changes that apply to V(D)J analysis

1. Recommended VDJ reference packages for human and mouse have been updated from version 3.1.0 to 4.0.0. The changes to the VDJ reference sequences are listed below:
• The first base of the C region is removed in certain cases: those in which we observe that, in most transcripts, the J region and C region overlap by exactly one base.
• An additional allele of the gene IGHJ6 is added to the human VDJ reference.
2. Bug fix in contig annotation:
• If a reference D region matches a contig perfectly, the contig is now annotated with that D region.
3. The --chain command-line argument is restored in Cell Ranger 4.0 for the rare cases in which automatic chain detection fails.
4. A new output, airr_rearrangement.tsv, contains annotated contigs of V(D)J rearrangements in the AIRR TSV format.
5. Starting with Cell Ranger 4.0, the VDJ reference is copied to the outputs folder.
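
Because airr_rearrangement.tsv is a tab-separated file, standard tooling can read it. The sketch below counts V gene calls among productive contigs. It assumes the AIRR standard column names sequence_id, productive, and v_call, and the T/F encoding of AIRR booleans. It uses a tiny made-up table in place of a real output file.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def productive_v_calls(lines: Iterable[str]) -> Counter:
    """Count v_call values over rows whose productive column is true."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        # AIRR TSV encodes booleans as T/F; TRUE/FALSE is also accepted here.
        productive = (row.get("productive") or "").upper()
        if productive in {"T", "TRUE"} and row.get("v_call"):
            counts[row["v_call"]] += 1
    return counts


example = (
    "sequence_id\tproductive\tv_call\n"
    "contig_1\tT\tTRBV20-1\n"
    "contig_2\tF\tTRBV5-1\n"
    "contig_3\tT\tTRBV20-1\n"
)
print(productive_v_calls(io.StringIO(example)))  # Counter({'TRBV20-1': 2})
```

For a real run, pass an open file handle, for example open(path, newline=""), where path points to airr_rearrangement.tsv in the pipeline outputs.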