Single Cell Gene Expression
Cell Ranger, printed on 11/29/2023
Release Notes for Cell Ranger 3.0 Gene Expression & Feature Barcoding
Cell Ranger 3.0.2 Gene Expression
Job Scheduling Changes
- Add support for SGE and LSF clusters that track virtual memory use.
Enable Analysis of CITE-seq Experiments
Cell Ranger can now process data from experiments where the antibodies were conjugated to oligonucleotides that were captured by oligo-dT primers. Previously, only experiments which used the Chromium Single Cell 3' Feature Barcode Library Kit, which utilizes a different capture sequence for Gene Expression and Feature Barcoding data, could be analyzed.
Please note that while Cell Ranger is now compatible with CITE-seq data, CITE-seq is not a supported application. To ensure full support for your 10x data analysis, please visit the Feature Barcode Analysis page to see the supported Feature Barcoding technology.
Cell Ranger 3.0.1 Gene Expression
- Fix an issue where STAR would crash on CPUs without AVX support.
- Fix a determinism issue when aggregating 3' v2 and v3 data.
- Increase the memory reservation for the SORT_BY_POS stage.
Cell Ranger 3.0.0 Gene Expression
- Cell Ranger has been overhauled to support user-defined Feature Barcoding reagents, and to quantify these features alongside standard gene-expression reads. See Feature Barcoding for details. Users who have already run their data through earlier versions do not need to rerun it with this version.
Cell Calling Changes
- Cell Ranger 3.0 implements a version of the EmptyDrops cell calling algorithm that will call more low RNA content cells, especially when they are mixed with a population of high RNA content cells. See Cell Calling Algorithms for details.
- The cell calling 'knee-plot' in the web summary now indicates what fraction of barcodes in each segment of the curve were called as cells, since the new cell calling algorithm no longer applies a hard threshold on UMI counts.
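To make the per-segment fraction concrete, here is a minimal sketch of how such a summary could be computed. It is not Cell Ranger's implementation: the function name `fraction_called_per_segment`, the input layout (a list of UMI count and cell-call pairs), and the choice of equal-sized rank segments are all assumptions made for illustration.

```python
def fraction_called_per_segment(barcodes, n_segments):
    """Fraction of barcodes called as cells in each rank segment.

    barcodes: list of (umi_count, is_cell) tuples (hypothetical layout).
    Segments are equal-sized slices of the barcodes ranked by UMI count,
    highest first; this segmentation is an assumption of the sketch.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # Rank barcodes from highest to lowest UMI count, as on a knee plot.
    ranked = sorted(barcodes, key=lambda b: b[0], reverse=True)
    total = len(ranked)
    fractions = []
    for i in range(n_segments):
        start = i * total // n_segments
        end = (i + 1) * total // n_segments
        segment = ranked[start:end]
        if not segment:
            # No barcodes fall in this segment, so no fraction is defined.
            fractions.append(None)
            continue
        called = sum(1 for _, is_cell in segment if is_cell)
        fractions.append(called / len(segment))
    return fractions
```

A segment with a fraction strictly between 0 and 1 is one where barcodes with similar UMI counts were split between cells and non-cells, which a single UMI threshold could not produce.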
Output File Format Changes
- The file formats of the gene-barcode matrix (now called the feature-barcode matrix) have changed to accommodate Feature Barcoding results.
- The mtx and barcodes.tsv files are now gzipped to save disk space.
- The genes.tsv file has been renamed features.tsv.gz, and contains extra columns indicating the feature_type of each gene / feature.
- See Feature-Barcode Matrices for details.
- As part of this change, cellranger-rkit is deprecated. We recommend Seurat for analysis in R.
- The Molecule info file format has been substantially changed to enable output from the new Feature Barcoding technology and remove rarely used mapping metrics.
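As a rough illustration of the renamed features file, the sketch below reads a gzipped features.tsv.gz and tallies rows by feature type. It assumes three tab-separated columns (feature ID, feature name, feature type) with no header row. The function name `count_feature_types` is hypothetical, and the Feature-Barcode Matrices page remains the authoritative description of the layout.

```python
import csv
import gzip
from collections import Counter


def count_feature_types(features_path):
    """Count rows per feature type in a gzipped features TSV.

    Assumes the third tab-separated column holds the feature type
    (an assumption of this sketch, not a documented guarantee).
    """
    counts = Counter()
    # "rt" opens the gzip stream in text mode so csv can parse it.
    with gzip.open(features_path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                # Skip blank or short rows that lack a feature type column.
                continue
            counts[row[2]] += 1
    return counts
```

For a run that combines gene expression with Feature Barcoding, the returned counter would be expected to hold one entry per feature type present in the file.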
Aggregation Pipeline Changes
- Cell Ranger 3.0 implements a version of the MNN algorithm to correct for systematic variability in gene expression profiles caused by different versions of the Single Cell Gene Expression chemistry. See Chemistry Batch Correction Algorithms for details.
- Add check_invariants to check that the data and metadata in the aggr output are consistent with those of the input files.
- cellranger aggr no longer performs a cell-calling step; it simply aggregates the cell calls from each input job into a final set of cell calls.
- Allow for custom columns to be passed through the Aggregation CSV.
- cellranger aggr no longer supports the option
- Increase sensitivity of the cell calling for low-RNA-content cell types.
- Chemistry batch correction in the aggregation pipeline.
- Filter chimeric molecules that add technical noise at high depths.
- Multi-Species support by generating a .cloupe file for multi-species experiments run with a joint reference. This allows Loupe Cell Browser to compute an aggregate transcriptome expression count per cell, enabling easy segmentation by species, or joint analysis for host-virus applications.
- Fix some edge case errors in PCA
- Fix partially mapped pairs in STAR output
- Fix failure to correct barcode when all barcodes have a bad base
- Fix crashes when no cells are detected
- Fix diffexp table showing gene names as integers in web summary
- Fix numerical issue limiting the significance of some unadjusted diffexp p-values to 2.2e-16
- Fix non-unique barcodes in filtered matrix of multigenome samples
- Fix failures on fastq-dumped SRA FASTQ files due to RG construction
- Fix overflow in ATTACH_BCS_AND_UMIS metrics
- Fix missing diffexp table when gene_id=gene_name in web summary
- Fix wrong order in reanalyze output caused by library_ids being out-of-order in matrices HDF5
- Fix rendering error in Top 10 Clonotypes Table Proportion Column in web summary
Job Scheduling Changes
- The memory and CPU consumption for the mrp process has been significantly reduced.
- There is now a wait, configurable with the mrp --retry-wait command line flag, between when mrp observes a potentially transient failure and when it retries. In many cases (for example, cluster-mode jobs running on a remote machine that was taken offline) the failures are clustered, and waiting a short time allows all of the failures to be dealt with at once. The default wait time is 1 second.
- mrp now tracks the number of POSIX processes owned by the user and compares it to the current process rlimit (ulimit -u), throttling the number of spawned jobs if the user is approaching that limit. It will also issue a warning at startup if the difference between the current process count and the process rlimit is small compared to --localcores. This should mitigate the frequently observed issues on large machines that have 64+ cores but still use the default process ulimit of 1024.
- The --never-local flag causes mrp to ignore the local modifier on non-preflight stages. This may be important when mrp is running on a submit host with limited resources.
- The web UI now has a favicon.
- Volatile Disk Recovery's dependency tracking now ignores arguments of types int, float, and bool when considering whether intermediate files can be deleted, since those cannot contain file names.
- Various improvements to how files are tracked should result in lower peak storage usage.
- mrjob now attempts to track the I/O syscall and byte rates for stage code. Unfortunately, due to limitations in how the Linux kernel reports such metrics, this is only accurate for block devices (e.g. local disk), and not for network filesystems such as NFS mounts.
- mrp now logs the type of filesystem the pipestance is running on.
- Various bug fixes.
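The process-limit check described in the list above can be sketched as follows. This is a minimal illustration, not mrp's implementation: the function names, the `per_core` safety factor, and the warning rule are assumptions, and the current process count is passed in by the caller rather than measured.

```python
import resource


def process_headroom(current_count, soft_limit):
    """Return how many more processes may start, or None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(soft_limit - current_count, 0)


def should_warn(current_count, soft_limit, localcores, per_core=4):
    """Decide whether headroom is small relative to the requested cores.

    per_core is a hypothetical safety factor chosen for this sketch.
    """
    headroom = process_headroom(current_count, soft_limit)
    return headroom is not None and headroom < localcores * per_core


def current_soft_limit():
    """Read the soft per-user process limit (the value `ulimit -u` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft
```

With a limit of 1024 and 1000 processes already running, `should_warn(1000, 1024, 64)` returns True, since only 24 slots remain for a 64-core run.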