10x Genomics
Single Cell Gene Expression
Cell Ranger, printed on 12/21/2024
Release Notes for Cell Ranger 2.0 Gene Expression
Cell Ranger 2.0.2 Gene Expression
Bug fixes
- Properly ignore SIGHUP when a pipeline is run using nohup.
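For readers unfamiliar with the signal involved, the behavior described above can be illustrated with a minimal Python sketch. This is not Cell Ranger's or mrp's code; the function name ignore_sighup is hypothetical, and the example assumes a POSIX system.

```python
import os
import signal


def ignore_sighup():
    """Ignore SIGHUP so that a terminal hangup does not stop this process.

    Hypothetical helper for illustration only. Returns the previous
    disposition so a caller could restore it later.
    """
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)


if __name__ == "__main__":
    ignore_sighup()
    # With the default disposition, this signal would terminate the process.
    os.kill(os.getpid(), signal.SIGHUP)
    print("still running after SIGHUP")
```

A process started under nohup begins with SIGHUP already ignored; a program that installs its own SIGHUP handling replaces that setting, which is one general way such a program can end up reacting to hangups despite nohup.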
Cell Ranger 2.0.1 Gene Expression
Pipeline Argument Changes
- Add --override option to all pipelines, allowing for stage-level overrides for cores and memory.
- Reanalyze no longer requires --agg to persist library ID; it is only required for persisting user-defined fields.
Bug fixes
- Fix CHUNK_READS using more cores and using them less efficiently than intended.
- Fix aggr using incorrect downsampling rates when more than 10 libraries are aggregated.
- Fix mkfastq proceeding even after bcl2fastq is killed.
- Fix lack of robustness to rare events where NFS latency induces double file deletion or double directory creation events.
- Fix ALIGN_READS proceeding after the STAR subprocess fails, causing crashes in ATTACH_BCS_AND_UMIS.
- Improve error messages when STAR or samtools fail in ALIGN_READS.
- Fix spaces in transcript IDs causing ATTACH_BCS_AND_UMIS to crash. mkref no longer allows spaces in transcript IDs.
- Fix crash when reads are adapter-trimmed by bcl2fastq and some reads end up empty.
- Fix out-of-memory condition in ATTACH_BCS_AND_UMIS for some libraries with >800M reads.
- Fix question marks replacing axis titles of barcode rank plot in web summary.
- Fix excessive memory consumption and runtime of mkfastq on large sample sheets.
Job Scheduling
- Fix several cases where mrp (which is invoked by cellranger) was not able to restart correctly after being killed.
- On SGE clusters, cellranger/mrp now periodically runs qstat to verify that the jobs it queued have not been killed or canceled.
- If the run fails, the contents of the relevant _errors file are now printed, instead of just a message pointing the user to that file.
- On automatic retry of failed stages, the reason for the original failure is logged.
- mrp is now more resilient against certain kinds of filesystem errors.
- In the event of certain types of filesystem problems (such as permissions errors or an exceeded disk quota), mrp/cellranger can now, in some cases, provide more useful and immediate error messages.
- Additional information about the environment cellranger runs in is now logged and included in mri.tgz.
- Additional information about the environment the analysis runs in is now logged and included in mri.tgz.
- mrp now correctly handles the signals sent by SGE and LSF when a soft time limit is reached (e.g. for SGE, -l s_rt 23:00:00).
- Now supports the --overrides method to dynamically change additional CPU and memory per stage.
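The soft-time-limit item above concerns warning signals that a scheduler sends before it forcibly kills a job. The following is a minimal Python sketch of catching such signals, not mrp's implementation. It assumes that SGE delivers SIGUSR1 when s_rt is exceeded and that LSF delivers SIGUSR2 when a run limit is reached; both assumptions should be checked against the local scheduler configuration. All function and variable names are hypothetical.

```python
import signal

# Shared flag that long-running work can poll between units of work.
_state = {"soft_limit_reached": False}


def _on_soft_limit(signum, frame):
    """Record that the scheduler warned us; do no heavy work in the handler."""
    _state["soft_limit_reached"] = True


def install_soft_limit_handlers():
    """Catch the warning signals assumed to be sent by SGE and LSF."""
    for sig in (signal.SIGUSR1, signal.SIGUSR2):  # assumed scheduler signals
        signal.signal(sig, _on_soft_limit)


def soft_limit_reached():
    """Return True once a soft-limit warning has been received."""
    return _state["soft_limit_reached"]


def process_chunks(chunks):
    """Process chunks until done or until a soft-limit warning arrives."""
    done = []
    for chunk in chunks:
        if soft_limit_reached():
            break  # stop cleanly so state can be saved before the hard kill
        done.append(chunk)
    return done


if __name__ == "__main__":
    install_soft_limit_handlers()
    print(process_chunks(["chunk0", "chunk1", "chunk2"]))
```

Setting a flag in the handler and checking it between units of work keeps the handler trivial and lets the main loop decide when it is safe to stop.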
Cell Ranger 2.0.0 Gene Expression
Pipeline Argument Changes
- Add --barcodes and --genes options to reanalyze, which allow selection of a specific subset of barcodes and/or genes to use in the secondary analysis.
- Add --force-cells option to count and reanalyze to explicitly set the cell count. If specified, Cell Ranger will take the top N barcodes (by UMI count) as cells instead of doing dynamic cell count estimation.
- Rename the estimated cells option from --cells to --expect-cells for clarity.
- Add --nosecondary flag to count, which skips the secondary analysis.
- Disallow slashes in the --genome argument in mkref.
- Add --id option to mkfastq, which allows you to name the output directory.
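The --force-cells behavior described above (taking the top N barcodes by UMI count) can be sketched in a few lines of Python. This is an illustration of the selection rule only, not Cell Ranger's code; the function name, the toy barcodes, and the alphabetical tie-breaking are assumptions made for the example.

```python
def force_cells(umi_counts, n_cells):
    """Return the n_cells barcodes with the highest UMI counts.

    umi_counts maps barcode -> total UMI count. Ties are broken by
    barcode sequence, which is an assumption made for determinism.
    """
    ranked = sorted(umi_counts.items(), key=lambda item: (-item[1], item[0]))
    return [barcode for barcode, _count in ranked[:n_cells]]


# Toy data: three barcodes with many UMIs and two that look like background.
counts = {"AAAC": 5200, "AAAG": 12, "AACT": 4800, "ACGT": 3, "AGGT": 950}
print(force_cells(counts, 3))  # ['AAAC', 'AACT', 'AGGT']
```

Unlike dynamic cell count estimation, this rule applies no threshold: whatever N is given, exactly that many barcodes are called as cells, provided that many barcodes exist.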
New Subcommands
- Add cellranger mat2csv command, which converts a Cell Ranger sparse gene-barcode matrix to a dense CSV format. Note that the resulting file will be very large, even for a few hundred cells.
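To show why the dense output grows so quickly, here is a minimal Python sketch of expanding a sparse matrix into CSV. It is not the mat2csv implementation, and the column layout (genes as rows, barcodes as columns, a leading "gene" header) is an assumption made for illustration, as are the function name and the toy data.

```python
import csv
import io


def sparse_to_dense_csv(genes, barcodes, entries, out):
    """Write a dense genes-by-barcodes CSV from sparse entries.

    entries maps (gene_index, barcode_index) -> count; every pair that
    is absent is written out explicitly as 0.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene"] + list(barcodes))
    for gene_index, gene in enumerate(genes):
        row = [entries.get((gene_index, barcode_index), 0)
               for barcode_index in range(len(barcodes))]
        writer.writerow([gene] + row)


genes = ["GENE_A", "GENE_B", "GENE_C"]
barcodes = ["AAAC-1", "AACT-1"]
entries = {(0, 0): 3, (2, 1): 1}  # only two nonzero values are stored

buffer = io.StringIO()
sparse_to_dense_csv(genes, barcodes, entries, buffer)
print(buffer.getvalue())
```

The sparse form stores two values here, while the dense form writes all six. The dense cell count is always genes times barcodes, so a hypothetical 30,000-gene reference with 500 cells produces 15 million values regardless of how many are zero.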
Web Summary Changes
- Add "Reads Mapped Antisense to Gene" metric, which quantifies reads that are mapped to the non-coding strand of a gene. High values can indicate the use of an unsupported chemistry type, e.g. passing a Single Cell V(D)J library to cellranger count.
- Add "Fraction GEMs with >1 Cell (Lower / Upper Bound)" metrics, which define a confidence interval for the multiplet rate estimate in multi-genome samples.
- Add more details to various metric descriptions.
Algorithm Improvements
- Add the requirement that reads overlap annotated exons by at least 50% in order to be considered exonic. As a result, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
- Reduce EXTRACT_READS per-read runtime by 50% by avoiding OrderedDict and caching metric calculations.
- Reduce SUBSAMPLE_READS runtime by reducing the number of fixed target values for subsampling (to just 25k and 50k reads per cell).
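The 50% exon-overlap requirement in the first item above can be illustrated with a small interval calculation. This is a sketch under stated assumptions, not Cell Ranger's code: coordinates are half-open, the read is treated as one contiguous interval (spliced alignments are ignored), and the fraction is measured against the read's aligned length. All names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def exonic_fraction(read_start, read_end, exons):
    """Fraction of the read interval covered by annotated exons."""
    length = read_end - read_start
    if length <= 0:
        return 0.0
    covered = 0
    # Merging first prevents double counting where exons overlap.
    for exon_start, exon_end in merge_intervals(exons):
        covered += max(0, min(read_end, exon_end) - max(read_start, exon_start))
    return covered / length


def is_exonic(read_start, read_end, exons, min_fraction=0.5):
    """Apply the at-least-50% rule described in the release note."""
    return exonic_fraction(read_start, read_end, exons) >= min_fraction


exons = [(100, 200), (150, 250)]  # two overlapping exons spanning 100-250
print(is_exonic(180, 280, exons))  # True: 70 of 100 bases are exonic
print(is_exonic(220, 320, exons))  # False: 30 of 100 bases are exonic
```

A read that only clips the edge of an exon no longer counts as exonic under such a rule, which is consistent with the note that the exonic mapping metric may shift slightly.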
File Format Improvements
- Due to a format change (removal of the IntervalTree object), references produced with cellranger mkref using Cell Ranger 2.0 are not compatible with pipelines from Cell Ranger 1.x.
- Modify the TX, GX, and GN tags to have more granular transcript / gene annotations. Each BAM record is only annotated with transcripts / genes specific to that alignment, instead of combining annotations from all alignments of the corresponding read.
- Add RE tag, which indicates whether an alignment is exonic, intronic or intergenic.
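As a sketch of how the RE tag might be consumed downstream, the following Python tallies alignments by region from SAM-formatted text. The single-character encoding used here (E for exonic, N for intronic, I for intergenic) and the RE:A: field type are assumptions not stated in these notes and should be verified against the BAM documentation. The function name and the sample records are hypothetical.

```python
from collections import Counter

# Assumed encoding of the RE tag; verify against the Cell Ranger BAM docs.
RE_LABELS = {"E": "exonic", "N": "intronic", "I": "intergenic"}


def count_regions(sam_lines):
    """Tally alignments by region using the RE optional field."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional TAG:TYPE:VALUE fields follow the 11 mandatory columns.
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == "RE":
                counts[RE_LABELS.get(parts[2], "unknown")] += 1
                break
    return counts


# Hypothetical records, for illustration only.
sample = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tRE:A:E",
    "r2\t0\tchr1\t900\t255\t4M\t*\t0\t0\tACGT\tIIII\tRE:A:N",
    "r3\t0\tchr2\t500\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tRE:A:E",
]
print(count_regions(sample))
```

For real BAM files, the text would typically come from a tool that emits SAM records, such as samtools view.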
Bug fixes
- Fix rare bug in interval arithmetic, leading to exonic reads being falsely annotated as intronic or intergenic. As a result of this bugfix, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
- Fix excessive EXTRACT_READS runtime (10+ hours) on very large FASTQs such as those produced by mkfastq.
- Fix a crash in RUN_GRAPH_CLUSTERING on filesystems that do not support named pipes.
- Fix SUBSAMPLE_READS using more VMEM than expected, causing it to be killed by SGE when exceeding the h_vmem limit on certain clusters.
- Fix mkfastq not merging output files properly due to sample numbering issues.
- Fix mkfastq crash due to -d (demultiplexing-threads) argument being deprecated in bcl2fastq 2.19.
- Fix the components.csv file produced by PCA, which did not contain the correct matrix.
- Fix a crash in RUN_PCA when the number of nonzero genes is smaller than the number of principal components.
- Fix a crash in mkref with very large genomes; use the limitGenomeGenerateRAM option in STAR to overcome its default reference size limit.
- Fix certain special characters (like dashes) in reference names breaking the subsampled genes detected plot.
- Fix mkloupe displaying an unhelpful error message when run on mixed-species runs and those from Cell Ranger 1.1 or earlier.
- Fix the open-file-handle-limit check using the submit host rather than the execution machine.
- Fix cellranger aggr allowing duplicate library_ids.
- Fix CLOUPE_PREPROCESS taking the full matrix even after reanalyze subselects barcodes.
- Fix a crash in mkfastq on RunInfo.xml files produced by the NovaSeq.
- Fix a crash in mkfastq when bcl2fastq 2.19 is used in cluster mode or with the --demultiplexing-threads argument.
- Fix mkfastq sometimes not properly merging samples in bcl2fastq 2.18 and 2.19 due to a change in the order in which lanes are processed by bcl2fastq.
Martian Runtime Changes
- Add caching for deserialized json metadata. This improves performance for stages with many chunks.
Miscellaneous
- Update samtools from 0.1.19 to 1.4.
- Rename RUN_PREPROCESS to PREPROCESS_MATRIX in the SC_RNA_ANALYZER pipeline.
- Add alerts.json as an output of the SUMMARIZE_REPORTS stage. This file is a machine-readable list of any abnormal metric values that raised alarms in the web summary.
- For multi-genome samples, display the full reference name rather than a comma-delimited list of genomes in the web summary ("hg19, mm10" becomes "hg19_and_mm10").