Cell Ranger DNA 1.1, printed on 11/21/2024
Analysis software for the 10x Genomics single cell DNA product is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.
cellranger-dna produces an HDF5 format file that contains all of the key outputs of the pipeline. HDF5 files provide an easy way to store compressed data and can be read by tools like h5py. The general structure is similar to a dictionary, with a series of keys storing values. Each of the sections below represents a key in the HDF5 file.
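As a rough illustration of the dictionary-like structure, the sketch below uses h5py to walk a file and list every dataset with its shape and dtype. The helper name `list_datasets`, the in-memory demo file, and the `per_chrom/chr1` nesting are assumptions made for this example only; they are not the layout of a real cellranger-dna output file.

```python
import h5py


def list_datasets(h5_file):
    """Return {path: (shape, dtype)} for every dataset under an open h5py file or group."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    h5_file.visititems(visit)
    return found


# Build a tiny in-memory stand-in so the example runs without pipeline output.
# The key names and nesting here are hypothetical.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("bin_size", data=20000)
    f.create_dataset("num_cells", data=3)
    f.create_dataset("per_chrom/chr1", data=[[2, 2, -128], [2, 3, 3]], dtype="i1")
    listing = list_datasets(f)

for path in sorted(listing):
    shape, dtype = listing[path]
    print(path, shape, dtype)
```

To inspect a real file, the same helper could be called on `h5py.File(path, "r")`, where `path` points at the HDF5 file written by the pipeline.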
annotation
: The gene annotation version included as part of the reference.
assembly
: The assembly version of the sequence included as part of the reference.
library_id
: A dataset of two columns, linking GEM well suffix to library ID.
organism
: The reference organism.
pipeline
: One of cnv, aggr, or reanalyze, denoting which pipeline generated the file.
pipeline_version
: The version of the pipeline used to generate the file.
reference_path
: The absolute path of the reference used in the pipeline run.
sample_desc
: The sample description provided during the pipeline invocation.
sample_id
: The sample ID provided during the pipeline invocation.
Copy number values are stored in the range [-128, 126]. Copy number calling is performed across all mappable bins of the genome, and then imputed in unmappable regions based on neighboring bins. Negative values denote imputation: when imputation is successful, the value in the bin will be in [-126, -1], representing an imputed copy number in the range [1, 126]. If the neighboring bins used during imputation have different copy numbers, then the value -128, or "no call", is used. If a copy number of 0 is imputed, then the value -127 is assigned.
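The encoding described above can be sketched as a small decoding function. This is a minimal illustration and not part of cellranger-dna: the function name `decode_cnv` and the `(copy_number, imputed)` return convention are assumptions made for this example, and non-negative values are read as direct calls because the text reserves negative values for imputation.

```python
NO_CALL = -128       # neighboring bins disagreed during imputation
IMPUTED_ZERO = -127  # a copy number of 0 was imputed


def decode_cnv(value):
    """Decode one stored value into (copy_number, imputed).

    copy_number is None for a "no call". imputed is True whenever the bin
    went through imputation, including the no-call case.
    """
    if not -128 <= value <= 126:
        raise ValueError(f"value {value} is outside [-128, 126]")
    if value == NO_CALL:
        return None, True
    if value == IMPUTED_ZERO:
        return 0, True
    if value < 0:
        # -126..-1 encodes an imputed copy number of 126..1
        return -value, True
    return value, False


for stored in (2, 0, -3, -127, -128):
    print(stored, decode_cnv(stored))
```

Returning the imputation flag separately keeps the copy number itself non-negative, which is convenient when downstream code wants to treat called and imputed bins differently.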
bin_size
: The size of the bins used for CNV calling. Defaults to 20 kb.
chroms
: The names of the primary contigs in the reference used.
num_bins_per_chrom
: The number of bin_size bins in each chromosome in chroms.
num_cells
: The number of barcodes determined to be cells. Will match the size of cell_barcodes.
num_chroms
: The number of primary contigs in the reference used. Will match the size of chroms.
num_nodes
: The number of nodes in the hierarchical tree generated by the clustering. Will always be equal to (num_cells * 2) - 1 because the tree is binary: num_cells leaves plus num_cells - 1 internal nodes.
gc_fraction
: The fraction of GC content in each bin.
is_mappable
: A boolean array of mappability calls for each bin. A bin is mappable if at least 90% of simulated reads generated from a given bin map back uniquely.
n_fraction
: The fraction of unknown bases (Ns) in the given bin.
Z
: The linkage matrix output from SciPy clustering with complete linkage.
is_cell_in_group
: An adjacency matrix for the cell-node graph. A (num_cells - 1) x num_cells bit matrix, where each row is an internal node and each column is a leaf node (cell). The matrix has a value of 1 in row x and column y when cell y is a member of internal node x, and a value of 0 otherwise.
heterogeneity
: This key has values for each primary contig in the reference, and has a shape of (num_cells - 1) x num_bins_per_chromosome. For each internal node, the heterogeneity of the cells within the cluster is calculated as 1 - (fraction majority), where fraction majority is the fraction of cells that agree with the most common copy number call.
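To make the relationship between the tree, the membership matrix, and the heterogeneity values concrete, here is a small pure-Python sketch on toy data. It assumes the SciPy linkage convention, in which indices below num_cells are leaves and index num_cells + i is the cluster created by row i. The function names, the toy merges, and the toy copy number calls are all hypothetical, and the sketch does not attempt to reproduce how the pipeline treats imputed or no-call bins.

```python
from collections import Counter


def membership_from_merges(merges, num_cells):
    """Build a (num_cells - 1) x num_cells 0/1 matrix from SciPy-style merge pairs.

    merges[i] = (a, b) holds the two node indices joined at step i, i.e. the
    first two columns of a linkage matrix.
    """
    rows = []
    for a, b in merges:
        members = set()
        for idx in (a, b):
            if idx < num_cells:
                members.add(idx)  # a leaf (cell)
            else:
                # an earlier internal node: inherit all of its cells
                earlier = rows[idx - num_cells]
                members.update(c for c, flag in enumerate(earlier) if flag)
        rows.append([1 if c in members else 0 for c in range(num_cells)])
    return rows


def heterogeneity(membership, calls):
    """For each internal node and bin, compute 1 - (fraction majority)."""
    num_bins = len(calls[0])
    result = []
    for row in membership:
        cells = [c for c, flag in enumerate(row) if flag]
        per_bin = []
        for b in range(num_bins):
            counts = Counter(calls[c][b] for c in cells)
            majority = counts.most_common(1)[0][1]
            per_bin.append(1 - majority / len(cells))
        result.append(per_bin)
    return result


# Toy data: 4 cells, 2 bins. calls[cell][bin] is a copy number.
num_cells = 4
merges = [(0, 1), (2, 3), (4, 5)]
calls = [[2, 2], [2, 3], [2, 4], [3, 4]]

membership = membership_from_merges(merges, num_cells)
het = heterogeneity(membership, calls)
num_nodes = num_cells + len(merges)  # leaves plus internal nodes

print(membership)
print(het)
print(num_nodes)
```

In this toy tree the root node contains all four cells, so its row in the membership matrix is all ones, and the node count equals (num_cells * 2) - 1.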