Cell Ranger DNA 1.0, printed on 11/20/2024
cellranger-dna produces an HDF5 file that contains all of the key outputs of the pipeline. HDF5 files provide an easy way to store compressed data and can be read by tools like h5py. The general structure is similar to a dictionary, with a series of keys storing values. Each of the sections below represents a key in the HDF5 file.
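Because the file behaves like a nested dictionary, its keys can be listed with a short recursive walk. The following is a minimal sketch, not part of the pipeline: `list_keys` and `print_cnv_data_keys` are hypothetical helper names, and the walk assumes only that groups expose `.items()` (true of both `h5py.Group` and a plain `dict`).

```python
def list_keys(group, prefix=""):
    """Return slash-separated paths of every leaf entry under a mapping-like group."""
    paths = []
    for name, item in group.items():
        path = f"{prefix}/{name}"
        if hasattr(item, "items"):
            # Nested group (an h5py.Group or a dict): descend into it.
            paths.extend(list_keys(item, path))
        else:
            # Leaf entry (an h5py.Dataset or a plain value).
            paths.append(path)
    return paths


def print_cnv_data_keys(path):
    """Open an HDF5 file read-only and print the path of every dataset in it."""
    import h5py  # third-party; imported here so list_keys works without it

    with h5py.File(path, "r") as handle:
        for key in list_keys(handle):
            print(key)
```

Calling `print_cnv_data_keys("path/to/output.h5")` with the path to the pipeline's HDF5 output (the path shown is a placeholder) would print one line per dataset.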
Copy number values are stored in the range [-128, 127]. Copy number calling is performed across all mappable bins of the genome, and then imputed in unmappable regions based on neighboring bins. Negative values denote imputation: when imputation is successful, the value in the bin will be in [-127, -1], representing an imputed copy number in the range [1, 127]. If the neighboring bins used for imputation have different copy numbers, the value -128, or "no call", is used. If a copy number of 0 is imputed, the value -127 is assigned.
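The encoding described above can be decoded one value at a time. This is a sketch under stated assumptions rather than the pipeline's own code: `decode_cnv_value` is a hypothetical helper, and it reads -127 as an imputed copy number of 0, following the last rule above, which means an imputed copy number of 127 cannot be told apart from it.

```python
NO_CALL = -128      # neighboring bins disagreed during imputation
IMPUTED_ZERO = -127  # assumption: reserved for an imputed copy number of 0


def decode_cnv_value(value):
    """Return (copy_number, imputed) for one stored bin value.

    copy_number is None for a no call. imputed is True for every negative
    value, i.e. whenever the bin was not called directly.
    """
    if not -128 <= value <= 127:
        raise ValueError(f"value {value} is outside the range [-128, 127]")
    if value >= 0:
        return value, False      # direct call in a mappable bin
    if value == NO_CALL:
        return None, True        # imputation attempted, no consistent answer
    if value == IMPUTED_ZERO:
        return 0, True
    return -value, True          # imputed copy number in [1, 126]
```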
bin_size: The size of the bins used for CNV calling. Defaults to 20kb.
chroms: The names of the primary contigs in the reference used.
num_bins_per_chrom: The number of bin_size bins in each chromosome in chroms.
num_cells: The number of barcodes determined to be cells. Will match the size of cell_barcodes.
num_chroms: The number of primary contigs in the reference used. Will match the size of chroms.
num_nodes: The number of nodes in the hierarchical tree generated by the clustering. Will always be equal to (num_cells * 2) - 1 because the tree is binary.
gc_fraction: The fraction of GC content in each bin.
is_mappable: A boolean array of mappability calls for each bin. A bin is mappable if at least 90% of simulated reads generated from that bin map back uniquely.
n_fraction: The fraction of unknown bases (Ns) in the given bin.
Z: The linkage matrix output by SciPy hierarchical clustering with complete linkage.
is_cell_in_group: An adjacency matrix for the cell-node graph. A (num_cells - 1) x num_cells bit matrix, where each row is an internal node and each column is a leaf node (cell). The matrix has a value of 1 in row x and column y when cell y is a member of internal node x, and a value of 0 otherwise.
heterogeneity: This key has values for each primary contig in the reference, each with a shape of (num_cells - 1) x the number of bins in that contig. For each internal node, the heterogeneity of the cells within that cluster is calculated as 1 - (fraction majority), where fraction majority is the fraction of cells that agree with the most common copy number call.
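To make the heterogeneity definition concrete, the sketch below recomputes 1 - (fraction majority) for each internal node and bin from a membership matrix laid out like is_cell_in_group. It is illustrative only: `node_heterogeneity` is a hypothetical helper, the toy data is made up, and the source does not say how no-call or imputed values are treated, so calls are simply compared as given.

```python
from collections import Counter


def node_heterogeneity(is_cell_in_group, cell_calls):
    """Compute 1 - (fraction majority) per internal node and bin.

    is_cell_in_group: (num_cells - 1) rows of num_cells 0/1 flags.
    cell_calls: num_cells rows of per-bin copy number calls.
    """
    num_bins = len(cell_calls[0])
    result = []
    for membership in is_cell_in_group:
        # Indices of the cells that belong to this internal node.
        members = [cell for cell, flag in enumerate(membership) if flag]
        row = []
        for b in range(num_bins):
            counts = Counter(cell_calls[cell][b] for cell in members)
            majority = counts.most_common(1)[0][1]  # size of the largest agreeing group
            row.append(1 - majority / len(members))
        result.append(row)
    return result


# Toy tree with 3 cells: node 0 holds cells 0 and 1, node 1 holds all cells.
num_cells = 3
num_nodes = num_cells * 2 - 1  # 3 leaves + 2 internal nodes = 5
toy_membership = [[1, 1, 0], [1, 1, 1]]
toy_calls = [[2, 2], [2, 3], [2, 3]]  # per-cell calls for two bins
het = node_heterogeneity(toy_membership, toy_calls)
```

In the toy data every cell agrees in the first bin, so both nodes have heterogeneity 0 there. In the second bin, node 0 splits evenly between two calls, while two of the three cells in node 1 agree.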