rbcde

rbcde.RBC(adata, clus_key='leiden', layer=None, use_raw=False)

Compute the rank-biserial correlation coefficient for each gene in each cluster. The results can be subsequently turned into a marker list via the helper function rbcde.filter_markers(). The primary output is stored as part of either .var or .raw.var, depending on whether .raw data is used.

The rank-biserial correlation coefficient (Cureton, 1956) can be used as an effect size equivalent of the Wilcoxon test (Kerby, 2014), which in turn was deemed to perform well on single cell data problems (Soneson, 2018). Using effect size analyses is recommended for problems with large population sizes (Sullivan, 2012).
- adata : AnnData - Needs per cell normalised data stored somewhere in the object (as either sparse or dense), and the desired clustering/grouping vector included in .obs.
- clus_key : str, optional (default: "leiden") - The name of the .obs column containing the clustering/grouping.
- layer : str or None, optional (default: None) - If specified, take the expression data from the matching .layers field. Overrides use_raw if provided.
- use_raw : bool, optional (default: False) - If no layer was specified and this is set to True, take the data from the .raw field of the object. Store results in .raw.var to match dimensionality.
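
As a rough illustration of the quantity being computed, the rank-biserial correlation for one gene in one cluster can be written in terms of the Mann-Whitney U statistic as r = 2U / (n1 * n2) - 1, which equals the share of favourable cell pairs minus the share of unfavourable ones (Kerby, 2014). The sketch below is a plain-Python textbook version under that assumption, not rbcde's actual implementation; the helper names average_ranks and rank_biserial are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_biserial(in_group, out_group):
    """Rank-biserial correlation of in_group versus out_group.

    Hypothetical helper: in_group holds one gene's expression in the
    cells of a cluster, out_group its expression in all other cells.
    """
    n1, n2 = len(in_group), len(out_group)
    ranks = average_ranks(list(in_group) + list(out_group))
    rank_sum = sum(ranks[:n1])             # rank sum of the cluster's cells
    u_stat = rank_sum - n1 * (n1 + 1) / 2  # Mann-Whitney U for the cluster
    return 2 * u_stat / (n1 * n2) - 1
```

The coefficient ranges from -1 (every cell in the cluster ranks below every other cell) to 1 (every cell in the cluster ranks above every other cell), with 0 indicating no separation.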

rbcde.filter_markers(adata, thresh=0.5, use_raw=False)

Filter the rank-biserial correlation coefficients computed with rbcde.RBC() to a list of markers for each cluster, provided as a data frame and a Scanpy plotting compatible var_names cluster marker dictionary. Returns those two objects, in this order.

- adata : AnnData - Needs to have been processed with rbcde.RBC().
- thresh : float, optional (default: 0.5) - The threshold value used to call markers. Literature critical values can be used.
- use_raw : bool, optional (default: False) - Set this to True if the raw data was used for the computation so that the results can be retrieved from the correct field of the object.
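
The thresholding step described above can be pictured with the following sketch. It is a hypothetical stand-in rather than the library's code: the function name call_markers, the nested-dictionary input, the strict "greater than" comparison, and the toy coefficient values are all assumptions made for illustration. It returns a list of rows (standing in for the data frame) and a dictionary mapping each cluster to its marker genes, the shape Scanpy plotting functions accept for var_names.

```python
def call_markers(coefficients, thresh=0.5):
    """Keep (cluster, gene) pairs whose coefficient exceeds thresh.

    coefficients: dict of gene -> dict of cluster -> coefficient
    (an assumed layout, chosen for this sketch only).
    """
    rows = []
    for gene, per_cluster in coefficients.items():
        for cluster, value in per_cluster.items():
            if value > thresh:  # assumption: strict comparison
                rows.append((cluster, gene, value))
    # Order by cluster, strongest markers first within each cluster.
    rows.sort(key=lambda row: (row[0], -row[2]))

    marker_dict = {}
    for cluster, gene, _ in rows:
        marker_dict.setdefault(cluster, []).append(gene)
    return rows, marker_dict


# Toy coefficients, made up for illustration only.
toy_coefficients = {
    "CD3E": {"0": 0.82, "1": -0.40},
    "MS4A1": {"0": -0.35, "1": 0.91},
    "CD79A": {"0": -0.30, "1": 0.60},
    "ACTB": {"0": 0.05, "1": 0.02},
}
toy_rows, toy_markers = call_markers(toy_coefficients, thresh=0.5)
```

Raising thresh yields fewer, more cluster-specific markers; a gene with a small coefficient in every cluster, such as ACTB in the toy input, is never reported.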
rbcde.matrix

rbcde.matrix.RBC(data, clusters, vars)

Compute the rank-biserial correlation coefficient for each gene in each cluster. The results can be subsequently turned into a marker list via the helper function rbcde.matrix.filter_markers(). Returns a data frame with the coefficient value for each gene in each cluster.

The rank-biserial correlation coefficient (Cureton, 1956) can be used as an effect size equivalent of the Wilcoxon test (Kerby, 2014). Using effect size analyses is recommended for problems with large population sizes (Sullivan, 2012).
- data : np.array or scipy.sparse - Per cell normalised, if using single cell count data. Variables as rows, observations as columns.
- clusters : np.array or list - A vector of cluster/group assignments for each observation.
- vars : np.array or list - A vector of variable names, for output generation purposes.
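
To make the expected layout concrete, the sketch below shows how the three arguments line up: each row of data corresponds to one entry of the variable names, and each column to one entry of clusters. It uses plain Python lists instead of NumPy or SciPy objects, and the helper split_by_cluster is hypothetical (its third parameter is called var_names here to avoid shadowing Python's built-in vars). It is not part of rbcde.

```python
def split_by_cluster(data, clusters, var_names):
    """Split each variable's values into in-cluster and out-of-cluster lists.

    data: list of rows, one row per variable, one column per observation.
    clusters: one cluster label per observation (column).
    var_names: one name per variable (row).
    """
    if len(data) != len(var_names):
        raise ValueError("data needs one row per variable name")
    for row in data:
        if len(row) != len(clusters):
            raise ValueError("each row needs one value per observation")

    labels = sorted(set(clusters))
    groups = {}
    for name, row in zip(var_names, data):
        for label in labels:
            inside = [v for v, c in zip(row, clusters) if c == label]
            outside = [v for v, c in zip(row, clusters) if c != label]
            groups[(name, label)] = (inside, outside)
    return groups


# Toy input: 2 variables (rows) measured in 4 observations (columns).
toy_data = [
    [1, 0, 5, 6],
    [2, 2, 0, 0],
]
toy_clusters = ["a", "a", "b", "b"]
toy_var_names = ["g1", "g2"]
toy_groups = split_by_cluster(toy_data, toy_clusters, toy_var_names)
```

Each (variable, cluster) pair yields the two value lists that a one-versus-rest comparison would contrast. Data stored with observations as rows, as is common for AnnData objects, would need transposing before it matches this layout.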

rbcde.matrix.filter_markers(results, thresh=0.5)

Filter the rank-biserial correlation coefficients computed with rbcde.matrix.RBC() to a list of markers for each cluster. Returns a data frame of the computed markers.

- results : pd.DataFrame - The output of rbcde.matrix.RBC().
- thresh : float, optional (default: 0.5) - The threshold value used to call markers. Literature critical values can be used.