rbcde
- rbcde.RBC(adata, clus_key='leiden', layer=None, use_raw=False)
Compute the rank-biserial correlation coefficient for each gene in each cluster. The results can subsequently be turned into a marker list via the helper function rbcde.filter_markers(). The primary output is stored as part of either .var or .raw.var, depending on whether .raw data is used.
The rank-biserial correlation coefficient (Cureton, 1956) can be used as an effect size equivalent of the Wilcoxon test (Kerby, 2014), which in turn was deemed to perform well on single cell data problems (Soneson, 2018). Using effect size analyses is recommended for problems with large population sizes (Sullivan, 2012).
- adata : AnnData
  - Needs per cell normalised data stored somewhere in the object (as either sparse or dense), and the desired clustering/grouping vector included in .obs.
- clus_key : str, optional (default: “leiden”)
  - The name of the .obs column containing the clustering/grouping.
- layer : str or None, optional (default: None)
  - If specified, take the expression data from the matching .layers field. Overrides use_raw if provided.
- use_raw : bool, optional (default: False)
  - If no layer was specified and this is set to True, take the data from the .raw field of the object. Store results in .raw.var to match dimensionality.
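For reference, the coefficient for one gene and one cluster can be written in terms of the Mann-Whitney U statistic, in the spirit of the simple difference formula of Kerby (2014). The symbols below are introduced here for illustration only: $x_i$ are the gene's values in the $n_1$ cells of the cluster and $y_j$ its values in the $n_2$ remaining cells. Comparing each cluster against all other cells, and counting ties as one half, are assumptions of this sketch rather than statements about how rbcde computes the value.

```latex
U = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2}
    \left( \mathbf{1}[x_i > y_j] + \tfrac{1}{2}\,\mathbf{1}[x_i = y_j] \right),
\qquad
r = \frac{2U}{n_1 n_2} - 1
```

Under this definition $r$ lies in $[-1, 1]$: it equals 1 when every in-cluster value exceeds every out-of-cluster value, and 0 when favourable and unfavourable pairs balance out.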
- rbcde.filter_markers(adata, thresh=0.5, use_raw=False)
Filter the rank-biserial correlation coefficients computed with rbcde.RBC() to a list of markers for each cluster, provided as a data frame and as a cluster marker dictionary compatible with the var_names argument of Scanpy plotting functions. Returns those two objects, in this order.
- adata : AnnData
  - Needs to have been processed with rbcde.RBC().
- thresh : float, optional (default: 0.5)
  - The threshold value used to call markers. Literature critical values can be used.
- use_raw : bool, optional (default: False)
  - Set this to True if the raw data was used for the computation so that the results can be retrieved from the correct field of the object.
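The thresholding idea can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the function name `call_markers`, the nested-dictionary input and the toy coefficient values are all hypothetical, and the use of a strict `>` comparison is an assumption. The output shape (cluster label mapped to a list of gene names) mirrors the kind of dictionary that Scanpy plotting functions accept for `var_names`.

```python
def call_markers(coefs, thresh=0.5):
    """Sketch: keep, per cluster, the genes whose coefficient exceeds thresh.

    coefs maps gene name -> {cluster label -> coefficient} (hypothetical layout).
    """
    markers = {}
    for gene, per_cluster in coefs.items():
        for cluster, value in per_cluster.items():
            # Make sure every cluster appears, even if it ends up with no markers.
            markers.setdefault(cluster, [])
            if value > thresh:  # assumption: strict comparison
                markers[cluster].append(gene)
    return markers


# Toy coefficients, invented for illustration only.
coefs = {
    "geneA": {"0": 0.82, "1": -0.40},
    "geneB": {"0": 0.10, "1": 0.65},
    "geneC": {"0": 0.55, "1": 0.30},
}
markers = call_markers(coefs, thresh=0.5)
print(markers)  # {'0': ['geneA', 'geneC'], '1': ['geneB']}
```

Raising `thresh` makes the marker lists shorter and more specific; with `thresh=0.6` only geneA and geneB would remain in this toy example.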
rbcde.matrix
- rbcde.matrix.RBC(data, clusters, vars)
Compute the rank-biserial correlation coefficient for each gene in each cluster. The results can subsequently be turned into a marker list via the helper function rbcde.matrix.filter_markers(). Returns a data frame with the coefficient value for each gene in each cluster.
The rank-biserial correlation coefficient (Cureton, 1956) can be used as an effect size equivalent of the Wilcoxon test (Kerby, 2014). Using effect size analyses is recommended for problems with large population sizes (Sullivan, 2012).
- data : np.array or scipy.sparse
  - Per cell normalised, if using single cell count data. Variables as rows, observations as columns.
- clusters : np.array or list
  - A vector of cluster/group assignments for each observation.
- vars : np.array or list
  - A vector of variable names, for output generation purposes.
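To make the expected input layout concrete (variables as rows, observations as columns, one cluster label per observation), here is a rough pure-Python sketch of a rank-based computation of the coefficient. It is an illustration under stated assumptions, not rbcde's code: the helper names `midranks` and `rank_biserial` are hypothetical, plain lists stand in for `np.array`, each cluster is compared against all remaining observations, and tied values share their mean rank. It also assumes at least two clusters, since otherwise the denominator is zero.

```python
from collections import defaultdict


def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_biserial(data, clusters, var_names):
    """Sketch: coefficient of each variable (row) for each cluster vs. the rest."""
    members = defaultdict(list)
    for col, label in enumerate(clusters):
        members[label].append(col)
    n = len(clusters)
    result = {}
    for name, row in zip(var_names, data):
        ranks = midranks(row)
        result[name] = {}
        for label, cols in members.items():
            n1, n2 = len(cols), n - len(cols)
            # Mann-Whitney U from the rank sum of the in-cluster observations.
            u = sum(ranks[c] for c in cols) - n1 * (n1 + 1) / 2
            result[name][label] = 2 * u / (n1 * n2) - 1
    return result


# Toy data: 2 variables (rows) by 6 observations (columns), invented for illustration.
data = [
    [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # geneA: higher in cluster "a"
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # geneB: identical everywhere
]
clusters = ["a", "a", "a", "b", "b", "b"]
coefs = rank_biserial(data, clusters, ["geneA", "geneB"])
print(coefs)
```

In this toy input geneA separates the two clusters perfectly, so its coefficient is 1 for cluster "a" and -1 for cluster "b", while the constant geneB scores 0 in both.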
- rbcde.matrix.filter_markers(results, thresh=0.5)
Filter the rank-biserial correlation coefficients computed with rbcde.matrix.RBC() to a list of markers for each cluster. Returns a data frame of the computed markers.
- results : pd.DataFrame
  - The output of rbcde.matrix.RBC().
- thresh : float, optional (default: 0.5)
  - The threshold value used to call markers. Literature critical values can be used.