Modz
Module for performing MODZ (modified z-score) transformations
- pycytominer.cyto_utils.modz.modz(population_df: DataFrame, replicate_columns: str | list[str], features: str | list[str] = 'infer', method: str = 'spearman', min_weight: float = 0.01, precision: int = 4) → DataFrame
Collapse replicates into consensus signatures, one per group defined by replicate_columns, using a correlation-weighted transformation.
- Parameters:
population_df (pd.DataFrame) – DataFrame that includes metadata and observation features.
replicate_columns (str, list) – A column name, or list of column names, in population_df that identify replicate-level information; rows sharing the same values in these columns are collapsed together.
features (str, list, default "infer") – A list of strings corresponding to feature measurement column names in population_df. All listed features must be present in population_df. If "infer", CellProfiler features are assumed to be the columns prefixed with "Cells", "Nuclei", or "Cytoplasm".
method (str, default "spearman") – Correlation metric used to compare replicates (for example, "spearman" or "pearson").
min_weight (float, default 0.01) – Minimum replicate weight; correlation-derived weights below this value are clipped up to min_weight.
precision (int, default 4) – Number of decimal places to which the replicate weights are rounded.
- Returns:
modz_df – Consensus signatures, together with their replicate metadata, for all replicate groups in the given DataFrame.
- Return type:
pd.DataFrame
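The replicate handling described for modz() can be sketched with pandas as follows. This is a hypothetical illustration, not pycytominer's implementation: `collapse_replicates_sketch` and `infer_features_sketch` are made-up names, the prefix rule mirrors the features="infer" description, and the per-group reduction is an unweighted mean used only as a stand-in for the correlation-weighted MODZ step.

```python
import pandas as pd

# Prefixes named in the documentation for features="infer".
CP_PREFIXES = ("Cells", "Nuclei", "Cytoplasm")


def infer_features_sketch(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: keep columns that start with a CellProfiler compartment prefix."""
    return [col for col in df.columns if str(col).startswith(CP_PREFIXES)]


def collapse_replicates_sketch(df, replicate_columns, features="infer", aggregate=None):
    """Hypothetical sketch of modz()'s per-group structure; not pycytominer's code."""
    # Accept a single column name or a list, as replicate_columns does.
    if isinstance(replicate_columns, str):
        replicate_columns = [replicate_columns]
    if isinstance(features, str):
        features = infer_features_sketch(df) if features == "infer" else [features]
    # Stand-in reduction: an unweighted mean over the replicates in each group.
    # MODZ instead weights each replicate by its correlation with the others.
    aggregate = aggregate or (lambda group: group.mean())
    # One output row per replicate group; the group keys become metadata columns again.
    return df.groupby(replicate_columns)[features].apply(aggregate).reset_index()


# Usage: two treatments with two replicates each; Image_Count is not an inferred feature.
profiles = pd.DataFrame({
    "Metadata_treatment": ["drug_a", "drug_a", "drug_b", "drug_b"],
    "Cells_AreaShape_Area": [10.0, 12.0, 20.0, 22.0],
    "Nuclei_Intensity_Mean": [1.0, 3.0, 5.0, 7.0],
    "Image_Count": [100, 110, 120, 130],
})
consensus = collapse_replicates_sketch(profiles, "Metadata_treatment")
```

Passing a correlation-weighted function as `aggregate` would bring the sketch closer to the MODZ behavior described for modz_base().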
- pycytominer.cyto_utils.modz.modz_base(population_df: DataFrame, method: str = 'spearman', min_weight: float = 0.01, precision: int = 4) → Series
Perform a modified z-score transformation.
This code is adapted from cmapPy (see https://github.com/cytomining/pycytominer/issues/52). Note that the transformation is applied to the entire population_df rather than per replicate group; see modz() for replicate-level procedures.
- Parameters:
population_df (pd.DataFrame) – DataFrame that includes metadata and observation features.
method (str, default "spearman") – Correlation metric used to compare replicates (for example, "spearman" or "pearson").
min_weight (float, default 0.01) – Minimum replicate weight; correlation-derived weights below this value are clipped up to min_weight.
precision (int, default 4) – Number of decimal places to which the replicate weights are rounded.
- Returns:
modz_df – MODZ-transformed consensus signature of the input data, in which replicates are weighted by their correlation.
- Return type:
pd.Series
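A minimal sketch of the weighting step, assuming a cmapPy-style scheme: drop self-correlations, treat negative correlations as zero, average each replicate's correlations with the others, floor that average at min_weight, normalize so the weights sum to one, then round. The function names are hypothetical and pycytominer's exact steps may differ. The frame is assumed to hold only feature columns, with one row per replicate, since the transformation uses every column it is given.

```python
import numpy as np
import pandas as pd


def modz_weights_sketch(population_df: pd.DataFrame, method: str = "spearman",
                        min_weight: float = 0.01, precision: int = 4) -> pd.Series:
    """Hypothetical cmapPy-style replicate weights; rows are replicates, columns are features."""
    # Replicate-by-replicate correlation matrix (DataFrame.corr correlates columns, so transpose).
    corr = population_df.transpose().corr(method=method)
    values = corr.to_numpy(copy=True)
    # Exclude each replicate's correlation with itself.
    np.fill_diagonal(values, np.nan)
    # Treat negative correlations as zero (NaN on the diagonal is preserved).
    corr = pd.DataFrame(values, index=corr.index, columns=corr.columns).clip(lower=0)
    # Raw weight: mean correlation with the other replicates, floored at min_weight.
    raw = corr.mean(axis=1).clip(lower=min_weight)
    # Normalize so the weights sum to one, then round to `precision` decimal places.
    return (raw / raw.sum()).round(precision)


def modz_base_sketch(population_df, method="spearman", min_weight=0.01, precision=4):
    """Hypothetical consensus signature: correlation-weighted sum over replicates."""
    weights = modz_weights_sketch(population_df, method, min_weight, precision)
    # Scale each replicate (row) by its weight, then sum down the rows for each feature.
    return population_df.multiply(weights, axis="index").sum(axis="index")


# Usage: three replicates of four features; rep_c agrees less with rep_a and rep_b.
profiles = pd.DataFrame(
    {"f1": [1.0, 1.1, -1.0], "f2": [2.0, 2.1, 0.5],
     "f3": [3.0, 2.9, -2.0], "f4": [4.0, 4.2, 1.0]},
    index=["rep_a", "rep_b", "rep_c"],
)
weights = modz_weights_sketch(profiles)  # rep_a 0.3889, rep_b 0.3889, rep_c 0.2222
consensus = modz_base_sketch(profiles)
```

The less consistent replicate contributes less to the consensus than a plain mean would give it. The sketch needs at least two replicates; with a single row every off-diagonal entry is missing and the weights come out as NaN.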