Modz

Module for performing MODZ (modified z-score) transformations

pycytominer.cyto_utils.modz.modz(population_df: DataFrame, replicate_columns: str | list[str], features: str | list[str] = 'infer', method: str = 'spearman', min_weight: float = 0.01, precision: int = 4) DataFrame

Collapse replicates into a consensus signature using a correlation-weighted transformation.

Parameters:
  • population_df (pd.DataFrame) – DataFrame that includes metadata and observation features.

  • replicate_columns (str, list) – A string or list of column name(s) in population_df that indicate replicate-level information.

  • features (str, list, default "infer") – A list of strings corresponding to feature measurement column names in population_df. All features listed must be found in population_df. If “infer”, CellProfiler features are assumed to be the columns prefixed with “Cells”, “Nuclei”, or “Cytoplasm”.

  • method (str, default "spearman") – The correlation metric to use.

  • min_weight (float, default 0.01) – The minimum value to which non-negative correlations are clipped; anything lower is raised to min_weight.

  • precision (int, default 4) – The number of significant digits to which weights are rounded.

Returns:

modz_df – Consensus signatures, with metadata, for all replicates in the given DataFrame.

Return type:

pd.DataFrame
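
As a rough illustration of how features="infer" and replicate_columns work together, the sketch below selects feature columns by prefix and then groups the rows that would be collapsed into one signature. It does not call pycytominer: the column names, the toy values, and the helper infer_features_sketch are hypothetical, and the prefix check only mirrors the rule stated in the parameter description.

```python
import pandas as pd

# Hypothetical toy profiles: two perturbations with two replicate wells each.
population_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "B01", "B02"],
        "Metadata_pert": ["drug_a", "drug_a", "drug_b", "drug_b"],
        "Cells_Area": [1.0, 1.2, -0.5, -0.4],
        "Nuclei_Intensity": [0.3, 0.1, 2.0, 2.2],
        "Cytoplasm_Texture": [-1.0, -0.8, 0.6, 0.7],
    }
)


def infer_features_sketch(df):
    """Return columns whose names start with a CellProfiler compartment prefix."""
    prefixes = ("Cells", "Nuclei", "Cytoplasm")
    return [col for col in df.columns if col.startswith(prefixes)]


features = infer_features_sketch(population_df)

# Rows sharing the same value in the replicate column form one replicate group;
# each group is what a consensus signature would be computed from.
replicate_groups = {
    key: group[features]
    for key, group in population_df.groupby("Metadata_pert")
}
```

Here replicate_groups holds one block of feature values per perturbation, and the metadata columns are left out of the feature list because they do not carry a compartment prefix.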

pycytominer.cyto_utils.modz.modz_base(population_df: DataFrame, method: str = 'spearman', min_weight: float = 0.01, precision: int = 4) Series

Perform a modified z-score transformation.

This code is modified from cmapPy (see https://github.com/cytomining/pycytominer/issues/52). Note that this applies the transformation to the FULL population_df. See modz() for replicate-level procedures.

Parameters:
  • population_df (pd.DataFrame) – DataFrame that includes metadata and observation features.

  • method (str, default "spearman") – The correlation metric to use.

  • min_weight (float, default 0.01) – The minimum value to which non-negative correlations are clipped; anything lower is raised to min_weight.

  • precision (int, default 4) – The number of significant digits to which weights are rounded.

Returns:

modz_df – MODZ-transformed pd.Series: a consensus signature of the input data, weighted by replicate correlation.

Return type:

pd.Series
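
The weighting described above can be sketched in a few lines of pandas. This is a minimal sketch under stated assumptions, not pycytominer's implementation: it assumes the usual cmapPy-style recipe (correlate replicates pairwise, set negative correlations to zero, average each replicate's correlations to get a raw weight, floor at min_weight, normalize the weights to sum to 1). The function name modz_sketch and the toy data are hypothetical, and the input is assumed to contain only feature columns, with one row per replicate.

```python
import numpy as np
import pandas as pd


def modz_sketch(feature_df, method="spearman", min_weight=0.01, precision=4):
    """Collapse replicate rows of feature_df into one weighted consensus profile."""
    if feature_df.shape[0] == 1:
        # A single replicate is its own consensus.
        return feature_df.iloc[0]

    # Correlate replicates with each other (DataFrame.corr works on columns,
    # so transpose to make each replicate a column).
    cor = feature_df.T.corr(method=method)

    # Ignore self-correlations and treat negative correlations as zero.
    cor_values = cor.to_numpy().copy()
    np.fill_diagonal(cor_values, np.nan)
    cor = pd.DataFrame(cor_values, index=cor.index, columns=cor.columns)
    cor = cor.clip(lower=0)

    # Raw weight: mean correlation with the other replicates (NaN is skipped),
    # raised to min_weight when it falls below that floor.
    raw_weights = cor.mean(axis=1).clip(lower=min_weight)

    # Normalize so the weights sum to 1, then round.
    weights = (raw_weights / raw_weights.sum()).round(precision)

    # Weighted sum across replicates for every feature.
    return feature_df.mul(weights, axis=0).sum(axis=0)


# Hypothetical replicates: two that agree and one anti-correlated outlier.
replicates = pd.DataFrame(
    {
        "Cells_Area": [1.0, 1.1, -1.0],
        "Nuclei_Intensity": [2.0, 2.1, -2.0],
        "Cytoplasm_Texture": [3.0, 3.2, -3.0],
    },
    index=["rep_1", "rep_2", "rep_3"],
)

consensus = modz_sketch(replicates)
```

In this toy case the outlier replicate receives only the floor weight, so the consensus stays close to the two agreeing replicates instead of being dragged toward the plain mean.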