Features

Utility functions to manipulate CellProfiler features.

pycytominer.cyto_utils.features.convert_compartment_format_to_list(compartments: list[str] | str) → list[str]

Convert one or more compartment names to a list.

Parameters:

compartments (list of str or str) – Cell Painting compartment(s).

Returns:

compartments – List of Cell Painting compartments.

Return type:

list of str
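The conversion described above can be approximated with a small stand-in. This is a minimal sketch, not the pycytominer implementation: the helper name `to_compartment_list` is hypothetical, and any normalization the library may apply to the names (for example, case changes) is not modeled here.

```python
def to_compartment_list(compartments):
    """Hypothetical stand-in: always return compartments as a list."""
    if isinstance(compartments, str):
        # A lone string becomes a one-element list.
        return [compartments]
    # Any other iterable of names is copied into a new list.
    return list(compartments)


single = to_compartment_list("Cells")
several = to_compartment_list(["Cells", "Nuclei"])
```

Accepting both forms lets callers pass a single compartment name without wrapping it in a list themselves.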

pycytominer.cyto_utils.features.count_na_features(population_df: DataFrame, features: list[str]) → DataFrame

Given a population DataFrame and a list of features, count the number of NA values per feature.

Parameters:
  • population_df (pd.DataFrame) – DataFrame of profiles.

  • features (list of str) – Features present in the population dataframe.

Return type:

pd.DataFrame of NA counts per feature
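The same kind of count can be reproduced directly with pandas. The following is a sketch under stated assumptions rather than the library's own code: the toy column names are illustrative, and the output column label `num_na` is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Toy profiles; the column names are illustrative only.
population_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03"],
        "Cells_Area": [1.0, np.nan, 3.0],
        "Nuclei_Area": [np.nan, np.nan, 2.0],
    }
)
features = ["Cells_Area", "Nuclei_Area"]

# Count missing values per feature column. The "num_na" label is an
# assumption for this sketch, not a documented output column name.
na_counts = population_df[features].isna().sum().to_frame(name="num_na")
```

The result is indexed by feature name, so a single count can be read with `na_counts.loc["Cells_Area", "num_na"]`.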

pycytominer.cyto_utils.features.drop_outlier_features(population_df: DataFrame, features: str | list[str] = 'infer', samples: str = 'all', outlier_cutoff: int | float = 500) → list[str]

Flag a feature as an outlier if the absolute value of its minimum or maximum is greater than the threshold.

Parameters:
  • population_df (pd.DataFrame) – DataFrame that includes metadata and observation features.

  • features (list of str or str, default "infer") – Features present in the population dataframe. If "infer", assume CellProfiler feature conventions (names start with Cells_, Nuclei_, or Cytoplasm_).

  • samples (str, default "all") – Samples to perform the operation on. The value is passed to pd.DataFrame.query(), so write it as a query expression, for example "Metadata_treatment == 'control'" (include all quotes). If "all", use all samples.

  • outlier_cutoff (int or float, default 500) – Threshold above which the absolute value of a feature's minimum or maximum marks it as an outlier. See https://github.com/cytomining/pycytominer/issues/237 for details.

Returns:

outlier_features – Features whose absolute minimum or maximum is greater than the threshold.

Return type:

list of str
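The rule described above can be sketched with plain pandas. This is an illustration and not the pycytominer implementation: the helper name `find_outlier_features` and the toy data are hypothetical, and the "infer" option for features is not modeled, so the feature list is passed explicitly.

```python
import pandas as pd


def find_outlier_features(population_df, features, samples="all", outlier_cutoff=500):
    """Hypothetical stand-in for the outlier rule described above."""
    # Optionally restrict rows with a DataFrame.query() expression.
    subset = population_df if samples == "all" else population_df.query(samples)
    values = subset[features]
    # Flag a feature when |min| or |max| is greater than the cutoff.
    exceeds = (values.max().abs() > outlier_cutoff) | (values.min().abs() > outlier_cutoff)
    return exceeds[exceeds].index.tolist()


df = pd.DataFrame(
    {
        "Metadata_treatment": ["control", "control", "drug"],
        "Cells_Area": [10.0, 20.0, 900.0],
        "Nuclei_Area": [-600.0, 5.0, 7.0],
        "Cytoplasm_Area": [1.0, 2.0, 3.0],
    }
)
features = ["Cells_Area", "Nuclei_Area", "Cytoplasm_Area"]

all_outliers = find_outlier_features(df, features)
control_outliers = find_outlier_features(
    df, features, samples="Metadata_treatment == 'control'"
)
```

With all samples, both Cells_Area (maximum 900) and Nuclei_Area (minimum -600) are flagged. Restricting to control rows removes the 900 value, so only Nuclei_Area remains.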

pycytominer.cyto_utils.features.infer_cp_features(population_df: DataFrame, compartments: str | list[str] = ['Cells', 'Nuclei', 'Cytoplasm'], metadata: bool = False, image_features: bool = False) → list[str]

Given CellProfiler output data read as a DataFrame, output feature column names as a list.

Inferred feature columns will match expected CellProfiler prefixes (for example, Cells_, Cytoplasm_, and Nuclei_). When image_features=True, the function excludes non-numeric Image_* columns from inferred features. This is important for use cases that combine profile features with image payload columns under the Image_* prefix, such as OME-Arrow. The function also excludes columns with nested object values, even if they use a CellProfiler-like prefix.

Parameters:
  • population_df (pd.DataFrame) – DataFrame from which features are to be inferred.

  • compartments (list of str or str, default ["Cells", "Nuclei", "Cytoplasm"]) – Compartments from which Cell Painting features were extracted.

  • metadata (bool, default False) – Whether or not to infer metadata features. If metadata is set to True, find column names that begin with the Metadata_ prefix. This convention is expected by CellProfiler defaults.

  • image_features (bool, default False) – Whether or not to include Image_* columns in inferred features. When True, Pycytominer includes numeric image features alongside the default CellProfiler compartments, while still excluding non-numeric Image_* columns. This avoids treating image payload columns as profile features in data layouts that store both under the same Image_* prefix, such as OME-Arrow-backed tables.

Returns:

features – List of inferred Cell Painting feature column names.

Return type:

list of str
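The prefix matching and the numeric check on Image_* columns can be sketched as follows. This is a simplified illustration, not the library's code: the helper name `infer_features_sketch` and the toy columns are hypothetical, the handling of metadata=True (returning only Metadata_ columns) is an assumption, and the exclusion of nested object values is not modeled.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_features_sketch(
    population_df,
    compartments=("Cells", "Nuclei", "Cytoplasm"),
    metadata=False,
    image_features=False,
):
    """Hypothetical stand-in for prefix-based feature inference."""
    if metadata:
        # Assumption: metadata mode returns only Metadata_ columns.
        return [c for c in population_df.columns if c.startswith("Metadata_")]

    prefixes = [f"{name}_" for name in compartments]
    if image_features:
        prefixes.append("Image_")

    features = []
    for col in population_df.columns:
        if not any(col.startswith(prefix) for prefix in prefixes):
            continue
        # Skip non-numeric Image_* columns such as file names or payloads.
        if col.startswith("Image_") and not is_numeric_dtype(population_df[col]):
            continue
        features.append(col)
    return features


df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Cells_Area": [1.0, 2.0],
        "Nuclei_Area": [3.0, 4.0],
        "Image_Count_Cells": [10, 12],
        "Image_FileName": ["a.tiff", "b.tiff"],
    }
)

default_features = infer_features_sketch(df)
with_image = infer_features_sketch(df, image_features=True)
metadata_cols = infer_features_sketch(df, metadata=True)
```

In this toy table, Image_Count_Cells is picked up only when image_features=True, while the string-valued Image_FileName column is never treated as a feature.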

pycytominer.cyto_utils.features.label_compartment(cp_features: list[str], compartment: str, metadata_cols: list[str]) → list[str]

Assign a compartment label to each feature as a prefix.

Parameters:
  • cp_features (list of str) – All features being used.

  • compartment (str) – Measured compartment.

  • metadata_cols (list of str) – Columns that should be considered metadata.

Returns:

cp_features – Recoded column names with appropriate metadata and compartment labels.

Return type:

list of str
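The relabeling described above might look like the following. This is a minimal sketch and not the pycytominer implementation: the helper name `label_with_compartment` and the example column names are hypothetical, and giving metadata columns a Metadata_ prefix is an assumption based on the "appropriate metadata and compartment labels" wording.

```python
def label_with_compartment(cp_features, compartment, metadata_cols):
    """Hypothetical stand-in: prefix each column name by its role."""
    return [
        # Assumption: metadata columns get "Metadata_", others the compartment.
        f"Metadata_{name}" if name in metadata_cols else f"{compartment}_{name}"
        for name in cp_features
    ]


labeled = label_with_compartment(
    ["ImageNumber", "AreaShape_Area", "Intensity_MeanIntensity_DNA"],
    compartment="Cells",
    metadata_cols=["ImageNumber"],
)
```

Prefixing in this way keeps per-compartment measurements distinguishable after tables from several compartments are merged.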