Features
Utility functions to manipulate CellProfiler features.
- pycytominer.cyto_utils.features.convert_compartment_format_to_list(compartments: list[str] | str) → list[str]
Converts one or more compartments to a list.
- Parameters:
compartments (list of str or str) – Cell Painting compartment(s).
- Returns:
compartments – List of Cell Painting compartments.
- Return type:
list of str
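As a minimal sketch of the normalization described above (not the library's implementation), the hypothetical helper `to_compartment_list` wraps a single compartment name in a list and passes a list through unchanged. Whether the real function also normalizes case is not stated here, so the sketch leaves names untouched.

```python
from __future__ import annotations


def to_compartment_list(compartments: list[str] | str) -> list[str]:
    """Hypothetical stand-in: always return a list of compartment names."""
    if isinstance(compartments, str):
        # A single name becomes a one-element list.
        return [compartments]
    # Copy so the caller's list is not mutated by later code.
    return list(compartments)


single = to_compartment_list("Cells")
several = to_compartment_list(["Cells", "Nuclei"])
print(single)
print(several)
```

Accepting either form lets downstream code iterate over compartments without checking the input type again.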
- pycytominer.cyto_utils.features.count_na_features(population_df: DataFrame, features: list[str]) → DataFrame
Given a population DataFrame and a list of features, count the number of NA values per feature.
- Parameters:
population_df (pd.DataFrame) – DataFrame of profiles.
features (list of str) – Features present in the population dataframe.
- Return type:
DataFrame of NA counts per feature
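The per-feature NA count can be reproduced with plain pandas, which may help when checking results. This is a sketch under stated assumptions: the toy data and the output column name `num_na` are illustrative choices, not confirmed by this page.

```python
import pandas as pd

nan = float("nan")

# Toy profiles: one metadata column and two feature columns with gaps.
population_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03"],
        "Cells_AreaShape_Area": [1.0, nan, 3.0],
        "Nuclei_Intensity_Mean": [nan, nan, 0.5],
    }
)
features = ["Cells_AreaShape_Area", "Nuclei_Intensity_Mean"]

# isna() marks missing entries; sum() counts them column by column.
# "num_na" is an assumed column name for this sketch.
na_counts = population_df.loc[:, features].isna().sum().to_frame(name="num_na")
print(na_counts)
```

The result is indexed by feature name, so a single feature's count can be looked up with `na_counts.loc[feature, "num_na"]`.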
- pycytominer.cyto_utils.features.drop_outlier_features(population_df: DataFrame, features: str | list[str] = 'infer', samples: str = 'all', outlier_cutoff: int | float = 500) → list[str]
Exclude a feature if its min or max absolute value is greater than the threshold.
- Parameters:
population_df (pd.DataFrame) – DataFrame that includes metadata and observation features.
features (list of str or str, default "infer") – Features present in the population dataframe. If “infer”, then assume CellProfiler feature conventions (start with Cells_, Nuclei_, or Cytoplasm_).
samples (str, default "all") – Samples to perform the operation on. The function uses pd.DataFrame.query(), so structure samples as a query string. An example is "Metadata_treatment == 'control'" (include all quotes). If “all”, use all samples to calculate.
outlier_cutoff (int or float, default 500) – Threshold to remove features if absolute value is greater. See https://github.com/cytomining/pycytominer/issues/237 for details.
- Returns:
outlier_features – Features greater than the threshold.
- Return type:
list of str
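The thresholding rule can be sketched directly in pandas. The helper `find_outlier_features` below is hypothetical and only mirrors the documented behavior (flag a feature when the absolute value of its minimum or maximum exceeds the cutoff, optionally after filtering rows with a query string). It takes an explicit feature list and does not attempt the “infer” option.

```python
import pandas as pd


def find_outlier_features(population_df, features, samples="all", outlier_cutoff=500):
    """Hypothetical sketch: list features whose min or max exceeds the cutoff in absolute value."""
    # Optionally restrict the rows with a pandas query string.
    subset = population_df if samples == "all" else population_df.query(samples)
    values = subset.loc[:, features]
    max_abs = values.max().abs()
    min_abs = values.min().abs()
    flagged = (max_abs > outlier_cutoff) | (min_abs > outlier_cutoff)
    # Keep only the feature names where the flag is True.
    return flagged[flagged].index.tolist()


population_df = pd.DataFrame(
    {
        "Metadata_treatment": ["control", "control", "drug"],
        "Cells_AreaShape_Area": [10.0, 20.0, 30.0],
        "Nuclei_Texture_Contrast": [-1200.0, 5.0, 7.0],
        "Cytoplasm_Intensity_Max": [1.0, 2.0, 900.0],
    }
)
features = [
    "Cells_AreaShape_Area",
    "Nuclei_Texture_Contrast",
    "Cytoplasm_Intensity_Max",
]

all_rows = find_outlier_features(population_df, features)
controls_only = find_outlier_features(
    population_df, features, samples="Metadata_treatment == 'control'"
)
print(all_rows)
print(controls_only)
```

Restricting the calculation to control samples changes which features are flagged, because the extreme value in `Cytoplasm_Intensity_Max` occurs only in the treated row.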
- pycytominer.cyto_utils.features.infer_cp_features(population_df: DataFrame, compartments: str | list[str] = ['Cells', 'Nuclei', 'Cytoplasm'], metadata: bool = False, image_features: bool = False) → list[str]
Given CellProfiler output data read as a DataFrame, output feature column names as a list.
Inferred feature columns will match expected CellProfiler prefixes (for example, Cells_, Cytoplasm_, and Nuclei_). When image_features=True, the function excludes non-numeric Image_* columns from inferred features. This is important for use cases that combine profile features with image payload columns under the Image_* prefix, such as OME-Arrow. The function also excludes columns with nested object values, even if they use a CellProfiler-like prefix.
- Parameters:
population_df (pd.DataFrame) – DataFrame from which features are to be inferred.
compartments (list of str or str, default ["Cells", "Nuclei", "Cytoplasm"]) – Compartments from which Cell Painting features were extracted.
metadata (bool, default False) – Whether or not to infer metadata features. If metadata is set to True, find column names that begin with the Metadata_ prefix. This convention is expected by CellProfiler defaults.
image_features (bool, default False) – Whether or not to include Image_* columns in inferred features. When True, Pycytominer includes numeric image features alongside the default CellProfiler compartments, while still excluding non-numeric Image_* columns. This avoids treating image payload columns as profile features in data layouts that store both under the same Image_* prefix, such as OME-Arrow-backed tables.
- Returns:
features – List of inferred Cell Painting feature column names.
- Return type:
list of str
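The prefix-matching behavior described above can be illustrated with a small pandas sketch. The helper `infer_features_sketch` is hypothetical and simplified: it covers compartment prefixes and the numeric-only rule for Image_* columns, and it omits the metadata option and the nested-object check.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_features_sketch(
    population_df,
    compartments=("Cells", "Nuclei", "Cytoplasm"),
    image_features=False,
):
    """Hypothetical sketch of prefix-based feature inference."""
    prefixes = [f"{compartment}_" for compartment in compartments]
    if image_features:
        prefixes.append("Image_")

    selected = []
    for column in population_df.columns:
        if not any(column.startswith(prefix) for prefix in prefixes):
            continue
        # Skip non-numeric Image_ columns, such as file names or image payloads.
        if column.startswith("Image_") and not is_numeric_dtype(population_df[column]):
            continue
        selected.append(column)
    return selected


population_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Cells_AreaShape_Area": [100.0, 120.0],
        "Nuclei_Intensity_Mean": [0.4, 0.6],
        "Image_Count_Cells": [52, 61],
        "Image_FileName_DNA": ["a.tiff", "b.tiff"],
    }
)

default_features = infer_features_sketch(population_df)
with_image = infer_features_sketch(population_df, image_features=True)
print(default_features)
print(with_image)
```

With `image_features=True`, the numeric `Image_Count_Cells` column is included while the string-valued `Image_FileName_DNA` column is still left out.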
- pycytominer.cyto_utils.features.label_compartment(cp_features: list[str], compartment: str, metadata_cols: list[str]) → list[str]
Assign a compartment label to each feature as a prefix.
- Parameters:
cp_features (list of str) – All features being used.
compartment (str) – Measured compartment.
metadata_cols (list) – Columns that should be considered metadata.
- Returns:
cp_features – Recoded column names with appropriate metadata and compartment labels.
- Return type:
list of str
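A rough sketch of compartment labeling follows. The helper `label_compartment_sketch` is hypothetical, and the naming scheme is an assumption made for illustration: features receive a `<compartment>_` prefix and metadata columns receive a `Metadata_<compartment>_` prefix. The exact labels the library produces are not specified on this page.

```python
def label_compartment_sketch(cp_features, compartment, metadata_cols):
    """Hypothetical sketch: prefix each column name with its compartment."""
    labeled = []
    for name in cp_features:
        if name in metadata_cols:
            # Assumed scheme for metadata columns; not confirmed by the docs.
            labeled.append(f"Metadata_{compartment}_{name}")
        else:
            labeled.append(f"{compartment}_{name}")
    return labeled


columns = ["ObjectNumber", "AreaShape_Area", "Intensity_MeanIntensity_DNA"]
labeled = label_compartment_sketch(
    columns, compartment="Cells", metadata_cols=["ObjectNumber"]
)
print(labeled)
```

Prefixing by compartment keeps same-named measurements from different compartments distinguishable once their tables are merged.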