Aggregate

Aggregate profiles based on given grouping variables.

pycytominer.aggregate.aggregate(population_df: DataFrame, strata: list[str] = ['Metadata_Plate', 'Metadata_Well'], features: list[str] | str = 'infer', image_features: bool = False, operation: str = 'median', output_file: str | None = None, output_type: Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr', None] = 'csv', compute_object_count: bool = False, object_feature: str = 'Metadata_ObjectNumber', subset_data_df: DataFrame | None = None, compression_options: str | dict[str, Any] | None = None, float_format: str | None = None) → DataFrame

Combine the feature columns of a population DataFrame within each strata group using the given operation.

Parameters:

population_df (pd.DataFrame) – DataFrame to group and aggregate.
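strata (list of str, default ["Metadata_Plate", "Metadata_Well"]) – Columns to group by before aggregating.
features (list of str or str, default "infer") – Features to aggregate, given either as a list of column names or as the string "infer", in which case the feature columns are inferred from population_df.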
image_features (bool, default False) – Whether to include inferred Image_* feature columns. When True, Pycytominer preserves numeric image-level measurements while excluding non-numeric Image_* columns, which helps avoid treating image payload columns as profile features in mixed tables such as OME-Arrow-backed inputs.
operation (str, default "median") – How the data is aggregated. Must be one of ["mean", "median"].
output_file (str or file handle, optional) – If provided, will write aggregated profiles to file. If not specified, will return the aggregated profiles. We recommend naming the file based on the plate name.
output_type (str, optional) – File type used when writing the aggregated profiles. The signature accepts "csv", "parquet", "anndata_h5ad", and "anndata_zarr". If output_file is provided and output_type is not specified, the file is written as CSV.
compute_object_count (bool, default False) – Whether or not to compute object counts.
object_feature (str, default "Metadata_ObjectNumber") – Object number feature. Only used if compute_object_count=True.
subset_data_df (pd.DataFrame, optional) – DataFrame used to subset the input before aggregation.
compression_options (str or dict, optional) – Compression options passed to pd.DataFrame.to_csv(compression=compression_options). Requires pandas >= 1.2.
float_format (str, optional) – Format string for floating-point values in the output file, passed to pd.DataFrame.to_csv(float_format=float_format). For example, use "%.3g" for 3 significant digits.
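
As a quick illustration of the float_format parameter, the following sketch uses only Python's built-in printf-style formatting to show how the "%.3g" and "%.3f" specifiers differ. The sample values are made up for illustration; the format strings themselves behave the same way when handed to pd.DataFrame.to_csv.

```python
# Made-up sample values, chosen to span several orders of magnitude.
values = [1234.5678, 0.000123456, 2.0]

# "%.3g" keeps 3 significant digits and switches to scientific notation
# when the exponent is large enough.
formatted_g = ["%.3g" % v for v in values]

# "%.3f" keeps 3 digits after the decimal point instead.
formatted_f = ["%.3f" % v for v in values]

print(formatted_g)  # ['1.23e+03', '0.000123', '2']
print(formatted_f)  # ['1234.568', '0.000', '2.000']
```

Note that "%.3f" rounds small values to zero, so the "g" form is often the safer choice when feature scales vary widely.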

Returns:

DataFrame of aggregated features. If output_file=None, the DataFrame is returned. If output_file is specified, the profiles are written to disk at the provided output_file path.

Return type:

pd.DataFrame
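
To make the grouping concrete, here is a minimal sketch in plain pandas of what aggregating by strata with a median looks like. It is not pycytominer's implementation: the toy values are invented, the feature list is given explicitly rather than inferred, and the count column name Metadata_Object_Count is an assumption used only for this example.

```python
import pandas as pd

# Toy single-cell table. Column names follow the Metadata_ defaults from the
# signature; the values are invented for illustration.
population_df = pd.DataFrame(
    {
        "Metadata_Plate": ["P1", "P1", "P1", "P1", "P1"],
        "Metadata_Well": ["A01", "A01", "A01", "A02", "A02"],
        "Metadata_ObjectNumber": [1, 2, 3, 1, 2],
        "Cells_AreaShape_Area": [10.0, 20.0, 60.0, 5.0, 7.0],
    }
)

strata = ["Metadata_Plate", "Metadata_Well"]
features = ["Cells_AreaShape_Area"]  # explicit list instead of "infer"

# Group by the strata columns and take the median of each feature.
aggregated = population_df.groupby(strata)[features].median().reset_index()

# Rough analogue of compute_object_count=True: count objects per stratum.
# The output column name here is an assumption for this sketch.
counts = (
    population_df.groupby(strata)["Metadata_ObjectNumber"]
    .count()
    .reset_index(name="Metadata_Object_Count")
)
aggregated = counts.merge(aggregated, on=strata)

print(aggregated)
```

Each (plate, well) pair collapses to a single row: well A01 has three objects with a median area of 20.0, and well A02 has two objects with a median area of 6.0.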

Notes

The parameters output_file, output_type, compression_options, and float_format are passed as keyword arguments to the write_to_file_if_user_specifies_output_details decorator, which writes the output DataFrame to file when the user specifies output details. If output_file is not specified, the function returns the aggregated DataFrame without writing to file.
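
The general pattern of a decorator that intercepts output-related keyword arguments can be sketched as follows. This is a hypothetical illustration, not the source of write_to_file_if_user_specifies_output_details: the names write_if_output_file and well_means are invented, only CSV output is handled, and the sketch returns the DataFrame in both cases as a simplifying assumption.

```python
import functools
import io

import pandas as pd


def write_if_output_file(func):
    """Hypothetical decorator: write the result to CSV when output_file is given."""

    @functools.wraps(func)
    def wrapper(*args, output_file=None, float_format=None, **kwargs):
        # The wrapped function never sees the output-related kwargs.
        result = func(*args, **kwargs)
        if output_file is not None:
            result.to_csv(output_file, index=False, float_format=float_format)
        # Simplifying assumption: always hand the DataFrame back to the caller.
        return result

    return wrapper


@write_if_output_file
def well_means(df):
    # Invented example function: mean of one feature per well.
    return df.groupby("Metadata_Well", as_index=False)["Cells_Area"].mean()


df = pd.DataFrame({"Metadata_Well": ["A01", "A01"], "Cells_Area": [1.0, 2.0]})

# No output_file: nothing is written, the DataFrame is simply returned.
in_memory = well_means(df)

# With output_file (an in-memory buffer here, to avoid touching disk).
buffer = io.StringIO()
well_means(df, output_file=buffer, float_format="%.3g")
written_lines = buffer.getvalue().splitlines()
print(written_lines)
```

Keeping the file-writing logic in a decorator lets the core function stay focused on computing the DataFrame, while several functions can share the same output handling.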