Aggregate

Aggregate profiles based on given grouping variables.

pycytominer.aggregate.aggregate(population_df: DataFrame, strata: list[str] = ['Metadata_Plate', 'Metadata_Well'], features: list[str] | str = 'infer', image_features: bool = False, operation: str = 'median', output_file: str | None = None, output_type: Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr', None] = 'csv', compute_object_count: bool = False, object_feature: str = 'Metadata_ObjectNumber', subset_data_df: DataFrame | None = None, compression_options: str | dict[str, Any] | None = None, float_format: str | None = None) → DataFrame | str

Combine population dataframe variables by strata groups using given operation.

Parameters:
  • population_df (pd.DataFrame) – DataFrame to group and aggregate.

  • strata (list of str, default ["Metadata_Plate", "Metadata_Well"]) – Columns to group by before aggregating.

  • features (list of str or str, default "infer") – List of features to aggregate. If "infer", the feature columns are inferred from population_df.

  • image_features (bool, default False) – Whether to include inferred Image_* feature columns. When True, Pycytominer preserves numeric image-level measurements while excluding non-numeric Image_* columns, which helps avoid treating image payload columns as profile features in mixed tables such as OME-Arrow-backed inputs.

  • operation (str, default "median") – How the data is aggregated. Currently supports only "mean" and "median".

  • output_file (str or file handle, optional) – If provided, will write aggregated profiles to file. If not specified, will return the aggregated profiles. We recommend naming the file based on the plate name.

  • output_type (str, default "csv") – File type used to write the aggregated profiles when output_file is provided. One of "csv", "parquet", "anndata_h5ad", or "anndata_zarr". If not specified and output_file is provided, the file is written as CSV.

  • compute_object_count (bool, default False) – Whether or not to compute object counts.

  • object_feature (str, default "Metadata_ObjectNumber") – Object number feature. Only used if compute_object_count=True.

  • subset_data_df (pd.DataFrame, optional) – DataFrame describing how to subset the input before aggregation.

  • compression_options (str or dict, optional) – Compression options passed to pd.DataFrame.to_csv(compression=compression_options). Requires pandas >= 1.2.

  • float_format (str, optional) – Numeric format to use when writing the output file, passed to pd.DataFrame.to_csv(float_format=float_format). For example, use "%.3g" for 3 significant digits.
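
The float_format and compression_options parameters above are described as inputs to pd.DataFrame.to_csv, so their effect can be previewed with pandas alone. The following is a minimal sketch under that assumption; it does not call Pycytominer, and the table contents, column names, and file name are hypothetical.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical aggregated profile; names and values are illustrative only.
profiles = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Cells_AreaShape_Area": [1234.5678, 0.00123456],
    }
)

# "%.3g" keeps three significant digits, not three decimal places.
buffer = io.StringIO()
profiles.to_csv(buffer, index=False, float_format="%.3g")
csv_text = buffer.getvalue()
print(csv_text)

# A dict selects the compression method and forwards extra options to it.
gzip_options = {"method": "gzip", "compresslevel": 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_plate.csv.gz")
    profiles.to_csv(
        path, index=False, float_format="%.3g", compression=gzip_options
    )
    # read_csv infers gzip compression from the ".gz" extension.
    round_trip = pd.read_csv(path)

print(round_trip)
```

Because the values are rounded on write, reading the file back does not recover the original precision.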

Returns:

pd.DataFrame:

DataFrame of aggregated features, returned when output_file=None. If output_file is specified, the profiles are written to file and the DataFrame is not returned.

str:

If output_file is provided, then the function returns the path to the output file.

Return type:

str or pd.DataFrame
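
As a rough illustration of what grouping by strata and applying the median means, the sketch below reproduces the idea with plain pandas. It is not Pycytominer's implementation: the input table and feature names are hypothetical, and the Metadata_Object_Count column name is an assumption chosen to illustrate compute_object_count=True.

```python
import pandas as pd

# Hypothetical single-cell table: two wells on one plate.
population_df = pd.DataFrame(
    {
        "Metadata_Plate": ["plate1"] * 5,
        "Metadata_Well": ["A01", "A01", "A01", "A02", "A02"],
        "Metadata_ObjectNumber": [1, 2, 3, 1, 2],
        "Cells_AreaShape_Area": [10.0, 20.0, 60.0, 5.0, 7.0],
        "Nuclei_Intensity_Mean": [0.1, 0.3, 0.2, 0.5, 0.75],
    }
)

strata = ["Metadata_Plate", "Metadata_Well"]
features = ["Cells_AreaShape_Area", "Nuclei_Intensity_Mean"]

# Group rows by the strata columns and reduce each feature with the median.
aggregated = population_df.groupby(strata)[features].median().reset_index()

# Count objects per group, analogous to compute_object_count=True.
# "Metadata_Object_Count" is an assumed output column name.
object_counts = (
    population_df.groupby(strata)["Metadata_ObjectNumber"]
    .count()
    .reset_index(name="Metadata_Object_Count")
)

# One row per strata group: counts first, then the aggregated features.
result = object_counts.merge(aggregated, on=strata)
print(result)
```

Replacing median() with mean() gives the equivalent sketch for operation="mean".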