Annotate

Annotates profiles with metadata information

pycytominer.annotate.annotate(profiles: str | DataFrame, platemap: str | DataFrame, join_on: str | list[str] = ['Metadata_well_position', 'Metadata_Well'], output_file: str | None = None, output_type: Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr'] | None = 'csv', add_metadata_id_to_platemap: bool = True, format_broad_cmap: bool = False, clean_cellprofiler: bool = True, external_metadata: str | DataFrame | None = None, external_join_on: str | list[str] | None = None, compression_options: str | dict[str, str] | None = None, float_format: str | None = None, cmap_args: dict[str, str] | None = None, **kwargs) → DataFrame | str

Add metadata to aggregated profiles.

Parameters:
  • profiles (pd.DataFrame or file) – DataFrame or file path of profiles.

  • platemap (pd.DataFrame or file) – Dataframe or file path of platemap metadata.

  • join_on (list or str, default: ["Metadata_well_position", "Metadata_Well"]) – Which variables to use when merging profiles with the platemap. The first element indicates the variable(s) in platemap and the second element indicates the variable(s) in profiles to merge on. Note that the platemap variable names depend on the setting of add_metadata_id_to_platemap.

  • output_file (str, optional) – If not specified, will return the annotated profiles. We recommend that this output file be suffixed with “_augmented.csv”.

  • output_type (str, optional) – If provided, will write annotated profiles as the specified file type. The signature accepts "csv", "parquet", "anndata_h5ad", and "anndata_zarr". If not specified and output_file is provided, the file is written as CSV by default.

  • add_metadata_id_to_platemap (bool, default True) – Whether the platemap variables may need “Metadata” prepended to their names.

  • format_broad_cmap (bool, default False) – Whether to add columns for compatibility with Broad CMAP naming conventions.

  • clean_cellprofiler (bool, default True) – Clean specific CellProfiler feature names by dropping the Image_ prefix. Defaults to True because the most common use case is annotating CellProfiler profiles; set it to False if you are not using CellProfiler.

  • external_metadata (pd.DataFrame or file, optional) – DataFrame or file with additional metadata information. The most common use case is a QC.parquet file from coSMicQC containing QC flags for each profile.

  • external_join_on (str or list, optional) – Merge column(s) shared by the annotated profiles and external metadata. When provided, these keys are used on both sides of the external merge.

  • compression_options (str or dict, optional) – Compression options passed to pd.DataFrame.to_csv(compression=compression_options). Requires pandas version >= 1.2.

  • float_format (str, optional) – Numeric precision to use when writing the output file, passed to pd.DataFrame.to_csv(float_format=float_format). For example, use “%.3g” for 3 significant digits.

  • cmap_args (dict, default None) – Potential keyword arguments for annotate_cmap(). See cyto_utils/annotate_custom.py for more details.
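
To make the join_on, add_metadata_id_to_platemap, and float_format descriptions above more concrete, here is a minimal pandas-only sketch of the kind of merge they describe. It is not pycytominer's implementation: the toy column names and values are hypothetical, and the library may handle details such as duplicate key columns differently.

```python
import pandas as pd

# Hypothetical toy inputs; column names and values are illustrative only.
platemap = pd.DataFrame({
    "well_position": ["A01", "A02"],
    "gene": ["TP53", "EMPTY"],
})
profiles = pd.DataFrame({
    "Metadata_Well": ["A01", "A02"],
    "Cells_AreaShape_Area": [1234.5678, 987.6543],
})

# Assumed behaviour of add_metadata_id_to_platemap=True:
# prefix platemap columns that do not already start with "Metadata_".
platemap = platemap.rename(
    columns={
        col: col if col.startswith("Metadata_") else f"Metadata_{col}"
        for col in platemap.columns
    }
)

# join_on = [platemap variable, profiles variable]
join_on = ["Metadata_well_position", "Metadata_Well"]
annotated = platemap.merge(
    profiles, left_on=join_on[0], right_on=join_on[1], how="inner"
)

# float_format="%.3g" keeps three significant digits when writing CSV text.
csv_text = annotated.to_csv(index=False, float_format="%.3g")
print(csv_text)
```

Because the default join_on expects a platemap column named Metadata_well_position, a raw platemap column called well_position only matches once the prefix has been added, which is why the two parameters need to be set consistently.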

Returns:

pd.DataFrame:

DataFrame of annotated features, returned when output_file=None. If output_file is specified, the annotated profiles are written to file and the DataFrame is not returned.

str:

If output_file is provided, then the function returns the path to the output file.

Return type:

str or pd.DataFrame