Annotate
Annotates profiles with metadata information.
- pycytominer.annotate.annotate(profiles: str | DataFrame, platemap: str | DataFrame, join_on: str | list[str] = ['Metadata_well_position', 'Metadata_Well'], output_file: str | None = None, output_type: Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr'] | None = 'csv', add_metadata_id_to_platemap: bool = True, format_broad_cmap: bool = False, clean_cellprofiler: bool = True, external_metadata: str | DataFrame | None = None, external_join_on: str | list[str] | None = None, compression_options: str | dict[str, str] | None = None, float_format: str | None = None, cmap_args: dict[str, str] | None = None, **kwargs) → DataFrame | str
Add metadata to aggregated profiles.
- Parameters:
profiles (pd.DataFrame or file) – DataFrame or file path of profiles.
platemap (pd.DataFrame or file) – DataFrame or file path of platemap metadata.
join_on (list or str, default: ["Metadata_well_position", "Metadata_Well"]) – Variables on which to merge the profiles and the platemap. The first element indicates the variable(s) in platemap and the second element indicates the variable(s) in profiles to merge on. Note that the setting of add_metadata_id_to_platemap affects the platemap variable names.
output_file (str, optional) – If not specified, will return the annotated profiles. We recommend that this output file be suffixed with “_augmented.csv”.
output_type (str, optional) – If provided, will write annotated profiles as the specified file type (one of "csv", "parquet", "anndata_h5ad", or "anndata_zarr"). If not specified and output_file is provided, the file will be written as CSV by default.
add_metadata_id_to_platemap (bool, default True) – Whether the platemap variables may need “Metadata” prepended.
format_broad_cmap (bool, default False) – Whether to add columns for compatibility with Broad CMAP naming conventions.
clean_cellprofiler (bool, default True) – Clean specific CellProfiler feature names by dropping the Image_ prefix. Defaults to True because the most common use case is annotating CellProfiler profiles; set to False if you are not using CellProfiler.
external_metadata (pd.DataFrame or file, optional) – DataFrame or file with additional metadata information. Most common use case is a QC.parquet file with QC flags for each profile that comes from coSMicQC.
external_join_on (str or list, optional) – Merge column(s) shared by the annotated profiles and external metadata. When provided, these keys are used on both sides of the external merge.
compression_options (str or dict, optional) – Compression options passed to pd.DataFrame.to_csv(compression=compression_options). Requires pandas version >= 1.2.
float_format (str, optional) – Numeric format to use when writing the output file, passed to pd.DataFrame.to_csv(float_format=float_format). For example, use “%.3g” for 3 significant digits.
cmap_args (dict, default None) – Potential keyword arguments for annotate_cmap(). See cyto_utils/annotate_custom.py for more details.
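The effect of float_format can be checked with pandas alone, since the parameter is documented above as being passed to pd.DataFrame.to_csv. The following is a minimal sketch with made-up column names and values; it does not call pycytominer.

```python
import io

import pandas as pd

# Hypothetical toy profiles; the feature name and values are illustrative only.
df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Cells_AreaShape_Area": [3.14159, 1234.5678],
    }
)

# Write to an in-memory buffer instead of a file to inspect the result.
buffer = io.StringIO()
df.to_csv(buffer, index=False, float_format="%.3g")
csv_text = buffer.getvalue()

# "%.3g" keeps three significant digits and switches to scientific
# notation for larger magnitudes: 3.14159 -> 3.14, 1234.5678 -> 1.23e+03
print(csv_text)
```

Because “%.3g” limits significant digits rather than decimal places, large values lose precision in the written file; a format such as “%.3f” would instead fix the number of decimal places.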
- Returns:
- pd.DataFrame:
DataFrame of annotated features, returned when output_file=None. If output_file is specified, the profiles are written to file and the DataFrame is not returned.
- str:
If output_file is provided, then the function returns the path to the output file.
- Return type:
str or pd.DataFrame
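To make the join_on and add_metadata_id_to_platemap descriptions concrete, the merge can be approximated with plain pandas. This is a sketch of the documented behavior, not pycytominer's implementation: the toy data, the inner join, and dropping the platemap key after merging are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy inputs; only the two default join_on names come from the docs.
profiles = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03"],
        "Cells_AreaShape_Area": [101.5, 98.25, 110.0],
    }
)
platemap = pd.DataFrame(
    {
        "well_position": ["A01", "A02", "A03"],
        "gene_name": ["TP53", "EMPTY", "MYC"],
    }
)

# Approximates add_metadata_id_to_platemap=True: prefix platemap columns
# that do not already start with "Metadata_".
platemap = platemap.rename(
    columns=lambda c: c if c.startswith("Metadata_") else f"Metadata_{c}"
)

# First element names the platemap key, second element names the profiles key.
join_on = ["Metadata_well_position", "Metadata_Well"]

# Assumption: inner join, then drop the now-redundant platemap key.
annotated = platemap.merge(
    profiles, left_on=join_on[0], right_on=join_on[1], how="inner"
).drop(columns=join_on[0])

print(annotated)
```

With the real function, the equivalent call would pass the unprefixed platemap and the same join_on list to annotate(), leaving output_file unset to get the DataFrame back.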