Cells

Class to interact with single cell morphological profiles.

class pycytominer.cyto_utils.cells.SingleCells(sql_file: str, strata: list[str] = ['Metadata_Plate', 'Metadata_Well'], aggregation_operation: str = 'median', output_file: str | None = None, compartments: list[str] = ['cells', 'cytoplasm', 'nuclei'], compartment_linking_cols: dict[str, dict[str, str]] = {'cells': {'cytoplasm': 'ObjectNumber'}, 'cytoplasm': {'cells': 'Cytoplasm_Parent_Cells', 'nuclei': 'Cytoplasm_Parent_Nuclei'}, 'nuclei': {'cytoplasm': 'ObjectNumber'}}, merge_cols: list[str] = ['TableNumber', 'ImageNumber'], image_cols: list[str] = ['TableNumber', 'ImageNumber', 'Metadata_Site'], add_image_features: bool = False, image_feature_categories: list[str] | None = None, features: str | list[str] = 'infer', load_image_data: bool = True, image_table_name: str = 'image', subsample_frac: float = 1.0, subsample_n: str | int = 'all', subsampling_random_state: str | int | None = None, fields_of_view: str | list[int] = 'all', fields_of_view_feature: str = 'Metadata_Site', object_feature: str = 'Metadata_ObjectNumber', default_datatype_float: type[numpy.generic] = numpy.float64)

Bases: object

This is a class to interact with single cell morphological profiles. Interaction includes aggregation, normalization, and output.

sql_file

SQLite connection pointing to the single cell database. The string prefix must be “sqlite:///”.

Type:

str
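
As a minimal sketch of the required prefix, the connection string can be built from a filesystem path with the standard library. The helper name to_sqlite_uri and the file name example_plate.sqlite are hypothetical and not part of pycytominer.

```python
from pathlib import Path


def to_sqlite_uri(path: str) -> str:
    """Return an 'sqlite:///'-prefixed connection string for a local file path."""
    # Hypothetical helper: resolve to an absolute path so the result does not
    # depend on the current working directory.
    return "sqlite:///" + Path(path).resolve().as_posix()


# Hypothetical file name, used only for illustration.
sql_file = to_sqlite_uri("example_plate.sqlite")
```

The resulting string could then be passed as the sql_file argument.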

strata

The columns to group by when aggregating single cells.

Type:

list of str, default [“Metadata_Plate”, “Metadata_Well”]

aggregation_operation

Operation to perform single cell aggregation.

Type:

str, default “median”

output_file

If specified, the location to write the file.

Type:

str, default None

compartments

List of compartments to process.

Type:

list of str, default [“cells”, “cytoplasm”, “nuclei”]

compartment_linking_cols

Dictionary identifying how to merge columns across tables.

Type:

dict, default noted below
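
One way to read this mapping, sketched below, is that each outer key names a table and each inner entry names the column in that table that links it to another compartment. The helper linking_keys is hypothetical and only illustrates this reading; it is not the library's merge code.

```python
# Default mapping as documented for compartment_linking_cols.
compartment_linking_cols = {
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}


def linking_keys(left: str, right: str, mapping: dict) -> tuple:
    """Return (column in the left table, column in the right table)."""
    # Hypothetical helper: look the pair up in both directions.
    return mapping[left][right], mapping[right][left]


left_on, right_on = linking_keys("cytoplasm", "cells", compartment_linking_cols)
```

Under this reading, cytoplasm rows join to cells rows where Cytoplasm_Parent_Cells equals the cell's ObjectNumber.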

merge_cols

Columns indicating how to merge image and compartment data.

Type:

list of str, default [“TableNumber”, “ImageNumber”]

image_cols

Columns to select from the image table.

Type:

list of str, default [“TableNumber”, “ImageNumber”, “Metadata_Site”]

add_image_features

Whether to add image features to the profiles.

Type:

bool, default False

image_feature_categories

List of categories of features from the image table to add to the profiles.

Type:

list of str, optional

features

List of features that should be loaded or aggregated.

Type:

str or list of str, default “infer”

load_image_data

Whether or not the image data should be loaded into memory.

Type:

bool, default True

image_table_name

The name of the table inside the SQLite file of image measurements.

Type:

str, default “image”

subsample_frac

The fraction of single cells to select (0 < subsample_frac <= 1).

Type:

float, default 1

subsample_n

How many single cells to subsample. Do not specify both subsample_frac and subsample_n.

Type:

str or int, default “all”

subsampling_random_state

The random state used to initialize subsampling.

Type:

str or int, default None

fields_of_view

List of fields of view to aggregate.

Type:

str or list of int, default “all”

fields_of_view_feature

Name of the fields of view feature.

Type:

str, default “Metadata_Site”

object_feature

Object number feature.

Type:

str, default “Metadata_ObjectNumber”

default_datatype_float

NumPy floating point datatype to use for load_compartment and the resulting dataframes. This parameter may help with performance-related issues by reducing the memory required for floating-point data. For example, using np.float32 instead of np.float64 will reduce the memory consumed by float columns by roughly 50%. Please note: any datatype other than np.float64 is experimentally unverified.

Type:

type
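
The roughly 50% figure follows from the element sizes alone. The sketch below uses the standard library array module as a stand-in for NumPy, assuming the usual 8-byte C double and 4-byte C float; the value count is hypothetical.

```python
from array import array

n_values = 1_000_000  # hypothetical number of float measurements

# 'd' is a C double and 'f' a C float, standing in for np.float64 and
# np.float32 respectively (8 and 4 bytes on mainstream platforms).
bytes_float64 = array("d").itemsize * n_values
bytes_float32 = array("f").itemsize * n_values

ratio = bytes_float32 / bytes_float64
```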

Notes

The argument compartment_linking_cols is designed to work with CellProfiler output, as curated by cytominer-database. The default is:

{
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}

aggregate_compartment(compartment: str, compute_subsample: bool = False, compute_counts: bool = False, add_image_features: bool = False, n_aggregation_memory_strata: int = 1) -> DataFrame

Aggregate morphological profiles. Uses pycytominer.aggregate().

Parameters:
  • compartment (str) – Compartment to aggregate.

  • compute_subsample (bool, default False) – Whether or not to subsample.

  • compute_counts (bool, default False) – Whether or not to compute the number of objects in each compartment and the number of fields of view per well.

  • add_image_features (bool, default False) – Whether or not to add image features.

  • n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory. For example, if aggregating by “well”, then n_aggregation_memory_strata=1 means that one “well” will be pulled from the SQLite database into memory at a time.

Returns:

DataFrame of aggregated profiles.

Return type:

pd.DataFrame
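
The effect of n_aggregation_memory_strata can be pictured as batching the unique strata before each database pull. The following is a sketch of that idea only, with a hypothetical helper batch_strata and made-up well identifiers; it does not reproduce the library's internals.

```python
from itertools import islice
from typing import Iterable, Iterator


def batch_strata(unique_strata: Iterable[tuple], n: int = 1) -> Iterator[list]:
    """Yield lists of at most n strata, one list per simulated database pull."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    iterator = iter(unique_strata)
    # Keep taking n items at a time until the iterator is exhausted.
    while batch := list(islice(iterator, n)):
        yield batch


# Hypothetical (plate, well) strata.
wells = [("plate1", "A01"), ("plate1", "A02"), ("plate1", "A03")]
batches = list(batch_strata(wells, n=2))
```

With n=1, each well would be loaded and aggregated on its own; larger values trade memory for fewer pulls.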

aggregate_profiles(compute_subsample: bool = False, output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, n_aggregation_memory_strata: int = 1, **kwargs)

Aggregate and merge compartments. This is the primary entry to this class.

Parameters:
  • compute_subsample (bool, default False) – Whether or not to compute subsample. compute_subsample must be set to True to perform subsampling. Calling aggregate_profiles(compute_subsample=True) will apply subsampling even if subsample is initialized.

  • output_file (str, optional) – The name of a file to output. We recommend that, if provided, the output file be suffixed with “_augmented”.

  • compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.

  • float_format (str, optional) – Decimal precision to use in writing output file.

  • n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory.

Returns:

If output_file=None, returns a pandas DataFrame; otherwise writes to file and returns the file path.

Return type:

pd.DataFrame or str
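
A usage sketch based only on the signatures documented here might look as follows. The wrapper aggregate_plate is hypothetical, and running it requires pycytominer and a real SQLite file, neither of which is provided by this sketch.

```python
from typing import Optional


def aggregate_plate(sql_file: str, output_file: Optional[str] = None):
    """Aggregate single cells per well using the documented defaults."""
    # Imported lazily so the sketch can be defined without pycytominer installed.
    from pycytominer.cyto_utils.cells import SingleCells

    single_cells = SingleCells(
        sql_file=sql_file,  # must start with "sqlite:///"
        strata=["Metadata_Plate", "Metadata_Well"],
        aggregation_operation="median",
    )
    # Per the Returns description: a DataFrame when output_file is None,
    # otherwise the path of the written file.
    return single_cells.aggregate_profiles(output_file=output_file)
```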

count_cells(compartment: str = 'cells', merge_cols: list[str] = ['TableNumber', 'ImageNumber'], object_col: str = 'ObjectNumber', image_count_col: str = 'Count_Cells', count_subset: bool = False) -> DataFrame

Determine how many cells are measured per well.

Parameters:
  • compartment (str, default "cells") – Compartment to subset.

  • merge_cols (list[str], default ["TableNumber", "ImageNumber"]) – Columns used to merge image and compartment tables when falling back to object-level counting. Must include at least one column when image_count_col is unavailable.

  • object_col (str, default "ObjectNumber") – Column used as the object identifier when falling back to object-level counting. Must be non-empty when image_count_col is unavailable.

  • image_count_col (str, default "Count_Cells") – Image-level count column to sum by strata before falling back to object-level counting.

  • count_subset (bool, default False) – Whether or not to count the number of cells as specified by the strata groups.

Returns:

DataFrame of cell counts in the experiment.

Return type:

pd.DataFrame
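
The object-level counting fallback described above can be sketched with an in-memory SQLite database: join the image and compartment tables on the merge columns, then count objects per stratum. The tables and rows below are toy data invented for illustration, and the query is not the library's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image (TableNumber TEXT, ImageNumber INTEGER,
                        Metadata_Plate TEXT, Metadata_Well TEXT);
    CREATE TABLE cells (TableNumber TEXT, ImageNumber INTEGER, ObjectNumber INTEGER);
    INSERT INTO image VALUES ('t1', 1, 'plate1', 'A01'), ('t1', 2, 'plate1', 'A02');
    INSERT INTO cells VALUES ('t1', 1, 1), ('t1', 1, 2), ('t1', 2, 1);
    """
)

# Join on the merge columns, then count object identifiers per plate and well.
rows = conn.execute(
    """
    SELECT image.Metadata_Plate, image.Metadata_Well, COUNT(cells.ObjectNumber)
    FROM image
    JOIN cells USING (TableNumber, ImageNumber)
    GROUP BY image.Metadata_Plate, image.Metadata_Well
    ORDER BY image.Metadata_Well
    """
).fetchall()
conn.close()
```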

count_sql_table_rows(table: str)

Count total number of rows for a table.

get_sql_table_col_names(table: str)

Get column names from the database.

get_subsample(df: DataFrame | None = None, compartment: str = 'cells', rename_col: bool = True)

Apply the subsampling procedure.

Parameters:
  • df (pd.DataFrame) – DataFrame of a single cell profile.

  • compartment (str, default "cells") – The compartment to process.

  • rename_col (bool, default True) – Whether or not to rename the columns.

Returns:

Nothing is returned.

Return type:

None

load_compartment(compartment: str) -> DataFrame

Creates the compartment dataframe.

Note: uses the default_datatype_float attribute to set the default floating point datatype.

Parameters:

compartment (str) – The compartment to process.

Returns:

Compartment dataframe.

Return type:

pd.DataFrame

load_image(image_table_name: str | None = None)

Load the image table from the SQLite file.

Returns:

Nothing is returned.

Return type:

None

merge_single_cells(compute_subsample: bool = False, sc_output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, single_cell_normalize: bool = False, normalize_args: dict | None = None, platemap: str | DataFrame | None = None, **kwargs) -> DataFrame | str

Given the linking columns, merge single cell data. Normalization is also supported.

Parameters:
  • compute_subsample (bool, default False) – Whether or not to compute subsample.

  • sc_output_file (str, optional) – The name of a file to output.

  • compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.

  • float_format (str, optional) – Decimal precision to use in writing output file.

  • single_cell_normalize (bool, default False) – Whether or not to normalize the single cell data.

  • normalize_args (dict, optional) – Additional arguments passed as input to pycytominer.normalize().

  • platemap (str or pd.DataFrame, default None) – Optional platemap file path or pd.DataFrame used to annotate the results via annotate.

Returns:

If sc_output_file=None, returns a pandas DataFrame; otherwise writes to file and returns the file path.

Return type:

pd.DataFrame or str

set_output_file(output_file: str)

Setting operation to conveniently rename output file.

Parameters:

output_file (str) – New output file name.

Returns:

Nothing is returned.

Return type:

None

set_subsample_frac(subsample_frac: float)

Setting operation to conveniently update the subsample fraction.

Parameters:

subsample_frac (float, default 1) – Fraction of single cells to select (0 < subsample_frac <= 1).

Returns:

Nothing is returned.

Return type:

None

set_subsample_n(subsample_n: str | int)

Setting operation to conveniently update the subsample n.

Parameters:

subsample_n (str or int, default "all") – How many single cells to subsample. Do not specify both subsample_frac and subsample_n.

Returns:

Nothing is returned.

Return type:

None

set_subsample_random_state(random_state: int)

Setting operation to conveniently update the subsample random state.

Parameters:

random_state (int, optional) – The random state used to initialize subsampling.

Returns:

Nothing is returned.

Return type:

None

split_column_categories(col_names: list[str])

Split a list of column names into feature and metadata columns lists.
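
A minimal sketch of such a split is shown below, assuming that metadata columns are identified by a "Metadata_" prefix; that prefix rule, the function name split_columns, and the example column names are assumptions made for illustration.

```python
def split_columns(col_names: list, metadata_prefix: str = "Metadata_"):
    """Split column names into (feature columns, metadata columns)."""
    # Assumption: a column is metadata if and only if it starts with the prefix.
    features, metadata = [], []
    for name in col_names:
        (metadata if name.startswith(metadata_prefix) else features).append(name)
    return features, metadata


features, metadata = split_columns(
    ["Metadata_Plate", "Metadata_Well", "Cells_AreaShape_Area"]
)
```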

subsample_profiles(df: DataFrame, rename_col: bool = True) -> DataFrame

Sample a Pandas DataFrame given subsampling information.

Parameters:
  • df (pd.DataFrame) – DataFrame of a single cell profile.

  • rename_col (bool, default True) – Whether or not to rename the columns.

Returns:

A subsampled pandas dataframe of single cell profiles.

Return type:

pd.DataFrame
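
The interplay of the subsampling settings (a fraction or a count, but not both, plus a random state for reproducibility) can be sketched with the standard library. The function subsample below is a hypothetical stand-in that operates on a plain list rather than a DataFrame, and it is not the library's implementation.

```python
import random


def subsample(rows: list, frac: float = 1.0, n="all", random_state=None) -> list:
    """Sample rows by fraction or by count, but not both."""
    if frac != 1.0 and n != "all":
        raise ValueError("Specify only one of frac or n")
    if not 0 < frac <= 1:
        raise ValueError("frac must satisfy 0 < frac <= 1")
    rng = random.Random(random_state)  # seeded for reproducible draws
    if n == "all":
        k = round(len(rows) * frac)
    else:
        k = min(int(n), len(rows))
    return rng.sample(rows, k)


cells = list(range(10))  # hypothetical single cell identifiers
half = subsample(cells, frac=0.5, random_state=0)
three = subsample(cells, n=3, random_state=0)
```

Repeating a call with the same random_state yields the same selection, which is the purpose of fixing the random state.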