Cells¶

Class to interact with single cell morphological profiles.

class pycytominer.cyto_utils.cells.SingleCells(sql_file: str, strata: list[str] = ['Metadata_Plate', 'Metadata_Well'], aggregation_operation: str = 'median', output_file: str | None = None, compartments: list[str] = ['cells', 'cytoplasm', 'nuclei'], compartment_linking_cols: dict[str, dict[str, str]] = {'cells': {'cytoplasm': 'ObjectNumber'}, 'cytoplasm': {'cells': 'Cytoplasm_Parent_Cells', 'nuclei': 'Cytoplasm_Parent_Nuclei'}, 'nuclei': {'cytoplasm': 'ObjectNumber'}}, merge_cols: list[str] = ['TableNumber', 'ImageNumber'], image_cols: list[str] = ['TableNumber', 'ImageNumber', 'Metadata_Site'], add_image_features: bool = False, image_feature_categories: list[str] | None = None, features: str | list[str] = 'infer', load_image_data: bool = True, image_table_name: str = 'image', subsample_frac: float = 1.0, subsample_n: str | int = 'all', subsampling_random_state: str | int | None = None, fields_of_view: str | list[int] = 'all', fields_of_view_feature: str = 'Metadata_Site', object_feature: str = 'Metadata_ObjectNumber', default_datatype_float: type[~numpy.generic] = <class 'numpy.float64'>)¶

Bases: object

This is a class to interact with single cell morphological profiles. Interaction includes aggregation, normalization, and output.

Warning

The SingleCells class is deprecated and will be removed in a future Pycytominer release. Please use CytoTable instead.

sql_file¶

SQLite connection pointing to the single cell database. The string prefix must be “sqlite:///”.

Type:: str
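The prefix requirement can be met by building the string from a filesystem path. The helper below is a minimal sketch: `to_sqlite_url` is a hypothetical name that is not part of pycytominer, and the database path is illustrative.

```python
from pathlib import Path


def to_sqlite_url(db_path: str) -> str:
    """Return a connection string with the required "sqlite:///" prefix.

    Hypothetical helper for illustration; not part of pycytominer.
    """
    prefix = "sqlite:///"
    if db_path.startswith(prefix):
        return db_path  # already in the expected form
    # as_posix() keeps forward slashes on every platform
    return prefix + Path(db_path).as_posix()


sql_file = to_sqlite_url("backend/plate_1.sqlite")
print(sql_file)
```

The resulting string is what would be passed as the sql_file argument.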

strata¶

The columns to group by when aggregating single cells.

Type:: list of str, default [“Metadata_Plate”, “Metadata_Well”]

aggregation_operation¶

Operation to perform single cell aggregation.

Type:: str, default “median”

output_file¶

If specified, the location to write the file.

Type:: str, default None

compartments¶

List of compartments to process.

Type:: list of str, default [“cells”, “cytoplasm”, “nuclei”]

compartment_linking_cols¶

Dictionary identifying how to merge columns across tables.

Type:: dict, default noted below

merge_cols¶

Columns indicating how to merge image and compartment data.

Type:: list of str, default [“TableNumber”, “ImageNumber”]

image_cols¶

Columns to select from the image table.

Type:: list of str, default [“TableNumber”, “ImageNumber”, “Metadata_Site”]

add_image_features¶

Whether to add image features to the profiles.

Type:: bool, default False

image_feature_categories¶

List of feature categories from the image table to add to the profiles.

Type:: list of str, optional

features¶

List of features that should be loaded or aggregated.

Type:: str or list of str, default “infer”

load_image_data¶

Whether or not the image data should be loaded into memory.

Type:: bool, default True

image_table_name¶

The name of the table inside the SQLite file of image measurements.

Type:: str, default “image”

subsample_frac¶

The fraction of single cells to select (0 < subsample_frac <= 1).

Type:: float, default 1

subsample_n¶

Number of samples to draw. Do not specify both subsample_frac and subsample_n.

Type:: str or int, default “all”

subsampling_random_state¶

The random state used to initialize subsampling.

Type:: str or int, default None
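The three subsampling attributes interact: subsample_frac and subsample_n are mutually exclusive, and the random state makes a draw repeatable. The sketch below illustrates those rules with the standard library only. `choose_sample_size` and `subsample` are hypothetical helpers, and capping subsample_n at the number of available rows is an assumption rather than documented pycytominer behavior.

```python
import random


def choose_sample_size(n_rows: int, subsample_frac: float = 1.0, subsample_n="all") -> int:
    """Resolve the two subsampling settings into a row count (hypothetical helper)."""
    if not 0 < subsample_frac <= 1:
        raise ValueError("subsample_frac must satisfy 0 < subsample_frac <= 1")
    if subsample_frac != 1.0 and subsample_n != "all":
        raise ValueError("specify only one of subsample_frac or subsample_n")
    if subsample_n != "all":
        # Assumption: never request more rows than exist.
        return min(int(subsample_n), n_rows)
    return int(n_rows * subsample_frac)


def subsample(rows: list, size: int, random_state=None) -> list:
    """Draw `size` rows without replacement, reproducibly when a seed is given."""
    return random.Random(random_state).sample(rows, size)


cells = list(range(100))  # stand-in for 100 single-cell rows
size = choose_sample_size(len(cells), subsample_frac=0.25)
first = subsample(cells, size, random_state=42)
second = subsample(cells, size, random_state=42)
print(size, first == second)
```

Reusing the same seed yields the same rows, which is the purpose of subsampling_random_state.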

fields_of_view¶

list of fields of view to aggregate.

Type:: list of int or str, default “all”

fields_of_view_feature¶

Name of the fields of view feature.

Type:: str, default “Metadata_Site”

object_feature¶

Object number feature.

Type:: str, default “Metadata_ObjectNumber”

default_datatype_float¶

NumPy floating-point datatype to use for load_compartment and the resulting dataframes. This parameter can help with performance issues by reducing the memory required for floating-point data. For example, using np.float32 instead of np.float64 reduces the memory consumed by float columns by roughly 50%. Please note: datatypes other than np.float64 are experimentally unverified.

Type:: type
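The roughly 50% figure follows from the storage widths of the two types: 4 bytes for a 32-bit float against 8 bytes for a 64-bit float. The sketch below checks that arithmetic with the standard library `struct` module instead of NumPy, and also shows the precision trade-off. The value count is illustrative.

```python
import struct

# Standard sizes: 4 bytes for float32 ("<f"), 8 bytes for float64 ("<d").
bytes_float32 = struct.calcsize("<f")
bytes_float64 = struct.calcsize("<d")
n_values = 1_000_000  # illustrative count of float values
saving = 1 - (n_values * bytes_float32) / (n_values * bytes_float64)
print(f"float32 uses {saving:.0%} less memory than float64")

# The trade-off: float32 cannot represent every float64 value exactly.
value = 0.1
as_float32 = struct.unpack("<f", struct.pack("<f", value))[0]
print(value == as_float32, abs(value - as_float32))
```

The rounding difference is small, but it is one reason results with narrower datatypes may not match float64 output exactly.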

Notes

Note

The argument compartment_linking_cols is designed to work with CellProfiler output, as curated by cytominer-database. The default is:

{
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}
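To illustrate how a linking dictionary of this shape can drive a merge, the sketch below joins three toy compartment tables in an in-memory SQLite database. It is not the pycytominer implementation: the table contents, the `join_clause` helper, and the `*_AreaShape_Area` feature columns are assumptions made for the example.

```python
import sqlite3

# Default linking columns quoted in the note above.
linking_cols = {
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}
merge_cols = ["TableNumber", "ImageNumber"]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (TableNumber, ImageNumber, ObjectNumber, Cells_AreaShape_Area);
    CREATE TABLE nuclei (TableNumber, ImageNumber, ObjectNumber, Nuclei_AreaShape_Area);
    CREATE TABLE cytoplasm (TableNumber, ImageNumber, ObjectNumber,
                            Cytoplasm_Parent_Cells, Cytoplasm_Parent_Nuclei,
                            Cytoplasm_AreaShape_Area);
    INSERT INTO cells VALUES ('t1', 1, 1, 100.0), ('t1', 1, 2, 120.0);
    INSERT INTO nuclei VALUES ('t1', 1, 1, 30.0), ('t1', 1, 2, 35.0);
    INSERT INTO cytoplasm VALUES ('t1', 1, 1, 1, 1, 70.0), ('t1', 1, 2, 2, 2, 85.0);
    """
)


def join_clause(other: str) -> str:
    """Build the ON clause linking cytoplasm to another compartment."""
    # Rows must come from the same table and image...
    conditions = [f"cytoplasm.{col} = {other}.{col}" for col in merge_cols]
    # ...and the cytoplasm parent column must match the other object's number.
    left = linking_cols["cytoplasm"][other]
    right = linking_cols[other]["cytoplasm"]
    conditions.append(f"cytoplasm.{left} = {other}.{right}")
    return " AND ".join(conditions)


query = (
    "SELECT cells.Cells_AreaShape_Area, cytoplasm.Cytoplasm_AreaShape_Area, "
    "nuclei.Nuclei_AreaShape_Area FROM cytoplasm "
    f"JOIN cells ON {join_clause('cells')} "
    f"JOIN nuclei ON {join_clause('nuclei')} "
    "ORDER BY cytoplasm.ObjectNumber"
)
merged = conn.execute(query).fetchall()
print(merged)
```

Each output row combines one cell, its cytoplasm, and its nucleus, with cytoplasm acting as the hub that references both parents.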

aggregate_compartment(compartment: str, compute_subsample: bool = False, compute_counts: bool = False, add_image_features: bool = False, n_aggregation_memory_strata: int = 1) → DataFrame¶

Aggregate morphological profiles. Uses pycytominer.aggregate().

Parameters:

compartment (str) – Compartment to aggregate.
compute_subsample (bool, default False) – Whether or not to subsample.
compute_counts (bool, default False) – Whether or not to compute the number of objects in each compartment and the number of fields of view per well.
add_image_features (bool, default False) – Whether or not to add image features.
n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory. For example, if aggregating by “well”, then n_aggregation_memory_strata=1 means that one “well” will be pulled from the SQLite database into memory at a time.

Returns:

DataFrame of aggregated profiles.

Return type:

pd.DataFrame
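The effect of n_aggregation_memory_strata can be pictured as batching: only the rows for a few strata are held in memory while their medians are computed. The sketch below shows that idea on plain Python dictionaries. It does not use pycytominer, and the `batched` and `aggregate` helpers and the sample rows are assumptions made for illustration.

```python
from itertools import islice
from statistics import median

strata = ["Metadata_Plate", "Metadata_Well"]
features = ["Cells_AreaShape_Area"]

# Stand-in for single-cell rows; the real data lives in the SQLite backend.
rows = [
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "Cells_AreaShape_Area": 10.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "Cells_AreaShape_Area": 30.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "Cells_AreaShape_Area": 20.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A02", "Cells_AreaShape_Area": 5.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A02", "Cells_AreaShape_Area": 7.0},
]


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def aggregate(rows, strata, features, n_aggregation_memory_strata=1):
    """Median-aggregate rows per stratum, loading a few strata at a time."""
    unique_strata = sorted({tuple(row[col] for col in strata) for row in rows})
    profiles = []
    for batch in batched(unique_strata, n_aggregation_memory_strata):
        # Only rows belonging to the current batch of strata are held at once.
        in_memory = [r for r in rows if tuple(r[c] for c in strata) in batch]
        for key in batch:
            group = [r for r in in_memory if tuple(r[c] for c in strata) == key]
            profile = dict(zip(strata, key))
            for feature in features:
                profile[feature] = median(r[feature] for r in group)
            profiles.append(profile)
    return profiles


profiles = aggregate(rows, strata, features)
print(profiles)
```

Changing the batch size alters how much data is held at once, not the aggregated result.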

aggregate_profiles(compute_subsample: bool = False, output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, n_aggregation_memory_strata: int = 1, **kwargs)¶

Aggregate and merge compartments. This is the primary entry to this class.

Parameters:

compute_subsample (bool, default False) – Whether or not to compute subsample. compute_subsample must be specified to perform subsampling. The function aggregate_profiles(compute_subsample=True) will apply subsetting even if subsample is initialized.
output_file (str, optional) – The name of a file to output. We recommend that, if provided, the output file be suffixed with “_augmented”.
compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.
float_format (str, optional) – Decimal precision to use in writing output file.
n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory.

Returns:

If output_file=None, returns a pandas DataFrame; otherwise writes to file and returns the file path.

Return type:

pd.DataFrame or str

count_cells(compartment: str = 'cells', merge_cols: list[str] = ['TableNumber', 'ImageNumber'], object_col: str = 'ObjectNumber', image_count_col: str = 'Count_Cells', count_subset: bool = False) → DataFrame¶

Determine how many cells are measured per well.

Parameters:

compartment (str, default "cells") – Compartment to subset.
merge_cols (list[str], default ["TableNumber", "ImageNumber"]) – Columns used to merge image and compartment tables when falling back to object-level counting. Must include at least one column when image_count_col is unavailable.
object_col (str, default "ObjectNumber") – Column used as the object identifier when falling back to object-level counting. Must be non-empty when image_count_col is unavailable.
image_count_col (str, default "Count_Cells") – Image-level count column to sum by strata before falling back to object-level counting.
count_subset (bool, default False) – Whether or not to count the number of cells as specified by the strata groups.

Returns:

DataFrame of cell counts in the experiment.

Return type:

pd.DataFrame
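The parameters above describe two counting paths: summing an image-level count column by strata, and falling back to counting objects linked to images through merge_cols. The sketch below illustrates that logic on plain dictionaries. `count_by_strata` is a hypothetical stand-in and not the pycytominer code, and the rule used to detect an unavailable count column is an assumption.

```python
from collections import Counter


def count_by_strata(image_rows, object_rows, strata,
                    merge_cols=("TableNumber", "ImageNumber"),
                    object_col="ObjectNumber", image_count_col="Count_Cells"):
    """Count cells per stratum (hypothetical helper, not the pycytominer code)."""
    counts = Counter()
    if all(image_count_col in row for row in image_rows):
        # Preferred path: sum the image-level count column within each stratum.
        for row in image_rows:
            counts[tuple(row[c] for c in strata)] += row[image_count_col]
        return dict(counts)
    # Fallback: link each object to its image via merge_cols, then count objects.
    if not merge_cols or not object_col:
        raise ValueError("merge_cols and object_col are required for the fallback")
    image_strata = {
        tuple(row[c] for c in merge_cols): tuple(row[c] for c in strata)
        for row in image_rows
    }
    seen = set()
    for obj in object_rows:
        image_key = tuple(obj[c] for c in merge_cols)
        object_key = image_key + (obj[object_col],)
        if object_key not in seen:  # count each object once
            seen.add(object_key)
            counts[image_strata[image_key]] += 1
    return dict(counts)


strata = ["Metadata_Plate", "Metadata_Well"]
image_rows = [
    {"TableNumber": "t1", "ImageNumber": 1, "Metadata_Plate": "P1",
     "Metadata_Well": "A01", "Count_Cells": 2},
    {"TableNumber": "t1", "ImageNumber": 2, "Metadata_Plate": "P1",
     "Metadata_Well": "A01", "Count_Cells": 1},
]
object_rows = [
    {"TableNumber": "t1", "ImageNumber": 1, "ObjectNumber": 1},
    {"TableNumber": "t1", "ImageNumber": 1, "ObjectNumber": 2},
    {"TableNumber": "t1", "ImageNumber": 2, "ObjectNumber": 1},
]

from_images = count_by_strata(image_rows, object_rows, strata)
# Drop the count column to force the object-level fallback.
stripped = [{k: v for k, v in row.items() if k != "Count_Cells"} for row in image_rows]
from_objects = count_by_strata(stripped, object_rows, strata)
print(from_images, from_objects)
```

Both paths agree when the image-level counts are consistent with the object table.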

count_sql_table_rows(table: str)¶: Count total number of rows for a table.

get_sql_table_col_names(table: str)¶: Get column names from the database.

get_subsample(df: DataFrame | None = None, compartment: str = 'cells', rename_col: bool = True)¶

Apply the subsampling procedure.

Parameters:

df (pd.DataFrame) – DataFrame of a single cell profile.
compartment (str, default "cells") – The compartment to process.
rename_col (bool, default True) – Whether or not to rename the columns.

Returns:

Nothing is returned.

Return type:

None

load_compartment(compartment: str) → DataFrame¶

Creates the compartment dataframe.

Note: uses the default_datatype_float attribute to set the default floating-point datatype.

Parameters:: compartment (str) – The compartment to process.
Returns:: Compartment dataframe.
Return type:: pd.DataFrame

load_image(image_table_name: str | None = None)¶

Load the image table from the SQLite file.

Returns:: Nothing is returned.
Return type:: None

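merge_single_cells(compute_subsample: bool = False, sc_output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, single_cell_normalize: bool = False, normalize_args: dict | None = None, platemap: str | DataFrame | None = None, **kwargs)¶

Given the linking columns, merge single cell data. Normalization is also supported.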

Parameters:

compute_subsample (bool, default False) – Whether or not to compute subsample.
sc_output_file (str, optional) – The name of a file to output.
compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.
float_format (str, optional) – Decimal precision to use in writing output file.
single_cell_normalize (bool, default False) – Whether or not to normalize the single cell data.
normalize_args (dict, optional) – Additional arguments passed as input to pycytominer.normalize().
platemap (str or pd.DataFrame, default None) – Optional platemap, given as a file path or a pd.DataFrame, used to annotate the results.

Returns:

If sc_output_file=None, returns a pandas DataFrame; otherwise writes to file and returns the file path.

Return type:

pd.DataFrame or str

set_output_file(output_file: str)¶

Setting operation to conveniently rename output file.

Parameters:: output_file (str) – New output file name.
Returns:: Nothing is returned.
Return type:: None

set_subsample_frac(subsample_frac: float)¶

Setting operation to conveniently update the subsample fraction.

Parameters:: subsample_frac (float, default 1) – Fraction of single cells to select (0 < subsample_frac <= 1).
Returns:: Nothing is returned.
Return type:: None

set_subsample_n(subsample_n: str | int)¶

Setting operation to conveniently update the subsample n.

Parameters:: subsample_n (str or int, default "all") – Number of samples to draw. Do not specify both subsample_frac and subsample_n.
Returns:: Nothing is returned.
Return type:: None

set_subsample_random_state(random_state: int)¶

Setting operation to conveniently update the subsample random state.

Parameters:: random_state (int, optional) – The random state used to initialize subsampling.
Returns:: Nothing is returned.
Return type:: None

split_column_categories(col_names: list[str])¶: Split a list of column names into feature and metadata columns lists.
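One plausible way to make such a split is by name prefix. The sketch below assumes that feature columns begin with a compartment name and that everything else is metadata. The splitting rule, the return order, and the `split_columns` name are all assumptions for illustration, not the documented behavior of this method.

```python
def split_columns(col_names, compartments=("cells", "cytoplasm", "nuclei")):
    """Split names into (metadata, features); hypothetical stand-in.

    Assumption: a feature column starts with a compartment name,
    compared case-insensitively.
    """
    prefixes = tuple(compartments)
    feature_cols = [c for c in col_names if c.lower().startswith(prefixes)]
    metadata_cols = [c for c in col_names if not c.lower().startswith(prefixes)]
    return metadata_cols, feature_cols


columns = [
    "TableNumber",
    "ImageNumber",
    "Metadata_Well",
    "Cells_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
]
metadata_cols, feature_cols = split_columns(columns)
print(metadata_cols)
print(feature_cols)
```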

subsample_profiles(df: DataFrame, rename_col: bool = True) → DataFrame¶

Sample a Pandas DataFrame given subsampling information.

Parameters:

df (pd.DataFrame) – DataFrame of a single cell profile.
rename_col (bool, default True) – Whether or not to rename the columns.

Returns:

A subsampled pandas dataframe of single cell profiles.

Return type:

pd.DataFrame