Cells¶
Class to interact with single cell morphological profiles.
- class pycytominer.cyto_utils.cells.SingleCells(sql_file: str, strata: list[str] = ['Metadata_Plate', 'Metadata_Well'], aggregation_operation: str = 'median', output_file: str | None = None, compartments: list[str] = ['cells', 'cytoplasm', 'nuclei'], compartment_linking_cols: dict[str, dict[str, str]] = {'cells': {'cytoplasm': 'ObjectNumber'}, 'cytoplasm': {'cells': 'Cytoplasm_Parent_Cells', 'nuclei': 'Cytoplasm_Parent_Nuclei'}, 'nuclei': {'cytoplasm': 'ObjectNumber'}}, merge_cols: list[str] = ['TableNumber', 'ImageNumber'], image_cols: list[str] = ['TableNumber', 'ImageNumber', 'Metadata_Site'], add_image_features: bool = False, image_feature_categories: list[str] | None = None, features: str | list[str] = 'infer', load_image_data: bool = True, image_table_name: str = 'image', subsample_frac: float = 1.0, subsample_n: str | int = 'all', subsampling_random_state: str | int | None = None, fields_of_view: str | list[int] = 'all', fields_of_view_feature: str = 'Metadata_Site', object_feature: str = 'Metadata_ObjectNumber', default_datatype_float: type[~numpy.generic] = <class 'numpy.float64'>)¶
Bases: object
This is a class to interact with single cell morphological profiles. Interaction includes aggregation, normalization, and output.
- sql_file¶
SQLite connection pointing to the single cell database. The string prefix must be “sqlite:///”.
- Type:
str
- strata¶
The columns to group by when aggregating single cells.
- Type:
list of str, default [“Metadata_Plate”, “Metadata_Well”]
- aggregation_operation¶
Operation to perform single cell aggregation.
- Type:
str, default “median”
- output_file¶
If specified, the location to write the file.
- Type:
str, default None
- compartments¶
list of compartments to process.
- Type:
list of str, default [“cells”, “cytoplasm”, “nuclei”]
- compartment_linking_cols¶
Dictionary identifying how to merge columns across tables.
- Type:
dict, default noted below
- merge_cols¶
Columns indicating how to merge image and compartment data.
- Type:
list of str, default [“TableNumber”, “ImageNumber”]
- image_cols¶
Columns to select from the image table.
- Type:
list of str, default [“TableNumber”, “ImageNumber”, “Metadata_Site”]
- add_image_features¶
Whether to add image features to the profiles.
- Type:
bool, default False
- image_feature_categories¶
list of categories of features from the image table to add to the profiles.
- Type:
list of str, optional
- features¶
list of features that should be loaded or aggregated.
- Type:
str or list of str, default “infer”
- load_image_data¶
Whether or not the image data should be loaded into memory.
- Type:
bool, default True
- image_table_name¶
The name of the table inside the SQLite file of image measurements.
- Type:
str, default “image”
- subsample_frac¶
The fraction of single cells to select (0 < subsample_frac <= 1).
- Type:
float, default 1
- subsample_n¶
How many samples to subsample - do not specify both subsample_frac and subsample_n.
- Type:
str or int, default “all”
- subsampling_random_state¶
The random state to init subsample.
- Type:
str or int, default None
- fields_of_view¶
list of fields of view to aggregate.
- Type:
str or list of int, default “all”
- fields_of_view_feature¶
Name of the fields of view feature.
- Type:
str, default “Metadata_Site”
- object_feature¶
Object number feature.
- Type:
str, default “Metadata_ObjectNumber”
- default_datatype_float¶
NumPy floating point datatype to use for load_compartment and the resulting dataframes. This parameter may help with performance-related issues by reducing the memory required for floating-point data. For example, using np.float32 instead of np.float64 will reduce the memory consumed by float columns by roughly 50%. Please note: any datatype other than np.float64 is experimentally unverified.
- Type:
type
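As a rough back-of-the-envelope check of the 50% figure above, the following sketch compares the bytes needed per value for 32-bit and 64-bit floats using only the standard library. The table dimensions (`n_cells`, `n_features`) and the helper `float_column_bytes` are hypothetical, and real DataFrame memory use also includes non-float columns and index overhead.

```python
import struct

# Size in bytes of one IEEE 754 value ("<" forces the standard size).
BYTES_FLOAT64 = struct.calcsize("<d")
BYTES_FLOAT32 = struct.calcsize("<f")


def float_column_bytes(n_cells, n_features, bytes_per_value):
    """Estimate the raw memory used by the float columns of a single-cell table."""
    return n_cells * n_features * bytes_per_value


# Hypothetical plate: 500,000 cells with 1,500 float features each.
n_cells, n_features = 500_000, 1_500
mem64 = float_column_bytes(n_cells, n_features, BYTES_FLOAT64)
mem32 = float_column_bytes(n_cells, n_features, BYTES_FLOAT32)

print(f"float64: {mem64 / 1e9:.1f} GB, float32: {mem32 / 1e9:.1f} GB")
```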
Notes
Note
The argument compartment_linking_cols is designed to work with CellProfiler output, as curated by cytominer-database. The default is:
    {
        "cytoplasm": {
            "cells": "Cytoplasm_Parent_Cells",
            "nuclei": "Cytoplasm_Parent_Nuclei",
        },
        "cells": {"cytoplasm": "ObjectNumber"},
        "nuclei": {"cytoplasm": "ObjectNumber"},
    }
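To make the role of the linking columns concrete, here is a minimal sketch that joins three toy compartment tables in an in-memory SQLite database. It illustrates the idea only and is not pycytominer's implementation; the feature columns, the toy rows, and the helper `join_clause` are assumptions made for the example.

```python
import sqlite3

# Default linking columns as listed in the note above.
linking_cols = {
    "cytoplasm": {"cells": "Cytoplasm_Parent_Cells", "nuclei": "Cytoplasm_Parent_Nuclei"},
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}
merge_cols = ["TableNumber", "ImageNumber"]

# Toy database: two objects in one image (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (TableNumber, ImageNumber, ObjectNumber, Cells_AreaShape_Area);
    CREATE TABLE nuclei (TableNumber, ImageNumber, ObjectNumber, Nuclei_AreaShape_Area);
    CREATE TABLE cytoplasm (TableNumber, ImageNumber, ObjectNumber,
                            Cytoplasm_Parent_Cells, Cytoplasm_Parent_Nuclei,
                            Cytoplasm_AreaShape_Area);
    INSERT INTO cells VALUES ('t1', 1, 1, 100.0), ('t1', 1, 2, 120.0);
    INSERT INTO nuclei VALUES ('t1', 1, 1, 30.0), ('t1', 1, 2, 35.0);
    INSERT INTO cytoplasm VALUES ('t1', 1, 1, 1, 1, 70.0), ('t1', 1, 2, 2, 2, 85.0);
    """
)


def join_clause(left, right):
    """Build the ON clause that links two compartment tables (hypothetical helper)."""
    conditions = [f"{left}.{col} = {right}.{col}" for col in merge_cols]
    conditions.append(
        f"{left}.{linking_cols[left][right]} = {right}.{linking_cols[right][left]}"
    )
    return " AND ".join(conditions)


query = f"""
    SELECT cells.ObjectNumber, Cells_AreaShape_Area,
           Cytoplasm_AreaShape_Area, Nuclei_AreaShape_Area
    FROM cytoplasm
    JOIN cells ON {join_clause('cytoplasm', 'cells')}
    JOIN nuclei ON {join_clause('cytoplasm', 'nuclei')}
    ORDER BY cells.ObjectNumber
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each entry in the dictionary names the column in one table that matches its counterpart in the other table, so the cytoplasm table acts as the bridge between cells and nuclei.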
- aggregate_compartment(compartment: str, compute_subsample: bool = False, compute_counts: bool = False, add_image_features: bool = False, n_aggregation_memory_strata: int = 1) DataFrame¶
Aggregate morphological profiles. Uses pycytominer.aggregate()
- Parameters:
compartment (str) – Compartment to aggregate.
compute_subsample (bool, default False) – Whether or not to subsample.
compute_counts (bool, default False) – Whether or not to compute the number of objects in each compartment and the number of fields of view per well.
add_image_features (bool, default False) – Whether or not to add image features.
n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory. For example, if aggregating by “well”, then n_aggregation_memory_strata=1 means that one “well” will be pulled from the SQLite database into memory at a time.
- Returns:
DataFrame of aggregated profiles.
- Return type:
pd.DataFrame
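The effect of n_aggregation_memory_strata can be pictured with the sketch below, which computes a per-well median while loading only a few wells from SQLite at a time. This is an illustration of the chunking idea under assumed table and column names, not the library's code; `aggregate_in_chunks` and `n_strata_in_memory` are hypothetical names.

```python
import sqlite3
from statistics import median

# Toy single-cell table with one feature (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (Metadata_Well TEXT, Cells_AreaShape_Area REAL);
    INSERT INTO cells VALUES
        ('A01', 10.0), ('A01', 20.0), ('A01', 60.0),
        ('A02', 5.0), ('A02', 7.0),
        ('A03', 1.0);
    """
)


def aggregate_in_chunks(conn, n_strata_in_memory=1):
    """Median per well, loading only a few wells into memory at a time."""
    wells = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT Metadata_Well FROM cells ORDER BY Metadata_Well"
        )
    ]
    results = {}
    for start in range(0, len(wells), n_strata_in_memory):
        chunk = wells[start:start + n_strata_in_memory]
        placeholders = ", ".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT Metadata_Well, Cells_AreaShape_Area FROM cells "
            f"WHERE Metadata_Well IN ({placeholders})",
            chunk,
        ).fetchall()
        # Group the rows of this chunk by well, then aggregate each group.
        by_well = {}
        for well, value in rows:
            by_well.setdefault(well, []).append(value)
        for well, values in by_well.items():
            results[well] = median(values)
    return results


one_at_a_time = aggregate_in_chunks(conn, n_strata_in_memory=1)
two_at_a_time = aggregate_in_chunks(conn, n_strata_in_memory=2)
print(one_at_a_time)
```

The chunk size changes how much data is held in memory per query, not the aggregated result.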
- aggregate_profiles(compute_subsample: bool = False, output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, n_aggregation_memory_strata: int = 1, **kwargs)¶
Aggregate and merge compartments. This is the primary entry to this class.
- Parameters:
compute_subsample (bool, default False) – Whether or not to compute the subsample. compute_subsample must be set to True to perform subsampling. Calling aggregate_profiles(compute_subsample=True) applies subsampling even if a subsample has already been initialized.
output_file (str, optional) – The name of a file to output. We recommend that, if provided, the output file be suffixed with “_augmented”.
compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.
float_format (str, optional) – Decimal precision to use in writing output file.
n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory.
- Returns:
If output_file=None, returns a pandas DataFrame; otherwise writes to file and returns the file path.
- Return type:
pd.DataFrame or str
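A minimal usage sketch of this entry point follows, using only the constructor arguments and the output_file parameter documented on this page. The helpers `sqlite_url` and `aggregate_plate` are hypothetical conveniences, not part of pycytominer, and the sketch assumes a cytominer-database style SQLite file exists at the given path.

```python
from pathlib import Path


def sqlite_url(db_path):
    """Prefix a database path with the "sqlite:///" scheme that sql_file requires."""
    return f"sqlite:///{Path(db_path).resolve()}"


def aggregate_plate(db_path, output_file=None):
    """Aggregate one plate to well-level median profiles (hypothetical wrapper)."""
    # Imported lazily so this sketch can be loaded without pycytominer installed.
    from pycytominer.cyto_utils.cells import SingleCells

    single_cells = SingleCells(
        sql_file=sqlite_url(db_path),
        strata=["Metadata_Plate", "Metadata_Well"],
        aggregation_operation="median",
    )
    # Returns a DataFrame when output_file is None, otherwise the written file path.
    return single_cells.aggregate_profiles(output_file=output_file)
```

Calling `aggregate_plate("plate.sqlite")`, where the file name is a placeholder, would return the aggregated profiles as a DataFrame; passing output_file would write them to disk instead.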
- count_cells(compartment: str = 'cells', merge_cols: list[str] = ['TableNumber', 'ImageNumber'], object_col: str = 'ObjectNumber', image_count_col: str = 'Count_Cells', count_subset: bool = False) DataFrame¶
Determine how many cells are measured per well.
- Parameters:
compartment (str, default "cells") – Compartment to subset.
merge_cols (list[str], default ["TableNumber", "ImageNumber"]) – Columns used to merge image and compartment tables when falling back to object-level counting. Must include at least one column when image_count_col is unavailable.
object_col (str, default "ObjectNumber") – Column used as the object identifier when falling back to object-level counting. Must be non-empty when image_count_col is unavailable.
image_count_col (str, default "Count_Cells") – Image-level count column to sum by strata before falling back to object-level counting.
count_subset (bool, default False) – Whether or not to count the number of cells as specified by the strata groups.
- Returns:
DataFrame of cell counts in the experiment.
- Return type:
pd.DataFrame
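The two counting strategies described in the parameters (summing an image-level count column by strata, or falling back to counting objects after merging on merge_cols) can be sketched in plain SQL. The toy tables and values below are assumptions for illustration, and this is not the method's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image (TableNumber, ImageNumber, Metadata_Plate, Metadata_Well, Count_Cells);
    CREATE TABLE cells (TableNumber, ImageNumber, ObjectNumber);
    INSERT INTO image VALUES
        ('t1', 1, 'P1', 'A01', 2), ('t2', 2, 'P1', 'A01', 1), ('t3', 3, 'P1', 'A02', 3);
    INSERT INTO cells VALUES
        ('t1', 1, 1), ('t1', 1, 2), ('t2', 2, 1),
        ('t3', 3, 1), ('t3', 3, 2), ('t3', 3, 3);
    """
)

# Strategy 1: sum the image-level count column within each stratum.
image_level = conn.execute(
    """
    SELECT Metadata_Plate, Metadata_Well, SUM(Count_Cells)
    FROM image
    GROUP BY Metadata_Plate, Metadata_Well
    ORDER BY Metadata_Well
    """
).fetchall()

# Strategy 2: merge image and compartment tables, then count object identifiers.
object_level = conn.execute(
    """
    SELECT image.Metadata_Plate, image.Metadata_Well, COUNT(cells.ObjectNumber)
    FROM image
    JOIN cells
      ON image.TableNumber = cells.TableNumber
     AND image.ImageNumber = cells.ImageNumber
    GROUP BY image.Metadata_Plate, image.Metadata_Well
    ORDER BY image.Metadata_Well
    """
).fetchall()

print(image_level)
```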
- count_sql_table_rows(table: str)¶
Count total number of rows for a table.
- get_sql_table_col_names(table: str)¶
Get column names from the database.
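For readers who want to inspect a database directly, the same two pieces of information (row count and column names) can be obtained with the standard library's sqlite3 module. The helper names and the toy image table below are hypothetical; how pycytominer issues these queries internally is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image (TableNumber TEXT, ImageNumber INTEGER, Metadata_Site INTEGER);
    INSERT INTO image VALUES ('t1', 1, 1), ('t2', 2, 2);
    """
)


def table_col_names(conn, table):
    """List the column names of a table via PRAGMA table_info."""
    # Each returned row describes one column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def table_row_count(conn, table):
    """Count the rows of a table."""
    # Table names cannot be bound as SQL parameters, so pass trusted names only.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


print(table_col_names(conn, "image"), table_row_count(conn, "image"))
```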
- get_subsample(df: DataFrame | None = None, compartment: str = 'cells', rename_col: bool = True)¶
Apply the subsampling procedure.
- Parameters:
df (pd.DataFrame) – DataFrame of a single cell profile.
compartment (str, default "cells") – The compartment to process.
rename_col (bool, default True) – Whether or not to rename the columns.
- Returns:
Nothing is returned.
- Return type:
None
- load_compartment(compartment: str) DataFrame¶
Creates the compartment dataframe.
Note: makes use of default_datatype_float attribute for setting a default floating point datatype.
- Parameters:
compartment (str) – The compartment to process.
- Returns:
Compartment dataframe.
- Return type:
pd.DataFrame
- load_image(image_table_name: str | None = None)¶
Load the image table from the SQLite file.
- Returns:
Nothing is returned.
- Return type:
None
- merge_single_cells(compute_subsample: bool = False, sc_output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, single_cell_normalize: bool = False, normalize_args: dict | None = None, platemap: str | DataFrame | None = None, **kwargs) DataFrame | str¶
Given the linking columns, merge single cell data. Normalization is also supported.
- Parameters:
compute_subsample (bool, default False) – Whether or not to compute subsample.
sc_output_file (str, optional) – The name of a file to output.
compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.
float_format (str, optional) – Decimal precision to use in writing output file.
single_cell_normalize (bool, default False) – Whether or not to normalize the single cell data.
normalize_args (dict, optional) – Additional arguments passed as input to pycytominer.normalize().
platemap (str or pd.DataFrame, default None) – Optional platemap file path (str) or pd.DataFrame used to annotate the results via annotate.
- Returns:
If sc_output_file=None, returns a pandas DataFrame; otherwise writes to file and returns the file path.
- Return type:
pd.DataFrame or str
- set_output_file(output_file: str)¶
Setting operation to conveniently rename output file.
- Parameters:
output_file (str) – New output file name.
- Returns:
Nothing is returned.
- Return type:
None
- set_subsample_frac(subsample_frac: float)¶
Setting operation to conveniently update the subsample fraction.
- Parameters:
subsample_frac (float, default 1) – Fraction of single cells to select (0 < subsample_frac <= 1).
- Returns:
Nothing is returned.
- Return type:
None
- set_subsample_n(subsample_n: str | int)¶
Setting operation to conveniently update the subsample n.
- Parameters:
subsample_n (str or int, default "all") – How many samples to subsample - do not specify both subsample_frac and subsample_n.
- Returns:
Nothing is returned.
- Return type:
None
- set_subsample_random_state(random_state: int)¶
Setting operation to conveniently update the subsample random state.
- Parameters:
random_state (int, optional) – The random state to init subsample.
- Returns:
Nothing is returned.
- Return type:
None
- split_column_categories(col_names: list[str])¶
Split a list of column names into feature and metadata columns lists.
- subsample_profiles(df: DataFrame, rename_col: bool = True) DataFrame¶
Sample a Pandas DataFrame given subsampling information.
- Parameters:
df (pd.DataFrame) – DataFrame of a single cell profile.
rename_col (bool, default True) – Whether or not to rename the columns.
- Returns:
A subsampled pandas dataframe of single cell profiles.
- Return type:
pd.DataFrame
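The interplay of the three subsampling settings (a fraction, a fixed count, and a random state) can be sketched with the standard library alone. This is a simplified illustration on a plain list rather than a DataFrame, and the function `subsample` and its error handling are assumptions, not the behavior of subsample_profiles itself.

```python
import random


def subsample(rows, frac=1.0, n="all", random_state=None):
    """Select a random subset of rows by fraction or by fixed count."""
    if frac != 1.0 and n != "all":
        raise ValueError("Specify only one of frac or n.")
    # A seeded generator makes the selection reproducible.
    rng = random.Random(random_state)
    if n != "all":
        k = int(n)
    else:
        k = round(frac * len(rows))
    return rng.sample(rows, k)


# Eight hypothetical cell identifiers.
cells = [f"cell_{i}" for i in range(8)]

by_fraction = subsample(cells, frac=0.25, random_state=0)
by_fraction_again = subsample(cells, frac=0.25, random_state=0)
by_count = subsample(cells, n=3, random_state=0)

print(len(by_fraction), len(by_count))
```

Fixing the random state is what allows the same cells to be selected again on a later run.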