Load

Module for loading profiles from files or dataframes.

pycytominer.cyto_utils.load.infer_delim(file: str | Path | Any) → str

Sniff the delimiter used in the given file.

Parameters:

file (str or path-like) – Path to the file to inspect.

Returns:

The delimiter used in the file (typically a tab or a comma).

Return type:

str
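The sniffing step can be illustrated with the standard library. The following is a minimal sketch, not pycytominer's implementation: sniff_delimiter is a hypothetical helper, and restricting the candidates to tab and comma is an assumption made for the example.

```python
import csv
import tempfile
from pathlib import Path


def sniff_delimiter(path):
    """Hypothetical helper: detect the delimiter from the first line of a text file."""
    with open(path, newline="") as handle:
        first_line = handle.readline()
    # Assumption: only tab and comma are considered as candidate delimiters
    return csv.Sniffer().sniff(first_line, delimiters=",\t").delimiter


with tempfile.TemporaryDirectory() as tmp:
    tsv_path = Path(tmp) / "profiles.tsv"
    tsv_path.write_text("Metadata_Well\tCells_Area\nA01\t10.5\n")
    csv_path = Path(tmp) / "profiles.csv"
    csv_path.write_text("Metadata_Well,Cells_Area\nA01,10.5\n")

    tsv_delim = sniff_delimiter(tsv_path)
    csv_delim = sniff_delimiter(csv_path)
```

The detected delimiter can then be passed on to a reader, for example as the sep argument of pandas.read_csv.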

pycytominer.cyto_utils.load.is_path_a_parquet_dataset_dir(file: str | Path) → bool

Check whether a path is a parquet dataset directory.

Parameters:

file (Union[str, pathlib.Path]) – Path to inspect.

Returns:

Returns True when the path is a directory, contains at least one direct file child, and all direct file children are parquet files.

Return type:

bool

Raises:

FileNotFoundError – Raised if the path given in file does not exist.

Notes

If file is not a string or path-like object, the function prints a message and returns False rather than raising TypeError.
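The rule described above (a directory, at least one direct file child, all direct file children are parquet files) can be restated as a short sketch. This is an illustration only, not pycytominer's code: looks_like_parquet_dataset_dir is a hypothetical name, and the case-insensitive extension comparison is an assumption.

```python
import tempfile
from pathlib import Path


def looks_like_parquet_dataset_dir(path):
    """Hypothetical restatement of the documented rule (not pycytominer's code)."""
    # strict=True raises FileNotFoundError when the path does not exist
    resolved = Path(path).resolve(strict=True)
    if not resolved.is_dir():
        return False
    # Only direct children that are files count; subdirectories are ignored
    files = [child for child in resolved.iterdir() if child.is_file()]
    # Assumption: the extension comparison is case-insensitive
    return bool(files) and all(f.suffix.lower() == ".parquet" for f in files)


with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "dataset"
    dataset.mkdir()
    (dataset / "part-0.parquet").touch()
    (dataset / "part-1.parquet").touch()

    mixed = Path(tmp) / "mixed"
    mixed.mkdir()
    (mixed / "part-0.parquet").touch()
    (mixed / "notes.txt").touch()

    empty = Path(tmp) / "empty"
    empty.mkdir()

    dataset_ok = looks_like_parquet_dataset_dir(dataset)
    mixed_ok = looks_like_parquet_dataset_dir(mixed)
    empty_ok = looks_like_parquet_dataset_dir(empty)
    try:
        looks_like_parquet_dataset_dir(Path(tmp) / "missing")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

In this sketch only the first directory qualifies: the mixed directory contains a non-parquet file, and the empty directory has no file children.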

pycytominer.cyto_utils.load.is_path_a_parquet_file(file: str | Path) → bool

Check whether the provided file path is a parquet file.

Parquet files are identified by their file extension: the function returns True if the path ends with .parquet and False otherwise.

Parameters:

file (Union[str, pathlib.Path]) – Path to inspect.

Returns:

Returns True if the file path ends with .parquet, otherwise False.

Return type:

bool

Raises:

FileNotFoundError – Raised if the path given in file does not exist.

Notes

If file is not a string or path-like object, the function prints a message and returns False rather than raising TypeError.
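An extension-based check looks at the final suffix of the path rather than searching the whole name. A minimal sketch, assuming a case-insensitive comparison; has_parquet_extension is a hypothetical helper, and the existence check described under Raises is omitted here.

```python
from pathlib import PurePath


def has_parquet_extension(file):
    """Hypothetical helper: True when the final suffix of the path is .parquet."""
    # Assumption: the comparison ignores case, so .PARQUET also matches
    return PurePath(file).suffix.lower() == ".parquet"


checks = {
    name: has_parquet_extension(name)
    for name in ["profiles.parquet", "profiles.PARQUET", "profiles.parquet.csv", "profiles.csv"]
}
```

With a suffix comparison, a name such as profiles.parquet.csv is not treated as parquet, because its final extension is .csv.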

pycytominer.cyto_utils.load.load_cytotable_profiles(warehouse_path: str | Path | PurePath, table_name: str = 'joined_profiles', namespace: str = 'profiles') → DataFrame

Load a profile table from a CytoTable-style warehouse layout.

This helper loads profile data stored as parquet fragments within an Iceberg-style table directory, typically under warehouse/<namespace>/<table_name>/data, where namespace is typically profiles. It is intended for CytoTable-style local outputs that organize tables by namespace and table name for downstream Pycytominer processing.

Parameters:
  • warehouse_path (path-like) – Path to either the warehouse root or the project directory that contains a warehouse/ directory.

  • table_name (str, default "joined_profiles") – Table name to load from within the namespace. The default, joined_profiles, is the conventional CytoTable table that joins object-level profile measurements across compartments into one profile table.

  • namespace (str, default "profiles") – Iceberg namespace that contains the table. For profile data this is typically profiles.

Returns:

Loaded table as a pandas dataframe.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – Raised when the requested table cannot be resolved to a parquet dataset.
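The layout described above, warehouse/<namespace>/<table_name>/data, can be sketched with pathlib. This only illustrates how the three arguments map onto a directory; expected_data_dir is a hypothetical helper, and the way a project directory is distinguished from a warehouse root is an assumption.

```python
import tempfile
from pathlib import Path


def expected_data_dir(warehouse_path, table_name="joined_profiles", namespace="profiles"):
    """Hypothetical helper: build <root>/<namespace>/<table_name>/data."""
    root = Path(warehouse_path)
    # Assumption: a path with a warehouse/ child is treated as a project directory
    if (root / "warehouse").is_dir():
        root = root / "warehouse"
    return root / namespace / table_name / "data"


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "warehouse" / "profiles" / "joined_profiles" / "data").mkdir(parents=True)

    # Both the project directory and the warehouse root lead to the same table
    from_project = expected_data_dir(project)
    from_root = expected_data_dir(project / "warehouse")
    data_dir_exists = from_project.is_dir()
```

Passing an explicit table_name and namespace to this kind of lookup is what removes the ambiguity when a warehouse holds more than one table.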

pycytominer.cyto_utils.load.load_npz_features(npz_file: str, fallback_feature_prefix: str = 'DP', metadata: bool = True) → DataFrame

Load an npz file storing features and, sometimes, metadata.

The function first searches the .npz file for a metadata field called “Metadata_Model”. If the field exists, its value is used as the feature prefix. If it does not exist, fallback_feature_prefix is used instead.

If the npz file does not exist, this function returns an empty dataframe.

Parameters:
  • npz_file (str) – file path to the compressed output (typically DeepProfiler output)

  • fallback_feature_prefix (str) – a string to prefix all features [default: “DP”].

  • metadata (bool) – whether or not to load metadata [default: True]

Returns:

df – pandas DataFrame of profiles

Return type:

pd.DataFrame
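The prefix selection described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: choose_feature_prefix and feature_names are hypothetical helpers, the metadata is modelled as a dict, the model name is made up, and the <prefix>_<index> column naming is an assumption.

```python
def choose_feature_prefix(metadata, fallback_feature_prefix="DP"):
    """Hypothetical helper: prefer the Metadata_Model entry, else use the fallback."""
    return metadata.get("Metadata_Model", fallback_feature_prefix)


def feature_names(n_features, prefix):
    """Assumption: features are named <prefix>_<index>, starting at 0."""
    return [f"{prefix}_{index}" for index in range(n_features)]


# "efficientnet" is a made-up model name used only for this example
with_model = feature_names(3, choose_feature_prefix({"Metadata_Model": "efficientnet"}))
without_model = feature_names(3, choose_feature_prefix({"Metadata_Plate": "plate_1"}))
```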

pycytominer.cyto_utils.load.load_npz_locations(npz_file: str, location_x_col_index: int = 0, location_y_col_index: int = 1) → DataFrame

Load an npz file storing locations and, sometimes, metadata.

The function will first search the .npz file for a metadata column called “locations”. If the field exists, the function uses this entry as the feature prefix.

If the npz file does not exist, this function returns an empty dataframe.

Parameters:
  • npz_file (str) – file path to the compressed output (typically DeepProfiler output)

  • location_x_col_index (int) – index of the x location column (which column in DP output has X coords)

  • location_y_col_index (int) – index of the y location column (which column in DP output has Y coords)

Returns:

df – pandas DataFrame of profiles

Return type:

pd.DataFrame

pycytominer.cyto_utils.load.load_platemap(platemap: str | DataFrame, add_metadata_id=True) → DataFrame

Load the platemap from a file path. If a dataframe is provided instead, it is used directly.

Parameters:
  • platemap (pd.DataFrame or str) – Path to the platemap file, or the platemap itself as a pd.DataFrame.

  • add_metadata_id (bool) – Whether Metadata_ should be prepended to all platemap column names [default: True].

Returns:

platemap – pandas DataFrame of the platemap

Return type:

pd.DataFrame
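The effect of add_metadata_id on column names can be sketched as follows, assuming Metadata_ is added at the start of each name. add_metadata_prefix is a hypothetical helper, and leaving already-prefixed columns unchanged is an assumption rather than documented behavior.

```python
def add_metadata_prefix(columns):
    """Hypothetical helper: put Metadata_ in front of each column name."""
    # Assumption: names that already start with Metadata_ are left unchanged
    return [
        column if column.startswith("Metadata_") else f"Metadata_{column}"
        for column in columns
    ]


renamed = add_metadata_prefix(["well_position", "gene_name", "Metadata_Plate"])
```

With a pandas DataFrame, the same list could be assigned back to the columns attribute.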

pycytominer.cyto_utils.load.load_profiles(profiles: str | Path | PurePath | DataFrame | AnnDataLike) → DataFrame

Unless a dataframe is provided, load the given profiles from a file path.

This loader supports direct files, parquet dataset directories, AnnData inputs, and unambiguous CytoTable-style warehouse roots that contain a single parquet-backed table under profiles/*/data. This is the entry point used by higher-level functions such as normalize() and annotate() when they receive a path-like profiles input. If a warehouse path contains multiple profile tables, this loader will not guess which one to use; call load_cytotable_profiles() directly with an explicit table_name and namespace instead.

Parameters:

profiles (str, pathlib.Path, pathlib.PurePath, pandas.DataFrame, or ad.AnnData) – File location, warehouse root, or in-memory profile data.

Returns:

pandas DataFrame of profiles

Raises:

FileNotFoundError – Raised if the provided profile does not exist.

pycytominer.cyto_utils.load.resolve_cytotable_profiles_target(warehouse_path: str | Path | PurePath) → tuple[Path, str, str] | None

Resolve a single profile table from a CytoTable-style warehouse.

This helper only auto-resolves a target when exactly one parquet-backed profile table is present under the expected profile namespace layout. It does not infer which table to use based on downstream pycytominer operations or processing level; callers must be explicit when multiple profile tables are available.

Parameters:

warehouse_path (path-like) – Path to either the warehouse root or a project directory that contains a warehouse/ directory.

Returns:

Returns the resolved warehouse root path, namespace, and table name when exactly one parquet-backed profile table can be identified under the profile namespace. Returns None when the path does not expose a profile namespace in either <root>/profiles/<table> or <root>/warehouse/profiles/<table> form.

Return type:

tuple[pathlib.Path, str, str] or None

Raises:

ValueError – Raised when multiple parquet-backed profile tables are found and the intended target is ambiguous. This helper is only for the convenience case where a warehouse path exposes exactly one profile table. When multiple tables are present, use load_cytotable_profiles() with an explicit namespace and table name.
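The three outcomes described above (one table resolved, ambiguity raised, no profile namespace) can be sketched with pathlib. This is a hedged illustration, not pycytominer's implementation: find_single_profile_table is a hypothetical helper, other_profiles is a made-up table name, and returning None for a namespace without any parquet-backed table is an assumption.

```python
import tempfile
from pathlib import Path


def find_single_profile_table(warehouse_path, namespace="profiles"):
    """Hypothetical helper mirroring the documented single-table convenience case."""
    root = Path(warehouse_path)
    # Accept either <root>/profiles/<table> or <root>/warehouse/profiles/<table>
    if not (root / namespace).is_dir() and (root / "warehouse" / namespace).is_dir():
        root = root / "warehouse"
    namespace_dir = root / namespace
    if not namespace_dir.is_dir():
        return None
    # A table counts as parquet-backed when its data/ child holds parquet files
    tables = sorted(
        table.name
        for table in namespace_dir.iterdir()
        if (table / "data").is_dir() and any((table / "data").glob("*.parquet"))
    )
    if len(tables) > 1:
        raise ValueError(f"Multiple profile tables found: {tables}")
    # Assumption: a namespace without parquet-backed tables is treated as missing
    return (root, namespace, tables[0]) if tables else None


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    data_dir = project / "warehouse" / "profiles" / "joined_profiles" / "data"
    data_dir.mkdir(parents=True)
    (data_dir / "part-0.parquet").touch()

    root, namespace, table = find_single_profile_table(project)
    single = (root.name, namespace, table)

    # Adding a second parquet-backed table makes the target ambiguous
    other = project / "warehouse" / "profiles" / "other_profiles" / "data"
    other.mkdir(parents=True)
    (other / "part-0.parquet").touch()
    try:
        find_single_profile_table(project)
        ambiguous_raised = False
    except ValueError:
        ambiguous_raised = True

    # A directory without a profiles namespace resolves to None
    no_namespace = find_single_profile_table(Path(tmp))
```

Raising on ambiguity instead of picking a table keeps the choice with the caller, which matches the guidance to call load_cytotable_profiles() with an explicit namespace and table name.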

pycytominer.cyto_utils.load.resolve_parquet_path(path_like: str | Path | PurePath) → Path | None

Resolve file and dataset paths that pandas can read via parquet.

Parameters:

path_like (path-like) – Path to inspect.

Returns:

Resolved parquet file or dataset directory. Returns None when the path does not point to a parquet-backed source. This helper also resolves Iceberg-style table directories whose parquet data lives under a data/ child directory, such as CytoTable warehouse tables.

Return type:

pathlib.Path or None