Load
Module for loading profiles from files or dataframes.
- pycytominer.cyto_utils.load.infer_delim(file: str | Path | Any) → str
Sniff the delimiter in the given file.
- Parameters:
file (str or pathlib.Path) – Path to the file
- Returns:
the delimiter used in the file (typically either a tab or a comma)
- Return type:
str
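Delimiter sniffing of this kind can be sketched with the standard library's `csv.Sniffer`, restricting the candidates to the tab and comma mentioned above. The helper name `sniff_delimiter` and the five-line sample are illustrative assumptions, not pycytominer's implementation, and the sketch does not handle compressed files.

```python
from __future__ import annotations

import csv
from itertools import islice
from pathlib import Path


def sniff_delimiter(file: str | Path, n_lines: int = 5) -> str:
    """Guess whether a delimited text file uses tabs or commas (hypothetical helper)."""
    # Read only the first few lines; sniffing does not need the whole file.
    with open(file, newline="") as handle:
        sample = "".join(islice(handle, n_lines))
    # Limit the candidates to tab and comma so other characters are never chosen.
    dialect = csv.Sniffer().sniff(sample, delimiters="\t,")
    return dialect.delimiter
```

Sniffing picks the candidate character that appears the same number of times on every sampled line, which is why a short, consistent header plus a few rows is usually enough.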
- pycytominer.cyto_utils.load.is_path_a_parquet_dataset_dir(file: str | Path) → bool
Check whether a path is a parquet dataset directory.
- Parameters:
file (Union[str, pathlib.Path]) – Path to inspect.
- Returns:
Returns True when the path is a directory, contains at least one direct file child, and all direct file children are parquet files.
- Return type:
bool
- Raises:
FileNotFoundError – Raised if the provided path does not exist.
Notes
If file is not a string or path-like object, the function prints a message and returns False rather than raising TypeError.
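The dataset-directory rule above can be restated in a few lines of `pathlib`. This is a hypothetical sketch (`looks_like_parquet_dataset_dir` is not a pycytominer function) that follows the documented behavior: non-path inputs print a message and return False, missing paths raise, and only direct file children are inspected.

```python
from __future__ import annotations

from pathlib import Path, PurePath


def looks_like_parquet_dataset_dir(file: str | PurePath) -> bool:
    """Hypothetical re-statement of the documented dataset-directory rule."""
    # Non-path inputs: print a message and return False instead of raising.
    if not isinstance(file, (str, PurePath)):
        print(f"Expected a str or path-like object, got {type(file).__name__}")
        return False
    path = Path(file)
    if not path.exists():
        raise FileNotFoundError(f"Path does not exist: {path}")
    if not path.is_dir():
        return False
    # Only direct children are inspected; subdirectories are ignored.
    child_files = [child for child in path.iterdir() if child.is_file()]
    # Require at least one file, and every file must end with .parquet.
    return bool(child_files) and all(
        child.suffix == ".parquet" for child in child_files
    )
```

Under this rule a single stray non-parquet file in the directory (for example a README) is enough to make the check return False.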
- pycytominer.cyto_utils.load.is_path_a_parquet_file(file: str | Path) → bool
Checks if the provided file path is a parquet file.
Parquet files are identified by their file extension: if the path does not end with .parquet, the function returns False; otherwise it returns True.
- Parameters:
file (Union[str, pathlib.Path]) – path to parquet file
- Returns:
Returns True if the file path ends with .parquet, else False.
- Return type:
bool
- Raises:
FileNotFoundError – Raised if the provided path does not exist.
Notes
If file is not a string or path-like object, the function prints a message and returns False rather than raising TypeError.
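A minimal sketch of the extension check, assuming the behavior described above; `looks_like_parquet_file` is a hypothetical name. `Path.resolve(strict=True)` is used here because it raises FileNotFoundError for missing paths, matching the documented exception.

```python
from __future__ import annotations

from pathlib import Path, PurePath


def looks_like_parquet_file(file: str | PurePath) -> bool:
    """Hypothetical extension-based parquet check mirroring the rule above."""
    if not isinstance(file, (str, PurePath)):
        print(f"Expected a str or path-like object, got {type(file).__name__}")
        return False
    # resolve(strict=True) raises FileNotFoundError when the path is missing.
    resolved = Path(file).resolve(strict=True)
    # Only the final suffix is inspected; the file contents are never opened.
    return resolved.suffix == ".parquet"
```

Because only the name is checked, a CSV renamed to end in .parquet would still pass this test; the failure would surface only when reading it.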
- pycytominer.cyto_utils.load.load_cytotable_profiles(warehouse_path: str | Path | PurePath, table_name: str = 'joined_profiles', namespace: str = 'profiles') → DataFrame
Load a profile table from a CytoTable-style warehouse layout.
This helper loads profile data stored as parquet fragments within an Iceberg-style table directory, typically under warehouse/<namespace>/<table_name>/data, where namespace is usually profiles. It is intended for CytoTable-style local outputs that organize tables by namespace and table name for downstream Pycytominer processing.
- Parameters:
warehouse_path (path-like) – Path to either the warehouse root or the project directory that contains a warehouse/ directory.
table_name (str, default "joined_profiles") – Table name to load from within the namespace. The default, joined_profiles, is the conventional CytoTable table that joins object-level profile measurements across compartments into one profile table.
namespace (str, default "profiles") – Iceberg namespace that contains the table. For profile data this is typically profiles.
- Returns:
Loaded table as a pandas dataframe.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – Raised when the requested table cannot be resolved to a parquet dataset.
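The directory convention described above can be sketched with `pathlib` alone. `locate_table_data` is a hypothetical helper that only locates the parquet fragments; it assumes fragments may sit in nested partition folders under data/ (hence `rglob`). Actually reading them would be a separate step, for example `pandas.read_parquet(data_dir)`, which accepts a directory of parquet files when pyarrow is installed and is not run here.

```python
from __future__ import annotations

from pathlib import Path, PurePath


def locate_table_data(
    warehouse_path: str | PurePath,
    table_name: str = "joined_profiles",
    namespace: str = "profiles",
) -> Path:
    """Return <warehouse>/<namespace>/<table_name>/data (hypothetical helper)."""
    root = Path(warehouse_path)
    # Accept a project directory that contains a warehouse/ subdirectory.
    if (root / "warehouse").is_dir():
        root = root / "warehouse"
    data_dir = root / namespace / table_name / "data"
    # Require at least one parquet fragment somewhere under data/.
    if not data_dir.is_dir() or not any(data_dir.rglob("*.parquet")):
        raise FileNotFoundError(
            f"No parquet data for table {namespace}.{table_name} under {root}"
        )
    return data_dir
```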
- pycytominer.cyto_utils.load.load_npz_features(npz_file: str, fallback_feature_prefix: str = 'DP', metadata: bool = True) → DataFrame
Load an npz file storing features and, sometimes, metadata.
The function first searches the .npz file for a metadata column called “Metadata_Model”. If the field exists, the function uses this entry as the feature prefix; if it doesn’t, it uses fallback_feature_prefix.
If the npz file does not exist, this function returns an empty dataframe.
- Parameters:
npz_file (str) – file path to the compressed output (typically DeepProfiler output)
fallback_feature_prefix (str) – a string to prefix all features [default: “DP”].
metadata (bool) – whether or not to load metadata [default: True]
- Returns:
df – pandas DataFrame of profiles
- Return type:
pd.DataFrame
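The prefix rule can be illustrated without NumPy. `feature_column_names` is a hypothetical helper operating on an already-loaded metadata dictionary; the "<prefix>_<index>" naming pattern is an assumption made for illustration, not necessarily the exact column names pycytominer produces.

```python
from __future__ import annotations


def feature_column_names(
    metadata: dict[str, str] | None,
    n_features: int,
    fallback_feature_prefix: str = "DP",
) -> list[str]:
    """Name feature columns using the documented prefix rule (hypothetical helper)."""
    # Prefer the model name recorded in Metadata_Model when it is available.
    if metadata and metadata.get("Metadata_Model"):
        prefix = metadata["Metadata_Model"]
    else:
        prefix = fallback_feature_prefix
    # The "<prefix>_<index>" pattern is an assumption made for illustration.
    return [f"{prefix}_{i}" for i in range(n_features)]
```

For example, `feature_column_names({}, 2)` falls back to `["DP_0", "DP_1"]`.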
- pycytominer.cyto_utils.load.load_npz_locations(npz_file: str, location_x_col_index: int = 0, location_y_col_index: int = 1) → DataFrame
Load an npz file storing locations and, sometimes, metadata.
The function reads the “locations” array stored in the .npz file and keeps the two columns selected by location_x_col_index and location_y_col_index as the x and y coordinates.
If the npz file does not exist, this function returns an empty dataframe.
- Parameters:
npz_file (str) – file path to the compressed output (typically DeepProfiler output)
location_x_col_index (int) – index of the x location column (which column in DP output has X coords)
location_y_col_index (int) – index of the y location column (which column in DP output has Y coords)
- Returns:
df – pandas DataFrame of locations
- Return type:
pd.DataFrame
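The two index parameters choose which columns of the stored location array hold the x and y coordinates. A plain-Python sketch of that selection, using a hypothetical `select_xy` helper and nested lists in place of the NumPy array the loader reads; raising IndexError for out-of-range indices is a choice made in this sketch.

```python
from __future__ import annotations

from collections.abc import Sequence


def select_xy(
    locations: Sequence[Sequence[float]],
    location_x_col_index: int = 0,
    location_y_col_index: int = 1,
) -> list[tuple[float, float]]:
    """Pick (x, y) pairs out of per-cell location rows (hypothetical helper)."""
    if not locations:
        return []
    n_cols = len(locations[0])
    # Fail loudly instead of silently reading the wrong column.
    for name, index in (("x", location_x_col_index), ("y", location_y_col_index)):
        if not 0 <= index < n_cols:
            raise IndexError(f"{name} column index {index} is out of range for {n_cols} columns")
    return [(row[location_x_col_index], row[location_y_col_index]) for row in locations]
```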
- pycytominer.cyto_utils.load.load_platemap(platemap: str | DataFrame, add_metadata_id=True) → DataFrame
Unless a dataframe is provided, load the platemap from the given file path.
- Parameters:
platemap (pd.DataFrame or str) – location or actual pd.DataFrame of platemap file
add_metadata_id (bool) – whether to prefix all platemap column names with Metadata_
- Returns:
platemap – pandas DataFrame of the platemap
- Return type:
pd.DataFrame
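With add_metadata_id enabled, platemap columns are renamed so downstream steps recognize them as metadata. A minimal sketch of that renaming on a list of column names; `add_metadata_prefix` is hypothetical, and leaving columns that already start with Metadata_ unchanged is an assumption of this sketch.

```python
def add_metadata_prefix(columns: list[str]) -> list[str]:
    """Prefix platemap column names with Metadata_ (hypothetical helper)."""
    # Columns that already carry the prefix are left unchanged (assumed behavior).
    return [col if col.startswith("Metadata_") else f"Metadata_{col}" for col in columns]
```

For example, `["well_position", "Metadata_broad_sample"]` becomes `["Metadata_well_position", "Metadata_broad_sample"]`.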
- pycytominer.cyto_utils.load.load_profiles(profiles: str | Path | PurePath | DataFrame | AnnDataLike) → DataFrame
Unless a dataframe is provided, load the given profile dataframe from path or string.
This loader supports direct files, parquet dataset directories, AnnData inputs, and unambiguous CytoTable-style warehouse roots that contain a single parquet-backed table under profiles/*/data. This is the entry point used by higher-level functions such as normalize() and annotate() when they receive a path-like profiles input. If a warehouse path contains multiple profile tables, this loader will not guess which one to use; call load_cytotable_profiles() directly with an explicit table_name and namespace instead.
- Parameters:
profiles – {str, pathlib.Path, pathlib.PurePath, pandas.DataFrame, ad.AnnData} File location, warehouse root, or in-memory profile data.
- Returns:
pandas DataFrame of profiles
- Raises:
FileNotFoundError – Raised if the provided profile does not exist.
- pycytominer.cyto_utils.load.resolve_cytotable_profiles_target(warehouse_path: str | Path | PurePath) → tuple[Path, str, str] | None
Resolve a single profile table from a CytoTable-style warehouse.
This helper only auto-resolves a target when exactly one parquet-backed profile table is present under the expected profile namespace layout. It does not infer which table to use based on downstream pycytominer operations or processing level; callers must be explicit when multiple profile tables are available.
- Parameters:
warehouse_path (path-like) – Path to either the warehouse root or a project directory that contains a warehouse/ directory.
- Returns:
Returns the resolved warehouse root path, namespace, and table name when exactly one parquet-backed profile table can be identified under the profile namespace. Returns None when the path does not expose a profile namespace in either <root>/profiles/<table> or <root>/warehouse/profiles/<table> form.
- Return type:
tuple[pathlib.Path, str, str] or None
- Raises:
ValueError – Raised when multiple parquet-backed profile tables are found and the intended target is ambiguous. This helper is only for the convenience case where a warehouse path exposes exactly one profile table. When multiple tables are present, use load_cytotable_profiles() with an explicit namespace and table name.
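The "exactly one table or refuse" behavior can be sketched as follows. `resolve_single_profile_table` is hypothetical; it treats a table as parquet-backed when its data/ folder contains at least one parquet file, and returning None when the namespace exists but holds no such table is an assumption this sketch makes.

```python
from __future__ import annotations

from pathlib import Path, PurePath


def resolve_single_profile_table(
    warehouse_path: str | PurePath,
) -> tuple[Path, str, str] | None:
    """Return (root, "profiles", table) when exactly one table exists (sketch)."""
    base = Path(warehouse_path)
    # Check both layouts: <root>/profiles and <root>/warehouse/profiles.
    for root in (base, base / "warehouse"):
        namespace_dir = root / "profiles"
        if namespace_dir.is_dir():
            break
    else:
        return None
    # A table counts as parquet-backed if its data/ folder holds parquet files.
    tables = sorted(
        table.name
        for table in namespace_dir.iterdir()
        if (table / "data").is_dir() and any((table / "data").rglob("*.parquet"))
    )
    if len(tables) > 1:
        raise ValueError(
            f"Multiple profile tables found ({', '.join(tables)}); "
            "pass an explicit namespace and table name instead."
        )
    # No parquet-backed table: return None (an assumption of this sketch).
    return (root, "profiles", tables[0]) if tables else None
```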
- pycytominer.cyto_utils.load.resolve_parquet_path(path_like: str | Path | PurePath) → Path | None
Resolve file and dataset paths that pandas can read via parquet.
- Parameters:
path_like (path-like) – Path to inspect.
- Returns:
Resolved parquet file or dataset directory. Returns None when the path does not point to a parquet-backed source. This helper also resolves Iceberg-style table directories whose parquet data lives under a data/ child directory, such as CytoTable warehouse tables.
- Return type:
pathlib.Path or None
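A hypothetical sketch of the resolution order described above: a .parquet file, then a directory of parquet files, then an Iceberg-style table directory with a data/ child. Returning the data/ directory itself for the Iceberg case, and None for missing paths rather than raising, are assumptions of this sketch rather than documented behavior.

```python
from __future__ import annotations

from pathlib import Path, PurePath


def _is_parquet_dataset_dir(path: Path) -> bool:
    # At least one direct file child, and every direct file child is parquet.
    files = [child for child in path.iterdir() if child.is_file()]
    return bool(files) and all(child.suffix == ".parquet" for child in files)


def resolve_parquet_source(path_like: str | PurePath) -> Path | None:
    """Return a parquet file or directory to read, else None (hypothetical sketch)."""
    path = Path(path_like)
    if path.is_file():
        return path if path.suffix == ".parquet" else None
    if path.is_dir():
        if _is_parquet_dataset_dir(path):
            return path
        # Iceberg-style table: parquet fragments live under a data/ child.
        data_dir = path / "data"
        if data_dir.is_dir() and any(data_dir.rglob("*.parquet")):
            return data_dir
    return None
```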