Blocklist
Blocklist utilities for excluding unwanted features from profile DataFrames.
Although the packaged default blocklist targets known-noisy CellProfiler
features, the Blocklist class works with any feature names — including
embeddings, custom morphological measurements, or any other column-based
profile type.
- class pycytominer.cyto_utils.blocklist.Blocklist(blocklist_name: str | list[str] | None = None, features_to_block: list[str] | None = None, blocklists_file: str | Path = '/home/docs/checkouts/readthedocs.org/user_builds/pycytominer/checkouts/stable/pycytominer/cyto_utils/../data/default_blocklists.yaml')
Bases: object
A collection of feature names to exclude from downstream analysis.
A Blocklist holds feature names that are known to be noisy, uninformative, or otherwise undesirable. While the packaged default targets known-problematic CellProfiler measurements, Blocklist works with any column-based profile: CellProfiler features, embeddings, custom morphological measurements, or any other feature type. It can be built from any combination of three sources:
- Packaged named lists: pycytominer ships a default_blocklists.yaml registry whose top-level keys are named lists of features. Pass one or more names via blocklist_name to load them. The key "default" loads the curated pycytominer default, which is derived from Way (2019) [1].
- Explicit feature names: pass a list of column names directly via features_to_block.
- Custom YAML registry: supply your own YAML file via blocklists_file and reference its named lists with blocklist_name. Feature names can follow any naming convention (CellProfiler prefixes, embedding dimensions, custom names, etc.). The file must follow the format:

    my_list:
      - embedding_dim_42
      - Cells_MyFeature_A
    another_list:
      - my_custom_feature

Any top-level key becomes a valid blocklist_name.
When constructing, named list(s) are loaded first (in order), then features_to_block entries are appended. Duplicates are preserved; call to_list() and deduplicate manually if needed.
- Parameters:
blocklist_name (str or list of str, optional) – Name(s) of lists to load from the blocklist registry. When multiple names are given, entries are appended in the order provided. If None, no named list is loaded. Use "default" to load the curated pycytominer default blocklist.
features_to_block (list of str, optional) – Additional feature names to append after loading any named list(s). If blocklist_name is None, these are the only blocklisted features.
blocklists_file (path-like, default packaged default_blocklists.yaml) – Path to a YAML registry mapping list names to feature lists. Defaults to pycytominer’s packaged registry. Supply a custom path to use your own feature lists (see format above).
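The load order described above (named lists first, in the order given, then features_to_block, with duplicates kept) can be illustrated with a small stand-in. This is a minimal sketch, not pycytominer's implementation: `combine_blocklists` and the in-memory `registry` dict are hypothetical, with the dict mirroring the shape of a parsed YAML registry. The last step shows one way to deduplicate while keeping first-seen order.

```python
def combine_blocklists(registry, names=None, features_to_block=None):
    """Hypothetical stand-in mirroring the documented load order."""
    if names is None:
        names = []
    elif isinstance(names, str):
        names = [names]  # a single name is treated as a one-element list

    combined = []
    for name in names:
        # Named lists are loaded first, in the order given.
        combined.extend(registry[name])
    # Explicit features are appended afterwards; duplicates are kept.
    combined.extend(features_to_block or [])
    return combined


# Illustrative registry; the keys and feature names are made up.
registry = {
    "my_list": ["embedding_dim_42", "Cells_MyFeature_A"],
    "another_list": ["my_custom_feature", "embedding_dim_42"],
}

features = combine_blocklists(
    registry,
    names=["my_list", "another_list"],
    features_to_block=["Cells_MyFeature_A"],
)

# Order-preserving deduplication: dict keys keep insertion order (Python 3.7+).
unique_features = list(dict.fromkeys(features))
```

The same `dict.fromkeys` idiom can be applied to the list returned by to_list() when duplicates are unwanted.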
Examples
Use the packaged default blocklist (recommended starting point):
>>> from pycytominer.cyto_utils.blocklist import Blocklist
>>> bl = Blocklist(blocklist_name="default")
>>> isinstance(bl.to_list(), list)
True
Block an explicit set of project-specific features:
>>> bl = Blocklist(features_to_block=["Cells_MyFeature", "Nuclei_MyFeature"])
>>> bl.to_list()
['Cells_MyFeature', 'Nuclei_MyFeature']
Extend the packaged default with project-specific exclusions:
>>> bl = Blocklist(
...     blocklist_name="default",
...     features_to_block=["Cells_MyFeature"],
... )
Use a custom YAML registry instead of the packaged one (feature names can follow any convention — CellProfiler prefixes, embedding dimensions, etc.):
>>> import pathlib
>>> # my_blocklists.yaml contains:
>>> # qc_fails:
>>> #   - embedding_dim_42
>>> #   - Cells_Texture_BadChannel
>>> bl = Blocklist(
...     blocklist_name="qc_fails",
...     blocklists_file=pathlib.Path("my_blocklists.yaml"),
... )
Pass the result directly to feature_select():
>>> import pandas as pd
>>> from pycytominer import feature_select
>>> bl = Blocklist(
...     blocklist_name="default",
...     features_to_block=["Cells_MyFeature"],
... )
>>> # df = feature_select(profiles, operation="blocklist", blocklist=bl)
See also
get_blocklist_features
Resolve a Blocklist (or shorthand forms) to a plain list of feature names present in a given profile DataFrame.
feature_select
Apply blocklist (and other) feature-selection operations to a profile DataFrame.
References
- add(features: list[str]) → None
Add one or more feature names to the blocklist.
- to_list() → list[str]
Return blocklist features as a list.
- pycytominer.cyto_utils.blocklist.get_blocklist_features(blocklist: str | list[str] | Blocklist | None = None, blocklist_name: str | list[str] | None = None, population_df: DataFrame | None = None, blocklist_file: str | Path | None = None) → list[str]
Resolve blocklist inputs to a list of feature names present in a DataFrame.
Accepts the same shorthand forms supported by feature_select() and returns a plain list of feature names, optionally filtered to only those that exist in population_df. When both blocklist and blocklist_name are None, the packaged default blocklist is used. For full details on blocklist construction and customization, see Blocklist.
- Parameters:
blocklist (str, list of str, or Blocklist, optional) – Feature name(s) to exclude. A Blocklist object may be passed directly for full customization (custom YAML, combined named + explicit features, etc.). If None, blocklist_name or the packaged default is used instead.
blocklist_name (str or list of str, optional) – Name(s) of packaged blocklists to load when blocklist is None. If both are None, falls back to DEFAULT_BLOCKLIST_NAME ("default").
population_df (pd.DataFrame, optional) – When provided, the returned list is filtered to only feature names that appear as columns in this DataFrame.
blocklist_file (str or path-like, optional) –
Deprecated since version 2.0: Pass feature names via blocklist (a list or Blocklist object) instead. blocklist_file accepted a CSV file with a single blocklist column; that format is no longer the primary interface. This parameter will be removed in a future release.
- Returns:
blocklist_features – Feature names to exclude from downstream analysis.
- Return type:
list of str
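The filtering step controlled by population_df can be pictured as a membership check against the DataFrame's columns. This is a minimal sketch under stated assumptions: `filter_to_present` is a hypothetical helper, a plain list of column names stands in for `population_df.columns`, and keeping the blocklist's original order is an assumption rather than documented behavior.

```python
def filter_to_present(blocklist_features, columns=None):
    """Keep only blocklisted names that appear among the given columns."""
    if columns is None:
        # No DataFrame supplied: return a copy of the blocklist unfiltered.
        return list(blocklist_features)
    present = set(columns)  # set membership keeps each lookup cheap
    return [feature for feature in blocklist_features if feature in present]


# Stand-in for population_df.columns; all names are illustrative.
profile_columns = ["Metadata_Well", "Cells_MyFeature", "Nuclei_AreaShape_Area"]
blocklisted = ["Cells_MyFeature", "Nuclei_MyFeature"]

to_drop = filter_to_present(blocklisted, profile_columns)
```

Here `to_drop` contains only `Cells_MyFeature`, since `Nuclei_MyFeature` is not among the profile columns.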
See also
Blocklist
Full reference for blocklist construction and customization.
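For code that still relies on the deprecated blocklist_file CSV, one possible migration path is to read the blocklist column yourself and pass the names via blocklist. The sketch below uses only the standard library; `read_blocklist_csv` is a hypothetical helper and the CSV content is made up. The get_blocklist_features call is left as a comment because it needs pycytominer and a profile DataFrame.

```python
import csv
import io


def read_blocklist_csv(handle):
    """Read feature names from a CSV with a single 'blocklist' column."""
    reader = csv.DictReader(handle)
    # Skip rows whose 'blocklist' cell is empty or missing.
    return [row["blocklist"] for row in reader if row.get("blocklist")]


# In-memory stand-in for a legacy CSV file; the feature names are made up.
legacy_csv = io.StringIO("blocklist\nCells_MyFeature\nNuclei_MyFeature\n")
feature_names = read_blocklist_csv(legacy_csv)

# With a real file, pass a handle from open(path, newline="") instead.
# features = get_blocklist_features(blocklist=feature_names, population_df=profiles)
```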