Blocklist

Blocklist utilities for excluding unwanted features from profile DataFrames.

Although the packaged default blocklist targets known-noisy CellProfiler features, the Blocklist class works with any feature names — including embeddings, custom morphological measurements, or any other column-based profile type.

class pycytominer.cyto_utils.blocklist.Blocklist(blocklist_name: str | list[str] | None = None, features_to_block: list[str] | None = None, blocklists_file: str | Path = '/home/docs/checkouts/readthedocs.org/user_builds/pycytominer/checkouts/stable/pycytominer/cyto_utils/../data/default_blocklists.yaml')

Bases: object

A collection of feature names to exclude from downstream analysis.

A Blocklist holds feature names that are known to be noisy, uninformative, or otherwise undesirable. While the packaged default targets known-problematic CellProfiler measurements, Blocklist works with any column-based profile — CellProfiler features, embeddings, custom morphological measurements, or any other feature type. It can be built from any combination of three sources:

  1. Packaged named lists — pycytominer ships a default_blocklists.yaml registry whose top-level keys are named lists of features. Pass one or more names via blocklist_name to load them. The key "default" loads the curated pycytominer default, which is derived from Way (2019) [1].

  2. Explicit feature names — pass a list of column names directly via features_to_block.

  3. Custom YAML registry — supply your own YAML file via blocklists_file and reference its named lists with blocklist_name. Feature names can follow any naming convention (CellProfiler prefixes, embedding dimensions, custom names, etc.). The file must follow the format:

    my_list:
      - embedding_dim_42
      - Cells_MyFeature_A
    
    another_list:
      - my_custom_feature
    

    Any top-level key becomes a valid blocklist_name.

During construction, named lists are loaded first, in the order given, and features_to_block entries are then appended. Duplicates are preserved; to remove them, call to_list() and deduplicate the result yourself.
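The load order and duplicate handling described above can be sketched in plain Python. This is an illustration only, not pycytominer's implementation: the `registry` dict stands in for a parsed YAML file, and `build_feature_list` and `dedupe_preserving_order` are hypothetical helpers introduced here.

```python
def build_feature_list(registry, blocklist_name=None, features_to_block=None):
    """Mimic the documented order: named lists first, then explicit features."""
    if blocklist_name is None:
        names = []
    elif isinstance(blocklist_name, str):
        names = [blocklist_name]
    else:
        names = list(blocklist_name)

    features = []
    for name in names:
        # Raises KeyError if the name is not a top-level key of the registry.
        features.extend(registry[name])
    features.extend(features_to_block or [])
    return features


def dedupe_preserving_order(features):
    """Drop repeated names while keeping the first occurrence of each."""
    return list(dict.fromkeys(features))


# Stand-in for a parsed YAML registry (hypothetical contents).
registry = {
    "my_list": ["embedding_dim_42", "Cells_MyFeature_A"],
    "another_list": ["my_custom_feature"],
}

combined = build_feature_list(
    registry,
    blocklist_name=["my_list", "another_list"],
    features_to_block=["Cells_MyFeature_A"],
)
unique = dedupe_preserving_order(combined)
```

Here `combined` keeps the repeated `Cells_MyFeature_A` entry, while `unique` contains each name once in first-seen order. With a real Blocklist instance `bl`, the same idiom would be `list(dict.fromkeys(bl.to_list()))`.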

Parameters:
  • blocklist_name (str or list of str, optional) – Name(s) of lists to load from the blocklist registry. When multiple names are given, entries are appended in the order provided. If None, no named list is loaded. Use "default" to load the curated pycytominer default blocklist.

  • features_to_block (list of str, optional) – Additional feature names to append after loading any named list(s). If blocklist_name is None, these are the only blocklisted features.

  • blocklists_file (path-like, default packaged default_blocklists.yaml) – Path to a YAML registry mapping list names to feature lists. Defaults to pycytominer’s packaged registry. Supply a custom path to use your own feature lists (see format above).

Examples

Use the packaged default blocklist (recommended starting point):

>>> from pycytominer.cyto_utils.blocklist import Blocklist
>>> bl = Blocklist(blocklist_name="default")
>>> isinstance(bl.to_list(), list)
True

Block an explicit set of project-specific features:

>>> bl = Blocklist(features_to_block=["Cells_MyFeature", "Nuclei_MyFeature"])
>>> bl.to_list()
['Cells_MyFeature', 'Nuclei_MyFeature']

Extend the packaged default with project-specific exclusions:

>>> bl = Blocklist(
...     blocklist_name="default",
...     features_to_block=["Cells_MyFeature"],
... )

Use a custom YAML registry instead of the packaged one (feature names can follow any convention — CellProfiler prefixes, embedding dimensions, etc.):

>>> import pathlib
>>> # my_blocklists.yaml contains:
>>> # qc_fails:
>>> #   - embedding_dim_42
>>> #   - Cells_Texture_BadChannel
>>> bl = Blocklist(
...     blocklist_name="qc_fails",
...     blocklists_file=pathlib.Path("my_blocklists.yaml"),
... )

Pass the result directly to feature_select():

>>> import pandas as pd
>>> from pycytominer import feature_select
>>> bl = Blocklist(
...     blocklist_name="default",
...     features_to_block=["Cells_MyFeature"],
... )
>>> # df = feature_select(profiles, operation="blocklist", blocklist=bl)

See also

get_blocklist_features

Resolve a Blocklist (or shorthand forms) to a plain list of feature names present in a given profile DataFrame.

feature_select

Apply blocklist (and other) feature-selection operations to a profile DataFrame.

References

add(features: list[str]) -> None

Add one or more feature names to the blocklist.

to_list() -> list[str]

Return blocklist features as a list.

pycytominer.cyto_utils.blocklist.get_blocklist_features(blocklist: str | list[str] | Blocklist | None = None, blocklist_name: str | list[str] | None = None, population_df: DataFrame | None = None, blocklist_file: str | Path | None = None) -> list[str]

Resolve blocklist inputs to a list of feature names, optionally restricted to those present in a DataFrame.

Accepts the same shorthand forms supported by feature_select() and returns a plain list of feature names, optionally filtered to only those that exist in population_df. When both blocklist and blocklist_name are None, the packaged default blocklist is used. For full details on blocklist construction and customization, see Blocklist.

Parameters:
  • blocklist (str, list of str, or Blocklist, optional) – Feature name(s) to exclude. A Blocklist object may be passed directly for full customization (custom YAML, combined named + explicit features, etc.). If None, blocklist_name or the packaged default is used instead.

  • blocklist_name (str or list of str, optional) – Name(s) of packaged blocklists to load when blocklist is None. If both are None, falls back to DEFAULT_BLOCKLIST_NAME ("default").

  • population_df (pd.DataFrame, optional) – When provided, the returned list is filtered to only feature names that appear as columns in this DataFrame.

  • blocklist_file (str or path-like, optional) –

    Deprecated since version 2.0: Pass feature names via blocklist (a list or Blocklist object) instead. blocklist_file accepted a CSV file with a single blocklist column; that format is no longer the primary interface. This parameter will be removed in a future release.
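For code that still relies on the deprecated blocklist_file, one possible migration is to read the legacy CSV yourself and pass the names through blocklist. The sketch below assumes the legacy file has a header row with a single `blocklist` column, as described above; the in-memory CSV contents and the `profiles` variable in the final comment are hypothetical.

```python
import csv
import io

# Stand-in for a legacy CSV file with a single "blocklist" column.
# For a real file, use: open(path, newline="") instead of io.StringIO.
legacy_csv = io.StringIO("blocklist\nCells_MyFeature\nNuclei_MyFeature\n")

reader = csv.DictReader(legacy_csv)
# Collect the non-empty entries of the "blocklist" column, in file order.
legacy_features = [row["blocklist"] for row in reader if row["blocklist"]]

# The list can then be passed through the non-deprecated parameter, e.g.:
# features = get_blocklist_features(blocklist=legacy_features, population_df=profiles)
```

The same list could also be given to Blocklist via features_to_block if it needs to be combined with a named list.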

Returns:

blocklist_features – Feature names to exclude from downstream analysis.

Return type:

list of str
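The documented resolution rules (explicit blocklist first, then blocklist_name, then the "default" fallback, followed by optional filtering against the DataFrame columns) can be sketched as follows. This is a simplified model of the behavior described above, not the library's code: `resolve_features`, the `registry` dict, and its contents are hypothetical, and a plain list of column names stands in for population_df.

```python
DEFAULT_NAME = "default"  # mirrors DEFAULT_BLOCKLIST_NAME as described in the text


def resolve_features(registry, blocklist=None, blocklist_name=None, columns=None):
    """Resolve the inputs to a list of names, optionally filtered by columns."""
    if blocklist is None:
        # No explicit blocklist: use the named list(s), or fall back to the default.
        names = blocklist_name if blocklist_name is not None else DEFAULT_NAME
        if isinstance(names, str):
            names = [names]
        features = [feature for name in names for feature in registry[name]]
    elif isinstance(blocklist, str):
        # A single string is treated as one feature name.
        features = [blocklist]
    elif hasattr(blocklist, "to_list"):
        # Blocklist-like object exposing to_list().
        features = list(blocklist.to_list())
    else:
        features = list(blocklist)

    if columns is not None:
        # Keep only names that actually occur as columns.
        present = set(columns)
        features = [feature for feature in features if feature in present]
    return features


# Hypothetical registry and profile columns for illustration.
registry = {
    "default": ["Cells_Noisy_A", "Nuclei_Noisy_B"],
    "qc_fails": ["embedding_dim_42"],
}
columns = ["Metadata_Well", "Cells_Noisy_A", "embedding_dim_42", "Cells_Area"]

from_default = resolve_features(registry, columns=columns)
from_named = resolve_features(registry, blocklist_name="qc_fails", columns=columns)
from_explicit = resolve_features(registry, blocklist=["Cells_Area", "Not_A_Column"], columns=columns)
```

In this toy setup, `from_default` keeps only the default entry that appears in `columns`, and `from_explicit` drops `Not_A_Column` because it is not a column. Without `columns`, the resolved list is returned unfiltered.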

See also

Blocklist

Full reference for blocklist construction and customization.