Using Pycytominer from the command line interface (CLI)

Pycytominer ships with a full-featured command-line interface (CLI), so every pipeline step can be run directly from a terminal with no Python code required. This makes it easy to integrate pycytominer into shell scripts, Snakemake workflows, Nextflow pipelines, or any other automation tool that orchestrates file-based steps.

This tutorial covers all five CLI commands:

Command	Equivalent Python function
`pycytominer aggregate`	`aggregate()`
`pycytominer annotate`	`annotate()`
`pycytominer normalize`	`normalize()`
`pycytominer feature_select`	`feature_select()`
`pycytominer consensus`	`consensus()`

New to pycytominer? Read the Introduction to Pycytominer tutorial first to understand the pipeline concepts before running them from the command line.

flowchart LR
    sc["single_cells.parquet"]
    wp["well_profiles.parquet"]
    an["annotated.parquet"]
    no["normalized.parquet"]
    fs["selected.parquet"]
    co["consensus.parquet"]
    sc -->|"aggregate"| wp
    wp -->|"annotate"| an
    an -->|"normalize"| no
    no -->|"feature_select"| fs
    fs -->|"consensus"| co
    style sc fill:#f0d9fa,stroke:#88239A,color:#111
    style co fill:#f0d9fa,stroke:#88239A,color:#111
    style wp fill:#ffffff,stroke:#88239A,color:#111
    style an fill:#ffffff,stroke:#88239A,color:#111
    style no fill:#ffffff,stroke:#88239A,color:#111
    style fs fill:#ffffff,stroke:#88239A,color:#111

Prerequisites

# Recommended — with uv (faster)
uv pip install pycytominer

# Or with standard pip
pip install pycytominer

After installation the pycytominer command is available in your shell. Verify it is on your PATH and see all available sub-commands:

pycytominer

Tip: try before you install with `uvx`. If you use uv, you can run any pycytominer CLI command immediately without a permanent install:
uvx pycytominer aggregate --help
uvx creates an isolated environment, installs pycytominer into it, runs the command, and discards the environment, all in one step. It is the fastest way to explore the CLI or script a one-off pipeline step on a new machine.

[1]:

# List all available sub-commands
!pycytominer

NAME
    pycytominer - Command Line Interface for Pycytominer operations.

SYNOPSIS
    pycytominer COMMAND

DESCRIPTION
    Command Line Interface for Pycytominer operations.

COMMANDS
    COMMAND is one of the following:

     aggregate
       Aggregate profiles from a file and write the results to disk.

     annotate
       Annotate profiles using a platemap file and write output.

     consensus
       Create consensus profiles from a file and write output.

     feature_select
       Select features from profiles and write the results to disk.

     normalize
       Normalize profiles from a file and write the results to disk.

[2]:

# Show all options for the aggregate sub-command
!pycytominer aggregate --help

INFO: Showing help with the command 'pycytominer aggregate -- --help'.

NAME
    pycytominer aggregate - Aggregate profiles from a file and write the results to disk.

SYNOPSIS
    pycytominer aggregate PROFILES OUTPUT_FILE <flags>

DESCRIPTION
    Aggregate profiles from a file and write the results to disk.

POSITIONAL ARGUMENTS
    PROFILES
        Type: 'str'
        Path to the input profiles file.
    OUTPUT_FILE
        Type: 'str'
        Path to the output file to write.

FLAGS
    --strata=STRATA
        Type: 'str | Sequence[str]'
        Default: 'Metadata_Plate,Metad...
        Metadata columns to aggregate by.
    --features=FEATURES
        Type: 'str | Sequence[str]'
        Default: 'infer'
        Feature list or "infer" to infer CellProfiler features.
    -i, --image_features=IMAGE_FEATURES
        Type: 'bool'
        Default: False
        Whether inferred features should include numeric image features.
    --operation=OPERATION
        Type: 'str'
        Default: 'median'
        Aggregation operation ("median" or "mean").
    --output_type=OUTPUT_TYPE
        Type: "Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr'] | None"
        Default: 'csv'
        Output type to write.
    --compute_object_count=COMPUTE_OBJECT_COUNT
        Type: 'bool'
        Default: False
        Whether to compute object counts.
    --object_feature=OBJECT_FEATURE
        Type: 'str'
        Default: 'Metadata_ObjectNumber'
        Column used for object counting.
    --subset_data_file=SUBSET_DATA_FILE
        Type: Optional['str | None']
        Default: None
        Optional path to a subset dataframe for filtering.
    --compression_options=COMPRESSION_OPTIONS
        Type: Optional['str | di...
        Default: None
        Compression options for writing output.
    --float_format=FLOAT_FORMAT
        Type: Optional['str | None']
        Default: None
        Decimal precision for output formatting.

NOTES
    You can also use flags syntax for POSITIONAL ARGUMENTS

Sample Data

The CLI reads and writes files; both CSV and Parquet are supported as input. Below we generate the same synthetic Cell Painting dataset used in the Introduction to Pycytominer tutorial and save it to a temporary working directory as a Parquet file.

In a real experiment you would replace single_cells.parquet with the output from CellProfiler or CytoTable.

The simulation code is in the block below. Skip ahead if you just want to follow the CLI steps.

import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# ── Temporary working directory ────────────────────────────────────────────
workdir = Path(tempfile.mkdtemp()).resolve()

# ── Synthetic single-cell data ─────────────────────────────────────────────
WELLS = {
    "B02": "DMSO",       "C02": "DMSO",
    "B03": "Compound_A", "C03": "Compound_A",
    "B04": "Compound_B", "C04": "Compound_B",
}
N = 100

rows = []
for img_num, (well, treatment) in enumerate(WELLS.items(), start=1):
    is_a = float(treatment == "Compound_A")
    is_b = float(treatment == "Compound_B")
    cell_areas = rng.normal(500 + 180 * is_a - 90 * is_b, 120, N)
    for obj_num in range(1, N + 1):
        rows.append({
            "Metadata_Plate": "Plate_1",
            "Metadata_Well":  well,
            "Metadata_ImageNumber": img_num,
            "Metadata_ObjectNumber": obj_num,
            "Cells_AreaShape_Area":          cell_areas[obj_num - 1],
            "Cells_AreaShape_BoundingBoxArea": cell_areas[obj_num - 1] * 1.3 + rng.normal(0, 4),
            "Cells_AreaShape_EulerNumber":    1,
            "Cells_AreaShape_Eccentricity":   float(np.clip(rng.normal(0.55, 0.12), 0, 1)),
            "Cells_Intensity_MeanIntensity_Mito":      rng.normal(0.30, 0.06),
            "Cells_Texture_Correlation_RNA_3_0_256":   rng.normal(0.22, 0.06),
            "Cytoplasm_AreaShape_Area":                rng.normal(310, 80),
            "Cytoplasm_Intensity_MeanIntensity_AGP":   rng.normal(0.25, 0.07),
            "Nuclei_AreaShape_Area":                   rng.normal(195, 55),
            "Nuclei_AreaShape_Eccentricity":  float(np.clip(rng.normal(0.40, 0.10), 0, 1)),
            "Nuclei_Intensity_MeanIntensity_DNA":      rng.normal(0.50, 0.08),
        })

sc_path = workdir / "single_cells.parquet"
pd.DataFrame(rows).to_parquet(sc_path, index=False)
print(f"Saved {len(rows):,} single cells to {sc_path.name}")

Step 1: Aggregate

pycytominer aggregate collapses single-cell rows into one representative profile per well by taking the median (or mean) of each feature across all cells in that well.

Key arguments:

--profiles: input file (CSV or Parquet)
--output_file: where to write the result
--strata: comma-delimited metadata columns that define each group (default: Metadata_Plate,Metadata_Well)
--operation: aggregation function, either median (default) or mean
--output_type: output format, one of csv (default), parquet, anndata_h5ad, or anndata_zarr

[4]:

!pycytominer aggregate --profiles {workdir}/single_cells.parquet --output_file {workdir}/well_profiles.parquet --strata "Metadata_Plate,Metadata_Well" --operation median --output_type parquet 2>&1 | sed "s|{workdir}/||g"

Wrote output file: well_profiles.parquet
well_profiles.parquet

[5]:

wp = pd.read_parquet(workdir / "well_profiles.parquet")
print(f"Well profiles: {wp.shape}  (one row per well)")
wp.head(3)

Well profiles: (6, 13)  (one row per well)

[5]:

	Metadata_Plate	Metadata_Well	Cells_AreaShape_Area	Cells_AreaShape_BoundingBoxArea	Cells_AreaShape_EulerNumber	Cells_AreaShape_Eccentricity	Cells_Intensity_MeanIntensity_Mito	Cells_Texture_Correlation_RNA_3_0_256	Cytoplasm_AreaShape_Area	Cytoplasm_Intensity_MeanIntensity_AGP	Nuclei_AreaShape_Area	Nuclei_AreaShape_Eccentricity	Nuclei_Intensity_MeanIntensity_DNA
0	Plate_1	B02	499.741578	646.410141	1.0	0.551590	0.305235	0.221010	309.361769	0.252230	191.121017	0.407695	0.492709
1	Plate_1	B03	689.065353	895.200860	1.0	0.550686	0.304796	0.223964	319.691855	0.250131	190.228310	0.394803	0.508586
2	Plate_1	B04	406.933246	529.871038	1.0	0.535506	0.287034	0.229690	330.455137	0.254138	189.392536	0.394729	0.509548
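
To make the grouping concrete, here is a minimal, dependency-free sketch of what a median aggregation over strata computes. It is not pycytominer's implementation; the helper name `aggregate_median` and the toy values are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def aggregate_median(rows, strata, features):
    """Collapse rows into one median profile per unique combination of strata values.

    Hypothetical helper for illustration; not part of pycytominer.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in strata)
        groups[key].append(row)

    profiles = []
    for key in sorted(groups):
        profile = dict(zip(strata, key))
        for feature in features:
            profile[feature] = median(r[feature] for r in groups[key])
        profiles.append(profile)
    return profiles


# Toy single-cell table (made-up values): three cells in B02, two cells in B03.
cells = [
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 480.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 500.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 530.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 690.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 700.0},
]

well_profiles = aggregate_median(
    cells,
    strata=["Metadata_Plate", "Metadata_Well"],
    features=["Cells_AreaShape_Area"],
)
for profile in well_profiles:
    print(profile)
```

Five cell rows collapse to two well rows, one per unique (Metadata_Plate, Metadata_Well) pair, which mirrors how 600 simulated cells become 6 well profiles above.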

Step 2: Annotate

pycytominer annotate joins a plate map file onto the well profiles, adding columns such as treatment, cell line, and concentration. The plate map is a CSV (or any tabular format) where each row describes one well.

Key arguments:

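--platemap: path to the plate map file
--join_on: two comma-delimited column names, platemap_col,profiles_col (default: Metadata_well_position,Metadata_Well)
--add_metadata_id_to_platemap: prefix plate map columns with Metadata_ (default: True). Because the prefix is added to the plate map columns, the join below references Metadata_well_position even though the plate map CSV column is named well_position.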

[6]:

# Create the plate map CSV
platemap = pd.DataFrame({
    "well_position": ["B02", "C02", "B03", "C03", "B04", "C04"],
    "treatment": [
        "DMSO",
        "DMSO",
        "Compound_A",
        "Compound_A",
        "Compound_B",
        "Compound_B",
    ],
    "cell_line": ["HeLa"] * 6,
    "concentration_um": [0.0, 0.0, 10.0, 10.0, 5.0, 5.0],
})
platemap.to_csv(workdir / "platemap.csv", index=False)
platemap

[6]:

	well_position	treatment	cell_line	concentration_um
0	B02	DMSO	HeLa	0.0
1	C02	DMSO	HeLa	0.0
2	B03	Compound_A	HeLa	10.0
3	C03	Compound_A	HeLa	10.0
4	B04	Compound_B	HeLa	5.0
5	C04	Compound_B	HeLa	5.0

[7]:

!pycytominer annotate --profiles {workdir}/well_profiles.parquet --platemap {workdir}/platemap.csv --output_file {workdir}/annotated.parquet --join_on "Metadata_well_position,Metadata_Well" --output_type parquet 2>&1 | sed "s|{workdir}/||g"

Wrote output file: annotated.parquet
annotated.parquet

[8]:

ann = pd.read_parquet(workdir / "annotated.parquet")
print(f"Annotated profiles: {ann.shape}")
ann[[c for c in ann.columns if c.startswith("Metadata_")]].head(3)

Annotated profiles: (6, 16)

[8]:

	Metadata_treatment	Metadata_cell_line	Metadata_concentration_um	Metadata_Plate	Metadata_Well
0	DMSO	HeLa	0.0	Plate_1	B02
1	DMSO	HeLa	0.0	Plate_1	C02
2	Compound_A	HeLa	10.0	Plate_1	B03
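
The join can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not pycytominer's code: the helper `annotate_sketch` is hypothetical, it assumes every profiled well appears in the plate map, and it drops the plate map's join column after merging because that column is absent from the output shown above.

```python
def annotate_sketch(profiles, platemap, join_on=("Metadata_well_position", "Metadata_Well")):
    """Join plate map rows onto profiles (both given as lists of dicts).

    Hypothetical helper for illustration; not part of pycytominer.
    """
    platemap_col, profiles_col = join_on

    # Prefix plate map columns with "Metadata_" unless they already carry it.
    prefixed = [
        {(k if k.startswith("Metadata_") else f"Metadata_{k}"): v for k, v in row.items()}
        for row in platemap
    ]
    by_well = {row[platemap_col]: row for row in prefixed}

    annotated = []
    for profile in profiles:
        # A KeyError here means a profiled well is missing from the plate map.
        extra = {k: v for k, v in by_well[profile[profiles_col]].items() if k != platemap_col}
        # Plate map metadata first, then the original profile columns.
        annotated.append({**extra, **profile})
    return annotated


# Toy inputs (made-up values) using the same column names as the tutorial.
platemap = [
    {"well_position": "B02", "treatment": "DMSO"},
    {"well_position": "B03", "treatment": "Compound_A"},
]
profiles = [
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 499.7},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 689.1},
]

annotated = annotate_sketch(profiles, platemap)
for row in annotated:
    print(row)
```

Each output row gains a Metadata_treatment column, while the plate map's own well column is used only as the join key.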

Step 3: Normalize

pycytominer normalize scales features to a common range and limits plate-to-plate technical variation. Z-scoring against DMSO control wells (--samples) is the most common approach.

Key arguments:

--samples: a pandas query string selecting the normalization reference. Use all to normalize against the entire plate.
--method: normalization method, one of standardize (z-score, default), robustize (median and interquartile range), mad_robustize (median and MAD), or spherize

[9]:

!pycytominer normalize --profiles {workdir}/annotated.parquet --output_file {workdir}/normalized.parquet --samples "Metadata_treatment == 'DMSO'" --method standardize --output_type parquet 2>&1 | sed "s|{workdir}/||g"

Wrote output file: normalized.parquet
normalized.parquet

[10]:

norm = pd.read_parquet(workdir / "normalized.parquet")
print(f"Normalized profiles: {norm.shape}")
norm.head(3)

Normalized profiles: (6, 16)

[10]:

	Metadata_treatment	Metadata_cell_line	Metadata_concentration_um	Metadata_Plate	Metadata_Well	Cells_AreaShape_Area	Cells_AreaShape_BoundingBoxArea	Cells_AreaShape_Eccentricity	Cells_Intensity_MeanIntensity_Mito	Cells_Texture_Correlation_RNA_3_0_256	Cytoplasm_AreaShape_Area	Cytoplasm_Intensity_MeanIntensity_AGP	Nuclei_AreaShape_Area	Nuclei_AreaShape_Eccentricity	Nuclei_Intensity_MeanIntensity_DNA
0	DMSO	HeLa	0.0	Plate_1	B02	-1.000000	-1.00000	1.000000	1.000000	1.000000	1.00000	1.000000	1.000000	1.000000	1.000000
1	DMSO	HeLa	0.0	Plate_1	C02	1.000000	1.00000	-1.000000	-1.000000	-1.000000	-1.00000	-1.000000	-1.000000	-1.000000	-1.000000
2	Compound_A	HeLa	10.0	Plate_1	B03	52.302035	42.69753	0.020332	0.694833	4.413158	2.71186	0.309624	0.829585	-1.378075	8.707708
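
The DMSO rows above are exactly -1 or +1 because only two control wells define the reference. The sketch below reproduces that effect for a single feature. It assumes z-scoring with the population standard deviation (ddof=0), which is consistent with the output shown; the helper name `standardize_against` and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_against(rows, feature, is_reference):
    """Z-score one feature using the mean and population std of the reference rows.

    Hypothetical helper for illustration; not part of pycytominer.
    A constant reference (std of zero) would need a guard that is omitted here.
    """
    reference = [r[feature] for r in rows if is_reference(r)]
    center, scale = mean(reference), pstdev(reference)
    return [(r[feature] - center) / scale for r in rows]


# Toy well profiles (made-up values): two DMSO controls and one treated well.
wells = [
    {"Metadata_treatment": "DMSO", "Cells_AreaShape_Area": 499.7},
    {"Metadata_treatment": "DMSO", "Cells_AreaShape_Area": 503.3},
    {"Metadata_treatment": "Compound_A", "Cells_AreaShape_Area": 689.1},
]

z = standardize_against(
    wells,
    "Cells_AreaShape_Area",
    is_reference=lambda r: r["Metadata_treatment"] == "DMSO",
)
print([round(v, 2) for v in z])
```

With two reference values, each one sits exactly one population standard deviation from their mean, so the controls always land at -1 and +1. Treated wells can then show very large z-scores, as in the Compound_A row above. With more control replicates the reference spread is estimated more reliably.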

Step 4: Feature Select

pycytominer feature_select removes uninformative features. Multiple operations can be applied in one call by passing a comma-delimited list.

Key arguments:

--operation: comma-delimited list of operations to apply, chosen from:
- variance_threshold: drop near-constant features
- correlation_threshold: drop one of each highly correlated pair
- blocklist: drop features known to be unreliable across assays
- drop_na_columns: drop columns with too many missing values
- noise_removal: remove features with low signal-to-noise ratio

[11]:

!pycytominer feature_select --profiles {workdir}/normalized.parquet --output_file {workdir}/selected.parquet --operation "variance_threshold,correlation_threshold,blocklist" --output_type parquet 2>&1 | sed "s|{workdir}/||g"

Wrote output file: selected.parquet
selected.parquet

[12]:

sel = pd.read_parquet(workdir / "selected.parquet")
feat_before = [c for c in norm.columns if not c.startswith("Metadata_")]
feat_after = [c for c in sel.columns if not c.startswith("Metadata_")]
print(f"Features: {len(feat_before)} -> {len(feat_after)}")
print(f"Removed:  {set(feat_before) - set(feat_after)}")
sel.head(3)

Features: 11 -> 8
Removed:  {'Cells_AreaShape_Area', 'Cells_AreaShape_EulerNumber', 'Cells_Texture_Correlation_RNA_3_0_256'}

[12]:

	Metadata_treatment	Metadata_cell_line	Metadata_concentration_um	Metadata_Plate	Metadata_Well	Cells_AreaShape_BoundingBoxArea	Cells_AreaShape_Eccentricity	Cells_Intensity_MeanIntensity_Mito	Cytoplasm_AreaShape_Area	Cytoplasm_Intensity_MeanIntensity_AGP	Nuclei_AreaShape_Area	Nuclei_AreaShape_Eccentricity	Nuclei_Intensity_MeanIntensity_DNA
0	DMSO	HeLa	0.0	Plate_1	B02	-1.00000	1.000000	1.000000	1.00000	1.000000	1.000000	1.000000	1.000000
1	DMSO	HeLa	0.0	Plate_1	C02	1.00000	-1.000000	-1.000000	-1.00000	-1.000000	-1.000000	-1.000000	-1.000000
2	Compound_A	HeLa	10.0	Plate_1	B03	42.69753	0.020332	0.694833	2.71186	0.309624	0.829585	-1.378075	8.707708
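
As a rough intuition for two of these operations, the sketch below drops constant features and then one member of each highly correlated pair. These are simplified stand-ins, not pycytominer's actual criteria: the helper names, the 0.9 cutoff, the rule for which member of a pair is dropped, and the toy values are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features_sketch(table, corr_threshold=0.9):
    """Return feature names that survive two simplified filters.

    `table` maps feature name -> list of values (one per profile).
    Hypothetical helper for illustration; not part of pycytominer.
    """
    # 1. Drop constant features (a crude stand-in for a variance filter).
    varying = [name for name, values in table.items() if len(set(values)) > 1]

    # 2. Keep a feature only if it is not highly correlated with one already kept.
    selected = []
    for name in varying:
        if all(abs(pearson(table[name], table[kept])) <= corr_threshold for kept in selected):
            selected.append(name)
    return selected


# Toy profiles (made-up values): BoundingBoxArea tracks Area; EulerNumber is constant.
table = {
    "Cells_AreaShape_Area": [500.0, 690.0, 410.0, 505.0],
    "Cells_AreaShape_BoundingBoxArea": [651.0, 896.0, 530.0, 660.0],
    "Cells_AreaShape_EulerNumber": [1.0, 1.0, 1.0, 1.0],
    "Nuclei_AreaShape_Eccentricity": [0.41, 0.39, 0.40, 0.42],
}

selected = select_features_sketch(table)
print(selected)
```

This greedy version keeps whichever correlated feature comes first, whereas the run above kept Cells_AreaShape_BoundingBoxArea and dropped Cells_AreaShape_Area, so pycytominer's choice within a pair evidently follows a different rule.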

Step 5: Consensus

pycytominer consensus collapses replicate wells into one profile per biological condition by taking the median (or modz) across replicates.

Key arguments:

--replicate_columns: comma-delimited metadata columns that identify a unique condition (replicates share all of these values)
--operation: median (default), mean, or modz (moderated z-score, recommended for large screens)

[13]:

!pycytominer consensus --profiles {workdir}/selected.parquet --output_file {workdir}/consensus.parquet --replicate_columns "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um" --operation median --output_type parquet 2>&1 | sed "s|{workdir}/||g"

Wrote output file: consensus.parquet
consensus.parquet

[14]:

cons = pd.read_parquet(workdir / "consensus.parquet")
print(f"Consensus profiles: {cons.shape}  (one row per condition)")
cons[[c for c in cons.columns if c.startswith("Metadata_")]]

Consensus profiles: (3, 11)  (one row per condition)

[14]:

	Metadata_treatment	Metadata_cell_line	Metadata_concentration_um
0	Compound_A	HeLa	10.0
1	Compound_B	HeLa	5.0
2	DMSO	HeLa	0.0

Summary

You ran the full pycytominer pipeline using only command-line calls:

pycytominer aggregate      --profiles single_cells.parquet  --output_file well_profiles.parquet --output_type parquet --strata "Metadata_Plate,Metadata_Well"
pycytominer annotate       --profiles well_profiles.parquet --output_file annotated.parquet     --output_type parquet --platemap platemap.csv
pycytominer normalize      --profiles annotated.parquet     --output_file normalized.parquet    --output_type parquet --samples "Metadata_treatment == 'DMSO'"
pycytominer feature_select --profiles normalized.parquet    --output_file selected.parquet      --output_type parquet --operation "variance_threshold,correlation_threshold,blocklist"
pycytominer consensus      --profiles selected.parquet      --output_file consensus.parquet     --output_type parquet --replicate_columns "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um"

Tips for scripting

List all commands with pycytominer; get full option docs with pycytominer COMMAND --help
Chain into Bash scripts or Makefile targets for reproducible pipelines
Query strings in --samples follow pandas query syntax; any valid pandas query expression works
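
If you prefer to drive the CLI from a Python script rather than Bash, one possible approach is to build each command as an argument list and hand it to `subprocess`. The sketch below is an illustration only: the helpers `build_pipeline` and `run_pipeline` are hypothetical, the flags are the ones used in this tutorial, and the `data` directory is a placeholder. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(workdir):
    """Return the five tutorial commands as argv lists (hypothetical helper)."""
    w = Path(workdir)

    def step(command, src, dst, *extra):
        # Shared flags first, then the step-specific ones.
        return [
            "pycytominer", command,
            "--profiles", str(w / src),
            "--output_file", str(w / dst),
            "--output_type", "parquet",
            *extra,
        ]

    return [
        step("aggregate", "single_cells.parquet", "well_profiles.parquet",
             "--strata", "Metadata_Plate,Metadata_Well", "--operation", "median"),
        step("annotate", "well_profiles.parquet", "annotated.parquet",
             "--platemap", str(w / "platemap.csv"),
             "--join_on", "Metadata_well_position,Metadata_Well"),
        step("normalize", "annotated.parquet", "normalized.parquet",
             "--samples", "Metadata_treatment == 'DMSO'", "--method", "standardize"),
        step("feature_select", "normalized.parquet", "selected.parquet",
             "--operation", "variance_threshold,correlation_threshold,blocklist"),
        step("consensus", "selected.parquet", "consensus.parquet",
             "--replicate_columns",
             "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um",
             "--operation", "median"),
    ]


def run_pipeline(workdir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_pipeline(workdir):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


commands = build_pipeline("data")
run_pipeline("data", dry_run=True)
```

Passing an argument list instead of a single shell string means the query in --samples needs no extra shell quoting, and `check=True` prevents later steps from running on missing or stale files.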