Using Pycytominer from the command line interface (CLI)

Pycytominer ships with a full-featured command-line interface (CLI), so every pipeline step can be run directly from a terminal with no Python code required. This makes it easy to integrate pycytominer into shell scripts, Snakemake workflows, Nextflow pipelines, or any other automation tool that orchestrates file-based steps.

This tutorial covers all five CLI commands:

Command                       Equivalent Python function
pycytominer aggregate         aggregate()
pycytominer annotate          annotate()
pycytominer normalize         normalize()
pycytominer feature_select    feature_select()
pycytominer consensus         consensus()

New to pycytominer? Read the Introduction to Pycytominer tutorial first to understand the pipeline concepts before running them from the command line.

Pipeline: single_cells.parquet -> (aggregate) -> well_profiles.parquet -> (annotate) -> annotated.parquet -> (normalize) -> normalized.parquet -> (feature_select) -> selected.parquet -> (consensus) -> consensus.parquet

Prerequisites

# Recommended — with uv (faster)
uv pip install pycytominer

# Or with standard pip
pip install pycytominer

After installation the pycytominer command is available in your shell. Verify it is on your PATH and see all available sub-commands:

pycytominer

Tip: try before you install with uvx. If you use uv, you can run any pycytominer CLI command immediately without a permanent install:

uvx pycytominer aggregate --help

uvx creates an isolated environment, installs pycytominer into it, runs the command, and discards the environment, all in one step. It is the fastest way to explore the CLI or script a one-off pipeline step on a new machine.

[1]:
# List all available sub-commands
!pycytominer
NAME
    pycytominer - Command Line Interface for Pycytominer operations.

SYNOPSIS
    pycytominer COMMAND

DESCRIPTION
    Command Line Interface for Pycytominer operations.

COMMANDS
    COMMAND is one of the following:

     aggregate
       Aggregate profiles from a file and write the results to disk.

     annotate
       Annotate profiles using a platemap file and write output.

     consensus
       Create consensus profiles from a file and write output.

     feature_select
       Select features from profiles and write the results to disk.

     normalize
       Normalize profiles from a file and write the results to disk.
[2]:
# Show all options for the aggregate sub-command
!pycytominer aggregate --help
INFO: Showing help with the command 'pycytominer aggregate -- --help'.

NAME
    pycytominer aggregate - Aggregate profiles from a file and write the results to disk.

SYNOPSIS
    pycytominer aggregate PROFILES OUTPUT_FILE <flags>

DESCRIPTION
    Aggregate profiles from a file and write the results to disk.

POSITIONAL ARGUMENTS
    PROFILES
        Type: 'str'
        Path to the input profiles file.
    OUTPUT_FILE
        Type: 'str'
        Path to the output file to write.

FLAGS
    --strata=STRATA
        Type: 'str | Sequence[str]'
        Default: 'Metadata_Plate,Metad...
        Metadata columns to aggregate by.
    --features=FEATURES
        Type: 'str | Sequence[str]'
        Default: 'infer'
        Feature list or "infer" to infer CellProfiler features.
    -i, --image_features=IMAGE_FEATURES
        Type: 'bool'
        Default: False
        Whether inferred features should include numeric image features.
    --operation=OPERATION
        Type: 'str'
        Default: 'median'
        Aggregation operation ("median" or "mean").
    --output_type=OUTPUT_TYPE
        Type: "Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr'] | None"
        Default: 'csv'
        Output type to write.
    --compute_object_count=COMPUTE_OBJECT_COUNT
        Type: 'bool'
        Default: False
        Whether to compute object counts.
    --object_feature=OBJECT_FEATURE
        Type: 'str'
        Default: 'Metadata_ObjectNumber'
        Column used for object counting.
    --subset_data_file=SUBSET_DATA_FILE
        Type: Optional['str | None']
        Default: None
        Optional path to a subset dataframe for filtering.
    --compression_options=COMPRESSION_OPTIONS
        Type: Optional['str | di...
        Default: None
        Compression options for writing output.
    --float_format=FLOAT_FORMAT
        Type: Optional['str | None']
        Default: None
        Decimal precision for output formatting.

NOTES
    You can also use flags syntax for POSITIONAL ARGUMENTS

Sample Data

The CLI reads and writes files; both CSV and Parquet are supported as input. Below we generate the same synthetic Cell Painting dataset used in the Introduction to Pycytominer tutorial and save it to a temporary working directory as Parquet files.

In a real experiment you would replace single_cells.parquet with the output from CellProfiler or CytoTable.

The simulation code follows; skip ahead if you just want to follow the CLI steps.

import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# ── Temporary working directory ────────────────────────────────────────────
workdir = Path(tempfile.mkdtemp()).resolve()

# ── Synthetic single-cell data ─────────────────────────────────────────────
WELLS = {
    "B02": "DMSO",       "C02": "DMSO",
    "B03": "Compound_A", "C03": "Compound_A",
    "B04": "Compound_B", "C04": "Compound_B",
}
N = 100

rows = []
for img_num, (well, treatment) in enumerate(WELLS.items(), start=1):
    is_a = float(treatment == "Compound_A")
    is_b = float(treatment == "Compound_B")
    cell_areas = rng.normal(500 + 180 * is_a - 90 * is_b, 120, N)
    for obj_num in range(1, N + 1):
        rows.append({
            "Metadata_Plate": "Plate_1",
            "Metadata_Well":  well,
            "Metadata_ImageNumber": img_num,
            "Metadata_ObjectNumber": obj_num,
            "Cells_AreaShape_Area":          cell_areas[obj_num - 1],
            "Cells_AreaShape_BoundingBoxArea": cell_areas[obj_num - 1] * 1.3 + rng.normal(0, 4),
            "Cells_AreaShape_EulerNumber":    1,
            "Cells_AreaShape_Eccentricity":   float(np.clip(rng.normal(0.55, 0.12), 0, 1)),
            "Cells_Intensity_MeanIntensity_Mito":      rng.normal(0.30, 0.06),
            "Cells_Texture_Correlation_RNA_3_0_256":   rng.normal(0.22, 0.06),
            "Cytoplasm_AreaShape_Area":                rng.normal(310, 80),
            "Cytoplasm_Intensity_MeanIntensity_AGP":   rng.normal(0.25, 0.07),
            "Nuclei_AreaShape_Area":                   rng.normal(195, 55),
            "Nuclei_AreaShape_Eccentricity":  float(np.clip(rng.normal(0.40, 0.10), 0, 1)),
            "Nuclei_Intensity_MeanIntensity_DNA":      rng.normal(0.50, 0.08),
        })

sc_path = workdir / "single_cells.parquet"
pd.DataFrame(rows).to_parquet(sc_path, index=False)
print(f"Saved {len(rows):,} single cells to {sc_path.name}")

Step 1: Aggregate

pycytominer aggregate collapses single-cell rows into one representative profile per well by taking the median (or mean) of each feature across all cells in that well.

Key arguments:

  • --profiles: input file (CSV or Parquet); may also be given as the first positional argument

  • --output_file: where to write the result; may also be given as the second positional argument

  • --strata: comma-delimited metadata columns that define each group (default: Metadata_Plate,Metadata_Well)

  • --operation: aggregation function, either median (default) or mean

  • --output_type: csv (default), parquet, anndata_h5ad, or anndata_zarr
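
To make the grouping concrete, here is a dependency-free sketch of what a median aggregation over --strata does conceptually. The helper name aggregate_median and the toy values are illustrative assumptions; this is not pycytominer's implementation.

```python
from collections import defaultdict
from statistics import median

# Toy single-cell rows (made-up values, not the tutorial dataset).
cells = [
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 480.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 500.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 520.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 690.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 700.0},
]


def aggregate_median(rows, strata, features):
    """Hypothetical helper: one median profile per unique combination of strata."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is the tuple of strata values, e.g. ("Plate_1", "B02").
        groups[tuple(row[col] for col in strata)].append(row)

    profiles = []
    for key, members in groups.items():
        profile = dict(zip(strata, key))
        for feature in features:
            profile[feature] = median(m[feature] for m in members)
        profiles.append(profile)
    return profiles


well_profiles = aggregate_median(
    cells,
    strata=["Metadata_Plate", "Metadata_Well"],
    features=["Cells_AreaShape_Area"],
)
for profile in well_profiles:
    print(profile)
```

Each output row carries the strata columns plus one summarized value per feature, which is why the aggregated table has one row per well.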

[4]:
!pycytominer aggregate --profiles {workdir}/single_cells.parquet --output_file {workdir}/well_profiles.parquet --strata "Metadata_Plate,Metadata_Well" --operation median --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: well_profiles.parquet
well_profiles.parquet
[5]:
wp = pd.read_parquet(workdir / "well_profiles.parquet")
print(f"Well profiles: {wp.shape}  (one row per well)")
wp.head(3)
Well profiles: (6, 13)  (one row per well)
[5]:
Metadata_Plate Metadata_Well Cells_AreaShape_Area Cells_AreaShape_BoundingBoxArea Cells_AreaShape_EulerNumber Cells_AreaShape_Eccentricity Cells_Intensity_MeanIntensity_Mito Cells_Texture_Correlation_RNA_3_0_256 Cytoplasm_AreaShape_Area Cytoplasm_Intensity_MeanIntensity_AGP Nuclei_AreaShape_Area Nuclei_AreaShape_Eccentricity Nuclei_Intensity_MeanIntensity_DNA
0 Plate_1 B02 499.741578 646.410141 1.0 0.551590 0.305235 0.221010 309.361769 0.252230 191.121017 0.407695 0.492709
1 Plate_1 B03 689.065353 895.200860 1.0 0.550686 0.304796 0.223964 319.691855 0.250131 190.228310 0.394803 0.508586
2 Plate_1 B04 406.933246 529.871038 1.0 0.535506 0.287034 0.229690 330.455137 0.254138 189.392536 0.394729 0.509548

Step 2: Annotate

pycytominer annotate joins a plate map file onto the well profiles, adding columns such as treatment, cell line, and concentration. The plate map is a CSV (or any tabular format) where each row describes one well.

Key arguments:

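  • --platemap: path to the plate map file

  • --join_on: two comma-delimited column names in the order platemap_col,profiles_col (default: Metadata_well_position,Metadata_Well)

  • --add_metadata_id_to_platemap: prefix plate map columns with Metadata_ (default: True). With the prefix applied, the plate map column well_position is referred to as Metadata_well_position in --join_on.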
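
The interplay between the Metadata_ prefix and --join_on is easier to see in code. The sketch below is a plain-Python illustration under the assumption that prefixing happens before the join; annotate_sketch is a hypothetical helper, not the library function.

```python
def annotate_sketch(profiles, platemap,
                    join_on=("Metadata_well_position", "Metadata_Well")):
    """Hypothetical helper: join plate map rows onto well profiles."""
    platemap_col, profiles_col = join_on

    # Prefix every plate map column that does not already start with "Metadata_".
    prefixed = [
        {(k if k.startswith("Metadata_") else f"Metadata_{k}"): v
         for k, v in row.items()}
        for row in platemap
    ]

    # Index plate map rows by the (now prefixed) join column.
    lookup = {row[platemap_col]: row for row in prefixed}

    annotated = []
    for profile in profiles:
        # Raises KeyError if a well is missing from the plate map.
        match = lookup[profile[profiles_col]]
        # Drop the plate map join column so the well is not listed twice.
        meta = {k: v for k, v in match.items() if k != platemap_col}
        annotated.append({**meta, **profile})
    return annotated


# Toy inputs (made-up values).
profiles = [
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 500.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 695.0},
]
platemap = [
    {"well_position": "B02", "treatment": "DMSO"},
    {"well_position": "B03", "treatment": "Compound_A"},
]

annotated = annotate_sketch(profiles, platemap)
for row in annotated:
    print(row)
```

Note that the plate map file itself uses the bare column name well_position; only the join specification needs the prefixed name.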

[6]:
# Create the plate map CSV
platemap = pd.DataFrame({
    "well_position": ["B02", "C02", "B03", "C03", "B04", "C04"],
    "treatment": [
        "DMSO",
        "DMSO",
        "Compound_A",
        "Compound_A",
        "Compound_B",
        "Compound_B",
    ],
    "cell_line": ["HeLa"] * 6,
    "concentration_um": [0.0, 0.0, 10.0, 10.0, 5.0, 5.0],
})
platemap.to_csv(workdir / "platemap.csv", index=False)
platemap
[6]:
well_position treatment cell_line concentration_um
0 B02 DMSO HeLa 0.0
1 C02 DMSO HeLa 0.0
2 B03 Compound_A HeLa 10.0
3 C03 Compound_A HeLa 10.0
4 B04 Compound_B HeLa 5.0
5 C04 Compound_B HeLa 5.0
[7]:
!pycytominer annotate --profiles {workdir}/well_profiles.parquet --platemap {workdir}/platemap.csv --output_file {workdir}/annotated.parquet --join_on "Metadata_well_position,Metadata_Well" --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: annotated.parquet
annotated.parquet
[8]:
ann = pd.read_parquet(workdir / "annotated.parquet")
print(f"Annotated profiles: {ann.shape}")
ann[[c for c in ann.columns if c.startswith("Metadata_")]].head(3)
Annotated profiles: (6, 16)
[8]:
Metadata_treatment Metadata_cell_line Metadata_concentration_um Metadata_Plate Metadata_Well
0 DMSO HeLa 0.0 Plate_1 B02
1 DMSO HeLa 0.0 Plate_1 C02
2 Compound_A HeLa 10.0 Plate_1 B03

Step 3: Normalize

pycytominer normalize scales features to a common range and limits plate-to-plate technical variation. Z-scoring against DMSO control wells (--samples) is the most common approach.

Key arguments:

  • --samples: a pandas query string selecting the normalization reference. Use "all" to normalize against the entire plate.

  • --method: normalization method, one of standardize (z-score, default), robustize (median and interquartile range), mad_robustize (median absolute deviation), or spherize
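
As a rough illustration of standardizing against control wells, the sketch below z-scores one feature using only the DMSO wells as the reference. It assumes a population standard deviation and a fallback divisor of 1 for constant features; both are assumptions about the scaler, chosen because they are consistent with the tutorial output that follows (the two DMSO wells at -1 and +1, the constant feature at 0.0). The values are made up.

```python
from statistics import mean, pstdev

# One feature measured in three wells (made-up values).
treatments = ["DMSO", "DMSO", "Compound_A"]
values = [10.0, 14.0, 20.0]

# Reference statistics come from the control wells only.
reference = [v for t, v in zip(treatments, values) if t == "DMSO"]
center = mean(reference)
# Population standard deviation; fall back to 1.0 when the reference is constant.
spread = pstdev(reference) or 1.0

# Every well, controls and treatments alike, is scaled with the same statistics.
z_scores = [(v - center) / spread for v in values]
print(z_scores)  # [-1.0, 1.0, 4.0]
```

With only two reference wells, the controls always land at exactly -1 and +1 under this scheme, so large treatment z-scores in a small demo should not be over-interpreted.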

[9]:
!pycytominer normalize --profiles {workdir}/annotated.parquet --output_file {workdir}/normalized.parquet --samples "Metadata_treatment == 'DMSO'" --method standardize --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: normalized.parquet
normalized.parquet
[10]:
norm = pd.read_parquet(workdir / "normalized.parquet")
print(f"Normalized profiles: {norm.shape}")
norm.head(3)
Normalized profiles: (6, 16)
[10]:
Metadata_treatment Metadata_cell_line Metadata_concentration_um Metadata_Plate Metadata_Well Cells_AreaShape_Area Cells_AreaShape_BoundingBoxArea Cells_AreaShape_EulerNumber Cells_AreaShape_Eccentricity Cells_Intensity_MeanIntensity_Mito Cells_Texture_Correlation_RNA_3_0_256 Cytoplasm_AreaShape_Area Cytoplasm_Intensity_MeanIntensity_AGP Nuclei_AreaShape_Area Nuclei_AreaShape_Eccentricity Nuclei_Intensity_MeanIntensity_DNA
0 DMSO HeLa 0.0 Plate_1 B02 -1.000000 -1.00000 0.0 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000
1 DMSO HeLa 0.0 Plate_1 C02 1.000000 1.00000 0.0 -1.000000 -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000
2 Compound_A HeLa 10.0 Plate_1 B03 52.302035 42.69753 0.0 0.020332 0.694833 4.413158 2.71186 0.309624 0.829585 -1.378075 8.707708

Step 4: Feature Select

pycytominer feature_select removes uninformative features. Multiple operations can be applied in one call by passing a comma-delimited list.

Key arguments:

  • --operation (comma-delimited list of operations to apply):

    • variance_threshold: drop near-constant features

    • correlation_threshold: drop one of each highly correlated pair

    • blocklist: drop features known to be unreliable across assays

    • drop_na_columns: drop columns with too many missing values

    • noise_removal: remove features with low signal-to-noise ratio
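
The first two operations can be pictured with a deliberately simplified sketch. It drops zero-variance features and then keeps the first feature of each highly correlated pair. pycytominer's actual criteria, tie-breaking, and default thresholds may differ; select_features, pearson, and the 0.9 cutoff are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(table, corr_threshold=0.9):
    """Hypothetical helper: return the names of the features that survive."""
    # Step 1: drop constant features (a crude stand-in for variance_threshold).
    varying = {name: vals for name, vals in table.items() if pvariance(vals) > 0}

    # Step 2: keep a feature only if it is not highly correlated with one
    # already kept (a crude stand-in for correlation_threshold).
    kept = []
    for name, vals in varying.items():
        if all(abs(pearson(vals, varying[k])) <= corr_threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table: column name -> values across four wells (made-up values).
features = {
    "Cells_AreaShape_Area": [1.0, 2.0, 3.0, 4.0],
    "Cells_AreaShape_BoundingBoxArea": [1.3, 2.6, 3.9, 5.2],  # proportional to Area
    "Cells_AreaShape_EulerNumber": [1.0, 1.0, 1.0, 1.0],      # constant
    "Nuclei_AreaShape_Area": [4.0, 1.0, 3.0, 2.0],
}

kept = select_features(features)
print(kept)
```

Which member of a correlated pair survives depends on the rule used; this sketch keeps whichever comes first, so it will not necessarily agree with the tutorial output on that point.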

[11]:
!pycytominer feature_select --profiles {workdir}/normalized.parquet --output_file {workdir}/selected.parquet --operation "variance_threshold,correlation_threshold,blocklist" --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: selected.parquet
selected.parquet
[12]:
sel = pd.read_parquet(workdir / "selected.parquet")
feat_before = [c for c in norm.columns if not c.startswith("Metadata_")]
feat_after = [c for c in sel.columns if not c.startswith("Metadata_")]
print(f"Features: {len(feat_before)} -> {len(feat_after)}")
print(f"Removed:  {set(feat_before) - set(feat_after)}")
sel.head(3)
Features: 11 -> 8
Removed:  {'Cells_AreaShape_Area', 'Cells_AreaShape_EulerNumber', 'Cells_Texture_Correlation_RNA_3_0_256'}
[12]:
Metadata_treatment Metadata_cell_line Metadata_concentration_um Metadata_Plate Metadata_Well Cells_AreaShape_BoundingBoxArea Cells_AreaShape_Eccentricity Cells_Intensity_MeanIntensity_Mito Cytoplasm_AreaShape_Area Cytoplasm_Intensity_MeanIntensity_AGP Nuclei_AreaShape_Area Nuclei_AreaShape_Eccentricity Nuclei_Intensity_MeanIntensity_DNA
0 DMSO HeLa 0.0 Plate_1 B02 -1.00000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000
1 DMSO HeLa 0.0 Plate_1 C02 1.00000 -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000
2 Compound_A HeLa 10.0 Plate_1 B03 42.69753 0.020332 0.694833 2.71186 0.309624 0.829585 -1.378075 8.707708

Step 5: Consensus

pycytominer consensus collapses replicate wells into one profile per biological condition by taking the median (or modz) across replicates.

Key arguments:

  • --replicate_columns: comma-delimited metadata columns that identify a unique condition (replicates share all of these values)

  • --operation: median (default), mean, or modz (moderated z-score, recommended for large screens)
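
Conceptually, a median consensus is another group-and-summarize step, this time keyed on the replicate columns. The sketch below uses sorted groups so the conditions come out in alphabetical order; consensus_median and the toy values are illustrative assumptions, not the library code.

```python
from itertools import groupby
from statistics import median

# Toy normalized well profiles (made-up values).
wells = [
    {"Metadata_treatment": "DMSO", "Metadata_Well": "B02", "Nuclei_AreaShape_Area": -1.0},
    {"Metadata_treatment": "DMSO", "Metadata_Well": "C02", "Nuclei_AreaShape_Area": 1.0},
    {"Metadata_treatment": "Compound_A", "Metadata_Well": "B03", "Nuclei_AreaShape_Area": 0.8},
    {"Metadata_treatment": "Compound_A", "Metadata_Well": "C03", "Nuclei_AreaShape_Area": 1.2},
]


def consensus_median(rows, replicate_columns, features):
    """Hypothetical helper: one median profile per unique replicate group."""

    def group_key(row):
        return tuple(row[col] for col in replicate_columns)

    profiles = []
    # groupby only merges adjacent rows, so sort by the same key first.
    for key, members in groupby(sorted(rows, key=group_key), key=group_key):
        members = list(members)
        profile = dict(zip(replicate_columns, key))
        for feature in features:
            profile[feature] = median(m[feature] for m in members)
        profiles.append(profile)
    return profiles


consensus = consensus_median(
    wells,
    replicate_columns=["Metadata_treatment"],
    features=["Nuclei_AreaShape_Area"],
)
for profile in consensus:
    print(profile)
```

Well-level columns that are not part of the replicate key, such as Metadata_Well here, do not appear in the result, since they no longer describe a single row.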

[13]:
!pycytominer consensus --profiles {workdir}/selected.parquet --output_file {workdir}/consensus.parquet --replicate_columns "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um" --operation median --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: consensus.parquet
consensus.parquet
[14]:
cons = pd.read_parquet(workdir / "consensus.parquet")
print(f"Consensus profiles: {cons.shape}  (one row per condition)")
cons[[c for c in cons.columns if c.startswith("Metadata_")]]
Consensus profiles: (3, 11)  (one row per condition)
[14]:
Metadata_treatment Metadata_cell_line Metadata_concentration_um
0 Compound_A HeLa 10.0
1 Compound_B HeLa 5.0
2 DMSO HeLa 0.0

Summary

You ran the full pycytominer pipeline using only command-line calls:

pycytominer aggregate      --profiles single_cells.parquet  --output_file well_profiles.parquet --output_type parquet --strata "Metadata_Plate,Metadata_Well"
pycytominer annotate       --profiles well_profiles.parquet --output_file annotated.parquet     --output_type parquet --platemap platemap.csv
pycytominer normalize      --profiles annotated.parquet     --output_file normalized.parquet    --output_type parquet --samples "Metadata_treatment == 'DMSO'"
pycytominer feature_select --profiles normalized.parquet    --output_file selected.parquet      --output_type parquet --operation "variance_threshold,correlation_threshold,blocklist"
pycytominer consensus      --profiles selected.parquet      --output_file consensus.parquet     --output_type parquet --replicate_columns "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um"
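
If you prefer to drive the same five calls from Python, one option is to build each command as an argument list and hand it to subprocess. This is a minimal sketch: build_command and RUN_PIPELINE are hypothetical names, only flags shown in this tutorial are used, and nothing is executed unless you flip the switch.

```python
import shlex
import subprocess


def build_command(subcommand, profiles, output_file, **flags):
    """Hypothetical helper: assemble one pycytominer CLI call as an argument list."""
    cmd = ["pycytominer", subcommand,
           "--profiles", str(profiles),
           "--output_file", str(output_file)]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


steps = [
    build_command("aggregate", "single_cells.parquet", "well_profiles.parquet",
                  strata="Metadata_Plate,Metadata_Well", output_type="parquet"),
    build_command("annotate", "well_profiles.parquet", "annotated.parquet",
                  platemap="platemap.csv", output_type="parquet"),
    build_command("normalize", "annotated.parquet", "normalized.parquet",
                  samples="Metadata_treatment == 'DMSO'", output_type="parquet"),
    build_command("feature_select", "normalized.parquet", "selected.parquet",
                  operation="variance_threshold,correlation_threshold,blocklist",
                  output_type="parquet"),
    build_command("consensus", "selected.parquet", "consensus.parquet",
                  replicate_columns="Metadata_treatment,Metadata_cell_line,Metadata_concentration_um",
                  output_type="parquet"),
]

# Set to True once pycytominer is installed and the input files exist.
RUN_PIPELINE = False

for cmd in steps:
    # shlex.join quotes arguments so the printed line can be pasted into a shell.
    print(shlex.join(cmd))
    if RUN_PIPELINE:
        # check=True raises CalledProcessError if a step fails, stopping the pipeline.
        subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with the query expression in --samples.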

Tips for scripting

  • List all commands with pycytominer; get full option docs with pycytominer COMMAND --help

  • Chain into Bash scripts or Makefile targets for reproducible pipelines

  • Query strings in --samples follow pandas query syntax, so any valid pandas query expression works
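
One way to check a --samples expression before using it in a script is to try it with DataFrame.query on a small table. The sketch below assumes the CLI forwards the string to pandas unchanged, as the tip above states; the table contents are made up.

```python
import pandas as pd

# Toy metadata table (made-up values).
profiles = pd.DataFrame({
    "Metadata_treatment": ["DMSO", "DMSO", "Compound_A", "Compound_B"],
    "Metadata_concentration_um": [0.0, 0.0, 10.0, 5.0],
})

# The expression used in this tutorial: select the DMSO control wells.
reference = profiles.query("Metadata_treatment == 'DMSO'")
print(len(reference))  # 2

# Expressions can combine several columns with "and" / "or".
combined = profiles.query(
    "Metadata_treatment == 'DMSO' or Metadata_concentration_um > 7"
)
print(len(combined))  # 3
```

An expression that selects zero rows here would also leave the normalization step without a reference, so this quick check can catch typos in column names or values.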