Using Pycytominer from the command line interface (CLI)¶
Pycytominer ships with a full-featured command-line interface (CLI), so every pipeline step can be run directly from a terminal with no Python code required. This makes it easy to integrate pycytominer into shell scripts, Snakemake workflows, Nextflow pipelines, or any other automation tool that orchestrates file-based steps.
This tutorial covers all five CLI commands:
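| Command | Equivalent Python function |
|---|---|
| pycytominer aggregate | pycytominer.aggregate |
| pycytominer annotate | pycytominer.annotate |
| pycytominer normalize | pycytominer.normalize |
| pycytominer feature_select | pycytominer.feature_select |
| pycytominer consensus | pycytominer.consensus |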
New to pycytominer? Read the Introduction to Pycytominer tutorial first to understand the pipeline concepts before running them from the command line.
Prerequisites¶
# Recommended — with uv (faster)
uv pip install pycytominer
# Or with standard pip
pip install pycytominer
After installation the pycytominer command is available in your shell. Verify it is on your PATH and see all available sub-commands:
pycytominer
Tip: try before you install with uvx. If you use uv, you can run any pycytominer CLI command immediately without a permanent install:
uvx pycytominer aggregate --help
uvx creates an isolated environment, installs pycytominer into it, runs the command, and discards the environment, all in one step. It is the fastest way to explore the CLI or script a one-off pipeline step on a new machine.
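A script that has to run on machines with and without a permanent install can check the PATH before choosing between the two invocation styles described above. The following is a minimal sketch using only the Python standard library; the helper name resolve_cli is hypothetical and is not part of pycytominer.

```python
import shutil


def resolve_cli(name="pycytominer"):
    """Return the argv prefix to use for the CLI, or None if nothing is available.

    Hypothetical helper: prefers an installed executable and falls back to uvx.
    """
    if shutil.which(name):
        return [name]
    if shutil.which("uvx"):
        return ["uvx", name]
    return None


prefix = resolve_cli()
if prefix is None:
    print("pycytominer not found; install it or install uv first")
else:
    print("Run CLI commands with:", " ".join(prefix))
```

The returned list can be used as the start of an argument list passed to subprocess.run.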
[1]:
# List all available sub-commands
!pycytominer
NAME
pycytominer - Command Line Interface for Pycytominer operations.
SYNOPSIS
pycytominer COMMAND
DESCRIPTION
Command Line Interface for Pycytominer operations.
COMMANDS
COMMAND is one of the following:
aggregate
Aggregate profiles from a file and write the results to disk.
annotate
Annotate profiles using a platemap file and write output.
consensus
Create consensus profiles from a file and write output.
feature_select
Select features from profiles and write the results to disk.
normalize
Normalize profiles from a file and write the results to disk.
[2]:
# Show all options for the aggregate sub-command
!pycytominer aggregate --help
INFO: Showing help with the command 'pycytominer aggregate -- --help'.
NAME
pycytominer aggregate - Aggregate profiles from a file and write the results to disk.
SYNOPSIS
pycytominer aggregate PROFILES OUTPUT_FILE <flags>
DESCRIPTION
Aggregate profiles from a file and write the results to disk.
POSITIONAL ARGUMENTS
PROFILES
Type: 'str'
Path to the input profiles file.
OUTPUT_FILE
Type: 'str'
Path to the output file to write.
FLAGS
--strata=STRATA
Type: 'str | Sequence[str]'
Default: 'Metadata_Plate,Metad...
Metadata columns to aggregate by.
--features=FEATURES
Type: 'str | Sequence[str]'
Default: 'infer'
Feature list or "infer" to infer CellProfiler features.
-i, --image_features=IMAGE_FEATURES
Type: 'bool'
Default: False
Whether inferred features should include numeric image features.
--operation=OPERATION
Type: 'str'
Default: 'median'
Aggregation operation ("median" or "mean").
--output_type=OUTPUT_TYPE
Type: "Literal['csv', 'parquet', 'anndata_h5ad', 'anndata_zarr'] | None"
Default: 'csv'
Output type to write.
--compute_object_count=COMPUTE_OBJECT_COUNT
Type: 'bool'
Default: False
Whether to compute object counts.
--object_feature=OBJECT_FEATURE
Type: 'str'
Default: 'Metadata_ObjectNumber'
Column used for object counting.
--subset_data_file=SUBSET_DATA_FILE
Type: Optional['str | None']
Default: None
Optional path to a subset dataframe for filtering.
--compression_options=COMPRESSION_OPTIONS
Type: Optional['str | di...
Default: None
Compression options for writing output.
--float_format=FLOAT_FORMAT
Type: Optional['str | None']
Default: None
Decimal precision for output formatting.
NOTES
You can also use flags syntax for POSITIONAL ARGUMENTS
Sample Data¶
The CLI reads and writes files; both CSV and Parquet are supported as input. Below we generate the same synthetic Cell Painting dataset used in the Introduction to Pycytominer tutorial and save it to a temporary working directory as Parquet files.
In a real experiment you would replace single_cells.parquet with the output from CellProfiler or CytoTable.
The simulation code is in the expandable block below. Skip ahead if you just want to follow the CLI steps.
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# ── Temporary working directory ────────────────────────────────────────────
workdir = Path(tempfile.mkdtemp()).resolve()

# ── Synthetic single-cell data ─────────────────────────────────────────────
WELLS = {
    "B02": "DMSO", "C02": "DMSO",
    "B03": "Compound_A", "C03": "Compound_A",
    "B04": "Compound_B", "C04": "Compound_B",
}
N = 100  # cells per well

rows = []
for img_num, (well, treatment) in enumerate(WELLS.items(), start=1):
    is_a = float(treatment == "Compound_A")
    is_b = float(treatment == "Compound_B")
    # Compound_A enlarges cells, Compound_B shrinks them
    cell_areas = rng.normal(500 + 180 * is_a - 90 * is_b, 120, N)
    for obj_num in range(1, N + 1):
        rows.append({
            "Metadata_Plate": "Plate_1",
            "Metadata_Well": well,
            "Metadata_ImageNumber": img_num,
            "Metadata_ObjectNumber": obj_num,
            "Cells_AreaShape_Area": cell_areas[obj_num - 1],
            "Cells_AreaShape_BoundingBoxArea": cell_areas[obj_num - 1] * 1.3 + rng.normal(0, 4),
            "Cells_AreaShape_EulerNumber": 1,
            "Cells_AreaShape_Eccentricity": float(np.clip(rng.normal(0.55, 0.12), 0, 1)),
            "Cells_Intensity_MeanIntensity_Mito": rng.normal(0.30, 0.06),
            "Cells_Texture_Correlation_RNA_3_0_256": rng.normal(0.22, 0.06),
            "Cytoplasm_AreaShape_Area": rng.normal(310, 80),
            "Cytoplasm_Intensity_MeanIntensity_AGP": rng.normal(0.25, 0.07),
            "Nuclei_AreaShape_Area": rng.normal(195, 55),
            "Nuclei_AreaShape_Eccentricity": float(np.clip(rng.normal(0.40, 0.10), 0, 1)),
            "Nuclei_Intensity_MeanIntensity_DNA": rng.normal(0.50, 0.08),
        })

sc_path = workdir / "single_cells.parquet"
pd.DataFrame(rows).to_parquet(sc_path, index=False)
print(f"Saved {len(rows):,} single cells to {sc_path.name}")
Step 1: Aggregate¶
pycytominer aggregate collapses single-cell rows into one representative profile per well by taking the median (or mean) of each feature across all cells in that well.
Key arguments:
- --profiles: input file (CSV or Parquet)
- --output_file: where to write the result
- --strata: comma-delimited metadata columns that define each group (default: Metadata_Plate,Metadata_Well)
- --operation: aggregation function, median (default) or mean
- --output_type: output format, csv (default) or parquet; the --help output above also lists anndata_h5ad and anndata_zarr
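To make the roles of --strata and --operation concrete, here is a conceptual sketch in plain Python of grouping single-cell rows by the strata columns and taking the per-group median. It is not pycytominer's implementation, and the six rows are hypothetical values chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical single-cell rows: two wells, three cells each, one feature.
cells = [
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 480.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 500.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 530.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 650.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 700.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 690.0},
]

strata = ["Metadata_Plate", "Metadata_Well"]  # plays the role of --strata
feature = "Cells_AreaShape_Area"

# Collect the feature values of every cell that shares the same strata values.
groups = defaultdict(list)
for row in cells:
    key = tuple(row[col] for col in strata)
    groups[key].append(row[feature])

# One representative value per group (the role of --operation median).
well_profiles = {key: median(values) for key, values in groups.items()}

for key, value in well_profiles.items():
    print(key, value)
```

Each distinct combination of strata values becomes one output row, which is why the six-well dataset in this tutorial yields six well profiles.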
[4]:
!pycytominer aggregate --profiles {workdir}/single_cells.parquet --output_file {workdir}/well_profiles.parquet --strata "Metadata_Plate,Metadata_Well" --operation median --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: well_profiles.parquet
well_profiles.parquet
[5]:
wp = pd.read_parquet(workdir / "well_profiles.parquet")
print(f"Well profiles: {wp.shape} (one row per well)")
wp.head(3)
Well profiles: (6, 13) (one row per well)
[5]:
| | Metadata_Plate | Metadata_Well | Cells_AreaShape_Area | Cells_AreaShape_BoundingBoxArea | Cells_AreaShape_EulerNumber | Cells_AreaShape_Eccentricity | Cells_Intensity_MeanIntensity_Mito | Cells_Texture_Correlation_RNA_3_0_256 | Cytoplasm_AreaShape_Area | Cytoplasm_Intensity_MeanIntensity_AGP | Nuclei_AreaShape_Area | Nuclei_AreaShape_Eccentricity | Nuclei_Intensity_MeanIntensity_DNA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Plate_1 | B02 | 499.741578 | 646.410141 | 1.0 | 0.551590 | 0.305235 | 0.221010 | 309.361769 | 0.252230 | 191.121017 | 0.407695 | 0.492709 |
| 1 | Plate_1 | B03 | 689.065353 | 895.200860 | 1.0 | 0.550686 | 0.304796 | 0.223964 | 319.691855 | 0.250131 | 190.228310 | 0.394803 | 0.508586 |
| 2 | Plate_1 | B04 | 406.933246 | 529.871038 | 1.0 | 0.535506 | 0.287034 | 0.229690 | 330.455137 | 0.254138 | 189.392536 | 0.394729 | 0.509548 |
Step 2: Annotate¶
pycytominer annotate joins a plate map file onto the well profiles, adding columns such as treatment, cell line, and concentration. The plate map is a CSV (or any tabular format) where each row describes one well.
Key arguments:
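- --platemap: path to the plate map file
- --join_on: two comma-delimited column names, platemap_col,profiles_col (default: Metadata_well_position,Metadata_Well)
- --add_metadata_id_to_platemap: prefix new columns with Metadata_ (default: True)

Note that the plate map created below names its well column well_position, while --join_on refers to it as Metadata_well_position. This suggests the Metadata_ prefix is applied to the plate map columns before the join.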
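The interaction between the Metadata_ prefix and --join_on can be sketched in plain Python. This is a simplified illustration, not pycytominer's code: it assumes the prefix is applied to plate map columns before the join and that the plate map's join column is dropped from the result, which matches the annotated output shown in this tutorial. The rows are hypothetical.

```python
# Hypothetical plate map and well profiles, reduced to two wells.
platemap = [
    {"well_position": "B02", "treatment": "DMSO"},
    {"well_position": "B03", "treatment": "Compound_A"},
]
profiles = [
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B02", "Cells_AreaShape_Area": 500.0},
    {"Metadata_Plate": "Plate_1", "Metadata_Well": "B03", "Cells_AreaShape_Area": 690.0},
]


def add_prefix(row, prefix="Metadata_"):
    """Prefix every column name that does not already carry the prefix."""
    return {(k if k.startswith(prefix) else prefix + k): v for k, v in row.items()}


# Mirrors --join_on "Metadata_well_position,Metadata_Well".
platemap_col, profiles_col = "Metadata_well_position", "Metadata_Well"

# Index the prefixed plate map rows by their join key.
lookup = {}
for row in platemap:
    prefixed = add_prefix(row)
    key = prefixed.pop(platemap_col)  # the join column itself is not kept
    lookup[key] = prefixed

# Attach plate map columns to each profile row; unmatched wells get no extra columns.
annotated = [{**lookup.get(row[profiles_col], {}), **row} for row in profiles]

for row in annotated:
    print(row)
```

Because the prefix is added first, the join key must be written with the prefix even though the CSV column has none.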
[6]:
# Create the plate map CSV
platemap = pd.DataFrame({
"well_position": ["B02", "C02", "B03", "C03", "B04", "C04"],
"treatment": [
"DMSO",
"DMSO",
"Compound_A",
"Compound_A",
"Compound_B",
"Compound_B",
],
"cell_line": ["HeLa"] * 6,
"concentration_um": [0.0, 0.0, 10.0, 10.0, 5.0, 5.0],
})
platemap.to_csv(workdir / "platemap.csv", index=False)
platemap
[6]:
| | well_position | treatment | cell_line | concentration_um |
|---|---|---|---|---|
| 0 | B02 | DMSO | HeLa | 0.0 |
| 1 | C02 | DMSO | HeLa | 0.0 |
| 2 | B03 | Compound_A | HeLa | 10.0 |
| 3 | C03 | Compound_A | HeLa | 10.0 |
| 4 | B04 | Compound_B | HeLa | 5.0 |
| 5 | C04 | Compound_B | HeLa | 5.0 |
[7]:
!pycytominer annotate --profiles {workdir}/well_profiles.parquet --platemap {workdir}/platemap.csv --output_file {workdir}/annotated.parquet --join_on "Metadata_well_position,Metadata_Well" --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: annotated.parquet
annotated.parquet
[8]:
ann = pd.read_parquet(workdir / "annotated.parquet")
print(f"Annotated profiles: {ann.shape}")
ann[[c for c in ann.columns if c.startswith("Metadata_")]].head(3)
Annotated profiles: (6, 16)
[8]:
| | Metadata_treatment | Metadata_cell_line | Metadata_concentration_um | Metadata_Plate | Metadata_Well |
|---|---|---|---|---|---|
| 0 | DMSO | HeLa | 0.0 | Plate_1 | B02 |
| 1 | DMSO | HeLa | 0.0 | Plate_1 | C02 |
| 2 | Compound_A | HeLa | 10.0 | Plate_1 | B03 |
Step 3: Normalize¶
pycytominer normalize scales features to a common range and limits plate-to-plate technical variation. Z-scoring against DMSO control wells (--samples) is the most common approach.
Key arguments:
- --samples: a pandas query string selecting the normalization reference. Use all to normalize against the entire plate.
- --method: normalization method, one of standardize (z-score, default), robustize (MAD-based), or spherize
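The arithmetic behind standardizing against a reference subset can be sketched as follows. This is a hedged illustration rather than pycytominer's code: the well values are hypothetical, and the use of the population standard deviation is an assumption, chosen because it reproduces the exact -1 and 1 values that the two DMSO wells receive in the normalized output of this tutorial.

```python
from statistics import fmean, pstdev

# Hypothetical well-level values for one feature: (well, treatment, value).
wells = [
    ("B02", "DMSO", 498.0),
    ("C02", "DMSO", 502.0),
    ("B03", "Compound_A", 690.0),
]

# Reference samples, the role played by --samples "Metadata_treatment == 'DMSO'".
reference = [value for _, treatment, value in wells if treatment == "DMSO"]

center = fmean(reference)   # mean of the reference wells
scale = pstdev(reference)   # population standard deviation (assumption, see lead-in)

# Every well, reference or not, is scaled with the reference statistics.
z_scores = {well: (value - center) / scale for well, _, value in wells}

for well, z in z_scores.items():
    print(well, z)
```

With only two reference wells, the reference z-scores are always -1 and 1, and the standard deviation estimate is small, so treated wells can receive very large scores. Real experiments typically include many more control wells per plate.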
[9]:
!pycytominer normalize --profiles {workdir}/annotated.parquet --output_file {workdir}/normalized.parquet --samples "Metadata_treatment == 'DMSO'" --method standardize --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: normalized.parquet
normalized.parquet
[10]:
norm = pd.read_parquet(workdir / "normalized.parquet")
print(f"Normalized profiles: {norm.shape}")
norm.head(3)
Normalized profiles: (6, 16)
[10]:
| | Metadata_treatment | Metadata_cell_line | Metadata_concentration_um | Metadata_Plate | Metadata_Well | Cells_AreaShape_Area | Cells_AreaShape_BoundingBoxArea | Cells_AreaShape_EulerNumber | Cells_AreaShape_Eccentricity | Cells_Intensity_MeanIntensity_Mito | Cells_Texture_Correlation_RNA_3_0_256 | Cytoplasm_AreaShape_Area | Cytoplasm_Intensity_MeanIntensity_AGP | Nuclei_AreaShape_Area | Nuclei_AreaShape_Eccentricity | Nuclei_Intensity_MeanIntensity_DNA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DMSO | HeLa | 0.0 | Plate_1 | B02 | -1.000000 | -1.00000 | 0.0 | 1.000000 | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | DMSO | HeLa | 0.0 | Plate_1 | C02 | 1.000000 | 1.00000 | 0.0 | -1.000000 | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 |
| 2 | Compound_A | HeLa | 10.0 | Plate_1 | B03 | 52.302035 | 42.69753 | 0.0 | 0.020332 | 0.694833 | 4.413158 | 2.71186 | 0.309624 | 0.829585 | -1.378075 | 8.707708 |
Step 4: Feature Select¶
pycytominer feature_select removes uninformative features. Multiple operations can be applied in one call by passing a comma-delimited list.
Key arguments:
- --operation: comma-delimited list of operations to apply:
  - variance_threshold: drop near-constant features
  - correlation_threshold: drop one of each highly correlated pair
  - blocklist: drop features known to be unreliable across assays
  - drop_na_columns: drop columns with too many missing values
  - noise_removal: remove features with low signal-to-noise ratio
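The ideas behind the first two operations can be illustrated with a simplified sketch in plain Python. It is not pycytominer's algorithm: the zero-variance test, the 0.9 correlation cutoff, and the rule of keeping the first feature of a correlated pair are all assumptions made for illustration, and the feature values are hypothetical.

```python
from math import sqrt

# Hypothetical well-level profiles: feature name -> one value per well.
profiles = {
    "Cells_AreaShape_Area": [500.0, 690.0, 407.0, 520.0],
    "Cells_AreaShape_BoundingBoxArea": [650.0, 897.0, 529.1, 676.0],  # 1.3 x area
    "Cells_AreaShape_EulerNumber": [1.0, 1.0, 1.0, 1.0],              # constant
    "Nuclei_AreaShape_Area": [191.0, 190.0, 195.0, 188.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Step 1: drop features that never vary (a stand-in for variance_threshold).
varying = {name: vals for name, vals in profiles.items() if len(set(vals)) > 1}

# Step 2: keep a feature only if it is not highly correlated with one already kept
# (a stand-in for correlation_threshold; 0.9 is an assumed cutoff).
kept = []
for name, vals in varying.items():
    if all(abs(pearson(vals, varying[other])) <= 0.9 for other in kept):
        kept.append(name)

print("Kept features:", kept)
```

Which member of a correlated pair is removed depends on the tool's own rule; this sketch simply keeps whichever feature it encounters first.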
[11]:
!pycytominer feature_select --profiles {workdir}/normalized.parquet --output_file {workdir}/selected.parquet --operation "variance_threshold,correlation_threshold,blocklist" --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: selected.parquet
selected.parquet
[12]:
sel = pd.read_parquet(workdir / "selected.parquet")
feat_before = [c for c in norm.columns if not c.startswith("Metadata_")]
feat_after = [c for c in sel.columns if not c.startswith("Metadata_")]
print(f"Features: {len(feat_before)} -> {len(feat_after)}")
print(f"Removed: {set(feat_before) - set(feat_after)}")
sel.head(3)
Features: 11 -> 8
Removed: {'Cells_AreaShape_Area', 'Cells_AreaShape_EulerNumber', 'Cells_Texture_Correlation_RNA_3_0_256'}
[12]:
| | Metadata_treatment | Metadata_cell_line | Metadata_concentration_um | Metadata_Plate | Metadata_Well | Cells_AreaShape_BoundingBoxArea | Cells_AreaShape_Eccentricity | Cells_Intensity_MeanIntensity_Mito | Cytoplasm_AreaShape_Area | Cytoplasm_Intensity_MeanIntensity_AGP | Nuclei_AreaShape_Area | Nuclei_AreaShape_Eccentricity | Nuclei_Intensity_MeanIntensity_DNA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DMSO | HeLa | 0.0 | Plate_1 | B02 | -1.00000 | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | DMSO | HeLa | 0.0 | Plate_1 | C02 | 1.00000 | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 |
| 2 | Compound_A | HeLa | 10.0 | Plate_1 | B03 | 42.69753 | 0.020332 | 0.694833 | 2.71186 | 0.309624 | 0.829585 | -1.378075 | 8.707708 |
Step 5: Consensus¶
pycytominer consensus collapses replicate wells into one profile per biological condition by taking the median (or modz) across replicates.
Key arguments:
- --replicate_columns: comma-delimited metadata columns that identify a unique condition (replicates share all of these values)
- --operation: median (default), mean, or modz (moderated z-score, recommended for large screens)
[13]:
!pycytominer consensus --profiles {workdir}/selected.parquet --output_file {workdir}/consensus.parquet --replicate_columns "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um" --operation median --output_type parquet 2>&1 | sed "s|{workdir}/||g"
Wrote output file: consensus.parquet
consensus.parquet
[14]:
cons = pd.read_parquet(workdir / "consensus.parquet")
print(f"Consensus profiles: {cons.shape} (one row per condition)")
cons[[c for c in cons.columns if c.startswith("Metadata_")]]
Consensus profiles: (3, 11) (one row per condition)
[14]:
| | Metadata_treatment | Metadata_cell_line | Metadata_concentration_um |
|---|---|---|---|
| 0 | Compound_A | HeLa | 10.0 |
| 1 | Compound_B | HeLa | 5.0 |
| 2 | DMSO | HeLa | 0.0 |
Summary¶
You ran the full pycytominer pipeline using only command-line calls:
pycytominer aggregate --profiles single_cells.parquet --output_file well_profiles.parquet --strata "Metadata_Plate,Metadata_Well" --output_type parquet
pycytominer annotate --profiles well_profiles.parquet --output_file annotated.parquet --platemap platemap.csv --output_type parquet
pycytominer normalize --profiles annotated.parquet --output_file normalized.parquet --samples "Metadata_treatment == 'DMSO'" --output_type parquet
pycytominer feature_select --profiles normalized.parquet --output_file selected.parquet --operation "variance_threshold,correlation_threshold,blocklist" --output_type parquet
pycytominer consensus --profiles selected.parquet --output_file consensus.parquet --replicate_columns "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um" --output_type parquet
Tips for scripting¶
- List all commands with pycytominer; get full option docs with pycytominer COMMAND --help
- Chain into Bash scripts or Makefile targets for reproducible pipelines
- Query strings in --samples follow pandas query syntax; any valid pandas query expression works
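As an alternative to Bash or Make, the same chain can be driven from a small Python script. The sketch below is one possible layout, not an official pycytominer utility: the function names build_commands and run_pipeline are hypothetical, only flags shown in this tutorial are used, and the script defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(workdir="."):
    """Return the five CLI invocations from this tutorial as argument lists."""
    w = Path(workdir)

    def step(command, src, dst, *extra):
        return [
            "pycytominer", command,
            "--profiles", str(w / src),
            "--output_file", str(w / dst),
            *extra,
            "--output_type", "parquet",
        ]

    return [
        step("aggregate", "single_cells.parquet", "well_profiles.parquet",
             "--strata", "Metadata_Plate,Metadata_Well"),
        step("annotate", "well_profiles.parquet", "annotated.parquet",
             "--platemap", str(w / "platemap.csv")),
        step("normalize", "annotated.parquet", "normalized.parquet",
             "--samples", "Metadata_treatment == 'DMSO'"),
        step("feature_select", "normalized.parquet", "selected.parquet",
             "--operation", "variance_threshold,correlation_threshold,blocklist"),
        step("consensus", "selected.parquet", "consensus.parquet",
             "--replicate_columns",
             "Metadata_treatment,Metadata_cell_line,Metadata_concentration_um"),
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


commands = build_commands()
run_pipeline(commands)  # dry run: prints the commands only
```

Passing arguments as a list avoids shell quoting problems with the pandas query string in --samples. Call run_pipeline(commands, dry_run=False) once the input files exist.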