speakers

Manage persistent diarization speaker identities.

Usage

agent-cli speakers COMMAND [OPTIONS]

Description

Speaker profiles are stored voice embeddings created by transcribe --diarize or diarize-live-session when you use --remember-unknown-speakers or --enroll-speakers.

Use speakers list to see the stable profile IDs and speakers rename to give an unknown profile a human name. Use speakers merge to fold duplicate profiles for the same person into one. Use speakers review to listen to diarized snippets and decide interactively whether each voice should be merged into an existing profile or saved as a new named profile.

Examples

# First diarize and remember unmatched voices
agent-cli diarize-live-session --last-recording 1 --remember-unknown-speakers

# Inspect the remembered profiles
agent-cli speakers list

# Name one remembered profile
agent-cli speakers rename UNKNOWN_001 Alice

# Merge a duplicate unknown profile into Alice
agent-cli speakers merge UNKNOWN_002 Alice

# Listen to snippets from the last saved recording and update profiles
agent-cli speakers review --last-recording 1

# Review a continuous transcribe-live session
agent-cli speakers review --last-session 2

# JSON output for scripts
agent-cli speakers list --json

Notes

Pyannote labels such as SPEAKER_00 are local to one diarization run and may change between recordings.
Stored profile IDs such as UNKNOWN_001 are stable across runs.
Renaming a profile preserves its embeddings and changes the display name used by future diarization matches.
Merging moves embeddings from the source profile into the target profile and removes the source profile.
Review appends the current recording's speaker embedding to an existing profile when you choose merge.
speakers list --json shows profile metadata only; it does not print embedding vectors.
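
The rename and merge semantics in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Profile class, its field names, and the find, rename, and merge helpers are hypothetical and are not agent-cli's actual storage schema or API. The sketch also assumes that an unnamed profile uses its ID as its display name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical in-memory profile; not agent-cli's real schema."""

    profile_id: str
    name: str
    embeddings: list[list[float]] = field(default_factory=list)


def find(profiles: list[Profile], identifier: str) -> Profile:
    """Look a profile up by its stable id or by its display name."""
    for profile in profiles:
        if identifier in (profile.profile_id, profile.name):
            return profile
    raise KeyError(identifier)


def rename(profiles: list[Profile], identifier: str, new_name: str) -> Profile:
    """Change only the display name; the id and embeddings are untouched."""
    profile = find(profiles, identifier)
    profile.name = new_name
    return profile


def merge(profiles: list[Profile], source: str, target: str) -> Profile:
    """Move the source's embeddings into the target, then drop the source."""
    src = find(profiles, source)
    dst = find(profiles, target)
    if src is dst:
        raise ValueError("source and target are the same profile")
    dst.embeddings.extend(src.embeddings)
    profiles.remove(src)
    return dst
```

Under these assumptions, renaming UNKNOWN_001 to Alice and then merging UNKNOWN_002 into Alice leaves a single profile that keeps the ID UNKNOWN_001, shows the name Alice, and holds the embeddings of both original profiles.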

Rename Arguments

Argument	Description
`IDENTIFIER`	Existing profile id or name, for example `UNKNOWN_001`.
`NAME`	New display name. Quote names with spaces, for example `"John Smith"`.

Merge Arguments

Argument	Description
`SOURCE`	Duplicate profile id or name to remove, for example `UNKNOWN_002`.
`TARGET`	Profile id or name to keep, for example `Alice` or `UNKNOWN_001`.

List Options

Options

Option	Default	Description
`--speaker-profiles-file`	`/home/runner/.config/agent-cli/speaker-profiles.json`	JSON file storing persistent speaker voice embeddings.
`--json`	`false`	Output profile metadata as JSON without embedding vectors.

General Options

Option	Default	Description
`--config`	-	Path to a TOML configuration file.

Rename Options

Options

Option	Default	Description
`--speaker-profiles-file`	`/home/runner/.config/agent-cli/speaker-profiles.json`	JSON file storing persistent speaker voice embeddings.
`--json`	`false`	Output the renamed profile metadata as JSON.

General Options

Option	Default	Description
`--config`	-	Path to a TOML configuration file.

Merge Options

Options

Option	Default	Description
`--speaker-profiles-file`	`/home/runner/.config/agent-cli/speaker-profiles.json`	JSON file storing persistent speaker voice embeddings.
`--json`	`false`	Output the merged target profile metadata as JSON.

General Options

Option	Default	Description
`--config`	-	Path to a TOML configuration file.

Review Options

Options

Option	Default	Description
`--from-file`	-	Review speakers from an existing audio file.
`--last-recording`	-	Review the Nth most recent saved transcribe recording (default: 1).
`--last-session`	-	Review the Nth most recent inferred transcribe-live session.
`--session-gap`	`300.0`	Maximum seconds between transcribe-live chunks in one session.
`--transcription-log`	`/home/runner/.config/agent-cli/transcriptions.jsonl`	Path to the transcribe-live JSONL log for --last-session.
`--output-dir`	`/home/runner/.cache/agent-cli/speaker-review`	Directory for combined live-session audio and temporary snippets.
`--speakers`	-	Known number of speakers. Sets both --min-speakers and --max-speakers.
`--speaker-profiles-file`	`/home/runner/.config/agent-cli/speaker-profiles.json`	JSON file storing persistent speaker voice embeddings.
`--snippet-seconds`	`6.0`	Maximum seconds to play for each speaker snippet.
`--player`	-	Audio player command to use for snippets (default: afplay, ffplay, aplay, or paplay).
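
To make --session-gap and --last-session more concrete, the following sketch shows one way sessions could be inferred from chunk timestamps. It is an assumption-laden illustration, not the tool's implementation: group_sessions and nth_most_recent are hypothetical helpers, timestamps are plain seconds, and the gap is measured between consecutive chunk start times (the real tool may measure from the end of one chunk to the start of the next).

```python
def group_sessions(timestamps, session_gap=300.0):
    """Split chunk timestamps (in seconds) into sessions.

    A chunk joins the current session when it starts at most
    session_gap seconds after the previous chunk; otherwise it
    opens a new session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= session_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def nth_most_recent(sessions, n):
    """Return the Nth most recent session, where 1 is the latest."""
    if not 1 <= n <= len(sessions):
        raise IndexError(f"no session number {n}")
    return sessions[-n]
```

With the default gap of 300 seconds, chunks at 0, 100, and 350 seconds form one session, and chunks at 1000 and 1200 seconds form a second, more recent one. In this sketch, --last-session 2 would correspond to nth_most_recent(sessions, 2), which selects the earlier session.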

Diarization

Option	Default	Description
`--hf-token`	-	HuggingFace token for pyannote models. Required for diarization. Token must have 'Read access to contents of all public gated repos you can access' permission. Accept licenses at: https://hf.co/pyannote/speaker-diarization-3.1, https://hf.co/pyannote/segmentation-3.0, https://hf.co/pyannote/wespeaker-voxceleb-resnet34-LM
`--min-speakers`	-	Minimum number of speakers (optional hint for diarization).
`--max-speakers`	-	Maximum number of speakers (optional hint for diarization).
`--speaker-match-threshold`	`0.7`	Cosine-similarity threshold for matching diarized speakers to stored profiles.
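
As a rough illustration of what --speaker-match-threshold controls, the sketch below matches one diarized speaker embedding against stored profiles by cosine similarity. The function names, the dictionary layout, the inclusive comparison, and the choice to score against each stored embedding individually are all assumptions made for this example; the real matcher may aggregate embeddings differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def best_match(embedding, profiles, threshold=0.7):
    """Return the best-scoring profile name, or None below the threshold.

    profiles maps a profile name to a list of stored embeddings
    (a hypothetical layout used only for this sketch).
    """
    best_name, best_score = None, threshold
    for name, stored in profiles.items():
        for candidate in stored:
            score = cosine_similarity(embedding, candidate)
            if score >= best_score:
                best_name, best_score = name, score
    return best_name
```

Raising the threshold makes matching stricter, so more voices are treated as unknown. Lowering it makes matching more permissive, at the risk of assigning a voice to the wrong profile.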

General Options

Option	Default	Description
`--config`	-	Path to a TOML configuration file.