
speakers

Manage persistent diarization speaker identities.

Usage

agent-cli speakers COMMAND [OPTIONS]

Description

Speaker profiles are stored voice embeddings created by transcribe --diarize or diarize-live-session when you use --remember-unknown-speakers or --enroll-speakers.

The subcommands are:

  • speakers list shows the stable profile IDs.
  • speakers rename gives an unknown profile a human name.
  • speakers merge folds duplicate profiles for the same person together.
  • speakers review plays diarized snippets and lets you decide interactively whether each voice should be merged into an existing profile or saved as a new named profile.

Examples

# First diarize and remember unmatched voices
agent-cli diarize-live-session --last-recording 1 --remember-unknown-speakers

# Inspect the remembered profiles
agent-cli speakers list

# Name one remembered profile
agent-cli speakers rename UNKNOWN_001 Alice

# Merge a duplicate unknown profile into Alice
agent-cli speakers merge UNKNOWN_002 Alice

# Listen to snippets from the last saved recording and update profiles
agent-cli speakers review --last-recording 1

# Review the second most recent continuous transcribe-live session
agent-cli speakers review --last-session 2

# JSON output for scripts
agent-cli speakers list --json
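
For scripting, the JSON output can be read from another program. The Python sketch below is illustrative only: it assumes the command prints a single JSON document to stdout and makes no assumption about the fields inside it. The helper name load_profiles and its run parameter are hypothetical and not part of agent-cli.

```python
import json
import subprocess


def load_profiles(run=subprocess.run):
    """Run `agent-cli speakers list --json` and return the parsed JSON.

    `run` is injectable so the function can be exercised without agent-cli
    installed; by default it is subprocess.run.
    """
    result = run(
        ["agent-cli", "speakers", "list", "--json"],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # Assumption: stdout holds exactly one JSON document.
    return json.loads(result.stdout)
```

Calling load_profiles() with no arguments invokes the real command; passing a stand-in for run makes the parsing step testable in isolation.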

Notes

  • Pyannote labels such as SPEAKER_00 are local to one diarization run and may change between recordings.
  • Stored profile IDs such as UNKNOWN_001 are stable across runs.
  • Renaming a profile preserves its embeddings and changes the display name used by future diarization matches.
  • Merging moves embeddings from the source profile into the target profile and removes the source profile.
  • Review appends the current recording's speaker embedding to an existing profile when you choose merge.
  • speakers list --json shows profile metadata only; it does not print embedding vectors.
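
The rename and merge behavior described in the notes above can be pictured with a small Python model. This is a sketch under stated assumptions, not the tool's implementation: the dictionary layout, the field names name and embeddings, and the helpers resolve, rename, and merge are all hypothetical, and the real speaker-profiles.json schema may differ.

```python
def resolve(profiles, identifier):
    """Return the profile id for an id or a display name (hypothetical lookup)."""
    if identifier in profiles:
        return identifier
    for profile_id, profile in profiles.items():
        if profile["name"] == identifier:
            return profile_id
    raise KeyError(identifier)


def rename(profiles, identifier, name):
    """Change only the display name; the embeddings are left untouched."""
    profiles[resolve(profiles, identifier)]["name"] = name


def merge(profiles, source, target):
    """Move the source embeddings into the target, then remove the source."""
    source_id = resolve(profiles, source)
    target_id = resolve(profiles, target)
    if source_id == target_id:
        raise ValueError("source and target are the same profile")
    profiles[target_id]["embeddings"].extend(profiles[source_id]["embeddings"])
    del profiles[source_id]
    return profiles[target_id]


# Assumed layout: stable profile id -> display name plus stored embeddings.
profiles = {
    "UNKNOWN_001": {"name": "UNKNOWN_001", "embeddings": [[0.1, 0.9]]},
    "UNKNOWN_002": {"name": "UNKNOWN_002", "embeddings": [[0.2, 0.8]]},
}
rename(profiles, "UNKNOWN_001", "Alice")
merge(profiles, "UNKNOWN_002", "Alice")
print(sorted(profiles))  # ['UNKNOWN_001']
```

In this model the stable id UNKNOWN_001 survives both operations, while the display name and the set of embeddings change.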

Rename Arguments

Argument    Description
IDENTIFIER  Existing profile ID or name, for example UNKNOWN_001.
NAME        New display name. Quote names with spaces, for example "John Smith".

Merge Arguments

Argument  Description
SOURCE    Duplicate profile ID or name to remove, for example UNKNOWN_002.
TARGET    Profile ID or name to keep, for example Alice or UNKNOWN_001.

List Options

Options

Option                   Default                                               Description
--speaker-profiles-file  /home/runner/.config/agent-cli/speaker-profiles.json  JSON file storing persistent speaker voice embeddings.
--json                   false                                                 Output profile metadata as JSON without embedding vectors.

General Options

Option    Default  Description
--config  -        Path to a TOML configuration file.

Rename Options

Options

Option                   Default                                               Description
--speaker-profiles-file  /home/runner/.config/agent-cli/speaker-profiles.json  JSON file storing persistent speaker voice embeddings.
--json                   false                                                 Output the renamed profile metadata as JSON.

General Options

Option    Default  Description
--config  -        Path to a TOML configuration file.

Merge Options

Options

Option                   Default                                               Description
--speaker-profiles-file  /home/runner/.config/agent-cli/speaker-profiles.json  JSON file storing persistent speaker voice embeddings.
--json                   false                                                 Output the merged target profile metadata as JSON.

General Options

Option    Default  Description
--config  -        Path to a TOML configuration file.

Review Options

Options

Option                   Default                                               Description
--from-file              -                                                     Review speakers from an existing audio file.
--last-recording         -                                                     Review the Nth most recent saved transcribe recording (default: 1).
--last-session           -                                                     Review the Nth most recent inferred transcribe-live session.
--session-gap            300.0                                                 Maximum seconds between transcribe-live chunks in one session.
--transcription-log      /home/runner/.config/agent-cli/transcriptions.jsonl   Path to the transcribe-live JSONL log for --last-session.
--output-dir             /home/runner/.cache/agent-cli/speaker-review          Directory for combined live-session audio and temporary snippets.
--speakers               -                                                     Known number of speakers. Sets both --min-speakers and --max-speakers.
--speaker-profiles-file  /home/runner/.config/agent-cli/speaker-profiles.json  JSON file storing persistent speaker voice embeddings.
--snippet-seconds        6.0                                                   Maximum seconds to play for each speaker snippet.
--player                 -                                                     Audio player command to use for snippets (default: afplay, ffplay, aplay, or paplay).
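
To make --session-gap and --last-session concrete, the Python sketch below groups chunk timestamps into sessions and picks the Nth most recent one. It is an illustration under assumptions: it treats each chunk as a single timestamp in seconds and splits wherever two consecutive timestamps are more than the gap apart. How agent-cli actually measures the gap (for example start-to-start or end-to-start) is not stated in this page, and the function names group_sessions and nth_most_recent are hypothetical.

```python
def group_sessions(timestamps, session_gap=300.0):
    """Group chunk timestamps (seconds) into sessions.

    Consecutive chunks stay in the same session while the time between them
    is at most `session_gap`; a larger gap starts a new session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= session_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def nth_most_recent(sessions, n):
    """Return the Nth most recent session, where n=1 is the latest."""
    if not 1 <= n <= len(sessions):
        raise IndexError(f"no session number {n}")
    return sessions[-n]


chunks = [0.0, 100.0, 350.0, 1000.0, 1200.0]
sessions = group_sessions(chunks)
print(sessions)                      # [[0.0, 100.0, 350.0], [1000.0, 1200.0]]
print(nth_most_recent(sessions, 2))  # [0.0, 100.0, 350.0]
```

With the default gap of 300.0 seconds, the 650-second pause between the third and fourth chunk is what separates the two sessions in this example.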

Diarization

Option                     Default  Description
--hf-token                 -        Hugging Face token for pyannote models. Required for diarization. The token must have the 'Read access to contents of all public gated repos you can access' permission. Accept the licenses at: https://hf.co/pyannote/speaker-diarization-3.1, https://hf.co/pyannote/segmentation-3.0, https://hf.co/pyannote/wespeaker-voxceleb-resnet34-LM
--min-speakers             -        Minimum number of speakers (optional hint for diarization).
--max-speakers             -        Maximum number of speakers (optional hint for diarization).
--speaker-match-threshold  0.7      Cosine-similarity threshold for matching diarized speakers to stored profiles.
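
The role of --speaker-match-threshold can be sketched in Python as follows. This is a minimal illustration, not agent-cli's matching code: it assumes each profile holds a list of embeddings, that a diarized speaker is compared against every stored embedding, and that the best score per profile is what counts. Whether the tool uses the maximum, a mean, or a centroid is not stated in this page, and the names cosine_similarity and match_speaker are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(embedding, profiles, threshold=0.7):
    """Return the best-matching profile id, or None below the threshold.

    `profiles` maps a profile id to a list of stored embeddings (assumed layout).
    """
    best_id, best_score = None, None
    for profile_id, stored in profiles.items():
        score = max(
            (cosine_similarity(embedding, e) for e in stored),
            default=0.0,  # a profile without embeddings never matches
        )
        if score >= threshold and (best_score is None or score > best_score):
            best_id, best_score = profile_id, score
    return best_id


profiles = {"Alice": [[1.0, 0.0]], "UNKNOWN_001": [[0.0, 1.0]]}
print(match_speaker([0.9, 0.1], profiles))   # Alice
print(match_speaker([-1.0, 0.0], profiles))  # None
```

Raising the threshold makes matches stricter, so more voices are treated as unknown; lowering it makes it more likely that a new voice is attributed to an existing profile.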

General Options

Option    Default  Description
--config  -        Path to a TOML configuration file.