
diarize-live-session

Retroactively diarize a saved transcribe-live window.

Usage

agent-cli diarize-live-session [OPTIONS]

Description

This command reads transcribe-live entries from your JSONL log and selects either an explicit time window or an inferred recent recording session. It then combines the saved MP3 chunks into a single WAV and produces a speaker-labeled transcript.

By default it:

  1. Reuses the raw_output text already logged by transcribe-live
  2. Runs speaker diarization on the combined meeting audio
  3. Aligns each saved chunk separately to keep memory bounded
  4. Maps the aligned words back onto the combined speaker timeline
  5. Writes the transcript and metadata under ~/.cache/agent-cli/live-diarization/
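Steps 3 and 4 can be illustrated with a small sketch. It is not agent-cli's code. It assumes a hypothetical plain-text layout: speaker turns are "start end label" lines in combined-timeline seconds, and aligned words are "chunk_offset start end word" lines with times relative to their own chunk. The sketch shifts each word by its chunk's offset and assigns it to the turn it overlaps most. A word that overlaps no turn is labeled "none".

```bash
# Hypothetical sketch (not agent-cli's implementation): label aligned words
# with the speaker turn they overlap most on the combined timeline.
#
# $1:     turns file, one "start end label" per line (combined-timeline seconds)
# stdin:  one "chunk_offset start end word" per line (chunk-relative seconds)
# stdout: one "label word" per line; "none" if the word overlaps no turn
assign_speakers() {
  awk '
    BEGIN { n = 0 }
    # First input (the turns file): remember every speaker turn.
    NR == FNR { ts[n] = $1; te[n] = $2; lab[n] = $3; n++; next }
    {
      # Shift the word from chunk-relative to combined-timeline seconds.
      ws = $1 + $2; we = $1 + $3
      best = "none"; bestov = 0
      for (i = 0; i < n; i++) {
        # Overlap of [ws, we] with [ts[i], te[i]]; <= 0 means no overlap.
        lo = (ws > ts[i]) ? ws : ts[i]
        hi = (we < te[i]) ? we : te[i]
        if (hi - lo > bestov) { bestov = hi - lo; best = lab[i] }
      }
      print best, $4
    }
  ' "$1" -
}
```

Working in combined-timeline seconds lets each chunk be aligned on its own, which keeps memory bounded, while diarization still runs once over the whole meeting. Choosing the turn with the largest overlap, instead of the turn that contains the word's start, assigns a word that straddles a turn boundary to the turn covering most of it.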

Use --retranscribe if you want to re-run ASR on the combined audio instead of using the logged transcribe-live text.

Installation

Requires the diarization extra:

uv tool install "agent-cli[diarization]" -p 3.13
# or
pip install "agent-cli[diarization]"

Examples

# Diarize a 3-person meeting from saved transcribe-live chunks
agent-cli diarize-live-session \
  --date 2026-04-22 \
  --start 11:32 \
  --end 12:29 \
  --speakers 3

# Prepare the combined WAV and metadata without running diarization
agent-cli diarize-live-session \
  --date 2026-04-22 \
  --start 11:32 \
  --end 12:29 \
  --speakers 3 \
  --prepare-only

# Diarize the most recent inferred recording session
agent-cli diarize-live-session \
  --last-recording 1 \
  --speakers 3

# Persist unmatched voices as stable UNKNOWN_### profiles
agent-cli diarize-live-session \
  --last-recording 1 \
  --remember-unknown-speakers

# Inspect and name remembered speaker profiles
agent-cli speakers list
agent-cli speakers rename UNKNOWN_001 Alice
agent-cli speakers merge UNKNOWN_002 Alice

# Enroll current diarization labels directly when you already know who is who
agent-cli diarize-live-session \
  --last-recording 1 \
  --enroll-speakers SPEAKER_00=Alice \
  --speakers 2

# Write structured JSON output
agent-cli diarize-live-session \
  --date 2026-04-22 \
  --start 11:32 \
  --end 12:29 \
  --speakers 3 \
  --diarize-format json
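In the --date, --start, and --end examples above, selecting the window amounts to a timestamp range filter. The following is a hypothetical sketch, not the command's implementation. It assumes log lines that begin with an ISO-8601 local timestamp (YYYY-MM-DDTHH:MM:SS), which may not match the real JSONL fields. It also assumes that an HH:MM end time covers that whole minute.

```bash
# Hypothetical sketch: keep log lines whose leading ISO-8601 timestamp
# (YYYY-MM-DDTHH:MM:SS, an assumed layout) falls inside a window.
#
# $1: date (YYYY-MM-DD, empty = today)   $2, $3: start/end (HH:MM or HH:MM:SS)
in_window() {
  local day=${1:-$(date +%Y-%m-%d)} start=$2 end=$3
  # Pad HH:MM so both bounds compare correctly as fixed-width strings;
  # an HH:MM end is assumed to include the whole minute.
  [ ${#start} -eq 5 ] && start="$start:00"
  [ ${#end} -eq 5 ] && end="$end:59"
  # Zero-padded ISO timestamps sort lexicographically in time order.
  awk -v lo="${day}T$start" -v hi="${day}T$end" '$1 >= lo && $1 <= hi'
}
```

Comparing zero-padded timestamps as strings avoids date arithmetic entirely. This works only because every field has a fixed width.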

Notes

  • transcribe-live chunks are split on silence, not on speaker changes, so one saved MP3 can still contain multiple speakers.
  • --last-recording groups nearby saved chunks into sessions. Adjust --session-gap (default 300 seconds) to control how long a pause must be before it splits a session.
  • --enroll-speakers stores voice embeddings in ~/.config/agent-cli/speaker-profiles.json; later diarization runs match new speaker clusters to those profiles.
  • --remember-unknown-speakers gives unmatched voices stable UNKNOWN_### profiles so repeated unknown speakers can be recognized across recordings.
  • Use agent-cli speakers rename UNKNOWN_001 Alice to name a remembered profile without re-running diarization.
  • Use agent-cli speakers merge UNKNOWN_002 Alice if a later recording creates a duplicate profile for the same person.
  • On Apple Silicon, pyannote diarization can run on MPS, but wav2vec2 forced alignment automatically falls back to CPU when MPS is unsupported.
  • If you do not pass --hf-token, the command uses HF_TOKEN from the environment.
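The profile matching described in these notes can be pictured as a nearest-neighbour lookup with a cutoff. The sketch below is hypothetical and does not read speaker-profiles.json. It assumes each profile is a "name v1,v2,..." line and uses 0.7, the documented --speaker-match-threshold default. A cluster whose best cosine similarity falls below the threshold is reported as "unmatched". With --remember-unknown-speakers, such a cluster is what would receive an UNKNOWN_### profile.

```bash
# Hypothetical sketch: pick the stored profile most similar to one speaker
# cluster embedding, or report "unmatched" below the threshold.
#
# $1:     cluster embedding as comma-separated numbers, e.g. "0.1,0.9,0.0"
# $2:     threshold (0.7 mirrors the --speaker-match-threshold default)
# stdin:  one "name v1,v2,..." profile per line (an assumed layout, not the
#         real speaker-profiles.json format)
match_profile() {
  awk -v emb="$1" -v thr="${2:-0.7}" '
    BEGIN { d = split(emb, q, ",") }
    {
      if (split($2, p, ",") != d) next   # skip dimension mismatches
      dot = 0; nq = 0; np = 0
      for (i = 1; i <= d; i++) {
        dot += q[i] * p[i]; nq += q[i] * q[i]; np += p[i] * p[i]
      }
      if (nq == 0 || np == 0) next       # cosine is undefined for zero vectors
      sim = dot / (sqrt(nq) * sqrt(np))
      if (best == "" || sim > best_sim) { best = $1; best_sim = sim }
    }
    # Threshold treated as inclusive here; that detail is an assumption.
    END { if (best != "" && best_sim >= thr + 0) print best; else print "unmatched" }
  '
}
```

This sketch matches each cluster independently. A fuller implementation might also need to stop two clusters in the same recording from claiming the same profile.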

Options


Option Default Description
--date - Date of the live session in YYYY-MM-DD format. Defaults to today.
--start - Start time of the session in HH:MM or HH:MM:SS. Required unless --last-recording is used.
--end - End time of the session in HH:MM or HH:MM:SS. Required unless --last-recording is used.
--last-recording - Select the Nth most recent inferred transcribe-live recording session (1=most recent, 2=second-to-last).
--session-gap 300.0 Maximum seconds between saved chunks before they are treated as separate sessions.
--transcription-log ~/.config/agent-cli/transcriptions.jsonl Path to the transcribe-live JSONL log file.
--output-dir ~/.cache/agent-cli/live-diarization Directory where the combined audio and diarized transcript are saved.
--prepare-only false Only create the combined audio file and metadata without running diarization.
--retranscribe false Re-run ASR on the combined audio instead of using the logged transcribe-live text.
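The --last-recording and --session-gap options can be read as a gap-based grouping of saved chunks. The following is a hypothetical sketch. It assumes one epoch-second timestamp per chunk, in ascending order. The real command works from the JSONL log and may measure gaps differently, for example from the end of one chunk to the start of the next. A gap larger than --session-gap starts a new session, and N counts back from the newest session.

```bash
# Hypothetical sketch: group chunk timestamps into sessions and print the
# Nth most recent one as "first_ts last_ts".
#
# $1: N (1 = most recent)   $2: maximum gap in seconds (default 300)
# stdin: one chunk timestamp (epoch seconds) per line, ascending
nth_recent_session() {
  awk -v n="$1" -v gap="${2:-300}" '
    # The first chunk, or a gap wider than the limit, opens a new session.
    NR == 1 || $1 - prev > gap + 0 { s++; first[s] = $1 }
    { last[s] = $1; prev = $1 }
    END {
      idx = s - n + 1                  # count back from the newest session
      if (n < 1 || idx < 1) exit 1     # no such session
      print first[idx], last[idx]
    }
  '
}
```

A gap exactly equal to the limit keeps chunks in the same session here. This matches the "maximum seconds between saved chunks" wording of --session-gap.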

Diarization

Option Default Description
--diarize-format inline Output format for diarization ('inline' for [Speaker N]: text, 'json' for structured output).
--speakers - Known number of speakers. Sets both --min-speakers and --max-speakers.
--min-speakers - Minimum number of speakers (optional hint for diarization).
--max-speakers - Maximum number of speakers (optional hint for diarization).
--align-words/--no-align-words false Enable word-level alignment when re-transcribing combined audio. Logged-transcript mode already uses word-level alignment by default.
--align-language en Language code for word alignment model (e.g., 'en', 'fr', 'de', 'es', 'it').
--hf-token - Hugging Face token for pyannote models. Required for diarization; if omitted, the HF_TOKEN environment variable is used. The token must have the 'Read access to contents of all public gated repos you can access' permission. Accept licenses at: https://hf.co/pyannote/speaker-diarization-3.1, https://hf.co/pyannote/segmentation-3.0, https://hf.co/pyannote/wespeaker-voxceleb-resnet34-LM
--enroll-speakers - Enroll current speaker labels or remembered profile IDs into persistent voice profiles, e.g. SPEAKER_00=Alice or UNKNOWN_001=Alice. For simple renames, use agent-cli speakers rename.
--identify-speakers/--no-identify-speakers true Match diarized speakers against persistent voice profiles when profiles exist.
--remember-unknown-speakers/--no-remember-unknown-speakers false Persist unmatched speaker embeddings as stable UNKNOWN_### voice profiles.
--speaker-profiles-file ~/.config/agent-cli/speaker-profiles.json JSON file storing persistent speaker voice embeddings.
--speaker-match-threshold 0.7 Cosine-similarity threshold for matching diarized speakers to stored profiles.
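The 'inline' value of --diarize-format can be sketched as collapsing consecutive words from the same speaker into one labeled line. This assumes a hypothetical input of "label word" lines in time order. The labels and spacing the real command prints may differ.

```bash
# Hypothetical sketch: render "label word" lines (in time order) as
# "[label]: text" turns, starting a new line whenever the speaker changes.
render_inline() {
  awk '
    NR == 1 || $1 != cur {
      if (NR > 1) print ""           # finish the previous speaker turn
      cur = $1
      printf "[%s]:", cur
    }
    { printf " %s", $2 }
    END { if (NR > 0) print "" }     # terminate the last turn
  '
}
```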

General Options

Option Default Description
--config - Path to a TOML configuration file.
--print-args false Print the command line arguments, including variables taken from the configuration file.