diarize-live-session
Retroactively diarize a saved transcribe-live window.
Usage
Description
This command reads transcribe-live entries from your JSONL log, selects a time window or inferred recent recording session, combines the saved MP3 chunks into a single WAV, and produces a speaker-labeled transcript.
By default it:
- Reuses the `raw_output` text already logged by `transcribe-live`
- Runs speaker diarization on the combined meeting audio
- Aligns each saved chunk separately to keep memory bounded
- Maps the aligned words back onto the combined speaker timeline
- Writes the transcript and metadata under `~/.cache/agent-cli/live-diarization/`
Use --retranscribe if you want to re-run ASR on the combined audio instead of using the logged transcribe-live text.
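The per-chunk alignment and mapping steps in the default pipeline can be pictured with a small sketch. This is not the command's actual implementation: the `Word` and `Turn` types, the function names, and the `UNASSIGNED` fallback label are hypothetical, and it assumes the combined WAV is the saved chunks concatenated back to back with no gaps.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, relative to the start of its own chunk
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds on the combined-audio timeline
    end: float


def shift_words(chunks: list[tuple[float, list[Word]]]) -> list[Word]:
    """Move per-chunk word times onto the combined timeline.

    Each item is (chunk_duration_seconds, words_aligned_within_that_chunk).
    Assumption: chunks are concatenated in order without gaps.
    """
    shifted: list[Word] = []
    offset = 0.0
    for duration, words in chunks:
        for w in words:
            shifted.append(Word(w.text, w.start + offset, w.end + offset))
        offset += duration
    return shifted


def assign_speakers(words: list[Word], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each word with the speaker turn it overlaps the most."""
    labeled: list[tuple[str, str]] = []
    for w in words:
        best, best_overlap = "UNASSIGNED", 0.0  # hypothetical fallback label
        for t in turns:
            overlap = min(w.end, t.end) - max(w.start, t.start)
            if overlap > best_overlap:
                best, best_overlap = t.speaker, overlap
        labeled.append((best, w.text))
    return labeled
```

Aligning each chunk on its own keeps the alignment model's input short; only the lightweight word timestamps need to be shifted and compared against the full-length speaker timeline.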
Installation
Requires the diarization extra:
Examples
# Diarize a 3-person meeting from saved transcribe-live chunks
agent-cli diarize-live-session \
--date 2026-04-22 \
--start 11:32 \
--end 12:29 \
--speakers 3
# Prepare the combined WAV and metadata without running diarization
agent-cli diarize-live-session \
--date 2026-04-22 \
--start 11:32 \
--end 12:29 \
--speakers 3 \
--prepare-only
# Diarize the most recent inferred recording session
agent-cli diarize-live-session \
--last-recording 1 \
--speakers 3
# Persist unmatched voices as stable UNKNOWN_### profiles
agent-cli diarize-live-session \
--last-recording 1 \
--remember-unknown-speakers
# Inspect and name remembered speaker profiles
agent-cli speakers list
agent-cli speakers rename UNKNOWN_001 Alice
agent-cli speakers merge UNKNOWN_002 Alice
# Enroll current diarization labels directly when you already know who is who
agent-cli diarize-live-session \
--last-recording 1 \
--enroll-speakers SPEAKER_00=Alice \
--speakers 2
# Write structured JSON output
agent-cli diarize-live-session \
--date 2026-04-22 \
--start 11:32 \
--end 12:29 \
--speakers 3 \
--diarize-format json
Notes
- `transcribe-live` chunks are split on silence, not on speaker changes, so one saved MP3 can still contain multiple speakers.
- `--last-recording` groups nearby saved chunks into sessions. Use `--session-gap` if a long pause should or should not split a session.
- `--enroll-speakers` stores voice embeddings in `~/.config/agent-cli/speaker-profiles.json`; later diarization runs match new speaker clusters to those profiles.
- `--remember-unknown-speakers` gives unmatched voices stable `UNKNOWN_###` profiles so repeated unknown speakers can be recognized across recordings.
- Use `agent-cli speakers rename UNKNOWN_001 Alice` to name a remembered profile without re-running diarization.
- Use `agent-cli speakers merge UNKNOWN_002 Alice` if a later recording creates a duplicate profile for the same person.
- On Apple Silicon, pyannote diarization can run on `mps`, but wav2vec2 forced alignment falls back to CPU automatically when MPS is unsupported.
- If you do not pass `--hf-token`, the command uses `HF_TOKEN` from the environment.
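As a rough illustration of how `--last-recording` and `--session-gap` interact, the sketch below groups chunk timestamps into sessions. The function names are hypothetical, and it measures the gap between consecutive chunk timestamps only; the real command may measure from the end of one chunk to the start of the next.

```python
from datetime import datetime


def group_sessions(
    timestamps: list[datetime], session_gap: float = 300.0
) -> list[list[datetime]]:
    """Split chunk timestamps into sessions wherever the pause exceeds session_gap seconds."""
    sessions: list[list[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and (ts - sessions[-1][-1]).total_seconds() <= session_gap:
            sessions[-1].append(ts)  # close enough: same session
        else:
            sessions.append([ts])  # pause too long (or first chunk): new session
    return sessions


def nth_most_recent(sessions: list[list[datetime]], n: int = 1) -> list[datetime]:
    """Mirror the 1 = most recent, 2 = second-to-last numbering of --last-recording."""
    return sessions[-n]
```

With the default of 300 seconds, a six-minute break splits one meeting into two sessions; raising the gap would keep it as one.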
Options
| Option | Default | Description |
|---|---|---|
| `--date` | - | Date of the live session in YYYY-MM-DD format. Defaults to today. |
| `--start` | - | Start time of the session in HH:MM or HH:MM:SS. Required unless `--last-recording` is used. |
| `--end` | - | End time of the session in HH:MM or HH:MM:SS. Required unless `--last-recording` is used. |
| `--last-recording` | - | Select the Nth most recent inferred transcribe-live recording session (1=most recent, 2=second-to-last). |
| `--session-gap` | `300.0` | Maximum seconds between saved chunks before they are treated as separate sessions. |
| `--transcription-log` | `~/.config/agent-cli/transcriptions.jsonl` | Path to the transcribe-live JSONL log file. |
| `--output-dir` | `~/.cache/agent-cli/live-diarization` | Directory where the combined audio and diarized transcript will be saved. |
| `--prepare-only` | `false` | Only create the combined audio file and metadata without running diarization. |
| `--retranscribe` | `false` | Re-run ASR on the combined audio instead of using the logged transcribe-live text. |
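The `--date`, `--start`, and `--end` values together define a time window. A minimal sketch of how such a window might be built and applied is shown below; the helper names are hypothetical, and how the command actually reads timestamps from the JSONL log is not covered here.

```python
from datetime import date, datetime, time
from typing import Optional


def parse_clock(value: str) -> time:
    """Accept HH:MM or HH:MM:SS, the two formats listed for --start and --end."""
    for fmt in ("%H:%M:%S", "%H:%M"):
        try:
            return datetime.strptime(value, fmt).time()
        except ValueError:
            continue
    raise ValueError(f"expected HH:MM or HH:MM:SS, got {value!r}")


def build_window(
    start: str, end: str, day: Optional[str] = None
) -> tuple[datetime, datetime]:
    """Combine a YYYY-MM-DD date (default: today) with start and end clock times."""
    d = date.fromisoformat(day) if day else date.today()
    return datetime.combine(d, parse_clock(start)), datetime.combine(d, parse_clock(end))


def in_window(ts: datetime, window: tuple[datetime, datetime]) -> bool:
    """True when a chunk timestamp falls inside the window (inclusive on both ends)."""
    return window[0] <= ts <= window[1]
```

Whether the real command treats the window bounds as inclusive is an assumption made for this sketch.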
Diarization
| Option | Default | Description |
|---|---|---|
| `--diarize-format` | `inline` | Output format for diarization ('inline' for [Speaker N]: text, 'json' for structured output). |
| `--speakers` | - | Known number of speakers. Sets both `--min-speakers` and `--max-speakers`. |
| `--min-speakers` | - | Minimum number of speakers (optional hint for diarization). |
| `--max-speakers` | - | Maximum number of speakers (optional hint for diarization). |
| `--align-words/--no-align-words` | `false` | Enable word-level alignment when re-transcribing combined audio. Logged-transcript mode already uses word-level alignment by default. |
| `--align-language` | `en` | Language code for word alignment model (e.g., 'en', 'fr', 'de', 'es', 'it'). |
| `--hf-token` | - | HuggingFace token for pyannote models. Required for diarization. Token must have 'Read access to contents of all public gated repos you can access' permission. Accept licenses at: https://hf.co/pyannote/speaker-diarization-3.1, https://hf.co/pyannote/segmentation-3.0, https://hf.co/pyannote/wespeaker-voxceleb-resnet34-LM |
| `--enroll-speakers` | - | Enroll current speaker labels or remembered profile IDs into persistent voice profiles, e.g. SPEAKER_00=Alice or UNKNOWN_001=Alice. For simple renames, use `agent-cli speakers rename`. |
| `--identify-speakers/--no-identify-speakers` | `true` | Match diarized speakers against persistent voice profiles when profiles exist. |
| `--remember-unknown-speakers/--no-remember-unknown-speakers` | `false` | Persist unmatched speaker embeddings as stable UNKNOWN_### voice profiles. |
| `--speaker-profiles-file` | `~/.config/agent-cli/speaker-profiles.json` | JSON file storing persistent speaker voice embeddings. |
| `--speaker-match-threshold` | `0.7` | Cosine-similarity threshold for matching diarized speakers to stored profiles. |
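To make `--speaker-match-threshold` concrete, here is a minimal sketch of cosine-similarity matching against stored profiles. The function names and the in-memory `profiles` mapping are hypothetical, and the choice to accept a score equal to the threshold is an assumption; the command's real matching logic and profile file layout may differ.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_profile(
    embedding: list[float],
    profiles: dict[str, list[float]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the best-matching profile name, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, stored in profiles.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:  # assumption: a score equal to the threshold counts as a match
            best_name, best_score = name, score
    return best_name
```

A higher threshold makes matches stricter, so more voices end up unmatched; a lower one risks labeling a new speaker with an existing profile.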
General Options
| Option | Default | Description |
|---|---|---|
| `--config` | - | Path to a TOML configuration file. |
| `--print-args` | `false` | Print the command line arguments, including variables taken from the configuration file. |